[ https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693953#comment-14693953 ]
Siddharth Seth commented on TEZ-2003:
-------------------------------------

bq. All the new service plugins should be in runtime-api package where the rest of the user defined plugins are currently placed. They should not be in DAG API which deals with defining the DAG structure (as opposed to pluggable user components)

They're not in the DAG API. They reside in a separate package called serviceplugin.api. They don't belong in runtime-api, which is primarily for Inputs, Outputs and Processors.

bq. executeInContainer should be part of the default execution context that is created by the user (for other frameworks) or created internally by TezClient or the DAGAppMaster if no default is specified. That way things continue to run on YARN as is.

That's already the case. executeInContainer is there for cases where the DAG-level defaults are set to something else, and a specific vertex needs to be executed in containers.

bq. This change for hybrid execution is a fundamental and important change for Tez. ...

The changes are limited to the DAGAppMaster, Client and controllers for the individual plugins, and are not spread all over the place. It definitely makes testing easier - at least the way the tests are structured right now, and have already been changed. We can think of changing this at a later point, to set up everything in the DAGAppMaster. Changing the tests is the tiresome part there.

bq. Until now, there wasn't any special logic in VertexImpl for local mode. ...

Earlier there were no plugins. The default scheduler that was set up is the one that would end up getting used. Now a schedulerId needs to be sent along with request events (as well as launcher and taskComm ids). For local mode - this would always be 0. For regular container mode, it would also be 0. However there would be no checks while sending this out. The execution context is not sent along with the payload from the client if it is not specified.
That's only available in the DAGPlan and VertexPlan - which is why it's read in DAG and Vertex. DAGAppMaster should not be in the business of parsing and understanding plan bits which are not relevant to it. VertexImpl seems to be the right place to handle overrides.

bq. Similarly, creating a SchedulingPlugin object is an example of defensive programming where we pass around this object which has semantic meaning instead of passing around int types. Sure, entities which need 2 out of 3 can choose to use only the getters of 2 out of 3. But essentially tracking that object allows us to clearly see which parts of the code/events are related to plugins and which parts are not. Tracking ints does not provide that visibility.

I think that's far more unsafe, when a method exposes all three - but only 1 of the three may have been set. The current approach is very explicit about what needs to be set (and hence available) and what does not.

bq. An uber comment on uber mode is that it seems dangerous to run uber mode tasks within the AM.

I think some more work is required for uber mode before it's officially supported. I can see both modes working - within the AM process, or as a sub-process. In any case - I'm going to call this out in the Javadocs, until additional work is done to formalize support for uber mode.

bq. Not sure how this is fixed. Here is the code fragment from my initial comment. In some places we are arbitrarily passing back schedulerId = 0.

The 0s were bugs. The jira to handle different nodes was fixed. TEZ-2707 fixed the 0s and TEZ-2313 added tests for this.

bq. Please take a look at the MockDAGAppMaster code. numUpdates is used internally by that code. So the increment is still needed.

Will do.

bq. What will happen if the dag starts to run/launch new tasks while the communicator is still processing the completion of the previous dag? Say launch on communicator will be invoked.
To process the launch it may call getDAG() which will then either return the wrong dag or get stuck (or deadlocked?) behind the dagChangedReadLock?

The plugins are informed of DAG completion, and should be written in a way to handle this. I don't think there's a lot more that can be done to protect against updates coming in from an old DAG, while a new DAG has been submitted.

bq. Not sure why SchedulerEvent/ContainerEvent base classes would cause complications. Every scheduler event now needs a scheduler id. So every new event needs to have that specified. So a base class that keeps that code in one place sounds like a transparent change.

SchedulerEvents were fixed in TEZ-2707. I was likely confusing the complexity with AMNodeEvents which had been changed earlier. Container events don't need this. Only a single event - the launch request - actually contains any details about the plugins.

> [Umbrella] Allow Tez to co-ordinate execution to external services
> ------------------------------------------------------------------
>
> Key: TEZ-2003
> URL: https://issues.apache.org/jira/browse/TEZ-2003
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt, 2003_20150807.2.txt, Tez With External Services.pdf
>
>
> The Tez engine itself takes care of co-ordinating execution - controlling how data gets routed (different connection patterns), fault tolerance, scheduling of work, etc.
> This is currently tied to TaskSpecs defined within Tez and on containers launched by Tez itself (TezChild).
> The proposal is to allow Tez to work with external services instead of just containers launched by Tez. This involves several more pluggable layers to work with alternate Task Specifications, custom launch and task allocation mechanics, as well as custom scheduling sources.
> A simple example would be a process with the capability to execute multiple Tez TaskSpecs as threads.
> In such a case, a container launch isn't really needed and can be mocked. Sourcing / scheduling containers would need to be pluggable.
> A more advanced example would be LLAP (HIVE-7926; https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf).
> This works with custom interfaces - which would need to be supported by Tez, along with a custom event model which would need translation hooks.
> Tez should be able to work with a combination of certain vertices running in external services and others running in regular Tez containers.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)