[jira] [Commented] (TEZ-2003) [Umbrella] Allow Tez to co-ordinate execution to external services

Siddharth Seth (JIRA) Wed, 12 Aug 2015 11:05:36 -0700

    [ 
https://issues.apache.org/jira/browse/TEZ-2003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693953#comment-14693953
 ]


Siddharth Seth commented on TEZ-2003:
-------------------------------------

bq. All the new service plugins should be in runtime-api package where the 
rest of the user defined plugins are currently placed. They should not be in 
DAG API which deals with defining the DAG structure (as opposed to pluggable 
user components)
They're not in dag api. There's a separate package called serviceplugin.api 
where they reside. They don't belong in runtime-api, which is primarily for 
Inputs, Outputs and Processors.

bq. executeInContainer should be part of the default execution context that is 
created by the user (for other frameworks) or created internally by TezClient 
or the DAGAppMaster if no default is specified. That way things continue to run 
on YARN as is.
That's already the case. executeInContainer is in there for cases where the dag 
level defaults are set to something else, and a specific vertex needs to be 
executed in containers.
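
To illustrate the precedence described above, here is a minimal sketch. It 
assumes a vertex-level setting wins over the DAG-level default, and that plain 
containers are used when neither is set. ExecutionContext and ContextResolver 
are hypothetical names made up for this example; they are not the Tez classes.

```java
// Hypothetical types for illustration only; not the Tez API.
final class ExecutionContext {
    final boolean executeInContainers;
    final String serviceName; // null when running in regular containers

    private ExecutionContext(boolean executeInContainers, String serviceName) {
        this.executeInContainers = executeInContainers;
        this.serviceName = serviceName;
    }

    static ExecutionContext inContainers() {
        return new ExecutionContext(true, null);
    }

    static ExecutionContext inService(String serviceName) {
        return new ExecutionContext(false, serviceName);
    }
}

final class ContextResolver {
    /** Vertex-level setting first, then the DAG default, then containers. */
    static ExecutionContext resolve(ExecutionContext dagDefault,
                                    ExecutionContext vertexOverride) {
        if (vertexOverride != null) {
            return vertexOverride;
        }
        if (dagDefault != null) {
            return dagDefault;
        }
        // Nothing specified anywhere: keep running in containers as before.
        return ExecutionContext.inContainers();
    }
}
```

With a DAG default of inService("external") and a vertex override of 
inContainers(), resolve returns the container context for that vertex only.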

bq. This change for hybrid execution is a fundamental and important change for 
Tez. ...
The changes are limited to the DAGAppMaster, Client and controllers for the 
individual plugins, and are not spread all over the place. It definitely makes 
testing easier - at least the way the tests are structured right now, and have 
already been changed. We can think of changing this at a later point, to set up 
everything in the DAGAppMaster. Changing the tests is the tiresome part there.

bq. Until now, there wasn't any special logic in VertexImpl for local mode. ...
Earlier there were no plugins. The default scheduler that was set up is the one 
that would end up getting used. Now a schedulerId needs to be sent along with 
request events (as well as launcher and taskComm ids). For local mode - this 
would always be 0. For regular container mode, it would also be 0. However 
there would be no checks while sending this out.
The execution context is not sent along with the payload from the client if it 
is not specified. That's only available in the DAGPlan and VertexPlan - which 
is why it's read in DAG and Vertex. DAGAppMaster should not be in the business 
of parsing and understanding plan bits which are not relevant to it. VertexImpl 
seems to be the right place to handle overrides.
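
A rough sketch of how a vertex might turn a plugin name from its plan into the 
integer id that goes out with request events. It assumes ids are positions in 
the configured plugin list, and that an unspecified name falls back to id 0; 
neither detail is spelled out above. PluginRegistry and the plugin names are 
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; the id-equals-list-position rule is an assumption.
final class PluginRegistry {
    private final List<String> names;

    PluginRegistry(List<String> names) {
        if (names.isEmpty()) {
            throw new IllegalArgumentException("At least one plugin is required");
        }
        this.names = new ArrayList<>(names); // defensive copy
    }

    /** Returns 0 for an unspecified name, else the position of the name. */
    int idFor(String name) {
        if (name == null) {
            return 0; // nothing in the plan: use the default plugin
        }
        int index = names.indexOf(name);
        if (index < 0) {
            // Check before the id is sent out rather than failing later.
            throw new IllegalArgumentException("Unknown plugin: " + name);
        }
        return index;
    }
}
```

Resolving the name once, where the plan is read, means the events themselves 
only ever carry ids that are already known to be valid.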


bq. Similarly, creating a SchedulingPlugin object is an example of defensive 
programming where we pass around this object which has semantic meaning instead 
of passing around int types. Sure, entities which need 2 out of 3 can choose to 
use only the getters of 2 out of 3. But essentially tracking that object allows us 
to clearly see which parts of the code/events are related to plugins and which 
parts are not. Tracking ints does not provide that visibility.
I think that's far more unsafe: the object would expose all three ids, but only 
one of the three may have been set. The current approach is very explicit about 
what needs to be set (and hence is available) and what does not.
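
A small sketch of the explicit approach, where each event type declares only 
the ids it actually carries. The class names are hypothetical and are not the 
event classes in the Tez code base.

```java
// Hypothetical event types for illustration only.

/** A scheduler-bound event: only the scheduler id is meaningful here. */
final class SchedulerRequestEvent {
    private final int schedulerId;

    SchedulerRequestEvent(int schedulerId) {
        this.schedulerId = schedulerId;
    }

    int getSchedulerId() {
        return schedulerId;
    }
}

/** A launch request: the one event that needs all three plugin ids. */
final class LaunchRequestEvent {
    private final int schedulerId;
    private final int launcherId;
    private final int taskCommId;

    LaunchRequestEvent(int schedulerId, int launcherId, int taskCommId) {
        this.schedulerId = schedulerId;
        this.launcherId = launcherId;
        this.taskCommId = taskCommId;
    }

    int getSchedulerId() {
        return schedulerId;
    }

    int getLauncherId() {
        return launcherId;
    }

    int getTaskCommId() {
        return taskCommId;
    }
}
```

Because every field is a required constructor argument, no getter on either 
class can return an id that was never set.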

bq. An uber comment on uber mode is that it seems dangerous to run uber mode 
tasks within the AM. 
I think some more work is required for uber mode before it's officially 
supported. I can see both modes working - within the AM process, or as a 
sub-process. In any case - I'm going to call this out in the Javadocs, until 
additional work is done to formalize support for uber mode.

bq. Not sure how this is fixed. Here is the code fragment from my initial 
comment. In some places we are arbitrarily passing back schedulerId = 0.
The 0s were bugs. The jira to handle different nodes was fixed. TEZ-2707 fixed 
the 0s and TEZ-2313 added tests for this.

bq. Please take a look at the MockDAGAppMaster code. numUpdates is used 
internally by that code. So the increment is still needed.
Will do. 

bq. What will happen if the dag starts to run/launch new tasks while the 
communicator is still processing the completion of the previous dag? Say launch 
on communicator will be invoked. To process the launch it may call getDAG() 
which will then either return the wrong dag or get stuck (or deadlocked?) behind 
the dagChangedReadLock?
The plugins are informed of DAG completion, and should be written in a way to 
handle this. I don't think there's a lot more that can be done to protect 
against updates coming in from an old DAG, while a new DAG has been submitted.
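
One way a plugin could guard against this, as a minimal sketch: remember the 
id of the DAG it is currently serving and drop any update tagged with a 
different id. The class and method names below are hypothetical and are not 
the Tez plugin interface.

```java
// Hypothetical plugin-side guard; not the Tez communicator API.
final class DagAwareCommunicator {
    private static final int NO_DAG = -1;

    private int currentDagId = NO_DAG;
    private int acceptedUpdates = 0;

    synchronized void dagStarted(int dagId) {
        currentDagId = dagId;
    }

    synchronized void dagComplete(int dagId) {
        // Only clear the state if the completed DAG is the one being tracked.
        if (currentDagId == dagId) {
            currentDagId = NO_DAG;
        }
    }

    /** Returns true if the update was applied, false if it was stale. */
    synchronized boolean onTaskUpdate(int dagId) {
        if (dagId != currentDagId) {
            return false; // update belongs to an old (or unknown) DAG
        }
        acceptedUpdates++;
        return true;
    }

    synchronized int getAcceptedUpdates() {
        return acceptedUpdates;
    }
}
```

After dagComplete(1) and dagStarted(2), a late onTaskUpdate(1) returns false 
and leaves the state for DAG 2 untouched.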

bq. Not sure why SchedulerEvent/ContainerEvent base classes would cause 
complications. Every scheduler event now needs a scheduler id. So every new 
event needs to have that specified. So a base class that keeps that code in one 
place sounds like a transparent change.
SchedulerEvents were fixed in TEZ-2707. I was likely confusing the complexity 
with AMNodeEvents, which had been changed earlier. Container events don't need 
this. Only a single event - the launch request - actually contains any details 
about the plugins.

> [Umbrella] Allow Tez to co-ordinate execution to external services
> ------------------------------------------------------------------
>
>                 Key: TEZ-2003
>                 URL: https://issues.apache.org/jira/browse/TEZ-2003
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>         Attachments: 2003_20150728.1.txt, 2003_20150807.1.txt, 
> 2003_20150807.2.txt, Tez With External Services.pdf
>
>
> The Tez engine itself takes care of co-ordinating execution - controlling how 
> data gets routed (different connection patterns), fault tolerance, scheduling 
> of work, etc.
> This is currently tied to TaskSpecs defined within Tez and on containers 
> launched by Tez itself (TezChild).
> The proposal is to allow Tez to work with external services instead of just 
> containers launched by Tez. This involves several more pluggable layers to 
> work with alternate Task Specifications, custom launch and task allocation 
> mechanics, as well as custom scheduling sources.
> A simple example would be a simple process with the capability to execute 
> multiple Tez TaskSpecs as threads. In such a case, a container launch isn't 
> really needed and can be mocked. Sourcing / scheduling containers would need to 
> be pluggable.
> A more advanced example would be LLAP (HIVE-7926; 
> https://issues.apache.org/jira/secure/attachment/12665704/LLAPdesigndocument.pdf).
> This works with custom interfaces - which would need to be supported by Tez, 
> along with a custom event model which would need translation hooks.
> Tez should be able to work with a combination of certain vertices running in 
> external services and others running in regular Tez containers.



