[
https://issues.apache.org/jira/browse/TEZ-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029992#comment-14029992
]
Siddharth Seth commented on TEZ-1170:
-------------------------------------
Ahh, didn't see the Monday note in the comment. Sorry about that.
Comments on the patch.
- It's simplified some usage, but there's places where the transitions out of
INITIALIZING and INITED are inconsistent. E.g. V_ONE_TO_ONE_SOURCE_SPLIT keeps
the vertex in it's current state and generates a READY_TO_INIT event. OTOH,
NULL_EDGE_SET / ROOT_INPUT_INITIALIZED move the vertex into the next state
directly. Was expecting the behaviour to be similar for all such events which
can move the Vertex to another state (either all via an extra event, or all via
direct transitions). Direct transitions is likely better since it doesn't
unnecessarily queue up behind other events. Unrelated, but the same is true of
moving into the STARTED state - we end up generating an additional event back
to the same Vertex, which also goes to the back of the queue (will create a
JIRA for the START event).
- In the same context, V_PARALLELISM_INITIALIZED was a better name, since the
renamed event (V_READY_TO_INIT) is only generated when parallelism is updated.
Most of the state machines are following a model where the event indicates what
happened (PARALLELISM_INITIALIZED) vs what needs to be done (V_READY_TO_INIT).
- {code} .addTransition(VertexState.NEW,
EnumSet.of(VertexState.NEW),
VertexEventType.V_NULL_EDGE_INITIALIZED,
new NullEdgeInitializedTransition()){code}
This is a little confusing - since NullEdgeInitializedTransition removes from
the emptyEdge set, which may not have been populated yet. Works though, since
the edge specified won't make it into the map when the map is finally populated.
- {code} .addTransition(VertexState.TERMINATING,
EnumSet.of(VertexState.TERMINATING),
VertexEventType.V_ROOT_INPUT_INITIALIZED,
new RootInputInitializedTransition()){code}
Didn't quite understand this change. Why should the
RootInputInitializerTransition be invoked if the Vertex is already in
TERMINATING state (or FAILED state etc). If this is to shut down the threads
which are initializing, a shutdown is already being invoked when in the
INITIALIZING state.
- Unrelated to this patch, is ONE_TO_ONE_SOURCE_SPLIT expected in RUNNING state
- possibly from a different edge ?, or should that be an error. RUNNING itself
seems to be an error condition for this event.
It's not absolutely necessary for RootInputs to complete, to move a Vertex out
of INITIALIZING state. Once parallelism is set (and associated
InputSpecUpdates, LocationHints etc), and Null Edges have been setup - it
should be possible to just start a vertex. Assuming this was left in place so
that tasks don't start, while they may not have events ready for the Inputs, or
may not have generated data for the inputs?
> Simplify Vertex Initializing transition
> ---------------------------------------
>
> Key: TEZ-1170
> URL: https://issues.apache.org/jira/browse/TEZ-1170
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Siddharth Seth
> Assignee: Bikas Saha
> Fix For: 0.5.0
>
> Attachments: TEZ-1170.1.patch, TEZ-1170.2.patch
>
>
> After TEZ-1145 and 1151, a vertex should only need to stay in INITIALZING
> state when it has an uninitialized edge, or when the parallelism is at -1
> (not set yet). Waiting for all RootInputInitializers to complete should not
> be required - as long as one of them sets parallelism (via a VertexManager).
--
This message was sent by Atlassian JIRA
(v6.2#6252)