[ 
https://issues.apache.org/jira/browse/TEZ-703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120540#comment-14120540
 ] 

Siddharth Seth commented on TEZ-703:
------------------------------------

Pasting comment from [~bikassaha] on TEZ-1345 to continue the discussion 
relevant to this jira
bq. Here is a summary from an offline discussion with Hitesh.
The root cause of the issue is the inherent race condition in the flow (II is 
input initializer, VM is Vertex Manager)
1) Vertex starts IIs
2) IIs send Vertex events when they are done
3) Vertex forwards events to VM and changes state to INITED
4) VM forwards events back to the Vertex (potentially after changing some things). 
But by that time the Vertex has already INITED.
This race condition already existed but was never a problem until recovery came 
into the picture.
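
The ordering in steps 1-4 can be replayed with a small single-threaded model. This is only an illustrative sketch: the class, the queue standing in for the asynchronous dispatcher, and the event strings are all hypothetical and are not Tez APIs. It shows that the Vertex records INITED before the VM's (possibly modified) events are delivered.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of the flow; none of these names are Tez APIs.
final class InitRaceSketch {
    /** Replays steps 1-4 and returns what the Vertex observed, in order. */
    static List<String> replay() {
        List<String> log = new ArrayList<>();
        // Stands in for the asynchronous dispatcher between VM and Vertex.
        Queue<Runnable> dispatcher = new ArrayDeque<>();

        // Steps 1-2: the II has finished and its event has reached the Vertex.
        String iiEvent = "split-info";

        // Step 3: the Vertex hands the event to the VM and moves to INITED
        // right away. The VM's reply is only queued, not yet delivered.
        dispatcher.add(() -> log.add("vertex received from VM: " + iiEvent + " (modified)"));
        log.add("vertex state: INITED");

        // Step 4: the dispatcher delivers the VM's events after the transition.
        while (!dispatcher.isEmpty()) {
            dispatcher.poll().run();
        }
        return log;
    }
}
```

In this model the state change is always observed first, which is the ordering that makes the final events unavailable when the transition is logged for recovery.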
The ideal solution is to remove this race condition. An option for that is 
TEZ-703, which aims to move II control to the VM. So instead of the Vertex 
starting IIs, the vertex starts the VM, VM starts the IIs. IIs send their 
events back to VM directly. VM sends final events (after modifying them if 
needed) to the Vertex via InputInitDone notification. At this point the vertex 
knows the final events and can change to INITED. This greatly simplifies 
the Vertex state machine and also removes the race condition. Nothing 
materially changes in the IIPlugin or the VMPlugin and so it should be 
backwards compatible.
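
A minimal sketch of the proposed flow, under stated assumptions: the Vertex, VertexManager, and inputInitDone names below are hypothetical stand-ins and not the Tez classes, IIs are modeled as plain suppliers of event strings, and everything runs synchronously. The point it illustrates is that the Vertex only changes state once it holds the final events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch of the VM-driven flow; these types are not Tez classes.
final class VmDrivenInitSketch {
    enum State { NEW, INITED }

    static final class Vertex {
        State state = State.NEW;
        List<String> finalEvents = List.of();

        // The Vertex starts only the VM, then waits for inputInitDone.
        void start(VertexManager vm) {
            vm.initialize(this);
        }

        // The final events are known before the state change happens.
        void inputInitDone(List<String> events) {
            finalEvents = List.copyOf(events);
            state = State.INITED;
        }
    }

    static final class VertexManager {
        private final List<Supplier<List<String>>> initializers;

        VertexManager(List<Supplier<List<String>>> initializers) {
            this.initializers = initializers;
        }

        void initialize(Vertex vertex) {
            List<String> events = new ArrayList<>();
            for (Supplier<List<String>> ii : initializers) {
                // IIs report to the VM directly, not to the Vertex.
                for (String e : ii.get()) {
                    events.add(e + ":adjusted"); // the VM may modify events
                }
            }
            vertex.inputInitDone(events);
        }
    }

    static Vertex run() {
        Vertex vertex = new Vertex();
        Supplier<List<String>> ii = () -> List.of("split-0", "split-1");
        vertex.start(new VertexManager(List.of(ii)));
        return vertex;
    }
}
```

Because the only path into INITED goes through inputInitDone, there is no window in which the Vertex is INITED without the VM's events.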
The other stop-gap solution is to have 
vertex.vertexManager.onRootVertexInitialized() return the init events in the 
return value. This way the init events can be logged before the transition 
completes. In order to do this compatibly, VM.addRootInputEvents() could cache 
the events instead of sending them via the dispatcher and return the cached 
value in the return of onRootVertexInitialized(). This is similar to the inline 
event routing patch except that it does not leak event routing logic outside of 
the VertexImpl code.
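
The stop-gap could look roughly like the sketch below. The method names mirror the ones mentioned above, but the signatures, the string-typed events, and the recoveryLog list are assumptions made for illustration and do not reflect the actual Tez code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the stop-gap; signatures are assumed, not taken from Tez.
final class CachedInitEventsSketch {
    static final class VertexManager {
        private final List<String> cached = new ArrayList<>();

        // Cache the events locally instead of sending them via the dispatcher.
        void addRootInputEvents(String inputName, List<String> events) {
            for (String e : events) {
                cached.add(inputName + "/" + e);
            }
        }

        // Plugin logic would run here and call addRootInputEvents;
        // whatever was cached is handed back in the return value.
        List<String> onRootVertexInitialized(String inputName, List<String> iiEvents) {
            addRootInputEvents(inputName, iiEvents);
            List<String> result = new ArrayList<>(cached);
            cached.clear();
            return result;
        }
    }

    static final class Vertex {
        final List<String> recoveryLog = new ArrayList<>();
        String state = "INITIALIZING";

        void rootInputInitialized(VertexManager vm, String inputName, List<String> iiEvents) {
            List<String> initEvents = vm.onRootVertexInitialized(inputName, iiEvents);
            recoveryLog.addAll(initEvents); // logged before the transition completes
            state = "INITED";
            recoveryLog.add("STATE:INITED");
        }
    }
}
```

Since the events come back synchronously, the Vertex can write them to the recovery log inside the same transition, and the routing stays within the Vertex code.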
We can evaluate the effort and risk of TEZ-703 and if it's too much we can do 
the stop gap solution in the interim.

> Simplify input initialization in Vertex
> ---------------------------------------
>
>                 Key: TEZ-703
>                 URL: https://issues.apache.org/jira/browse/TEZ-703
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Assignee: Bikas Saha
>
> The flow of input initialization is fairly complex and adds vertex 
> transitions that are hard to follow. The number of comments needed to clarify 
> TEZ-683 shows this complexity. It may be possible to collapse the INITIALIZING 
> state of the vertex into the NEW state and reduce the complexity of the init 
> transition. The vertex could wait for inputs to be inited just like it waits 
> for input vertices to be inited. This would unify the code paths and simplify 
> things a bit.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
