[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375512#comment-14375512
 ] 

Jeff Zhang commented on TEZ-714:
--------------------------------

Upload a new patch. [~bikassaha] Please help review it.

* Wrap the commit in the CallableEvent both in DAG & Vertex, but for the abort, 
still call it inline. Make the abort asyn will complicate the patch, so still 
keep it a sync call as before.
* Introduce new state COMMITTING for Vertex & DAG
** Vertex's COMMITTING means vertex is in the middle of committing, if vertex 
has no committers or the option of TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is 
true, vertex would not to to COMMITTING state.
** DAG's COMMITTING has 2 cases, one is when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true and all the vertices are 
completed, another case is that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is 
false and all the vertices are completed, but still some vertex group 
committers are running.
* Regarding the issue of "not sure why group-commit and non-group commit need 
to be differentiated in different transitions.", I rename it to 
NonFinalCommitCompletedTransition and FinalCommitCompletetionTransition (maybe 
there's better names ). One mean the committer when 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and the other means 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true. The reason I differentiate 
them is that for the NonFinalCommitCompletedEvent, we need to log the recovery 
log of VertexGroupCommitCompletedEvent while it is not necessary for 
FinalCommitCompletedEvent.
* Unit test is still not perfect. Because currently in the DAGImpl/VertexImpl 
we run the shared thread pool in the AsynDispatcher thread ( that means 
Committer still run in the thread of AsynDispather) so this may hide some 
potential issues and under this thread mode, it is not possible for test some 
cases like kill dag while it is in committing. I am trying to think of ways to 
simulate the shared thread pool in the unit test.
* For the some existing transition, like (RUNNING to ERROR due to INTERNAL 
ERROR), I am not sure why it go to ERROR directly rather than TERMINATING. 
Maybe it is to allow the client get the final status as earyl as possible.


> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to