[ 
https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386086#comment-14386086
 ] 

Jeff Zhang edited comment on TEZ-714 at 3/30/15 7:03 AM:
---------------------------------------------------------

bq. This is a known issue. There is a comment somewhere in the code but I dont 
think its tracked by a jira.
Create TEZ-2249 to track it.

bq. After the first if() check shouldn't the code simply return 
vertex.finished(SUCCEEDED) ? Why check again for commitFutures.isEmpty()?
The first if() means that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and 
the commit has not been started. In this case call commitOrFinished which would 
call async commit if there's any committer for this vertex or just finish if no 
committer. The second if() means either 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true or all the committers are 
completed successfully.  May need to refactor code here to make it more clear

bq. Why is checkForCompletion called after tasks finished and commit finished? 
Those 2 sounds like different logical steps and should use separate methods. 
Perhaps a preCommit() and postCommit() method. If there is nothing to commit 
then both can be called back to back.
Both task finishing and commit finishing may be the last step to the finished 
state (if no commit then task finishing is the last step, and committing finish 
is the last step if there's commit), so they would both trigger 
checkForCompletion.

bq. Optimistic case: Ignore failed commits and allow all commits to complete 
(with success or fail). If all commits succeed then proceed to succeeded. If 
any commit fails, then proceed to failed.
What i am doing is not exactly this way. I won't ignore failed commits, stead I 
will cancel pending commits and wait for them to complete and then move to 
failed state. This behavior is consistent with other cases when there's any 
fail event happens (commit fail, vertex termination event and etc ), all the 
cases would cancel pending commits and wait for them to complete and then move 
to finished state. 

bq. For the optimistic case, all failures should go into the Terminating state. 
Errors should not cancel commits. 
Why Errors should not cancel commits ? Just because Errors don't have effort on 
the data to be committed ?

bq. bq.  E.g. TaskCompletedAfterVertexSuccessTransition is clearing all 
successful commits and choosing the pessimistic case. 
TaskCompletedAfterVertexSuccessTransition is aborting all pending commits. So 
the logic does not seem to be consistent everywhere. We need to make this 
consistent. Either choose the pessimistic case or choose the optimistic case.
bq. For the abort method itself, it could take a boolean parameter on whether 
to abort all or not instead of looking at the successful completions map. This 
avoids side-effect code like having to clear the successful maps before 
invoking abortVertex().
After second thought, I think it may be not necessary to check whether the 
commits are successful. The semantics of abort should mean clean up any 
intermediate data during committing, should not have side effort on the 
committed data. So that means abort has no effect on the successful commits.
 






was (Author: zjffdu):
bq. This is a known issue. There is a comment somewhere in the code but I dont 
think its tracked by a jira.
Create TEZ-2249 to track it.

bq. After the first if() check shouldn't the code simply return 
vertex.finished(SUCCEEDED) ? Why check again for commitFutures.isEmpty()?
The first if() means that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and 
the commit has not been started. In this case call commitOrFinished which would 
call async commit if there's any committer for this vertex or just finish if no 
committer. The second if() means either 
TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true or all the committers are 
completed successfully.  May need to refactor code here to make it more clear

bq. Why is checkForCompletion called after tasks finished and commit finished? 
Those 2 sounds like different logical steps and should use separate methods. 
Perhaps a preCommit() and postCommit() method. If there is nothing to commit 
then both can be called back to back.
Both task finishing and commit finishing may be the last step to the finished 
state (if no commit then task finishing is the last step, and committing finish 
is the last step if there's commit), so they would both trigger 
checkForCompletion.

bq. Optimistic case: Ignore failed commits and allow all commits to complete 
(with success or fail). If all commits succeed then proceed to succeeded. If 
any commit fails, then proceed to failed.
What i am doing is not exactly this way. I won't ignore failed commits, stead I 
will cancel pending commits and wait for them to complete and then move to 
failed state. This behavior is consistent with other cases when there's any 
fail event happens (commit fail, vertex termination event and etc ), all the 
cases would cancel pending commits and wait for them to complete and then move 
to finished state. 

bq. For the optimistic case, all failures should go into the Terminating state. 
Errors should not cancel commits. 
Why Errors should not cancel commits ? Just because Errors don't have effort on 
the data to be committed ?

bq. bq.  E.g. TaskCompletedAfterVertexSuccessTransition is clearing all 
successful commits and choosing the pessimistic case. 
TaskCompletedAfterVertexSuccessTransition is aborting all pending commits. So 
the logic does not seem to be consistent everywhere. We need to make this 
consistent. Either choose the pessimistic case or choose the optimistic case.
bq. For the abort method itself, it could take a boolean parameter on whether 
to abort all or not instead of looking at the successful completions map. This 
avoids side-effect code like having to clear the successful maps before 
invoking abortVertex().
After second thought, I think it may be not necessary to check whether the 
commits are successful. The semantics of abort should mean clean up any 
intermediate data during committing, should not have side effort on the 
committed data. 
 





> OutputCommitters should not run in the main AM dispatcher thread
> ----------------------------------------------------------------
>
>                 Key: TEZ-714
>                 URL: https://issues.apache.org/jira/browse/TEZ-714
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, 
> TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf
>
>
> Follow up jira from TEZ-41.
> 1) If there's multiple OutputCommitters on a Vertex, they can be run in 
> parallel.
> 2) Running an OutputCommitter in the main thread blocks all other event 
> handling, w.r.t the DAG, and causes the event queue to back up.
> 3) This should also cover shared commits that happen in the DAG.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to