[ https://issues.apache.org/jira/browse/TEZ-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386086#comment-14386086 ]
Jeff Zhang edited comment on TEZ-714 at 3/30/15 7:03 AM: --------------------------------------------------------- bq. This is a known issue. There is a comment somewhere in the code but I dont think its tracked by a jira. Create TEZ-2249 to track it. bq. After the first if() check shouldn't the code simply return vertex.finished(SUCCEEDED) ? Why check again for commitFutures.isEmpty()? The first if() means that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and the commit has not been started. In this case call commitOrFinished which would call async commit if there's any committer for this vertex or just finish if no committer. The second if() means either TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true or all the committers are completed successfully. May need to refactor code here to make it more clear bq. Why is checkForCompletion called after tasks finished and commit finished? Those 2 sounds like different logical steps and should use separate methods. Perhaps a preCommit() and postCommit() method. If there is nothing to commit then both can be called back to back. Both task finishing and commit finishing may be the last step to the finished state (if no commit then task finishing is the last step, and committing finish is the last step if there's commit), so they would both trigger checkForCompletion. bq. Optimistic case: Ignore failed commits and allow all commits to complete (with success or fail). If all commits succeed then proceed to succeeded. If any commit fails, then proceed to failed. What i am doing is not exactly this way. I won't ignore failed commits, stead I will cancel pending commits and wait for them to complete and then move to failed state. This behavior is consistent with other cases when there's any fail event happens (commit fail, vertex termination event and etc ), all the cases would cancel pending commits and wait for them to complete and then move to finished state. bq. For the optimistic case, all failures should go into the Terminating state. Errors should not cancel commits. Why Errors should not cancel commits ? Just because Errors don't have effort on the data to be committed ? bq. bq. E.g. TaskCompletedAfterVertexSuccessTransition is clearing all successful commits and choosing the pessimistic case. TaskCompletedAfterVertexSuccessTransition is aborting all pending commits. So the logic does not seem to be consistent everywhere. We need to make this consistent. Either choose the pessimistic case or choose the optimistic case. bq. For the abort method itself, it could take a boolean parameter on whether to abort all or not instead of looking at the successful completions map. This avoids side-effect code like having to clear the successful maps before invoking abortVertex(). After second thought, I think it may be not necessary to check whether the commits are successful. The semantics of abort should mean clean up any intermediate data during committing, should not have side effort on the committed data. So that means abort has no effect on the successful commits. was (Author: zjffdu): bq. This is a known issue. There is a comment somewhere in the code but I dont think its tracked by a jira. Create TEZ-2249 to track it. bq. After the first if() check shouldn't the code simply return vertex.finished(SUCCEEDED) ? Why check again for commitFutures.isEmpty()? The first if() means that TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is false and the commit has not been started. In this case call commitOrFinished which would call async commit if there's any committer for this vertex or just finish if no committer. The second if() means either TEZ_AM_COMMIT_ALL_OUTPUTS_ON_DAG_SUCCESS is true or all the committers are completed successfully. May need to refactor code here to make it more clear bq. Why is checkForCompletion called after tasks finished and commit finished? Those 2 sounds like different logical steps and should use separate methods. Perhaps a preCommit() and postCommit() method. If there is nothing to commit then both can be called back to back. Both task finishing and commit finishing may be the last step to the finished state (if no commit then task finishing is the last step, and committing finish is the last step if there's commit), so they would both trigger checkForCompletion. bq. Optimistic case: Ignore failed commits and allow all commits to complete (with success or fail). If all commits succeed then proceed to succeeded. If any commit fails, then proceed to failed. What i am doing is not exactly this way. I won't ignore failed commits, stead I will cancel pending commits and wait for them to complete and then move to failed state. This behavior is consistent with other cases when there's any fail event happens (commit fail, vertex termination event and etc ), all the cases would cancel pending commits and wait for them to complete and then move to finished state. bq. For the optimistic case, all failures should go into the Terminating state. Errors should not cancel commits. Why Errors should not cancel commits ? Just because Errors don't have effort on the data to be committed ? bq. bq. E.g. TaskCompletedAfterVertexSuccessTransition is clearing all successful commits and choosing the pessimistic case. TaskCompletedAfterVertexSuccessTransition is aborting all pending commits. So the logic does not seem to be consistent everywhere. We need to make this consistent. Either choose the pessimistic case or choose the optimistic case. bq. For the abort method itself, it could take a boolean parameter on whether to abort all or not instead of looking at the successful completions map. This avoids side-effect code like having to clear the successful maps before invoking abortVertex(). After second thought, I think it may be not necessary to check whether the commits are successful. The semantics of abort should mean clean up any intermediate data during committing, should not have side effort on the committed data. > OutputCommitters should not run in the main AM dispatcher thread > ---------------------------------------------------------------- > > Key: TEZ-714 > URL: https://issues.apache.org/jira/browse/TEZ-714 > Project: Apache Tez > Issue Type: Improvement > Reporter: Siddharth Seth > Assignee: Jeff Zhang > Priority: Critical > Attachments: DAG_2.pdf, TEZ-714-1.patch, TEZ-714-2.patch, > TEZ-714-3.patch, TEZ-714-4.patch, TEZ-714-5.patch, Vertex_2.pdf > > > Follow up jira from TEZ-41. > 1) If there's multiple OutputCommitters on a Vertex, they can be run in > parallel. > 2) Running an OutputCommitter in the main thread blocks all other event > handling, w.r.t the DAG, and causes the event queue to back up. > 3) This should also cover shared commits that happen in the DAG. -- This message was sent by Atlassian JIRA (v6.3.4#6332)