[ https://issues.apache.org/jira/browse/TEZ-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hitesh Shah updated TEZ-1744:
-----------------------------
    Target Version/s: 0.7.0  (was: 0.6.0)

> It is not necessary to check whether dag is commit in RecoveryTransition
> ------------------------------------------------------------------------
>
>                 Key: TEZ-1744
>                 URL: https://issues.apache.org/jira/browse/TEZ-1744
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.5.1
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-1744.patch
>
>
> It is not necessary to check whether the DAG is committed in
> RecoveryTransition, because we already check that in RecoveryParser by using
> the summary event.
>
> Copying the comments from TEZ-1737:
>
> bq. But even if the non-summary VertexFinishedEvent is seen, its
> VertexRecoverableEventsGeneratedEvent may still be lost. I think there's no
> guarantee that VertexRecoverableEventsGeneratedEvent is logged before
> VertexFinishedEvent.
>
> The expectation was that all tasks are completed before a vertex has
> finished. Also, a TaskFinishedEvent is only seen after all its data movement
> events are generated and therefore logged.
>
> The handling is for the general case where a lot of data movement events are
> generated, and the commit started and then ended. In a scenario where the
> commit starts but does not end, the summary log helps catch the problem. Now,
> in a scenario where the commit finished successfully, there could be a
> situation where the AM crashed before all data movement events were stored to
> recovery. In this scenario, we cannot do anything, as the commit has already
> been done but we have no idea what was lost. The crux of the answer to your
> question is that a committer cannot be invoked twice.
>
> Agree that VertexRecoverableEventsGeneratedEvent is a different problem. In
> such cases, I believe that if VertexRecoverableEventsGeneratedEvent is not
> seen before a VertexFinished is seen, there needs to be some additional
> handling for that scenario too.
> If a VertexRecoverableEventsGeneratedEvent is always guaranteed to be
> generated for a vertex and it is not seen, then that is a potentially
> non-recoverable case when the vertex itself was seen to have completed.
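
The cases discussed in the comments above can be summarised as a small decision table. The following is only an illustrative sketch, not code from TEZ-1744.patch or from Tez itself: the class RecoverySketch, the enums Seen and Outcome, and the method decide are all hypothetical names, and the sketch assumes that a VertexRecoverableEventsGeneratedEvent is always generated for a vertex (which the comment above only treats as a possibility).

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of the recovery decision described in the comments.
// None of these names exist in Tez; they only mirror the events discussed.
public class RecoverySketch {

    // Which records were found in the recovery logs.
    public enum Seen {
        SUMMARY_COMMIT_STARTED,              // summary log: commit started
        SUMMARY_COMMIT_FINISHED,             // summary log: commit ended
        VERTEX_FINISHED,                     // non-summary VertexFinishedEvent
        VERTEX_RECOVERABLE_EVENTS_GENERATED  // VertexRecoverableEventsGeneratedEvent
    }

    public enum Outcome { RECOVERABLE, NON_RECOVERABLE, ALREADY_COMMITTED }

    public static Outcome decide(Set<Seen> seen) {
        // Commit completed: the committer cannot be invoked twice, so there
        // is nothing to redo even if some data movement events were lost.
        if (seen.contains(Seen.SUMMARY_COMMIT_FINISHED)) {
            return Outcome.ALREADY_COMMITTED;
        }
        // Commit started but never ended: the summary log catches this case.
        if (seen.contains(Seen.SUMMARY_COMMIT_STARTED)) {
            return Outcome.NON_RECOVERABLE;
        }
        // Assumption: the recoverable-events record is always generated, so a
        // finished vertex without it is treated as potentially non-recoverable.
        if (seen.contains(Seen.VERTEX_FINISHED)
                && !seen.contains(Seen.VERTEX_RECOVERABLE_EVENTS_GENERATED)) {
            return Outcome.NON_RECOVERABLE;
        }
        return Outcome.RECOVERABLE;
    }

    public static void main(String[] args) {
        System.out.println(decide(EnumSet.of(Seen.SUMMARY_COMMIT_STARTED)));
        System.out.println(decide(EnumSet.of(Seen.VERTEX_FINISHED)));
    }
}
```

Checking the summary records first reflects the point made in the description: once the commit state is known from the summary event, a later check in the transition adds nothing.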