[ https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969319#comment-14969319 ]
Jason Lowe commented on TEZ-2581:
---------------------------------

bq. Ideally we should provide API in VertexManagerPlugin to allow user to define the completeness, but it would cause api incompatibility. In this phase, I plan to not change the API, focus on stabilize the recovery framework.

My point is that by ignoring the vertex manager plugin upon recovery we _are_ changing the API. If the vertex manager relies on getting vertex status updates even after tasks have been scheduled, then the API has been broken semantically even if it hasn't been broken syntactically. That's why I brought up adding a new method to the abstract base class to indicate whether the vertex manager supports recovery. It can default to false, and vertex managers that support recovery can override it to say they know how to participate properly. Or we can just change the semantics of the API as proposed in this patch, but we need to realize that this is still a backwards-incompatible change.

As for the parallelism changing, the code does check for it, yet we still have issues in 0.7 recovery because the parallelism _is_ changing during recovery. Therefore that check apparently isn't sufficient, probably because the vertex manager is changing the parallelism during recovery, before the vertex realizes that tasks had already started.

bq. Let me know if you need me to rebase the patch for your tez version.

We're currently on 0.7 with no immediate plans to move to 0.8. We will get there eventually, but not for a while yet. It would be nice to have this on 0.7, but the concern is that it's a very large patch that could destabilize a number of things.
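To make the opt-in idea above concrete, here is a minimal sketch of what a default-false recovery hook on the abstract base class might look like. Everything in it is hypothetical: `SketchVertexManager` is a stand-in for the real plugin base class, `supportsRecovery()` is an illustrative method name rather than an existing Tez API, and the fallback chosen for managers that do not opt in (re-running the vertex) is an assumption, not something the patch specifies.

```java
// Hypothetical stand-in for the vertex manager abstract base class; NOT the real Tez API.
abstract class SketchVertexManager {
    // Hypothetical opt-in hook. It defaults to false so that managers written
    // before the hook existed keep their current semantics.
    public boolean supportsRecovery() {
        return false;
    }

    public abstract String name();
}

// A manager written before the hook existed; it simply inherits the default.
class LegacyManager extends SketchVertexManager {
    @Override
    public String name() {
        return "legacy";
    }
}

// A manager that declares it knows how to participate in recovery.
class RecoveryAwareManager extends SketchVertexManager {
    @Override
    public boolean supportsRecovery() {
        return true;
    }

    @Override
    public String name() {
        return "recovery-aware";
    }
}

final class RecoveryPlanner {
    private RecoveryPlanner() {
    }

    // Decide how a recovering vertex treats its manager.
    // The three outcome names are assumptions made for this sketch.
    static String plan(SketchVertexManager manager, boolean tasksAlreadyScheduled) {
        if (!tasksAlreadyScheduled) {
            // Nothing was scheduled before the failure, so the manager runs as usual.
            return "RUN_MANAGER_NORMALLY";
        }
        // Tasks were already scheduled: only managers that opted in take part in recovery.
        return manager.supportsRecovery() ? "RECOVER_WITH_MANAGER" : "RERUN_VERTEX";
    }
}

final class RecoverySketchDemo {
    public static void main(String[] args) {
        SketchVertexManager[] managers = { new LegacyManager(), new RecoveryAwareManager() };
        for (SketchVertexManager m : managers) {
            System.out.println(m.name() + " -> " + RecoveryPlanner.plan(m, true));
        }
    }
}
```

The point of the default is that existing vertex managers keep their current behaviour without modification, while the recovery code can tell apart managers that have explicitly declared they can handle being skipped or replayed.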
> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
>                 Key: TEZ-2581
>                 URL: https://issues.apache.org/jira/browse/TEZ-2581
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-2.patch,
> TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch,
> TEZ-2581-WIP-6.patch, TezRecoveryRedesignProposal.pdf,
> TezRecoveryRedesignV1.1.pdf