[ 
https://issues.apache.org/jira/browse/TEZ-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14969319#comment-14969319
 ] 

Jason Lowe commented on TEZ-2581:
---------------------------------

bq. Ideally we should provide an API in VertexManagerPlugin to allow the user to 
define completeness, but it would cause API incompatibility. In this phase, I 
plan not to change the API and to focus on stabilizing the recovery framework. 
My point is that by ignoring the vertex manager plugin upon recovery we _are_ 
changing the API.  If the vertex manager is relying on getting vertex status 
updates even after tasks have been scheduled then the API has been broken 
semantically even if it hasn't syntactically.  That's why I brought up adding a 
new method to the abstract base class to indicate whether the vertex manager 
supports recovery.  It can default to false, and vertex managers that support 
it can override it to say they know how to properly participate in recovery.  
Or we can just change the semantics of the API as proposed in this patch, but 
we need to realize this is still a backwards incompatible change.
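
A minimal sketch of the opt-in idea described above. Every name here is 
hypothetical (SketchVertexManagerPlugin, supportsRecovery, RecoverySketch); 
none of it is the actual Tez API. What the framework does for a manager that 
has not opted in is also an assumption, shown only to illustrate the default.

```java
// Hypothetical stand-in for the abstract base class; NOT the real Tez class.
abstract class SketchVertexManagerPlugin {
    // Stands in for the status updates a manager may rely on after scheduling.
    abstract void onSourceTaskCompleted(String vertexName, int taskId);

    // Hypothetical opt-in hook. Defaults to false so that existing managers
    // keep compiling and are treated as not recovery-aware.
    boolean supportsRecovery() {
        return false;
    }
}

// An existing manager that was written before the hook existed.
class LegacyManager extends SketchVertexManagerPlugin {
    @Override
    void onSourceTaskCompleted(String vertexName, int taskId) {
        // Relies on seeing every update; knows nothing about recovery.
    }
}

// A manager that declares it can participate in recovery.
class RecoveryAwareManager extends SketchVertexManagerPlugin {
    @Override
    void onSourceTaskCompleted(String vertexName, int taskId) {
        // Would tolerate updates being skipped for already-recovered tasks.
    }

    @Override
    boolean supportsRecovery() {
        return true;
    }
}

final class RecoverySketch {
    // RERUN_VERTEX is an assumed fallback, not something stated in this issue.
    enum Action { RECOVER_VERTEX, RERUN_VERTEX }

    // Only bypass the manager during recovery if it has opted in.
    static Action decide(SketchVertexManagerPlugin manager) {
        return manager.supportsRecovery() ? Action.RECOVER_VERTEX
                                          : Action.RERUN_VERTEX;
    }

    private RecoverySketch() {}
}
```

With this shape, existing subclasses inherit the false default and see no 
change in behavior, while a manager that overrides the method is the only 
kind exposed to the new recovery semantics.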

As for the parallelism changing, the code does check, yet we have issues in 
0.7 recovery because the parallelism _is_ changing during the recovery.  
Therefore that check apparently isn't sufficient, probably because the vertex 
manager is messing with the parallelism during the recovery and before the 
vertex realizes tasks had already started.

bq. Let me know if you need me to rebase the patch for your tez version.
We're currently on 0.7 with no immediate plans to move to 0.8.  We will get 
there eventually but not for a while yet.  It would be nice to have this on 
0.7, but the concern is that it's a very large patch that could destabilize a 
number of things.

> Umbrella for Tez Recovery Redesign
> ----------------------------------
>
>                 Key: TEZ-2581
>                 URL: https://issues.apache.org/jira/browse/TEZ-2581
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2581-WIP-1.patch, TEZ-2581-WIP-2.patch, 
> TEZ-2581-WIP-3.patch, TEZ-2581-WIP-4.patch, TEZ-2581-WIP-5.patch, 
> TEZ-2581-WIP-6.patch, TezRecoveryRedesignProposal.pdf, 
> TezRecoveryRedesignV1.1.pdf
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
