[ 
https://issues.apache.org/jira/browse/TEZ-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220767#comment-14220767
 ] 

Jeff Zhang edited comment on TEZ-992 at 11/21/14 10:50 AM:
-----------------------------------------------------------

I have attached a patch. It is an initial version that I will refine later, so 
early feedback would be appreciated. 

* Before any transition to SUCCEEDED/FAILED/KILLED/ERROR, the DAG/Vertex goes 
to FINISH_SAVING first.
* There will be a conflict if a task is rescheduled while the DAG/Vertex is in 
FINISH_SAVING (because it may be in the middle of committing). So if the 
DAG/Vertex is in FINISH_SAVING, I put the TaskRescheduleEvent back into the 
AsyncDispatcher until the DAG/Vertex leaves FINISH_SAVING (see 
TaskRetroactiveFailureTransition and TaskRetroactiveKilledTransition).
* Questions on INTERNAL_ERROR_TRANSITION
** DAG.INTERNAL_ERROR_TRANSITION kills its vertices, but it goes to ERROR 
directly rather than waiting for the vertices to complete. Is this expected?
** Vertex.INTERNAL_ERROR_TRANSITION transitions to ERROR directly. Should it 
go to terminating first if there are tasks running?
** Can INTERNAL_ERROR be ignored if the DAG/Vertex is in FINISH_SAVING?
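
To make the first two points concrete, here is a minimal sketch of the deferral idea. It is a simplified, hypothetical model and is not taken from the patch: the class, the enums, the queue standing in for the AsyncDispatcher, and the log strings are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical, simplified model; none of these types are Tez classes.
class FinishSavingSketch {
    enum State { RUNNING, FINISH_SAVING, SUCCEEDED }
    enum EventType { TASK_RESCHEDULE, SAVE_COMPLETED }

    // Stand-in for the single-threaded dispatcher queue.
    final Queue<EventType> queue = new ArrayDeque<>();
    final List<String> log = new ArrayList<>();
    State state = State.RUNNING;

    void handle(EventType event) {
        switch (event) {
            case TASK_RESCHEDULE:
                if (state == State.FINISH_SAVING) {
                    // A commit may be in flight, so put the event back on the queue.
                    queue.add(event);
                    log.add("deferred");
                } else {
                    log.add("handled in " + state);
                }
                break;
            case SAVE_COMPLETED:
                if (state == State.FINISH_SAVING) {
                    state = State.SUCCEEDED;
                    log.add("left FINISH_SAVING");
                }
                break;
        }
    }

    // Processes queued events, bounded by maxEvents so that a deferred
    // event cannot spin forever if the save never completes.
    int drain(int maxEvents) {
        int processed = 0;
        while (!queue.isEmpty() && processed < maxEvents) {
            handle(queue.poll());
            processed++;
        }
        return processed;
    }
}
```

In this model a reschedule event that arrives during FINISH_SAVING is simply re-queued and handled once the save-completed event has moved the state forward. The bound in `drain` is only there because a re-queued event would otherwise loop while the state stays in FINISH_SAVING.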




was (Author: zjffdu):
I have attached a patch. It is an initial version that I will refine later, so 
early feedback would be appreciated. 

* Before any transition to SUCCEEDED/FAILED/KILLED/ERROR, the DAG/Vertex goes 
to FINISH_SAVING first.
* Can INTERNAL_ERROR be ignored if the DAG/Vertex is in FINISH_SAVING?
* There will be a conflict if a task is rescheduled while the DAG/Vertex is in 
FINISH_SAVING (because it may be in the middle of committing). So if the 
DAG/Vertex is in FINISH_SAVING, I put the TaskRescheduleEvent back into the 
AsyncDispatcher until the DAG/Vertex leaves FINISH_SAVING (see 
TaskRetroactiveFailureTransition and TaskRetroactiveKilledTransition).
* Questions on INTERNAL_ERROR_TRANSITION
** DAG.INTERNAL_ERROR_TRANSITION kills its vertices, but it goes to ERROR 
directly rather than waiting for the vertices to complete. Is this expected?
** Vertex.INTERNAL_ERROR_TRANSITION transitions to ERROR directly. Should it 
go to terminating first if there are tasks running?



> Recovery data should not be written on AsyncDispatcher thread
> -------------------------------------------------------------
>
>                 Key: TEZ-992
>                 URL: https://issues.apache.org/jira/browse/TEZ-992
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Bikas Saha
>            Assignee: Jeff Zhang
>         Attachments: DAG_FinishSaving.gv, DAG_FinishSaving_2.gv, 
> TEZ-992.patch, Vertex_FinishSaving.gv, Vertex_FinishSaving_2.gv
>
>
> This may block the DAG operations in case the recovery data needs to be 
> synchronously stored. The operations requiring this blocking operation should 
> change their state machines to wait for the store operation before moving 
> ahead. They will move ahead after they receive notification from the 
> RecoveryService that their operation has completed.
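
The approach in the issue description above (hand the write to another thread, wait in an intermediate state, move ahead on notification) could look roughly like the following sketch. Everything here is an assumption for illustration: the class, the `STORE_COMPLETED` event name, and the executor standing in for the RecoveryService are hypothetical and are not Tez APIs.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical model; these names are illustrative and are not Tez APIs.
class AsyncRecoverySketch {
    enum State { RUNNING, FINISH_SAVING, SUCCEEDED }

    // Events consumed by the single dispatcher thread.
    private final BlockingQueue<String> dispatcherQueue = new LinkedBlockingQueue<>();
    // Separate thread that performs the (possibly slow) recovery write.
    private final ExecutorService recoveryWriter = Executors.newSingleThreadExecutor();
    // Only read and written on the dispatcher (calling) thread.
    private State state = State.RUNNING;

    // Runs on the dispatcher thread: start the write and return immediately.
    void finish() {
        state = State.FINISH_SAVING;
        recoveryWriter.submit(() -> {
            // A real implementation would write the recovery record here.
            dispatcherQueue.add("STORE_COMPLETED"); // notify the state machine
        });
    }

    // One step of the dispatcher loop: wait for the next event and apply it.
    State awaitNextEvent(long timeoutMs) {
        try {
            String event = dispatcherQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
            if ("STORE_COMPLETED".equals(event) && state == State.FINISH_SAVING) {
                state = State.SUCCEEDED;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return state;
    }

    void shutdown() {
        recoveryWriter.shutdown();
    }
}
```

The dispatcher thread never blocks on the write itself; it only changes state when the completion event comes back through its own queue.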



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
