[ 
https://issues.apache.org/jira/browse/TEZ-850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14100295#comment-14100295
 ] 

Jeff Zhang commented on TEZ-850:
--------------------------------

[~hitesh] There's already a class for this kind of test: TestDAGRecovery, which 
introduces a customized VertexManager that fails at certain points. Here are the 
cases I can think of that need this kind of test:
* Test AM recovery with all data movement types, including 1-1, broadcast, and 
scatter-gather, with/without shuffle. The AM should die in two scenarios: after 
the first vertex's tasks have finished completely, and after they have finished 
only partially. 
* TezCounter recovery
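
To make the first bullet concrete, here is a minimal sketch that enumerates 
the combinations of data movement type and AM kill point. The class and enum 
names are hypothetical and for illustration only (they are not the Tez API), 
and it assumes "with/without shuffle" applies to scatter-gather only.

```java
import java.util.ArrayList;
import java.util.List;

public class RecoveryTestMatrix {
    // Hypothetical names for the data movement types listed above.
    enum DataMovement {
        ONE_TO_ONE, BROADCAST,
        SCATTER_GATHER_WITH_SHUFFLE, SCATTER_GATHER_WITHOUT_SHUFFLE
    }

    // The two points at which the AM is killed.
    enum KillPoint { FIRST_VERTEX_COMPLETE, FIRST_VERTEX_PARTIAL }

    // Builds one scenario label per (data movement, kill point) pair.
    static List<String> scenarios() {
        List<String> out = new ArrayList<>();
        for (DataMovement m : DataMovement.values()) {
            for (KillPoint k : KillPoint.values()) {
                out.add(m + "/" + k);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String s : scenarios()) {
            System.out.println(s);
        }
    }
}
```

Each label would map to one recovery test case.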

But I have one concern about the current implementation of TestDAGRecovery: it 
only verifies that the DAG can finish successfully in recovery mode. A 
successful finish may not be enough to show that recovery works correctly 
(e.g. a task attempt may fail to recover and simply rerun, which would still 
let the DAG finish successfully). One way to check is to read the recovery log 
and verify from it that recovery actually took place. What do you think?
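
A rough sketch of what such a log check could look like. It assumes the 
recovery log has already been parsed into simple events; the Event class, the 
event type strings, and the AM attempt numbering are all hypothetical and are 
not the real Tez recovery event classes. The idea is to flag any task that 
finished under the first AM attempt but had an attempt started again under a 
later AM attempt.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class RecoveryLogCheck {
    /** Hypothetical, simplified recovery event (not a Tez class). */
    static final class Event {
        final String type;    // assumed values: "TASK_FINISHED", "TASK_ATTEMPT_STARTED"
        final String taskId;
        final int amAttempt;  // AM attempt that wrote the event, starting at 1

        Event(String type, String taskId, int amAttempt) {
            this.type = type;
            this.taskId = taskId;
            this.amAttempt = amAttempt;
        }
    }

    /** Returns tasks that finished before the AM died but were run again afterwards. */
    static Set<String> rerunAfterRecovery(List<Event> log) {
        // Pass 1: tasks that finished under the first AM attempt.
        Set<String> finishedBeforeKill = new HashSet<>();
        for (Event e : log) {
            if (e.amAttempt == 1 && e.type.equals("TASK_FINISHED")) {
                finishedBeforeKill.add(e.taskId);
            }
        }
        // Pass 2: any of those tasks started again under a later AM attempt.
        Set<String> rerun = new TreeSet<>();
        for (Event e : log) {
            if (e.amAttempt > 1 && e.type.equals("TASK_ATTEMPT_STARTED")
                    && finishedBeforeKill.contains(e.taskId)) {
                rerun.add(e.taskId);
            }
        }
        return rerun;
    }

    public static void main(String[] args) {
        List<Event> log = Arrays.asList(
            new Event("TASK_FINISHED", "task_1", 1),
            new Event("TASK_ATTEMPT_STARTED", "task_1", 2),  // rerun: recovery did not take effect
            new Event("TASK_ATTEMPT_STARTED", "task_2", 2)); // legitimately started after recovery
        System.out.println(rerunAfterRecovery(log));
    }
}
```

A test would then assert that this set is empty, in addition to checking that 
the DAG finished successfully.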

> Recovery unit tests
> -------------------
>
>                 Key: TEZ-850
>                 URL: https://issues.apache.org/jira/browse/TEZ-850
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Hitesh Shah
>            Assignee: Jeff Zhang
>
> Tests for custom edge managers, groups handling, etc.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
