[jira] [Updated] (TEZ-3914) Recovering a large DAG hang job

2018-04-16 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3914:
-
Attachment: TEZ-3914.003.patch

> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch, 
> TEZ-3914.003.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang because some task completions may be missed, and the shuffle will hang.
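
The failure mode described above can be illustrated with a small, self-contained sketch. This is not the Tez recovery code and not the contents of the attached patches. The length-prefixed record format, the names `read_events`, `recover_lenient`, `recover_strict`, and `CorruptRecordError`, and the sample payloads are all hypothetical, chosen only to show why treating a parse failure as EOF silently drops later events.

```python
import io
import struct


class CorruptRecordError(Exception):
    """Raised when a record is truncated or cannot be decoded (hypothetical)."""


def encode_log(payloads):
    """Build a log of records: 4-byte big-endian length, then the payload."""
    return b"".join(struct.pack(">I", len(p)) + p for p in payloads)


def read_events(stream):
    """Yield decoded events; stop only on a clean EOF at a record boundary."""
    while True:
        header = stream.read(4)
        if header == b"":
            return  # clean EOF: no bytes left at a record boundary
        if len(header) < 4:
            raise CorruptRecordError("truncated length header")
        (length,) = struct.unpack(">I", header)
        payload = stream.read(length)
        if len(payload) < length:
            raise CorruptRecordError("truncated payload")
        try:
            event = payload.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise CorruptRecordError("undecodable payload") from exc
        yield event


def recover_lenient(data):
    """Problematic pattern: any parse failure is swallowed and treated as EOF."""
    events = []
    try:
        for event in read_events(io.BytesIO(data)):
            events.append(event)
    except CorruptRecordError:
        pass  # later events are lost and the caller is never told
    return events


def recover_strict(data):
    """Alternative pattern: only a clean EOF ends recovery; errors propagate."""
    return list(read_events(io.BytesIO(data)))
```

With a log whose second record is corrupt, `recover_lenient` returns only the first event and gives no indication that the third one (for example, a task completion) was skipped, whereas `recover_strict` raises so the caller can decide how to proceed.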



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3914) Recovering a large DAG hang job

2018-04-12 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3914:
-
Attachment: TEZ-3914.002.patch

> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang because some task completions may be missed, and the shuffle will hang.





[jira] [Updated] (TEZ-3914) Recovering a large DAG hang job

2018-04-12 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3914:
-
Attachment: TEZ-3914.001.patch

> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang because some task completions may be missed, and the shuffle will hang.


