[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2017-08-14 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16126600#comment-16126600
 ] 

TezQA commented on TEZ-3817:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12881824/TEZ-3817.001.patch
  against master revision 823b1bb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 31 javac 
compiler warnings (more than the master's current 24 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.dag.app.TestMockDAGAppMaster

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2615//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2615//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2615//console

This message is automatically generated.

> DAGs can hang after more than one uncaught Exception during doTransition.
> -
>
> Key: TEZ-3817
> URL: https://issues.apache.org/jira/browse/TEZ-3817
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.9.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3817.001.patch, TEZ-3817.test.patch
>
>
> A Tez DAG can hang in the last "sane" state if 
> statemachine.doTransition() throws a runtime exception more than once. If the 
> transition for the Error state itself throws an exception, the DAG hangs. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2017-08-15 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16127751#comment-16127751
 ] 

TezQA commented on TEZ-3817:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12881982/TEZ-3817.002.patch
  against master revision 823b1bb.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2618//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2618//console

This message is automatically generated.

> DAGs can hang after more than one uncaught Exception during doTransition.
> -
>
> Key: TEZ-3817
> URL: https://issues.apache.org/jira/browse/TEZ-3817
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.9.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
> Attachments: TEZ-3817.001.patch, TEZ-3817.002.patch, 
> TEZ-3817.test.patch
>
>
> A Tez DAG can hang in the last "sane" state if 
> statemachine.doTransition() throws a runtime exception more than once. If the 
> transition for the Error state itself throws an exception, the DAG hangs. 





[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2017-08-16 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129147#comment-16129147
 ] 

Jason Lowe commented on TEZ-3817:
-

Thanks for the report and the patch!

I think the patch as-is will do an admirable job of getting the DAGImpl into the 
ERROR state, but I worry it could do so in such a way that the AM will still 
hang.  For example, if the error is thrown before the DAG finished event is 
posted, then I think there's an excellent chance that the AM will just sit 
around, ignoring every DAG event thereafter (because it's in the ERROR state), 
and yet the AM won't exit.  I think we either need to make an extra effort to 
post the finished event if an exception is thrown, move this try/catch logic to 
dag.finished so it does that instead, or simply declare an emergency if we end 
up throwing when trying to move to the internal error state and shut down more 
directly.
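
To make the concern concrete, here is a minimal sketch of the first and third options combined. It is a toy model, not the patch: ToyDag, ToyAppMaster, onDagFinished and shutdownNow are hypothetical names invented for illustration and are not Tez classes or methods. The idea is that if the internal-error transition itself throws, the handler still forces the ERROR state and tries to post the finished event, and only declares an emergency if even that fails.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these types are hypothetical and are not Tez classes.
class DagErrorFallbackSketch {
  enum State { RUNNING, ERROR }

  /** Stand-in for the AM; it only records what it was told. */
  static class ToyAppMaster {
    final List<String> received = new ArrayList<>();
    boolean emergencyShutdown = false;

    void onDagFinished(State finalState) {
      received.add("DAG_FINISHED:" + finalState);
    }

    void shutdownNow() {
      emergencyShutdown = true;
    }
  }

  static class ToyDag {
    State state = State.RUNNING;
    final ToyAppMaster am;
    final boolean errorTransitionThrows;

    ToyDag(ToyAppMaster am, boolean errorTransitionThrows) {
      this.am = am;
      this.errorTransitionThrows = errorTransitionThrows;
    }

    /** Simulates a regular transition that always fails. */
    void doTransition(String event) {
      throw new IllegalStateException("transition failed for " + event);
    }

    /** Simulates the internal-error transition, which may itself fail. */
    void internalErrorTransition() {
      if (errorTransitionThrows) {
        throw new IllegalStateException("error transition failed too");
      }
      state = State.ERROR;
      am.onDagFinished(state);
    }

    void handle(String event) {
      if (state == State.ERROR) {
        return; // once in ERROR, further DAG events are ignored
      }
      try {
        doTransition(event);
      } catch (RuntimeException first) {
        try {
          internalErrorTransition();
        } catch (RuntimeException second) {
          // Second failure: still force ERROR and make sure the AM hears about it.
          state = State.ERROR;
          try {
            am.onDagFinished(state);
          } catch (RuntimeException third) {
            am.shutdownNow(); // last resort: declare an emergency
          }
        }
      }
    }
  }
}
```

Without the inner fallback, the second exception would leave the toy AM with no finished event at all, which is the hang described above.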




[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-04 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426270#comment-16426270
 ] 

Kuhu Shukla commented on TEZ-3817:
--

Updated the patch to add the try block to {{dag.finished()}}. The 
{{DAGAppMasterEventDAGFinished}} event should do the necessary, IMO. Today, if 
the AM is in session mode, it does not shut down even if the DAG errored 
out. This patch maintains that behavior.
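
As a rough illustration of the session-mode behavior described above, here is a toy sketch. ToyAppMaster, onDagFinished and the static finished helper are hypothetical names, not Tez APIs, and the assumption that a non-session AM exits once its DAG is done is labeled in the code.

```java
// Toy model only: hypothetical names, not the Tez AM.
class SessionModeSketch {
  static class ToyAppMaster {
    final boolean sessionMode;
    boolean running = true;
    int finishedDags = 0;
    int erroredDags = 0;

    ToyAppMaster(boolean sessionMode) {
      this.sessionMode = sessionMode;
    }

    /** Called when a DAG reports that it has finished, successfully or not. */
    void onDagFinished(boolean dagErrored) {
      finishedDags++;
      if (dagErrored) {
        erroredDags++;
      }
      if (!sessionMode) {
        running = false; // assumed: a non-session AM exits once its DAG is done
      }
      // In session mode the AM keeps running, whether or not the DAG errored.
    }
  }

  /** Stand-in for finished(): risky work sits in a try, the event is posted afterwards. */
  static void finished(ToyAppMaster am, Runnable riskyBookkeeping) {
    boolean errored = false;
    try {
      riskyBookkeeping.run();
    } catch (RuntimeException e) {
      errored = true;
    }
    am.onDagFinished(errored);
  }
}
```

In this sketch the finished notification is delivered either way; only the AM's mode decides whether it stays up.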

> DAGs can hang after more than one uncaught Exception during doTransition.
> -
>
> Key: TEZ-3817
> URL: https://issues.apache.org/jira/browse/TEZ-3817
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.9.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3817.001.patch, TEZ-3817.002.patch, 
> TEZ-3817.003.patch, TEZ-3817.test.patch
>
>
> A Tez DAG can hang in the last "sane" state if 
> statemachine.doTransition() throws a runtime exception more than once. If the 
> transition for the Error state itself throws an exception, the DAG hangs. 





[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-05 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427099#comment-16427099
 ] 

Kuhu Shukla commented on TEZ-3817:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment  
[http://issues.apache.org/jira/secure/attachment/12917623/TEZ-3817.003.patch]
  against master revision 55a6b9d.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 
new or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning 
messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not 
increase the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.runtime.library.common.shuffle.orderedgrouped.TestFetcher
   org.apache.tez.http.TestHttpConnection
   org.apache.tez.test.TestSecureShuffle
   org.apache.tez.test.TestRecovery

Test results: 
[https://builds.apache.org/job/PreCommit-TEZ-Build/2748//testReport/]
Console output: 
[https://builds.apache.org/job/PreCommit-TEZ-Build/2748//console]



[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-06 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428752#comment-16428752
 ] 

Kuhu Shukla commented on TEZ-3817:
--

I would appreciate any comments on the latest patch, [~jlowe]. CC: [~jeagles]. 
The latest precommit I ran for the patch is clean 
(https://builds.apache.org/job/PreCommit-TEZ-Build/2750/testReport/).



[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-06 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16428903#comment-16428903
 ] 

Jonathan Eagles commented on TEZ-3817:
--

[~kshukla], a couple of small things I would like to see. The overall approach 
seems in line with Jason's original review. I think that if this JIRA were 
aimed at refactoring the shutdown system, I would have suggested a 
try/catch/finally design approach to commonize the event sending logic and 
error handling. But given the scope of the JIRA and the current messiness of 
the error shutdown system (e.g. why do we fail a succeeded DAG if recovery fails 
to log?), this seems like a good approach.

- Let's rename recoveryError since it is used in a more general way.
- Let's log the exception in all cases (not just the catch-all non-IOException 
cases) so we get the stack trace of what happened in every case.
- Let's change the LOG.info call to LOG.warn (or maybe even ERROR?) to 
emphasize in the log where the bad things are happening.
- The tests look like they are passing, but it looks like it is throwing an 
NPE. Let's keep the test going through the logUnsuccessful History path, but we 
need to ensure 1) the path was taken and 2) the final state is correct.
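
The logging points above can be sketched as follows. This is only an illustration under stated assumptions: it uses java.util.logging so that it is self-contained (Level.WARNING standing in for LOG.warn), and HistoryWriter, logUnsuccessfulEvent and CapturingHandler are hypothetical names, not the identifiers used in the patch. Both catch branches log the throwable itself so the stack trace is kept, and the capturing handler gives a test a way to confirm that the failure path was actually taken.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Toy model only: hypothetical names, not code from the patch.
class HistoryLoggingSketch {
  static final Logger LOG = Logger.getLogger("HistoryLoggingSketch");

  /** Hypothetical hook standing in for writing a history or recovery event. */
  interface HistoryWriter {
    void write(String event) throws IOException;
  }

  /** Returns true if writing failed; the throwable is logged in every case. */
  static boolean logUnsuccessfulEvent(HistoryWriter writer, String event) {
    try {
      writer.write(event);
      return false;
    } catch (IOException e) {
      LOG.log(Level.WARNING, "I/O error while writing history event " + event, e);
      return true;
    } catch (Exception e) {
      LOG.log(Level.WARNING, "Unexpected error while writing history event " + event, e);
      return true;
    }
  }

  /** Collects log records so a test can check that the failure path ran. */
  static class CapturingHandler extends Handler {
    final List<LogRecord> records = new ArrayList<>();

    @Override
    public void publish(LogRecord record) {
      records.add(record);
    }

    @Override
    public void flush() {
    }

    @Override
    public void close() {
    }
  }
}
```

A test can attach a CapturingHandler, trigger an NPE inside the writer, and then assert on both the returned flag and the captured record rather than relying on the test merely not failing.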



[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-09 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431105#comment-16431105
 ] 

TezQA commented on TEZ-3817:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12917623/TEZ-3817.003.patch
  against master revision a030800.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2750//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2750//console

This message is automatically generated.




[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-09 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16431477#comment-16431477
 ] 

TezQA commented on TEZ-3817:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12917623/TEZ-3817.003.patch
  against master revision 871ea80.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2754//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2754//console

This message is automatically generated.




[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-13 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437909#comment-16437909
 ] 

Kuhu Shukla commented on TEZ-3817:
--

Thank you for the review comments, Jon! I have attached a revised patch, and the 
test now checks for the appropriate end state. One thing to note here is that 
the end state is FAILED and not ERROR, since we catch Exception and let the 
{{finalState}} passed to the {{finished()}} call decide the DAG's internal state. 
The AM is still notified of the DAG error.
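
A small sketch of that end-state behavior, under loud assumptions: ToyDag, FinishedEvent and the historyLogging hook are hypothetical stand-ins and not the actual DAGImpl code. The exception is caught and remembered for the AM, but the state the DAG ends in is whatever finalState the caller passed.

```java
// Toy model only: hypothetical names, not the Tez DAGImpl.
class FinishedStateSketch {
  enum DagState { RUNNING, SUCCEEDED, FAILED, ERROR }

  /** What the toy AM is told when the DAG finishes. */
  static class FinishedEvent {
    final DagState finalState;
    final boolean errorDuringFinish;

    FinishedEvent(DagState finalState, boolean errorDuringFinish) {
      this.finalState = finalState;
      this.errorDuringFinish = errorDuringFinish;
    }
  }

  static class ToyDag {
    DagState state = DagState.RUNNING;
    FinishedEvent lastEvent; // stands in for an event posted to the AM
    final Runnable historyLogging; // hypothetical step that may throw

    ToyDag(Runnable historyLogging) {
      this.historyLogging = historyLogging;
    }

    DagState finished(DagState finalState) {
      boolean error = false;
      try {
        historyLogging.run();
      } catch (Exception e) {
        error = true; // remembered for the AM, but does not override finalState
      }
      lastEvent = new FinishedEvent(finalState, error);
      state = finalState;
      return finalState;
    }
  }
}
```

Calling finished(DagState.FAILED) with a throwing hook leaves the toy DAG in FAILED while the event still carries the error flag.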

> DAGs can hang after more than one uncaught Exception during doTransition.
> -
>
> Key: TEZ-3817
> URL: https://issues.apache.org/jira/browse/TEZ-3817
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.9.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3817.001.patch, TEZ-3817.002.patch, 
> TEZ-3817.003.patch, TEZ-3817.004.patch, TEZ-3817.test.patch
>
>
> A Tez DAG can hang in the last "sane" state if 
> statemachine.doTransition() throws a runtime exception more than once. If the 
> transition for the Error state itself throws an exception, the DAG hangs. 





[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-13 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16438014#comment-16438014
 ] 

TezQA commented on TEZ-3817:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12919009/TEZ-3817.004.patch
  against master revision 871ea80.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

  {color:red}-1 javac{color}.  The applied patch generated 31 javac 
compiler warnings (more than the master's current 24 warnings).

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2757//testReport/
Javac warnings: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2757//artifact/patchprocess/diffJavacWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2757//console

This message is automatically generated.




[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-19 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444247#comment-16444247
 ] 

Kuhu Shukla commented on TEZ-3817:
--

Fixed the javac warning and updated the patch.

> DAGs can hang after more than one uncaught Exception during doTransition.
> -
>
> Key: TEZ-3817
> URL: https://issues.apache.org/jira/browse/TEZ-3817
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.1, 0.9.0
>Reporter: Kuhu Shukla
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3817.001.patch, TEZ-3817.002.patch, 
> TEZ-3817.003.patch, TEZ-3817.004.patch, TEZ-3817.005.patch, 
> TEZ-3817.test.patch
>
>
> A Tez DAG can hang in the last "sane" state if 
> statemachine.doTransition() throws a runtime exception more than once. If the 
> transition for the Error state itself throws an exception, the DAG hangs. 





[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-19 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444361#comment-16444361
 ] 

TezQA commented on TEZ-3817:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12919826/TEZ-3817.005.patch
  against master revision 871ea80.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2766//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2766//console

This message is automatically generated.




[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-19 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16444589#comment-16444589
 ] 

Kuhu Shukla commented on TEZ-3817:
--

Request for comments/review [~jlowe], [~jeagles]. Thanks a lot!



[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-23 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448720#comment-16448720
 ] 

Jonathan Eagles commented on TEZ-3817:
--

+1 on v5 patch from me. Let me check with [~jlowe] to see if he accepts the 
changes based on his comments.



[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-23 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448840#comment-16448840
 ] 

Jason Lowe commented on TEZ-3817:
-

+1 lgtm.



[jira] [Commented] (TEZ-3817) DAGs can hang after more than one uncaught Exception during doTransition.

2018-04-23 Thread Kuhu Shukla (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16448882#comment-16448882
 ] 

Kuhu Shukla commented on TEZ-3817:
--

Thank you [~jlowe], [~jeagles] for the review. I am going to commit this 
shortly.
