[jira] [Commented] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information
[ https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594457#comment-16594457 ]

TezQA commented on TEZ-3958:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12937362/TEZ-3958.5.patch
against master revision 261bbdd.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2901//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2901//console

This message is automatically generated.

> Add internal vertex priority information into the tez dag.dot debug information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Gopal V
> Assignee: Jaume M
> Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, TEZ-3958.4.patch, TEZ-3958.5.patch
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot
> file would allow the debugging of task pre-emption issues when the DAG is no
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an
> R-isomer DAG with mirror rotation runs at a different speed than the L-isomer
> DAG, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the
> computed priority in the DAG .dot file in the logging directories.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
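To make the request in TEZ-3958 concrete, here is a minimal sketch of what recording a computed priority in a .dot file could look like. It is not taken from the attached patches: the class `DagDotSketch`, the `VertexInfo` holder, the label layout, and the priority values are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical sketch: none of these names come from the Tez code base.
final class DagDotSketch {

    /** Stand-in for a vertex: its name, an already computed priority, and its downstream vertices. */
    static final class VertexInfo {
        final String name;
        final int priority;
        final List<String> outputs;

        VertexInfo(String name, int priority, List<String> outputs) {
            this.name = name;
            this.priority = priority;
            this.outputs = outputs;
        }
    }

    /** Renders the vertices as a Graphviz digraph, with the priority embedded in each node label. */
    static String toDot(String dagName, List<VertexInfo> vertices) {
        StringBuilder sb = new StringBuilder();
        sb.append("digraph ").append(quote(dagName)).append(" {\n");
        for (VertexInfo v : vertices) {
            String label = v.name + " (priority=" + v.priority + ")";
            sb.append("  ").append(quote(v.name))
              .append(" [label=").append(quote(label)).append("];\n");
        }
        for (VertexInfo v : vertices) {
            for (String target : v.outputs) {
                sb.append("  ").append(quote(v.name))
                  .append(" -> ").append(quote(target)).append(";\n");
            }
        }
        sb.append("}\n");
        return sb.toString();
    }

    /** Wraps a name in double quotes so that names containing spaces stay valid DOT identifiers. */
    private static String quote(String s) {
        return "\"" + s.replace("\"", "\\\"") + "\"";
    }

    public static void main(String[] args) {
        // Priority values below are made up for the example.
        System.out.print(toDot("example", List.of(
                new VertexInfo("Map 1", 3, List.of("Reducer 2")),
                new VertexInfo("Reducer 2", 6, List.of()))));
    }
}
```

With the priority in the label, two mirrored DAGs can be compared side by side to see which vertices at the same level ended up with different priorities.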
Success: TEZ-3958 PreCommit Build #2901
Jira: https://issues.apache.org/jira/browse/TEZ-3958
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2901/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 381.08 KB...]
[INFO] hadoop-shim-2.8 SUCCESS [ 0.996 s]
[INFO] tez-dist ... SUCCESS [ 47.256 s]
[INFO] Tez 0.10.0-SNAPSHOT SUCCESS [ 0.040 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 58:34 min
[INFO] Finished at: 2018-08-28T03:16:51Z
[INFO]

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12937362/TEZ-3958.5.patch
against master revision 261bbdd.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2901//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2901//console

This message is automatically generated.

== == Adding comment to Jira. == ==
== == Finished build. == ==
Archiving artifacts
[Fast Archiver] Compressed 4.10 MB of artifacts by 22.9% relative to #2900
[description-setter] Description set: TEZ-3958
Recording test results
Email was triggered for: Success
Sending email for trigger: Success

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Commented] (TEZ-3980) ShuffleRunner: the wake loop needs to check for shutdown
[ https://issues.apache.org/jira/browse/TEZ-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594400#comment-16594400 ]

Gunther Hagleitner commented on TEZ-3980:

+1

> ShuffleRunner: the wake loop needs to check for shutdown
>
> Key: TEZ-3980
> URL: https://issues.apache.org/jira/browse/TEZ-3980
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: TEZ-3980.1.patch
>
> In the ShuffleRunner threads, there's a loop which does not terminate if the
> task threads get killed.
> {code}
> while ((runningFetchers.size() >= numFetchers || pendingHosts.isEmpty())
>     && numCompletedInputs.get() < numInputs) {
>   inputContext.notifyProgress();
>   boolean ret = wakeLoop.await(1000, TimeUnit.MILLISECONDS);
> }
> {code}
> The wakeLoop signal does not break out of this loop, which is missing a
> break for shutdown.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
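The loop quoted in TEZ-3980 only re-checks its wait condition, so a wake-up caused by shutdown sends it straight back to sleep. The following is a minimal sketch of one way to add the missing check, not the content of TEZ-3980.1.patch. The class `WakeLoopSketch`, the `isShutdown` flag, and the `mustWait` supplier are hypothetical names, and treating an interrupt as a shutdown request is a design assumption of this sketch.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a timed wait loop that also checks a shutdown flag.
final class WakeLoopSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeLoop = lock.newCondition();
    private final AtomicBoolean isShutdown = new AtomicBoolean(false);

    /**
     * Waits while mustWait reports true.
     * Returns true if there is work to do, false if the loop ended because of shutdown.
     */
    boolean awaitWork(BooleanSupplier mustWait, long pollMillis) {
        lock.lock();
        try {
            while (mustWait.getAsBoolean()) {
                if (isShutdown.get()) {
                    return false; // the check the original loop is missing
                }
                wakeLoop.await(pollMillis, TimeUnit.MILLISECONDS);
            }
            return !isShutdown.get();
        } catch (InterruptedException e) {
            // Assumption: an interrupted thread is being torn down, so stop waiting.
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    /** Sets the flag first, then signals, so a woken waiter always observes the shutdown. */
    void shutdown() {
        isShutdown.set(true);
        lock.lock();
        try {
            wakeLoop.signalAll();
        } finally {
            lock.unlock();
        }
    }
}
```

In the terms of the quoted snippet, `mustWait` would stand for the `runningFetchers` / `pendingHosts` / `numCompletedInputs` condition, and the caller would stop scheduling fetchers when `awaitWork` returns false.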
[jira] [Updated] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information
[ https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-3958:

Attachment: TEZ-3958.5.patch

> Add internal vertex priority information into the tez dag.dot debug information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, TEZ-3958.4.patch, TEZ-3958.5.patch
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot
> file would allow the debugging of task pre-emption issues when the DAG is no
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an
> R-isomer DAG with mirror rotation runs at a different speed than the L-isomer
> DAG, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the
> computed priority in the DAG .dot file in the logging directories.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information
[ https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V reassigned TEZ-3958:

Assignee: Jaume M (was: Gopal V)

> Add internal vertex priority information into the tez dag.dot debug information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Gopal V
> Assignee: Jaume M
> Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, TEZ-3958.4.patch, TEZ-3958.5.patch
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot
> file would allow the debugging of task pre-emption issues when the DAG is no
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an
> R-isomer DAG with mirror rotation runs at a different speed than the L-isomer
> DAG, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the
> computed priority in the DAG .dot file in the logging directories.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
Success: TEZ-3985 PreCommit Build #2900
Jira: https://issues.apache.org/jira/browse/TEZ-3985
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2900/

### ## LAST 60 LINES OF THE CONSOLE ###
[...truncated 382.55 KB...]
[INFO] hadoop-shim-impls .. SUCCESS [ 0.041 s]
[INFO] hadoop-shim-2.8 SUCCESS [ 0.943 s]
[INFO] tez-dist ... SUCCESS [ 44.763 s]
[INFO] Tez 0.10.0-SNAPSHOT SUCCESS [ 0.048 s]
[INFO]
[INFO] BUILD SUCCESS
[INFO]
[INFO] Total time: 58:58 min
[INFO] Finished at: 2018-08-28T01:49:42Z
[INFO]

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12937355/TEZ-3985.1.patch
against master revision 261bbdd.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2900//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2900//console

This message is automatically generated.

== == Adding comment to Jira. == ==
== == Finished build. == ==
Archiving artifacts
[description-setter] Description set: TEZ-3985
Recording test results
Email was triggered for: Success
Sending email for trigger: Success

### ## FAILED TESTS (if any) ##
All tests passed
[jira] [Commented] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup
[ https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594396#comment-16594396 ]

TezQA commented on TEZ-3985:

{color:green}+1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12937355/TEZ-3985.1.patch
against master revision 261bbdd.

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. There were no new javadoc warning messages.
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2900//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2900//console

This message is automatically generated.

> Correctness: Throw a clear exception for DMEs sent during cleanup
> -
>
> Key: TEZ-3985
> URL: https://issues.apache.org/jira/browse/TEZ-3985
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Jaume M
> Priority: Major
> Attachments: TEZ-3985.1.patch
>
> If a DME is sent during cleanup, that implies that the .close() of the
> LogicalIOProcessorRuntimeTask did not succeed and therefore these events are
> an error condition.
> These events should not be sent and, more importantly, should not be received
> by the AM.
> Throw a clear exception in this case, and allow the developers to locate the
> extraneous event from the backtrace.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
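The behaviour requested in TEZ-3985 can be sketched as a guard on the event-sending path. This is an illustrative sketch only and not the content of TEZ-3985.1.patch: `CleanupGuardSketch`, `beginCleanup()`, the use of plain strings as events, and the choice of `IllegalStateException` are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch: a context that refuses to forward events once cleanup has started.
final class CleanupGuardSketch {
    private final AtomicBoolean inCleanup = new AtomicBoolean(false);
    private final List<String> forwarded = new ArrayList<>();

    /** Called by the (hypothetical) task runtime before it starts closing inputs and outputs. */
    void beginCleanup() {
        inCleanup.set(true);
    }

    /** Forwards events in normal operation; fails loudly if called during cleanup. */
    void sendEvents(List<String> events) {
        if (inCleanup.get()) {
            // The stack trace of this exception points at the component that sent the event.
            throw new IllegalStateException(
                    "Events must not be sent during cleanup: " + events);
        }
        forwarded.addAll(events);
    }

    /** Events that were accepted before cleanup began. */
    List<String> forwarded() {
        return List.copyOf(forwarded);
    }
}
```

Throwing at the call site, rather than silently dropping the event, is what lets a developer find the extraneous sender from the backtrace.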
[jira] [Assigned] (TEZ-3958) Add internal vertex priority information into the tez dag.dot debug information
[ https://issues.apache.org/jira/browse/TEZ-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V reassigned TEZ-3958:

Assignee: Gopal V (was: Jaume M)

> Add internal vertex priority information into the tez dag.dot debug information
> ---
>
> Key: TEZ-3958
> URL: https://issues.apache.org/jira/browse/TEZ-3958
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Gopal V
> Assignee: Gopal V
> Priority: Major
> Attachments: TEZ-3958.1.patch, TEZ-3958.2.patch, TEZ-3958.3.patch, TEZ-3958.4.patch
>
> Adding the actual vertex priority as computed by Tez into the debug dag.dot
> file would allow the debugging of task pre-emption issues when the DAG is no
> longer a tree.
> There are pre-emption issues with isomerization of Tez DAGs, where an
> R-isomer DAG with mirror rotation runs at a different speed than the L-isomer
> DAG, because priorities at the same level change with the vertex-id order.
> Since the problem is hard to debug through, it would be good to record the
> computed priority in the DAG .dot file in the logging directories.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup
[ https://issues.apache.org/jira/browse/TEZ-3985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jaume M reassigned TEZ-3985:

Assignee: Jaume M

> Correctness: Throw a clear exception for DMEs sent during cleanup
> -
>
> Key: TEZ-3985
> URL: https://issues.apache.org/jira/browse/TEZ-3985
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Gopal V
> Assignee: Jaume M
> Priority: Major
> Attachments: TEZ-3985.1.patch
>
> If a DME is sent during cleanup, that implies that the .close() of the
> LogicalIOProcessorRuntimeTask did not succeed and therefore these events are
> an error condition.
> These events should not be sent and, more importantly, should not be received
> by the AM.
> Throw a clear exception in this case, and allow the developers to locate the
> extraneous event from the backtrace.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (TEZ-3985) Correctness: Throw a clear exception for DMEs sent during cleanup
Gopal V created TEZ-3985:

Summary: Correctness: Throw a clear exception for DMEs sent during cleanup
Key: TEZ-3985
URL: https://issues.apache.org/jira/browse/TEZ-3985
Project: Apache Tez
Issue Type: Bug
Reporter: Gopal V

If a DME is sent during cleanup, that implies that the .close() of the
LogicalIOProcessorRuntimeTask did not succeed and therefore these events are
an error condition.
These events should not be sent and, more importantly, should not be received
by the AM.
Throw a clear exception in this case, and allow the developers to locate the
extraneous event from the backtrace.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3984) Shuffle: Out of Band DME event sending causes errors
[ https://issues.apache.org/jira/browse/TEZ-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16594304#comment-16594304 ]

Gopal V commented on TEZ-3984:

The specific sequence of events is as follows. The input throws an exception.
{code}
2018-08-27T17:25:15,579 WARN [TezTR-437616_7273_9_0_0_0 (1520459437616_7273_9_00_00_0)] runtime.LogicalIOProcessorRuntimeTask: Ignoring exception when closing input calls(cleanup). Exception class=java.io.IOException, message ...
{code}
The output gets closed for memory recovery.
{code}
2018-08-27T17:25:15,579 INFO [TezTR-437616_7273_9_0_0_0 (1520459437616_7273_9_00_00_0)] impl.PipelinedSorter: Reducer 2: Starting flush of map output
{code}
The sorter pushes the event to the output context directly.
{code}
2018-08-27T17:25:15,990 INFO [TezTR-437616_7273_9_0_0_0 (1520459437616_7273_9_00_00_0)] impl.PipelinedSorter: Reducer 2: Adding spill event for spill (final update=true), spillId=0
{code}
And Reducer 2 gets the event routed to it.

> Shuffle: Out of Band DME event sending causes errors
>
> Key: TEZ-3984
> URL: https://issues.apache.org/jira/browse/TEZ-3984
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.4, 0.9.1, 0.10.0
> Reporter: Gopal V
> Priority: Critical
> Labels: correctness
>
> In case of a task Input throwing an exception, the outputs are also closed in
> the LogicalIOProcessorRuntimeTask.cleanup().
> Cleanup ignores all the events returned by output close; however, if any output
> tries to send an event out of band by directly calling
> outputContext.sendEvents(events), then those events can reach the AM before
> the task failure is reported.
> This can cause correctness issues with shuffle, since zero-sized events can be
> sent out due to an input failure and downstream tasks may never reattempt a
> fetch from the valid attempt.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
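The mechanism described in TEZ-3984 can be modelled in a few lines. The sketch below is a toy model under stated assumptions, not Tez code: `OutOfBandSketch`, its `Output` interface, and the string events are hypothetical stand-ins for the real outputs, the output context, and DMEs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of the failure mode; these types are NOT the Tez API.
final class OutOfBandSketch {

    /** Stand-in for an output: close() may return events, or push them through the context. */
    interface Output {
        List<String> close(Consumer<List<String>> context);
    }

    /** Models cleanup after an input failure: events returned by close() are dropped on purpose. */
    static List<String> cleanup(List<Output> outputs) {
        List<String> reachedAm = new ArrayList<>();
        for (Output output : outputs) {
            // The context delivers straight to the "AM"; the return value is ignored.
            output.close(reachedAm::addAll);
        }
        return reachedAm;
    }

    public static void main(String[] args) {
        Output wellBehaved = context -> List.of("spill-event-A");
        Output outOfBand = context -> {
            context.accept(List.of("zero-sized-event-B")); // bypasses the return path
            return List.of();
        };
        // prints [zero-sized-event-B]
        System.out.println(cleanup(List.of(wellBehaved, outOfBand)));
    }
}
```

Only the event pushed through the context survives cleanup, which matches the report that out-of-band events can reach the AM even though cleanup discards everything returned by close.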
[jira] [Updated] (TEZ-3984) Shuffle: Out of Band DME event sending causes errors
[ https://issues.apache.org/jira/browse/TEZ-3984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated TEZ-3984:

Labels: correctness (was: )

> Shuffle: Out of Band DME event sending causes errors
>
> Key: TEZ-3984
> URL: https://issues.apache.org/jira/browse/TEZ-3984
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.8.4, 0.9.1, 0.10.0
> Reporter: Gopal V
> Priority: Critical
> Labels: correctness
>
> In case of a task Input throwing an exception, the outputs are also closed in
> the LogicalIOProcessorRuntimeTask.cleanup().
> Cleanup ignores all the events returned by output close; however, if any output
> tries to send an event out of band by directly calling
> outputContext.sendEvents(events), then those events can reach the AM before
> the task failure is reported.
> This can cause correctness issues with shuffle, since zero-sized events can be
> sent out due to an input failure and downstream tasks may never reattempt a
> fetch from the valid attempt.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (TEZ-3984) Shuffle: Out of Band DME event sending causes errors
Gopal V created TEZ-3984:

Summary: Shuffle: Out of Band DME event sending causes errors
Key: TEZ-3984
URL: https://issues.apache.org/jira/browse/TEZ-3984
Project: Apache Tez
Issue Type: Bug
Affects Versions: 0.9.1, 0.8.4, 0.10.0
Reporter: Gopal V

In case of a task Input throwing an exception, the outputs are also closed in
the LogicalIOProcessorRuntimeTask.cleanup().
Cleanup ignores all the events returned by output close; however, if any output
tries to send an event out of band by directly calling
outputContext.sendEvents(events), then those events can reach the AM before
the task failure is reported.
This can cause correctness issues with shuffle, since zero-sized events can be
sent out due to an input failure and downstream tasks may never reattempt a
fetch from the valid attempt.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)