[jira] [Commented] (TEZ-3761) NPE in Fetcher under load
[ https://issues.apache.org/jira/browse/TEZ-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060344#comment-16060344 ]

Rajesh Balamohan commented on TEZ-3761:
---

Patch lgtm. +1. Thanks [~jeagles].

> NPE in Fetcher under load
> -
>
> Key: TEZ-3761
> URL: https://issues.apache.org/jira/browse/TEZ-3761
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Jonathan Eagles
> Attachments: TEZ-3618.2.patch, TEZ-3618.3.patch, TEZ-3761.debug.patch
>
>
> Env: apache tez + apache hive master
> {noformat}
> 2017-06-14 00:24:53,795 [INFO] [Dispatcher thread {Central}]
> |HistoryEventHandler.criticalEvents|:
> [HISTORY][DAG:dag_1490656001509_5009_1][Event:TASK_ATTEMPT_FINISHED]:
> vertexName=Reducer 36,
> taskAttemptId=attempt_1490656001509_5009_1_15_13_0,
> creationTime=1497414223481, allocationTime=1497414290240,
> startTime=1497414290240, finishTime=1497414293795, timeTaken=3555,
> status=FAILED, taskFailureType=NON_FATAL, errorEnum=INPUT_READ_ERROR,
> diagnostics=Error: Error while running task ( failure ) :
> java.lang.NullPointerException
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:914)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> , errorMessage=Fetch failed:java.lang.NullPointerException
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:914)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284)
>   at org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}
> Query for ref: Q4 with 10 TB TPC-DS

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (TEZ-3767) Shuffle should not report error to AM during inputContext.killSelf()
[ https://issues.apache.org/jira/browse/TEZ-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060284#comment-16060284 ]

TezQA commented on TEZ-3767:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12874161/TEZ-3767.2.patch
  against master revision 4a7719b.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests in :
  org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2540//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2540//console

This message is automatically generated.

> Shuffle should not report error to AM during inputContext.killSelf()
>
>
> Key: TEZ-3767
> URL: https://issues.apache.org/jira/browse/TEZ-3767
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rajesh Balamohan
> Assignee: Rajesh Balamohan
> Attachments: TEZ-3767.1.patch, TEZ-3767.2.patch, TEZ-3767.2.patch
>
>
> {{ShuffleScheduler::killSelf}} kills the current attempt when it encounters
> certain errors. As a part of cleanup, it invokes {{close}} which internally
> releases the resources.
> If merge is happening in the middle, it could throw the following exception.
> This is caught in {{RunShuffleCallable}} and reported to AM immediately. This
> causes tasks to fail.
> {noformat}
> » Error: Error while running task ( failure ) :
> org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$ShuffleError:
> Error while doing final merge
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:320)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:285)
>   at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:748)
> Caused by: java.util.ConcurrentModificationException
>   at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1211)
>   at java.util.TreeMap$KeyIterator.next(TreeMap.java:1265)
>   at java.util.AbstractCollection.toArray(AbstractCollection.java:141)
>   at java.util.ArrayList.addAll(ArrayList.java:577)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.MergeManager.close(MergeManager.java:636)
>   at org.apache.tez.runtime.library.common.shuffle.orderedgrouped.Shuffle$RunShuffleCallable.callInternal(Shuffle.java:316)
>   ... 6 more
> {noformat}
> When {{isShutDown}} is set to true, it would be good to avoid sending error
> messages to AM.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
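The last point of the TEZ-3767 description (do not send error messages to the AM once {{isShutDown}} is set) can be sketched roughly as below. This is only an illustration under stated assumptions, not the attached patch: ShutdownAwareShuffle, runFinalMerge and reportedErrors are hypothetical names, and the counter stands in for whatever call actually reports the failure to the AM.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the shuffle component; not the actual Tez class.
class ShutdownAwareShuffle {
    // Set once the attempt has been asked to shut down (e.g. via killSelf).
    private final AtomicBoolean isShutDown = new AtomicBoolean(false);

    // Counts failures that would have been forwarded to the AM (assumption:
    // a counter replaces the real reporting call for the sake of the sketch).
    final AtomicInteger reportedErrors = new AtomicInteger();

    void shutdown() {
        isShutDown.set(true);
    }

    // Runs a cleanup/merge step; a failure is reported only if no shutdown
    // was requested, since errors during shutdown are a side effect of it.
    void runFinalMerge(Runnable closeStep) {
        try {
            closeStep.run();
        } catch (RuntimeException e) {
            if (isShutDown.get()) {
                return; // already shutting down: swallow instead of reporting
            }
            reportedErrors.incrementAndGet();
        }
    }
}
```

With this shape, a ConcurrentModificationException thrown by the cleanup step before shutdown is still counted as a reported error, while the same exception after shutdown() is ignored.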
Failed: TEZ-3767 PreCommit Build #2540
Jira: https://issues.apache.org/jira/browse/TEZ-3767 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2540/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 332.46 KB...] [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-yarn-timeline-history-with-acls [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874161/TEZ-3767.2.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2540//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2540//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 9a03d395ec167b55e85c396a5baeb1eea86878b8 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 3 tests failed. FAILED: org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.testDagLoggingEnabled Error Message: expected:<200> but was:<404> Stack Trace: java.lang.AssertionError: expected:<200> but was:<404> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.testDagLoggingEnabled(TestATSHistoryWithACLs.java:460) FAILED: org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.testSimpleAMACls Error Message: expected:<200> but was:<404> Stack Trace: java.lang.AssertionError: expected:<200> but was:<404> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.getTimelineData(TestATSHistoryWithACLs.java:149) at org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.getDomain(TestATSHistoryWithACLs.java:161) at org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.testSimpleAMACls(TestATSHistoryWithACLs.java:238) FAILED: org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.testDAGACls Error Message: expected:<200> but was:<404> Stack Trace: java.lang.AssertionError: expected:<200> but was:<404> at org.junit.Assert.fail(Assert.java:88) at org.junit.Assert.failNotEquals(Assert.java:743) at org.junit.Assert.assertEquals(Assert.java:118) at org.junit.Assert.assertEquals(Assert.java:555) at org.junit.Assert.assertEquals(Assert.java:542) at org.apache.tez.dag.history.ats.acls.TestATSHistoryWithACLs.getTimelineData(TestATSHisto
[jira] [Updated] (TEZ-3767) Shuffle should not report error to AM during inputContext.killSelf()
[ https://issues.apache.org/jira/browse/TEZ-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rajesh Balamohan updated TEZ-3767:
--
    Attachment: TEZ-3767.2.patch

Thanks [~sseth]. Attaching revised patch.
[jira] [Commented] (TEZ-3767) Shuffle should not report error to AM during inputContext.killSelf()
[ https://issues.apache.org/jira/browse/TEZ-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060182#comment-16060182 ]

Siddharth Seth commented on TEZ-3767:
-

Exceptions from ShuffleCallable are supposed to be handled by the onFailure method in ShuffleRunnerFutureCallback, which already has checks in place for whether shutdown has been invoked or not.

The killSelf invocation in ShuffleScheduler ends up invoking close on the ShuffleScheduler, which is part of Shuffle. Normally, Shuffle is supposed to control this component shutting down. I think it will be better if we extend the ExceptionReporter interface implemented by Shuffle to include the killSelf functionality. With that, Shuffle will continue to be responsible for the ShuffleScheduler lifecycle, after it kills itself.
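A rough sketch of the suggestion to extend ExceptionReporter with killSelf, so that Shuffle stays in charge of the ShuffleScheduler lifecycle. The method signatures, SchedulerSketch, ShuffleSketch and the events list are assumptions made for illustration; the real Tez types and the eventual patch may look different.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical shape of the extended interface; signatures are assumptions.
interface ExceptionReporter {
    void reportException(Throwable t);
    void killSelf(Exception cause, String message);
}

// Stand-in for ShuffleScheduler: it asks its owner to kill the attempt
// instead of closing itself.
class SchedulerSketch {
    private final ExceptionReporter reporter;
    boolean closed = false;

    SchedulerSketch(ExceptionReporter reporter) {
        this.reporter = reporter;
    }

    void onFatalFetchError(Exception e) {
        reporter.killSelf(e, "too many fetch failures");
    }

    void close() {
        closed = true;
    }
}

// Stand-in for Shuffle: implements the reporter and owns the scheduler.
class ShuffleSketch implements ExceptionReporter {
    final List<String> events = new ArrayList<>();
    private final AtomicBoolean isShutDown = new AtomicBoolean(false);
    final SchedulerSketch scheduler = new SchedulerSketch(this);

    @Override
    public void reportException(Throwable t) {
        // Nothing is reported once shutdown has started.
        if (!isShutDown.get()) {
            events.add("reported:" + t.getClass().getSimpleName());
        }
    }

    @Override
    public void killSelf(Exception cause, String message) {
        // Only the first caller performs the shutdown.
        if (isShutDown.compareAndSet(false, true)) {
            events.add("killSelf:" + message);
            scheduler.close(); // the owner, not the scheduler, drives the close
        }
    }
}
```

The point of the design is that the shutdown flag is set before the scheduler is closed, so any exception raised by a concurrent merge during close reaches reportException after shutdown has started and is dropped there.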
[jira] [Commented] (TEZ-3772) Allow slowstart for small vertices to be treated differently
[ https://issues.apache.org/jira/browse/TEZ-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060145#comment-16060145 ]

TezQA commented on TEZ-3772:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12874144/tez-3772.001.patch
  against master revision 4a7719b.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
  See https://builds.apache.org/job/PreCommit-TEZ-Build/2539//artifact/patchprocess/diffJavadocWarnings.txt for details.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2539//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2539//console

This message is automatically generated.

> Allow slowstart for small vertices to be treated differently
>
>
> Key: TEZ-3772
> URL: https://issues.apache.org/jira/browse/TEZ-3772
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Muhammad Samir Khan
> Assignee: Muhammad Samir Khan
> Attachments: tez-3772.001.patch
>
>
> If there are a small number of reduces (configurable), then having a
> different threshold can benefit. Performance of jobs with a small number of
> reduce tasks can benefit significantly. Yes, the job could specify slowstart
> as 0.0 instead of the default, but that requires job owners to do something.
> It would be better if the defaults did something more optimal for both large
> and small jobs.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Failed: TEZ-3772 PreCommit Build #2539
Jira: https://issues.apache.org/jira/browse/TEZ-3772 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2539/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 339.72 KB...] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 52:07 min [INFO] Finished at: 2017-06-22T22:58:29Z [INFO] Final Memory: 97M/1415M [INFO] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874144/tez-3772.001.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages. See https://builds.apache.org/job/PreCommit-TEZ-Build/2539//artifact/patchprocess/diffJavadocWarnings.txt for details. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2539//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2539//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 259eabbe01b6bba55a7c9bbea4389a13efa66cf4 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3772) Allow slowstart for small vertices to be treated differently
[ https://issues.apache.org/jira/browse/TEZ-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060096#comment-16060096 ]

Siddharth Seth commented on TEZ-3772:
-

[~samirkhan] - is there a specific observation which leads to this change? For smaller jobs, should reducers start immediately, only after the mappers are complete?
[jira] [Commented] (TEZ-3771) Tez UI: WASB/ADLS counters should be listed on the Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060092#comment-16060092 ]

Siddharth Seth commented on TEZ-3771:
-

Actually, is there a way for a site deployment to add specific fields? That is, this is the default set and, along with it, whatever is present in a custom config file gets added to the list?

> Tez UI: WASB/ADLS counters should be listed on the Tez UI
> --
>
> Key: TEZ-3771
> URL: https://issues.apache.org/jira/browse/TEZ-3771
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sreenath Somarajapuram
> Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3771.1.patch
>
>
> Under Group : org.apache.tez.common.counters.FileSystemCounter
> Counter names : WASB_BYTES_READ, WASB_BYTES_WRITTEN, ADL_BYTES_READ and
> ADL_BYTES_WRITTEN must be added

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
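The idea raised in the comment above (a default counter list that a site deployment can extend through its own config) could look roughly like the following. It is written in Java purely for illustration; the Tez UI itself is not Java, and CounterNames, merge and the S3A_BYTES_READ extra are hypothetical. Only the four default names come from the issue text.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the Tez UI or the attached patch.
final class CounterNames {
    // Default set: the four counters named in TEZ-3771.
    static final List<String> DEFAULT = Arrays.asList(
        "WASB_BYTES_READ", "WASB_BYTES_WRITTEN",
        "ADL_BYTES_READ", "ADL_BYTES_WRITTEN");

    private CounterNames() {}

    // Returns the defaults followed by any site-specific extras,
    // keeping insertion order and dropping duplicates.
    static Set<String> merge(Collection<String> siteExtras) {
        Set<String> merged = new LinkedHashSet<>(DEFAULT);
        merged.addAll(siteExtras);
        return merged;
    }
}
```

A site that lists, say, S3A_BYTES_READ in its custom config would then see five counters: the four defaults plus its own entry. Repeating a default name in the site list has no effect.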
[jira] [Commented] (TEZ-3771) Tez UI: WASB/ADLS counters should be listed on the Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16060090#comment-16060090 ]

Siddharth Seth commented on TEZ-3771:
-

+1.
[jira] [Updated] (TEZ-3772) Allow slowstart for small vertices to be treated differently
[ https://issues.apache.org/jira/browse/TEZ-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muhammad Samir Khan updated TEZ-3772:
-
    Summary: Allow slowstart for small vertices to be treated differently (was: Allow slowstart for small jobs to be treated differently)
[jira] [Updated] (TEZ-3772) Allow slowstart for small jobs to be treated differently
[ https://issues.apache.org/jira/browse/TEZ-3772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Muhammad Samir Khan updated TEZ-3772:
-
    Attachment: tez-3772.001.patch

Added new configurations to distinguish between small/regular vertex and to set different min/max fractions for slowstart for small vertices.
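As a rough illustration of what separate slowstart fractions for small vertices could mean, the sketch below picks a (min, max) fraction pair from the vertex size and ramps scheduling linearly between them. Everything here is an assumption for the example: SlowStartSketch, the threshold of 10 tasks, the fraction values and the linear ramp are not taken from tez-3772.001.patch or from the actual vertex manager code.

```java
// Hypothetical sketch; names and values are illustrative assumptions.
final class SlowStartSketch {
    static final int SMALL_VERTEX_MAX_TASKS = 10; // assumed "small" cutoff
    static final double REGULAR_MIN = 0.25;       // assumed regular fractions
    static final double REGULAR_MAX = 0.75;
    static final double SMALL_MIN = 0.0;          // small vertices start at once
    static final double SMALL_MAX = 0.0;

    private SlowStartSketch() {}

    // Returns {min, max} source-completion fractions for a vertex of this size.
    static double[] fractionsFor(int numTasks) {
        return numTasks <= SMALL_VERTEX_MAX_TASKS
            ? new double[] {SMALL_MIN, SMALL_MAX}
            : new double[] {REGULAR_MIN, REGULAR_MAX};
    }

    // How many of the vertex's tasks may be scheduled once the given fraction
    // of source tasks has completed: none below min, all at or above max,
    // and a linear ramp in between.
    static int tasksToSchedule(int numTasks, double completedFraction) {
        double[] f = fractionsFor(numTasks);
        double min = f[0];
        double max = f[1];
        if (completedFraction < min) {
            return 0;
        }
        if (completedFraction >= max) {
            return numTasks;
        }
        // Here min <= completedFraction < max, so max - min is positive.
        return (int) (numTasks * (completedFraction - min) / (max - min));
    }
}
```

Under these assumed values a 5-task vertex is fully schedulable as soon as the vertex starts, while a 100-task vertex schedules nothing until a quarter of its sources are done and everything once three quarters are done.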
[jira] [Created] (TEZ-3772) Allow slowstart for small jobs to be treated differently
Muhammad Samir Khan created TEZ-3772:

    Summary: Allow slowstart for small jobs to be treated differently
    Key: TEZ-3772
    URL: https://issues.apache.org/jira/browse/TEZ-3772
    Project: Apache Tez
    Issue Type: Improvement
    Reporter: Muhammad Samir Khan
    Assignee: Muhammad Samir Khan

If there are a small number of reduces (configurable), then having a different threshold can benefit. Performance of jobs with a small number of reduce tasks can benefit significantly. Yes, the job could specify slowstart as 0.0 instead of the default, but that requires job owners to do something. It would be better if the defaults did something more optimal for both large and small jobs.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
Failed: TEZ-3771 PreCommit Build #2538
Jira: https://issues.apache.org/jira/browse/TEZ-3771 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2538/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 339.09 KB...] [INFO] Total time: 55:43 min [INFO] Finished at: 2017-06-22T20:02:51Z [INFO] Final Memory: 89M/1324M [INFO] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874121/TEZ-3771.1.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2538//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2538//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 691315a52bd4d1e05d8a97deee364868fc1fe56c logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Compressed 3.50 MB of artifacts by 27.7% relative to #2537 [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3771) Tez UI: WASB/ADLS counters should be listed on the Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059916#comment-16059916 ]

TezQA commented on TEZ-3771:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12874121/TEZ-3771.1.patch
  against master revision 4a7719b.

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests.
  Please justify why no new tests are needed for this patch.
  Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}. There were no new javadoc warning messages.

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}. The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2538//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2538//console

This message is automatically generated.
[jira] [Updated] (TEZ-3771) Tez UI: WASB/ADLS counters should be listed on the Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sreenath Somarajapuram updated TEZ-3771:

    Attachment: TEZ-3771.1.patch

[~sseth] Please help in reviewing the patch.
[jira] [Created] (TEZ-3771) Tez UI: WASB/ADLS counters should be listed on the Tez UI
Sreenath Somarajapuram created TEZ-3771:
---

    Summary: Tez UI: WASB/ADLS counters should be listed on the Tez UI
    Key: TEZ-3771
    URL: https://issues.apache.org/jira/browse/TEZ-3771
    Project: Apache Tez
    Issue Type: Bug
    Reporter: Sreenath Somarajapuram
    Assignee: Sreenath Somarajapuram

Under Group : org.apache.tez.common.counters.FileSystemCounter
Counter names : WASB_BYTES_READ, WASB_BYTES_WRITTEN, ADL_BYTES_READ and ADL_BYTES_WRITTEN must be added

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (TEZ-3761) NPE in Fetcher under load
[ https://issues.apache.org/jira/browse/TEZ-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059810#comment-16059810 ]

Jonathan Eagles commented on TEZ-3761:
--

TestAnalyzer Test failure is unrelated to this patch. [~rajesh.balamohan], can you have another look?
Success: TEZ-3770 PreCommit Build #2537
Jira: https://issues.apache.org/jira/browse/TEZ-3770 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2537/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 340.61 KB...] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 54:35 min [INFO] Finished at: 2017-06-22T17:40:47Z [INFO] Final Memory: 93M/1303M [INFO] {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874106/TEZ-3770.001.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2537//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2537//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 15d74f4d2381cb8d7a49f75f7a208773f8e77abb logged out == == Finished build. == == Archiving artifacts Compressed 3.51 MB of artifacts by 26.7% relative to #2534 [description-setter] Description set: TEZ-3770 Recording test results Email was triggered for: Success Sending email for trigger: Success ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Commented] (TEZ-3770) DAG-aware YARN task scheduler
[ https://issues.apache.org/jira/browse/TEZ-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059731#comment-16059731 ] TezQA commented on TEZ-3770: {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874106/TEZ-3770.001.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2537//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2537//console This message is automatically generated. > DAG-aware YARN task scheduler > - > > Key: TEZ-3770 > URL: https://issues.apache.org/jira/browse/TEZ-3770 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: TEZ-3770.001.patch > > > There are cases where priority alone does not convey the relationship between > tasks, and this can cause problems when scheduling or preempting tasks. If > the YARN task scheduler was aware of the relationship between tasks then it > could make smarter decisions when trying to assign tasks to containers or > preempt running tasks to schedule pending tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3761) NPE in Fetcher under load
[ https://issues.apache.org/jira/browse/TEZ-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059692#comment-16059692 ] TezQA commented on TEZ-3761: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874104/TEZ-3618.3.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.analyzer.TestAnalyzer Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2536//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2536//console This message is automatically generated. 
Failed: TEZ-3761 PreCommit Build #2536
Jira: https://issues.apache.org/jira/browse/TEZ-3761 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2536/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 331.00 KB...] [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-job-analyzer [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874104/TEZ-3618.3.patch against master revision 4a7719b. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.analyzer.TestAnalyzer Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2536//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2536//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. d2fe00fe0493fd16c5e47ba816e55bfa511f536c logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 1 tests failed. 
FAILED: org.apache.tez.analyzer.TestAnalyzer.testWithATS Error Message: null Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.tez.analyzer.TestAnalyzer.getDagInfo(TestAnalyzer.java:264) at org.apache.tez.analyzer.TestAnalyzer.verify(TestAnalyzer.java:251) at org.apache.tez.analyzer.TestAnalyzer.runTests(TestAnalyzer.java:390) at org.apache.tez.analyzer.TestAnalyzer.testWithATS(TestAnalyzer.java:354)
[jira] [Updated] (TEZ-3770) DAG-aware YARN task scheduler
[ https://issues.apache.org/jira/browse/TEZ-3770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Lowe updated TEZ-3770: Attachment: TEZ-3770.001.patch Attaching a patch that provides a new scheduler class, DagAwareYarnTaskScheduler. The scheduler is a very tricky place to change, so to mitigate risk I implemented it as a separate scheduler that must be enabled via configuration. This scheduler has the following high-level behavioral differences from the existing YarnTaskSchedulerService class: - It tries to schedule new containers for tasks that match its priority before trying to schedule the highest priority task first. This avoids hanging onto unused, lower priority containers because higher priority requests are pending (see TEZ-3535). - New task allocation requests are first matched against idle containers before requesting resources from the RM. This cuts down on AM-RM protocol churn. - Task requests for tasks that are DAG-descendants of pending task requests will not be allocated to help reduce priority inversions that could lead to preemption. - Running tasks will only be preempted if they are DAG-descendants of tasks that have pending allocation requests. > DAG-aware YARN task scheduler > - > > Key: TEZ-3770 > URL: https://issues.apache.org/jira/browse/TEZ-3770 > Project: Apache Tez > Issue Type: New Feature >Reporter: Jason Lowe >Assignee: Jason Lowe > Attachments: TEZ-3770.001.patch > > > There are cases where priority alone does not convey the relationship between > tasks, and this can cause problems when scheduling or preempting tasks. If > the YARN task scheduler was aware of the relationship between tasks then it > could make smarter decisions when trying to assign tasks to containers or > preempt running tasks to schedule pending tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
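The last two behavioral differences in Jason Lowe's comment on TEZ-3770 both depend on knowing whether one vertex is a DAG-descendant of another. A minimal Java sketch of such a check follows. It is only an illustration under stated assumptions: the class `DagDescendants`, the adjacency-map representation keyed by vertex name, and both method names are hypothetical and are not taken from the TEZ-3770 patch or from DagAwareYarnTaskScheduler.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Tez. Vertices are identified by name. */
public class DagDescendants {
  // Maps a vertex to the vertices that directly consume its output.
  private final Map<String, List<String>> downstream;

  public DagDescendants(Map<String, List<String>> downstream) {
    this.downstream = downstream;
  }

  /** Returns every vertex reachable from the given vertex (breadth-first walk). */
  public Set<String> descendantsOf(String vertex) {
    Set<String> seen = new HashSet<>();
    Deque<String> queue = new ArrayDeque<>();
    queue.add(vertex);
    while (!queue.isEmpty()) {
      String current = queue.poll();
      List<String> children =
          downstream.getOrDefault(current, Collections.<String>emptyList());
      for (String child : children) {
        // Only enqueue a vertex the first time it is reached.
        if (seen.add(child)) {
          queue.add(child);
        }
      }
    }
    return seen;
  }

  /** True if the candidate is a descendant of any vertex with pending requests. */
  public boolean isDescendantOfPending(String candidate, Set<String> pendingVertices) {
    for (String pending : pendingVertices) {
      if (descendantsOf(pending).contains(candidate)) {
        return true;
      }
    }
    return false;
  }
}
```

Under this sketch, a scheduler would hold back a request when `isDescendantOfPending` returns true for its vertex, and would consider a running task for preemption only in that same case. A real scheduler would presumably precompute or cache the descendant sets rather than walk the graph on every request.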
[jira] [Commented] (TEZ-394) Better scheduling for uneven DAGs
[ https://issues.apache.org/jira/browse/TEZ-394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059611#comment-16059611 ] Jason Lowe commented on TEZ-394: I filed TEZ-3770 for the DAG-aware YARN task scheduler. > Better scheduling for uneven DAGs > - > > Key: TEZ-394 > URL: https://issues.apache.org/jira/browse/TEZ-394 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Rohini Palaniswamy >Assignee: Jason Lowe > Attachments: TEZ-394.001.patch, TEZ-394.002.patch, TEZ-394.003.patch > > > Consider a series of joins or group by on dataset A with few datasets that > takes 10 hours followed by a final join with a dataset X. The vertex that > loads dataset X will be one of the top vertexes and initialized early even > though its output is not consumed till the end after 10 hours. > 1) Could either use delayed start logic for better resource allocation > 2) Else if they are started upfront, need to handle failure/recovery cases > where the nodes which executed the MapTask might have gone down when the > final join happens. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TEZ-3770) DAG-aware YARN task scheduler
Jason Lowe created TEZ-3770: --- Summary: DAG-aware YARN task scheduler Key: TEZ-3770 URL: https://issues.apache.org/jira/browse/TEZ-3770 Project: Apache Tez Issue Type: New Feature Reporter: Jason Lowe Assignee: Jason Lowe There are cases where priority alone does not convey the relationship between tasks, and this can cause problems when scheduling or preempting tasks. If the YARN task scheduler was aware of the relationship between tasks then it could make smarter decisions when trying to assign tasks to containers or preempt running tasks to schedule pending tasks. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Updated] (TEZ-3761) NPE in Fetcher under load
[ https://issues.apache.org/jira/browse/TEZ-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3761: - Attachment: TEZ-3618.3.patch -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3768) Test timeout value for TestShuffleHandlerJobs is low
[ https://issues.apache.org/jira/browse/TEZ-3768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059515#comment-16059515 ] Jonathan Eagles commented on TEZ-3768: -- +1. Thanks, [~kshukla]. > Test timeout value for TestShuffleHandlerJobs is low > > > Key: TEZ-3768 > URL: https://issues.apache.org/jira/browse/TEZ-3768 > Project: Apache Tez > Issue Type: Bug >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla >Priority: Minor > Attachments: TEZ-3768.001.patch, TEZ-3768.002.patch > > > The test can fail with a timeout on slow build machines. One minute is > clearly too short. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3758) Vertex can hang in RUNNING state when two task attempts finish very closely and have retroactive failures
[ https://issues.apache.org/jira/browse/TEZ-3758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059513#comment-16059513 ] Jonathan Eagles commented on TEZ-3758: -- +1. Thanks, [~kshukla] > Vertex can hang in RUNNING state when two task attempts finish very closely > and have retroactive failures > - > > Key: TEZ-3758 > URL: https://issues.apache.org/jira/browse/TEZ-3758 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.7.1, 0.9.0 >Reporter: Kuhu Shukla >Assignee: Kuhu Shukla > Attachments: TEZ-3758.001.patch, TEZ-3758.002.patch, > TEZ-3758.003.patch, TEZ-3758.004.patch > > > A vertex's count of what tasks are done can go off in a case where two task > attempts finish very closely, say within a millisecond of each other. We had > a case where this task, which was marked successful, never scheduled another > attempt upon getting a retroactive failure since it thought it had one > uncompleted task attempt already. This is because the attempt that finished 1 > ms later transitioned to SUCCEEDED but we don't take any action on the > taskAttempStatus data structure and it stays false. This JIRA will attempt to > solve that race. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3769) Unordered: Fix wrong stats being sent out in the last event, when final merge is disabled
[ https://issues.apache.org/jira/browse/TEZ-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16059318#comment-16059318 ] TezQA commented on TEZ-3769: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874074/TEZ-3769.1.patch against master revision a925c83. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 23 javac compiler warnings (more than the master's current 21 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2535//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2535//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2535//console This message is automatically generated. > Unordered: Fix wrong stats being sent out in the last event, when final merge > is disabled > - > > Key: TEZ-3769 > URL: https://issues.apache.org/jira/browse/TEZ-3769 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: TEZ-3769.1.patch > > > When final merge is disabled (without pipelining), wrong stats were sent out > in the last event. > They were based on {{numRecordsPerPartition}}, which contains the overall > partition data. They should ideally be based on the spill result and its > buffers. 
> Also, {{finalSpill}} was unnecessarily sending events when no data was present > (i.e., when currentBuffer didn't have any data). This can be optimized to > reduce the number of events being sent across. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
Failed: TEZ-3769 PreCommit Build #2535
Jira: https://issues.apache.org/jira/browse/TEZ-3769 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2535/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 340.17 KB...] [INFO] [INFO] Total time: 54:59 min [INFO] Finished at: 2017-06-22T13:13:25Z [INFO] Final Memory: 94M/1392M [INFO] {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12874074/TEZ-3769.1.patch against master revision a925c83. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:red}-1 javac{color}. The applied patch generated 23 javac compiler warnings (more than the master's current 21 warnings). {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2535//testReport/ Javac warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/2535//artifact/patchprocess/diffJavacWarnings.txt Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2535//console This message is automatically generated. == == Adding comment to Jira. == == Comment added. 80eace128ab6f039b9e010fc72183762f0ad20e5 logged out == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts Compressed 3.51 MB of artifacts by 30.3% relative to #2534 [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## All tests passed
[jira] [Updated] (TEZ-3769) Unordered: Fix wrong stats being sent out in the last event, when final merge is disabled
[ https://issues.apache.org/jira/browse/TEZ-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3769: -- Attachment: TEZ-3769.1.patch [~sseth], [~aplusplus], [~harishjp], [~jeagles] - Please review when you find time. Patch contains TEZ-3762 changes as well. > Unordered: Fix wrong stats being sent out in the last event, when final merge > is disabled > - > > Key: TEZ-3769 > URL: https://issues.apache.org/jira/browse/TEZ-3769 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > Attachments: TEZ-3769.1.patch > > > When final merge is disabled (without pipelining), wrong stats were sent out > in the last event. > They were based on {{numRecordsPerPartition}}, which contains the overall > partition data. They should ideally be based on the spill result and its > buffers. > Also, {{finalSpill}} was unnecessarily sending events when no data was present > (i.e., when currentBuffer didn't have any data). This can be optimized to > reduce the number of events being sent across. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (TEZ-3769) Unordered: Fix wrong stats being sent out in the last event when final merge is disabled
Rajesh Balamohan created TEZ-3769: - Summary: Unordered: Fix wrong stats being sent out in the last event when final merge is disabled Key: TEZ-3769 URL: https://issues.apache.org/jira/browse/TEZ-3769 Project: Apache Tez Issue Type: Bug Reporter: Rajesh Balamohan When final merge is disabled (without pipelining), wrong stats were sent out in the last event. They were based on {{numRecordsPerPartition}}, which contains the overall partition data. They should ideally be based on the spill result and its buffers. Also, {{finalSpill}} was unnecessarily sending events when no data was present (i.e., when currentBuffer didn't have any data). This can be optimized to reduce the number of events being sent across. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
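The two points in the TEZ-3769 description (take the stats from the buffer being spilled, and send no event when that buffer is empty) can be illustrated with a small Java sketch. Everything in it is hypothetical: `FinalSpillSketch`, `SpillBuffer`, `SpillEvent`, and `finalSpillEvents` are stand-ins invented for illustration and do not mirror the actual Tez unordered writer classes or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration, not Tez code. */
public class FinalSpillSketch {

  /** Stand-in for a buffer that holds records for several partitions. */
  public static final class SpillBuffer {
    public final int[] recordsPerPartition;

    public SpillBuffer(int[] recordsPerPartition) {
      this.recordsPerPartition = recordsPerPartition;
    }

    /** A buffer is empty when no partition has any record. */
    public boolean isEmpty() {
      for (int count : recordsPerPartition) {
        if (count > 0) {
          return false;
        }
      }
      return true;
    }
  }

  /** Stand-in for the event that reports the stats of one spill. */
  public static final class SpillEvent {
    public final int spillId;
    public final int[] recordsPerPartition;

    public SpillEvent(int spillId, int[] recordsPerPartition) {
      this.spillId = spillId;
      this.recordsPerPartition = recordsPerPartition;
    }
  }

  /**
   * Builds the events for the final spill. Stats come from the buffer being
   * spilled, not from a running total, and an empty buffer produces no event.
   */
  public static List<SpillEvent> finalSpillEvents(int spillId, SpillBuffer currentBuffer) {
    List<SpillEvent> events = new ArrayList<>();
    if (currentBuffer == null || currentBuffer.isEmpty()) {
      return events; // nothing to report, so nothing is sent
    }
    events.add(new SpillEvent(spillId, currentBuffer.recordsPerPartition.clone()));
    return events;
  }
}
```

The counts are cloned so that later writes to the buffer cannot change an event that has already been built. Whether the real fix is structured this way is not stated in the thread.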
[jira] [Updated] (TEZ-3769) Unordered: Fix wrong stats being sent out in the last event, when final merge is disabled
[ https://issues.apache.org/jira/browse/TEZ-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajesh Balamohan updated TEZ-3769: -- Summary: Unordered: Fix wrong stats being sent out in the last event, when final merge is disabled (was: Unordered: Fix wrong stats being sent out in the last event when final merge is disabled) > Unordered: Fix wrong stats being sent out in the last event, when final merge > is disabled > - > > Key: TEZ-3769 > URL: https://issues.apache.org/jira/browse/TEZ-3769 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan > > When final merge is disabled (without pipelining), wrong stats were sent out > in the last event. > They were based on {{numRecordsPerPartition}}, which contains the overall > partition data. They should ideally be based on the spill result and its > buffers. > Also, {{finalSpill}} was unnecessarily sending events when no data was present > (i.e., when currentBuffer didn't have any data). This can be optimized to > reduce the number of events being sent across. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (TEZ-3761) NPE in Fetcher under load
[ https://issues.apache.org/jira/browse/TEZ-3761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16058953#comment-16058953 ] Rajesh Balamohan commented on TEZ-3761: --- Thanks for sharing the patch [~jeagles]. Would be good to add {{inputAttemptIdentifier}} as well in the exception? > NPE in Fetcher under load > - > > Key: TEZ-3761 > URL: https://issues.apache.org/jira/browse/TEZ-3761 > Project: Apache Tez > Issue Type: Bug >Reporter: Rajesh Balamohan >Assignee: Jonathan Eagles > Attachments: TEZ-3618.2.patch, TEZ-3761.debug.patch > > > Env: apache tez + apache hive master > {noformat} > 2017-06-14 00:24:53,795 [INFO] [Dispatcher thread {Central}] > |HistoryEventHandler.criticalEvents|: > [HISTORY][DAG:dag_1490656001509_5009_1][Event:TASK_ATTEMPT_FINISHED]: > vertexName=Reducer 36, > taskAttemptId=attempt_1490656001509_5009_1_15_13_0, > creationTime=1497414223481, allocationTime=1497414290240, > startTime=1497414290240, finishTime=1497414293795, timeTaken=3555, > status=FAILED, taskFailureType=NON_FATAL, errorEnum=INPUT_READ_ERROR, > diagnostics=Error: Error while running task ( failure ) : > java.lang.NullPointerException > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:914) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > , 
errorMessage=Fetch failed:java.lang.NullPointerException > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.fetchInputs(Fetcher.java:914) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:599) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.doHttpFetch(Fetcher.java:486) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:284) > at > org.apache.tez.runtime.library.common.shuffle.Fetcher.callInternal(Fetcher.java:76) > at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > {noformat} > Query for ref: Q4 with 10 TB TPC-DS -- This message was sent by Atlassian JIRA (v6.4.14#64029)
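Rajesh Balamohan's review comment on TEZ-3761 asks for `inputAttemptIdentifier` to be included in the exception. The thread does not show what the attached patches do, so the following Java sketch only illustrates the general idea of replacing a bare NullPointerException with a failure that names the input being fetched. All names are assumptions made for illustration: `FetchContextCheck`, `requireFetched`, and the message layout are hypothetical and are not taken from the patches or from Fetcher.java.

```java
import java.io.IOException;

/** Hypothetical illustration, not Tez code. */
public class FetchContextCheck {

  /**
   * Returns the value unchanged, or throws an IOException that says which
   * item was missing, from which host, and for which input attempt.
   */
  public static <T> T requireFetched(
      T value, String what, String host, String inputAttemptIdentifier) throws IOException {
    if (value == null) {
      throw new IOException(
          "Missing " + what + " while fetching from " + host
              + " for inputAttemptIdentifier=" + inputAttemptIdentifier);
    }
    return value;
  }
}
```

With a check like this at the point where the null value is first seen, the diagnostics in the task failure would carry the identifier instead of only a line number in the stack trace.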