[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502847#comment-16502847 ] TezQA commented on TEZ-3944: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926680/TEZ-3944.002.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2830//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2830//console This message is automatically generated. > TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, TEZ-3944.002.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Failed: TEZ-3944 PreCommit Build #2830
Jira: https://issues.apache.org/jira/browse/TEZ-3944 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2830/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 377.76 KB...] [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException [ERROR] [Help 2] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-runtime-library [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926680/TEZ-3944.002.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2830//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2830//console This message is automatically generated. == == Adding comment to Jira. == == == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 10 tests failed. FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, DISABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle(TestUnorderedPartitionedKVWriter.java:642) FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, ENABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.
[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502821#comment-16502821 ] Jonathan Eagles commented on TEZ-3944: -- Updated the tests to be in line with the level of mocking that DagAwareYarnTaskScheduler uses. > TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, TEZ-3944.002.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3944: - Attachment: TEZ-3944.002.patch > TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, TEZ-3944.002.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502715#comment-16502715 ] TezQA commented on TEZ-3331: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926663/TEZ-3331.6.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.task.TestTaskExecution2 org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2829//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2829//console This message is automatically generated. > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey >Assignee: Hitesh Shah >Priority: Major > Attachments: TEZ-3331.6.patch, TEZ-3331.wip.2.patch, > TEZ-3331.wip.3.patch, TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, > TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). 
These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Failed: TEZ-3331 PreCommit Build #2829
Jira: https://issues.apache.org/jira/browse/TEZ-3331 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2829/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 379.41 KB...] [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-runtime-internals [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926663/TEZ-3331.6.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.task.TestTaskExecution2 org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2829//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2829//console This message is automatically generated. == == Adding comment to Jira. == == == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 11 tests failed. 
FAILED: org.apache.tez.runtime.task.TestTaskExecution2.testMultipleSuccessfulTasks Error Message: null Stack Trace: java.lang.AssertionError: null at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.tez.runtime.task.TestTaskExecution2.verifySysCounters(TestTaskExecution2.java:682) at org.apache.tez.runtime.task.TestTaskExecution2.testMultipleSuccessfulTasks(TestTaskExecution2.java:180) FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, DISABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle(TestUnorderedPartitionedKVWriter.java:642) FAILED: org.apache.tez.runtime.library.common.writers.TestUnorder
[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502666#comment-16502666 ] TezQA commented on TEZ-3951: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926655/TEZ-3951.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2828//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2828//console This message is automatically generated. > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Failed: TEZ-3951 PreCommit Build #2828
Jira: https://issues.apache.org/jira/browse/TEZ-3951 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2828/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 383.84 KB...] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-runtime-library [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926655/TEZ-3951.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2828//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2828//console This message is automatically generated. == == Adding comment to Jira. == == == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 12 tests failed. FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testTextMixedRecordsWithoutFinalMerge[test[true, DISABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testTextMixedRecordsWithoutFinalMerge(TestUnorderedPartitionedKVWriter.java:344) FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testTextMixedRecordsWithoutFinalMerge[test[true, MEMORY_OPTIMIZED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(Dis
[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502663#comment-16502663 ] Prasanth Jayachandran commented on TEZ-3331: Rebased the patch. The only change from the .5 patch is removing the root pom.xml changes, which changed the Hadoop version. Since master is already at Hadoop 3.0.2, the root pom.xml changes are no longer required. [~EricWohlstadter] / [~gopalv], can someone please review and commit this patch? > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey >Assignee: Hitesh Shah >Priority: Major > Attachments: TEZ-3331.6.patch, TEZ-3331.wip.2.patch, > TEZ-3331.wip.3.patch, TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, > TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI
[ https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Prasanth Jayachandran updated TEZ-3331: --- Attachment: TEZ-3331.6.patch > Add operation specific HDFS counters for Tez UI > --- > > Key: TEZ-3331 > URL: https://issues.apache.org/jira/browse/TEZ-3331 > Project: Apache Tez > Issue Type: Bug >Reporter: Jitendra Nath Pandey >Assignee: Hitesh Shah >Priority: Major > Attachments: TEZ-3331.6.patch, TEZ-3331.wip.2.patch, > TEZ-3331.wip.3.patch, TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, > TEZ-3331.wip.patch > > > Hadoop has added several operation specific counters in the FileSystem > statistics (HADOOP-13065). These counters are useful to track file system > operations more granularly. It would be great to track these counters for Tez > and expose them via UI as well. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3952) Allow Tez task speculation to allow for greater customization
[ https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502640#comment-16502640 ] TezQA commented on TEZ-3952: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926650/TEZ-3952.001.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2827//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2827//console This message is automatically generated. > Allow Tez task speculation to allow for greater customization > - > > Key: TEZ-3952 > URL: https://issues.apache.org/jira/browse/TEZ-3952 > Project: Apache Tez > Issue Type: Improvement >Reporter: Nishant Dash >Assignee: Nishant Dash >Priority: Major > Attachments: TEZ-3952.001.patch > > > Many of the settings for Tez task speculation are hardcoded and should > instead be configurable. 
For example, there are no equivalent config settings > for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks
-- This message was sent by Atlassian JIRA (v7.6.3#76005)
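The TEZ-3952 description asks for hardcoded speculation values to become configurable. A minimal sketch of that idea follows, using only the Java standard library. The MapReduce setting names are copied from the issue; everything else is an assumption made for illustration: the class name, the `example.tez.speculation.` prefix, the helper methods, and the fallback value 0.5 are hypothetical and do not come from the patch or from Tez.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SpeculationSettingsSketch {
    // Prefix shared by the five MapReduce settings named in the issue.
    static final String MAPREDUCE_PREFIX = "mapreduce.job.speculative.";

    // Hypothetical prefix, for illustration only; NOT a real Tez config key.
    static final String HYPOTHETICAL_PREFIX = "example.tez.speculation.";

    // MapReduce setting names, copied from the issue description.
    static final String[] MAPREDUCE_KEYS = {
        MAPREDUCE_PREFIX + "speculative-cap-running-tasks",
        MAPREDUCE_PREFIX + "retry-after-no-speculate",
        MAPREDUCE_PREFIX + "retry-after-speculate",
        MAPREDUCE_PREFIX + "minimum-allowed-tasks",
        MAPREDUCE_PREFIX + "speculative-cap-total-tasks",
    };

    // Maps a MapReduce speculation key to a hypothetical counterpart key.
    static String hypotheticalKey(String mapreduceKey) {
        if (!mapreduceKey.startsWith(MAPREDUCE_PREFIX)) {
            throw new IllegalArgumentException("not a speculation setting: " + mapreduceKey);
        }
        return HYPOTHETICAL_PREFIX + mapreduceKey.substring(MAPREDUCE_PREFIX.length());
    }

    // Reads a setting, falling back to the formerly hardcoded value when unset.
    static double getDouble(Map<String, String> conf, String key, double fallback) {
        String raw = conf.get(key);
        return raw == null ? fallback : Double.parseDouble(raw);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        String key = hypotheticalKey(MAPREDUCE_KEYS[0]);
        // Unset: the fallback (an illustrative value, not a real default) is used.
        System.out.println(key + " = " + getDouble(conf, key, 0.5));
        // Set explicitly: the configured value overrides the fallback.
        conf.put(key, "0.25");
        System.out.println(key + " = " + getDouble(conf, key, 0.5));
    }
}
```

The point of the sketch is only the lookup-with-fallback pattern: existing behaviour is preserved when a key is unset, and an operator can override it without a code change.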
Failed: TEZ-3952 PreCommit Build #2827
Jira: https://issues.apache.org/jira/browse/TEZ-3952 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2827/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 379.78 KB...] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-runtime-library [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926650/TEZ-3952.001.patch against master revision 09102e5. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2827//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2827//console This message is automatically generated. == == Adding comment to Jira. == == == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. 
Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 10 tests failed. FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, DISABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:211) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.getSpillPathDetails(UnorderedPartitionedKVWriter.java:963) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.getSpillPathDetails(UnorderedPartitionedKVWriter.java:931) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.writeLargeRecord(UnorderedPartitionedKVWriter.java:1077) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:412) at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:368) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedP
[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502606#comment-16502606 ] Sergey Shelukhin commented on TEZ-3951: --- Prewarm itself is a pretty obscure feature, and the time to wait to shut down the prewarm DAG seems too esoteric to be a config setting. Is there any reason people would want to change it? > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502594#comment-16502594 ] Jonathan Eagles commented on TEZ-3951: -- [~sershe], is there a reason not to make the wait time configurable? > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502587#comment-16502587 ] Sergey Shelukhin commented on TEZ-3951: --- [~ewohlstadter] can you take a look? > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated TEZ-3951: -- Attachment: TEZ-3951.patch > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated TEZ-3951: -- Attachment: (was: TEZ-3951.patch) > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated TEZ-3951: -- Attachment: TEZ-3951.patch > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG
[ https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Shelukhin updated TEZ-3951: -- Summary: TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG (was: TezClient wait too long for the DAGClient for prewarm) > TezClient wait too long for the DAGClient for prewarm; tries to shut down the > wrong DAG > --- > > Key: TEZ-3951 > URL: https://issues.apache.org/jira/browse/TEZ-3951 > Project: Apache Tez > Issue Type: Bug >Reporter: Sergey Shelukhin >Assignee: Sergey Shelukhin >Priority: Major > Attachments: TEZ-3951.patch > > > Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization
[ https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Dash updated TEZ-3952: -- Attachment: (was: TEZ-3952.001.patch) > Allow Tez task speculation to allow for greater customization > - > > Key: TEZ-3952 > URL: https://issues.apache.org/jira/browse/TEZ-3952 > Project: Apache Tez > Issue Type: Improvement >Reporter: Nishant Dash >Assignee: Nishant Dash >Priority: Major > Attachments: TEZ-3952.001.patch > > > Many of the settings for Tez task speculation are hardcoded and should > instead be configurable. For example, there's no equivalent config settings > for the following MapReduce settings: > - mapreduce.job.speculative.speculative-cap-running-tasks > - mapreduce.job.speculative.retry-after-no-speculate > - mapreduce.job.speculative.retry-after-speculate > - mapreduce.job.speculative.minimum-allowed-tasks > - mapreduce.job.speculative.speculative-cap-total-tasks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization
[ https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Dash updated TEZ-3952: -- Attachment: TEZ-3952.001.patch > Allow Tez task speculation to allow for greater customization > - > > Key: TEZ-3952 > URL: https://issues.apache.org/jira/browse/TEZ-3952 > Project: Apache Tez > Issue Type: Improvement >Reporter: Nishant Dash >Assignee: Nishant Dash >Priority: Major > Attachments: TEZ-3952.001.patch, TEZ-3952.001.patch > > > Many of the settings for Tez task speculation are hardcoded and should > instead be configurable. For example, there's no equivalent config settings > for the following MapReduce settings: > - mapreduce.job.speculative.speculative-cap-running-tasks > - mapreduce.job.speculative.retry-after-no-speculate > - mapreduce.job.speculative.retry-after-speculate > - mapreduce.job.speculative.minimum-allowed-tasks > - mapreduce.job.speculative.speculative-cap-total-tasks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization
[ https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishant Dash updated TEZ-3952: -- Description: Many of the settings for Tez task speculation are hardcoded and should instead be configurable. For example, there's no equivalent config settings for the following MapReduce settings: - mapreduce.job.speculative.speculative-cap-running-tasks - mapreduce.job.speculative.retry-after-no-speculate - mapreduce.job.speculative.retry-after-speculate - mapreduce.job.speculative.minimum-allowed-tasks - mapreduce.job.speculative.speculative-cap-total-tasks > Allow Tez task speculation to allow for greater customization > - > > Key: TEZ-3952 > URL: https://issues.apache.org/jira/browse/TEZ-3952 > Project: Apache Tez > Issue Type: Improvement >Reporter: Nishant Dash >Assignee: Nishant Dash >Priority: Major > > Many of the settings for Tez task speculation are hardcoded and should > instead be configurable. For example, there's no equivalent config settings > for the following MapReduce settings: > - mapreduce.job.speculative.speculative-cap-running-tasks > - mapreduce.job.speculative.retry-after-no-speculate > - mapreduce.job.speculative.retry-after-speculate > - mapreduce.job.speculative.minimum-allowed-tasks > - mapreduce.job.speculative.speculative-cap-total-tasks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm
Sergey Shelukhin created TEZ-3951: - Summary: TezClient wait too long for the DAGClient for prewarm Key: TEZ-3951 URL: https://issues.apache.org/jira/browse/TEZ-3951 Project: Apache Tez Issue Type: Bug Reporter: Sergey Shelukhin Assignee: Sergey Shelukhin Follow-up from TEZ-3943 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (TEZ-3952) Allow Tez task speculation to allow for greater customization
Nishant Dash created TEZ-3952: - Summary: Allow Tez task speculation to allow for greater customization Key: TEZ-3952 URL: https://issues.apache.org/jira/browse/TEZ-3952 Project: Apache Tez Issue Type: Improvement Reporter: Nishant Dash Assignee: Nishant Dash -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress
[ https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502525#comment-16502525 ] Jonathan Eagles commented on TEZ-3938: -- +1. Committing to master and branch-0.9 > Task attempts failing due to not making progress > > > Key: TEZ-3938 > URL: https://issues.apache.org/jira/browse/TEZ-3938 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch > > > Last progress time is initialized at TaskAttemptImpl object creation. > Heartbeats can be sent over the umbilical as soon as the container is > assigned an attempt. If the container assignment takes longer than the task > progress timeout, we can timeout the task on the first heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
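The timing problem described in TEZ-3938 can be illustrated with a small, self-contained sketch. Every name here (ManualClock, AttemptProgressSketch, onAttemptStarted, heartbeatTimesOut) is hypothetical and chosen for illustration; this is not the TaskAttemptImpl code and not the committed patch. It only assumes that the progress timestamp is either stamped at object creation (the reported behaviour) or left unset until the attempt actually starts (one possible mitigation).

```java
// Hypothetical sketch of the TEZ-3938 timing issue; not Tez source code.

/** A hand-advanced clock, so the example needs no real sleep. */
final class ManualClock {
    private long nowMs = 0L;
    long getTime() { return nowMs; }
    void incrementTime(long deltaMs) { nowMs += deltaMs; }
}

final class AttemptProgressSketch {
    private static final long UNSET = -1L;

    private final ManualClock clock;
    private final long timeoutMs;
    private long lastProgressMs;

    AttemptProgressSketch(ManualClock clock, long timeoutMs, boolean stampAtCreation) {
        this.clock = clock;
        this.timeoutMs = timeoutMs;
        // Reported behaviour: the timestamp is taken when the object is created.
        // Assumed mitigation: leave it unset until the attempt starts.
        this.lastProgressMs = stampAtCreation ? clock.getTime() : UNSET;
    }

    /** Called once a container has been assigned and the attempt starts. */
    void onAttemptStarted() {
        if (lastProgressMs == UNSET) {
            lastProgressMs = clock.getTime();
        }
    }

    /** Returns true if a heartbeat arriving now would find the attempt timed out. */
    boolean heartbeatTimesOut() {
        if (lastProgressMs == UNSET) {
            return false; // not initialized yet, so lack of progress cannot be judged
        }
        return clock.getTime() - lastProgressMs > timeoutMs;
    }

    /** Simulates: create attempt, wait for a container, start, first heartbeat. */
    static boolean firstHeartbeatTimesOut(boolean stampAtCreation,
                                          long assignmentDelayMs,
                                          long timeoutMs) {
        ManualClock clock = new ManualClock();
        AttemptProgressSketch attempt =
            new AttemptProgressSketch(clock, timeoutMs, stampAtCreation);
        clock.incrementTime(assignmentDelayMs); // container assignment is slow
        attempt.onAttemptStarted();
        return attempt.heartbeatTimesOut();
    }

    public static void main(String[] args) {
        // Assignment delay (5 s) exceeds the progress timeout (1 s).
        System.out.println("stamped at creation: "
            + firstHeartbeatTimesOut(true, 5_000L, 1_000L));
        System.out.println("stamped at start:    "
            + firstHeartbeatTimesOut(false, 5_000L, 1_000L));
    }
}
```

Advancing a manual clock instead of sleeping keeps such a check deterministic, which is the same motivation as the MockClock suggestion made in the review of this issue.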
[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress
[ https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502459#comment-16502459 ] Kuhu Shukla commented on TEZ-3938: -- Verified test failure is unrelated. [~jeagles], request for review! Thanks a lot! > Task attempts failing due to not making progress > > > Key: TEZ-3938 > URL: https://issues.apache.org/jira/browse/TEZ-3938 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch > > > Last progress time is initialized at TaskAttemptImpl object creation. > Heartbeats can be sent over the umbilical as soon as the container is > assigned an attempt. If the container assignment takes longer than the task > progress timeout, we can timeout the task on the first heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3949) TestATSHistoryV15 is failing with hadoop3+
[ https://issues.apache.org/jira/browse/TEZ-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502434#comment-16502434 ] Kuhu Shukla commented on TEZ-3949: -- +1. Thank you [~jeagles] for tracking this down. Committing this to master shortly. > TestATSHistoryV15 is failing with hadoop3+ > -- > > Key: TEZ-3949 > URL: https://issues.apache.org/jira/browse/TEZ-3949 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jonathan Eagles >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3949.001.patch > > > This is another case of the hadoop-mapreduce-client-shuffle dependency shift > in hadoop3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (TEZ-3912) Fetchers should be more robust to corrupted inputs
[ https://issues.apache.org/jira/browse/TEZ-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla reassigned TEZ-3912: Assignee: Kuhu Shukla > Fetchers should be more robust to corrupted inputs > -- > > Key: TEZ-3912 > URL: https://issues.apache.org/jira/browse/TEZ-3912 > Project: Apache Tez > Issue Type: Bug >Reporter: Jason Lowe >Assignee: Kuhu Shukla >Priority: Major > > I recently saw a case where a bad node in the cluster produced corrupted > shuffle data that caused the codec to throw IllegalArgumentException when > trying to fetch. Fetchers currently only handle IOException and > InternalError, and any other type of exception will cause the entire task to > be torn down. We should consider catching Exception like MapReduce does to > be more robust in light of other types of errors coming from the codec and > allow retries to occur. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
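The suggestion in TEZ-3912 (catch a broader set of failures so that a retry can happen) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Tez Fetcher: FetchRetrySketch, FetchAttempt and fetchWithRetries are hypothetical names, and the retry policy (a fixed number of immediate attempts) is invented for the example.

```java
// Hypothetical sketch of the broader catch proposed in TEZ-3912; not Tez source code.
final class FetchRetrySketch {

    /** One attempt to fetch and decode shuffle data; may fail in arbitrary ways. */
    interface FetchAttempt {
        byte[] fetch() throws Exception;
    }

    /**
     * Retries the fetch when any Exception (or InternalError) is thrown,
     * rather than only IOException, so that an unexpected codec failure such
     * as IllegalArgumentException leads to a retry instead of tearing down
     * the whole task.
     */
    static byte[] fetchWithRetries(FetchAttempt attempt, int maxAttempts)
            throws java.io.IOException {
        Throwable last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.fetch();
            } catch (Exception | InternalError e) {
                last = e; // remember the failure and try again
            }
        }
        throw new java.io.IOException(
            "fetch failed after " + maxAttempts + " attempts", last);
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // First attempt simulates corrupted input rejected by a codec.
        byte[] data = fetchWithRetries(() -> {
            if (calls[0]++ == 0) {
                throw new IllegalArgumentException("corrupted shuffle data");
            }
            return new byte[] {42};
        }, 3);
        System.out.println("attempts=" + calls[0] + ", firstByte=" + data[0]);
    }
}
```

Catching broadly trades precision for robustness: a genuine programming error would also be retried, so a real implementation would likely cap retries and report the failing source, as the sketch does with maxAttempts and the wrapped cause.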
[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502290#comment-16502290 ] Jonathan Eagles commented on TEZ-3944: -- Fixed my comment above and corrected to HADOOP-15450 > TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500865#comment-16500865 ] Jonathan Eagles edited comment on TEZ-3944 at 6/5/18 6:18 PM: -- Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and is going to be fixed in 3.0.3 release HADOOP-15450 {noformat} java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) {noformat} was (Author: jeagles): Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and is going to be fixed in 3.0.3 release 15450 {noformat} java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) {noformat} > 
TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500865#comment-16500865 ] Jonathan Eagles edited comment on TEZ-3944 at 6/5/18 6:18 PM: -- Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and is going to be fixed in 3.0.3 release 15450 {noformat} java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) {noformat} was (Author: jeagles): Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and is going to be fixed in 3.0.3 release HADOOP-1545 {noformat} java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) {noformat} > 
TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3
[ https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502223#comment-16502223 ] Eric Wohlstadter commented on TEZ-3944: --- [~jeagles] I think HADOOP-1545 isn't the ticket you meant to tag for the DiskChecker performance regression. > TestTaskScheduler times-out on Hadoop3 > -- > > Key: TEZ-3944 > URL: https://issues.apache.org/jira/browse/TEZ-3944 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Eric Wohlstadter >Assignee: Jonathan Eagles >Priority: Major > Attachments: TEZ-3944.001.patch, > org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt > > > TestTaskScheduler times-out intermittently. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress
[ https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502186#comment-16502186 ] TezQA commented on TEZ-3938: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926586/TEZ-3938.002.patch against master revision b0eb9dc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler org.apache.tez.dag.history.ats.acls.TestATSHistoryV15 Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2826//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2826//console This message is automatically generated. > Task attempts failing due to not making progress > > > Key: TEZ-3938 > URL: https://issues.apache.org/jira/browse/TEZ-3938 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch > > > Last progress time is initialized at TaskAttemptImpl object creation. > Heartbeats can be sent over the umbilical as soon as the container is > assigned an attempt. 
If the container assignment takes longer than the task > progress timeout, we can timeout the task on the first heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Failed: TEZ-3938 PreCommit Build #2826
Jira: https://issues.apache.org/jira/browse/TEZ-3938 Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2826/ ### ## LAST 60 LINES OF THE CONSOLE ### [...truncated 377.08 KB...] [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn -rf :tez-runtime-library [INFO] Build failures were ignored. {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12926586/TEZ-3938.002.patch against master revision b0eb9dc. {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified test files. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. There were no new javadoc warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 3.0.1) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 core tests{color}. The patch failed these unit tests in : org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter The following test timeouts occurred in : org.apache.tez.dag.app.rm.TestTaskScheduler org.apache.tez.dag.history.ats.acls.TestATSHistoryV15 Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/2826//testReport/ Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2826//console This message is automatically generated. == == Adding comment to Jira. == == == == Finished build. == == Build step 'Execute shell' marked build as failure Archiving artifacts [description-setter] Could not determine description. Recording test results Email was triggered for: Failure - Any Sending email for trigger: Failure - Any ### ## FAILED TESTS (if any) ## 10 tests failed. 
FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, DISABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351) at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152) at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133) at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473) at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle(TestUnorderedPartitionedKVWriter.java:642) FAILED: org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, ENABLED]] Error Message: test timed out after 1 milliseconds Stack Trace: java.lang.Exception: test timed out after 1 milliseconds at java.io.FileDescriptor.sync(Native Method) at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249) at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220) at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82) at org.apache.hadoo
[jira] [Commented] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED
[ https://issues.apache.org/jira/browse/TEZ-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502133#comment-16502133 ] Jonathan Eagles commented on TEZ-3950: -- The race is present in LocalTaskSchedulerService. However, the race in DagAwareYarnTaskScheduler and YarnTaskSchedulerService is easier to lose since there is no message queue in those services and the containerBeingReleased is called synchronously. > Preempted task attempts intermittently marked as FAILED instead of KILLED > - > > Key: TEZ-3950 > URL: https://issues.apache.org/jira/browse/TEZ-3950 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Jonathan Eagles >Priority: Major > Attachments: TEZ-3950.fail.patch > > > TestMockDAGAppMaster.testInternalPreemption intermittently fails with > expected: but was: > Crux of the matter is TaskSchedulerManager sends two events > - > TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends > AMContainerStopRequest -> TA_CONTAINER_TERMINATING > - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM > In order to kill a task attempt correctly the second message loop must > complete first. The first path is longer so the second message loop completes > almost always first. When the first message loop completes first, then the > task attempt is marked as FAILED and not KILLED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED
[ https://issues.apache.org/jira/browse/TEZ-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502124#comment-16502124 ] Jonathan Eagles commented on TEZ-3950: -- Attaching a patch that helps the first message loop complete first to induce the test failure for TestMockDAGAppMaster.testInternalPreemption > Preempted task attempts intermittently marked as FAILED instead of KILLED > - > > Key: TEZ-3950 > URL: https://issues.apache.org/jira/browse/TEZ-3950 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Jonathan Eagles >Priority: Major > Attachments: TEZ-3950.fail.patch > > > TestMockDAGAppMaster.testInternalPreemption intermittently fails with > expected: but was: > Crux of the matter is TaskSchedulerManager sends two events > - > TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends > AMContainerStopRequest -> TA_CONTAINER_TERMINATING > - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM > In order to kill a task attempt correctly the second message loop must > complete first. The first path is longer so the second message loop completes > almost always first. When the first message loop completes first, then the > task attempt is marked as FAILED and not KILLED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED
Jonathan Eagles created TEZ-3950: Summary: Preempted task attempts intermittently marked as FAILED instead of KILLED Key: TEZ-3950 URL: https://issues.apache.org/jira/browse/TEZ-3950 Project: Apache Tez Issue Type: Bug Affects Versions: 0.9.2, 0.10.0 Reporter: Jonathan Eagles Attachments: TEZ-3950.fail.patch TestMockDAGAppMaster.testInternalPreemption intermittently fails with expected: but was: Crux of the matter is TaskSchedulerManager sends two events - TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM In order to kill a task attempt correctly the second message loop must complete first. The first path is longer so the second message loop completes almost always first. When the first message loop completes first, then the task attempt is marked as FAILED and not KILLED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
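The order dependence described in TEZ-3950 can be reduced to a toy model. The sketch below is an assumption-laden illustration, not the Tez state machine: PreemptionOrderSketch and finalState are hypothetical, only the two event names come from the report, and the rule "the first of the two events to be processed decides the outcome" is a simplification of the behaviour the report describes.

```java
// Hypothetical toy model of the event race in TEZ-3950; not Tez source code.
final class PreemptionOrderSketch {

    enum Event { TA_CONTAINER_TERMINATING, TA_CONTAINER_TERMINATED_BY_SYSTEM }

    enum State { RUNNING, KILLED, FAILED }

    /**
     * Applies events in arrival order. Simplifying assumption: whichever of
     * the two events is handled first moves the attempt to a terminal state,
     * and later events are ignored.
     */
    static State finalState(Event... arrivalOrder) {
        State state = State.RUNNING;
        for (Event e : arrivalOrder) {
            if (state != State.RUNNING) {
                continue; // already terminal
            }
            state = (e == Event.TA_CONTAINER_TERMINATED_BY_SYSTEM)
                ? State.KILLED   // the ordering the report says is needed
                : State.FAILED;  // the ordering the report says misfires
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(finalState(
            Event.TA_CONTAINER_TERMINATED_BY_SYSTEM, Event.TA_CONTAINER_TERMINATING));
        System.out.println(finalState(
            Event.TA_CONTAINER_TERMINATING, Event.TA_CONTAINER_TERMINATED_BY_SYSTEM));
    }
}
```

Because the two events travel along paths of different length, the arrival order is usually, but not always, the same, which matches the intermittent nature of the test failure.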
[jira] [Updated] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED
[ https://issues.apache.org/jira/browse/TEZ-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3950: - Attachment: TEZ-3950.fail.patch > Preempted task attempts intermittently marked as FAILED instead of KILLED > - > > Key: TEZ-3950 > URL: https://issues.apache.org/jira/browse/TEZ-3950 > Project: Apache Tez > Issue Type: Bug >Affects Versions: 0.9.2, 0.10.0 >Reporter: Jonathan Eagles >Priority: Major > Attachments: TEZ-3950.fail.patch > > > TestMockDAGAppMaster.testInternalPreemption intermittently fails with > expected: but was: > Crux of the matter is TaskSchedulerManager sends two events > - > TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends > AMContainerStopRequest -> TA_CONTAINER_TERMINATING > - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM > In order to kill a task attempt correctly the second message loop must > complete first. The first path is longer so the second message loop completes > almost always first. When the first message loop completes first, then the > task attempt is marked as FAILED and not KILLED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails
[ https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles resolved TEZ-3005. -- Resolution: Cannot Reproduce > TestMockDAGAppMaster.testInternalPreemption fails > - > > Key: TEZ-3005 > URL: https://issues.apache.org/jira/browse/TEZ-3005 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jeff Zhang >Priority: Major > > {code} > testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster) Time > elapsed: 0.458 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211) > {code} > https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console > \cc [~bikassaha] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails
[ https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3005: - Attachment: TEZ-3005.fail.patch > TestMockDAGAppMaster.testInternalPreemption fails > - > > Key: TEZ-3005 > URL: https://issues.apache.org/jira/browse/TEZ-3005 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jeff Zhang >Priority: Major > > {code} > testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster) Time > elapsed: 0.458 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211) > {code} > https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console > \cc [~bikassaha] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails
[ https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles updated TEZ-3005: - Attachment: (was: TEZ-3005.fail.patch) > TestMockDAGAppMaster.testInternalPreemption fails > - > > Key: TEZ-3005 > URL: https://issues.apache.org/jira/browse/TEZ-3005 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jeff Zhang >Priority: Major > > {code} > testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster) Time > elapsed: 0.458 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211) > {code} > https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console > \cc [~bikassaha] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Reopened] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails
[ https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Eagles reopened TEZ-3005: -- > TestMockDAGAppMaster.testInternalPreemption fails > - > > Key: TEZ-3005 > URL: https://issues.apache.org/jira/browse/TEZ-3005 > Project: Apache Tez > Issue Type: Sub-task >Reporter: Jeff Zhang >Priority: Major > > {code} > testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster) Time > elapsed: 0.458 sec <<< FAILURE! > java.lang.AssertionError: expected: but was: > at org.junit.Assert.fail(Assert.java:88) > at org.junit.Assert.failNotEquals(Assert.java:743) > at org.junit.Assert.assertEquals(Assert.java:118) > at org.junit.Assert.assertEquals(Assert.java:144) > at > org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211) > {code} > https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console > \cc [~bikassaha] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress
[ https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501977#comment-16501977 ] Kuhu Shukla commented on TEZ-3938: --
bq. Consider a MockClock instead of a SystemClock and then incrementTime instead of doing an actual sleep
Done.
bq. Remove unnecessary if failed event check. With this change my understanding is the task attempt will always enter the submitted state.
Made changes to handle the fail progress event (as it is unexpected) and just check the final state.
bq. The status update check now checks to see if it is initialized before failing due to lack of progress, but there is no test to prove status update before submitted transition works.
Based on the state machine, task init followed by a status update is not possible. I have not added a test to check for it for this reason.
Thank you for the review comments [~jeagles]. Appreciate further comments post pre-commit. The test failures from the earlier precommit are not related to this fix.
> Task attempts failing due to not making progress > > > Key: TEZ-3938 > URL: https://issues.apache.org/jira/browse/TEZ-3938 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch > > > Last progress time is initialized at TaskAttemptImpl object creation. > Heartbeats can be sent over the umbilical as soon as the container is > assigned an attempt. If the container assignment takes longer than the task > progress timeout, we can timeout the task on the first heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
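The TEZ-3938 description above can be illustrated with a small, self-contained Java sketch. It is not the attached patch and does not use Tez classes: ProgressTimeoutSketch, ManualClock, AttemptSketch, and their methods are hypothetical names invented for this illustration. The sketch assumes the idea discussed in the comment, namely that the last-progress timestamp is only armed once the attempt is submitted, and that a manually advanced clock replaces real sleeps in a test.

```java
// Hypothetical sketch only; none of these types are part of Apache Tez.
class ProgressTimeoutSketch {

  /** Minimal clock abstraction so a test can advance time without sleeping. */
  interface Clock {
    long getTime();
  }

  /** Manually advanced clock, standing in for a mock clock in tests. */
  static final class ManualClock implements Clock {
    private long now;

    ManualClock(long start) {
      this.now = start;
    }

    @Override
    public long getTime() {
      return now;
    }

    void incrementTime(long millis) {
      now += millis;
    }
  }

  /** Simplified stand-in for a task attempt (assumption, not TaskAttemptImpl). */
  static final class AttemptSketch {
    private final Clock clock;
    private final long progressTimeoutMillis;
    private boolean submitted;     // false until a container is assigned
    private long lastProgressTime; // only meaningful once submitted

    AttemptSketch(Clock clock, long progressTimeoutMillis) {
      this.clock = clock;
      this.progressTimeoutMillis = progressTimeoutMillis;
      // Deliberately no timestamp here: arming the timer at construction
      // is the behaviour the issue description identifies as the problem.
    }

    /** Arms the progress timer when the attempt is actually submitted. */
    void onSubmitted() {
      submitted = true;
      lastProgressTime = clock.getTime();
    }

    /** Records that the task reported progress. */
    void onProgress() {
      lastProgressTime = clock.getTime();
    }

    /** Returns true if this heartbeat should fail the attempt for lack of progress. */
    boolean onHeartbeat() {
      if (!submitted) {
        return false; // time spent waiting for a container does not count
      }
      return clock.getTime() - lastProgressTime > progressTimeoutMillis;
    }
  }

  public static void main(String[] args) {
    ManualClock clock = new ManualClock(0L);
    AttemptSketch attempt = new AttemptSketch(clock, 1000L);

    clock.incrementTime(5000L); // container assignment slower than the timeout
    attempt.onSubmitted();
    System.out.println(attempt.onHeartbeat()); // false: first heartbeat is safe

    clock.incrementTime(1500L); // no progress reported for longer than the timeout
    System.out.println(attempt.onHeartbeat()); // true: genuine lack of progress
  }
}
```

Advancing ManualClock by hand keeps such a test deterministic and fast, which is the motivation given in the review for preferring a mock clock over an actual sleep.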
[jira] [Updated] (TEZ-3938) Task attempts failing due to not making progress
[ https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kuhu Shukla updated TEZ-3938: - Attachment: TEZ-3938.002.patch > Task attempts failing due to not making progress > > > Key: TEZ-3938 > URL: https://issues.apache.org/jira/browse/TEZ-3938 > Project: Apache Tez > Issue Type: Bug >Reporter: Jonathan Eagles >Assignee: Kuhu Shukla >Priority: Major > Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch > > > Last progress time is initialized at TaskAttemptImpl object creation. > Heartbeats can be sent over the umbilical as soon as the container is > assigned an attempt. If the container assignment takes longer than the task > progress timeout, we can timeout the task on the first heartbeat. -- This message was sent by Atlassian JIRA (v7.6.3#76005)