[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502847#comment-16502847
 ] 

TezQA commented on TEZ-3944:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12926680/TEZ-3944.002.patch
  against master revision 09102e5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2830//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2830//console

This message is automatically generated.


> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Eric Wohlstadter
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3944.001.patch, TEZ-3944.002.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Failed: TEZ-3944 PreCommit Build #2830

2018-06-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3944
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2830/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 377.76 KB...]
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please 
read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-runtime-library
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
10 tests failed.
FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false,
 DISABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at 
org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
at 
org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle(TestUnorderedPartitionedKVWriter.java:642)


FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false,
 ENABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at 
org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.

[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502821#comment-16502821
 ] 

Jonathan Eagles commented on TEZ-3944:
--

Updated the tests to be in line with the level of mocking that 
DagAwareYarnTaskScheduler is using.

> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Eric Wohlstadter
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3944.001.patch, TEZ-3944.002.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.





[jira] [Updated] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3944:
-
Attachment: TEZ-3944.002.patch

> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Eric Wohlstadter
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3944.001.patch, TEZ-3944.002.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.





[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2018-06-05 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502715#comment-16502715
 ] 

TezQA commented on TEZ-3331:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12926663/TEZ-3331.6.patch
  against master revision 09102e5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 2 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.runtime.task.TestTaskExecution2
  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

  The following test timeouts occurred in :
 org.apache.tez.dag.app.rm.TestTaskScheduler

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2829//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2829//console

This message is automatically generated.


> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jitendra Nath Pandey
> Assignee: Hitesh Shah
> Priority: Major
> Attachments: TEZ-3331.6.patch, TEZ-3331.wip.2.patch, 
> TEZ-3331.wip.3.patch, TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, 
> TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





Failed: TEZ-3331 PreCommit Build #2829

2018-06-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3331
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2829/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 379.41 KB...]
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-runtime-internals
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
11 tests failed.
FAILED:  
org.apache.tez.runtime.task.TestTaskExecution2.testMultipleSuccessfulTasks

Error Message:
null

Stack Trace:
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at org.junit.Assert.assertTrue(Assert.java:41)
at org.junit.Assert.assertTrue(Assert.java:52)
at 
org.apache.tez.runtime.task.TestTaskExecution2.verifySysCounters(TestTaskExecution2.java:682)
at 
org.apache.tez.runtime.task.TestTaskExecution2.testMultipleSuccessfulTasks(TestTaskExecution2.java:180)


FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false,
 DISABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at 
org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
at 
org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle(TestUnorderedPartitionedKVWriter.java:642)


FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorder

[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502666#comment-16502666
 ] 

TezQA commented on TEZ-3951:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12926655/TEZ-3951.patch
  against master revision 09102e5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

  The following test timeouts occurred in :
 org.apache.tez.dag.app.rm.TestTaskScheduler

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2828//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2828//console

This message is automatically generated.


> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943





Failed: TEZ-3951 PreCommit Build #2828

2018-06-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3951
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2828/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 383.84 KB...]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-runtime-library
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
12 tests failed.
FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testTextMixedRecordsWithoutFinalMerge[test[true,
 DISABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at 
org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
at 
org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testTextMixedRecordsWithoutFinalMerge(TestUnorderedPartitionedKVWriter.java:344)


FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testTextMixedRecordsWithoutFinalMerge[test[true,
 MEMORY_OPTIMIZED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at 
org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(Dis

[jira] [Commented] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2018-06-05 Thread Prasanth Jayachandran (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502663#comment-16502663
 ] 

Prasanth Jayachandran commented on TEZ-3331:


Rebased the patch. I only removed the root pom.xml changes from the .5 patch, 
which changed the Hadoop version. Since master is already at Hadoop 3.0.2, the 
root pom.xml changes are no longer required.

[~EricWohlstadter] / [~gopalv] can someone please review and commit this patch?

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jitendra Nath Pandey
> Assignee: Hitesh Shah
> Priority: Major
> Attachments: TEZ-3331.6.patch, TEZ-3331.wip.2.patch, 
> TEZ-3331.wip.3.patch, TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, 
> TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Updated] (TEZ-3331) Add operation specific HDFS counters for Tez UI

2018-06-05 Thread Prasanth Jayachandran (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated TEZ-3331:
---
Attachment: TEZ-3331.6.patch

> Add operation specific HDFS counters for Tez UI
> ---
>
> Key: TEZ-3331
> URL: https://issues.apache.org/jira/browse/TEZ-3331
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jitendra Nath Pandey
> Assignee: Hitesh Shah
> Priority: Major
> Attachments: TEZ-3331.6.patch, TEZ-3331.wip.2.patch, 
> TEZ-3331.wip.3.patch, TEZ-3331.wip.4.patch, TEZ-3331.wip.5.patch, 
> TEZ-3331.wip.patch
>
>
> Hadoop has added several operation specific counters in the FileSystem 
> statistics (HADOOP-13065). These counters are useful to track file system 
> operations more granularly. It would be great to track these counters for Tez 
> and expose them via UI as well.





[jira] [Commented] (TEZ-3952) Allow Tez task speculation to allow for greater customization

2018-06-05 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502640#comment-16502640
 ] 

TezQA commented on TEZ-3952:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12926650/TEZ-3952.001.patch
  against master revision 09102e5.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

  The following test timeouts occurred in :
 org.apache.tez.dag.app.rm.TestTaskScheduler

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2827//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2827//console

This message is automatically generated.


> Allow Tez task speculation to allow for greater customization
> -
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Nishant Dash
> Assignee: Nishant Dash
> Priority: Major
> Attachments: TEZ-3952.001.patch
>
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there are no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks
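
As a rough illustration of the request above, the sketch below reads two 
speculation limits from configuration instead of hardcoding them. It is not 
taken from the attached patch or from Tez: the class SpeculationSettings, the 
"example.speculation.*" key names, the default values, and the way the two 
limits are combined are all assumptions made for this example. It uses plain 
java.util.Properties rather than a Hadoop or Tez configuration class so that 
it stays self-contained.

```java
import java.util.Properties;

// Hypothetical holder for speculation limits; not part of Tez or of TEZ-3952.001.patch.
final class SpeculationSettings {
    // Hypothetical key names and placeholder defaults, chosen only for this sketch.
    static final String MIN_ALLOWED_TASKS_KEY = "example.speculation.minimum-allowed-tasks";
    static final String CAP_RUNNING_TASKS_KEY = "example.speculation.cap-running-tasks";
    static final int DEFAULT_MIN_ALLOWED_TASKS = 10;
    static final double DEFAULT_CAP_RUNNING_TASKS = 0.1;

    final int minAllowedTasks;
    final double capRunningTasks;

    private SpeculationSettings(int minAllowedTasks, double capRunningTasks) {
        this.minAllowedTasks = minAllowedTasks;
        this.capRunningTasks = capRunningTasks;
    }

    // Read each limit from the properties, falling back to its default when the key is absent.
    static SpeculationSettings from(Properties props) {
        int min = Integer.parseInt(props.getProperty(
                MIN_ALLOWED_TASKS_KEY, String.valueOf(DEFAULT_MIN_ALLOWED_TASKS)));
        double cap = Double.parseDouble(props.getProperty(
                CAP_RUNNING_TASKS_KEY, String.valueOf(DEFAULT_CAP_RUNNING_TASKS)));
        return new SpeculationSettings(min, cap);
    }

    // Assumed combination rule: allow the larger of the absolute minimum
    // and the configured fraction of currently running tasks.
    int maxSpeculations(int runningTasks) {
        return Math.max(minAllowedTasks, (int) (capRunningTasks * runningTasks));
    }

    public static void main(String[] args) {
        Properties overrides = new Properties();
        overrides.setProperty(MIN_ALLOWED_TASKS_KEY, "3");
        overrides.setProperty(CAP_RUNNING_TASKS_KEY, "0.5");

        SpeculationSettings defaults = SpeculationSettings.from(new Properties());
        SpeculationSettings tuned = SpeculationSettings.from(overrides);

        System.out.println("defaults, 500 running: " + defaults.maxSpeculations(500));
        System.out.println("tuned, 100 running: " + tuned.maxSpeculations(100));
    }
}
```

Keeping the former constants as defaults means a job that sets none of the 
keys behaves as before, while a job that needs different limits can override 
them individually.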





Failed: TEZ-3952 PreCommit Build #2827

2018-06-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3952
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2827/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 379.78 KB...]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-runtime-library
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
10 tests failed.
FAILED:  
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false,
 DISABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at 
org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at 
org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at 
org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
at 
org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillFileForWrite(TezTaskOutputFiles.java:211)
at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.getSpillPathDetails(UnorderedPartitionedKVWriter.java:963)
at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.getSpillPathDetails(UnorderedPartitionedKVWriter.java:931)
at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.writeLargeRecord(UnorderedPartitionedKVWriter.java:1077)
at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:412)
at 
org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:368)
at 
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedP

[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502606#comment-16502606
 ] 

Sergey Shelukhin commented on TEZ-3951:
---

Prewarm itself is a pretty obscure feature, and the time to wait before 
shutting down the prewarm DAG seems too esoteric to be a config setting. Is 
there any reason people would want to change it?

> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943





[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502594#comment-16502594
 ] 

Jonathan Eagles commented on TEZ-3951:
--

[~sershe], is there a reason not to make the wait time configurable?

> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943





[jira] [Commented] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502587#comment-16502587
 ] 

Sergey Shelukhin commented on TEZ-3951:
---

[~ewohlstadter] can you take a look?



> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943





[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3951:
--
Attachment: TEZ-3951.patch

> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3951:
--
Attachment: (was: TEZ-3951.patch)

> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
>
> Follow-up from TEZ-3943



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3951:
--
Attachment: TEZ-3951.patch

> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm; tries to shut down the wrong DAG

2018-06-05 Thread Sergey Shelukhin (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated TEZ-3951:
--
Summary: TezClient wait too long for the DAGClient for prewarm; tries to 
shut down the wrong DAG  (was: TezClient wait too long for the DAGClient for 
prewarm)

> TezClient wait too long for the DAGClient for prewarm; tries to shut down the 
> wrong DAG
> ---
>
> Key: TEZ-3951
> URL: https://issues.apache.org/jira/browse/TEZ-3951
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Sergey Shelukhin
>Priority: Major
> Attachments: TEZ-3951.patch
>
>
> Follow-up from TEZ-3943



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization

2018-06-05 Thread Nishant Dash (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Dash updated TEZ-3952:
--
Attachment: (was: TEZ-3952.001.patch)

> Allow Tez task speculation to allow for greater customization
> -
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Nishant Dash
>Assignee: Nishant Dash
>Priority: Major
> Attachments: TEZ-3952.001.patch
>
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there are no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks
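
As a rough illustration of what configurable speculation settings could look like, here is a minimal Java sketch. It is not taken from the attached patch: the class SpeculationSettings, the example.speculation.* key names and the default values are hypothetical placeholders, and java.util.Properties merely stands in for the real configuration object.

```java
import java.util.Properties;

/** Hypothetical holder for speculation knobs; not part of Tez. */
final class SpeculationSettings {
    // Placeholder key names, invented for this sketch only.
    static final String RETRY_AFTER_NO_SPECULATE =
        "example.speculation.retry-after-no-speculate";
    static final String MINIMUM_ALLOWED_TASKS =
        "example.speculation.minimum-allowed-tasks";

    final long retryAfterNoSpeculateMs;
    final int minimumAllowedTasks;

    SpeculationSettings(Properties conf) {
        // Fall back to a placeholder default when a key is absent, so that
        // behaviour only changes when a user sets the key explicitly.
        retryAfterNoSpeculateMs =
            Long.parseLong(conf.getProperty(RETRY_AFTER_NO_SPECULATE, "1000"));
        minimumAllowedTasks =
            Integer.parseInt(conf.getProperty(MINIMUM_ALLOWED_TASKS, "10"));
    }
}
```

The point of the sketch is only that each hardcoded constant becomes a key with a default, so unset keys leave the current behaviour untouched.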



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization

2018-06-05 Thread Nishant Dash (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Dash updated TEZ-3952:
--
Attachment: (was: TEZ-3952.001.patch)

> Allow Tez task speculation to allow for greater customization
> -
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Nishant Dash
>Assignee: Nishant Dash
>Priority: Major
> Attachments: TEZ-3952.001.patch
>
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there are no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization

2018-06-05 Thread Nishant Dash (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Dash updated TEZ-3952:
--
Attachment: TEZ-3952.001.patch

> Allow Tez task speculation to allow for greater customization
> -
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Nishant Dash
>Assignee: Nishant Dash
>Priority: Major
> Attachments: TEZ-3952.001.patch, TEZ-3952.001.patch
>
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there are no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3952) Allow Tez task speculation to allow for greater customization

2018-06-05 Thread Nishant Dash (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nishant Dash updated TEZ-3952:
--
Description: 
Many of the settings for Tez task speculation are hardcoded and should instead 
be configurable. For example, there are no equivalent config settings for the 
following MapReduce settings:
- mapreduce.job.speculative.speculative-cap-running-tasks
- mapreduce.job.speculative.retry-after-no-speculate
- mapreduce.job.speculative.retry-after-speculate
- mapreduce.job.speculative.minimum-allowed-tasks
- mapreduce.job.speculative.speculative-cap-total-tasks

> Allow Tez task speculation to allow for greater customization
> -
>
> Key: TEZ-3952
> URL: https://issues.apache.org/jira/browse/TEZ-3952
> Project: Apache Tez
>  Issue Type: Improvement
>Reporter: Nishant Dash
>Assignee: Nishant Dash
>Priority: Major
>
> Many of the settings for Tez task speculation are hardcoded and should 
> instead be configurable. For example, there are no equivalent config settings 
> for the following MapReduce settings:
> - mapreduce.job.speculative.speculative-cap-running-tasks
> - mapreduce.job.speculative.retry-after-no-speculate
> - mapreduce.job.speculative.retry-after-speculate
> - mapreduce.job.speculative.minimum-allowed-tasks
> - mapreduce.job.speculative.speculative-cap-total-tasks



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (TEZ-3951) TezClient wait too long for the DAGClient for prewarm

2018-06-05 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created TEZ-3951:
-

 Summary: TezClient wait too long for the DAGClient for prewarm
 Key: TEZ-3951
 URL: https://issues.apache.org/jira/browse/TEZ-3951
 Project: Apache Tez
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin


Follow-up from TEZ-3943



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (TEZ-3952) Allow Tez task speculation to allow for greater customization

2018-06-05 Thread Nishant Dash (JIRA)
Nishant Dash created TEZ-3952:
-

 Summary: Allow Tez task speculation to allow for greater 
customization
 Key: TEZ-3952
 URL: https://issues.apache.org/jira/browse/TEZ-3952
 Project: Apache Tez
  Issue Type: Improvement
Reporter: Nishant Dash
Assignee: Nishant Dash






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502525#comment-16502525
 ] 

Jonathan Eagles commented on TEZ-3938:
--

+1. Committing to master and branch-0.9

> Task attempts failing due to not making progress
> 
>
> Key: TEZ-3938
> URL: https://issues.apache.org/jira/browse/TEZ-3938
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch
>
>
> Last progress time is initialized at TaskAttemptImpl object creation. 
> Heartbeats can be sent over the umbilical as soon as the container is 
> assigned an attempt. If the container assignment takes longer than the task 
> progress timeout, we can timeout the task on the first heartbeat.
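
The timing problem described above can be sketched in a few lines of Java. This is a simplified model and not the TaskAttemptImpl code or the attached patch: the class ProgressTracker and its method names are hypothetical, and resetting the clock on container assignment is just one possible remedy.

```java
/** Hypothetical stand-in for a task attempt's progress bookkeeping; not Tez code. */
final class ProgressTracker {
    private final long timeoutMs;
    private long lastProgressMs;

    ProgressTracker(long createdAtMs, long timeoutMs) {
        this.timeoutMs = timeoutMs;
        // Mirrors the behaviour described in the issue: the clock starts at creation.
        this.lastProgressMs = createdAtMs;
    }

    /** One possible remedy: restart the clock when the attempt gets a container. */
    void onContainerAssigned(long nowMs) {
        lastProgressMs = nowMs;
    }

    /** Evaluated on each heartbeat; true means the attempt would be timed out. */
    boolean isTimedOut(long nowMs) {
        return nowMs - lastProgressMs > timeoutMs;
    }
}
```

With a 1000 ms timeout, an attempt created at time 0 and first heard from at 5000 ms is timed out on its first heartbeat unless onContainerAssigned was called when the container was assigned.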



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress

2018-06-05 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502459#comment-16502459
 ] 

Kuhu Shukla commented on TEZ-3938:
--

Verified test failure is unrelated. [~jeagles], request for review! Thanks a lot!

> Task attempts failing due to not making progress
> 
>
> Key: TEZ-3938
> URL: https://issues.apache.org/jira/browse/TEZ-3938
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch
>
>
> Last progress time is initialized at TaskAttemptImpl object creation. 
> Heartbeats can be sent over the umbilical as soon as the container is 
> assigned an attempt. If the container assignment takes longer than the task 
> progress timeout, we can timeout the task on the first heartbeat.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3949) TestATSHistoryV15 is failing with hadoop3+

2018-06-05 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502434#comment-16502434
 ] 

Kuhu Shukla commented on TEZ-3949:
--

+1. Thank you [~jeagles] for tracking this down. Committing this to master 
shortly.

> TestATSHistoryV15 is failing with hadoop3+
> --
>
> Key: TEZ-3949
> URL: https://issues.apache.org/jira/browse/TEZ-3949
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3949.001.patch
>
>
> This is another case of the hadoop-mapreduce-client-shuffle dependency shift 
> in hadoop3



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (TEZ-3912) Fetchers should be more robust to corrupted inputs

2018-06-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla reassigned TEZ-3912:


Assignee: Kuhu Shukla

> Fetchers should be more robust to corrupted inputs
> --
>
> Key: TEZ-3912
> URL: https://issues.apache.org/jira/browse/TEZ-3912
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jason Lowe
>Assignee: Kuhu Shukla
>Priority: Major
>
> I recently saw a case where a bad node in the cluster produced corrupted 
> shuffle data that caused the codec to throw IllegalArgumentException when 
> trying to fetch.  Fetchers currently only handle IOException and 
> InternalError, and any other type of exception will cause the entire task to 
> be torn down.  We should consider catching Exception like MapReduce does to 
> be more robust in light of other types of errors coming from the codec and 
> allow retries to occur.
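
The suggestion above, catching the broader Exception so that a bad fetch is retried instead of tearing down the task, might look roughly like the following Java sketch. It is an assumption-laden illustration and not the Tez Fetcher: RetryingFetch and fetchWithRetries are hypothetical names, and a plain Callable stands in for the real fetch logic.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical retry wrapper; illustrates the idea only, not the Tez Fetcher. */
final class RetryingFetch {
    static <T> T fetchWithRetries(Callable<T> fetch, int maxAttempts) throws IOException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Throwable last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (Exception | InternalError e) {
                // Exception covers IOException as well as unchecked exceptions such
                // as IllegalArgumentException thrown by a codec on corrupted data.
                last = e;
            }
        }
        // All attempts failed: surface one checked exception with the last cause.
        throw new IOException("fetch failed after " + maxAttempts + " attempts", last);
    }
}
```

Real code would probably want to handle InterruptedException separately rather than retrying it; the sketch leaves that out for brevity.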



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502290#comment-16502290
 ] 

Jonathan Eagles commented on TEZ-3944:
--

Fixed my comment above and corrected to HADOOP-15450

> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Eric Wohlstadter
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3944.001.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500865#comment-16500865
 ] 

Jonathan Eagles edited comment on TEZ-3944 at 6/5/18 6:18 PM:
--

Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and 
is going to be fixed in 3.0.3 release HADOOP-15450
{noformat}
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
{noformat}


was (Author: jeagles):
Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and 
is going to be fixed in 3.0.3 release 15450
{noformat}
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
{noformat}

> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Eric Wohlstadter
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3944.001.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16500865#comment-16500865
 ] 

Jonathan Eagles edited comment on TEZ-3944 at 6/5/18 6:18 PM:
--

Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and 
is going to be fixed in 3.0.3 release 15450
{noformat}
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
{noformat}


was (Author: jeagles):
Test failure is due to DiskChecker performance regression in hadoop 3.0.2 and 
is going to be fixed in 3.0.3 release HADOOP-1545
{noformat}
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
{noformat}

> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Eric Wohlstadter
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3944.001.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3944) TestTaskScheduler times-out on Hadoop3

2018-06-05 Thread Eric Wohlstadter (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502223#comment-16502223
 ] 

Eric Wohlstadter commented on TEZ-3944:
---

[~jeagles]

I think HADOOP-1545 isn't the ticket you meant to tag for the DiskChecker 
performance regression.


> TestTaskScheduler times-out on Hadoop3
> --
>
> Key: TEZ-3944
> URL: https://issues.apache.org/jira/browse/TEZ-3944
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Eric Wohlstadter
>Assignee: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3944.001.patch, 
> org.apache.tez.dag.app.rm.TestTaskScheduler-output.txt
>
>
> TestTaskScheduler times-out intermittently.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress

2018-06-05 Thread TezQA (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502186#comment-16502186
 ] 

TezQA commented on TEZ-3938:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12926586/TEZ-3938.002.patch
  against master revision b0eb9dc.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

  The following test timeouts occurred in :
 org.apache.tez.dag.app.rm.TestTaskScheduler
org.apache.tez.dag.history.ats.acls.TestATSHistoryV15

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2826//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2826//console

This message is automatically generated.


> Task attempts failing due to not making progress
> 
>
> Key: TEZ-3938
> URL: https://issues.apache.org/jira/browse/TEZ-3938
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch
>
>
> Last progress time is initialized at TaskAttemptImpl object creation. 
> Heartbeats can be sent over the umbilical as soon as the container is 
> assigned an attempt. If the container assignment takes longer than the task 
> progress timeout, we can timeout the task on the first heartbeat.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Failed: TEZ-3938 PreCommit Build #2826

2018-06-05 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3938
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2826/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 377.08 KB...]
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-runtime-library
[INFO] Build failures were ignored.




{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12926586/TEZ-3938.002.patch
  against master revision b0eb9dc.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 1 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   
org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter

  The following test timeouts occurred in :
 org.apache.tez.dag.app.rm.TestTaskScheduler
org.apache.tez.dag.history.ats.acls.TestATSHistoryV15

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2826//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2826//console

This message is automatically generated.


==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
10 tests failed.
FAILED:  org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, DISABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.createPath(LocalDirAllocator.java:351)
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:426)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:152)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:133)
at org.apache.tez.runtime.library.common.task.local.output.TezTaskOutputFiles.getSpillIndexFileForWrite(TezTaskOutputFiles.java:234)
at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.textTest(TestUnorderedPartitionedKVWriter.java:473)
at org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle(TestUnorderedPartitionedKVWriter.java:642)


FAILED:  org.apache.tez.runtime.library.common.writers.TestUnorderedPartitionedKVWriter.testLargeKvPairs_WithPipelinedShuffle[test[false, ENABLED]]

Error Message:
test timed out after 1 milliseconds

Stack Trace:
java.lang.Exception: test timed out after 1 milliseconds
at java.io.FileDescriptor.sync(Native Method)
at org.apache.hadoop.util.DiskChecker.diskIoCheckWithoutNativeIo(DiskChecker.java:249)
at org.apache.hadoop.util.DiskChecker.doDiskIo(DiskChecker.java:220)
at org.apache.hadoop.util.DiskChecker.checkDir(DiskChecker.java:82)
at org.apache.hadoo

[jira] [Commented] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502133#comment-16502133
 ] 

Jonathan Eagles commented on TEZ-3950:
--

The race is present in LocalTaskSchedulerService. However, the race in 
DagAwareYarnTaskScheduler and YarnTaskSchedulerService is easier to lose since 
there is no message queue in those services and the containerBeingReleased is 
called synchronously.

> Preempted task attempts intermittently marked as FAILED instead of KILLED
> -
>
> Key: TEZ-3950
> URL: https://issues.apache.org/jira/browse/TEZ-3950
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3950.fail.patch
>
>
> TestMockDAGAppMaster.testInternalPreemption intermittently fails with 
> expected: but was:
> Crux of the matter is TaskSchedulerManager sends two events
> - TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
> - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM
> In order to kill a task attempt correctly the second message loop must 
> complete first. The first path is longer, so the second message loop almost 
> always completes first. When the first message loop completes first, then the 
> task attempt is marked as FAILED and not KILLED.
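
To make the ordering dependence concrete, here is a deliberately simplified Java model. It is not the Tez task attempt state machine, which has more states and transitions: the class AttemptModel and the rule that the first event to arrive fixes the terminal state are assumptions made for illustration, with only the two event names taken from the description above.

```java
/** Simplified model of the ordering problem; not the Tez state machine. */
final class AttemptModel {
    enum Event { TA_CONTAINER_TERMINATING, TA_CONTAINER_TERMINATED_BY_SYSTEM }
    enum State { RUNNING, KILLED, FAILED }

    private State state = State.RUNNING;

    /** The first event to arrive decides the terminal state; later ones are ignored. */
    void handle(Event event) {
        if (state != State.RUNNING) {
            return;
        }
        state = (event == Event.TA_CONTAINER_TERMINATED_BY_SYSTEM)
            ? State.KILLED
            : State.FAILED;
    }

    State state() {
        return state;
    }

    /** Delivers two events in the given order and reports the resulting state. */
    static State outcome(Event first, Event second) {
        AttemptModel attempt = new AttemptModel();
        attempt.handle(first);
        attempt.handle(second);
        return attempt.state();
    }
}
```

In this model, delivering TA_CONTAINER_TERMINATED_BY_SYSTEM first yields KILLED, while delivering TA_CONTAINER_TERMINATING first yields FAILED, which matches the intermittent outcome described in the issue.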



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED

2018-06-05 Thread Jonathan Eagles (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16502124#comment-16502124
 ] 

Jonathan Eagles commented on TEZ-3950:
--

Attaching a patch that helps the first message loop complete first to induce 
the test failure for TestMockDAGAppMaster.testInternalPreemption

> Preempted task attempts intermittently marked as FAILED instead of KILLED
> -
>
> Key: TEZ-3950
> URL: https://issues.apache.org/jira/browse/TEZ-3950
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3950.fail.patch
>
>
> TestMockDAGAppMaster.testInternalPreemption intermittently fails with 
> expected: but was:
> Crux of the matter is TaskSchedulerManager sends two events
> - TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
> - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM
> In order to kill a task attempt correctly the second message loop must 
> complete first. The first path is longer, so the second message loop almost 
> always completes first. When the first message loop completes first, then the 
> task attempt is marked as FAILED and not KILLED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED

2018-06-05 Thread Jonathan Eagles (JIRA)
Jonathan Eagles created TEZ-3950:


 Summary: Preempted task attempts intermittently marked as FAILED 
instead of KILLED
 Key: TEZ-3950
 URL: https://issues.apache.org/jira/browse/TEZ-3950
 Project: Apache Tez
  Issue Type: Bug
Affects Versions: 0.9.2, 0.10.0
Reporter: Jonathan Eagles
 Attachments: TEZ-3950.fail.patch

TestMockDAGAppMaster.testInternalPreemption intermittently fails with 
expected: but was:


Crux of the matter is TaskSchedulerManager sends two events

- TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
- AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM

In order to kill a task attempt correctly the second message loop must complete 
first. The first path is longer, so the second message loop almost always 
completes first. When the first message loop completes first, then the task 
attempt is marked as FAILED and not KILLED.




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TEZ-3950) Preempted task attempts intermittently marked as FAILED instead of KILLED

2018-06-05 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3950:
-
Attachment: TEZ-3950.fail.patch

> Preempted task attempts intermittently marked as FAILED instead of KILLED
> -
>
> Key: TEZ-3950
> URL: https://issues.apache.org/jira/browse/TEZ-3950
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.9.2, 0.10.0
>Reporter: Jonathan Eagles
>Priority: Major
> Attachments: TEZ-3950.fail.patch
>
>
> TestMockDAGAppMaster.testInternalPreemption intermittently fails with 
> expected: but was:
> Crux of the matter is TaskSchedulerManager sends two events
> - TaskScheduler#deallocatedContainer->TaskSchedulerManager#containerBeingReleased->Sends AMContainerStopRequest -> TA_CONTAINER_TERMINATING
> - AMContainerEventCompleted -> TA_CONTAINER_TERMINATED_BY_SYSTEM
> In order to kill a task attempt correctly the second message loop must 
> complete first. The first path is longer, so the second message loop almost 
> always completes first. When the first message loop completes first, then the 
> task attempt is marked as FAILED and not KILLED.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails

2018-06-05 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles resolved TEZ-3005.
--
Resolution: Cannot Reproduce

> TestMockDAGAppMaster.testInternalPreemption fails
> -
>
> Key: TEZ-3005
> URL: https://issues.apache.org/jira/browse/TEZ-3005
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jeff Zhang
>Priority: Major
>
> {code}
> testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster)  Time 
> elapsed: 0.458 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211)
> {code}
> https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console
> \cc [~bikassaha]





[jira] [Updated] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails

2018-06-05 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3005:
-
Attachment: TEZ-3005.fail.patch

> TestMockDAGAppMaster.testInternalPreemption fails
> -
>
> Key: TEZ-3005
> URL: https://issues.apache.org/jira/browse/TEZ-3005
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jeff Zhang
>Priority: Major
>
> {code}
> testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster)  Time 
> elapsed: 0.458 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211)
> {code}
> https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console
> \cc [~bikassaha]





[jira] [Updated] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails

2018-06-05 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3005:
-
Attachment: (was: TEZ-3005.fail.patch)

> TestMockDAGAppMaster.testInternalPreemption fails
> -
>
> Key: TEZ-3005
> URL: https://issues.apache.org/jira/browse/TEZ-3005
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jeff Zhang
>Priority: Major
>
> {code}
> testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster)  Time 
> elapsed: 0.458 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211)
> {code}
> https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console
> \cc [~bikassaha]





[jira] [Reopened] (TEZ-3005) TestMockDAGAppMaster.testInternalPreemption fails

2018-06-05 Thread Jonathan Eagles (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3005?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles reopened TEZ-3005:
--

> TestMockDAGAppMaster.testInternalPreemption fails
> -
>
> Key: TEZ-3005
> URL: https://issues.apache.org/jira/browse/TEZ-3005
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Jeff Zhang
>Priority: Major
>
> {code}
> testInternalPreemption(org.apache.tez.dag.app.TestMockDAGAppMaster)  Time 
> elapsed: 0.458 sec  <<< FAILURE!
> java.lang.AssertionError: expected: but was:
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:743)
>   at org.junit.Assert.assertEquals(Assert.java:118)
>   at org.junit.Assert.assertEquals(Assert.java:144)
>   at 
> org.apache.tez.dag.app.TestMockDAGAppMaster.testInternalPreemption(TestMockDAGAppMaster.java:211)
> {code}
> https://builds.apache.org/job/Tez-Build-Hadoop-2.4/226/console
> \cc [~bikassaha]





[jira] [Commented] (TEZ-3938) Task attempts failing due to not making progress

2018-06-05 Thread Kuhu Shukla (JIRA)


[ 
https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16501977#comment-16501977
 ] 

Kuhu Shukla commented on TEZ-3938:
--

bq. Consider a MockClock instead of a SystemClock and then incrementTime instead 
of doing an actual sleep
Done.
bq. Remove unnecessary if failed event check. With this change my understanding 
is the task attempt will always enter the submitted state.
Made changes to handle the fail progress event (as it is unexpected) and just 
check the final state.
bq. The status update check now checks to see if it is initialized before 
failing due to lack of progress, but there is no test to prove status update 
before submitted transition works.
Based on the state machine, task init followed by a status update is not 
possible. I have not added a test to check for it for this reason.

Thank you for the review comments [~jeagles]. I would appreciate further 
comments after the pre-commit run.

The test failures from the earlier precommit are not related to this fix.

> Task attempts failing due to not making progress
> 
>
> Key: TEZ-3938
> URL: https://issues.apache.org/jira/browse/TEZ-3938
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch
>
>
> Last progress time is initialized at TaskAttemptImpl object creation. 
> Heartbeats can be sent over the umbilical as soon as the container is 
> assigned an attempt. If the container assignment takes longer than the task 
> progress timeout, we can time out the task on the first heartbeat.
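
The timing problem in the description can be reproduced with a small sketch 
driven by a manually advanced clock, in the spirit of the MockClock 
suggestion from the review. Everything here is hypothetical: ManualClock and 
Attempt are stand-ins rather than Tez classes, and resetting the timestamp 
on container assignment is only one possible mitigation, not necessarily 
what the attached patch does.

```java
public class ProgressTimeoutSketch {
    // Hand-rolled clock that only moves when the test advances it.
    public static final class ManualClock {
        private long now;

        public long getTime() {
            return now;
        }

        public void incrementTime(long millis) {
            now += millis;
        }
    }

    // Hypothetical stand-in for a task attempt with a progress timeout.
    public static final class Attempt {
        private final ManualClock clock;
        private final long timeoutMillis;
        private final boolean resetOnAssignment;
        private long lastProgressTime;

        public Attempt(ManualClock clock, long timeoutMillis, boolean resetOnAssignment) {
            this.clock = clock;
            this.timeoutMillis = timeoutMillis;
            this.resetOnAssignment = resetOnAssignment;
            // As in the description: the timestamp is taken at object creation.
            this.lastProgressTime = clock.getTime();
        }

        // Called when a container is finally assigned to this attempt.
        public void containerAssigned() {
            if (resetOnAssignment) {
                lastProgressTime = clock.getTime();
            }
        }

        // True if a heartbeat arriving now would trip the progress timeout.
        public boolean heartbeatTimesOut() {
            return clock.getTime() - lastProgressTime > timeoutMillis;
        }
    }

    // Simulates a container assignment that takes longer than the timeout.
    public static boolean firstHeartbeatTimesOut(boolean resetOnAssignment) {
        ManualClock clock = new ManualClock();
        Attempt attempt = new Attempt(clock, 1000L, resetOnAssignment);
        clock.incrementTime(5000L); // waiting for a container
        attempt.containerAssigned();
        clock.incrementTime(10L);   // first heartbeat shortly after assignment
        return attempt.heartbeatTimesOut();
    }

    public static void main(String[] args) {
        System.out.println("no reset: " + firstHeartbeatTimesOut(false));
        System.out.println("with reset: " + firstHeartbeatTimesOut(true));
    }
}
```

Advancing the clock by hand keeps the scenario deterministic and avoids a 
real sleep in the test.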





[jira] [Updated] (TEZ-3938) Task attempts failing due to not making progress

2018-06-05 Thread Kuhu Shukla (JIRA)


 [ 
https://issues.apache.org/jira/browse/TEZ-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuhu Shukla updated TEZ-3938:
-
Attachment: TEZ-3938.002.patch

> Task attempts failing due to not making progress
> 
>
> Key: TEZ-3938
> URL: https://issues.apache.org/jira/browse/TEZ-3938
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Kuhu Shukla
>Priority: Major
> Attachments: TEZ-3938.001.patch, TEZ-3938.002.patch
>
>
> Last progress time is initialized at TaskAttemptImpl object creation. 
> Heartbeats can be sent over the umbilical as soon as the container is 
> assigned an attempt. If the container assignment takes longer than the task 
> progress timeout, we can time out the task on the first heartbeat.


