[jira] [Commented] (TEZ-3914) Recovering a large DAG hang job

2018-04-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436537#comment-16436537
 ] 

TezQA commented on TEZ-3914:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12918835/TEZ-3914.002.patch
  against master revision 871ea80.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests:
  org.apache.tez.test.TestRecovery
  org.apache.tez.test.TestDAGRecovery
  org.apache.tez.test.TestAMRecovery
  org.apache.tez.runtime.library.common.sort.impl.dflt.TestDefaultSorter
  org.apache.tez.dag.history.events.TestHistoryEventsProtoConversion
  org.apache.tez.dag.app.TestRecoveryParser
  org.apache.tez.dag.app.dag.impl.TestDAGRecovery

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/2756//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2756//console

This message is automatically generated.


> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang, since some task completions may be missed and the shuffle will then
> wait indefinitely.
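
To illustrate the distinction the description draws, here is a minimal sketch of a reader that separates a clean end of log from a damaged record instead of treating both as EOF. It is not the Tez RecoveryParser and not the TEZ-3914 patch: the class name RecoveryLogSketch, the length-prefixed record layout, and the MAX_RECORD_BYTES limit are all assumptions made for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; this is not the Tez RecoveryParser or the TEZ-3914 patch. */
class RecoveryLogSketch {
  /** Assumed upper bound on one record, so a corrupt length cannot trigger a huge allocation. */
  static final int MAX_RECORD_BYTES = 1 << 20;

  /** Records read so far, plus whether the log ended cleanly or in a damaged record. */
  static final class Result {
    final List<byte[]> records = new ArrayList<>();
    boolean corrupt;
  }

  /** Encodes each payload as a 4-byte big-endian length followed by the payload bytes. */
  static byte[] encode(byte[]... payloads) {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buf)) {
      for (byte[] p : payloads) {
        out.writeInt(p.length);
        out.write(p);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buf.toByteArray();
  }

  /** Reads records until the stream ends, flagging a truncated or malformed record. */
  static Result readAll(InputStream raw) {
    DataInputStream in = new DataInputStream(raw);
    Result result = new Result();
    try {
      while (true) {
        int first = in.read();
        if (first < 0) {
          return result; // clean EOF: the stream ended exactly on a record boundary
        }
        // Rebuild the 4-byte length; running out of bytes here throws EOFException.
        int length = (first << 24) | (in.readUnsignedByte() << 16)
            | (in.readUnsignedByte() << 8) | in.readUnsignedByte();
        if (length < 0 || length > MAX_RECORD_BYTES) {
          result.corrupt = true; // implausible length: report it, do not call it EOF
          return result;
        }
        byte[] payload = new byte[length];
        in.readFully(payload); // throws EOFException if the record is cut short
        result.records.add(payload);
      }
    } catch (EOFException e) {
      result.corrupt = true; // the stream ended in the middle of a record
      return result;
    } catch (IOException e) {
      throw new UncheckedIOException(e); // genuine I/O failure, surfaced to the caller
    }
  }
}
```

A caller could pass a java.io.ByteArrayInputStream over the bytes returned by encode(...) and check Result.corrupt before trusting the recovered records; whether to fail the recovery attempt or fall back in that case is a policy decision this sketch leaves open.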



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Failed: TEZ-3914 PreCommit Build #2756

2018-04-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3914
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2756/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 348.19 KB...]
[ERROR]   mvn  -rf :tez-runtime-library
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
28 tests failed.
FAILED:  org.apache.tez.dag.app.TestRecoveryParser.testRecoveryData

Error Message:
GC overhead limit exceeded

Stack Trace:
java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:149)
    at java.lang.StringCoding.decode(StringCoding.java:193)
    at java.lang.String.<init>(String.java:426)
    at com.google.protobuf.LiteralByteString.toString(LiteralByteString.java:148)
    at com.google.protobuf.ByteString.toStringUtf8(ByteString.java:572)
    at com.google.protobuf.LazyStringArrayList.get(LazyStringArrayList.java:92)
    at com.google.protobuf.LazyStringArrayList.get(LazyStringArrayList.java:64)
    at java.util.AbstractList$Itr.next(AbstractList.java:358)
    at com.google.protobuf.UnmodifiableLazyStringList$2.next(UnmodifiableLazyStringList.java:138)
    at com.google.protobuf.UnmodifiableLazyStringList$2.next(UnmodifiableLazyStringList.java:128)
    at java.util.AbstractCollection.addAll(AbstractCollection.java:343)
    at java.util.HashSet.<init>(HashSet.java:120)
    at org.apache.tez.dag.api.DagTypeConverters.convertVertexLocationHintFromProto(DagTypeConverters.java:700)
    at org.apache.tez.dag.history.events.VertexConfigurationDoneEvent.fromProto(VertexConfigurationDoneEvent.java:126)
    at org.apache.tez.dag.history.events.VertexConfigurationDoneEvent.fromProtoStream(VertexConfigurationDoneEvent.java:168)
    at org.apache.tez.dag.app.RecoveryParser.getNextEvent(RecoveryParser.java:342)
    at org.apache.tez.dag.app.RecoveryParser.parseRecoveryData(RecoveryParser.java:756)
    at org.apache.tez.dag.app.TestRecoveryParser.testRecoveryData(TestRecoveryParser.java:727)


FAILED:  org.apache.tez.dag.app.TestRecoveryParser.testLastCorruptedRecoveryRecord

Error Message:
null

Stack Trace:
java.lang.AssertionError: null
at org.junit.Assert.fail(Assert.java:86)
at 

[jira] [Updated] (TEZ-3914) Recovering a large DAG hang job

2018-04-12 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3914:
-
Attachment: TEZ-3914.002.patch

> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch, TEZ-3914.002.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang, since some task completions may be missed and the shuffle will then
> wait indefinitely.





[jira] [Commented] (TEZ-3914) Recovering a large DAG hang job

2018-04-12 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16436385#comment-16436385
 ] 

TezQA commented on TEZ-3914:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12918828/TEZ-3914.001.patch
  against master revision 871ea80.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/2755//console

This message is automatically generated.


> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang, since some task completions may be missed and the shuffle will then
> wait indefinitely.





Failed: TEZ-3914 PreCommit Build #2755

2018-04-12 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3914
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/2755/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 6.90 KB...]
patching file tez-dag/src/main/java/org/apache/tez/dag/history/events/VertexGroupCommitStartedEvent.java
patching file tez-dag/src/main/java/org/apache/tez/dag/history/events/VertexInitializedEvent.java
patching file tez-dag/src/main/java/org/apache/tez/dag/history/events/VertexStartedEvent.java
patching file tez-dag/src/main/java/org/apache/tez/dag/history/recovery/RecoveryService.java
patching file tez-dag/src/test/java/org/apache/tez/dag/app/TestRecoveryParser.java
patching file tez-dag/src/test/java/org/apache/tez/dag/history/events/TestHistoryEventsProtoConversion.java
patching file tez-tests/src/test/java/org/apache/tez/test/RecoveryServiceWithEventHandlingHook.java


==
==
Determining number of patched javac warnings.
==
==


/home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch > /home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt 2>&1






==
==
Adding comment to Jira.
==
==




==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
ERROR: Step 'Publish JUnit test result report' failed: No test report files were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Updated] (TEZ-3914) Recovering a large DAG hang job

2018-04-12 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3914:
-
Attachment: TEZ-3914.001.patch

> Recovering a large DAG hang job
> ---
>
> Key: TEZ-3914
> URL: https://issues.apache.org/jira/browse/TEZ-3914
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Jonathan Eagles
> Assignee: Jonathan Eagles
> Priority: Major
> Attachments: TEZ-3914.001.patch
>
>
> Any failure to parse a recovery event is ignored and treated as EOF. The job
> can hang, since some task completions may be missed and the shuffle will then
> wait indefinitely.


