[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377310#comment-14377310
 ] 

Hitesh Shah commented on TEZ-2204:
--

Comments:

{code}
// don't handle events if DAGAppMaster is in the state of STOPPED,
// otherwise there may be dead-lock happen.  TEZ-2204
if (DAGAppMaster.this.getServiceState() == STATE.STOPPED) {
  return;
}
{code}

Can you add a log message to identify what events are being received after the 
AM is stopped? 

+1 after the above comment is addressed. 
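
A minimal sketch of what the requested log message could look like. This is not the actual patch: StoppedGuardSketch, its State enum, and the event names used with it are hypothetical stand-ins, and java.util.logging is used only to keep the example self-contained. The state check is done before any lock is taken, which is an assumption about how the guard would have to be placed to be effective.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Logger;

// Hypothetical stand-in for an application master; not the real DAGAppMaster.
final class StoppedGuardSketch {
  enum State { STARTED, STOPPED }

  private static final Logger LOG =
      Logger.getLogger(StoppedGuardSketch.class.getName());

  private volatile State state = State.STARTED;
  private final List<String> handled = new ArrayList<>();
  private final List<String> ignored = new ArrayList<>();

  void stop() {
    state = State.STOPPED;
  }

  void handle(String eventType) {
    // Guard: once stopped, record and log the event type instead of processing it.
    if (state == State.STOPPED) {
      LOG.warning("Ignoring event received after stop, eventType=" + eventType);
      synchronized (ignored) {
        ignored.add(eventType);
      }
      return;
    }
    synchronized (handled) {
      handled.add(eventType);
    }
  }

  // Defensive copies so callers can inspect the lists safely.
  List<String> handledEvents() {
    synchronized (handled) {
      return new ArrayList<>(handled);
    }
  }

  List<String> ignoredEvents() {
    synchronized (ignored) {
      return new ArrayList<>(ignored);
    }
  }
}
```

Logging the event type at this point would show which events still arrive after the AM has stopped, which is the information asked for above.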

 TestAMRecovery increasingly flaky on jenkins builds. 
 -

 Key: TEZ-2204
 URL: https://issues.apache.org/jira/browse/TEZ-2204
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah
Assignee: Jeff Zhang
 Attachments: TEZ-2204-1.patch, TEZ-2204-2.patch, TEZ-2204-3.patch, 
 TEZ-2204-4.patch


 In recent pre-commit builds and daily builds, there seem to have been some 
 occurrences of TestAMRecovery failing or timing out. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377018#comment-14377018
 ] 

Jeff Zhang commented on TEZ-2204:
-

Uploaded a new patch (excludes the findbugs warning).

[~hitesh] [~bikassaha] Please help review it.


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-23 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377074#comment-14377074
 ] 

Hadoop QA commented on TEZ-2204:


{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12706785/TEZ-2204-4.patch
  against master revision 6d0b10a.

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/333//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/333//console

This message is automatically generated.


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-20 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371062#comment-14371062
 ] 

Jeff Zhang commented on TEZ-2204:
-

Uploaded a patch. [~hitesh] [~bikassaha] Please help review it.

Two potential deadlocks:
* One related to YARN-2917. Tez's AsyncDispatcher has not picked up that patch.
* A deadlock in DAGAppMaster between DAGAppMaster::handle and 
DAGAppMaster::stopService. When stopService is called, it stops the 
AsyncDispatcher, and the AsyncDispatcher then drains its events, which may 
call DAGAppMaster::handle. Both handle() and stopService are declared 
synchronized, so the draining thread blocks while stopService waits for it.
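
The second deadlock can be modelled with a small sketch. This is not Tez code: DrainDeadlockSketch and all of its members are hypothetical, the drain thread is started inside stop() purely to make the interleaving deterministic, and the join is timed so that the demonstration terminates (with an untimed join the unguarded variant would hang). It assumes the fix is to check a stopped flag before taking the monitor.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical model of a service whose stop() drains a dispatcher queue.
final class DrainDeadlockSketch {
  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  private final boolean guardOutsideLock;
  private volatile boolean stopped = false;

  DrainDeadlockSketch(boolean guardOutsideLock) {
    this.guardOutsideLock = guardOutsideLock;
  }

  void offer(String event) {
    queue.offer(event);
  }

  void handle(String event) {
    // With the guard enabled, late events are dropped before the monitor is needed.
    if (guardOutsideLock && stopped) {
      return;
    }
    synchronized (this) {
      // Event processing would happen here, under the same monitor as stop().
    }
  }

  // Returns true if the drain thread finished while stop() still held the monitor.
  synchronized boolean stop(long timeoutMillis) {
    stopped = true;
    Thread dispatcher = new Thread(() -> {
      String event;
      while ((event = queue.poll()) != null) {
        handle(event);
      }
    });
    dispatcher.start();
    try {
      // join() waits on the Thread object, so this monitor stays held meanwhile.
      dispatcher.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    }
    return !dispatcher.isAlive();
  }
}
```

With guardOutsideLock set to false and one queued event, the drain thread blocks on the monitor held by stop() and the join times out. With it set to true, the event is dropped and the join returns promptly.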


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-20 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371273#comment-14371273
 ] 

Jeff Zhang commented on TEZ-2204:
-

The findbugs issue should be OK.


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371255#comment-14371255
 ] 

Hadoop QA commented on TEZ-2204:


{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12705906/TEZ-2204-2.patch
  against master revision 9b845f2.

{color:green}+1 @author{color}.  The patch does not contain any @author tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:red}-1 findbugs{color}.  The patch appears to introduce 1 new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-TEZ-Build/319//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-TEZ-Build/319//artifact/patchprocess/newPatchFindbugsWarningstez-common.html
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/319//console

This message is automatically generated.


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-19 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369378#comment-14369378
 ] 

Jeff Zhang commented on TEZ-2204:
-

Also found another deadlock in DAGAppMaster. 


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-18 Thread Jeff Zhang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14368377#comment-14368377
 ] 

Jeff Zhang commented on TEZ-2204:
-

This may be an issue related to YARN-2917: Tez has its own 
AsyncDispatcher, but it has not included the YARN-2917 patch.

Copy of the jstack output:
{code}
Thread-1 prio=5 tid=0x7f9d13011800 nid=0xe507 in Object.wait() [0x000117559000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x0007fed1c360 (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1281)
        - locked 0x0007fed1c360 (a java.lang.Thread)
        at java.lang.Thread.join(Thread.java:1355)
        at org.apache.tez.common.AsyncDispatcher.serviceStop(AsyncDispatcher.java:162)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        - locked 0x0007fed61000 (a java.lang.Object)
        at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
        at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
        at org.apache.tez.dag.app.DAGAppMaster.stopServices(DAGAppMaster.java:1539)
        at org.apache.tez.dag.app.DAGAppMaster.serviceStop(DAGAppMaster.java:1674)
        - locked 0x0007fed0dc50 (a org.apache.tez.dag.app.DAGAppMaster)
        at org.apache.hadoop.service.AbstractService.stop(AbstractService.java:221)
        - locked 0x0007fed0de80 (a java.lang.Object)
        at org.apache.tez.dag.app.DAGAppMaster$DAGAppMasterShutdownHook.run(DAGAppMaster.java:1940)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)

   Locked ownable synchronizers:
        - None

App Shared Pool - #1 daemon prio=5 tid=0x7f9d13e60800 nid=0xdd03 in Object.wait() [0x00011714c000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0x0007ff1193b8 (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1281)
        - locked 0x0007ff1193b8 (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1355)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
        at java.lang.Shutdown.runHooks(Shutdown.java:123)
        at java.lang.Shutdown.sequence(Shutdown.java:167)
        at java.lang.Shutdown.exit(Shutdown.java:212)
        - locked 0x0007ff111ec8 (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:962)
        at org.apache.tez.test.TestAMRecovery$ControlledImmediateStartVertexManager.onSourceTaskCompleted(TestAMRecovery.java:601)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventSourceTaskCompleted.invoke(VertexManager.java:525)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:580)
        - locked 0x0007fb82fac8 (a org.apache.tez.dag.app.dag.impl.VertexManager)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:1)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:575)
        at org.apache.tez.dag.app.dag.event.CallableEvent.call(CallableEvent.java:27)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

   Locked ownable synchronizers:
        - 0x0007fbc182d8 (a java.util.concurrent.ThreadPoolExecutor$Worker)
{code}


[jira] [Commented] (TEZ-2204) TestAMRecovery increasingly flaky on jenkins builds.

2015-03-17 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365908#comment-14365908
 ] 

Hitesh Shah commented on TEZ-2204:
--

Latest failure: https://builds.apache.org/job/Tez-Build/941
