[jira] [Commented] (TEZ-3087) Tez UI 2: Log links must added in task & attempt details page

2016-03-02 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177387#comment-15177387
 ] 

Sreenath Somarajapuram commented on TEZ-3087:
-

[~hitesh] Task attempt redirection doesn't work as expected. Once that's 
corrected, the log links can be added without any issues.

> Tez UI 2: Log links must added in task & attempt details page
> -
>
> Key: TEZ-3087
> URL: https://issues.apache.org/jira/browse/TEZ-3087
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>
> - The UI1 implementation takes you to the container logs and doesn't work as 
> expected
> - Ensure that the functionality is in line with TEZ-3101



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-3087) Tez UI 2: Log links must added in task & attempt details page

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15177163#comment-15177163
 ] 

Hitesh Shah commented on TEZ-3087:
--

[~Sreenath] can you please clarify what exactly the gap is here?   



[jira] [Commented] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176864#comment-15176864
 ] 

TezQA commented on TEZ-3156:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12791051/TEZ-3156.2.patch
  against master revision 3f5a7f3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1538//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1538//console

This message is automatically generated.

> Tez client keep trying to talk to RM if RM does not know application
> 
>
> Key: TEZ-3156
> URL: https://issues.apache.org/jira/browse/TEZ-3156
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
> Attachments: TEZ-3156.1.patch, TEZ-3156.2.patch
>
>
> Scenario : 
> * Set RM/NM recovery to false.
> {code}
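> <property>
>   <name>yarn.resourcemanager.recovery.enabled</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.recovery.enabled</name>
>   <value>false</value>
> </property>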
> {code}
> * Start the mrrsleep application (application_1456883132071_0001)
> {code}
> hadoop jar tez-tests-*.jar mrrsleep -m 1 -r 1 -mt 100 -rt 1000
> {code}
> * While the application is running, restart the RM.
> Because recovery is disabled, the restarted RM forgets the mrrsleep 
> application. At this point, the mrrsleep application's Tez client keeps trying 
> to communicate with the RM and loads it with repeated calls that fail with the 
> exception below.
> {code}
> 2016-03-02 02:01:24,708 INFO  ipc.Server (Server.java:run(2172)) - IPC Server handler 18 on 8050, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from xx.xx.xx.xxx:36191 Call#500250 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1456883132071_0001' doesn't exist in RM.
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> 2016-03-02 02:01:24,709 INFO  ipc.Server (Server.java:run(2172)) - IPC Server handler 27 on 8050, call org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport from xx.xx.xx.xxx:36191 Call#500251 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application with id 'application_1456883132071_0001' doesn't exist in RM.
>   at org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328)
>   at org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> 

Failed: TEZ-3156 PreCommit Build #1538

2016-03-02 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3156
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1538/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4401 lines...]
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-tests
[INFO] Build failures were ignored.





==
==
Adding comment to Jira.
==
==


Comment added.
a322ee63de9c827ee0c7a32f3d03b0c0e777b96c logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
7 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs

Error Message:
expected: but was:

Stack Trace:
java.lang.AssertionError: expected: but was:
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.failNotEquals(Assert.java:743)
at org.junit.Assert.assertEquals(Assert.java:118)
at org.junit.Assert.assertEquals(Assert.java:144)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:141)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at org.apache.tez.test.TestFaultTolerance.testRandomFailingInputs(TestFaultTolerance.java:763)


FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:261)


FAILED:  org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 

[jira] [Updated] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3156:
-
Attachment: TEZ-3156.2.patch



[jira] [Commented] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176687#comment-15176687
 ] 

TezQA commented on TEZ-3156:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12791046/TEZ-3156.1.patch
  against master revision 3f5a7f3.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:red}-1 javac{color}.  The patch appears to cause the build to 
fail.

Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1537//console

This message is automatically generated.



Failed: TEZ-3156 PreCommit Build #1537

2016-03-02 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3156
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1537/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 86 lines...]
patching file 
tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientImpl.java
patching file 
tez-api/src/main/java/org/apache/tez/dag/api/client/DAGClientTimelineImpl.java
patching file 
tez-api/src/main/java/org/apache/tez/dag/api/client/rpc/DAGClientRPCImpl.java


==
==
Determining number of patched javac warnings.
==
==


/home/jenkins/tools/maven/latest/bin/mvn clean test -DskipTests -Ptest-patch > 
/home/jenkins/jenkins-slave/workspace/PreCommit-TEZ-Build/../patchprocess/patchJavacWarnings.txt
 2>&1





==
==
Adding comment to Jira.
==
==


Comment added.
4f9550982acd46c1be1ee2b93de350bfa224b485 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
ERROR: Step ‘Publish JUnit test result report’ failed: No test report files 
were found. Configuration error?
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
No tests ran.

[jira] [Comment Edited] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176648#comment-15176648
 ] 

Hitesh Shah edited comment on TEZ-3156 at 3/2/16 10:51 PM:
---

[~bikassaha] [~sseth] please review. 

This seems like a weird edge case where the RM is restarted without a state 
store and "forgets" the application. At this point, the client knows the app 
is in a running state and therefore keeps trying to get in touch with the AM, 
expecting the RM to either point it to a new AM or provide an app-completed 
status. We did not handle the case where the RM no longer knows about the app 
at this point. 

Introduced a new DAGClientInternal to cleanly handle AppNotFound without 
changing the public API (internal impl classes were reusing the public-facing 
DAGClient). 

Tested manually. 
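For illustration only (this is not the TEZ-3156 patch; the DAGClientInternal code is not shown in this thread): a minimal Java sketch of the behavior described above, assuming a client that polls the RM for the application state. Where the problematic loop would retry indefinitely, this one treats "application not found" as a terminal outcome and stops. AppStatusSource, AppNotFoundException, AppState, StatusPoller and the maxPolls cap are all hypothetical names and assumptions; AppNotFoundException stands in for YARN's ApplicationNotFoundException so the sketch compiles without Hadoop on the classpath.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException.
class AppNotFoundException extends Exception {
    AppNotFoundException(String message) {
        super(message);
    }
}

// Hypothetical, simplified application states for this sketch.
enum AppState { RUNNING, FINISHED, FAILED, KILLED, UNKNOWN_TO_RM }

// Hypothetical abstraction over an RM getApplicationReport() call.
interface AppStatusSource {
    AppState fetchState(String appId) throws AppNotFoundException;
}

public class StatusPoller {
    private final AppStatusSource rm;
    private final long pollIntervalMillis;
    private final int maxPolls;

    StatusPoller(AppStatusSource rm, long pollIntervalMillis, int maxPolls) {
        this.rm = rm;
        this.pollIntervalMillis = pollIntervalMillis;
        this.maxPolls = maxPolls;
    }

    // Polls until the app leaves RUNNING, the RM no longer knows it, or maxPolls is reached.
    AppState waitForCompletion(String appId) throws InterruptedException {
        for (int i = 0; i < maxPolls; i++) {
            AppState state;
            try {
                state = rm.fetchState(appId);
            } catch (AppNotFoundException e) {
                // The RM has no record of the app (e.g. restarted without a state store).
                // Retrying cannot succeed, so stop instead of loading the RM with calls.
                return AppState.UNKNOWN_TO_RM;
            }
            if (state != AppState.RUNNING) {
                return state; // FINISHED, FAILED or KILLED
            }
            TimeUnit.MILLISECONDS.sleep(pollIntervalMillis);
        }
        return AppState.RUNNING; // gave up polling; last known state was RUNNING
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated RM: reports RUNNING twice, then "forgets" the app as after a restart.
        int[] calls = {0};
        AppStatusSource fakeRm = appId -> {
            calls[0]++;
            if (calls[0] <= 2) {
                return AppState.RUNNING;
            }
            throw new AppNotFoundException(
                "Application with id '" + appId + "' doesn't exist in RM.");
        };
        StatusPoller poller = new StatusPoller(fakeRm, 10, 1000);
        AppState result = poller.waitForCompletion("application_1456883132071_0001");
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

Running main prints "UNKNOWN_TO_RM after 3 calls": the poller gives up on the first not-found response instead of polling indefinitely. The maxPolls cap is an extra assumption, a second guard against unbounded polling.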


was (Author: hitesh):
[~bikassaha] [~sseth] please review. 

This seems like a weird edge case where the RM is restarted without a state 
store and "forgets" the application. At this point, the client is still 
trying to get in touch with the AM, expecting the RM to either point it to a 
new AM or provide an app-completed status. We did not handle the case where 
the RM no longer knows about the app at this point. 

Introduced a new DAGClientInternal to cleanly handle AppNotFound without 
changing the public API (internal impl classes were reusing the public-facing 
DAGClient). 

Tested manually. 


[jira] [Comment Edited] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176648#comment-15176648
 ] 

Hitesh Shah edited comment on TEZ-3156 at 3/2/16 10:50 PM:
---

[~bikassaha] [~sseth] please review. 

This seems like a weird edge case where the RM is restarted without a state 
store and "forgets" the application. At this point, the client is still 
trying to get in touch with the AM, expecting the RM to either point it to a 
new AM or provide an app-completed status. We did not handle the case where 
the RM no longer knows about the app at this point. 

Introduced a new DAGClientInternal to cleanly handle AppNotFound without 
changing the public API (internal impl classes were reusing the public-facing 
DAGClient). 

Tested manually. 


was (Author: hitesh):
[~bikassaha] [~sseth] please review. 

This seems like a weird edge case where the RM is restarted without a state 
store and "forgets" the application. At this point, the client is still 
trying to get in touch with the AM, expecting the RM to either point it to a 
new AM or provide an app-completed status. 

Introduced a new DAGClientInternal to cleanly handle AppNotFound without 
changing the public API (internal impl classes were reusing the public-facing 
DAGClient). 

Tested manually. 


[jira] [Updated] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3156:
-
Attachment: TEZ-3156.1.patch

[~bikassaha] [~sseth] please review. 

This seems like a weird edge case where the RM is restarted without a state 
store and "forgets" the application. At this point, the client is still 
trying to get in touch with the AM, expecting the RM to either point it to a 
new AM or provide an app-completed status. 

Introduced a new DAGClientInternal to cleanly handle AppNotFound without 
changing the public API (internal impl classes were reusing the public-facing 
DAGClient). 

Tested manually. 

> Tez client keep trying to talk to RM if RM does not know application
> 
>
> Key: TEZ-3156
> URL: https://issues.apache.org/jira/browse/TEZ-3156
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Yesha Vora
>Assignee: Hitesh Shah
> Attachments: TEZ-3156.1.patch
>
>
> Scenario:
> * Set RM/NM recovery to false.
> {code}
> <property>
>   <name>yarn.resourcemanager.recovery.enabled</name>
>   <value>false</value>
> </property>
> <property>
>   <name>yarn.nodemanager.recovery.enabled</name>
>   <value>false</value>
> </property>
> {code}
> * Start the mrrsleep application (application_1456883132071_0001)
> {code}
> hadoop jar tez-tests-*.jar mrrsleep -m 1 -r 1 -mt 100 -rt 1000
> {code}
> * While the application is running, restart the RM.
> Since recovery is disabled, the restarted RM forgets the mrrsleep 
> application. At this point, the mrrsleep application's Tez client keeps trying to 
> communicate with the RM and floods it with the exception below. 
> {code}
> 2016-03-02 02:01:24,708 INFO  ipc.Server (Server.java:run(2172)) - IPC Server 
> handler 18 on 8050, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from xx.xx.xx.xxx:36191 Call#500250 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1456883132071_0001' doesn't exist in RM.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> 2016-03-02 02:01:24,709 INFO  ipc.Server (Server.java:run(2172)) - IPC Server 
> handler 27 on 8050, call 
> org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
> from xx.xx.xx.xxx:36191 Call#500251 Retry#0
> org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
> with id 'application_1456883132071_0001' doesn't exist in RM.
>   at 
> org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328)
>   at 
> org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
>   at 
> org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
>   at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (TEZ-3156) Tez client keep trying to talk to RM if RM does not know application

2016-03-02 Thread Yesha Vora (JIRA)
Yesha Vora created TEZ-3156:
---

 Summary: Tez client keep trying to talk to RM if RM does not know 
application
 Key: TEZ-3156
 URL: https://issues.apache.org/jira/browse/TEZ-3156
 Project: Apache Tez
  Issue Type: Bug
Reporter: Yesha Vora
Assignee: Hitesh Shah


Scenario:
* Set RM/NM recovery to false.
{code}
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>false</value>
</property>
{code}
* Start the mrrsleep application (application_1456883132071_0001)
{code}
hadoop jar tez-tests-*.jar mrrsleep -m 1 -r 1 -mt 100 -rt 1000
{code}
* While the application is running, restart the RM.

Since recovery is disabled, the restarted RM forgets the mrrsleep 
application. At this point, the mrrsleep application's Tez client keeps trying to 
communicate with the RM and floods it with the exception below. 

{code}
2016-03-02 02:01:24,708 INFO  ipc.Server (Server.java:run(2172)) - IPC Server 
handler 18 on 8050, call 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
from xx.xx.xx.xxx:36191 Call#500250 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1456883132071_0001' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
2016-03-02 02:01:24,709 INFO  ipc.Server (Server.java:run(2172)) - IPC Server 
handler 27 on 8050, call 
org.apache.hadoop.yarn.api.ApplicationClientProtocolPB.getApplicationReport 
from xx.xx.xx.xxx:36191 Call#500251 Retry#0
org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
with id 'application_1456883132071_0001' doesn't exist in RM.
at 
org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:328)
at 
org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:175)
at 
org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:417)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
{code}







[jira] [Commented] (TEZ-2863) Container, node, and logs not available in UI for tasks that fail to launch

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176546#comment-15176546
 ] 

Hitesh Shah commented on TEZ-2863:
--

In this scenario, I think nodeId and nodeHttpAddress may be needed, as we are 
probably not restoring state for known nodes/containers at the moment. That may 
change later, when we do a better job at work-preserving restarts.
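To illustrate why nodeHttpAddress matters here, a small sketch assuming the YARN NodeManager web UI serves container logs under /node/containerlogs/<containerId>/<user> (an assumption about the URL layout; the class and method names are hypothetical). Without the node's HTTP address, the in-progress log link cannot be rebuilt after recovery.

```java
/** Hypothetical helper; the URL layout is an assumption about the NodeManager web UI. */
public class LogUrlSketch {

  /**
   * Builds an in-progress container log link from the node's HTTP address ("host:port").
   * Returns null when the address is unknown, e.g. when node state was not restored
   * after an AM restart, because the link cannot be reconstructed without it.
   */
  static String inProgressLogUrl(String nodeHttpAddress, String containerId, String user) {
    if (nodeHttpAddress == null || nodeHttpAddress.isEmpty()) {
      return null;
    }
    return "http://" + nodeHttpAddress + "/node/containerlogs/" + containerId + "/" + user;
  }

  public static void main(String[] args) {
    System.out.println(inProgressLogUrl("nmhost:8042", "container_1_0001_01_000002", "alice"));
    // -> http://nmhost:8042/node/containerlogs/container_1_0001_01_000002/alice
    System.out.println(inProgressLogUrl(null, "container_1_0001_01_000002", "alice"));
    // -> null
  }
}
```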

> Container, node, and logs not available in UI for tasks that fail to launch
> ---
>
> Key: TEZ-2863
> URL: https://issues.apache.org/jira/browse/TEZ-2863
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2863.1.patch, TEZ-2863.2-branch-0.7.patch, 
> TEZ-2863.2.patch, TEZ-2863.3-branch-0.7.patch, 
> TEZ-2863.3-branch-0.7.patch.addendum, TEZ-2863.3.patch, 
> TEZ-2863.3.patch.addendum, TEZ-2863.4-branch-0.7.patch, TEZ-2863.4.patch
>
>
> While running a sample Tez job:
> {noformat}
> tez-examples-*.jar orderedwordcount -Dtez.task.resource.memory.mb=1 
> -Dtez.task.launch.cmd-opts="-Xmx1m" input output
> {noformat}
> It was noticed that the Tez UI task attempt 
> http://timelineserverhost:port/ws/v1/timeline/TEZ_TASK_ATTEMPT_ID/attempt_id 
> was missing the TEZ_ATTEMPT_STARTED event
> {noformat}
> 2015-10-01 10:03:55,344 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1443711816411_0001_1][Event:TASK_STARTED]: 
> vertexName=Tokenizer, taskId=task_1443711816411_0001_1_00_00, 
> scheduledTime=1443711835342, launchTime=1443711835342
> 2015-10-01 10:03:55,346 [INFO] [Dispatcher thread {Central}] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:55,356 [INFO] [TaskSchedulerEventHandlerThread] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:55,364 [INFO] [TaskSchedulerEventHandlerThread] 
> |rm.YarnTaskSchedulerService|: Allocation request for task: 
> attempt_1443711816411_0001_1_00_00_0 with request: Capability[ vCores:1>]Priority[2] host: localhost rack: null
> 2015-10-01 10:03:56,639 [INFO] [AMRM Heartbeater thread] 
> |impl.AMRMClientImpl|: Received new token for : localhost:57381
> 2015-10-01 10:03:56,646 [INFO] [AMRM Callback Handler Thread] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:56,648 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: Assigning container to task: 
> containerId=container_1443711816411_0001_01_02, 
> task=attempt_1443711816411_0001_1_00_00_0, containerHost=localhost:57381, 
> containerPriority= 2, containerResources=, 
> localityMatchType=NodeLocal, matchedLocation=localhost, 
> honorLocalityFlags=true, reusedContainer=false, delayedContainers=0
> 2015-10-01 10:03:56,649 [INFO] [DelayedContainerManager] |util.RackResolver|: 
> Resolved localhost to /default-rack
> 2015-10-01 10:03:56,649 [INFO] [DelayedContainerManager] |util.RackResolver|: 
> Resolved localhost to /default-rack
> 2015-10-01 10:03:56,686 [INFO] [TaskSchedulerAppCaller #0] 
> |node.AMNodeTracker|: Adding new node: localhost:57381
> 2015-10-01 10:03:56,700 [INFO] [ContainerLauncher #0] 
> |launcher.ContainerLauncherImpl|: Launching 
> container_1443711816411_0001_01_02
> 2015-10-01 10:03:56,700 [INFO] [ContainerLauncher #0] 
> |impl.ContainerManagementProtocolProxy|: Opening proxy : localhost:57381
> 2015-10-01 10:03:56,741 [INFO] [ContainerLauncher #0] 
> |history.HistoryEventHandler|: [HISTORY][DAG:N/A][Event:CONTAINER_LAUNCHED]: 
> containerId=container_1443711816411_0001_01_02, launchTime=1443711836741
> 2015-10-01 10:03:57,647 [INFO] [AMRM Callback Handler Thread] 
> |rm.YarnTaskSchedulerService|: Allocated container 
> completed:container_1443711816411_0001_01_02 last allocated to task: 
> attempt_1443711816411_0001_1_00_00_0
> 2015-10-01 10:03:57,648 [INFO] [Dispatcher thread {Central}] 
> |container.AMContainerImpl|: Container container_1443711816411_0001_01_02 
> exited with diagnostics set to Container failed, exitCode=1. Exception from 
> container-launch.
> Container id: container_1443711816411_0001_01_02
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
>   at 
> org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
>   at 
> 

[jira] [Comment Edited] (TEZ-2863) Container, node, and logs not available in UI for tasks that fail to launch

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176541#comment-15176541
 ] 

Hitesh Shah edited comment on TEZ-2863 at 3/2/16 9:47 PM:
--

[~jeagles] The same event object is used for both recovery and history, but the 
data written out is different. The general approach has usually been that if 
something is needed for recovery but can be constructed from other data, it 
does not need to be written to recovery. For example, at the task attempt 
level, the vertex name and the log URLs could be reconstructed if needed. 

bq. Also, I don't have an exact understanding for what fails by not having 
these correctly filled out, since this patch works in practice for filling out 
the in progress and completed urls. Must be related to recovery. Can you help 
to explain?

The main thing is that after recovering/restoring state, the object should not 
end up in a situation where a field it expects to be non-null is actually 
null. 
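A minimal sketch of the approach described above, using a hypothetical event class with Map-based serialization (not Tez's actual history or recovery format). The recovery record omits fields that can be derived; the recovery path rebuilds them and fails fast instead of leaving them null. The vertex-name lookup function and the NodeManager log URL layout are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;

/** Hypothetical task-attempt-started event; not Tez's real event class. */
public class AttemptStartedSketch {
  final String attemptId;
  final String containerId;
  final String nodeHttpAddress;
  final String vertexName; // derivable from attemptId via the DAG structure
  final String logUrl;     // derivable from nodeHttpAddress, containerId and user

  AttemptStartedSketch(String attemptId, String containerId, String nodeHttpAddress,
      String vertexName, String logUrl) {
    this.attemptId = attemptId;
    this.containerId = containerId;
    this.nodeHttpAddress = nodeHttpAddress;
    this.vertexName = vertexName;
    this.logUrl = logUrl;
  }

  /** History record: everything the UI needs to display. */
  Map<String, String> toHistory() {
    Map<String, String> m = toRecovery();
    m.put("vertexName", vertexName);
    m.put("logUrl", logUrl);
    return m;
  }

  /** Recovery record: only the fields that cannot be reconstructed. */
  Map<String, String> toRecovery() {
    Map<String, String> m = new HashMap<>();
    m.put("attemptId", attemptId);
    m.put("containerId", containerId);
    m.put("nodeHttpAddress", nodeHttpAddress);
    return m;
  }

  /**
   * Rebuilds the event from a recovery record. Derived fields are recomputed, and a
   * missing value throws NullPointerException rather than leaving a null field behind.
   */
  static AttemptStartedSketch fromRecovery(Map<String, String> rec,
      Function<String, String> vertexNameForAttempt, String user) {
    String attemptId = Objects.requireNonNull(rec.get("attemptId"), "attemptId");
    String containerId = Objects.requireNonNull(rec.get("containerId"), "containerId");
    String node = Objects.requireNonNull(rec.get("nodeHttpAddress"), "nodeHttpAddress");
    String vertexName = Objects.requireNonNull(vertexNameForAttempt.apply(attemptId), "vertexName");
    // Assumed NodeManager log URL layout: /node/containerlogs/<containerId>/<user>
    String logUrl = "http://" + node + "/node/containerlogs/" + containerId + "/" + user;
    return new AttemptStartedSketch(attemptId, containerId, node, vertexName, logUrl);
  }
}
```

Failing fast in fromRecovery surfaces a missing recovery field at restore time, rather than later as an unexpected null deep in the AM.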


was (Author: hitesh):
[~jeagles] The same event object is used for both recovery and history, but the 
data written out is different. The general approach has usually been that if 
something is needed for recovery but can be constructed from other data, it 
does not need to be written to recovery. For example, at the task attempt 
level, the vertex name and the log URLs could be reconstructed if needed. 

bq. Also, I don't have an exact understanding for what fails by not having 
these correctly filled out, since this patch works in practice for filling out 
the in progress and completed urls. Must be related to recovery. Can you help 
to explain?

The main thing is that after recovering state, the object should not end up in 
a situation where a field it expects to be non-null is actually 
null. 


[jira] [Comment Edited] (TEZ-2863) Container, node, and logs not available in UI for tasks that fail to launch

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176541#comment-15176541
 ] 

Hitesh Shah edited comment on TEZ-2863 at 3/2/16 9:46 PM:
--

[~jeagles] The same event object is used for both recovery and history, but the 
data written out is different. The general approach has usually been that if 
something is needed for recovery but can be constructed from other data, it 
does not need to be written to recovery. For example, at the task attempt 
level, the vertex name and the log URLs could be reconstructed if needed. 

bq. Also, I don't have an exact understanding for what fails by not having 
these correctly filled out, since this patch works in practice for filling out 
the in progress and completed urls. Must be related to recovery. Can you help 
to explain?

The main thing is that after recovering state, the object should not end up in 
a situation where a field it expects to be non-null is actually 
null. 


was (Author: hitesh):
[~jeagles] The same event object is used for both recovery and history, but the 
data written out is different. The general approach has usually been that if 
something is needed for recovery but can be constructed from other data, it 
does not need to be written to recovery. For example, at the task attempt 
level, the vertex name and the log URLs could be reconstructed if needed. 


[jira] [Commented] (TEZ-2863) Container, node, and logs not available in UI for tasks that fail to launch

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176541#comment-15176541
 ] 

Hitesh Shah commented on TEZ-2863:
--

[~jeagles] The same event object is used for both recovery and history, but the 
data written out is different. The general approach has usually been that if 
something is needed for recovery but can be constructed from other data, it 
does not need to be written to recovery. For example, at the task attempt 
level, the vertex name and the log URLs could be reconstructed if needed. 


[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176511#comment-15176511
 ] 

TezQA commented on TEZ-3115:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12791002/TEZ-3115.4.patch
  against master revision ac0fd8b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 core tests{color}.  The patch failed these unit tests in :
   org.apache.tez.test.TestFaultTolerance

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1536//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1536//console

This message is automatically generated.

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch, 
> TEZ-3115.4.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.





Failed: TEZ-3115 PreCommit Build #1536

2016-03-02 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3115
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1536/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4393 lines...]
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR] [Help 2] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR] 
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn  -rf :tez-tests
[INFO] Build failures were ignored.






==
==
Adding comment to Jira.
==
==


Comment added.
3a6b75566c8b65a2a0d592c99fa2a2bd3d8ba3c9 logged out


==
==
Finished build.
==
==


Build step 'Execute shell' marked build as failure
Archiving artifacts
[description-setter] Could not determine description.
Recording test results
Email was triggered for: Failure - Any
Sending email for trigger: Failure - Any



###
## FAILED TESTS (if any) 
##
6 tests failed.
FAILED:  org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at 
org.apache.tez.test.TestFaultTolerance.testBasicInputFailureWithExit(TestFaultTolerance.java:261)


FAILED:  
org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No 
cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:124)
at 
org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:120)
at 
org.apache.tez.test.TestFaultTolerance.testInputFailureRerunCanSendOutputToTwoDownstreamVertices(TestFaultTolerance.java:703)


FAILED:  org.apache.tez.test.TestFaultTolerance.testMultipleInputFailureWithoutExit

Error Message:
TezSession has already shutdown. No cluster diagnostics found.

Stack Trace:
org.apache.tez.dag.api.SessionNotRunning: TezSession has already shutdown. No cluster diagnostics found.
at org.apache.tez.client.TezClient.waitTillReady(TezClient.java:784)
at org.apache.tez.test.TestFaultTolerance.runDAGAndVerify(TestFaultTolerance.java:129)
at 

[jira] [Commented] (TEZ-3077) TezClient.waitTillReady should support timeout

2016-03-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176507#comment-15176507
 ] 

Siddharth Seth commented on TEZ-3077:
-

Comments on the patch, and apologies for the late review.
- We should not change the behaviour of the existing waitTillReady API; that 
would be a backward-incompatible change.
- Instead of defining a configuration parameter, a new method 
waitTillReady(long timeout, TimeUnit timeUnit) should be sufficient.
- Please ignore my earlier comment about throwing a TimeoutException. A return 
status (boolean) would be more consistent with the way such time-based APIs 
work in Java libraries, e.g. 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html#await(long,%20java.util.concurrent.TimeUnit),
 or 
https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/BlockingQueue.html#poll(long,%20java.util.concurrent.TimeUnit)
- It should be possible to make use of this new method in the existing 
waitTillReady with an infinite timeout (0).
- Also, the sleep interval in this new method would need to be modified to 
match the timeout specified in the API call, i.e. min(SLEEP_FOR_READY, 
sleepTimeRemaining).
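A minimal sketch of the API shape suggested in the comment, under stated assumptions: `TimedReadyWaiter` and its `isReady` check are hypothetical stand-ins for the client's real AM-status polling, the 500 ms interval is an assumed value for `SLEEP_FOR_READY`, and session-shutdown handling (e.g. `SessionNotRunning`) is left out. It returns a boolean like `Condition.await(long, TimeUnit)`, treats a timeout of 0 as "wait forever", and caps each sleep at min(SLEEP_FOR_READY, remaining time).

```java
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical helper illustrating the proposed API shape; not part of Tez.
class TimedReadyWaiter {
    // Assumed poll interval standing in for TezClient's SLEEP_FOR_READY.
    static final long SLEEP_FOR_READY_MS = 500;

    private final BooleanSupplier isReady; // hypothetical "is the AM ready?" check

    TimedReadyWaiter(BooleanSupplier isReady) {
        this.isReady = isReady;
    }

    // The existing no-arg form delegates with a timeout of 0, meaning "no limit".
    void waitTillReady() throws InterruptedException {
        waitTillReady(0, TimeUnit.MILLISECONDS);
    }

    // Returns true once ready, or false if the timeout elapses first.
    boolean waitTillReady(long timeout, TimeUnit unit) throws InterruptedException {
        boolean unbounded = (timeout == 0);
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (!isReady.getAsBoolean()) {
            long sleepMs = SLEEP_FOR_READY_MS;
            if (!unbounded) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out without becoming ready
                }
                // Never sleep past the caller's deadline.
                sleepMs = Math.min(SLEEP_FOR_READY_MS, remainingMs);
            }
            Thread.sleep(sleepMs);
        }
        return true;
    }
}
```

With this shape, a caller can write `if (!waiter.waitTillReady(30, TimeUnit.SECONDS)) { ... }` and decide what to do on timeout, while existing callers of the no-arg form keep the block-until-ready behaviour.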

> TezClient.waitTillReady should support timeout
> --
>
> Key: TEZ-3077
> URL: https://issues.apache.org/jira/browse/TEZ-3077
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Sergey Shelukhin
>Assignee: Kuhu Shukla
> Attachments: TEZ-3077.001.patch
>
>
> Also preWarm.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TEZ-2863) Container, node, and logs not available in UI for tasks that fail to launch

2016-03-02 Thread Jonathan Eagles (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-2863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176475#comment-15176475
 ] 

Jonathan Eagles commented on TEZ-2863:
--

[~zjffdu], when I look at TaskAttemptStartedProto (pre-patch version), it is 
missing vertexName, inProgressLogsUrl, completedLogsUrl, and nodeHttpAddress. 
TaskAttemptFinishedProto is missing vertexName (pre-patch version), and 
containerId, nodeId, inProgressLogsUrl, completedLogsUrl, and nodeHttpAddress 
after this patch. Do I need to add the ones that were missing before the patch 
as well? Also, I don't have an exact understanding of what fails when these are 
not filled out correctly, since this patch works in practice for filling out 
the in-progress and completed URLs. It must be related to recovery. Can you 
help explain?

> Container, node, and logs not available in UI for tasks that fail to launch
> ---
>
> Key: TEZ-2863
> URL: https://issues.apache.org/jira/browse/TEZ-2863
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Jonathan Eagles
>Assignee: Jonathan Eagles
> Attachments: TEZ-2863.1.patch, TEZ-2863.2-branch-0.7.patch, 
> TEZ-2863.2.patch, TEZ-2863.3-branch-0.7.patch, 
> TEZ-2863.3-branch-0.7.patch.addendum, TEZ-2863.3.patch, 
> TEZ-2863.3.patch.addendum, TEZ-2863.4-branch-0.7.patch, TEZ-2863.4.patch
>
>
> While running a sample tez job
> {noformat}
> tez-examples-*.jar orderedwordcount -Dtez.task.resource.memory.mb=1 
> -Dtez.task.launch.cmd-opts="-Xmx1m" input output
> {noformat}
> It was noticed that the Tez UI task attempt 
> http://timelineserverhost:port/ws/v1/timeline/TEZ_TASK_ATTEMPT_ID/attempt_id 
> was missing the TEZ_ATTEMPT_STARTED event
> {noformat}
> 2015-10-01 10:03:55,344 [INFO] [Dispatcher thread {Central}] 
> |history.HistoryEventHandler|: 
> [HISTORY][DAG:dag_1443711816411_0001_1][Event:TASK_STARTED]: 
> vertexName=Tokenizer, taskId=task_1443711816411_0001_1_00_00, 
> scheduledTime=1443711835342, launchTime=1443711835342
> 2015-10-01 10:03:55,346 [INFO] [Dispatcher thread {Central}] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:55,356 [INFO] [TaskSchedulerEventHandlerThread] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:55,364 [INFO] [TaskSchedulerEventHandlerThread] 
> |rm.YarnTaskSchedulerService|: Allocation request for task: 
> attempt_1443711816411_0001_1_00_00_0 with request: Capability[ vCores:1>]Priority[2] host: localhost rack: null
> 2015-10-01 10:03:56,639 [INFO] [AMRM Heartbeater thread] 
> |impl.AMRMClientImpl|: Received new token for : localhost:57381
> 2015-10-01 10:03:56,646 [INFO] [AMRM Callback Handler Thread] 
> |util.RackResolver|: Resolved localhost to /default-rack
> 2015-10-01 10:03:56,648 [INFO] [DelayedContainerManager] 
> |rm.YarnTaskSchedulerService|: Assigning container to task: 
> containerId=container_1443711816411_0001_01_02, 
> task=attempt_1443711816411_0001_1_00_00_0, containerHost=localhost:57381, 
> containerPriority= 2, containerResources=, 
> localityMatchType=NodeLocal, matchedLocation=localhost, 
> honorLocalityFlags=true, reusedContainer=false, delayedContainers=0
> 2015-10-01 10:03:56,649 [INFO] [DelayedContainerManager] |util.RackResolver|: 
> Resolved localhost to /default-rack
> 2015-10-01 10:03:56,649 [INFO] [DelayedContainerManager] |util.RackResolver|: 
> Resolved localhost to /default-rack
> 2015-10-01 10:03:56,686 [INFO] [TaskSchedulerAppCaller #0] 
> |node.AMNodeTracker|: Adding new node: localhost:57381
> 2015-10-01 10:03:56,700 [INFO] [ContainerLauncher #0] 
> |launcher.ContainerLauncherImpl|: Launching 
> container_1443711816411_0001_01_02
> 2015-10-01 10:03:56,700 [INFO] [ContainerLauncher #0] 
> |impl.ContainerManagementProtocolProxy|: Opening proxy : localhost:57381
> 2015-10-01 10:03:56,741 [INFO] [ContainerLauncher #0] 
> |history.HistoryEventHandler|: [HISTORY][DAG:N/A][Event:CONTAINER_LAUNCHED]: 
> containerId=container_1443711816411_0001_01_02, launchTime=1443711836741
> 2015-10-01 10:03:57,647 [INFO] [AMRM Callback Handler Thread] 
> |rm.YarnTaskSchedulerService|: Allocated container 
> completed:container_1443711816411_0001_01_02 last allocated to task: 
> attempt_1443711816411_0001_1_00_00_0
> 2015-10-01 10:03:57,648 [INFO] [Dispatcher thread {Central}] 
> |container.AMContainerImpl|: Container container_1443711816411_0001_01_02 
> exited with diagnostics set to Container failed, exitCode=1. Exception from 
> container-launch.
> Container id: container_1443711816411_0001_01_02
> Exit code: 1
> Stack trace: ExitCodeException exitCode=1: 
>   at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
>   at org.apache.hadoop.util.Shell.run(Shell.java:455)
>   at 
> 

[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread TezQA (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176472#comment-15176472
 ] 

TezQA commented on TEZ-3115:


{color:green}+1 overall{color}.  Here are the results of testing the latest 
attachment
  http://issues.apache.org/jira/secure/attachment/12791002/TEZ-3115.4.patch
  against master revision ac0fd8b.

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified test files.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  There were no new javadoc warning messages.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 3.0.1) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-TEZ-Build/1535//testReport/
Console output: https://builds.apache.org/job/PreCommit-TEZ-Build/1535//console

This message is automatically generated.

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Fix For: 0.7.1, 0.8.3
>
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch, 
> TEZ-3115.4.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.





Success: TEZ-3115 PreCommit Build #1535

2016-03-02 Thread Apache Jenkins Server
Jira: https://issues.apache.org/jira/browse/TEZ-3115
Build: https://builds.apache.org/job/PreCommit-TEZ-Build/1535/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 4565 lines...]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 01:00 h
[INFO] Finished at: 2016-03-02T20:55:25+00:00
[INFO] Final Memory: 75M/1259M
[INFO] 






==
==
Adding comment to Jira.
==
==


Comment added.
0e18d022f2c67b530b7eeb2ac9a0a245031c2885 logged out


==
==
Finished build.
==
==


Archiving artifacts
Compressed 2.47 MB of artifacts by 11.4% relative to #1524
[description-setter] Description set: TEZ-3115
Recording test results
Email was triggered for: Success
Sending email for trigger: Success



###
## FAILED TESTS (if any) 
##
All tests passed

[jira] [Updated] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-02 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3155:
-
Assignee: Zhiyuan Yang

> Support a way to submit DAGs to a session where the DAG plan exceeds hadoop 
> ipc limits 
> ---
>
> Key: TEZ-3155
> URL: https://issues.apache.org/jira/browse/TEZ-3155
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>Assignee: Zhiyuan Yang
>
> Currently, dag submissions fail if the dag plan exceeds the hadoop ipc 
> limits. One option would be to fall back to local resources if the dag plan 
> is too large. 
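The fallback proposed in the description could be sketched roughly as follows. This is only an illustration under stated assumptions: `DagPlanSubmitter`, `Submission` and `prepare` are hypothetical names, the size threshold is supplied by the caller (Hadoop's `ipc.maximum.data.length` defaults to 64 MB), and uploading the spilled file and registering it as a local resource for the AM are left out.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of "fall back to a file when the plan is too big"; not Tez code.
class DagPlanSubmitter {
    // Mirrors the default of Hadoop's ipc.maximum.data.length (64 MB).
    static final int DEFAULT_MAX_IPC_BYTES = 64 * 1024 * 1024;

    // Exactly one of the two fields is non-null.
    static final class Submission {
        final byte[] inlinePlan; // plan sent directly in the RPC
        final Path planFile;     // plan spilled to disk, to be shipped as a resource

        Submission(byte[] inlinePlan, Path planFile) {
            this.inlinePlan = inlinePlan;
            this.planFile = planFile;
        }
    }

    static Submission prepare(byte[] planBytes, int maxIpcBytes, Path spillDir)
            throws IOException {
        if (planBytes.length <= maxIpcBytes) {
            return new Submission(planBytes, null);
        }
        // Too large for a single RPC: write it out and pass a reference instead.
        Path file = Files.createTempFile(spillDir, "dag-plan-", ".bin");
        Files.write(file, planBytes);
        return new Submission(null, file);
    }
}
```

A real RPC carries other fields besides the plan, so in practice the threshold would need some headroom below the IPC limit rather than matching it exactly.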





[jira] [Commented] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-02 Thread Hitesh Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176314#comment-15176314
 ] 

Hitesh Shah commented on TEZ-3155:
--

Thanks [~aplusplus]. Added you to the contributor list; you should now be able 
to pick up JIRAs.

> Support a way to submit DAGs to a session where the DAG plan exceeds hadoop 
> ipc limits 
> ---
>
> Key: TEZ-3155
> URL: https://issues.apache.org/jira/browse/TEZ-3155
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Currently, dag submissions fail if the dag plan exceeds the hadoop ipc 
> limits. One option would be to fall back to local resources if the dag plan 
> is too large. 





[jira] [Commented] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-02 Thread Zhiyuan Yang (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176309#comment-15176309
 ] 

Zhiyuan Yang commented on TEZ-3155:
---

Would you mind assigning this task to me? I'm willing to take it on.

> Support a way to submit DAGs to a session where the DAG plan exceeds hadoop 
> ipc limits 
> ---
>
> Key: TEZ-3155
> URL: https://issues.apache.org/jira/browse/TEZ-3155
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Currently, dag submissions fail if the dag plan exceeds the hadoop ipc 
> limits. One option would be to fall back to local resources if the dag plan 
> is too large. 





[jira] [Commented] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread Siddharth Seth (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176305#comment-15176305
 ] 

Siddharth Seth commented on TEZ-3115:
-

+1. Thanks [~jeagles].
I wasn't aware of the improvements to interning in Java 7. I suppose either 
can be used in that case.

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch, 
> TEZ-3115.4.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.





[jira] [Updated] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-02 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3155:
-
Description: Currently, dag submissions fail if the dag plan exceeds the 
hadoop ipc limits. One option would be to fall back to local resources if the 
dag plan is too large.   (was: Currently, dag submissions fail if the dag plan 
exceeds the hadoop ipc limits. )

> Support a way to submit DAGs to a session where the DAG plan exceeds hadoop 
> ipc limits 
> ---
>
> Key: TEZ-3155
> URL: https://issues.apache.org/jira/browse/TEZ-3155
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Currently, dag submissions fail if the dag plan exceeds the hadoop ipc 
> limits. One option would be to fall back to local resources if the dag plan 
> is too large. 





[jira] [Updated] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-02 Thread Hitesh Shah (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hitesh Shah updated TEZ-3155:
-
Description: Currently, dag submissions fail if the dag plan exceeds the 
hadoop ipc limits. 

> Support a way to submit DAGs to a session where the DAG plan exceeds hadoop 
> ipc limits 
> ---
>
> Key: TEZ-3155
> URL: https://issues.apache.org/jira/browse/TEZ-3155
> Project: Apache Tez
>  Issue Type: Bug
>Reporter: Hitesh Shah
>
> Currently, dag submissions fail if the dag plan exceeds the hadoop ipc 
> limits. 





[jira] [Created] (TEZ-3155) Support a way to submit DAGs to a session where the DAG plan exceeds hadoop ipc limits

2016-03-02 Thread Hitesh Shah (JIRA)
Hitesh Shah created TEZ-3155:


 Summary: Support a way to submit DAGs to a session where the DAG 
plan exceeds hadoop ipc limits 
 Key: TEZ-3155
 URL: https://issues.apache.org/jira/browse/TEZ-3155
 Project: Apache Tez
  Issue Type: Bug
Reporter: Hitesh Shah








[jira] [Updated] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3115:
-
Attachment: TEZ-3115.4.patch

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch, 
> TEZ-3115.4.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.





[jira] [Updated] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3115:
-
Attachment: (was: TEZ-3115.4.patch)

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.





[jira] [Updated] (TEZ-3115) Shuffle string handling adds significant memory overhead

2016-03-02 Thread Jonathan Eagles (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Eagles updated TEZ-3115:
-
Attachment: TEZ-3115.4.patch
TEZ-3115.4-branch-0.7.patch

Thanks for the review, [~sseth]. Updated the patch for master and branch-0.7.

One thing to note: the difference between StringInterner.weakIntern and 
String.intern is much smaller since JDK 7, when interned strings were moved to 
the heap instead of PermGen. Of course, it is better to remain consistent 
within the project. Good catch.

http://java-performance.info/string-intern-in-java-6-7-8/
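To make the discussion concrete, here is a minimal sketch of the kind of deduplication involved: many shuffle events carry the same host name, and interning makes equal strings share one instance. It uses plain `String.intern()` only; Hadoop's `StringInterner.weakIntern` is not used, to keep the example dependency-free, and `HostNameDedup`/`parseHosts` are hypothetical names, not code from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical example; not code from the TEZ-3115 patch.
class HostNameDedup {
    // Simulates decoding one host string per shuffle event. new String(char[])
    // always allocates a fresh object, as deserialization typically would.
    static List<String> parseHosts(String[] raw, boolean intern) {
        List<String> hosts = new ArrayList<>(raw.length);
        for (String r : raw) {
            String host = new String(r.toCharArray());
            // intern() returns the canonical pooled instance for equal strings.
            hosts.add(intern ? host.intern() : host);
        }
        return hosts;
    }
}
```

Without interning, N events from the same host retain N equal but distinct strings; with interning they all point at one instance, which since JDK 7 lives in the regular heap rather than PermGen.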

> Shuffle string handling adds significant memory overhead
> 
>
> Key: TEZ-3115
> URL: https://issues.apache.org/jira/browse/TEZ-3115
> Project: Apache Tez
>  Issue Type: Bug
>Affects Versions: 0.7.0
>Reporter: Jason Lowe
>Assignee: Jonathan Eagles
> Attachments: TEZ-3115.1.patch, TEZ-3115.2.patch, 
> TEZ-3115.3-branch-0.7.patch, TEZ-3115.3.patch, TEZ-3115.4-branch-0.7.patch, 
> TEZ-3115.4.patch
>
>
> While investigating the OOM heap dump from TEZ-3114 I noticed that the 
> ShuffleManager and other shuffle-related objects were holding onto many 
> strings that added up to over a hundred megabytes of memory.





[jira] [Updated] (TEZ-3152) Tez UI 2: Build fails when run by multiple users or when node_modules is old

2016-03-02 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-3152:

Summary: Tez UI 2: Build fails when run by multiple users or when 
node_modules is old  (was: Tez UI 2: Build fails when run by multiple users on 
the same system)

> Tez UI 2: Build fails when run by multiple users or when node_modules is old
> 
>
> Key: TEZ-3152
> URL: https://issues.apache.org/jira/browse/TEZ-3152
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3152.wip.1.patch
>
>
> We have two separate cases in which the build fails:
> # The build uses Nodejs v0.12.2. When the webapp folder already has a 
> node_modules folder installed by an old version of node, the build fails.
> # async-disk-cache package creates files in tmpDir (/tmp). When run as a 
> different user, the build fails because of the permissions on these files.





[jira] [Updated] (TEZ-3152) Tez UI 2: Build fails when run by multiple users on the same system

2016-03-02 Thread Sreenath Somarajapuram (JIRA)

 [ 
https://issues.apache.org/jira/browse/TEZ-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sreenath Somarajapuram updated TEZ-3152:

Description: 
We have two separate cases in which the build fails:
# The build uses Nodejs v0.12.2. When the webapp folder already has a 
node_modules folder installed by an old version of node, the build fails.
# async-disk-cache package creates files in tmpDir (/tmp). When run as a 
different user, the build fails because of the permissions on these files.

  was:- async-disk-cache package creates files in tmpDir (/tmp). When run as 
a different user, the build fails because of the permissions on these files.


> Tez UI 2: Build fails when run by multiple users on the same system
> ---
>
> Key: TEZ-3152
> URL: https://issues.apache.org/jira/browse/TEZ-3152
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3152.wip.1.patch
>
>
> We have two separate cases in which the build fails:
> # The build uses Nodejs v0.12.2. When the webapp folder already has a 
> node_modules folder installed by an old version of node, the build fails.
> # async-disk-cache package creates files in tmpDir (/tmp). When run as a 
> different user, the build fails because of the permissions on these files.





[jira] [Commented] (TEZ-3152) Tez UI 2: Build fails when run by multiple users on the same system

2016-03-02 Thread Sreenath Somarajapuram (JIRA)

[ 
https://issues.apache.org/jira/browse/TEZ-3152?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175421#comment-15175421
 ] 

Sreenath Somarajapuram commented on TEZ-3152:
-

Thanks [~hitesh].
I had been through that. We are using the latest version, as in the link 
above, and we are still facing the issue.
The current solution is to force the build to use a local ./tmp directory 
(which we already use) rather than the global /tmp directory.

> Tez UI 2: Build fails when run by multiple users on the same system
> ---
>
> Key: TEZ-3152
> URL: https://issues.apache.org/jira/browse/TEZ-3152
> Project: Apache Tez
>  Issue Type: Sub-task
>Reporter: Sreenath Somarajapuram
>Assignee: Sreenath Somarajapuram
> Attachments: TEZ-3152.wip.1.patch
>
>
> - async-disk-cache package creates files in tmpDir (/tmp). When run as a 
> different user, the build fails because of the permissions on these files.


