[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts

2015-02-10 Thread Kashish Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14315545#comment-14315545
 ] 

Kashish Jain commented on SPARK-5613:
-

Thanks Patrick and Andrew

 YarnClientSchedulerBackend fails to get application report when yarn restarts
 -

 Key: SPARK-5613
 URL: https://issues.apache.org/jira/browse/SPARK-5613
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Kashish Jain
Assignee: Kashish Jain
Priority: Minor
 Fix For: 1.3.0, 1.2.2

   Original Estimate: 24h
  Remaining Estimate: 24h

 Steps to Reproduce
 1) Run any spark job
 2) Stop yarn while the spark job is running (an application id has been 
 generated by now)
 3) Restart yarn now
 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
 exception. This leads to termination of thread. 
 Here is the StackTrace
 15/02/05 05:22:37 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:38 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:39 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 5/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 Exception in thread Yarn application state monitor 
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1423113179043_0003' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Unknown Source)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
 Source)
   at java.lang.reflect.Constructor.newInstance(Unknown Source)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
   at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
   at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
   at 
 org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
   at 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
 Caused by: 
 

[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts

2015-02-10 Thread Andrew Or (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314922#comment-14314922
 ] 

Andrew Or commented on SPARK-5613:
--

Thanks Patrick. I just verified that it was merged into all of 1.2, 1.3 and 
Master. Closing this again.

 YarnClientSchedulerBackend fails to get application report when yarn restarts
 -

 Key: SPARK-5613
 URL: https://issues.apache.org/jira/browse/SPARK-5613
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Kashish Jain
Assignee: Kashish Jain
Priority: Minor
 Fix For: 1.3.0, 1.2.2

   Original Estimate: 24h
  Remaining Estimate: 24h

 Steps to Reproduce
 1) Run any spark job
 2) Stop yarn while the spark job is running (an application id has been 
 generated by now)
 3) Restart yarn now
 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
 exception. This leads to termination of thread. 
 Here is the StackTrace
 15/02/05 05:22:37 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:38 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:39 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 5/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 Exception in thread Yarn application state monitor 
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1423113179043_0003' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Unknown Source)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
 Source)
   at java.lang.reflect.Constructor.newInstance(Unknown Source)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
   at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
   at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
   at 
 org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
   at 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
 Caused by: 
 

[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts

2015-02-10 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14314821#comment-14314821
 ] 

Patrick Wendell commented on SPARK-5613:


I have cherry picked it into the 1.3 branch.

 YarnClientSchedulerBackend fails to get application report when yarn restarts
 -

 Key: SPARK-5613
 URL: https://issues.apache.org/jira/browse/SPARK-5613
 Project: Spark
  Issue Type: Bug
  Components: YARN
Affects Versions: 1.2.0
Reporter: Kashish Jain
Assignee: Kashish Jain
Priority: Minor
 Fix For: 1.3.0, 1.2.2

   Original Estimate: 24h
  Remaining Estimate: 24h

 Steps to Reproduce
 1) Run any spark job
 2) Stop yarn while the spark job is running (an application id has been 
 generated by now)
 3) Restart yarn now
 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
 exception. This leads to termination of thread. 
 Here is the StackTrace
 15/02/05 05:22:37 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:38 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:39 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 5/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 Exception in thread Yarn application state monitor 
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1423113179043_0003' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Unknown Source)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
 Source)
   at java.lang.reflect.Constructor.newInstance(Unknown Source)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
   at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
   at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
   at 
 org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
   at 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
 Caused by: 
 

[jira] [Commented] (SPARK-5613) YarnClientSchedulerBackend fails to get application report when yarn restarts

2015-02-05 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306903#comment-14306903
 ] 

Apache Spark commented on SPARK-5613:
-

User 'kasjain' has created a pull request for this issue:
https://github.com/apache/spark/pull/4392

 YarnClientSchedulerBackend fails to get application report when yarn restarts
 -

 Key: SPARK-5613
 URL: https://issues.apache.org/jira/browse/SPARK-5613
 Project: Spark
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Kashish Jain
Priority: Minor
 Fix For: 1.2.0, 1.2.1

   Original Estimate: 24h
  Remaining Estimate: 24h

 Steps to Reproduce
 1) Run any spark job
 2) Stop yarn while the spark job is running (an application id has been 
 generated by now)
 3) Restart yarn now
 4) AsyncMonitorApplication thread fails due to ApplicationNotFoundException 
 exception. This leads to termination of thread. 
 Here is the StackTrace
 15/02/05 05:22:37 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 6 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:38 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 7 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:39 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 8 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 15/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 5/02/05 05:22:40 INFO Client: Retrying connect to server: 
 nn1/192.168.173.176:8032. Already tried 9 time(s); retry policy is 
 RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
 MILLISECONDS)
 Exception in thread Yarn application state monitor 
 org.apache.hadoop.yarn.exceptions.ApplicationNotFoundException: Application 
 with id 'application_1423113179043_0003' doesn't exist in RM.
   at 
 org.apache.hadoop.yarn.server.resourcemanager.ClientRMService.getApplicationReport(ClientRMService.java:284)
   at 
 org.apache.hadoop.yarn.api.impl.pb.service.ApplicationClientProtocolPBServiceImpl.getApplicationReport(ApplicationClientProtocolPBServiceImpl.java:145)
   at 
 org.apache.hadoop.yarn.proto.ApplicationClientProtocol$ApplicationClientProtocolService$2.callBlockingMethod(ApplicationClientProtocol.java:321)
   at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:585)
   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:928)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2013)
   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2009)
   at java.security.AccessController.doPrivileged(Native Method)
   at javax.security.auth.Subject.doAs(Unknown Source)
   at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2007)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
   at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown 
 Source)
   at java.lang.reflect.Constructor.newInstance(Unknown Source)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
   at 
 org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:101)
   at 
 org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:166)
   at sun.reflect.GeneratedMethodAccessor18.invoke(Unknown Source)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
   at java.lang.reflect.Method.invoke(Unknown Source)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:190)
   at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:103)
   at com.sun.proxy.$Proxy12.getApplicationReport(Unknown Source)
   at 
 org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:291)
   at 
 org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:116)
   at 
 org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$$anon$1.run(YarnClientSchedulerBackend.scala:120)
 Caused by: