[jira] [Commented] (YARN-244) Application Master Retries fail due to FileNotFoundException
[ https://issues.apache.org/jira/browse/YARN-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504386#comment-13504386 ]

Bikas Saha commented on YARN-244:
---------------------------------

Did you check whether AM retries were enabled to be > 1? Without that, the last attempt will delete the files. If the AM is being retried by the RM, though, this value should already be > 1, so there could be a bug.

> Application Master Retries fail due to FileNotFoundException
> ------------------------------------------------------------
>
>                 Key: YARN-244
>                 URL: https://issues.apache.org/jira/browse/YARN-244
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: applications
>    Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>            Priority: Blocker
>
> Application attempt1 deletes the job-related files, so they are not present in HDFS for the following retries.
> {code:xml}
> Application application_1353724754961_0001 failed 4 times due to AM Container for appattempt_1353724754961_0001_04 exited with exitCode: -1000 due to:
> RemoteTrace: java.io.FileNotFoundException: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:752)
>     at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:88)
>     at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>     at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>     at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>     at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>     at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>     at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:822)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:492)
>     at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:221)
>     at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>     at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>     at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> .Failing this attempt.. Failing the application.
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
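The retry count Bikas asks about is set in yarn-site.xml. A minimal sketch, assuming the 2.0.x property name yarn.resourcemanager.am.max-retries (renamed to yarn.resourcemanager.am.max-attempts in later releases):

```xml
<!-- Hypothetical yarn-site.xml fragment: allow two AM attempts, so a first
     failure is not treated as the last attempt that deletes staging files.
     The property name is the 2.0.x one and is an assumption here. -->
<property>
  <name>yarn.resourcemanager.am.max-retries</name>
  <value>2</value>
</property>
```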
[jira] [Commented] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )
[ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504234#comment-13504234 ]

Sandy Ryza commented on YARN-72:
--------------------------------

The newest patch contains a test and a timeout. The timeout is yarn.nodemanager.sleep-delay-before-sigkill.ms + yarn.nodemanager.process-kill-wait.ms + 1000. Should I make this configurable?

> NM should handle cleaning up containers when it shuts down ( and kill
> containers from an earlier instance when it comes back up after an unclean
> shutdown )
> --------------------------------------------------------------------------
>
>                 Key: YARN-72
>                 URL: https://issues.apache.org/jira/browse/YARN-72
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>            Reporter: Hitesh Shah
>            Assignee: Sandy Ryza
>         Attachments: YARN-72-1.patch, YARN-72.patch
>
> Ideally, the NM should wait for a limited amount of time when it gets a shutdown signal for existing containers to complete, and kill the containers (if we pick an aggressive approach) after this time interval.
>
> For NMs which come up after an unclean shutdown, the NM should look through its directories for existing container.pids and try to kill any existing containers matching the pids found.
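Sandy's timeout formula can be sketched as a tiny helper. The arithmetic is taken from the comment above; the values passed in main are illustrative assumptions, not the YARN defaults:

```java
// Hypothetical sketch of the shutdown-timeout derivation in Sandy's patch:
// timeout = sleep-delay-before-sigkill.ms + process-kill-wait.ms + 1000 ms slack.
public class NmShutdownTimeout {
    static long shutdownTimeoutMs(long sigkillDelayMs, long processKillWaitMs) {
        // 1000 ms of extra slack on top of the two configured waits
        return sigkillDelayMs + processKillWaitMs + 1000L;
    }

    public static void main(String[] args) {
        // 250 and 2000 are illustrative values, not the actual defaults.
        System.out.println(shutdownTimeoutMs(250L, 2000L)); // prints 3250
    }
}
```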
[jira] [Updated] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )
[ https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-72:
---------------------------
    Attachment: YARN-72-1.patch
[jira] [Commented] (YARN-224) Fair scheduler logs too many nodeUpdate INFO messages
[ https://issues.apache.org/jira/browse/YARN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504075#comment-13504075 ]

Karthik Kambatla commented on YARN-224:
---------------------------------------

+1

> Fair scheduler logs too many nodeUpdate INFO messages
> -----------------------------------------------------
>
>                 Key: YARN-224
>                 URL: https://issues.apache.org/jira/browse/YARN-224
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 2.0.2-alpha
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: YARN-224-1.patch, YARN-224.patch
>
> The RM logs are filled with an INFO message the fair scheduler logs every time it receives a nodeUpdate. It should be taken out or demoted to debug.
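The "demote to debug" fix described above follows a standard pattern: guard the log call so the message string is only built when the level is enabled. A hypothetical sketch using java.util.logging (the real scheduler uses commons-logging; names here are illustrative, not the actual FairScheduler code):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class NodeUpdateLogging {
    private static final Logger LOG = Logger.getLogger("FairSchedulerDemo");
    static {
        LOG.setLevel(Level.INFO); // typical production level: debug disabled
    }

    // Returns the message that was logged, or null when debug is off,
    // so the guard's effect is observable.
    static String logNodeUpdate(String nodeId) {
        if (!LOG.isLoggable(Level.FINE)) { // FINE ~ commons-logging DEBUG
            return null;                   // skip string construction entirely
        }
        String msg = "nodeUpdate received from " + nodeId;
        LOG.fine(msg);
        return msg;
    }

    public static void main(String[] args) {
        // At INFO level the per-heartbeat message is never even built.
        System.out.println(logNodeUpdate("host1:45454")); // prints null
    }
}
```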
[jira] [Updated] (YARN-224) Fair scheduler logs too many nodeUpdate INFO messages
[ https://issues.apache.org/jira/browse/YARN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sandy Ryza updated YARN-224:
----------------------------
    Attachment: YARN-224-1.patch
[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries
[ https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504011#comment-13504011 ]

Vinod Kumar Vavilapalli commented on YARN-243:
----------------------------------------------

Agree with Jason. We shouldn't work around it on the client side. I think we should close this as a duplicate of MAPREDUCE-4819.

> Job Client doesn't give progress for Application Master Retries
> ---------------------------------------------------------------
>
>                 Key: YARN-243
>                 URL: https://issues.apache.org/jira/browse/YARN-243
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: client, resourcemanager
>    Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>            Reporter: Devaraj K
>            Assignee: Devaraj K
>
> If AM retries are configured and the first attempt fails, the RM will create the next attempt, but the Job Client doesn't show progress for the retry attempts.
[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries
[ https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503996#comment-13503996 ]

Jason Lowe commented on YARN-243:
---------------------------------

Filed MAPREDUCE-4819 to track the issue where the AM can re-run the job after reporting success to the client. This could be particularly bad if the job succeeded but was rerun and re-committed its output just as another job was trying to consume it.
[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries
[ https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503896#comment-13503896 ]

Jason Lowe commented on YARN-243:
---------------------------------

That doesn't sound like something to fix on the client side. If the AM told the client that the job failed, then the job should have failed. The attempt can die in the window between reporting the final job status to the client and reporting it to the RM; IMHO we should fix things so the subsequent AM attempt doesn't retry the job but simply updates the RM with the failed status found from the previous attempt. Otherwise we run into bad situations where we've already told the client the job failed, but the job subsequently retries (possibly from scratch, depending on the output format's support for recovery) and could succeed. If the job has decided to fail and has already told the client, an AM attempt failure while trying to report that same decision to the RM shouldn't allow the job to subsequently succeed, IMHO.
[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables
[ https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503890#comment-13503890 ]

Robert Joseph Evans commented on YARN-237:
------------------------------------------

You have to be careful with cookies, because the web app proxy strips out cookies before sending the data to the application.

> Refreshing the RM page forgets how many rows I had in my Datatables
> -------------------------------------------------------------------
>
>                 Key: YARN-237
>                 URL: https://issues.apache.org/jira/browse/YARN-237
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: resourcemanager
>    Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0
>            Reporter: Ravi Prakash
>
> If I choose 100 rows and then refresh the page, DataTables goes back to showing me 20 rows.
>
> This user preference should be stored in a cookie.
[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries
[ https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503878#comment-13503878 ]

Devaraj K commented on YARN-243:
--------------------------------

If we kill the AM, the client connects to the new AM by getting the latest app report from the RM. But if the AM attempt fails (the job fails), the RM will start a new attempt while the client still shows the previous attempt's status, i.e. the Job Failed status. I think we need to handle this case on the client side.
[jira] [Commented] (YARN-241) Node Manager fails to launch containers after NM restart in secure mode
[ https://issues.apache.org/jira/browse/YARN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503867#comment-13503867 ]

Devaraj K commented on YARN-241:
--------------------------------

From my debugging, the secret key is present but mac.init(key) is failing, and it then fails for all subsequent invocations. If we try a new Mac instance with the same secret key, it succeeds.

> Node Manager fails to launch containers after NM restart in secure mode
> -----------------------------------------------------------------------
>
>                 Key: YARN-241
>                 URL: https://issues.apache.org/jira/browse/YARN-241
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>            Reporter: Devaraj K
>            Priority: Blocker
>
> After restarting the Node Manager it fails to launch containers with the below exception.
> {code:xml}
> 2012-11-24 17:21:56,141 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8048: readAndProcess threw exception java.lang.IllegalArgumentException: Invalid key to HMAC computation from client 158.1.131.10. Count of bytes read: 0
> java.lang.IllegalArgumentException: Invalid key to HMAC computation
>     at org.apache.hadoop.security.token.SecretManager.createPassword(SecretManager.java:153)
>     at org.apache.hadoop.yarn.server.security.ContainerTokenSecretManager.retrievePassword(ContainerTokenSecretManager.java:109)
>     at org.apache.hadoop.yarn.server.security.ContainerTokenSecretManager.retrievePassword(ContainerTokenSecretManager.java:44)
>     at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>     at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>     at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>     at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1199)
>     at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1393)
>     at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:710)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:509)
>     at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:484)
> Caused by: java.security.InvalidKeyException: No installed provider supports this key: javax.crypto.spec.SecretKeySpec
>     at javax.crypto.Mac.a(DashoA13*..)
>     at javax.crypto.Mac.init(DashoA13*..)
>     at org.apache.hadoop.security.token.SecretManager.createPassword(SecretManager.java:151)
>     ... 11 more
> {code}
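Devaraj's observation (re-initializing a shared Mac fails, while a fresh instance with the same key works) suggests creating the Mac per computation rather than sharing one; javax.crypto.Mac is not thread-safe. A hypothetical sketch of that per-call pattern, not the actual ContainerTokenSecretManager code:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class HmacPerCall {
    // Create a fresh Mac per call instead of re-initializing a shared one.
    static byte[] createPassword(byte[] identifier, SecretKeySpec key) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1"); // new instance each invocation
        mac.init(key);
        return mac.doFinal(identifier);
    }

    public static void main(String[] args) throws Exception {
        SecretKeySpec key = new SecretKeySpec("secret".getBytes("UTF-8"), "HmacSHA1");
        // HMAC is deterministic: two independent Mac instances with the same
        // key and input produce identical passwords.
        byte[] p1 = createPassword("container_1".getBytes("UTF-8"), key);
        byte[] p2 = createPassword("container_1".getBytes("UTF-8"), key);
        System.out.println(java.util.Arrays.equals(p1, p2)); // prints true
    }
}
```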
[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries
[ https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503863#comment-13503863 ]

Jason Lowe commented on YARN-243:
---------------------------------

I tried replicating this with a sleep job, manually killing the AM to force AM retries. In this case the client reconnected to the new AM attempt and continued to show map/reduce progress for the new attempt.
[jira] [Commented] (YARN-241) Node Manager fails to launch containers after NM restart in secure mode
[ https://issues.apache.org/jira/browse/YARN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503836#comment-13503836 ]

Daryn Sharp commented on YARN-241:
----------------------------------

Is this maybe caused by a race condition where the NM receives a container token before the RM registration completes and delivers the secret keys for the container tokens?
[jira] [Commented] (YARN-244) Application Master Retries fail due to FileNotFoundException
[ https://issues.apache.org/jira/browse/YARN-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503832#comment-13503832 ]

Jason Lowe commented on YARN-244:
---------------------------------

Could you provide a bit more detail from the AM logs when this occurs? I'm not able to reproduce this with a sleep job and manually killing the AM to simulate failure. Normally the AM tries to determine if it is the last attempt and only deletes the files if it is convinced there will be no more attempts. If you could provide steps to reproduce, or details from the AM logs showing why it decided to remove the staging directory, that would help clarify what's going on in this case.