[jira] [Commented] (YARN-244) Application Master Retries fail due to FileNotFoundException

2012-11-26 Thread Bikas Saha (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504386#comment-13504386
 ] 

Bikas Saha commented on YARN-244:
-

Did you check whether AM retries were configured to be > 1? Without that, the 
last attempt will delete the files. If the AM is being retried by the RM, then 
this value should already be > 1, though, so there could be a bug.
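For reference, a rough sketch of the knob in question (the property names and
defaults are my assumptions for the 2.0.x line, and the attempt id is
hypothetical; this is not the actual MR AM code):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Illustration only: the retry settings discussed above (property names assumed
// for the 2.0.x era) and an AM-style "am I the last attempt?" check.
public class AmRetryCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Allow more than one AM attempt (assumed property names).
    conf.setInt("mapreduce.am.max-attempts", 4);
    conf.setInt("yarn.resourcemanager.am.max-retries", 4);

    // Clean up staging files only when this is the last allowed attempt.
    int currentAttemptId = 1; // hypothetical value for illustration
    int maxAttempts = conf.getInt("mapreduce.am.max-attempts", 1);
    boolean isLastAttempt = currentAttemptId >= maxAttempts;
    System.out.println("delete staging files now? " + isLastAttempt);
  }
}
{code}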

> Application Master Retries fail due to FileNotFoundException
> 
>
> Key: YARN-244
> URL: https://issues.apache.org/jira/browse/YARN-244
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Blocker
>
> Application attempt 1 deletes the job-related files, so they are not 
> present in HDFS for the following retries.
> {code:xml}
> Application application_1353724754961_0001 failed 4 times due to AM Container for appattempt_1353724754961_0001_04 exited with exitCode: -1000 due to: RemoteTrace: 
> java.io.FileNotFoundException: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:752)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:88)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>   at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>   at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:822)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:492)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:221)
>   at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>   at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> .Failing this attempt.. Failing the application. 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )

2012-11-26 Thread Sandy Ryza (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504234#comment-13504234
 ] 

Sandy Ryza commented on YARN-72:


The newest patch contains a test and a timeout.  The timeout is 
yarn.nodemanager.sleep-delay-before-sigkill.ms + 
yarn.nodemanager.process-kill-wait.ms + 1000.  Should I make this configurable?
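For what it's worth, a sketch of how that timeout could be derived from
configuration (the default values here are placeholders, not the real
defaults):

{code:java}
import org.apache.hadoop.conf.Configuration;

// Sketch only: compute the test timeout described above from the two NM settings.
public class ShutdownTestTimeout {
  public static long computeTimeoutMs(Configuration conf) {
    long sigkillDelayMs =
        conf.getLong("yarn.nodemanager.sleep-delay-before-sigkill.ms", 250);
    long processKillWaitMs =
        conf.getLong("yarn.nodemanager.process-kill-wait.ms", 2000);
    // SIGKILL delay + process-kill wait + a one second buffer.
    return sigkillDelayMs + processKillWaitMs + 1000;
  }
}
{code}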

> NM should handle cleaning up containers when it shuts down ( and kill 
> containers from an earlier instance when it comes back up after an unclean 
> shutdown )
> ---
>
> Key: YARN-72
> URL: https://issues.apache.org/jira/browse/YARN-72
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Sandy Ryza
> Attachments: YARN-72-1.patch, YARN-72.patch
>
>
> Ideally, when the NM gets a shutdown signal, it should wait a limited amount 
> of time for existing containers to complete and then kill the containers (if 
> we pick an aggressive approach) after this time interval. 
> For NMs which come up after an unclean shutdown, the NM should look through 
> its directories for existing container.pids and try to kill any existing 
> containers matching the pids found. 
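A purely illustrative sketch of the recovery half of this, assuming pid files
are written to a known directory with a .pid suffix (the real NM layout may
differ; this is not the NM implementation):

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Illustration only: after an unclean shutdown, scan a directory for leftover
// container pid files and kill the corresponding processes.
public class LeftoverContainerKiller {
  public static void killLeftovers(File pidDir) throws IOException, InterruptedException {
    File[] pidFiles = pidDir.listFiles((dir, name) -> name.endsWith(".pid"));
    if (pidFiles == null) {
      return; // directory missing or unreadable
    }
    for (File pidFile : pidFiles) {
      String pid = new String(Files.readAllBytes(pidFile.toPath()),
          StandardCharsets.UTF_8).trim();
      // Send SIGKILL to the stale container process (POSIX only).
      new ProcessBuilder("kill", "-9", pid).start().waitFor();
    }
  }
}
{code}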

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-72) NM should handle cleaning up containers when it shuts down ( and kill containers from an earlier instance when it comes back up after an unclean shutdown )

2012-11-26 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-72:
---

Attachment: YARN-72-1.patch

> NM should handle cleaning up containers when it shuts down ( and kill 
> containers from an earlier instance when it comes back up after an unclean 
> shutdown )
> ---
>
> Key: YARN-72
> URL: https://issues.apache.org/jira/browse/YARN-72
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Reporter: Hitesh Shah
>Assignee: Sandy Ryza
> Attachments: YARN-72-1.patch, YARN-72.patch
>
>
> Ideally, when the NM gets a shutdown signal, it should wait a limited amount 
> of time for existing containers to complete and then kill the containers (if 
> we pick an aggressive approach) after this time interval. 
> For NMs which come up after an unclean shutdown, the NM should look through 
> its directories for existing container.pids and try to kill any existing 
> containers matching the pids found. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-224) Fair scheduler logs too many nodeUpdate INFO messages

2012-11-26 Thread Karthik Kambatla (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504075#comment-13504075
 ] 

Karthik Kambatla commented on YARN-224:
---

+1

> Fair scheduler logs too many nodeUpdate INFO messages
> -
>
> Key: YARN-224
> URL: https://issues.apache.org/jira/browse/YARN-224
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-224-1.patch, YARN-224.patch
>
>
> The RM logs are filled with an INFO message that the fair scheduler logs 
> every time it receives a nodeUpdate.  It should be taken out or demoted to 
> DEBUG.
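A hedged sketch of what "demote to DEBUG" typically looks like (class and
message are illustrative, not the fair scheduler's actual code):

{code:java}
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;

// Illustration only: log the per-heartbeat message at DEBUG and guard it so it
// costs nothing when debug logging is off.
public class NodeUpdateLoggingExample {
  private static final Log LOG = LogFactory.getLog(NodeUpdateLoggingExample.class);

  void nodeUpdate(String nodeId) {
    if (LOG.isDebugEnabled()) {
      LOG.debug("nodeUpdate received from " + nodeId);
    }
    // ... scheduling work ...
  }
}
{code}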

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (YARN-224) Fair scheduler logs too many nodeUpdate INFO messages

2012-11-26 Thread Sandy Ryza (JIRA)

 [ 
https://issues.apache.org/jira/browse/YARN-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sandy Ryza updated YARN-224:


Attachment: YARN-224-1.patch

> Fair scheduler logs too many nodeUpdate INFO messages
> -
>
> Key: YARN-224
> URL: https://issues.apache.org/jira/browse/YARN-224
> Project: Hadoop YARN
>  Issue Type: Bug
>Affects Versions: 2.0.2-alpha
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Attachments: YARN-224-1.patch, YARN-224.patch
>
>
> The RM logs are filled with an INFO message that the fair scheduler logs 
> every time it receives a nodeUpdate.  It should be taken out or demoted to 
> DEBUG.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries

2012-11-26 Thread Vinod Kumar Vavilapalli (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504011#comment-13504011
 ] 

Vinod Kumar Vavilapalli commented on YARN-243:
--

Agree with Jason. We shouldn't workaround it on the client-side. 

I think we should close this as a duplicate of MAPREDUCE-4819.

> Job Client doesn't give progress for Application Master Retries
> ---
>
> Key: YARN-243
> URL: https://issues.apache.org/jira/browse/YARN-243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> If we configure AM retries and the first attempt fails, the RM will create 
> the next attempt, but the Job Client doesn't show progress for the retry 
> attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries

2012-11-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503996#comment-13503996
 ] 

Jason Lowe commented on YARN-243:
-

Filed MAPREDUCE-4819 to track the issue where the AM can re-run the job after 
reporting success to the client.  This could be particularly bad if the job 
succeeded but was rerun and re-committed its output just as another job was 
trying to consume it.

> Job Client doesn't give progress for Application Master Retries
> ---
>
> Key: YARN-243
> URL: https://issues.apache.org/jira/browse/YARN-243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> If we configure AM retries and the first attempt fails, the RM will create 
> the next attempt, but the Job Client doesn't show progress for the retry 
> attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries

2012-11-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503896#comment-13503896
 ] 

Jason Lowe commented on YARN-243:
-

That doesn't sound like something to fix on the client side.  If the AM told 
the client that the job failed, then the job should have failed.  The attempt 
can die between the time it tells the client the final job status and the time 
it tells the RM, and IMHO we should fix things so the subsequent AM attempt 
doesn't retry the job but rather simply updates the RM with the failed status 
found from the previous attempt.  Otherwise we run into bad situations where 
we've already told the client the job failed, but the job subsequently retries 
(possibly from scratch, depending upon the output format's support for 
recovery) and could succeed.  If the job has decided to fail and has already 
told the client, an AM attempt failure while trying to report that same 
decision to the RM shouldn't allow the job to subsequently succeed, IMHO.

> Job Client doesn't give progress for Application Master Retries
> ---
>
> Key: YARN-243
> URL: https://issues.apache.org/jira/browse/YARN-243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> If we configure AM retries and the first attempt fails, the RM will create 
> the next attempt, but the Job Client doesn't show progress for the retry 
> attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-237) Refreshing the RM page forgets how many rows I had in my Datatables

2012-11-26 Thread Robert Joseph Evans (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503890#comment-13503890
 ] 

Robert Joseph Evans commented on YARN-237:
--

You have to be careful with cookies because the web app proxy strips out 
cookies before sending the data to the application.

> Refreshing the RM page forgets how many rows I had in my Datatables
> ---
>
> Key: YARN-237
> URL: https://issues.apache.org/jira/browse/YARN-237
> Project: Hadoop YARN
>  Issue Type: Improvement
>  Components: resourcemanager
>Affects Versions: 2.0.2-alpha, 0.23.4, 3.0.0
>Reporter: Ravi Prakash
>
> If I choose 100 rows and then refresh the page, DataTables goes back to 
> showing me 20 rows.
> This user preference should be stored in a cookie.
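A generic sketch of keeping such a preference in a cookie using the plain
Servlet API (names are illustrative, and note the comment above about the web
app proxy stripping cookies, which limits this approach):

{code:java}
import javax.servlet.http.Cookie;
import javax.servlet.http.HttpServletResponse;

// Illustration only: remember the chosen DataTables page size in a cookie.
public class PageSizePreference {
  static void rememberPageSize(HttpServletResponse resp, int rows) {
    Cookie cookie = new Cookie("datatables-page-size", Integer.toString(rows));
    cookie.setMaxAge(60 * 60 * 24 * 30); // keep the preference for ~30 days
    resp.addCookie(cookie);
  }
}
{code}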

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries

2012-11-26 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503878#comment-13503878
 ] 

Devaraj K commented on YARN-243:


If we kill the AM, the client connects to the new AM by getting the latest app 
report from the RM, but if the AM attempt fails (the job fails), the RM will 
start a new attempt and the client shows the previous attempt's status, i.e. 
the Job Failed status. I think we need to handle this case on the client side.

> Job Client doesn't give progress for Application Master Retries
> ---
>
> Key: YARN-243
> URL: https://issues.apache.org/jira/browse/YARN-243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> If we configure AM retries and the first attempt fails, the RM will create 
> the next attempt, but the Job Client doesn't show progress for the retry 
> attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-241) Node Manager fails to launch containers after NM restart in secure mode

2012-11-26 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503867#comment-13503867
 ] 

Devaraj K commented on YARN-241:


From what I observed while debugging, the secret key is present but 
mac.init(key) is failing, and it fails for all subsequent invocations. If we 
try a new Mac instance with the same secret key, it succeeds.
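A standalone sketch of that observation (the algorithm name is HmacSHA1, which
I believe is SecretManager's default; the key bytes here are placeholders):

{code:java}
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Illustration only: a *fresh* Mac instance initializes fine with the same
// secret key, whereas the cached instance kept throwing on init.
public class MacReinitCheck {
  public static void main(String[] args) throws Exception {
    byte[] keyBytes = new byte[20]; // placeholder key material
    SecretKeySpec key = new SecretKeySpec(keyBytes, "HmacSHA1");

    Mac mac = Mac.getInstance("HmacSHA1");
    mac.init(key); // succeeds on a new instance
    byte[] password = mac.doFinal("container-token".getBytes("UTF-8"));
    System.out.println("computed " + password.length + " password bytes");
  }
}
{code}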

> Node Manager fails to launch containers after NM restart in secure mode
> ---
>
> Key: YARN-241
> URL: https://issues.apache.org/jira/browse/YARN-241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Priority: Blocker
>
> After the Node Manager is restarted, it fails to launch containers with the 
> exception below.
>  {code:xml}
> 2012-11-24 17:21:56,141 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8048: readAndProcess threw exception java.lang.IllegalArgumentException: Invalid key to HMAC computation from client 158.1.131.10. Count of bytes read: 0
> java.lang.IllegalArgumentException: Invalid key to HMAC computation
>   at org.apache.hadoop.security.token.SecretManager.createPassword(SecretManager.java:153)
>   at org.apache.hadoop.yarn.server.security.ContainerTokenSecretManager.retrievePassword(ContainerTokenSecretManager.java:109)
>   at org.apache.hadoop.yarn.server.security.ContainerTokenSecretManager.retrievePassword(ContainerTokenSecretManager.java:44)
>   at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>   at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>   at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>   at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>   at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1199)
>   at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1393)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:710)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:509)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:484)
> Caused by: java.security.InvalidKeyException: No installed provider supports this key: javax.crypto.spec.SecretKeySpec
>   at javax.crypto.Mac.a(DashoA13*..)
>   at javax.crypto.Mac.init(DashoA13*..)
>   at org.apache.hadoop.security.token.SecretManager.createPassword(SecretManager.java:151)
>   ... 11 more
>  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-243) Job Client doesn't give progress for Application Master Retries

2012-11-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503863#comment-13503863
 ] 

Jason Lowe commented on YARN-243:
-

I tried replicating this with a sleep job and manually killing the AM to force 
AM retries.  In this case the client reconnected to the new AM attempt and 
continued to show map/reduce progress for the new attempt.

> Job Client doesn't give progress for Application Master Retries
> ---
>
> Key: YARN-243
> URL: https://issues.apache.org/jira/browse/YARN-243
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: client, resourcemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>
> If we configure AM retries and the first attempt fails, the RM will create 
> the next attempt, but the Job Client doesn't show progress for the retry 
> attempts.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-241) Node Manager fails to launch containers after NM restart in secure mode

2012-11-26 Thread Daryn Sharp (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503836#comment-13503836
 ] 

Daryn Sharp commented on YARN-241:
--

Could this be caused by a race condition where the NM receives a container 
token before its registration with the RM completes, i.e. before it has 
received the secret keys for the container tokens?

> Node Manager fails to launch containers after NM restart in secure mode
> ---
>
> Key: YARN-241
> URL: https://issues.apache.org/jira/browse/YARN-241
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: nodemanager
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Priority: Blocker
>
> After the Node Manager is restarted, it fails to launch containers with the 
> exception below.
>  {code:xml}
> 2012-11-24 17:21:56,141 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8048: readAndProcess threw exception java.lang.IllegalArgumentException: Invalid key to HMAC computation from client 158.1.131.10. Count of bytes read: 0
> java.lang.IllegalArgumentException: Invalid key to HMAC computation
>   at org.apache.hadoop.security.token.SecretManager.createPassword(SecretManager.java:153)
>   at org.apache.hadoop.yarn.server.security.ContainerTokenSecretManager.retrievePassword(ContainerTokenSecretManager.java:109)
>   at org.apache.hadoop.yarn.server.security.ContainerTokenSecretManager.retrievePassword(ContainerTokenSecretManager.java:44)
>   at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
>   at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
>   at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
>   at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
>   at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1199)
>   at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1393)
>   at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:710)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:509)
>   at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:484)
> Caused by: java.security.InvalidKeyException: No installed provider supports this key: javax.crypto.spec.SecretKeySpec
>   at javax.crypto.Mac.a(DashoA13*..)
>   at javax.crypto.Mac.init(DashoA13*..)
>   at org.apache.hadoop.security.token.SecretManager.createPassword(SecretManager.java:151)
>   ... 11 more
>  {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (YARN-244) Application Master Retries fail due to FileNotFoundException

2012-11-26 Thread Jason Lowe (JIRA)

[ 
https://issues.apache.org/jira/browse/YARN-244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503832#comment-13503832
 ] 

Jason Lowe commented on YARN-244:
-

Could you provide a bit more detail from the AM logs when this occurs?  I'm 
not able to reproduce this with a sleep job and manually killing the AM to 
simulate failure.  Normally the AM tries to determine whether it is the last 
attempt and only deletes the files if it is convinced there will be no more 
attempts.  If you could provide steps to reproduce, or details from the AM 
logs showing why it decided to remove the staging directory, that would help 
clarify what's going on in this case.

> Application Master Retries fail due to FileNotFoundException
> 
>
> Key: YARN-244
> URL: https://issues.apache.org/jira/browse/YARN-244
> Project: Hadoop YARN
>  Issue Type: Bug
>  Components: applications
>Affects Versions: 2.0.2-alpha, 2.0.1-alpha
>Reporter: Devaraj K
>Assignee: Devaraj K
>Priority: Blocker
>
> Application attempt 1 deletes the job-related files, so they are not 
> present in HDFS for the following retries.
> {code:xml}
> Application application_1353724754961_0001 failed 4 times due to AM Container for appattempt_1353724754961_0001_04 exited with exitCode: -1000 due to: RemoteTrace: 
> java.io.FileNotFoundException: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>   at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:752)
>   at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:88)
>   at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157)
>   at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153)
>   at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:662)
> at LocalTrace: org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: File does not exist: hdfs://hacluster:8020/tmp/hadoop-yarn/staging/mapred/.staging/job_1353724754961_0001/appTokens
>   at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl.java:217)
>   at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb.LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationService.java:822)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocalizationService.java:492)
>   at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:221)
>   at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.LocalizationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java:46)
>   at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtocolService$2.callBlockingMethod(LocalizationProtocol.java:57)
>   at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
>   at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1688)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1686)
> .Failing this attempt.. Failing the application. 
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira