Hi Vinod,
Here is the Diagnostics message from the RM Web UI page:
Application application_1424919411720_0878 failed 10 times due to
Error launching appattempt_1424919411720_0878_10. Got exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:209)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:226)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:198)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
. Failing the application.
The log link only shows the following message and does not produce any
stdout or stderr files:
Logs not available for container_1424919411720_0878_08_01_14.
Aggregation may not be complete, Check back later or try the
nodemanager at hadoopdn01:8041
Here is the screenshot:
https://dl.dropboxusercontent.com/u/33705885/2015-03-02_163138.png
Thank you.
On Sat, Feb 28, 2015 at 2:56 AM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
That's an old JIRA. The right solution is not an AM-retry interval but
launching the AM somewhere else.
Why is your AM failing in the first place? If it is due to a full disk, the
situation should be better with YARN-1781. Can you use the configuration
(yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage)
added in YARN-1781?
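For reference, a minimal sketch of how that property might be set in
yarn-site.xml on each NodeManager. The value 90.0 is only an illustrative
assumption, not a recommendation; the property takes a percentage between
0.0 and 100.0, and a local or log directory whose disk utilization exceeds
it is marked as bad. Check yarn-default.xml for the default in your release.

```xml
<!-- yarn-site.xml (NodeManager side); the 90.0 value is an example only -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
```

The NodeManagers will most likely need a restart to pick up the change.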
+Vinod
On Feb 27, 2015, at 7:31 AM, Ted Yu yuzhih...@gmail.com wrote:
Looks like this is related:
https://issues.apache.org/jira/browse/YARN-964
On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid
nur.kholis.ma...@gmail.com wrote:
Hi All,
Many of my jobs have failed because the AM tries to rerun the job at a very
short interval (only 6 seconds). How can I increase the interval to a larger
value?
https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png
Thank you.