NodeManager exits without specific log messages.

2017-09-22 Thread Nur Kholis Majid
Hi, one of my NM nodes periodically exits with this error log:
https://paste.ee/p/hc104

Does anyone have an idea about this?

Thank you.




Re: How to set AM attempt interval?

2015-03-02 Thread Nur Kholis Majid
Hi Vinod,

Here is the Diagnostics message from the RM Web UI page:
Application application_1424919411720_0878 failed 10 times due to
Error launching appattempt_1424919411720_0878_10. Got exception:
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readFully(DataInputStream.java:169)
at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:209)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.setupTokens(AMLauncher.java:226)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.createAMContainerLaunchContext(AMLauncher.java:198)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:108)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:254)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
. Failing the application.

The log link shows only the following message and doesn't provide any
stdout or stderr files:
Logs not available for container_1424919411720_0878_08_01_14.
Aggregation may not be complete, Check back later or try the
nodemanager at hadoopdn01:8041

Here is the screenshot:
https://dl.dropboxusercontent.com/u/33705885/2015-03-02_163138.png

Thank you.

On Sat, Feb 28, 2015 at 2:56 AM, Vinod Kumar Vavilapalli
vino...@hortonworks.com wrote:
 That's an old JIRA. The right solution is not an AM-retry interval but
 launching the AM somewhere.

 Why is your AM failing in the first place? If it is due to full-disk, the
 situation should be better with YARN-1781 - can you use the configuration
 (yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage)
 added at YARN-1781?
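
For illustration, the kind of check that this setting controls can be approximated as below. This is only a sketch, not the NodeManager's own code: it assumes the threshold is a percentage of used space on the filesystem holding each directory, `over_threshold` is a hypothetical helper, and the 90.0 value and the directory are placeholders to be replaced with the cluster's configured values and its local and log dirs.

```python
import shutil
import tempfile


def utilization_percent(path):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(dirs, max_percent):
    """Return the directories whose filesystem is fuller than max_percent."""
    return [d for d in dirs if utilization_percent(d) > max_percent]


if __name__ == "__main__":
    # Placeholder inputs (assumptions): use the NodeManager's real local/log
    # dirs and the configured percentage instead.
    for d in over_threshold([tempfile.gettempdir()], 90.0):
        print(d, "exceeds the utilization threshold")
```

A directory that shows up in the output is a candidate for being treated as unhealthy, which would be one explanation for containers failing to launch on that node.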

 +Vinod

 On Feb 27, 2015, at 7:31 AM, Ted Yu yuzhih...@gmail.com wrote:

 Looks like this is related:
 https://issues.apache.org/jira/browse/YARN-964

 On Fri, Feb 27, 2015 at 4:29 AM, Nur Kholis Majid
 nur.kholis.ma...@gmail.com wrote:

 Hi All,

 I have many jobs that failed because the AM tries to rerun the job after
 a very short interval (only 6 seconds). How can I increase the interval
 to a larger value?

 https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png

 Thank you.





How to set AM attempt interval?

2015-02-27 Thread Nur Kholis Majid
Hi All,

I have many jobs that failed because the AM tries to rerun the job after
a very short interval (only 6 seconds). How can I increase the interval
to a larger value?

https://dl.dropboxusercontent.com/u/33705885/2015-02-27_145104.png

Thank you.


Which script generates the launch_container.sh file?

2014-09-25 Thread Nur Kholis Majid
Hi,

Which script generates the launch_container.sh file when we run yarn
jar? The file is located in
${yarn.nodemanager.local-dirs}/nm-local-dir/usercache/${username}/appcache/application_*/container_*/

I need to insert a command to change the LOG_DIRS permissions.
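
To inspect the generated files on a node, the path pattern above can be expanded with a short script. This is a sketch under assumptions: `find_launch_scripts` is a hypothetical helper, `local_dir` is taken to be the directory that directly contains `usercache`, and the example path is a placeholder.

```python
import glob
import os


def find_launch_scripts(local_dir):
    """List launch_container.sh files under one NodeManager local dir."""
    pattern = os.path.join(
        local_dir, "usercache", "*", "appcache",
        "application_*", "container_*", "launch_container.sh",
    )
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    # Placeholder path (assumption): substitute the node's real local dir.
    for script in find_launch_scripts("/tmp/nm-local-dir"):
        print(script)
```

The script only lists matching files; it does not show which component writes them.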

Thanks.