[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13935044#comment-13935044
 ] 

Jason Lowe commented on MAPREDUCE-5792:
---------------------------------------

bq.  It becomes very hard to find AM logs for a failed AM because clicking the 
"logs" link from the RM page takes you to the NM it executed on and with log 
aggregation, the logs get pushed to HDFS very quickly, and then the NM just 
throws an error that the container doesn't exist.

This is clearly a bug.  We run with log aggregation and routinely have users 
debugging failed AM startups by clicking on the AM log links.  The link goes to 
the NM, which redirects to the history server, which then shows the logs.  If 
this isn't working, then either there's a regression or the cluster isn't 
configured properly.  Is yarn.log.server.url configured properly so the NM can 
redirect to the log server after logs have been aggregated?
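
For reference, a minimal yarn-site.xml sketch of the settings involved.  The 
hostname jhs.example.com is a placeholder, and port 19888 assumes the history 
server web UI is running on its default address; adjust both for the actual 
cluster.

{code:xml}
<!-- Sketch only: host below is a placeholder, not a real server. -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <!-- Where the NM redirects log requests once logs are aggregated. -->
  <name>yarn.log.server.url</name>
  <value>http://jhs.example.com:19888/jobhistory/logs</value>
</property>
{code}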

> When mapreduce.jobhistory.intermediate-done-dir isn't writable, application 
> fails with generic error
> ----------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5792
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5792
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mr-am, mrv2
>    Affects Versions: 2.3.0
>            Reporter: Travis Thompson
>            Assignee: Mohammad Kamrul Islam
>
> When trying to run an application and the permissions are wrong on 
> {{mapreduce.jobhistory.intermediate-done-dir}}, the MapReduce AM fails with a 
> non-descriptive error message:
> {noformat}
> Application application_1394227890066_0004 failed 2 times due to AM Container 
> for appattempt_1394227890066_0004_000002 exited with exitCode: 1 due to: 
> Exception from container-launch:
> org.apache.hadoop.util.Shell$ExitCodeException:
> at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
> at org.apache.hadoop.util.Shell.run(Shell.java:418)
> at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
> at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
> at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> main : command provided 1
> main : user is tthompso
> main : requested yarn user is tthompso
> Container exited with a non-zero exit code 1
> .Failing this attempt.. Failing the application. 
> {noformat}
> When permissions are corrected on this dir, applications are able to run.  
> There should probably be some sort of check on this dir before launching the 
> AM so a more meaningful error message can be thrown.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
