[ https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589925#comment-14589925 ]
Hari Sekhon commented on MAPREDUCE-6401:
----------------------------------------

Actually the task logs showed the same thing, not much to go on:

{code}
Exception from container-launch.
Container id: container_e199_1434474871820_0001_02_000019
Exit code: 7
Stack trace: ExitCodeException exitCode=7:
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
	at org.apache.hadoop.util.Shell.run(Shell.java:455)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
	at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is <custom_scrubbed>
main : requested yarn user is <custom_scrubbed>

Container exited with a non-zero exit code 7
{code}

but the full task logs don't seem to have been retained by the history server.

This made me suspicious, so I reset the logging locations to try to get my hands on the full logs. After a YARN restart, jobs started working normally again without failed tasks or container launches.

I'm very certain that the cluster used to log to the dir I reset it to, so perhaps Ambari had a bug that lost the location and reset it to debug locations that didn't work properly (it wouldn't be the first time, e.g. AMBARI-9022).

I think we should leave this as a minor todo to improve debugging information, especially when launching shell commands and encountering non-zero exit codes. Logging is king.
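To illustrate the kind of debugging information meant here, below is a minimal sketch in plain Java, assuming a POSIX `sh` is on the path. It is not Hadoop's `Shell` or `LinuxContainerExecutor` code; `ShellDiagnostics`, `Result`, `run` and `describe` are hypothetical names invented for this example. The idea is simply to keep the command line, the exit code and the merged stdout/stderr together so a non-zero exit can be reported in one message.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Hadoop): runs a command and keeps failure context. */
class ShellDiagnostics {

    /** Outcome of one command run. */
    public static final class Result {
        public final List<String> command;
        public final int exitCode;
        public final String output; // stdout and stderr, merged

        Result(List<String> command, int exitCode, String output) {
            this.command = command;
            this.exitCode = exitCode;
            this.output = output;
        }

        /** Single-message summary suitable for a log line or an exception message. */
        public String describe() {
            String shown = output.trim().isEmpty() ? "<none>" : output.trim();
            return "command=" + command + " exitCode=" + exitCode + " output=" + shown;
        }
    }

    /** Runs the command, drains its output, and returns the exit code with that output. */
    public static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // merge stderr into stdout so one reader drains both
        Process process = pb.start();

        StringBuilder captured = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                captured.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor(); // output is fully drained, so this cannot block on a full pipe
        return new Result(command, exitCode, captured.toString());
    }

    public static void main(String[] args) throws Exception {
        // Simulated failure: the command text below is made up for the example.
        Result result = run(Arrays.asList("sh", "-c", "echo 'simulated failure' >&2; exit 7"));
        if (result.exitCode != 0) {
            System.err.println("Launch failed: " + result.describe());
        }
    }
}
```

Merging stderr into stdout keeps the sketch short and avoids needing a second reader thread; a real implementation might prefer to keep the two streams separate and cap how much output it retains.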
> Container-launch failure gives no debugging output
> --------------------------------------------------
>
>                 Key: MAPREDUCE-6401
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 2.6.0
>         Environment: HDP 2.2
>            Reporter: Hari Sekhon
>         Attachments: job.log
>
>
> MR jobs are failing on my cluster with Stack trace: ExitCodeException
> exitCode=7 but little else in terms of debugging information. Can we please
> improve the debugging info? Log file is attached.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)