[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output
[ https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589373#comment-14589373 ] Devaraj K commented on MAPREDUCE-6401: -- Have you checked the container/task logs? You could probably find the reason for this failure in the task logs. Container-launch failure gives no debugging output -- Key: MAPREDUCE-6401 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Attachments: job.log MR jobs are failing on my cluster with Stack trace: ExitCodeException exitCode=7 but little else in terms of debugging information. Can we please improve the debugging info? Log file is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output
[ https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589509#comment-14589509 ] Hari Sekhon commented on MAPREDUCE-6401: Yes the attached log is from the container task logs obtained via the RM. I would have expected a failing shell call should print the command, status code and both stderr and stdout with which to debug/reproduce. It's clear something is different since these are new nodes in the cluster that are giving these problems but it's not clear what is wrong because the debug output is so vague - I'm trying to fix it blindfolded. Container-launch failure gives no debugging output -- Key: MAPREDUCE-6401 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Attachments: job.log MR jobs are failing on my cluster with Stack trace: ExitCodeException exitCode=7 but little else in terms of debugging information. Can we please improve the debugging info? Log file is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output
[ https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589538#comment-14589538 ] Devaraj K commented on MAPREDUCE-6401: -- The attached log seems to be MRAppMaster log(not the task logs), where you could see overall MR Job(all the tasks) logs. bq. I'm trying to fix it blindfolded. I understand that these logs can be improved further to get the detailed failed information instead of just exit codes. If you want to know the reason for container launch failure, you can check the corresponding failed container/task log by going to Job History Server UI(if log aggregation enabled) or directly by going to the app log dir’s for the container stderr/stdout in the failed node. Container-launch failure gives no debugging output -- Key: MAPREDUCE-6401 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Attachments: job.log MR jobs are failing on my cluster with Stack trace: ExitCodeException exitCode=7 but little else in terms of debugging information. Can we please improve the debugging info? Log file is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output
[ https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14589925#comment-14589925 ] Hari Sekhon commented on MAPREDUCE-6401: Actually the task logs showed the same thing, not much to go on: {code} Exception from container-launch. Container id: container_e199_1434474871820_0001_02_19 Exit code: 7 Stack trace: ExitCodeException exitCode=7: at org.apache.hadoop.util.Shell.runCommand(Shell.java:538) at org.apache.hadoop.util.Shell.run(Shell.java:455) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715) at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:262) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:745) Shell output: main : command provided 1 main : user is custom_scrubbed main : requested yarn user is custom_scrubbed Container exited with a non-zero exit code 7 {code} but the full tasks logs don't seem to have been retained by the history server. This made me suspicious so I reset the logging locations to try to get my hands on the full logs and after a yarn restart jobs started working normally again without failed tasks/container launches. Although I'm very certain that the cluster used to log to that dir I reset it to, perhaps Ambari had a bug that lost the location and reset to debug locations that didn't work properly (it wouldn't be the first time, eg. AMBARI-9022) I think we should leave this as a minor todo to improve debugging information, especially when launching shell commands and encountering non-zero exit codes, logging is king. Container-launch failure gives no debugging output -- Key: MAPREDUCE-6401 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401 Project: Hadoop Map/Reduce Issue Type: Bug Components: mrv2 Affects Versions: 2.6.0 Environment: HDP 2.2 Reporter: Hari Sekhon Attachments: job.log MR jobs are failing on my cluster with Stack trace: ExitCodeException exitCode=7 but little else in terms of debugging information. Can we please improve the debugging info? Log file is attached. -- This message was sent by Atlassian JIRA (v6.3.4#6332)