[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output

2015-06-17 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589373#comment-14589373
 ] 

Devaraj K commented on MAPREDUCE-6401:
--

Have you checked the container/task logs? You could probably find the reason 
for this failure in the task logs.

 Container-launch failure gives no debugging output
 --

 Key: MAPREDUCE-6401
 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6401
 Project: Hadoop Map/Reduce
  Issue Type: Bug
  Components: mrv2
Affects Versions: 2.6.0
 Environment: HDP 2.2
Reporter: Hari Sekhon
 Attachments: job.log


 MR jobs are failing on my cluster with Stack trace: ExitCodeException 
 exitCode=7 but little else in terms of debugging information. Can we please 
 improve the debugging info? Log file is attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output

2015-06-17 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589509#comment-14589509
 ] 

Hari Sekhon commented on MAPREDUCE-6401:


Yes, the attached log is from the container task logs obtained via the RM.

I would have expected a failing shell call to print the command, the status 
code, and both stderr and stdout, so that the failure can be debugged and 
reproduced. Something is clearly different, since it is the new nodes in the 
cluster that are giving these problems, but it's not clear what is wrong 
because the debug output is so vague. I'm trying to fix it blindfolded.
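
To illustrate the kind of output being asked for, here is a minimal Python 
sketch of a command wrapper whose failure message carries the command, the exit 
status, stdout and stderr. It is not Hadoop code: the names CommandFailed and 
run_checked are hypothetical, and the failing command in the demo is just a 
child Python process that exits with status 7.

```python
import subprocess
import sys


class CommandFailed(Exception):
    """Raised on a non-zero exit; keeps everything needed to reproduce it."""

    def __init__(self, argv, returncode, stdout, stderr):
        self.argv = argv
        self.returncode = returncode
        self.stdout = stdout
        self.stderr = stderr
        super().__init__(
            f"command {argv!r} exited with status {returncode}\n"
            f"--- stdout ---\n{stdout}\n"
            f"--- stderr ---\n{stderr}"
        )


def run_checked(argv):
    """Run argv without a shell and return its stdout.

    Raises CommandFailed, carrying the command line, exit status and both
    output streams, if the process exits with a non-zero status.
    """
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise CommandFailed(argv, result.returncode, result.stdout, result.stderr)
    return result.stdout


if __name__ == "__main__":
    # Hypothetical failing command: a child process that writes to stderr
    # and exits with status 7.
    failing = [sys.executable, "-c",
               "import sys; sys.stderr.write('boom'); sys.exit(7)"]
    try:
        run_checked(failing)
    except CommandFailed as err:
        print(err)
```

With a message of this shape, the person reading the log can copy the command, 
rerun it on the affected node, and see the same stderr that the launcher saw.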



[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output

2015-06-17 Thread Devaraj K (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589538#comment-14589538
 ] 

Devaraj K commented on MAPREDUCE-6401:
--

The attached log seems to be the MRAppMaster log (not the task logs), which 
shows the overall MR job logs covering all the tasks.

bq. I'm trying to fix it blindfolded.
I understand that these logs can be improved to give detailed failure 
information instead of just exit codes. If you want to know the reason for the 
container launch failure, you can check the corresponding failed container/task 
log, either through the Job History Server UI (if log aggregation is enabled) 
or directly in the app log dirs on the failed node, where the container 
stderr/stdout files are kept.
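
As a rough aid for the lookup described above, the Python sketch below derives 
the application ID from a container ID and builds a `yarn logs -applicationId` 
command line. The helper names application_id and yarn_logs_command are 
hypothetical, and the command only returns anything when log aggregation is 
enabled and the logs have actually been aggregated.

```python
import re

# Container IDs have the form
#   container_[e<epoch>_]<clusterTimestamp>_<appSeq>_<attempt>_<containerSeq>
# and the owning application is application_<clusterTimestamp>_<appSeq>.
_CONTAINER_ID = re.compile(r"^container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$")


def application_id(container_id):
    """Derive the YARN application ID that a container ID belongs to."""
    match = _CONTAINER_ID.match(container_id)
    if match is None:
        raise ValueError(f"not a container ID: {container_id!r}")
    cluster_timestamp, app_seq = match.groups()
    return f"application_{cluster_timestamp}_{app_seq}"


def yarn_logs_command(container_id):
    """Build the argv that fetches the aggregated logs of the application."""
    return ["yarn", "logs", "-applicationId", application_id(container_id)]


if __name__ == "__main__":
    # Container ID as reported elsewhere in this thread.
    print(" ".join(yarn_logs_command("container_e199_1434474871820_0001_02_19")))
```

If aggregation is disabled, the same application ID is still useful for finding 
the per-application directory under the NodeManager's local log dirs on the 
failed node.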




[jira] [Commented] (MAPREDUCE-6401) Container-launch failure gives no debugging output

2015-06-17 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-6401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14589925#comment-14589925
 ] 

Hari Sekhon commented on MAPREDUCE-6401:


Actually the task logs showed the same thing, not much to go on:
{code}
Exception from container-launch.
Container id: container_e199_1434474871820_0001_02_19
Exit code: 7
Stack trace: ExitCodeException exitCode=7:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
    at org.apache.hadoop.util.Shell.run(Shell.java:455)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
    at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:293)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)

Shell output: main : command provided 1
main : user is custom_scrubbed
main : requested yarn user is custom_scrubbed

Container exited with a non-zero exit code 7
{code}
but the full task logs don't seem to have been retained by the history server. 
This made me suspicious, so I reset the logging locations to try to get my 
hands on the full logs, and after a YARN restart jobs started working normally 
again without failed tasks or container launches. Although I'm very certain 
that the cluster used to log to the dir I reset it to, perhaps Ambari had a bug 
that lost the location and reset it to debug locations that didn't work 
properly (it wouldn't be the first time, e.g. AMBARI-9022).

I think we should leave this as a minor todo to improve debugging information, 
especially when launching shell commands and encountering non-zero exit codes. 
Logging is king.
