[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sam liu updated MAPREDUCE-4490:
-------------------------------

    Attachment: MAPREDUCE-4490.patch

The attached patch works well in my local environment and should resolve the 
current issue. Any feedback is welcome! 

> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4490
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.205.0, 1.0.3
>            Reporter: George Datskos
>            Assignee: sam liu
>         Attachments: MAPREDUCE-4490.patch
>
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
> 1) with more map tasks in a job than there are map slots in the cluster will 
> result in immediate task failures for the second task in each JVM (and then 
> the JVM exits). We have investigated this bug and the root cause is as 
> follows. When using LinuxTaskController, the userlog directory for a task 
> attempt (../userlogs/job/task-attempt) is created only once, when the JVM is 
> launched, because userlog directories are created by the task-controller 
> binary, which runs only *once* per JVM. Therefore, for every task after the 
> first in a reused JVM, attempting to create log.index is guaranteed to fail 
> with ENOENT, leading to immediate task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
> logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM 
> as that of the first task 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> ENOENT: No such file or directory
>         at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
>         at 
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
>         at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
>         at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
>         at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27. Task 6 goes 
> smoothly. Then task 27 starts. The directory 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0
> is never created, so when mapred.Child tries to write the log.index file for 
> task 27, it fails with ENOENT because the attempt_201207241401_0013_m_000027_0 
> directory does not exist. Therefore, the second task in each JVM is guaranteed 
> to fail (and then the JVM exits) every time LinuxTaskController is used. Note 
> that this problem does not occur with the DefaultTaskController, because it 
> creates the userlog directories for each task (not just once per JVM as 
> LinuxTaskController does).
> For each task, the TaskRunner calls the TaskController's createLogDir method 
> before attempting to write out an index file.
> * DefaultTaskController#createLogDir: creates the log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** the task-controller binary creates the log directory 
> [create_attempt_directories], but only for the first task (contrast sketched 
> below)
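> To make the contrast concrete, here is a standalone illustration; the paths, 
> class name and method bodies below are simplified placeholders, not the 
> actual 1.0.x source:
> {code:java}
> import java.io.File;
> import java.io.IOException;
> 
> // Illustration only: why per-JVM directory creation breaks the second task
> // attempt in a reused JVM, while per-task creation does not.
> public class LogDirSketch {
> 
>   // DefaultTaskController-style behaviour: the attempt log directory is
>   // created for *every* attempt, so writing log.index always finds a parent.
>   static void createLogDirPerTask(File jobLogDir, String attemptId) throws IOException {
>     File attemptDir = new File(jobLogDir, attemptId);
>     if (!attemptDir.isDirectory() && !attemptDir.mkdirs()) {
>       throw new IOException("Cannot create " + attemptDir);
>     }
>   }
> 
>   // LinuxTaskController-style behaviour: createLogDir is a no-op; the setuid
>   // task-controller binary created a directory once, at JVM launch, for the
>   // first attempt only.
>   static void createLogDirNoop(File jobLogDir, String attemptId) {
>     // nothing happens for later attempts in a reused JVM
>   }
> 
>   public static void main(String[] args) throws IOException {
>     File jobLogDir = new File("/tmp/userlogs/job_sketch");  // placeholder path
>     new File(jobLogDir, "attempt_1").mkdirs();              // done at JVM launch
> 
>     // Second attempt reusing the JVM: nothing creates attempt_2, so the later
>     // open of attempt_2/log.index fails with ENOENT.
>     createLogDirNoop(jobLogDir, "attempt_2");
>     System.out.println(new File(jobLogDir, "attempt_2").isDirectory()
>         ? "attempt_2 directory exists"
>         : "attempt_2 directory missing -> ENOENT when writing log.index");
> 
>     // DefaultTaskController-style path: the directory is created per attempt,
>     // so the same write would succeed.
>     createLogDirPerTask(jobLogDir, "attempt_3");
>     System.out.println("attempt_3 directory exists: "
>         + new File(jobLogDir, "attempt_3").isDirectory());
>   }
> }
> {code}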
> Possible Solution: add a new *initialize task* command to the task-controller 
> binary that creates the attempt directories, and invoke that command via 
> ShellCommandExecutor from the LinuxTaskController#createLogDir method.
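> A rough sketch of that direction (the command name, its arguments, and the 
> surrounding plumbing are placeholders, not the actual task-controller 
> interface):
> {code:java}
> import java.io.IOException;
> 
> import org.apache.hadoop.util.Shell.ShellCommandExecutor;
> 
> // Illustration only: instead of doing nothing, LinuxTaskController#createLogDir
> // could shell out to the setuid task-controller for every attempt, so the
> // attempt log directory is created with the correct ownership.
> public class InitializeTaskSketch {
> 
>   static void createLogDir(String taskControllerExe, String user,
>                            String jobId, String attemptId) throws IOException {
>     String[] cmd = new String[] {
>         taskControllerExe, user,
>         "initialize-task",        // hypothetical new task-controller command
>         jobId, attemptId };
>     ShellCommandExecutor shExec = new ShellCommandExecutor(cmd);
>     shExec.execute();             // throws if the binary exits non-zero
>   }
> }
> {code}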



--
This message was sent by Atlassian JIRA
(v6.1#6144)
