[ https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744793#comment-13744793 ]
sam liu commented on MAPREDUCE-4490:
------------------------------------

Hi,

Following the description, I am trying to provide a patch, since we have encountered the same issue in our Hadoop cluster.

First, I added a function to task-controller.c:

    int initialize_task(const char *user, const char *good_local_dirs,
                        const char *job_id, const char *task_id) {
      // Prepare the attempt directories for the task JVM.
      int result = create_attempt_directories(user, good_local_dirs, job_id, task_id);
      return result;
    }

Of course, I also modified task-controller.h, task-controller.c and main.c accordingly. After that, I tried to call this new command through ShellCommandExecutor in LinuxTaskController#createLogDir. However, I found that the existing LinuxTaskController#createLogDir takes only two parameters (TaskAttemptID taskID, boolean isCleanup), which do not cover the parameters of initialize_task(const char *user, const char *good_local_dirs, const char *job_id, const char *task_id): from LinuxTaskController#createLogDir we cannot get the user, local dirs, job ID or task ID.

Any suggestions on this issue? It is blocking my progress. Thanks a lot!

> JVM reuse is incompatible with LinuxTaskController (and therefore incompatible with Security)
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4490
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.205.0, 1.0.3
>            Reporter: George Datskos
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 1)
> with more map tasks in a job than there are map slots in the cluster will
> result in immediate task failures for the second task in each JVM (and then
> the JVM exits). We have investigated this bug and the root cause is as
> follows.
>
> When using LinuxTaskController, the userlog directory for a task
> attempt (../userlogs/job/task-attempt) is created only on the first
> invocation (when the JVM is launched), because userlogs directories are
> created by the task-controller binary, which runs only *once* per JVM.
> Therefore, for every later task in a reused JVM, attempting to create
> log.index is guaranteed to fail with ENOENT, leading to immediate task
> failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM as that of the first task /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running child
> ENOENT: No such file or directory
> 	at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
> 	at org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
> 	at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
> 	at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27. Task 6 goes
> smoothly. Then task 27 starts. The directory
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0
> is never created, so when mapred.Child tries to write the log.index file for
> task 27, it fails with ENOENT because the
> attempt_201207241401_0013_m_000027_0 directory does not exist. Therefore,
> the second task in each JVM is guaranteed to fail (and then the JVM exits)
> every time LinuxTaskController is used. Note that this problem does not
> occur with DefaultTaskController, because there the userlogs directories
> are created for each task (not just once per JVM as with LinuxTaskController).
>
> For each task, the TaskRunner calls the TaskController's createLogDir method
> before attempting to write out an index file.
> * DefaultTaskController#createLogDir: creates the log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** the task-controller binary creates the log directory [create_attempt_directories] (but only for the first task)
>
> Possible solution: add a new command, *initialize task*, to task-controller to
> create the attempt directories, and call that command with ShellCommandExecutor
> from LinuxTaskController#createLogDir.
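The root cause described in the issue can be reproduced outside Hadoop with plain POSIX calls. The C sketch below is hypothetical: create_attempt_dir, write_log_index, run_reused_jvm and make_temp_root are names made up for illustration, not functions from task-controller.c, and the directory layout <root>/<job>/<attempt>/log.index is assumed from the log paths above. It simulates one reused JVM running two attempts. When the attempt directories are created only before the first attempt (the LinuxTaskController behaviour described above), opening log.index for the second attempt fails with ENOENT. When they are created before every attempt (as DefaultTaskController#createLogDir does), both attempts succeed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a fresh, empty userlogs root under /tmp.
 * Returns buf on success, NULL on failure (e.g. buf too small). */
char *make_temp_root(char *buf, size_t len) {
    snprintf(buf, len, "/tmp/mr4490-XXXXXX");
    return mkdtemp(buf);
}

/* Hypothetical helper: create <root>/<job>/<attempt>, treating an
 * already-existing directory as success. Paths are assumed to fit. */
int create_attempt_dir(const char *root, const char *job, const char *attempt) {
    char path[4096];
    snprintf(path, sizeof path, "%s/%s", root, job);
    if (mkdir(path, 0750) != 0 && errno != EEXIST)
        return -1;
    snprintf(path, sizeof path, "%s/%s/%s", root, job, attempt);
    if (mkdir(path, 0750) != 0 && errno != EEXIST)
        return -1;
    return 0;
}

/* Hypothetical stand-in for writing log.index: returns 0 on success,
 * otherwise the errno from open(2) (ENOENT if the directory is missing). */
int write_log_index(const char *root, const char *job, const char *attempt) {
    char path[4096];
    snprintf(path, sizeof path, "%s/%s/%s/log.index", root, job, attempt);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Simulate one reused JVM running n attempts in order.
 * per_task == 0: directories are created only before the first attempt,
 *                mimicking the LinuxTaskController behaviour in the issue.
 * per_task != 0: directories are created before every attempt, as
 *                DefaultTaskController#createLogDir does.
 * Returns how many attempts wrote log.index; *last_err receives the errno
 * of the first failure, or 0 if there was none. */
int run_reused_jvm(const char *root, const char *job,
                   const char *const attempts[], int n,
                   int per_task, int *last_err) {
    assert(root != NULL && job != NULL && attempts != NULL && last_err != NULL);
    int ok = 0;
    *last_err = 0;
    for (int i = 0; i < n; i++) {
        if ((per_task || i == 0) &&
            create_attempt_dir(root, job, attempts[i]) != 0) {
            *last_err = errno;
            break;
        }
        int err = write_log_index(root, job, attempts[i]);
        if (err != 0) {
            *last_err = err;
            break; /* the child JVM exits after the first failure */
        }
        ok++;
    }
    return ok;
}
```

With the two attempt IDs from the log and a fresh root from make_temp_root, run_reused_jvm(root, job, attempts, 2, 0, &err) returns 1 and sets err to ENOENT, while the same call with per_task set to 1 on another fresh root returns 2 and sets err to 0. The sketch does not clean up the temporary directories it creates.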
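Regarding the parameter mismatch raised in the comment: the userlog paths in the log above embed the job ID job_201207241401_0013 inside the attempt ID attempt_201207241401_0013_m_000027_0, so the job and attempt IDs can presumably be derived from the TaskAttemptID that createLogDir already receives (on the Java side, for example via its getJobID() method). The user and the good local directories are not encoded in the ID and would still have to come from elsewhere. The C sketch below only illustrates the string derivation; job_id_from_attempt and attempt_log_dir are hypothetical names, and the ID layout attempt_<jt>_<num>_<type>_<task>_<try> is inferred from the examples in this issue rather than taken from a specification.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn "attempt_<jt>_<num>_<type>_<task>_<try>" into
 * "job_<jt>_<num>". Returns 0 on success, -1 if attempt_id is not in that
 * form or the output buffer is too small. */
int job_id_from_attempt(const char *attempt_id, char *out, size_t out_len) {
    assert(attempt_id != NULL && out != NULL);
    static const char prefix[] = "attempt_";
    size_t plen = sizeof prefix - 1;
    if (strncmp(attempt_id, prefix, plen) != 0)
        return -1;
    const char *rest = attempt_id + plen;                    /* "<jt>_<num>_..." */
    const char *sep1 = strchr(rest, '_');                    /* end of <jt>     */
    const char *sep2 = sep1 ? strchr(sep1 + 1, '_') : NULL;  /* end of <num>    */
    if (sep2 == NULL)
        return -1;
    int n = snprintf(out, out_len, "job_%.*s", (int)(sep2 - rest), rest);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}

/* Hypothetical helper: build "<userlogs_root>/<job_id>/<attempt_id>",
 * the per-attempt directory layout seen in the log above. */
int attempt_log_dir(const char *userlogs_root, const char *attempt_id,
                    char *out, size_t out_len) {
    char job_id[128];
    if (job_id_from_attempt(attempt_id, job_id, sizeof job_id) != 0)
        return -1;
    int n = snprintf(out, out_len, "%s/%s/%s", userlogs_root, job_id, attempt_id);
    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

For the second attempt in the log, attempt_log_dir("/var/log/hadoop/mapred/userlogs", "attempt_201207241401_0013_m_000027_0", ...) yields the directory that the child JVM could not find. Parsing the string in C is only one option; passing the already-split IDs from the Java side may be simpler.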