[ 
https://issues.apache.org/jira/browse/MAPREDUCE-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744793#comment-13744793
 ] 

sam liu commented on MAPREDUCE-4490:
------------------------------------

Hi,

Following the issue description, I am trying to put together a patch, since we 
hit the same issue in our Hadoop cluster. 

First, I added a function in task-controller.c:
int initialize_task(const char *user, const char *good_local_dirs,
                    const char *job_id, const char *task_id) {
    // Prepare the attempt directories for the task JVM.
    return create_attempt_directories(user, good_local_dirs, job_id, task_id);
}

Of course, I also modified task-controller.h, task-controller.c, and main.c. 
After that, I tried to call the new function through ShellCommandExecutor in 
LinuxTaskController#createLogDir. However, I found that 
LinuxTaskController#createLogDir takes only two parameters (TaskAttemptID 
taskID, boolean isCleanup), which does not match the parameters of 
initialize_task(const char *user, const char *good_local_dirs, const char 
*job_id, const char *task_id): we cannot get the user, local dirs, job ID, and 
task ID from within LinuxTaskController#createLogDir.
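
One partial idea, offered as a rough sketch rather than a tested patch: the job 
ID looks recoverable from the attempt ID itself, since the log paths in the 
description pair job_201207241401_0013 with 
attempt_201207241401_0013_m_000006_0. If that naming holds in general (an 
assumption on my part), the native side could derive job_id from task_id 
instead of taking it as a separate argument. The helper name 
job_id_from_attempt_id below is hypothetical and does not exist in 
task-controller. The user and good_local_dirs arguments would still have to 
come from somewhere else.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of task-controller): derive the job ID from an
 * attempt ID assumed to look like
 *   attempt_<cluster-timestamp>_<job-seq>_<m|r>_<task-seq>_<attempt-seq>
 * Returns 0 on success, -1 if the input is malformed or the buffer is too
 * small. */
int job_id_from_attempt_id(const char *attempt_id, char *out, size_t out_len) {
    static const char prefix[] = "attempt_";
    const size_t prefix_len = sizeof(prefix) - 1;

    if (attempt_id == NULL || out == NULL ||
        strncmp(attempt_id, prefix, prefix_len) != 0) {
        return -1;
    }

    const char *rest = attempt_id + prefix_len;   /* "<cluster>_<job>_m_..." */
    const char *first = strchr(rest, '_');        /* end of cluster timestamp */
    if (first == NULL) {
        return -1;
    }
    const char *second = strchr(first + 1, '_');  /* end of job sequence number */
    if (second == NULL) {
        return -1;
    }

    /* Copy "<cluster>_<job>" behind a "job_" prefix. */
    int n = snprintf(out, out_len, "job_%.*s", (int)(second - rest), rest);
    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    return 0;
}
```

For example, passing "attempt_201207241401_0013_m_000027_0" with a 64-byte 
buffer fills it with "job_201207241401_0013".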

Any suggestions on how to get past this? It is blocking my progress.

Thanks a lot! 


                
> JVM reuse is incompatible with LinuxTaskController (and therefore 
> incompatible with Security)
> ---------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4490
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4490
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task-controller, tasktracker
>    Affects Versions: 0.20.205.0, 1.0.3
>            Reporter: George Datskos
>
> When using LinuxTaskController, JVM reuse (mapred.job.reuse.jvm.num.tasks > 
> 1) with more map tasks in a job than there are map slots in the cluster will 
> result in immediate task failures for the second task in each JVM (and then 
> the JVM exits). We have investigated this bug and the root cause is as 
> follows. When using LinuxTaskController, the userlog directory for a task 
> attempt (../userlogs/job/task-attempt) is created only on the first 
> invocation (when the JVM is launched) because userlogs directories are 
> created by the task-controller binary which only runs *once* per JVM. 
> Therefore, attempting to create log.index is guaranteed to fail with ENOENT 
> leading to immediate task failure and child JVM exit.
> {quote}
> 2012-07-24 14:29:11,914 INFO org.apache.hadoop.mapred.TaskLog: Starting 
> logging for a new task attempt_201207241401_0013_m_000027_0 in the same JVM 
> as that of the first task 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000006_0
> 2012-07-24 14:29:11,915 WARN org.apache.hadoop.mapred.Child: Error running 
> child
> ENOENT: No such file or directory
>         at org.apache.hadoop.io.nativeio.NativeIO.open(Native Method)
>         at 
> org.apache.hadoop.io.SecureIOUtils.createForWrite(SecureIOUtils.java:161)
>         at org.apache.hadoop.mapred.TaskLog.writeToIndexFile(TaskLog.java:296)
>         at org.apache.hadoop.mapred.TaskLog.syncLogs(TaskLog.java:369)
>         at org.apache.hadoop.mapred.Child.main(Child.java:229)
> {quote}
> The above error occurs in a JVM which runs tasks 6 and 27.  Task6 goes 
> smoothly. Then Task27 starts. The directory 
> /var/log/hadoop/mapred/userlogs/job_201207241401_0013/attempt_201207241401_0013_m_000027_0
> is never created so when mapred.Child tries to write the log.index file for 
> Task27, it fails with ENOENT because the 
> attempt_201207241401_0013_m_000027_0 directory does not exist. Therefore, 
> the second task in each JVM is guaranteed to fail (and then the JVM exits) 
> every time when using LinuxTaskController. Note that this problem does not 
> occur when using the DefaultTaskController because the userlogs directories 
> are created for each task (not just for each JVM as with LinuxTaskController).
> For each task, the TaskRunner calls the TaskController's createLogDir method 
> before attempting to write out an index file.
> * DefaultTaskController#createLogDir: creates log directory for each task
> * LinuxTaskController#createLogDir: does nothing
> ** task-controller binary creates log directory [create_attempt_directories] 
> (but only for the first task)
> Possible Solution: add a new command to task-controller *initialize task* to 
> create attempt directories.  Call that command, with ShellCommandExecutor, in 
> the LinuxTaskController#createLogDir method
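
To make the failure mode described above concrete, here is a small standalone 
C sketch (not Hadoop code). It shows that creating log.index inside an attempt 
directory that was never created fails with ENOENT, and that the same call 
succeeds once the directory is created for that task. The function names 
write_index_file and ensure_attempt_dir and the permission bits are 
illustrative assumptions, not names or values taken from task-controller.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to create <attempt_dir>/log.index.
 * Returns 0 on success, otherwise the errno value of the failure. */
int write_index_file(const char *attempt_dir) {
    char path[4096];
    int n = snprintf(path, sizeof(path), "%s/log.index", attempt_dir);
    if (n < 0 || (size_t)n >= sizeof(path)) {
        return ENAMETOOLONG;
    }

    /* open() does not create missing parent directories, so this fails
     * with ENOENT when attempt_dir does not exist. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0640);
    if (fd < 0) {
        return errno;
    }
    close(fd);
    return 0;
}

/* Create the attempt directory if it is missing (parent must exist).
 * Returns 0 on success or if the path already exists, otherwise errno. */
int ensure_attempt_dir(const char *attempt_dir) {
    if (mkdir(attempt_dir, 0750) == 0 || errno == EEXIST) {
        return 0;
    }
    return errno;
}
```

Calling write_index_file on a fresh attempt path returns ENOENT. Calling 
ensure_attempt_dir first, which is roughly what a per-task createLogDir step 
would need to do, makes the same write_index_file call return 0.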

