[ http://issues.apache.org/jira/browse/HADOOP-190?page=all ]

[EMAIL PROTECTED] updated HADOOP-190:
-------------------------------------

    Attachment: nocleanifdone.patch

Here's a suggested patch: if the task has already been marked 'done', don't remove
its output. (I haven't tested this patch; the condition is awkward to manufacture.)
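
The intent of the patch can be illustrated as a guard on the TaskTracker's cleanup path. This is a minimal sketch, assuming the TaskTracker knows whether the child reported 'done' before it was killed. The names CleanupGuard, TaskState and shouldRemoveOutput are hypothetical and are not taken from the patch or from the Hadoop source.

```java
// Hypothetical sketch: CleanupGuard, TaskState and shouldRemoveOutput are
// illustrative names only, not Hadoop's actual TaskTracker API.
public class CleanupGuard {

    /** Simplified task states assumed for this sketch. */
    enum TaskState { RUNNING, DONE, FAILED, KILLED }

    /**
     * Decide whether the local output of a task should be deleted when its
     * child JVM is killed. A task that already reported 'done' keeps its
     * output, because the JobTracker has told reducers to fetch it.
     */
    static boolean shouldRemoveOutput(TaskState stateWhenKilled) {
        return stateWhenKilled != TaskState.DONE;
    }

    public static void main(String[] args) {
        for (TaskState s : TaskState.values()) {
            System.out.println(s + " -> remove output? " + shouldRemoveOutput(s));
        }
    }
}
```

The child is still killed in every case; only the deletion of output that has already been advertised as complete is skipped.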

> Job fails though task succeeded if we fail to exit
> --------------------------------------------------
>
>          Key: HADOOP-190
>          URL: http://issues.apache.org/jira/browse/HADOOP-190
>      Project: Hadoop
>         Type: Bug
>     Reporter: [EMAIL PROTECTED]
>  Attachments: nocleanifdone.patch
>
> This is an odd case. The main cause will usually be programmer error, but I
> suppose it could also happen during normal processing. Either way, it would be
> grand if Hadoop were better able to deal with it.
> My map task completed 'successfully', but I had started threads inside my task
> that were not marked as daemon threads, and under certain circumstances they
> were left running. Because the JVM won't exit while non-daemon threads are
> still running, my child process stuck around after reporting 'done'. After ten
> minutes, the TaskTracker (TT) steps in, kills the child, and cleans up the
> successful output. Because the JobTracker (JT) has been told the task completed
> successfully, reducers keep showing up looking for the now-removed output until
> the job fails.
> Below is an illustration of the problem from the log output:
> ....
> 060501 090401 task_0001_m_000798_0 0.99491096% adding 
> http://www.score.umd.edu/a
> um.jpg 24891 image/jpeg
> 060501 090401 task_0001_m_000798_0 1.0% adding 
> http://www.score.umd.edu/album.jp
> 24891 image/jpeg
> 060501 090401 Task task_0001_m_000798_0 is done.
> ...
> 060501 091410 task_0001_m_000798_0: Task failed to report status for 608 seconds
> Killing.
> ....
> 060501 091410 Calling cleanup because was killed or FAILED task_0001_m_000798_0
> 060501 091410 task_0001_m_000798_0 done; removing files.
> Subsequently:
> 060501 091422 SEVERE Can't open map output:/1/hadoop/tmp/task_0001_m_000798_0/pa
> -12.out
> java.io.FileNotFoundException: LocalFS
> ...
> and on and on.
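
On the task side, one way to avoid the hang is to make helper threads daemon threads, since the JVM exits once only daemon threads remain. A minimal sketch, assuming the task creates its helpers through an ExecutorService; DaemonHelpers and daemonFactory are hypothetical names for illustration, not part of Hadoop:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

// Hypothetical helper: not part of Hadoop, shown only to illustrate daemon threads.
public class DaemonHelpers {

    /** Returns a ThreadFactory whose threads cannot keep the JVM alive. */
    static ThreadFactory daemonFactory(String namePrefix) {
        return new ThreadFactory() {
            private int count = 0;

            @Override
            public synchronized Thread newThread(Runnable r) {
                Thread t = new Thread(r, namePrefix + "-" + (count++));
                t.setDaemon(true); // must be set before the thread is started
                return t;
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService pool =
                Executors.newFixedThreadPool(2, daemonFactory("map-helper"));
        // A worker that never finishes on its own, like the stray threads above.
        pool.submit(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Even without pool.shutdown(), the JVM can exit after main returns,
        // because the only remaining worker thread is a daemon.
        System.out.println("main returning");
    }
}
```

Daemon threads are abandoned abruptly when the JVM exits, so any helper whose work must finish before the task reports 'done' should instead be joined (or its pool shut down and awaited) explicitly.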

-- 
This message is automatically generated by JIRA.