[jira] Commented: (HADOOP-1431) Map tasks can't timeout for failing to call progress

Raghu Angadi (JIRA) Fri, 01 Jun 2007 10:47:39 -0700

    [ https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500778 ]


Raghu Angadi commented on HADOOP-1431:
--------------------------------------


Doug's comment that was posted to HADOOP-1134 by mistake:

{quote}
Calvin Yu noted on hadoop-user that join() seems to sometimes hang even if the 
thread has been interrupted. In other places we use the idiom of a 'running' 
flag that's checked in a thread's loop in conjunction with an interrupt, rather 
than interrupt+join, and that seems to be reliable. So I think we should switch 
to that here too.

Also, in the current patch, I don't see why the thread is held in a field. I 
worry that someone might add code like 'if (sortProgressThread == null) ...', 
and that we might somehow not always null this field. If it is kept in a local 
variable around the call then this is much less of a risk.

So I think we should convert the createProgressThread method to a nested class 
whose constructor starts the thread and which has a stop() method that sets a 
flag. It would also be good if the 'try' block could be shared between 
'collect()' and 'flush()'. I think this calls for a new method something like:

private void sortWithProgress() {
  ProgressThread progress = new ProgressThread();
  try {
    sortAndSpillToDisk();
  } finally {
    progress.stop();
  }
}
{quote}
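A minimal Java sketch of the structure Doug describes, assuming a hypothetical `ProgressReporter` callback, a hypothetical `sortAndSpillToDisk()` stand-in that just sleeps for about 50 ms, and an arbitrary 10 ms reporting period (none of these are Hadoop's actual classes or values). The thread is held only in a local variable inside `sortWithProgress()`, and `stop()` clears a `volatile` running flag and interrupts the sleep instead of calling `join()`.

```java
/**
 * Minimal sketch of the "running flag + interrupt" idiom. ProgressReporter,
 * sortAndSpillToDisk() and the 10 ms period are hypothetical stand-ins,
 * not Hadoop's real classes or settings.
 */
class ProgressThreadSketch {

  /** Hypothetical callback standing in for the task's progress reporter. */
  interface ProgressReporter {
    void progress();
  }

  /** Nested class: the constructor starts the thread, stop() ends it. */
  static class ProgressThread {
    // Checked on every pass of the loop; volatile so stop() is seen promptly.
    private volatile boolean running = true;
    private final Thread thread;

    ProgressThread(final ProgressReporter reporter, final long periodMillis) {
      thread = new Thread(() -> {
        while (running) {
          reporter.progress();
          try {
            Thread.sleep(periodMillis);
          } catch (InterruptedException e) {
            // Normally woken early by stop(); the loop re-checks 'running'.
          }
        }
      }, "sort-progress");
      thread.setDaemon(true);  // a forgotten stop() cannot keep the JVM alive
      thread.start();
    }

    /** Clears the flag, then interrupts to cut the sleep short; no join(). */
    void stop() {
      running = false;
      thread.interrupt();
    }
  }

  /** Hypothetical stand-in for the real sort/spill; it just takes ~50 ms. */
  static void sortAndSpillToDisk() {
    try {
      Thread.sleep(50);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();  // preserve the interrupt status
    }
  }

  /** The progress thread lives only in this local variable, never a field. */
  static void sortWithProgress(ProgressReporter reporter) {
    ProgressThread progress = new ProgressThread(reporter, 10);
    try {
      sortAndSpillToDisk();
    } finally {
      progress.stop();
    }
  }
}
```

Because the `ProgressThread` is never stored in a field, there is no `sortProgressThread == null` check for later code to get wrong. As the issue description points out, though, a timer thread like this keeps reporting progress even if the sort itself is stuck.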


> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1431_1_20070525.patch, 
> HADOOP-1431_2_20070530.patch, HADOOP-1431_3_20070601.patch
>
>
> Currently the map task runner creates a thread that calls progress every 
> second to keep the system from killing the map if the sort takes too long. 
> This is the wrong approach, because it prevents stuck tasks from being 
> killed. The right solution is to have the sort call progress as it actually 
> makes progress. This is part of what is going on in HADOOP-1374: a map gets 
> stuck at 100% progress but never finishes.
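One way to read "have the sort call progress as it actually makes progress" is to report from inside the comparison step. The minimal Java sketch below assumes a hypothetical `ProgressReporter` callback and an arbitrary interval of one report per 1000 comparisons, and uses `java.util.Arrays.sort` in place of Hadoop's own sort; it is not the HADOOP-1431 patch. If the sort stalls, comparisons stop and so do the progress reports, which leaves the task timeout free to kill it.

```java
import java.util.Arrays;
import java.util.Comparator;

/**
 * Sketch: report progress from inside the sort itself. ProgressReporter and
 * the 1000-comparison interval are hypothetical, not Hadoop's real API.
 */
class ProgressingSortSketch {

  /** Hypothetical callback standing in for the task's progress reporter. */
  interface ProgressReporter {
    void progress();
  }

  /** Wraps a comparator so every 'interval'-th comparison reports progress. */
  static <T> Comparator<T> reportingEvery(final Comparator<T> inner,
                                          final long interval,
                                          final ProgressReporter reporter) {
    if (interval <= 0) {
      throw new IllegalArgumentException("interval must be positive");
    }
    return new Comparator<T>() {
      private long compares = 0;  // comparisons seen so far (single-threaded)

      @Override
      public int compare(T a, T b) {
        if (++compares % interval == 0) {
          reporter.progress();  // only reached while the sort is doing work
        }
        return inner.compare(a, b);
      }
    };
  }

  /** Sorts in place; a sort that stops comparing also stops reporting. */
  static <T> void sortWithProgress(T[] records, Comparator<T> order,
                                   ProgressReporter reporter) {
    Arrays.sort(records, reportingEvery(order, 1000, reporter));
  }
}
```

The interval trades reporting overhead against how quickly a slow but healthy sort is seen as alive; 1000 is an arbitrary choice for this sketch.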

-- 
This message is automatically generated by JIRA.
