[jira] [Commented] (HADOOP-2980) slow reduce copies - map output locations not being fetched even when map complete

Alexandre Normand (JIRA) Mon, 03 Dec 2012 16:58:01 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-2980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13509391#comment-13509391
 ]


Alexandre Normand commented on HADOOP-2980:
-------------------------------------------

I found this ticket through Google after seeing something similar. Occasionally, some of our reduce task attempts fail, and their logs show this:
{code}
2012-12-03 16:46:36,980 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 36 map output(s) where 1 is already in progress
2012-12-03 16:46:36,980 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 4 outputs (0 slow hosts and29 dup hosts)
2012-12-03 16:46:38,983 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 2 outputs (0 slow hosts and29 dup hosts)
2012-12-03 16:47:36,992 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
2012-12-03 16:47:36,993 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
...
2012-12-03 16:54:37,080 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
2012-12-03 16:54:37,080 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
2012-12-03 16:55:37,228 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
2012-12-03 16:55:37,228 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
2012-12-03 16:56:37,235 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Need another 30 map output(s) where 1 is already in progress
2012-12-03 16:56:37,236 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201212031640_0003_r_000045_0 Scheduled 0 outputs (0 slow hosts and29 dup hosts)
{code}

Eventually, the attempt is killed for not reporting its status for 600 seconds, and the retry then completes quickly.
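One way to tell how close a stuck attempt gets to that kill is to measure how long its "Need another N map output(s)" count stays unchanged. The sketch below is a hypothetical helper (`longest_stall` is not part of Hadoop). It assumes each log entry sits on a single line, as in the raw task log, and it assumes the 600-second limit is the default `mapred.task.timeout` of 600000 ms. The sample lines are copied from the excerpt above.

```python
import re
from datetime import datetime

# Assumption: default mapred.task.timeout of 600000 ms, matching the 600 s kill above.
TASK_TIMEOUT_S = 600

# Hypothetical pattern for the ReduceTask progress lines quoted above:
# "<date> <time,ms> INFO <logger>: <attempt id> Need another <N> map output(s) ..."
NEED_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) INFO \S+ "
    r"(\S+) Need another (\d+) map output"
)
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"  # %f accepts the 3-digit millisecond field


def longest_stall(lines):
    """Return (attempt_id, remaining, seconds) for the longest interval during
    which a reduce attempt reported the same number of outstanding map outputs."""
    first_seen = {}  # attempt id -> (remaining count, time that count first appeared)
    worst = (None, None, 0.0)
    for line in lines:
        m = NEED_RE.match(line)
        if not m:
            continue  # skips e.g. the "Scheduled N outputs" lines
        ts = datetime.strptime(m.group(1), TS_FORMAT)
        attempt, remaining = m.group(2), int(m.group(3))
        prev = first_seen.get(attempt)
        if prev is None or prev[0] != remaining:
            # First sighting or progress was made: restart the stall clock.
            first_seen[attempt] = (remaining, ts)
            continue
        stalled = (ts - prev[1]).total_seconds()
        if stalled > worst[2]:
            worst = (attempt, remaining, stalled)
    return worst


ATTEMPT = "attempt_201212031640_0003_r_000045_0"
LOGGER = "INFO org.apache.hadoop.mapred.ReduceTask:"
NEED_30 = "Need another 30 map output(s) where 1 is already in progress"
SAMPLE = [
    f"2012-12-03 16:46:36,980 {LOGGER} {ATTEMPT} Need another 36 map output(s) where 1 is already in progress",
    f"2012-12-03 16:46:36,980 {LOGGER} {ATTEMPT} Scheduled 4 outputs (0 slow hosts and29 dup hosts)",
    f"2012-12-03 16:47:36,992 {LOGGER} {ATTEMPT} {NEED_30}",
    f"2012-12-03 16:54:37,080 {LOGGER} {ATTEMPT} {NEED_30}",
    f"2012-12-03 16:56:37,235 {LOGGER} {ATTEMPT} {NEED_30}",
]

if __name__ == "__main__":
    attempt, remaining, secs = longest_stall(SAMPLE)
    print(f"{attempt}: stuck at {remaining} outputs for {secs:.0f} s "
          f"({secs / TASK_TIMEOUT_S:.0%} of the assumed {TASK_TIMEOUT_S} s timeout)")
```

Applied to these lines, the helper reports the attempt stuck at 30 remaining outputs for roughly nine minutes, close to the timeout that eventually kills it.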

Since I'm running Hadoop 2.0.0, I'm wondering whether this is the same issue or a completely different one. Also, the fact that this bug has remained dormant for so long makes me wonder whether people simply no longer see it, and whether that could be configuration-related.
> slow reduce copies - map output locations not being fetched even when map complete
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-2980
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2980
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 0.15.3
>            Reporter: Joydeep Sen Sarma
>
> Maps are long finished, but reduces are stuck looking for map locations. They
> make progress, but slowly; it almost seems like they get new map locations
> every minute or so:
> 2008-03-07 18:50:52,737 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 done copying task_200803041231_3586_m_004620_0 output from hadoop082.sf2p.facebook.com..
> 2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0: Got 0 new map-outputs & 0 obsolete map-outputs from tasktracker and 0 map-outputs from previous failures
> 2008-03-07 18:50:53,733 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 0 known map output location(s); scheduling...
> ...
> 2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Got 50 known map output location(s); scheduling...
> 2008-03-07 18:51:49,767 INFO org.apache.hadoop.mapred.ReduceTask: task_200803041231_3586_r_000021_0 Scheduled 41 of 50 known outputs (0 slow hosts and 9 dup hosts)
> They get about 50 locations at a time, and this one-minute delay pattern is
> surprisingly common.


