[ 
https://issues.apache.org/jira/browse/HADOOP-3327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600734#action_12600734
 ] 

Jothi Padmanabhan commented on HADOOP-3327:
-------------------------------------------

There are two possible optimizations to help mitigate the problem. 

Optimization 1 (At Job tracker)
============

The JobTracker could decide when to re-execute a map based on the system
load. System load would be characterized by the total number of map slots
available across the whole cluster and the number of unfinished map tasks in
the queue.

For example,
Load = (Total Map Slots available - Total Unfinished Maps) / Total Map Slots

One possible strategy (possible default values: x = 50%, y = 75%):
1. If (Load < x), re-execute on the first fetch failure notification itself.
2. If (x < Load < y), re-execute on the second fetch failure notification.
3. Always re-execute (irrespective of the system load) on the third notification.
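The policy above can be sketched as follows (a minimal sketch; the class and
method names are illustrative, not from the Hadoop code base, and x/y carry
the suggested defaults of 0.50 and 0.75):

```java
// Sketch of the load-aware re-execution policy described above.
// Load = (slots available - unfinished maps) / slots available, so a
// high Load value means the cluster has plenty of free map slots.
public class ReexecutionPolicy {
    static double load(int totalMapSlots, int unfinishedMaps) {
        return (double) (totalMapSlots - unfinishedMaps) / totalMapSlots;
    }

    // Decide whether to re-execute the map after the given fetch-failure
    // notification count (1-based), following rules 1-3 above.
    static boolean shouldReexecute(int notifications, double load,
                                   double x, double y) {
        if (notifications >= 3) return true;       // rule 3: always on third
        if (load < x) return notifications >= 1;   // rule 1: first notification
        if (load < y) return notifications >= 2;   // rule 2: second notification
        return false;                              // otherwise wait for rule 3
    }
}
```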

Optimization 2 (At reduce task)
===========

The strategy is to categorize the timeouts (while fetching map outputs) as
either connection timeouts or read timeouts and then handle each case
differently. Currently, there is no distinction and all timeouts are handled
the same way.

Handling Connection Timeouts
--------------------------------------------

1. Try connecting with the default timeout of 30s.
2. Follow the existing exponential back-off algorithm for retries. The
algorithm is reproduced below for quick reference.

Handling Read Timeouts
-----------------------------------
1. Read with a timeout = MAX(3 minutes, map_run_time).
2. Back off between attempts for (map_run_time / 2).
3. Send a notification after every read timeout.
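Steps 1 and 2 above can be sketched as follows (times in milliseconds; the
class and method names are illustrative):

```java
// Sketch of the proposed read-timeout parameters for the reduce-side fetch.
public class ReadTimeoutPolicy {
    static final long MIN_READ_TIMEOUT_MS = 3 * 60 * 1000L; // 3 minutes

    // Step 1: read timeout = MAX(3 minutes, map run time).
    static long readTimeout(long mapRunTimeMs) {
        return Math.max(MIN_READ_TIMEOUT_MS, mapRunTimeMs);
    }

    // Step 2: back off for half the map run time between read attempts.
    static long backoff(long mapRunTimeMs) {
        return mapRunTimeMs / 2;
    }
}
```

For the 5-minute map of the examples below, this gives a 5-minute read
timeout and a 2.5-minute back-off.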

Exponential Back Off Algorithm
########################
BACKOFF_INIT = 4000;  // initial back-off, in milliseconds

maxFetchRetriesPerMap =
    getClosestPowerOf2(map_run_time * 1000 / BACKOFF_INIT) + 1;

// Double the back-off (4s, 8s, 16s, ...) for the first
// maxFetchRetriesPerMap failed fetches, then cap it at half of maxBackoff.
currentBackOff = (noFailedFetches <= maxFetchRetriesPerMap)
                     ? BACKOFF_INIT * (1 << (noFailedFetches - 1))
                     : (maxBackoff * 1000 / 2);

First notification after maxFetchRetriesPerMap attempts
Second notification after 2 more attempts
Third notification after another 2 attempts
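A runnable paraphrase of the snippet above, under the assumption that
getClosestPowerOf2 returns the rounded base-2 logarithm of its argument
(the helper's exact behavior is not shown here); map run time is taken in
seconds and maxBackoff in seconds:

```java
// Paraphrase of the exponential back-off snippet quoted above.
public class FetchBackoff {
    static final long BACKOFF_INIT = 4000; // ms

    // Assumed behavior: rounded base-2 logarithm of the argument.
    static int getClosestPowerOf2(long value) {
        return (int) Math.round(Math.log(value) / Math.log(2));
    }

    static int maxFetchRetriesPerMap(long mapRunTimeSec) {
        return getClosestPowerOf2(mapRunTimeSec * 1000 / BACKOFF_INIT) + 1;
    }

    // Back-off (ms) before the next fetch attempt: doubles per failure up
    // to maxFetchRetriesPerMap, then is capped at half of maxBackoff.
    static long currentBackOff(int noFailedFetches, int maxFetchRetriesPerMap,
                               long maxBackoffSec) {
        return (noFailedFetches <= maxFetchRetriesPerMap)
                ? BACKOFF_INIT * (1L << (noFailedFetches - 1))
                : (maxBackoffSec * 1000 / 2);
    }
}
```

For the 5-minute map used in the examples below, 300 * 1000 / 4000 = 75,
whose rounded log2 is 6, giving maxFetchRetriesPerMap = 7 under this
assumed helper.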

Example scenarios for Optimization 2
Assumptions
Map run time = 5 mins
Only one reducer per node (All fetch failures will be from this task alone)

Case 1. Connect fails.

Existing algorithm:
1. First notification = at the end of 6 (maxFetchRetriesPerMap) retries. The
exponential back-off algorithm comes into play here.
Approx time = 3 mins * 7 + (4+8+16+32+64+128) secs = 21 + 4.2 = 25.2 mins
(the parenthesized term is the exponential back-off (EBO) wait, in seconds)
2. Second notification = After 2 more attempts. Approx time = 25.2 + (3+2+3) ≈ 33 mins.
(Back off = 2 mins after maxFetchRetriesPerMap.)
3. Third notification = After another 2 attempts. Approx time ≈ 41 mins.

New algorithm:
1. First Notification = 4.2 mins (EBO) + 30 secs * 7 = 4.2 + 3.5 ≈ 7.7 mins
2. Second Notification = 7.7 + (0.5+2.5+0.5) ≈ 11.2 mins
3. Third Notification = 11.2 + (0.5+2.5+0.5) ≈ 14.7 mins

Case 2. Connect successful, read fails.

Existing algorithm:
Same as Case 1, 41 mins for map re-execution.

New algorithm:
1. First Notification = 5 mins (read timeout = map_run_time)
2. Second Notification = 5 + 2.5 + 5 = 12.5 mins (back off = map_run_time/2 = 2.5 mins)
3. Third Notification = 12.5 + 2.5 + 5 = 20 mins
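The Case 2 timeline can be reproduced numerically as follows (a sketch, with
times in minutes, one notification per failed read, and three notifications
as in the example; the class name is illustrative):

```java
// Timeline of notifications for Case 2 under the proposed read-timeout
// handling: each failed read takes the full read timeout, and the reducer
// backs off for map_run_time/2 between attempts.
public class Case2Timeline {
    static double[] notificationTimes(double mapRunMins) {
        double read = Math.max(3.0, mapRunMins); // read timeout, step 1
        double backoff = mapRunMins / 2;         // back-off, step 2
        double t = 0;
        double[] times = new double[3];
        for (int i = 0; i < 3; i++) {
            if (i > 0) t += backoff;             // back off before retrying
            t += read;                           // read attempt times out
            times[i] = t;                        // notification sent, step 3
        }
        return times;
    }
}
```

With mapRunMins = 5 this reproduces the 5 / 12.5 / 20 minute figures above.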

> Shuffling fetchers waited too long between map output fetch re-tries
> --------------------------------------------------------------------
>
>                 Key: HADOOP-3327
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3327
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Runping Qi
>

