[ 
https://issues.apache.org/jira/browse/HDFS-767?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777157#action_12777157
 ] 

Ning Zhang commented on HDFS-767:
---------------------------------

The problem may be solved by increasing the number of retries to a sufficiently 
large number, say (the maximum mapper slots) / 256. But the performance is not 
good, since a client could wait up to (3 * #_of_retries) seconds.
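
To make the cost concrete, here is a rough sketch of that estimate in Java. The 
class and method names are hypothetical, and the inputs (3000 clients, a limit 
of 256 concurrent readers, a fixed 3 second wait) are the figures quoted in the 
issue description, not measured values:

```java
/** Hypothetical back-of-the-envelope estimate; not actual DFSClient code. */
final class FixedRetryCost {
    /** Retries suggested above: (number of clients) / (per-block limit), rounded up. */
    static int retriesNeeded(int clients, int perBlockLimit) {
        return (clients + perBlockLimit - 1) / perBlockLimit; // ceiling division
    }

    /** Worst-case total wait with a fixed wait before every retry. */
    static int worstCaseWaitSeconds(int retries, int fixedWaitSeconds) {
        return retries * fixedWaitSeconds;
    }

    public static void main(String[] args) {
        int retries = retriesNeeded(3000, 256);
        System.out.println("retries=" + retries
                + " worst-case wait=" + worstCaseWaitSeconds(retries, 3) + " s");
        // prints "retries=12 worst-case wait=36 s"
    }
}
```

So under these assumptions an unlucky client could wait about 36 seconds for a 
read that is often served in under a second.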

A common case we see is that a read request can be served in well under 3 
seconds (often subsecond). It is then a waste of time to wait a full 3 seconds 
before letting another batch of 256 clients read the same block. So we propose 
the following change in the DFSClient to introduce a random factor into the 
wait time. Instead of a fixed wait time of 3000 ms, it becomes the following 
formula:

  waitTime = 3000 * failures + 3000 * (failures + 1) * rand(0, 1);

where failures is the number of failures (starting from 0), and rand(0, 1) 
returns a random double from 0.0 to 1.0. 
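
A minimal sketch of this formula in Java is given below. The class and method 
names are hypothetical and this is not the actual DFSClient code; it also 
assumes ThreadLocalRandom as the random source, which returns a double in 
[0.0, 1.0):

```java
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper illustrating the proposed formula; not DFSClient code. */
final class RetryBackoff {
    /** The fixed 3000 ms wait used today. */
    static final long BASE_WAIT_MS = 3000L;

    /**
     * waitTime = 3000 * failures + 3000 * (failures + 1) * rand
     *
     * @param failures number of failures so far, starting from 0
     * @param rand     random value between 0.0 and 1.0
     */
    static double waitTimeMs(int failures, double rand) {
        return BASE_WAIT_MS * failures + BASE_WAIT_MS * (failures + 1) * rand;
    }

    /** Same formula, drawing rand from ThreadLocalRandom (an assumption). */
    static double nextWaitTimeMs(int failures) {
        return waitTimeMs(failures, ThreadLocalRandom.current().nextDouble());
    }

    public static void main(String[] args) {
        for (int failures = 0; failures < 3; failures++) {
            System.out.printf("failures=%d wait=%.0f ms%n",
                    failures, nextWaitTimeMs(failures));
        }
    }
}
```

Passing rand in explicitly keeps the formula deterministic, which may also 
help when writing a unit test for it.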

The rationale behind this formula is as follows:

1) The first time it gets a BlockMissingException, the client waits a random 
time between 0 and 3 seconds and retries. If the block read can be served very 
quickly, the client can get it sooner than if it always waited 3 seconds. Also, 
by distributing all clients evenly over the 3 second window, more clients will 
be served in this round of retries. 
2) If the client still gets the same exception and retries a second time, it 
may be because the read is too slow, or because the number of requests is too 
large and the client was not lucky enough to secure a spot in the last retry. 
To address the first problem, the second retry waits 3 seconds before retrying, 
to ensure all clients in the first retry have at least started (and hopefully 
some of them have already finished). To address the second problem, we increase 
the waiting window to 6 seconds so that there are fewer conflicts for the 3rd 
retry. 
3) Similarly, at the 3rd retry, we wait 6 seconds to clear the waiting window 
from the 2nd retry, and widen the waiting window to 9 seconds. 
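
The three cases above can be tabulated by splitting the formula into its fixed 
part (3000 * failures) and its random window (3000 * (failures + 1)). The Java 
below is only an illustrative sketch with hypothetical names, not DFSClient 
code:

```java
/** Hypothetical helper that tabulates the retry windows; not DFSClient code. */
final class RetryWindow {
    static final long BASE_WAIT_MS = 3000L;

    /** Fixed part of the wait: lets the previous retry's window drain. */
    static long minWaitMs(int failures) {
        return BASE_WAIT_MS * failures;
    }

    /** Width of the random window that spreads the clients out. */
    static long windowMs(int failures) {
        return BASE_WAIT_MS * (failures + 1);
    }

    /** Upper bound of the wait (reached only if rand is 1.0). */
    static long maxWaitMs(int failures) {
        return minWaitMs(failures) + windowMs(failures);
    }

    public static void main(String[] args) {
        for (int f = 0; f < 3; f++) {
            System.out.println("failures=" + f
                    + ": wait " + minWaitMs(f) / 1000 + " to " + maxWaitMs(f) / 1000
                    + " s, window " + windowMs(f) / 1000 + " s");
        }
        // prints:
        // failures=0: wait 0 to 3 s, window 3 s
        // failures=1: wait 3 to 9 s, window 6 s
        // failures=2: wait 6 to 15 s, window 9 s
    }
}
```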

Any comments on the design, or proposals for a unit test?

> Job failure due to BlockMissingException
> ----------------------------------------
>
>                 Key: HDFS-767
>                 URL: https://issues.apache.org/jira/browse/HDFS-767
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>
> If a block is requested by too many mappers/reducers (say, 3000) at the same 
> time, a BlockMissingException is thrown because it exceeds the upper limit (I 
> think 256 by default) on the number of threads accessing the same block at the 
> same time. The DFSClient will catch that exception and retry 3 times, waiting 
> 3 seconds before each retry. Since the wait time is a fixed value, a lot of 
> clients will retry at about the same time and a large portion of them will 
> fail again. After 3 retries, about 256*4 = 1024 clients have got the block. 
> If the number of clients is more than that, the job will fail. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
