[
https://issues.apache.org/jira/browse/HDFS-16155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18040743#comment-18040743
]
ASF GitHub Bot commented on HDFS-16155:
---------------------------------------
github-actions[bot] commented on PR #3271:
URL: https://github.com/apache/hadoop/pull/3271#issuecomment-3578185134
We're closing this stale PR because it has been open for 100 days with no
activity. This isn't a judgement on the merit of the PR in any way. It's just a
way of keeping the PR queue manageable.
If you feel like this was a mistake, or you would like to continue working
on it, please feel free to re-open it and ask for a committer to remove the
stale tag and review again.
Thanks all for your contribution.
> Allow configurable exponential backoff in DFSInputStream refetchLocations
> -------------------------------------------------------------------------
>
> Key: HDFS-16155
> URL: https://issues.apache.org/jira/browse/HDFS-16155
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: dfsclient
> Reporter: Bryan Beaudreault
> Assignee: Bryan Beaudreault
> Priority: Minor
> Labels: pull-request-available
> Time Spent: 3h 20m
> Remaining Estimate: 0h
>
> The retry policy in
> [DFSInputStream#refetchLocations|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DFSInputStream.java#L1018-L1040]
> was first written many years ago. It allows configuration of the base time
> window, but subsequent retries double in a way that cannot be configured. This
> retry strategy makes sense in some clusters because it is very conservative and
> avoids DDoSing the namenode in certain systemic failure modes – for example,
> if a file is being read by a large Hadoop job and the underlying blocks are
> moved by the balancer. In this case, enough datanodes would be added to the
> deadNodes list and all hadoop tasks would simultaneously try to refetch the
> blocks. The 3s doubling with random factor helps break up that stampeding
> herd.
> However, not all cluster use cases are created equal, so there are other
> cases where a more aggressive initial backoff is preferred. For example, in a
> low-latency single-reader scenario, if the balancer moves enough blocks, the
> reader hits this 3s backoff, which is far too long for a low-latency use
> case.
> One could configure the window very low (10ms), but then you can hit
> other systemic failure modes which would result in readers DDoSing the
> namenode again. For example, if blocks went missing due to truly dead
> datanodes, many readers might be refetching locations for different files
> with retry backoffs like 10ms, 20ms, 40ms, etc. It takes a while to back off
> enough to avoid impacting the namenode with that strategy.
> I suggest adding a configurable multiplier to the backoff strategy so that
> operators can tune this as they see fit for their use case. In the above
> low-latency case, one could set the base very low (say 2ms) and the multiplier
> very high (say 50). This gives an aggressive first retry that very quickly
> backs off.
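The proposal quoted above can be illustrated with a minimal sketch. This is not the code in DFSInputStream or in PR #3271; the class name, method names, and the exact jitter formula are hypothetical and chosen only to show how a base window and a multiplier interact. It assumes the window grows as base * multiplier^failures and that the actual wait is a random fraction of that window.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical illustration only; names and formula are assumptions,
// not the actual HDFS client implementation.
final class RefetchBackoffSketch {
    private RefetchBackoffSketch() {}

    /** Size of the retry window in ms after the given number of prior failures. */
    static double windowMs(double baseMs, double multiplier, int failures) {
        return baseMs * Math.pow(multiplier, failures);
    }

    /** Randomized wait in [0, window) so that many readers do not retry in lockstep. */
    static double jitteredWaitMs(double baseMs, double multiplier, int failures) {
        return windowMs(baseMs, multiplier, failures)
            * ThreadLocalRandom.current().nextDouble();
    }

    public static void main(String[] args) {
        // Compare the three configurations discussed in the issue description.
        double[][] configs = { {3000, 2}, {10, 2}, {2, 50} };
        for (double[] c : configs) {
            for (int failures = 0; failures < 3; failures++) {
                System.out.printf("base=%.0fms multiplier=%.0f failures=%d window=%.0fms%n",
                    c[0], c[1], failures, windowMs(c[0], c[1], failures));
            }
        }
    }
}
```

Under these assumptions, a base of 2ms with a multiplier of 50 gives windows of 2ms, 100ms, and 5000ms for the first three attempts, whereas a base of 10ms with doubling only reaches 40ms by the third attempt.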