[ 
https://issues.apache.org/jira/browse/HADOOP-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877136#comment-14877136
 ] 

Steve Loughran commented on HADOOP-12421:
-----------------------------------------

Worth fixing. FWIW, I've encountered this in a large embedded-system project 
where all the SSD-based embedded devices booted at exactly the same time after 
a facility-wide power cycle. They overloaded the TCP links to some servers, and 
then all backed off at exactly the same rate. Even though they had jitter, it 
was driven off time-since-boot, so they were all in sync too.

Moral: choose the randomness source for your jitter well enough to handle 
simultaneous cluster restarts.
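
To illustrate the moral, here is a minimal sketch of exponential backoff with 
jitter whose randomness does not depend on boot time. It is not the 
HADOOP-12421 patch and not the RetryInvocationHandler code: the class name 
JitteredBackoff, its method names, and the parameters are hypothetical, and 
using java.security.SecureRandom (seeded from OS entropy) is an assumption made 
here so that nodes restarted at the same instant still draw different delays.

```java
import java.security.SecureRandom;

// Hypothetical helper for illustration only; not part of Hadoop.
final class JitteredBackoff {
    // Assumption: SecureRandom self-seeds from OS entropy rather than from
    // uptime, so simultaneously booted nodes should not share a sequence.
    private static final SecureRandom RANDOM = new SecureRandom();

    private JitteredBackoff() {}

    /** Exponential ceiling for this attempt (base * 2^attempt), capped at maxMillis. */
    public static long backoffCeilingMillis(long baseMillis, long maxMillis, int attempt) {
        if (baseMillis <= 0 || maxMillis <= 0) {
            throw new IllegalArgumentException("delays must be positive");
        }
        int shift = Math.min(Math.max(attempt, 0), 30);
        // Compare before shifting so the left shift cannot overflow.
        return (baseMillis > (maxMillis >> shift)) ? maxMillis : (baseMillis << shift);
    }

    /** Delay drawn uniformly from [0, ceiling], so clients spread out over the window. */
    public static long nextDelayMillis(long baseMillis, long maxMillis, int attempt) {
        long ceiling = backoffCeilingMillis(baseMillis, maxMillis, attempt);
        // nextDouble() is in [0, 1), so the product is in [0, ceiling + 1).
        return (long) (RANDOM.nextDouble() * (ceiling + 1));
    }

    public static void main(String[] args) {
        // Print the jittered delay for the first few retry attempts.
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.println("attempt " + attempt + ": "
                + nextDelayMillis(100, 10_000, attempt) + " ms");
        }
    }
}
```

Drawing the whole delay at random, rather than adding a small random offset to 
a fixed backoff, is one design choice among several; it trades a predictable 
minimum wait for a wider spread of retries across the cluster.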

> Add jitter to RetryInvocationHandler
> ------------------------------------
>
>                 Key: HADOOP-12421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12421
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>
> Calls to the NN can become synchronized across a cluster during NN failover. 
> This leads to a spike in requests until things recover, making an already 
> tricky time worse.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
