[ https://issues.apache.org/jira/browse/HADOOP-12421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14877136#comment-14877136 ]
Steve Loughran commented on HADOOP-12421:
-----------------------------------------

Worth fixing. FWIW, I've encountered this in a large embedded-systems project where all the SSD-based devices booted at exactly the same time after a facility-wide power cycle. They overloaded the TCP links to some servers, and all of them backed off at exactly the same rate. Even though they had jitter, it was derived from time-since-boot, so the jitter was in sync too.

Moral: choose the randomness for your jitter well enough to handle simultaneous cluster restarts.

> Add jitter to RetryInvocationHandler
> ------------------------------------
>
>                 Key: HADOOP-12421
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12421
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>
> Calls to NN can become synchronized across a cluster during NN failover. This
> leads to a spike in requests until things recover, making an already tricky
> time worse.
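
To illustrate the point about the randomness source, here is a minimal sketch of exponential backoff with jitter. It is not the HADOOP-12421 patch and not RetryInvocationHandler code: the class `JitteredBackoff`, its method names, and the base/cap values are hypothetical, chosen only for this example. The idea is that the delay is drawn uniformly between zero and an exponentially growing ceiling, and that the random source is passed in, so it can be one seeded from OS entropy (`SecureRandom`) rather than from a clock or uptime value that every node shares after a simultaneous restart.

```java
import java.security.SecureRandom;
import java.util.Random;

/** Hypothetical helper (not part of Hadoop): exponential backoff with jitter. */
public final class JitteredBackoff {
    private final long baseMillis;
    private final long capMillis;
    private final Random random;

    public JitteredBackoff(long baseMillis, long capMillis, Random random) {
        if (baseMillis <= 0 || capMillis < baseMillis) {
            throw new IllegalArgumentException("need 0 < baseMillis <= capMillis");
        }
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
    }

    /** Seeds from the platform's entropy source instead of a clock or uptime value. */
    public static JitteredBackoff withOsEntropy(long baseMillis, long capMillis) {
        return new JitteredBackoff(baseMillis, capMillis, new SecureRandom());
    }

    /** Ceiling for a 0-based attempt: min(cap, base * 2^attempt), without overflow. */
    public long ceilingMillis(int attempt) {
        long ceiling = baseMillis;
        for (int i = 0; i < attempt && ceiling < capMillis; i++) {
            // Doubling would exceed the cap (or overflow), so clamp to the cap.
            ceiling = (ceiling > capMillis / 2) ? capMillis : ceiling * 2;
        }
        return ceiling;
    }

    /** Random delay between 0 and the ceiling for this attempt. */
    public long delayMillis(int attempt) {
        return (long) (random.nextDouble() * ceilingMillis(attempt));
    }

    public static void main(String[] args) {
        // Two "nodes" whose jitter comes from the same seed stay in lockstep,
        // which is the failure mode described in the comment.
        JitteredBackoff nodeA = new JitteredBackoff(100, 10_000, new Random(42));
        JitteredBackoff nodeB = new JitteredBackoff(100, 10_000, new Random(42));
        // A node seeded from OS entropy is not tied to a shared boot time.
        JitteredBackoff nodeC = JitteredBackoff.withOsEntropy(100, 10_000);
        for (int attempt = 0; attempt < 5; attempt++) {
            System.out.printf("attempt %d: A=%d B=%d C=%d%n", attempt,
                    nodeA.delayMillis(attempt),
                    nodeB.delayMillis(attempt),
                    nodeC.delayMillis(attempt));
        }
    }
}
```

In the demo, nodes A and B always print identical delays because they share a seed, so their "jitter" does nothing to spread the load. Node C's delays are independent of when it was started.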