[ https://issues.apache.org/jira/browse/HBASE-22287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16823294#comment-16823294 ]
Sergey Shelukhin commented on HBASE-22287:
------------------------------------------

cc [~Apache9]

> inifinite retries on failed server in RSProcedureDispatcher
> -----------------------------------------------------------
>
>                 Key: HBASE-22287
>                 URL: https://issues.apache.org/jira/browse/HBASE-22287
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Major
>
> We observed this recently on some cluster, I'm still investigating the root
> cause however seems like the retries should have special handling for this
> exception; and separately probably a cap on number of retries
> {noformat}
> 2019-04-20 04:24:27,093 WARN [RSProcedureDispatcher-pool4-t1285]
> procedure.RSProcedureDispatcher: request to server ,17020,1555742560432
> failed due to java.io.IOException: Call to :17020 failed on local exception:
> org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the
> failed servers list: :17020, try=26603, retrying...
> {noformat}
> The corresponding worker is stuck
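
For illustration only, here is a rough sketch of the two ideas in the description: special handling for the failed-server case, and a cap on the number of retries. It is not the RSProcedureDispatcher code. Every name in it (BoundedRetrySketch, ServerMarkedFailedException, Outcome, callWithBoundedRetries) is hypothetical, and giving up immediately on the failed-server case is just one possible reading of "special handling".

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical sketch: none of these names are HBase APIs.
final class BoundedRetrySketch {

  /** Stand-in for an exception meaning "the target server is already marked as failed". */
  static final class ServerMarkedFailedException extends IOException {
    ServerMarkedFailedException(String message) {
      super(message);
    }
  }

  enum Outcome { SUCCEEDED, GAVE_UP_SERVER_FAILED, GAVE_UP_MAX_ATTEMPTS }

  /** Walks the cause chain, because the log shows the exception wrapped in an IOException. */
  static boolean isServerMarkedFailed(Throwable error) {
    for (Throwable cause = error; cause != null; cause = cause.getCause()) {
      if (cause instanceof ServerMarkedFailedException) {
        return true;
      }
    }
    return false;
  }

  /** Tries the call at most maxAttempts times; stops early if the server is marked failed. */
  static Outcome callWithBoundedRetries(Callable<Void> call, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        call.call();
        return Outcome.SUCCEEDED;
      } catch (Exception e) {
        if (isServerMarkedFailed(e)) {
          // Assumption: retrying against a server already marked failed is pointless.
          return Outcome.GAVE_UP_SERVER_FAILED;
        }
        // Any other error: fall through and retry. Real code would back off here.
      }
    }
    return Outcome.GAVE_UP_MAX_ATTEMPTS;
  }

  public static void main(String[] args) {
    Callable<Void> failedServer = () -> {
      throw new IOException("Call failed on local exception",
          new ServerMarkedFailedException("This server is in the failed servers list"));
    };
    Callable<Void> alwaysTimesOut = () -> {
      throw new IOException("timeout");
    };
    System.out.println(callWithBoundedRetries(failedServer, 5));
    System.out.println(callWithBoundedRetries(alwaysTimesOut, 5));
  }
}
```

With either exit the caller gets a definite outcome instead of looping, so a worker would not stay stuck the way the try=26603 log line suggests. What should happen after giving up (failing the procedure, rescheduling, waiting for server crash handling) is left open here.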