[ https://issues.apache.org/jira/browse/HBASE-15937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15405065#comment-15405065 ]
Joseph commented on HBASE-15937:
--------------------------------

Will adjust the timeout so that we actually wait 10 minutes in ReplicationTableBase.

> Figure out retry limit and timing for replication queue table operations
> ------------------------------------------------------------------------
>
>                 Key: HBASE-15937
>                 URL: https://issues.apache.org/jira/browse/HBASE-15937
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Replication
>            Reporter: Joseph
>            Assignee: Joseph
>         Attachments: HBASE-15937.patch
>
>
> ReplicationQueuesHBaseImpl will abort the server if any of its HBase Table
> writes/reads fails. We should figure out a reasonable retry limit and pause
> duration for these operations.
> As of now the timeouts look like:
> Table initialization:
>   240 retries
>   1 minute pause (because the Master may not be initialized yet, createTable
>   retries are immediately rejected by PleaseHoldException, so we should sleep
>   in between RPC requests)
>   1 minute RPC timeouts
>   Total: At minimum 2 hours of retries
> Normal Replication Table operations:
>   240 retries
>   100 millis pause (because we assume the cluster is in a more stable state, we
>   assume most exceptions will be RPC timeouts, so I am using the standard RPC
>   pause)
>   1 minute RPC timeouts
>   Total: Assuming operations fail because of RPC timeouts, a minimum of 2 hours
>   of retries. With just pauses we only have 24 seconds.
> All of these timeouts are configurable too though.
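
For anyone checking the figures in the description, here is a rough sketch of the arithmetic. It is only a flat product of retries and per-attempt cost; it ignores any backoff multiplier the client may apply and is not taken from the attached patch. The class and method names (RetryBudget, pauseOnly, timeoutOnly) are hypothetical and exist only for this illustration.

```java
import java.time.Duration;

// Hypothetical helper, not part of HBase: flat retry-budget arithmetic
// for the values quoted in the issue description.
final class RetryBudget {

    // Time spent if every attempt fails immediately and only the pause is paid.
    static Duration pauseOnly(int retries, Duration pause) {
        return pause.multipliedBy(retries);
    }

    // Time spent if every attempt runs into the RPC timeout (pauses ignored).
    static Duration timeoutOnly(int retries, Duration rpcTimeout) {
        return rpcTimeout.multipliedBy(retries);
    }

    public static void main(String[] args) {
        int retries = 240;                               // from the description
        Duration rpcTimeout = Duration.ofMinutes(1);     // from the description
        Duration initPause = Duration.ofMinutes(1);      // table initialization
        Duration normalPause = Duration.ofMillis(100);   // normal operations

        System.out.println("init, pauses only (s):     "
                + pauseOnly(retries, initPause).getSeconds());
        System.out.println("normal, pauses only (s):   "
                + pauseOnly(retries, normalPause).getSeconds());
        System.out.println("either, timeouts only (s): "
                + timeoutOnly(retries, rpcTimeout).getSeconds());
    }
}
```

With 240 retries and a 100 ms pause, the pause-only product is 24 seconds, which matches the description. The one-minute cases come to 240 minutes under this flat product, so the "minimum of 2 hours" in the description holds as a floor rather than as an exact total.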