[ https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13170355#comment-13170355 ]
Harsh J commented on HBASE-3065:
--------------------------------

Good point. /me bangs his head on the wall for not trying first :)

I'll spend some time over the weekend to try out 0.92 and force this callback to fail.

> Retry all 'retryable' zk operations; e.g. connection loss
> ---------------------------------------------------------
>
>                 Key: HBASE-3065
>                 URL: https://issues.apache.org/jira/browse/HBASE-3065
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Liyin Tang
>            Priority: Blocker
>             Fix For: 0.92.0
>
>         Attachments: 3065-v3.txt, 3065-v4.txt, HBASE-3065-addendum.patch, HBase-3065[r1088475]_1.patch, hbase3065_2.patch
>
>
> The 'new' master refactored our zk code, tidying up all zk accesses and
> corralling them behind nice zk utility classes. One improvement was letting
> out all KeeperExceptions and letting the client deal with them. That's good
> generally because in the old days we'd suppress important zk state changes.
> But there is at least one case the new zk utility could handle for the
> application, and that's the class of retryable KeeperExceptions. The one that
> comes to mind is connection loss. On connection loss we should retry the
> just-failed operation. Usually the retry will just work. At worst, on
> reconnect, we'll pick up the expired session event.
> Adding in this change shouldn't be too bad given the refactor corralled
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry. We could retry on
> a timer, or we could retry forever as long as the Stoppable interface is
> passed, so if another thread has stopped or aborted the hosting service we'll
> notice and give up trying. Doing the latter is probably better than some
> kinda timeout.
> HBASE-3062 adds a timed retry on the first zk operation. This issue is about
> generalizing what is over there across all zk access.

--
This message is automatically generated by JIRA.
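
For illustration, the "retry forever unless stopped" idea from the quoted description could be sketched roughly as below. This is a minimal, self-contained sketch and not the patch attached to the issue: RetrySketch, retryUntilStopped, RetryableException and pauseMillis are hypothetical names, and the Stoppable here is a local stand-in with only an isStopped() check. Real code would presumably catch ZooKeeper's connection-loss KeeperException rather than the stand-in exception.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicInteger;

final class RetrySketch {

    /** Local stand-in for a stop flag (hypothetical, modeled on the Stoppable idea in the issue). */
    interface Stoppable {
        boolean isStopped();
    }

    /** Stand-in for a retryable failure such as connection loss (hypothetical type). */
    static class RetryableException extends Exception {
        RetryableException(String message) {
            super(message);
        }
    }

    /**
     * Runs op until it succeeds. A RetryableException triggers another attempt
     * after a short pause, unless the stopper reports that the hosting service
     * has been stopped, in which case the failure is rethrown to the caller.
     * Any other exception propagates immediately.
     */
    static <T> T retryUntilStopped(Callable<T> op, Stoppable stopper, long pauseMillis)
            throws Exception {
        while (true) {
            try {
                return op.call();
            } catch (RetryableException e) {
                if (stopper.isStopped()) {
                    throw e; // service is going down: give up instead of retrying
                }
                Thread.sleep(pauseMillis); // brief pause before the next attempt
            }
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger attempts = new AtomicInteger();
        // Simulated operation: fails twice with a retryable error, then succeeds.
        String result = retryUntilStopped(() -> {
            if (attempts.incrementAndGet() < 3) {
                throw new RetryableException("connection loss");
            }
            return "ok";
        }, () -> false, 10L);
        System.out.println(result + " after " + attempts.get() + " attempts");
    }
}
```

Checking the stop flag only after a failure keeps the success path free of extra work; a timed variant along the lines of HBASE-3062 would add a deadline check next to the isStopped() test.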