Retry all 'retryable' zk operations; e.g. connection loss
---------------------------------------------------------

                 Key: HBASE-3065
                 URL: https://issues.apache.org/jira/browse/HBASE-3065
             Project: HBase
          Issue Type: Bug
            Reporter: stack


The 'new' master refactored our zk code tidying up all zk accesses and 
coralling them behind nice zk utility classes.  One improvement was letting out 
all KeeperExceptions letting the client deal.  Thats good generally because in 
old days, we'd suppress important state zk changes in state.  But there is at 
least one case the new zk utility could handle for the application and thats 
the class of retryable KeeperExceptions.  The one that comes to mind is 
conection loss.  On connection loss we should retry the just-failed operation.  
Usually the retry will just work.  At worse, on reconnect, we'll pick up the 
expired session event. 

Adding in this change shouldn't be too bad given the refactor of zk corralled 
all zk access into one or two classes only.

One thing to consider though is how much we should retry.  We could retry on a 
timer or we could retry for ever as long as the Stoppable interface is passed 
so if another thread has stopped or aborted the hosting service, we'll notice 
and give up trying.  Doing the latter is probably better than some kinda 
timeout.

HBASE-3062 adds a timed retry on the first zk operation.  This issue is about 
generalizing what is over there across all zk access.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to