[jira] [Updated] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

ramkrishna.s.vasudevan (JIRA) Fri, 29 Jul 2011 05:37:37 -0700

     [ 
https://issues.apache.org/jira/browse/HBASE-3065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


ramkrishna.s.vasudevan updated HBASE-3065:
------------------------------------------

    Attachment: HBASE-3065-addendum.patch

> Retry all 'retryable' zk operations; e.g. connection loss
> ---------------------------------------------------------
>
>                 Key: HBASE-3065
>                 URL: https://issues.apache.org/jira/browse/HBASE-3065
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: Liyin Tang
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 3065-v3.txt, 3065-v4.txt, HBASE-3065-addendum.patch, 
> HBase-3065[r1088475]_1.patch, hbase3065_2.patch
>
>
> The 'new' master refactored our zk code tidying up all zk accesses and 
> coralling them behind nice zk utility classes.  One improvement was letting 
> out all KeeperExceptions letting the client deal.  Thats good generally 
> because in old days, we'd suppress important state zk changes in state.  But 
> there is at least one case the new zk utility could handle for the 
> application and thats the class of retryable KeeperExceptions.  The one that 
> comes to mind is conection loss.  On connection loss we should retry the 
> just-failed operation.  Usually the retry will just work.  At worse, on 
> reconnect, we'll pick up the expired session event. 
> Adding in this change shouldn't be too bad given the refactor of zk corralled 
> all zk access into one or two classes only.
> One thing to consider though is how much we should retry.  We could retry on 
> a timer or we could retry for ever as long as the Stoppable interface is 
> passed so if another thread has stopped or aborted the hosting service, we'll 
> notice and give up trying.  Doing the latter is probably better than some 
> kinda timeout.
> HBASE-3062 adds a timed retry on the first zk operation.  This issue is about 
> generalizing what is over there across all zk access.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Updated] (HBASE-3065) Retry all 'retryable' zk operations; e.g. connection loss

Reply via email to