[ 
https://issues.apache.org/jira/browse/HBASE-6523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13431238#comment-13431238
 ] 

nkeywal commented on HBASE-6523:
--------------------------------

I agree, the ZooKeeper list comes in handy for these questions :-).

As I understand it (to be validated by ZK experts), ConnectionLoss means that 
we lost the connection but hope it will come back. When it does, we receive 
all the events, and there should be no data loss. With a SessionTimeout, on 
the other hand, we may have lost events, so we should re-register the watchers 
and, from an application point of view, take into account that we may have 
missed events in the meantime.

The way we manage session timeouts in HBase/RecoverableZK is tricky: we retry 
because we expect that a parallel abort will have triggered a ZK session 
recreation, so our next retry will run on a brand new ZK session (and a new 
ZooKeeper object in the RecoverableZK) and will therefore work.

As we retry only a limited number of times in RecoverableZK, for a 
ConnectionLoss we may stop retrying before the session timeout happens and 
throw the exception to the calling layer. As such, it may become an 
unrecoverable error from an HBase point of view. I think that if we want to 
fix this, we should change RecoverableZooKeeper to retry indefinitely on a 
ConnectionLoss, waiting for the session timeout to occur. We may also have 
calls that do not use the recoverable ZK (if I remember correctly I've seen a 
few, and it was justified, I believe). But we should not recreate a session 
for a connection loss (it could have bad side effects, for example ZK having 
to manage too many sessions, the old and the new).
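
A minimal sketch of the retry policy proposed above, to make the idea 
concrete. It is not the actual RecoverableZooKeeper code: the class name, the 
method retryOnConnectionLoss, the pause parameter and the two exception types 
are hypothetical stand-ins defined locally so that the example runs without 
ZooKeeper on the classpath. The assumption is that the wrapped operation 
eventually either succeeds or fails with a session expiry, which is what ends 
the loop.

```java
import java.util.concurrent.Callable;

class ConnectionLossRetrySketch {

    /** Hypothetical stand-in for a ZooKeeper connection-loss error. */
    static class ConnectionLoss extends Exception {}

    /** Hypothetical stand-in for a ZooKeeper session-expired error. */
    static class SessionExpired extends Exception {}

    /**
     * Runs op, retrying for as long as it fails with ConnectionLoss.
     * Any other exception, including SessionExpired, is not caught here
     * and propagates to the caller, which is where a session would be
     * recreated and watchers set again.
     */
    static <T> T retryOnConnectionLoss(Callable<T> op, long pauseMillis)
            throws Exception {
        while (true) {
            try {
                return op.call();
            } catch (ConnectionLoss e) {
                // Same session, connection may come back: wait and retry.
                Thread.sleep(pauseMillis);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        // Simulated operation: fails three times with ConnectionLoss,
        // then succeeds on the fourth attempt.
        String result = retryOnConnectionLoss(() -> {
            if (++attempts[0] < 4) {
                throw new ConnectionLoss();
            }
            return "ok";
        }, 10);
        System.out.println(result + " after " + attempts[0] + " attempts");
    }
}
```

The point of the design is that the loop never gives up on a connection loss 
by itself and never creates a new session; only the session expiry, surfaced 
as a different exception, hands control back to the calling layer.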

                
> HConnectionImplementation still does not recover from all ZK issues.
> --------------------------------------------------------------------
>
>                 Key: HBASE-6523
>                 URL: https://issues.apache.org/jira/browse/HBASE-6523
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>         Attachments: 6523.txt
>
>
> During some testing here at Salesforce.com we found another scenario where an 
> HConnectionImplementation would never recover from a lost ZK connection.


        
