EdColeman commented on issue #2689:
URL: https://github.com/apache/accumulo/issues/2689#issuecomment-1175345108

   I am not sure that clearing twice is an issue. ZooKeeper's state transitions 
seem to indicate that you can get a disconnect and then a reconnect without a 
close. Close is a terminal state.
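
   To make the transitions concrete, here is a simplified sketch. The `ConnState` enum and `ConnStateTracker` class are hypothetical names for illustration only, not ZooKeeper's `KeeperState` or any Accumulo class. A disconnect can arrive more than once before a reconnect, and a repeated disconnect just leaves the state as DISCONNECTED. Only close is terminal.

```java
// Hypothetical, simplified model of client connection states; these names are
// illustrative and are not ZooKeeper's or Accumulo's API.
enum ConnState { CONNECTED, DISCONNECTED, CLOSED }

final class ConnStateTracker {
  private ConnState state = ConnState.CONNECTED;

  // A disconnect may be seen repeatedly without an intervening close;
  // handling it twice simply leaves the state DISCONNECTED.
  synchronized void onDisconnect() {
    if (state == ConnState.CLOSED) {
      return; // close is terminal, later events are ignored
    }
    state = ConnState.DISCONNECTED;
  }

  // Disconnect -> reconnect is a legal transition that never passes through CLOSED.
  synchronized void onReconnect() {
    if (state == ConnState.CLOSED) {
      return;
    }
    state = ConnState.CONNECTED;
  }

  // Once closed, the tracker never leaves CLOSED.
  synchronized void onClose() {
    state = ConnState.CLOSED;
  }

  synchronized ConnState state() {
    return state;
  }
}
```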
   
   As for the ready monitor itself, it is meant to block without holding 
locks. It functions as a barrier so that, when the client is known to be 
disconnected, every thread is not banging on ZooKeeper with retries. Code can 
still get past the barrier and then lose the ZooKeeper connection, and I don't 
think that can ever be prevented. It should be a small window, but it will 
always be there. The question then is whether code that gets past the barrier 
and then loses the connection handles that correctly.
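
   A rough sketch of that kind of barrier, assuming a latch-based design (the `ReadyBarrier` class and its method names are hypothetical, not Accumulo's actual ready monitor). Waiting threads block on a `CountDownLatch` rather than on a monitor lock, clearing an already-cleared barrier is a no-op, and a thread that passes `awaitReady` can still see the connection drop immediately afterwards, which is the window described above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of a "ready" barrier; not Accumulo's actual class.
final class ReadyBarrier {
  // Count 0 means "ready"; count 1 means "not ready".
  private volatile CountDownLatch latch = new CountDownLatch(0);

  // Called on disconnect. Idempotent: a second clear reuses the pending latch.
  // The lock only serializes set/clear; waiters never take it.
  synchronized void clearReady() {
    if (latch.getCount() == 0) {
      latch = new CountDownLatch(1);
    }
  }

  // Called on (re)connect; releases every thread currently waiting.
  synchronized void setReady() {
    latch.countDown();
  }

  boolean isReady() {
    return latch.getCount() == 0;
  }

  // Blocks (without holding any lock) until ready or until the timeout elapses.
  // Returns true if ready. The connection may still drop right after this
  // returns, so callers must still handle a failed ZooKeeper call.
  boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
    return latch.await(timeout, unit);
  }
}
```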
   
   If the code proceeds past the barrier, calls get(), and the connection 
fails, the retry will eventually return null. That null should then trigger 
an error / exception.
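
   Something like the following is what I mean on the caller side. It is only a sketch: `RetryingGet`, `getWithRetry`, and `getRequired` are hypothetical names, not Accumulo's ZooReader API, and it assumes the wrapped read never legitimately returns null (a real node with empty data would need a different sentinel).

```java
import java.util.function.Supplier;

// Hypothetical sketch; names are illustrative, not Accumulo's ZooReader API.
final class RetryingGet {
  // Calls op up to maxAttempts times, treating a RuntimeException as a transient
  // connection failure. Returns null once every attempt has failed.
  static <T> T getWithRetry(Supplier<T> op, int maxAttempts, long sleepMillis)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return op.get();
      } catch (RuntimeException e) {
        if (attempt < maxAttempts) {
          Thread.sleep(sleepMillis); // back off before the next attempt
        }
      }
    }
    return null;
  }

  // Caller side: a null from the retry loop is surfaced as an error, not ignored.
  static byte[] getRequired(Supplier<byte[]> op, String path) throws InterruptedException {
    byte[] data = getWithRetry(op, 3, 10);
    if (data == null) {
      throw new IllegalStateException("ZooKeeper read failed after retries: " + path);
    }
    return data;
  }
}
```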
   
   Currently I think that we do not handle connection loss as cleanly as we 
could: if a cluster tries to do a rolling ZK restart, a lot of tservers drop 
offline. This does not fully fix that, but it should get things closer, and 
the other problem areas can then be addressed separately.
   