EdColeman commented on issue #2689: URL: https://github.com/apache/accumulo/issues/2689#issuecomment-1175345108
I am not sure that clearing twice is an issue. The ZooKeeper state transitions seem to indicate that you can get a disconnect and then a reconnect without a close. Close is a terminal state.

As far as the ready monitor itself: it is meant to block without holding locks. It functions as a barrier so that, when the connection is known to be disconnected, everyone is not banging on ZooKeeper with retries.

There is an issue that code could get past the barrier and then the ZooKeeper connection is lost, and I don't think that can ever be prevented. It should be a small window, but it will always be there. The question then is whether the code handles it correctly when it gets past the barrier and the connection is lost. If the code proceeds past the barrier and calls get() and the connection fails, the retry will eventually return null. That null should then trigger an error / exception.

Currently I think that we do not handle connection loss as cleanly as we could: if a cluster tries to do a rolling ZK restart, a lot of tservers drop offline. This does not fully fix that, but it should get things closer, and then the other problem areas can be addressed separately.
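To make the barrier idea concrete, here is a rough sketch of the behavior described in this comment. It is not the actual Accumulo code: the class `ReadyBarrier`, its methods, and the `getOrThrow` helper are hypothetical names, and the `Supplier` stands in for a retrying ZooKeeper read that is assumed to return null once its retries are exhausted.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

/** Hypothetical ready barrier, for illustration only. */
final class ReadyBarrier {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition changed = lock.newCondition();
  private boolean ready = false;
  private boolean closed = false;

  /** Connected: release waiters. Ignored after close, since close is terminal. */
  void setReady() {
    lock.lock();
    try {
      if (!closed) {
        ready = true;
        changed.signalAll();
      }
    } finally {
      lock.unlock();
    }
  }

  /** Disconnected: idempotent, so clearing twice is harmless. */
  void clearReady() {
    lock.lock();
    try {
      ready = false;
    } finally {
      lock.unlock();
    }
  }

  /** Terminal state: waiters fail instead of waiting for a reconnect. */
  void close() {
    lock.lock();
    try {
      closed = true;
      ready = false;
      changed.signalAll();
    } finally {
      lock.unlock();
    }
  }

  /** Waits up to timeoutMillis; awaitNanos releases the internal lock while waiting. */
  boolean awaitReady(long timeoutMillis) {
    long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
    lock.lock();
    try {
      while (!ready) {
        if (closed) {
          throw new IllegalStateException("connection closed");
        }
        if (nanos <= 0) {
          return false;
        }
        nanos = changed.awaitNanos(nanos);
      }
      return true;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      lock.unlock();
    }
  }

  /** Passing the barrier guarantees nothing about the read, so a null result is an error. */
  static <T> T getOrThrow(ReadyBarrier barrier, Supplier<T> read, long timeoutMillis) {
    if (!barrier.awaitReady(timeoutMillis)) {
      throw new IllegalStateException("not connected");
    }
    T value = read.get(); // assumed: retrying read, null when retries are exhausted
    if (value == null) {
      throw new IllegalStateException("connection lost after passing the barrier");
    }
    return value;
  }

  public static void main(String[] args) {
    ReadyBarrier barrier = new ReadyBarrier();
    barrier.setReady();
    System.out.println(getOrThrow(barrier, () -> "data", 100));
    barrier.clearReady();
    barrier.clearReady(); // second clear changes nothing
    System.out.println(barrier.awaitReady(50));
  }
}
```

The null check in `getOrThrow` covers the small window where the connection drops after the barrier has been passed; the barrier only reduces retry pressure and cannot close that window.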
