GitHub user jiajunwang opened a pull request: https://github.com/apache/helix/pull/103
Fix disconnected zkConnection issue.

We found that when a zkConnection is using an invalid ZooKeeper object (null), callers get a NullPointerException (NPE). The affected Helix components are ZKHelixManager, ZkHelixPropertyStore, and other ZK-related classes.

To fix this issue:
1. Override retryUntilConnected() in the Helix ZkClient so that it checks the connection before triggering callbacks. This prevents the NPE, but users still need to catch IllegalStateException and re-create the ZkClient if necessary.
2. In ZKHelixManager, implement handleSessionEstablishmentError to retry establishing a new connection. If the retry fails, Helix invokes a user-registered state handler.
3. Add a unit test that simulates a connection error and checks whether the error handler recovers the connection or triggers the user-registered callback.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiajunwang/helix zkFix

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/helix/pull/103.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #103

----
commit 3064c40556afa5afa71c102708c519ffa2ba7b8c
Author: Jiajun Wang <jjw...@linkedin.com>
Date:   2017-07-25T23:50:22Z

    Fix disconnected zkConnection issue.

    A zkConnection may be using an invalid ZooKeeper object (null), and
    related calls then get an NPE. The affected Helix components are
    ZKHelixManager, ZkHelixPropertyStore, and other ZK-related classes.

    To fix this issue:
    1. Override retryUntilConnected() in the Helix ZkClient so that it checks
       the connection before triggering callbacks. This prevents the NPE, but
       users still need to catch IllegalStateException and re-create the
       ZkClient if necessary.
    2. In ZKHelixManager, implement handleSessionEstablishmentError to retry
       establishing a new connection. If the retry fails, Helix invokes a
       user-registered state handler.
    3. Add a unit test that simulates a connection error and checks whether
       the error handler recovers the connection or triggers the
       user-registered callback.
----
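The first fix in this pull request (check the connection inside retryUntilConnected() before running the callback, and surface an IllegalStateException instead of an NPE) can be illustrated with a small self-contained sketch. FakeConnection and GuardedZkClient are hypothetical stand-ins, not the real Helix ZkClient or ZkConnection classes, and java.util.function.Supplier is used in place of the real callback type to keep the example free of checked exceptions. The retry-on-connection-loss loop of the real method is deliberately left out; only the guard is shown.

```java
import java.util.function.Supplier;

// Sketch only: FakeConnection and GuardedZkClient are hypothetical stand-ins,
// not the real Helix ZkClient or ZkConnection classes.
public class GuardedZkClientDemo {

    /** Hypothetical connection wrapper whose ZooKeeper handle can become null. */
    static class FakeConnection {
        volatile Object zookeeper; // null models "no valid ZooKeeper object"

        FakeConnection(Object zookeeper) {
            this.zookeeper = zookeeper;
        }
    }

    static class GuardedZkClient {
        private final FakeConnection connection;

        GuardedZkClient(FakeConnection connection) {
            this.connection = connection;
        }

        /**
         * Checks the connection before running the callback. A null handle is
         * reported as IllegalStateException instead of letting the callback hit
         * an NPE. The real method's retry-on-connection-loss loop is omitted.
         */
        <T> T retryUntilConnected(Supplier<T> operation) {
            if (connection.zookeeper == null) {
                throw new IllegalStateException(
                    "ZooKeeper connection is not valid; re-create the client");
            }
            return operation.get();
        }
    }

    public static void main(String[] args) {
        FakeConnection conn = new FakeConnection(new Object());
        GuardedZkClient client = new GuardedZkClient(conn);
        System.out.println(client.retryUntilConnected(() -> "read ok"));

        conn.zookeeper = null; // simulate the invalid-connection state
        try {
            client.retryUntilConnected(() -> "not reached");
        } catch (IllegalStateException e) {
            // Caller-side handling described in the PR: catch and re-create the client.
            client = new GuardedZkClient(new FakeConnection(new Object()));
            System.out.println(client.retryUntilConnected(() -> "read ok after re-create"));
        }
    }
}
```

The design choice here is to fail fast with a descriptive exception rather than let a null handle leak into arbitrary callback code; as the PR notes, recovery (re-creating the client) remains the caller's job.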
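The second and third items in this pull request (retry inside handleSessionEstablishmentError, fall back to a user-registered handler, and test both paths by simulating connection errors) follow a common pattern that can be sketched as below. Only the method name handleSessionEstablishmentError comes from the PR; Manager, StateHandler, registerStateHandler, connectAttempt and maxRetries are hypothetical names chosen for illustration, and the real ZKHelixManager logic is likely more involved (for example, backoff between attempts).

```java
import java.util.function.Supplier;

// Sketch only: Manager and StateHandler are hypothetical; the method name
// handleSessionEstablishmentError is taken from the PR description.
public class SessionRecoveryDemo {

    /** Hypothetical user-registered handler invoked when recovery fails. */
    interface StateHandler {
        void onSessionEstablishmentError(Throwable error);
    }

    static class Manager {
        private final Supplier<Boolean> connectAttempt; // true if a new session was created
        private final int maxRetries;
        private StateHandler userHandler;
        boolean connected;

        Manager(Supplier<Boolean> connectAttempt, int maxRetries) {
            this.connectAttempt = connectAttempt;
            this.maxRetries = maxRetries;
        }

        void registerStateHandler(StateHandler handler) {
            this.userHandler = handler;
        }

        /** Retries the connection; falls back to the user handler if every attempt fails. */
        void handleSessionEstablishmentError(Throwable error) {
            for (int attempt = 1; attempt <= maxRetries; attempt++) {
                if (connectAttempt.get()) {
                    connected = true;
                    return; // recovered, so the user handler is not called
                }
            }
            connected = false;
            if (userHandler != null) {
                userHandler.onSessionEstablishmentError(error);
            }
        }
    }

    public static void main(String[] args) {
        // Simulated connection error: the first attempt fails, the second succeeds.
        int[] calls = {0};
        Manager recovering = new Manager(() -> ++calls[0] >= 2, 3);
        recovering.registerStateHandler(e -> System.out.println("handler: " + e.getMessage()));
        recovering.handleSessionEstablishmentError(new IllegalStateException("session lost"));
        System.out.println("recovered: " + recovering.connected);

        // Simulated persistent failure: every attempt fails, so the handler runs.
        Manager failing = new Manager(() -> false, 3);
        failing.registerStateHandler(e -> System.out.println("handler: " + e.getMessage()));
        failing.handleSessionEstablishmentError(new IllegalStateException("session lost"));
    }
}
```

Injecting the connection attempt as a Supplier is what makes the failure paths easy to exercise in a unit test, which mirrors the intent of the test added in this PR.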