----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13386/#review24821 -----------------------------------------------------------
helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java <https://reviews.apache.org/r/13386/#comment48973> typo reqlinquish helix-core/src/test/java/org/apache/helix/TestHelper.java <https://reviews.apache.org/r/13386/#comment48998> replace sysout with logger. - Kishore Gopalakrishna On Aug. 7, 2013, 7:54 p.m., Zhen Zhang wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/13386/ > ----------------------------------------------------------- > > (Updated Aug. 7, 2013, 7:54 p.m.) > > > Review request for helix, Kanak Biscuitwala, Kishore Gopalakrishna, and Shi > Lu. > > > Repository: helix-git > > > Description > ------- > > repost rb=13345 > > FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk > callbacks are queued in ZkEventThread. It's possible that we are handling a > FINALIZE callback before all Zk callbacks are cleaned up. This creates race > conditions, for example, in zk session expiry, when a GenericController gets > a FINALIZE callback, it unsubscribes all listeners via > ZkClient#unsubscribe(), but Zk callbacks leftover in ZkEventThread comes > later, and re-subscribe all listeners, causing zk watcher leaking. > > This is observed by setting up two controllers and expire the leader (by > simulating a long gc). The second controller takes the leadership and add all > listeners, but when the former leader recovers from gc, it gets leftover Zk > callbacks and re-subscribe the live-instance listener hence react to all > live-instance changes. > > We fix this by introducing an expected-notification-type field in > CallbackHandler. The field acts like the state for a CallbackHandler. We then > defines the possible state transitions for a CallbackHandler. For example, > at first CallbackHandler expects an INIT type notification, then it could be > either CALLBACK type or FINALIZE type. If a CALLBACK type is received, then > we expect either CALLBACK or FINALIZE type. If a FINALIZE type is received, > then we expect INIT type. By defining this state machine in CallbackHandler, > we can ignore any notifications of CALLBACK type after we already finalize a > CallbackHandler. CallbackHandler can only receive notifications of CALLBACK > type after it gets re-initialized again. > > > Diffs > ----- > > > helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java > a898160 > helix-core/src/main/java/org/apache/helix/manager/zk/AbstractManager.java > a68e1de > helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java > 533650f > helix-core/src/main/java/org/apache/helix/manager/zk/ControllerManager.java > fa5100e > > helix-core/src/main/java/org/apache/helix/manager/zk/DistributedControllerManager.java > 9017132 > > helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java > ab1da23 > > helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java > d175d46 > > helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManagerHelper.java > ec6bd79 > helix-core/src/test/java/org/apache/helix/TestHelper.java 58dc275 > > helix-core/src/test/java/org/apache/helix/integration/manager/ClusterDistributedController.java > e69de29 > > helix-core/src/test/java/org/apache/helix/integration/manager/TestConsecutiveZkSessionExpiry.java > 0f8a27a > > helix-core/src/test/java/org/apache/helix/integration/manager/TestControllerManager.java > e69de29 > > helix-core/src/test/java/org/apache/helix/integration/manager/TestDistributedControllerManager.java > 319633b > > helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java > a0012f2 > > helix-core/src/test/java/org/apache/helix/integration/manager/TestZkCallbackHandlerLeak.java > cf4b9ad > > Diff: https://reviews.apache.org/r/13386/diff/ > > > Testing > ------- > > all tests pass locally > > > Thanks, > > Zhen Zhang > >
