----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/13345/ -----------------------------------------------------------
(Updated Aug. 6, 2013, 8:18 p.m.) Review request for helix, Kanak Biscuitwala, Kishore Gopalakrishna, and Shi Lu. Description ------- FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk callbacks are queued in ZkEventThread. It's possible that we are handling a FINALIZE callback before all Zk callbacks are cleaned up. This creates race conditions, for example, in zk session expiry, when a GenericController gets a FINALIZE callback, it unsubscribes all listeners via ZkClient#unsubscribe(), but Zk callbacks leftover in ZkEventThread comes later, and re-subscribe all listeners, causing zk watcher leaking. This is observed by setting up two controllers and expire the leader (by simulating a long gc). The second controller takes the leadership and add all listeners, but when the former leader recovers from gc, it gets leftover Zk callbacks and re-subscribe the live-instance listener hence react to all live-instance changes. We fix this by introducing an expected-notification-type field in CallbackHandler. The field acts like the state for a CallbackHandler. We then defines the possible state transitions for a CallbackHandler. For example, at first CallbackHandler expects an INIT type notification, then it could be either CALLBACK type or FINALIZE type. If a CALLBACK type is received, then we expect either CALLBACK or FINALIZE type. If a FINALIZE type is received, then we expect INIT type. By defining this state machine in CallbackHandler, we can ignore any notifications of CALLBACK type after we already finalize a CallbackHandler. CallbackHandler can only receive notifications of CALLBACK type after it gets re-initialized again. Diffs ----- Diff: https://reviews.apache.org/r/13345/diff/ Testing ------- all tests passed locally File Attachments (updated) ---------------- helix-diff https://reviews.apache.org/media/uploaded/files/2013/08/06/helix.diff Thanks, Zhen Zhang
