-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13386/
-----------------------------------------------------------

Review request for helix, Kanak Biscuitwala, Kishore Gopalakrishna, and Shi Lu.


Repository: helix-git


Description
-------

repost rb=13345

FINALIZE callbacks are sent async via CallbackHandler#reset(), while Zk 
callbacks are queued in ZkEventThread. It's possible that we are handling a 
FINALIZE callback before all Zk callbacks are cleaned up. This creates race 
conditions, for example, in zk session expiry, when a GenericController gets a 
FINALIZE callback, it unsubscribes all listeners via ZkClient#unsubscribe(), 
but Zk callbacks leftover in ZkEventThread comes later, and re-subscribe all 
listeners, causing zk watcher leaking. 

This is observed by setting up two controllers and expire the leader (by 
simulating a long gc). The second controller takes the leadership and add all 
listeners, but when the former leader recovers from gc, it gets leftover Zk 
callbacks and re-subscribe the live-instance listener hence react to all 
live-instance changes.

We fix this by introducing an expected-notification-type field in 
CallbackHandler. The field acts like the state for a CallbackHandler. We then 
defines the possible state transitions for a CallbackHandler. For example,  at 
first CallbackHandler expects an INIT type notification, then it could be 
either CALLBACK type or FINALIZE type. If a CALLBACK type is received, then we 
expect either CALLBACK or FINALIZE type. If a FINALIZE type is received, then 
we expect INIT type. By defining this state machine in CallbackHandler, we can 
ignore any notifications of CALLBACK type after we already finalize a 
CallbackHandler. CallbackHandler can only receive notifications of CALLBACK 
type after it gets re-initialized again.


Diffs
-----

  
helix-core/src/main/java/org/apache/helix/controller/GenericHelixController.java
 a898160 
  helix-core/src/main/java/org/apache/helix/manager/zk/AbstractManager.java 
a68e1de 
  helix-core/src/main/java/org/apache/helix/manager/zk/CallbackHandler.java 
533650f 
  helix-core/src/main/java/org/apache/helix/manager/zk/ControllerManager.java 
fa5100e 
  
helix-core/src/main/java/org/apache/helix/manager/zk/DistributedControllerManager.java
 9017132 
  
helix-core/src/main/java/org/apache/helix/manager/zk/DistributedLeaderElection.java
 ab1da23 
  helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManager.java 
d175d46 
  
helix-core/src/main/java/org/apache/helix/manager/zk/ParticipantManagerHelper.java
 ec6bd79 
  helix-core/src/test/java/org/apache/helix/TestHelper.java 58dc275 
  
helix-core/src/test/java/org/apache/helix/integration/manager/ClusterDistributedController.java
 e69de29 
  
helix-core/src/test/java/org/apache/helix/integration/manager/TestConsecutiveZkSessionExpiry.java
 0f8a27a 
  
helix-core/src/test/java/org/apache/helix/integration/manager/TestControllerManager.java
 e69de29 
  
helix-core/src/test/java/org/apache/helix/integration/manager/TestDistributedControllerManager.java
 319633b 
  
helix-core/src/test/java/org/apache/helix/integration/manager/TestParticipantManager.java
 a0012f2 
  
helix-core/src/test/java/org/apache/helix/integration/manager/TestZkCallbackHandlerLeak.java
 cf4b9ad 

Diff: https://reviews.apache.org/r/13386/diff/


Testing
-------

all tests pass locally


Thanks,

Zhen Zhang

Reply via email to