Hi Abhishek,

We only carry over the fact that the participant hosted that partition. The state of that partition is reset to the initial state (default: OFFLINE). The idea behind this design was to detect a resource deletion that happened while the participant was down and, when the participant comes back up, to tell it to drop any data or local state associated with that partition. Once the drop notification is handled, the partition is removed from the current state and the external view.
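
To make the carry-over behaviour concrete, here is a toy model in plain Java. It is only a sketch and not Helix code: the class `CarryOverSketch`, the method `newSessionState`, the partition names, and the `carryOver` flag are all made up for illustration, and no Helix API is used. It shows that only the set of hosted partitions survives a new session, with every state reset to OFFLINE, and what starting from scratch would look like.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: not Helix code, and it uses no Helix classes.
final class CarryOverSketch {
    // Assumed default initial state, as described in the message above.
    static final String INITIAL_STATE = "OFFLINE";

    // Builds the current-state map (partition -> state) for a new session.
    // With carryOver == true, every partition hosted in the previous session
    // is kept but reset to the initial state. With carryOver == false
    // (a hypothetical flag), the new session starts with no partitions.
    static Map<String, String> newSessionState(Map<String, String> previous,
                                               boolean carryOver) {
        Map<String, String> next = new LinkedHashMap<>();
        if (!carryOver) {
            return next;
        }
        for (String partition : previous.keySet()) {
            next.put(partition, INITIAL_STATE);
        }
        return next;
    }

    public static void main(String[] args) {
        Map<String, String> previous = new LinkedHashMap<>();
        previous.put("myDB_0", "MASTER");
        previous.put("myDB_1", "SLAVE");

        // Prints {myDB_0=OFFLINE, myDB_1=OFFLINE}
        System.out.println(newSessionState(previous, true));
        // Prints {}
        System.out.println(newSessionState(previous, false));
    }
}
```

In the first case the previous MASTER and SLAVE states are not preserved; only the partition names are, which is what lets a drop notification be delivered for a partition whose resource was deleted in the meantime.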

Can you confirm that resetting the state to OFFLINE after a restart is a problem in your case? If you really need to avoid this behavior, you can implement a preConnectCallback and remove the previous session info there. This won't be a problem with future Helix versions, but you will still have to confirm that the old participant is dead.

A better approach would be to provide a flag that explicitly disables carrying over the previous state. Could you please file a JIRA for this? I can imagine this being useful in various use cases.

thanks,
Kishore G

On Wed, Feb 20, 2013 at 3:58 PM, Abhishek Rai <[email protected]> wrote:

> Hi Helix devs,
>
> Currently, when creating a session for a new participant, Helix carries
> over current states of assigned partitions from previous session of the
> same participant. I think this may be undesirable for deployments where
> Helix session and assigned partitions by the participant are tightly
> coupled. Assume that in such a setup, when a participant loses a session,
> it also loses all associated partitions.
>
> In this scenario, when the participant is restarted, and tries to reconnect
> to Helix, ZKHelixManager (handleNewSessionAsParticipant) currently "carries
> over" assignments from the previous session, which may not reflect true
> state of the restarted participant. Is there an easy way to not carry over
> the state, in other words, start from scratch with no assigned partitions
> ? If not, can you think of any possible workarounds? I'm considering
> directly clearing old "current states" from Zookeeper. I'd avoid doing
> this for multiple reasons: (1) compatibility with future Helix versions,
> (2) complexity: need to make sure old participant is really dead.
>
> Thanks,
> Abhishek
>
