Thanks a lot. Will look into it.

On Fri, Jan 11, 2019 at 6:18 PM Wang Jiajun <[email protected]> wrote:
> Hi Kishore,
>
> I have sent a pull request to fix the first 2 issues.
> https://github.com/apache/helix/pull/297
> As for the 3rd one, it requires a much larger scope of change. And
> actually, it does not break any logic now that we have fixed the ephemeral
> node owner validation logic. We think it can be scheduled for a future
> release.
>
> Best Regards,
> Jiajun
>
>
> On Mon, Jan 7, 2019 at 3:57 PM Wang Jiajun <[email protected]> wrote:
>
>> Resending. Reply to all.
>>
>> We can probably fix the first 2 issues within 2 weeks, considering the
>> additional testing and validation required.
>> For issue 1, we can split the original reset into 2 methods. For new
>> session handling, we should not interrupt. For client closing, we shall
>> interrupt the thread and shut down.
>> For issue 2, we additionally need a try/catch for the ZooKeeper NPE.
>>
>> Issue 3 will take more time since we need to change both ZkClient and the
>> event handler. Some interfaces may need to be updated. Moreover,
>> it changes the current ZkClient behavior, so we'd better run it in the test
>> environment for a longer time.
>>
>> With the ephemeral node's owner fixed, the 3rd issue does not impact
>> correctness. So maybe we can plan on fixing the first 2 issues first, and
>> then plan for the 3rd issue in the next release? If that's the case, we
>> shall have a release candidate after 2 weeks.
>>
>> Best Regards,
>> Jiajun
>>
>>
>> On Mon, Jan 7, 2019 at 3:14 PM kishore g <[email protected]> wrote:
>>
>>> I think the pending issues are the ones that are affecting us. What does
>>> it take to fix those issues?
>>>
>>> On Mon, Jan 7, 2019 at 2:54 PM Wang Jiajun <[email protected]>
>>> wrote:
>>>
>>>> Hi Kishore,
>>>>
>>>> Hope you are doing well.
>>>> Since we last met to discuss potential ZkClient improvements in
>>>> Helix, we have completed the fix for one issue.
>>>> However, resolving
>>>> the whole list will take more time. Given that Pinot is still waiting for
>>>> the new release, I'd like to hear your opinion on whether we should release
>>>> 0.8.3 based on the current situation.
>>>>
>>>> Fixed issues:
>>>>
>>>> 1. For an ephemeral node, the source of truth should be the owner
>>>> session Id instead of the node content.
>>>> This fixes the leader election issue we found in the Pinot cluster.
>>>>
>>>> Pending issues:
>>>>
>>>> 1. ZkClient should not interrupt callback handling during
>>>> session reestablishment or other reset logic. Interrupt for shutdown
>>>> should only happen when things are closed. To fix this problem, we need
>>>> to think about how to handle thread leaking.
>>>> 2. ZkConnection.getZookeeper() == null can cause
>>>> retryUntilConnect to terminate earlier than expected. It should keep
>>>> waiting on this error.
>>>> 3. The ZkClient event should keep a session Id. The event processor
>>>> can then discard expired events.
>>>>
>>>> Best Regards,
>>>> Jiajun
>>>>
>>>
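
For reference, pending issue 3 above (events carry a session Id, and the processor discards expired ones) could look roughly like the sketch below. This is only an illustration under stated assumptions: `SessionAwareEvent` and `SessionAwareEventProcessor` are hypothetical names, not existing Helix or ZkClient classes, and the real change would also have to touch the ZkClient event thread and handler interfaces mentioned in the thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event wrapper (not a Helix class): it remembers the id of the
// session in which the event was created.
final class SessionAwareEvent {
    final long sessionId;
    final String description;

    SessionAwareEvent(long sessionId, String description) {
        this.sessionId = sessionId;
        this.description = description;
    }
}

// Hypothetical processor (not a Helix class): it handles an event only if the
// event was created in the session that is current at processing time.
final class SessionAwareEventProcessor {
    private volatile long currentSessionId;
    private final List<String> handled = new ArrayList<>();

    SessionAwareEventProcessor(long initialSessionId) {
        this.currentSessionId = initialSessionId;
    }

    // Assumed to be called once a new session has been established.
    void onNewSession(long newSessionId) {
        currentSessionId = newSessionId;
    }

    // Returns true if the event was handled, false if it was discarded
    // because it belongs to a different (expired) session.
    synchronized boolean process(SessionAwareEvent event) {
        if (event.sessionId != currentSessionId) {
            return false;
        }
        handled.add(event.description);
        return true;
    }

    synchronized List<String> handled() {
        return new ArrayList<>(handled);
    }
}
```

With this shape, events queued before a session expiry are dropped by the processor itself, so the event thread would not need to be interrupted just to get rid of them.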

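
Similarly, the try/catch proposed for pending issue 2 could be sketched as a retry loop that treats a `NullPointerException` from a missing ZooKeeper handle as "not connected yet" instead of a fatal error. The class `RetrySketch`, the method `retryWhileHandleMissing`, and the timeout and backoff parameters are assumptions made for illustration; this is not the actual ZkClient retry implementation, and a real fix would need to tell this NPE apart from unrelated ones.

```java
import java.util.function.Supplier;

// Hypothetical helper (not part of Helix): keeps retrying an operation while
// it fails with a NullPointerException, which here stands in for
// "ZkConnection.getZookeeper() returned null during reconnect".
final class RetrySketch {
    private RetrySketch() {}

    static <T> T retryWhileHandleMissing(Supplier<T> operation,
                                         long timeoutMs,
                                         long backoffMs) {
        final long deadline = System.currentTimeMillis() + timeoutMs;
        while (true) {
            try {
                return operation.get();
            } catch (NullPointerException e) {
                // Assumption: the NPE means the handle is not available yet,
                // so keep waiting until the deadline instead of giving up.
                if (System.currentTimeMillis() >= deadline) {
                    throw new IllegalStateException("Timed out waiting for connection", e);
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting for connection", ie);
                }
            }
        }
    }
}
```

An explicit null check on the handle before each attempt would be an alternative to catching the exception; the sketch follows the try/catch wording used in the thread.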