hey Viswa,
Sorry, cut and paste error on my part.

This PR here:

https://github.com/apache/curator/pull/398

Looks like it may be fixing at least a similar problem. I'll try and take a
look in more detail when I get a minute, but my time for Curator is
currently very limited.
cheers

On Wed, Nov 3, 2021 at 2:25 PM Viswanathan Rajagopal <
viswanathan.rajag...@workday.com> wrote:

> Hi Cam,
>
> Thanks for getting back.
>
> Yes, it was me who opened the Curator Jira. I raised the Jira first, but
> since there were no responses, I thought I would open a conversation about
> it.
>
> I have also referenced this Jira link in my original conversation below.
>
>
>
> Many Thanks,
>
> Viswa
>
>
>
> *From: *Cameron McKenzie <cammcken...@apache.org>
> *Date: *Tuesday, 2 November 2021 at 21:25
> *To: *dev@curator.apache.org <dev@curator.apache.org>
> *Cc: *u...@curator.apache.org <u...@curator.apache.org>
> *Subject: *[External Sender] Re: Double Leadership Issue
>
> hey Viswa,
> I haven't had a chance to look at it in any detail yet, but
> superficially it sounds like it has some similarities to this PR?
>
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_CURATOR-2D620&d=DwIFaQ&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjVR9Zfxevy2QbIqXzpZz32m&m=4quNit2CApic0UneDxdldPSbKfjBRrFPluHQspXgUQt1HBy_V319jPgxWrKsYi76&s=DfAE8YU4ITE_OOcDW9R_uI5yK3Z-zDSl1gXGpsLiK9Y&e=
>
> cheers
> Cam
>
>
> On Tue, Nov 2, 2021 at 10:48 PM Viswanathan Rajagopal
> <viswanathan.rajag...@workday.com.invalid> wrote:
>
> > Hello Team,
> >
> > Greetings!
> > Any update on the observation below?
> >
> > Many Thanks,
> > Viswa
> >
> > From: Viswanathan Rajagopal <viswanathan.rajag...@workday.com.INVALID>
> > Date: Wednesday, 27 October 2021 at 16:15
> > To: dev@curator.apache.org <dev@curator.apache.org>,
> > u...@curator.apache.org <u...@curator.apache.org>
> > Subject: [External Sender] Double Leadership Issue
> > Hello Team,
> >
> > Greetings!
> > While using the Curator Leader Latch recipe in our application, we observed
> > a potential issue where two clients became leader at the same time. We
> > raised a Jira for your reference (Jira link:
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_CURATOR-2D620&d=DwIF-g&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=nFB4puWyvVe8eRiZ3oi_C8Ao1WkqCb9wsonPrIl3LY8&s=3LDys_XJLYEnQ0_K3auTUo8DsOom0xZAMAC7ASgkt0A&e=
> > )
> > Quick summary of the description below:
> >
> >   *   Our use case explained
> >   *   Issue details
> >   *   Timeline of events mentioned
> >   *   Attached test code to reproduce the reported issue
> >   *   Possible solution given, where we need your suggestions
> > Our use case:
> >
> >   *   Two clients try to acquire leadership using the Curator Leader Latch
> > recipe. On LeaderLatchListener.isLeader() a client becomes leader, and
> > on LeaderLatchListener.notLeader() it gives up its leadership.
> > Issue details:
> >
> >   *   When one of the clients received two CuratorConnectionListener
> > RECONNECTED events in quick succession, we observed that the LeaderLatch
> > EventThreads interleaved with each other, so that "latch node deletion"
> > happened after "client becoming a leader". The client therefore remains
> > leader even though its corresponding latch node has been deleted.
> >   *   The other client, which is also trying to get leadership, creates its
> > latch node, sees itself at the first index, and thus becomes leader.
> >   *   So at this point, both clients have become leader.
> >
> > Timeline of events:
> >
> >   *   Timeline of events for Client A, whose latch node is deleted but
> > which still remains leader
> >      *   At t1, 1st RECONNECTED event fired
> >      *   At t2, [ EventThread of 1st RECONNECTED event ] Resets leadership
> > (true -> false)
> >      *   At t3, [ EventThread of 1st RECONNECTED event ] Fires
> > “listener.notLeader()”
> >      *   At t4, [ EventThread of 1st RECONNECTED event ] Deletes latch node
> >      *   At t5, [ EventThread of 1st RECONNECTED event ] Creates new latch
> > node
> >      *   At t6, 2nd RECONNECTED event fired
> >      *   At t7, [ EventThread of 2nd RECONNECTED event ] Resets leadership
> > (false -> false); basically a NOP
> >      *   At t8, [ EventThread of 2nd RECONNECTED event ] Fires nothing;
> > basically a NOP
> >      *   At t9, [ EventThread of 1st RECONNECTED event ] Gets children ->
> > sorts them -> checks leadership -> Sets leadership to true -> Fires “Has
> > become a leader” leader listener event
> >      *   At t10, [ EventThread of 2nd RECONNECTED event ] Deletes latch
> > node (which actually deletes the latch node with which Client A became
> > leader in the previous step)
> >
> >   *   Timeline of events for Client B, which also becomes leader
> >      *   At t11, Client B creates its latch node -> Gets children -> sorts
> > them -> checks leadership -> Sets leadership to true -> Fires “Has become a
> > leader” leader listener event
> >
> > This ends up in a situation where both Client A and Client B have become
> > leader.
> >
> > As we observe, over the period (t8 -> t10), Client A’s LeaderLatch
> > EventThreads interleave with each other, causing the leadership latch node
> > to be deleted while local state still assumes that it is the leader.
> >
> > Reproducing the issue:
> >
> >   *   Wrote a JUnit test case that fires artificial Curator connection
> > RECONNECTED events and simulates the LeaderLatch EventThread interleaving
> > through CountDownLatches
> >   *   Test simulator for 2.5.0:
> >      *
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ViswaNXplore_curator_commit_6a78a3a0de032212175d80caa64f140c743219ae&d=DwIF-g&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=nFB4puWyvVe8eRiZ3oi_C8Ao1WkqCb9wsonPrIl3LY8&s=tveG7d6kAd8SeywmuCN7zyd1ufTvARJdEEc0gxTs2rU&e=
> >      *
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ViswaNXplore_curator_commit_d2b1b33a6885c05619c058aa2bee63962fd6fa08&d=DwIF-g&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=nFB4puWyvVe8eRiZ3oi_C8Ao1WkqCb9wsonPrIl3LY8&s=jixCmfLZiaseXsSWihiUiYMw8cj5cDg1O6gLFJY3kKg&e=
> >   *   Test Simulator for latest Curator version:
> >      *
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ViswaNXplore_curator_commit_0949137f7323a1d5f34afc85a7042e8d9e85a8bc&d=DwIF-g&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=nFB4puWyvVe8eRiZ3oi_C8Ao1WkqCb9wsonPrIl3LY8&s=bzLny0aqbqUHmvLwkWyLdIySm65swqv2rAT1Kn0MKJ0&e=
> >      *
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__github.com_ViswaNXplore_curator_commit_1aadd4b5dbc8811a2e7a49b92f29170333e8ba4a&d=DwIF-g&c=DS6PUFBBr_KiLo7Sjt3ljp5jaW5k2i9ijVXllEdOozc&r=mwSLlPO0Vtstmu1dce0TMFqf5lUxD2SPdNdc1k4NXjU&m=nFB4puWyvVe8eRiZ3oi_C8Ao1WkqCb9wsonPrIl3LY8&s=GTlqqRRRB_P5y_f1tRSRxv1HZvVjhwFHtlogEk47LAU&e=
> >
> > Possible solution (where we would like to hear your thoughts/suggestions):
> >
> >   *   The current curator code during reset() does
> >      *   setLeadership(false) first followed by
> >      *   setNode(null) (i.e. deleting its latch node)
> >
> >   *   Swapping these two should resolve the issue, as we would then set
> > leadership to false only after the latch node has been deleted
> >      *   setNode(null) (i.e. deleting its latch node) first followed by
> >      *   setLeadership(false)
> >
> > Many Thanks,
> > Viswa
> >
>
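
To make the quoted timeline (t2 to t11) and the proposed reordering easier to follow, here is a minimal, single-threaded replay of it. This is only a sketch under stated assumptions: it does not use any Curator or ZooKeeper classes, and `LatchResetModel`, `Client`, `replay`, the node naming scheme, and the split of reset() into a "first half" and "second half" are all hypothetical names invented for illustration. It ignores the asynchronous callbacks of the real LeaderLatch, so it can illustrate the reasoning but says nothing definite about whether the swap is a complete fix in Curator itself.

```java
import java.util.TreeSet;

/** Toy, single-threaded replay of the reported interleaving; not Curator code. */
public class LatchResetModel {

    /** Hypothetical stand-in for one LeaderLatch client; not the real class. */
    static final class Client {
        private final TreeSet<String> children; // stand-in for the latch path's children
        private final String id;
        boolean hasLeadership;
        String ourNode; // null while we hold no latch node

        Client(String id, TreeSet<String> children) {
            this.id = id;
            this.children = children;
        }

        void setLeadership(boolean value) {
            hasLeadership = value;
        }

        void deleteNode() {
            if (ourNode != null) {
                children.remove(ourNode);
                ourNode = null;
            }
        }

        void createNode(int sequence) {
            // Made-up naming scheme; zero padding keeps the sort order numeric.
            ourNode = String.format("latch-%010d-%s", sequence, id);
            children.add(ourNode);
        }

        void checkLeadership() {
            // Leader only if our node exists and sorts first among the children.
            if (ourNode != null && children.first().equals(ourNode)) {
                setLeadership(true);
            }
        }
    }

    static void resetFirstHalf(Client c, boolean deleteNodeFirst) {
        if (deleteNodeFirst) c.deleteNode(); else c.setLeadership(false);
    }

    static void resetSecondHalf(Client c, boolean deleteNodeFirst) {
        if (deleteNodeFirst) c.setLeadership(false); else c.deleteNode();
    }

    /** Replays steps t2..t11 and returns how many clients believe they lead. */
    public static int replay(boolean deleteNodeFirst) {
        TreeSet<String> children = new TreeSet<>();
        Client a = new Client("A", children);
        Client b = new Client("B", children);

        // Starting point: A holds the only latch node and is leader.
        a.createNode(0);
        a.checkLeadership();

        // 1st RECONNECTED: both halves of reset(), then a new latch node (t2..t5).
        resetFirstHalf(a, deleteNodeFirst);
        resetSecondHalf(a, deleteNodeFirst);
        a.createNode(1);

        // 2nd RECONNECTED: only the first half of reset() runs (t7, t8) ...
        resetFirstHalf(a, deleteNodeFirst);
        // ... before the 1st event's thread checks leadership (t9) ...
        a.checkLeadership();
        // ... and then the 2nd event's thread finishes reset() (t10).
        resetSecondHalf(a, deleteNodeFirst);

        // t11: B creates its latch node and checks leadership.
        b.createNode(2);
        b.checkLeadership();

        return (a.hasLeadership ? 1 : 0) + (b.hasLeadership ? 1 : 0);
    }

    public static void main(String[] args) {
        System.out.println("current order, leaders: " + replay(false));
        System.out.println("swapped order, leaders: " + replay(true));
    }
}
```

In this toy model the current order (setLeadership(false), then delete) ends with both clients believing they are leader, while the swapped order (delete, then setLeadership(false)) ends with only Client B as leader, because Client A's leadership check at t9 no longer finds its own node.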
