I'm now wondering if this is a ZooKeeper bug? In your original timeline you say that Client B sees its original ephemeral node, which should have been deleted. Is ZooKeeper simply taking time to delete that ZNode? A simple fix would be to try to write something to that ZNode; I would hope that ZooKeeper would reject the write on the now-expired ZNode. So the lock recipe would be enhanced so that it tries a setData on the node before it accepts that it has the lock. Just a thought.
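A minimal sketch of that setData probe, assuming a Curator client and the lock node's path (LockNodeProbe and its methods are illustrative names, not existing Curator API; whether the server actually rejects such a write is exactly the open question above):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.data.Stat;

    // Illustrative probe: before the recipe accepts that it holds the lock, try a
    // write against its own lock node. If the server has already removed the node
    // as part of expired-session cleanup, the write fails with NoNode and the lock
    // must not be considered held.
    public final class LockNodeProbe
    {
        public static boolean nodeStillWritable(CuratorFramework client, String lockNodePath) throws Exception
        {
            try
            {
                byte[] data = client.getData().forPath(lockNodePath);
                client.setData().forPath(lockNodePath, data); // re-write the existing payload; we only care whether the write is accepted
                return true;
            }
            catch ( KeeperException.NoNodeException e )
            {
                return false; // node already gone - do not accept the lock
            }
        }

        // A possibly stronger variant: compare the node's ephemeral owner with our
        // current session ID, since a phantom node from a dead session may still
        // accept writes until the server deletes it.
        public static boolean ownedByCurrentSession(CuratorFramework client, String lockNodePath) throws Exception
        {
            Stat stat = client.checkExists().forPath(lockNodePath);
            return (stat != null) && (stat.getEphemeralOwner() == client.getZookeeperClient().getZooKeeper().getSessionId());
        }
    }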
> On Jul 30, 2021, at 9:54 AM, Jordan Zimmerman <[email protected]> wrote:
>
> Even more... actually, this should already work, right? It's been a long time since I looked at this code, so I wrote a quick test. wait() already exits on connection loss. I apologize, but what I said below is not correct. The LockInternals watcher gets called on connection loss and already interrupts the thread via notifyAll(). So, again, sorry for the noise.
>
> I'll go over the scenario again to see if I have more comments.
>
> -Jordan
>
>> On Jul 30, 2021, at 9:18 AM, Jordan Zimmerman <[email protected]> wrote:
>>
>> Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a pretty simple change, actually.
>>
>> Watcher set just before waiting:
>> https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301
>>
>> The Watcher:
>> https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64
>>
>> So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
>>
>> -Jordan
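A minimal sketch of that AtomicReference<Thread> idea (the class and method names are illustrative, not the actual LockInternals change):

    import java.util.concurrent.atomic.AtomicReference;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    // Illustrative sketch: the thread that is about to block registers itself;
    // if the watcher fires with a connection event, the registered thread is
    // interrupted so wait() throws InterruptedException and the lock code's
    // normal cleanup deletes the lock node.
    class InterruptingLockWatcher implements Watcher
    {
        private final AtomicReference<Thread> waitingThread = new AtomicReference<>();

        @Override
        public void process(WatchedEvent event)
        {
            boolean isConnectionEvent = (event.getType() == Event.EventType.None) &&
                ((event.getState() == Event.KeeperState.Disconnected) || (event.getState() == Event.KeeperState.Expired));
            if ( isConnectionEvent )
            {
                Thread thread = waitingThread.getAndSet(null);
                if ( thread != null )
                {
                    thread.interrupt();
                }
            }
            // ... the existing notifyAll() handling for node events would remain here ...
        }

        void beforeWait()   // called just before the synchronized wait()
        {
            waitingThread.set(Thread.currentThread());
        }

        void afterWait()    // called once wait() returns, normally or via interrupt
        {
            waitingThread.set(null);
        }
    }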
>>> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <[email protected]> wrote:
>>>
>>> "Making lock code watch its lock node it has created"
>>>
>>> I don't think this is useful. Once the lock method returns, the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class, a "lock watcher" or something, that client code would have to periodically call to ensure that it still has the lock. tbh, this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locking, etc. But maybe we can reduce the edge cases with this lock watcher or some other mechanism.
>>>
>>> "Session ID changes after wait exits; it suggests the lock node should no longer be trusted and it should be deleted and re-created"
>>>
>>> This is a simple change. But I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception, then those lock threads would already delete their ZNodes and try again. This might be the best solution. So, before the lock code goes to wait(), it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
>>>
>>> "Leader Latch code is well protected"
>>>
>>> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
>>>
>>> -Jordan
>>>
>>>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <[email protected]> wrote:
>>>>
>>>> Hi Jordan,
>>>>
>>>> Thanks for the suggestions. Much appreciated.
>>>>
>>>> A quick summary of my comments below:
>>>> - Added a test case simulator to reproduce the issue (though artificially)
>>>> - Added proposed code, to get your review and suggestions on whether that approach would work
>>>>
>>>> Please find my detailed comments inline.
>>>>
>>>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's one thought anyway.
>>>>
>>>> Yes, you are right: currently the lock code doesn't watch the lock node it has created. That's where we would like to hear your feedback on the proposal from our first mail: "Curator lock code has a makeRevocable() API that enables the application to revoke a lock at any time by triggering a node-change event through the Revoker.attemptRevoke() utility. Proposal: would it be nice to extend the makeRevocable() API to handle the node delete event, which would allow the application to register a watcher for the node delete event and react to it by revoking the lock?" I tried a code change (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60) to see whether the approach "making the lock code watch the lock node it has created" works, and would like to get your thoughts on it. Do you see any other impacts there? May need your advice.
>>>>
>>>> I agree that the session ID approach would protect us from this issue. The reason I thought your first approach ("making the lock code watch the node it has created") could be more protective is that it could protect us from any inconsistency between the original lock node and the local lock state.
>>>>
>>>> > It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>> Created a test case simulator:
>>>> https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java
>>>>
>>>> Approach #1 (artificial way of reproducing the issue):
>>>> - Run the test case with SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true. This deletes the lock node after a pause, just to simulate the state where the zkClient has reconnected and still happens to see the ephemeral node just before the server deletes it: its session has expired, but the node is only deleted afterwards by the server.
>>>>
>>>> Approach #2 (a little manual intervention needed to reproduce the issue):
>>>> - Run the test case in debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false).
>>>> - Artificially delay/pause the ephemeral lock node deletion that happens as part of the session cleanup in the server code (the ZooKeeperServer.close() method).
>>>> - After a pause (say 5s, to let one of the instances acquire the lock), artificially break the socket connection between client and server for 30s (by holding a breakpoint in ClientCnxnSocketNIO.doIO()). After 30s, the session-closing logs appear on the server.
>>>> - After 1 min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3.
>>>> - After that, resume the server thread (the thread name is "SessionTracker").
>>>>
>>>> Proposals discussed so far (the third added now for your review):
>>>> 1. Make the lock code watch the lock node it has created.
>>>> 2. If the session ID changes after wait() exits, it suggests the lock node should no longer be trusted, and it should be deleted and re-created (a sketch of this check follows below).
>>>> 3. The Leader Latch code is well protected against this ZooKeeper race condition because it internally handles connection events (which it uses to interrupt the latch-acquire state and reset the latch every time the connection is re-established). That is, it explicitly releases the latch and re-creates it whenever there is a disconnect. Maybe this is the approach the lock recipe could use to protect itself?
>>>>
>>>> Many Thanks,
>>>> Viswa.
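A minimal sketch of the session ID check from proposal 2, assuming a Curator client (the class and method names are illustrative; the real change would live inside LockInternals.internalLockLoop()):

    import org.apache.curator.framework.CuratorFramework;

    // Illustrative: capture the session ID when the lock node is created; if it
    // differs after wait() returns, the node was created under a session that has
    // since been replaced and should be deleted and re-created rather than trusted.
    final class SessionIdCheck
    {
        static long currentSessionId(CuratorFramework client) throws Exception
        {
            return client.getZookeeperClient().getZooKeeper().getSessionId();
        }

        static boolean lockNodeStillTrustworthy(CuratorFramework client, long sessionIdAtNodeCreate) throws Exception
        {
            return currentSessionId(client) == sessionIdAtNodeCreate;
        }
    }

The idea being: record currentSessionId() just before blocking; when wait() returns, a changed ID means the ephemeral node belongs to the old session and is already doomed.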
>>>>
>>>> From: Jordan Zimmerman <[email protected]>
>>>> Date: Tuesday, 20 July 2021 at 21:11
>>>> To: Viswanathan Rajagopal <[email protected]>
>>>> Cc: Sean Gibbons <[email protected]>, [email protected], [email protected], [email protected], Donal Arundel <[email protected]>, Francisco Goncalves <[email protected]>, Zak Szadowski <[email protected]>, Dandie Beloy <[email protected]>, Marcelo Cenerino <[email protected]>
>>>> Subject: Re: [External Sender] Double Locking Issue
>>>>
>>>> > In our case, it wouldn't throw any exception because it had gone past "creating lock nodes" and was blocked on wait(), which would only be interrupted when the Curator watcher was notified of the previous sequence node's delete event.
>>>>
>>>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire() that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire() are interrupted on a network partition.
>>>>
>>>> [snip of your remaining thoughtful analysis]
>>>>
>>>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock.
>>>>
>>>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's one thought anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>>>
>>>> -Jordan
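A minimal sketch of the timed acquire() suggestion, assuming an InterProcessMutex (the loop and its names are illustrative; as noted above, this only bounds how long a thread can sit inside acquire(), it does not by itself remove the race):

    import java.util.concurrent.TimeUnit;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.recipes.locks.InterProcessMutex;

    // Illustrative: use the timed variant of acquire() in a loop so the caller
    // periodically regains control (and can check connection state) instead of
    // blocking indefinitely inside wait().
    public final class TimedAcquireLoop
    {
        public static void acquireWithRetry(CuratorFramework client, String lockPath) throws Exception
        {
            InterProcessMutex mutex = new InterProcessMutex(client, lockPath);
            while ( !mutex.acquire(5, TimeUnit.SECONDS) )
            {
                // not acquired within this window; a real caller would check for
                // SUSPENDED/LOST here (e.g. via a ConnectionStateListener flag)
                // before looping again
            }
            try
            {
                // ... protected work ...
            }
            finally
            {
                mutex.release();
            }
        }
    }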
>>>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <[email protected]> wrote:
>>>>
>>>> Thanks Jordan for coming back on this.
>>>>
>>>> Please find my inline comments. Also, some additional version info on what we are using:
>>>> Curator version - 2.5.0
>>>> ZooKeeper version - 3.5.3
>>>>
>>>> > I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>>
>>>> In our case, it wouldn't throw any exception because it had gone past "creating lock nodes" and was blocked on wait(), which would only be interrupted when the Curator watcher was notified of the previous sequence node's delete event.
>>>>
>>>> > This isn't correct. Curator watches the previous node, but lock acquisition is always based on calling ZK's getChildren, sorting the children, and seeing if the lock's node is the first node in the sequence. If ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>>
>>>> Totally agreed on your point on lock acquisition. The problem I was trying to explain is that the network partition occurs while the client is waiting for the lock (inside acquire()) but after the acquire code has already written out the lock node successfully, so the client code is blocking in acquire() and does not get a connection exception. Once the previous lock node is deleted, but not the lock node for this client, this client will leave acquire() thinking it has the lock, while at this point the ZK server has expired the session and is in the process of deleting our client's lock node.
>>>>
>>>> Trying to categorize the timeline events below for better understanding:
>>>>
>>>> [ CONNECTED ]:
>>>> - Client A & Client B call acquire(), creating nodes N1 & N2 respectively.
>>>> - Client A acquire() -> holds the lock as its node N1 is the first node in the sequence.
>>>> - Client B acquire() -> created node N2, watches the previous sequence node (blocked on wait()).
>>>>
>>>> [ SUSPENDED CONNECTION OCCURS ]:
>>>> - Network partition happens; Curator fires SUSPENDED.
>>>> - Client A, on receiving the SUSPENDED event, attempts to release the lock.
>>>> - Client B, on receiving the SUSPENDED event, does nothing as it holds no lock (blocked on acquire() -> wait()).
>>>>
>>>> [ SESSION EXPIRED ]:
>>>> - Client A attempts to release the lock (with retries).
>>>> - Client B does nothing as it holds no lock (still blocked on acquire() -> wait()).
>>>> - The server prepares to clean up the previous client sessions, deleting the lock nodes created under them.
>>>>
>>>> [ RECONNECTED ]:
>>>> - Client A and Client B reconnect to another server with new session IDs. (Note: this happens before the server manages to clean up / delete the ephemeral nodes of the previous client sessions.)
>>>> - Client A releases the lock successfully (deleting its lock node N1), then attempts to acquire the lock again by creating lock node N3 and watching the previous sequence node (N2).
>>>> - Client B, which was blocked on acquire() -> wait(), is notified of the previous sequence node's (N1) deletion -> getChildren, sort the children, and see if the lock's node is the first node in the sequence. So Client B still sees its lock node N2 (which I call the "about to be deleted by the server" node) and thereby acquires the lock.
>>>>
>>>> [ AFTER A FEW SECONDS ]:
>>>> - The server manages to delete the ephemeral node N2 as part of the previous client session cleanup.
>>>> - Client A, which was blocked on acquire() -> wait(), is notified of the previous sequence node's (N2) deletion -> getChildren, sort the children, see if the lock's node is the first node in the sequence, and thereby acquires the lock.
>>>> - Client B's local lock thread data has gone stale (as its lock path N2 has now been deleted by the server).
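A minimal sketch of the acquisition check referred to throughout this timeline, assuming a Curator client (this illustrates the getChildren/sort/first-node logic, not the actual LockInternals code):

    import java.util.Collections;
    import java.util.List;
    import org.apache.curator.framework.CuratorFramework;

    // Illustrative: the lock is considered held only if our node sorts first among
    // the children. A phantom ephemeral node left over from a dead session sorts
    // ahead of newer nodes, which is how Client B above came to "hold" the lock via N2.
    final class FirstInSequenceCheck
    {
        static boolean holdsLock(CuratorFramework client, String lockBasePath, String ourNodeName) throws Exception
        {
            List<String> children = client.getChildren().forPath(lockBasePath);
            Collections.sort(children); // lock nodes share a common prefix, so name order matches sequence order here
            return !children.isEmpty() && children.get(0).equals(ourNodeName);
        }
    }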
>>>>
>>>> > SUSPENDED in Curator means only that the connection has been lost, not that the session has ended. LOST is the state that means the session has ended.
>>>> > Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10
>>>> > Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>>>>
>>>> Very good information in the Tech Notes, thanks for that. Agreed, it's always recommended to clear the locks on seeing the SUSPENDED event, and we are already following that recommendation: our application clears the lock when it sees SUSPENDED.
>>>>
>>>> To summarize my thoughts: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session ID, until the server cleaned up that previous-session ephemeral node. This can happen if the client manages to reconnect to the server with a new session ID before the server cleans up the previous session. How does it affect the Curator lock recipe? Explanation: the race condition explained above makes acquire() hold the lock (as its own lock node is still seen), eventually leading to an inconsistent state (i.e. stale Curator local lock state) when that lock node is cleaned up by the server as part of the previous-session cleanup. I am NOT seeing a Curator bug here, but I am looking for suggestions / recommendations on handling this ZooKeeper race condition: either a feature added to Curator to cover this case, or recommendations that clients should follow.
>>>>
>>>> Many Thanks,
>>>> Viswa.
>>>>
>>>> From: Sean Gibbons <[email protected]>
>>>> Date: Tuesday, 20 July 2021 at 12:19
>>>> To: Viswanathan Rajagopal <[email protected]>
>>>> Subject: FW: [External Sender] Re: Double Locking Issue
>>>>
>>>> On 20/07/2021, 10:12, "Jordan Zimmerman" <[email protected]> wrote:
>>>>
>>>> A few more things...
>>>>
>>>> > Based on our initial analysis and a few test runs, we saw that the Curator acquire() method acquires the lock based on an "about to be deleted" lock node of the previous session. Explanation: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session ID, until the server cleaned it up. If this happens, Curator acquire() holds the lock.
>>>>
>>>> This isn't correct. Curator watches the previous node, but lock acquisition is always based on calling ZK's getChildren, sorting the children, and seeing if the lock's node is the first node in the sequence. If ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>>>
>>>> > On the above mentioned race condition, if the client manages to reconnect to the server with a new session ID before the server cleans up the ephemeral nodes of the client's previous session, the Curator lock acquire() that is trying to acquire the lock will hold the lock, as it still sees the lock node in the ZooKeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that it still holds the lock while its ephemeral node is gone.
>>>>
>>>> I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>>>
>>>> So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem they could cause is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>>>>
>>>> -JZ
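For context on the retry-policy question, a minimal sketch of a typical client setup (the values are illustrative): the retry policy governs how long Curator keeps retrying operations during a partition before calls such as acquire() give up with an exception.

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public final class ClientSetupSketch
    {
        public static CuratorFramework newClient(String connectString)
        {
            // 1s base sleep, at most 3 retries, with exponential backoff (illustrative values)
            CuratorFramework client = CuratorFrameworkFactory.newClient(connectString, new ExponentialBackoffRetry(1000, 3));
            client.start();
            return client;
        }
    }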
>>>> > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <[email protected]> wrote:
>>>> >
>>>> > Hi Team,
>>>> >
>>>> > Good day.
>>>> >
>>>> > We recently came across a "double locking issue" (i.e. two clients acquiring the lock) using Curator code (the InterProcessMutex lock APIs) in our application.
>>>> >
>>>> > Our use case:
>>>> >
>>>> > * Two clients attempt to acquire the ZooKeeper lock using Curator's InterProcessMutex, and whichever owns it releases it once it sees a connection disconnect (on receiving the SUSPENDED / LOST Curator connection events from a connection listener).
>>>> >
>>>> > Issue we noticed:
>>>> >
>>>> > * After the session expired and the clients reconnected with new sessions, both clients appear to have acquired the lock. The interesting thing we found is that one of the clients still holds the lock while its (ephemeral) lock node is gone.
>>>> >
>>>> > Things we found:
>>>> >
>>>> > * Based on our initial analysis and a few test runs, we saw that the Curator acquire() method acquires the lock based on an "about to be deleted" lock node of the previous session. Explanation: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session ID, until the server cleaned it up. If this happens, Curator acquire() holds the lock.
>>>> >
>>>> > * Clearly we can see the race condition (in the ZooKeeper code) between 1) the client reconnecting to the server with a new session ID, and 2) the server deleting the ephemeral nodes of the client's previous session. We were able to reproduce this issue using the following approach:
>>>> >   * Artificially break the socket connection between client and server for 30s.
>>>> >   * Artificially pause parts of the server code for a minute, then unpause.
>>>> >
>>>> > * Given the above race condition, if the client manages to reconnect to the server with a new session ID before the server cleans up the ephemeral nodes of the client's previous session, a Curator lock acquire() that is trying to acquire the lock will hold the lock, as it still sees the lock node in the ZooKeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that it still holds the lock while its ephemeral node is gone.
>>>> >
>>>> > * Timeline of events, for better understanding:
>>>> >   * At t1, Client A and Client B establish ZooKeeper sessions with session IDs A1 and B1 respectively.
>>>> >   * At t2, Client A creates lock node N1 & acquires the lock.
>>>> >   * At t3, Client B creates lock node N2 & blocks in acquire() waiting for the lock.
>>>> >   * At t4, the session times out for both clients & the server is about to clean up the old sessions; Client A is trying to release the lock.
>>>> >   * At t5, Client A and Client B reconnect to the server with new session IDs A2 and B2 respectively, before the server deletes the ephemeral nodes N1 & N2 of the previous client sessions. Client A releases the lock, deleting N1, and tries to acquire it again by creating node N3; Client B, which is blocked in acquire(), acquires the lock based on N2 (the about-to-be-deleted node created by its previous session).
>>>> >   * At t6, the server cleans up the ephemeral node N2 created by Client B's previous session. Client A acquires the lock with node N3 as its previous sequence node N2 is deleted, whereas Client B, which incorrectly acquired the lock at t5, still holds the lock.
>>>> >
>>>> > Note:
>>>> >
>>>> > * We are not certain whether this race condition we noticed in the ZooKeeper code is intentional design.
>>>> >
>>>> > Questions:
>>>> >
>>>> > * Given this race condition seen in the ZooKeeper code, we would like to hear your recommendations / suggestions for avoiding this issue while using the Curator lock code.
>>>> >
>>>> > * We also see that InterProcessMutex has the makeRevocable() API that enables the application to revoke a lock, but that handles the node change event only, not the node deleted event. I understand that it makes sense to handle the node change event alone, to enable the application user to revoke a lock externally from the application side.
>>>> > But would it also be okay to have one for the node delete event, so that the application can register a listener for the node delete event? I would like to hear your thoughts.
>>>> >
>>>> > Many Thanks,
>>>> > Viswa
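A minimal sketch of what that node-delete listener could look like from the application side, assuming a Curator client (this is not an existing makeRevocable() capability; the class, method, and callback are illustrative):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    // Illustrative: after acquire() returns, watch our own lock node; if it is
    // deleted out from under us (e.g. by expired-session cleanup), invoke a
    // caller-supplied callback so the application stops treating the lock as held.
    public final class LockNodeDeleteWatch
    {
        public static void watchOwnLockNode(CuratorFramework client, String lockNodePath, Runnable onDeleted) throws Exception
        {
            Watcher watcher = (WatchedEvent event) -> {
                if ( event.getType() == Watcher.Event.EventType.NodeDeleted )
                {
                    onDeleted.run();
                }
                // note: ZooKeeper watches are one-shot; a real implementation would
                // re-register after unrelated event types
            };
            // checkExists() also covers the case where the node is already gone by
            // the time we get here
            if ( client.checkExists().usingWatcher(watcher).forPath(lockNodePath) == null )
            {
                onDeleted.run();
            }
        }
    }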
