Actually... I'm looking at LockInternals and it already has a watcher set on the previous node before it calls wait(). That watcher will get called when there's a connection problem. It would be pretty easy to add something in that watcher to interrupt the waiting thread. This could be a fairly simple change, actually.

Watcher set just before waiting:
https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L301

The Watcher:
https://github.com/apache/curator/blob/master/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L64

So, off the top of my head, an AtomicReference<Thread> field could be added to LockInternals. It gets set to the current thread just before waiting. If the Watcher is called with a connection event and the AtomicReference isn't empty, interrupt the thread. This will cause wait() to throw InterruptedException and then the lock node will be deleted.
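Roughly, something like this (completely untested, just a sketch; the field name is a placeholder and I'm approximating the existing notify behavior):

import java.util.concurrent.atomic.AtomicReference;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;

// sketched new field on LockInternals
private final AtomicReference<Thread> waitingThread = new AtomicReference<>();

private final Watcher watcher = new Watcher()
{
    @Override
    public void process(WatchedEvent event)
    {
        // existing behavior (roughly): wake up anything blocked in wait()
        synchronized ( LockInternals.this )
        {
            LockInternals.this.notifyAll();
        }

        // sketched addition: connection events have EventType.None; if a
        // thread is parked in wait(), interrupt it so wait() throws
        // InterruptedException and the lock node gets deleted and retried
        if ( (event.getType() == Event.EventType.None) && (event.getState() != Event.KeeperState.SyncConnected) )
        {
            Thread thread = waitingThread.get();
            if ( thread != null )
            {
                thread.interrupt();
            }
        }
    }
};

// ...and in internalLockLoop(), around the existing wait():
waitingThread.set(Thread.currentThread());
try
{
    wait(millisToWait);
}
finally
{
    waitingThread.set(null);
}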
-Jordan

> On Jul 30, 2021, at 9:11 AM, Jordan Zimmerman <[email protected]> wrote:
>
> "Making lock code watch its lock node it has created"
>
> I don't think this is useful. Once the lock method returns, the lock code can't notify the caller/user that the lock node has disappeared. We'd need some kind of additional class, a "lock watcher" or something, that client code would have to periodically call to ensure that it still has the lock. tbh - this has been discussed a lot over the years on the ZooKeeper/Curator channels. Just because you have a ZNode doesn't mean it's 100% valid. You should still practice optimistic locking, etc. But maybe we can reduce the edge cases with this lock watcher or some other mechanism.
>
> "If the session ID changes after wait exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created"
>
> This is a simple change. But I don't know if it would help much. It would be better to interrupt waiting lockers when the connection is lost. If an internal mechanism forced any locks blocked in wait() to throw an exception, those lock threads would already delete their ZNodes and try again. This would be the best solution, maybe? So, before the lock code goes to wait(), it sets a ConnectionStateListener (or something similar) that will interrupt the thread when there are connection problems.
>
> "Leader Latch code is well protected"
>
> Yes - the leader recipes are better at this. Maybe there's an opportunity to merge/change code. We could consider deprecating the lock recipes in favor of these?
>
> -Jordan
>
>> On Jul 29, 2021, at 2:35 AM, Viswanathan Rajagopal <[email protected]> wrote:
>>
>> Hi Jordan,
>>
>> Thanks for the suggestions. Much appreciated.
>>
>> Quick summary of my comments below:
>> - Added a test case simulator to reproduce the issue (though artificially)
>> - Added proposed code, just to get your review and suggestions on whether it would work
>>
>> Please find my detailed comments inline.
>>
>> > I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock. One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's one thought, anyway.
>>
>> Yes, you are right. Currently the lock code doesn't watch the lock node it has created. That's where we would like to hear your feedback on the proposal from our first mail: "Curator lock code has a makeRevocable() API that enables the application to revoke the lock at any time by triggering a NODE_CHANGE event through the Revoker.attemptToRevoke() utility. Proposal: would it be nice to extend the makeRevocable() API to handle the node delete event, which would allow the application to register a watcher for node deletion and react to it by revoking the lock?" We tried a code snippet (https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/main/java/org/apache/curator/framework/recipes/locks/LockInternals.java#L60) to see whether the approach "making lock code watch the lock node it has created" works, and would like to get your thoughts on it. Do you see any other impacts there? We may need your advice.
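>> For illustration, the client-side shape of that idea is roughly the following. This is a sketch only: watchOwnLockNode and the onRevoked callback are our hypothetical names, not Curator API.
>>
>> import org.apache.curator.framework.CuratorFramework;
>> import org.apache.zookeeper.WatchedEvent;
>> import org.apache.zookeeper.Watcher;
>>
>> // Once a lock is held, set an exists-watch on the lock's own znode and
>> // treat NodeDeleted as "lock revoked" so the application can react.
>> void watchOwnLockNode(final CuratorFramework client, final String ourLockNode, final Runnable onRevoked) throws Exception
>> {
>>     Watcher watcher = new Watcher()
>>     {
>>         @Override
>>         public void process(WatchedEvent event)
>>         {
>>             if ( event.getType() == Event.EventType.NodeDeleted )
>>             {
>>                 onRevoked.run();   // e.g. clear local lock state / re-acquire
>>             }
>>         }
>>     };
>>
>>     // checkExists() sets the watch whether or not the node still exists
>>     client.checkExists().usingWatcher(watcher).forPath(ourLockNode);
>> }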
>> I agree with you that the session id approach would protect us from seeing this issue. The reason I thought your first approach (making the lock code watch the node it has created) could be more protective is that it would also protect us from any inconsistency between the original lock node and the local lock state.
>>
>> > It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
>>
>> Created a test case simulator:
>> https://github.com/ViswaNXplore/curator/blob/curator-2.5.0/patch/double_locking_test_simulator/curator-recipes/src/test/java/org/apache/curator/framework/recipes/CuratorRecipeTest.java
>>
>> Approach #1 (artificial way of reproducing the issue):
>> - Run the test case with SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to true. This deletes the lock node after a pause, just to simulate the state where the zkClient has reconnected and still happens to see the ephemeral node just before the server deletes it (its session having expired), with the node deleted afterwards by the server.
>>
>> Approach #2 (a little manual intervention needed to reproduce the issue):
>> - Run the test case in debug mode (SIMULATE_ISSUE_BY_EXPLICIT_DELETE set to false)
>> - Artificially delay/pause the ephemeral lock node deletion done as part of the session cleanup process in the server code (ZookeeperServer.close() method)
>> - After a pause (say 5s) to let one of the instances acquire the lock, artificially break the socket connection between client and server for 30s (by keeping a breakpoint in ClientCnxnSocketNIO.doIO()). After 30s, we would see session-closing logs in the server code
>> - After 1 min, remove the breakpoint in ClientCnxnSocketNIO.doIO() and resume both Thread 2 and Thread 3
>> - After that, resume the server thread (thread name would be "SessionTracker")
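>> (Aside: a deterministic way to force session expiry in a test is Curator's own test utilities. A minimal, untested sketch follows; note that KillSession expires the session immediately, so by itself it does not reproduce the delayed-cleanup race above.)
>>
>> import org.apache.curator.framework.CuratorFramework;
>> import org.apache.curator.framework.CuratorFrameworkFactory;
>> import org.apache.curator.framework.recipes.locks.InterProcessMutex;
>> import org.apache.curator.retry.RetryOneTime;
>> import org.apache.curator.test.KillSession;
>> import org.apache.curator.test.TestingServer;
>>
>> public class SessionExpiryLockSketch
>> {
>>     public static void main(String[] args) throws Exception
>>     {
>>         TestingServer server = new TestingServer();
>>         CuratorFramework client = CuratorFrameworkFactory.newClient(server.getConnectString(), new RetryOneTime(1));
>>         client.start();
>>         try
>>         {
>>             InterProcessMutex mutex = new InterProcessMutex(client, "/locks/test");
>>             mutex.acquire();
>>
>>             // expire the session server-side; the ephemeral lock node goes
>>             // away even though this process still thinks it holds the lock
>>             KillSession.kill(client.getZookeeperClient().getZooKeeper(), server.getConnectString());
>>             Thread.sleep(1000);
>>
>>             System.out.println("still holding? " + mutex.isAcquiredInThisProcess());
>>         }
>>         finally
>>         {
>>             client.close();
>>             server.close();
>>         }
>>     }
>> }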
>> Proposals discussed so far (3rd added now for your review):
>> 1. Making the lock code watch the lock node it has created
>> 2. If the session ID changes after wait exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created
>> 3. The Leader Latch code is well protected against this zookeeper race condition, because it internally handles connection events (which it uses to interrupt the latch-acquire state and reset its latch every time the connection is re-established). That means it explicitly releases the latch and recreates a new one on a connection disconnect. (Maybe this is the approach the lock recipe could use to protect itself?)
>>
>> Many Thanks,
>> Viswa.
>>
>> From: Jordan Zimmerman <[email protected]>
>> Date: Tuesday, 20 July 2021 at 21:11
>> To: Viswanathan Rajagopal <[email protected]>
>> Cc: Sean Gibbons <[email protected]>, [email protected], Donal Arundel <[email protected]>, Francisco Goncalves <[email protected]>, Zak Szadowski <[email protected]>, Dandie Beloy <[email protected]>, Marcelo Cenerino <[email protected]>
>> Subject: Re: [External Sender] Double Locking Issue
>>
>> > In our case, it wouldn't throw any exception because it had gone past "creating lock nodes" and was blocked on wait(), which would only then be interrupted when the curator watcher was notified of the previous sequence node's delete event.
>>
>> So, you're using the version of acquire() without a timeout? In any event, this is a problem. When you receive SUSPENDED you really should interrupt any threads that are waiting on Curator. The Curator docs imply this even though it might not be obvious. This is likely the source of your problems. A simple solution is to use the version of acquire that has a timeout and repeatedly call it until success (though your problem may still occur in this case). Maybe we could improve the lock recipe (or something similar) so that locks inside of acquire are interrupted on a network partition.
>>
>> [snip of your remaining thoughtful analysis]
>>
>> I think there's generally a hole in Curator's lock recipe. The lock code does not watch the lock node it has created. So, another process or (as you found) a race with the server might cause the lock node to disappear underneath the lock instance after it thinks it has the lock.
>>
>> One thing we can do is to check the session ID before waiting in LockInternals.internalLockLoop(). If the session ID changes after wait exits, it suggests the lock node should no longer be trusted and it should be deleted and re-created. That's one thought, anyway. It would be nice to generate a test/simulation for this case so that it can be properly dealt with.
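>> Roughly, the check could look like this (an untested sketch; deleteOurPath is the existing helper in LockInternals):
>>
>> // sketch only: inside internalLockLoop(), bracket the existing wait()
>> long sessionIdBeforeWait = client.getZookeeperClient().getZooKeeper().getSessionId();
>>
>> wait(millisToWait);   // the existing wait
>>
>> long sessionIdAfterWait = client.getZookeeperClient().getZooKeeper().getSessionId();
>> if ( sessionIdAfterWait != sessionIdBeforeWait )
>> {
>>     // a new session means our ephemeral node is gone (or about to be):
>>     // delete it and let the loop re-create it rather than trusting it
>>     deleteOurPath(ourPath);
>>     continue;   // i.e. go around the lock loop again
>> }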
>> -Jordan
>>
>>
>> On Jul 20, 2021, at 3:53 PM, Viswanathan Rajagopal <[email protected]> wrote:
>>
>> Thanks Jordan for coming back on this.
>>
>> Please find my inline comments. Also providing some additional version info on what we are using:
>> - Curator version: 2.5.0
>> - Zookeeper version: 3.5.3
>>
>> > I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>
>> In our case, it wouldn't throw any exception because it had gone past "creating lock nodes" and was blocked on wait(), which would only then be interrupted when the curator watcher was notified of the previous sequence node's delete event.
>>
>> > This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
>>
>> Totally agreed on your point about lock acquisition. The problem I was trying to explain is that the network partition occurs while the client is waiting for the lock (in acquire()) but after the acquire code has already written out the lock node successfully, so the client code blocks in acquire() and never gets a connection exception. If the previous lock node is then deleted, but not this client's own lock node, this client will leave acquire() thinking it has the lock, while at that point the ZK server has actually expired the session and is in the process of deleting our client's lock node.
>> Trying to categorize the timeline of events below, to see if that helps with understanding.
>>
>> [ CONNECTED ]:
>> - Client A & Client B call acquire(), creating nodes N1 & N2 respectively
>> - Client A's acquire() holds the lock, as its node N1 is the first node in the sequence
>> - Client B's acquire() created node N2 and watches the previous sequence node (blocked on wait())
>>
>> [ SUSPENDED CONNECTION OCCURS ]:
>> - A network partition happens
>> - Curator fires SUSPENDED
>> - Client A, on receiving the SUSPENDED event, attempts to release the lock
>> - Client B, on receiving the SUSPENDED event, does nothing, as it was not holding any lock (blocked on acquire() -> wait())
>>
>> [ SESSION EXPIRED ]:
>> - Client A attempts to release the lock (with retries)
>> - Client B does nothing, as it was not holding any lock (and is still blocked on acquire() -> wait())
>> - The server prepares to clean up the previous client sessions, deleting the lock nodes created under them
>>
>> [ RECONNECTED ]:
>> - Client A and Client B reconnect to another server with new session ids (note: this happens before the server has managed to clean up / delete the ephemeral nodes of the previous client sessions)
>> - Client A releases the lock successfully (deleting its lock node N1), then attempts to acquire the lock again by creating lock node N3 and watching the previous sequence node (N2)
>> - Client B, which was blocked on acquire() -> wait(), is notified of the previous sequence node (N1) deletion -> getChildren, sort, check whether its node is first in the sequence. Client B still sees its lock node N2 (which I call the "about to be deleted" node) and thereby acquires the lock
>>
>> [ AFTER A FEW SECONDS ]:
>> - The server manages to delete the ephemeral node N2 as part of the previous client session cleanup
>> - Client A, which was blocked on acquire() -> wait(), is notified of the previous sequence node (N2) deletion -> getChildren, sort, check whether its node is first in the sequence, and thereby acquires the lock
>> - Client B's local lock thread data has gone stale (as its lock path N2 has now been deleted by the server)
>>
>> > SUSPENDED in Curator means only that the connection has been lost, not that the session has ended. LOST is the state that means the session has ended. Be aware of how GCs can affect Curator. See the Tech Note here: https://cwiki.apache.org/confluence/display/CURATOR/TN10 Also read this Tech Note on session handling: https://cwiki.apache.org/confluence/display/CURATOR/TN14
>>
>> Very good information in the tech notes, thanks for that. Agreed, it's always recommended to clear the locks on seeing a SUSPENDED event, and we are already following that recommendation: our application clears the lock when it sees the SUSPENDED event.
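>> In sketch form, that handling looks roughly like this (simplified; lockHolder and acquiringThread are our names, not Curator API):
>>
>> import java.util.concurrent.TimeUnit;
>> import java.util.concurrent.atomic.AtomicReference;
>> import org.apache.curator.framework.CuratorFramework;
>> import org.apache.curator.framework.state.ConnectionState;
>> import org.apache.curator.framework.state.ConnectionStateListener;
>>
>> final AtomicReference<Thread> acquiringThread = new AtomicReference<>();
>>
>> ConnectionStateListener listener = new ConnectionStateListener()
>> {
>>     @Override
>>     public void stateChanged(CuratorFramework client, ConnectionState newState)
>>     {
>>         if ( (newState == ConnectionState.SUSPENDED) || (newState == ConnectionState.LOST) )
>>         {
>>             lockHolder.releaseQuietly();       // hypothetical: release if we hold the lock
>>             Thread t = acquiringThread.get();  // break a thread blocked in acquire()
>>             if ( t != null )
>>             {
>>                 t.interrupt();
>>             }
>>         }
>>     }
>> };
>> client.getConnectionStateListenable().addListener(listener);
>>
>> // prefer the timed acquire() in a loop over blocking forever
>> acquiringThread.set(Thread.currentThread());
>> try
>> {
>>     while ( !mutex.acquire(5, TimeUnit.SECONDS) )
>>     {
>>         // check for shutdown, log, etc., then retry
>>     }
>> }
>> finally
>> {
>>     acquiringThread.set(null);
>> }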
>> To summarize my thoughts:
>> - An ephemeral node created by a previous session is still visible to a client that has reconnected with a new session id, until the server cleans up that previous session's ephemeral nodes. This can happen if the client manages to reconnect to the server with a new session id before the server has cleaned up the previous session. How does it affect the Curator lock recipe? The race condition explained above makes acquire() hold the lock (as the client's own lock node is still visible), eventually leading to an inconsistent state (stale Curator local lock state) when that lock node is cleaned up by the server as part of the previous-session cleanup.
>> - I am NOT claiming a Curator bug here; I am looking for suggestions / recommendations on handling this zookeeper race condition, either as a feature added to Curator to cover this case or as recommendations that clients should follow.
>>
>> Many Thanks,
>> Viswa.
>>
>> From: Sean Gibbons <[email protected]>
>> Date: Tuesday, 20 July 2021 at 12:19
>> To: Viswanathan Rajagopal <[email protected]>
>> Subject: FW: [External Sender] Re: Double Locking Issue
>>
>> On 20/07/2021, 10:12, "Jordan Zimmerman" <[email protected]> wrote:
>>
>> A few more things...
>>
>> > Based on our initial analysis and a few test runs, we saw that the Curator acquire() method acquires the lock based on the "about to be deleted" lock node of the previous session. Explanation: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session id, until the server cleaned it up. If this happens, Curator acquire() would hold the lock.
>>
>> This isn't correct. Curator watches the previous node but lock acquisition is always based on calling ZK's getChildren, sorting them, and seeing if the lock's node is the first node in the sequence. If ephemeral nodes aren't cleaned up it wouldn't be a problem. Any phantom ephemeral nodes would sort first and prevent Curator from believing it holds the lock.
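>> Conceptually the acquisition check is just this (a sketch, not the actual Curator code; Curator sorts using the sequence-number suffix):
>>
>> // conceptual sketch of the predicate evaluated after each wake-up
>> List<String> children = client.getChildren().forPath(basePath);
>> Collections.sort(children);   // ephemeral-sequential names order by sequence
>> boolean hasTheLock = children.get(0).equals(ourNodeName);
>> // not first? watch the node immediately ahead of ours and go back to wait()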
>> > * On the above mentioned race condition: if the client manages to reconnect to the server with a new session id before the server cleans up the ephemeral nodes of the client's previous session, the Curator lock acquire() that is trying to acquire the lock will hold the lock, as it still sees its lock node in the zookeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that the client still holds the lock while its ephemeral node is gone.
>>
>> I don't see how this is the case. If the client has received a network partition it shouldn't consider any locks as being currently held (see the Tech Notes I mentioned in my previous email). If there is a partition during a call to acquire(), acquire() would throw an exception (once the retry policy has expired). BTW, what retry policy are you using?
>>
>> So, to reiterate, I don't see how phantom/undeleted ephemeral nodes would cause a problem. The only problem they could cause is that a given Curator client takes longer to acquire a lock as it waits for those ephemerals to finally get deleted.
>>
>> -JZ
>>
>> > On Jul 19, 2021, at 6:45 PM, Viswanathan Rajagopal <[email protected]> wrote:
>> >
>> > Hi Team,
>> >
>> > Good day.
>> >
>> > Recently we came across a "double locking issue" (i.e. two clients acquiring the lock) using Curator code (the InterProcessMutex lock APIs) in our application.
>> >
>> > Our use case:
>> >
>> > * Two clients attempt to acquire the zookeeper lock using Curator's InterProcessMutex, and whichever client owns it releases it once it sees the connection disconnect (on receiving Connection.SUSPENDED / Connection.LOST Curator connection events from the connection listener)
>> >
>> > Issue we noticed:
>> >
>> > * After the session expired & the clients reconnected with new sessions, both clients appeared to have acquired the lock. The interesting thing we found is that one of the clients still held the lock while its (ephemeral) lock node was gone
>> >
>> > Things we found:
>> >
>> > * Based on our initial analysis and a few test runs, we saw that the Curator acquire() method acquires the lock based on the "about to be deleted" lock node of the previous session. Explanation: the ephemeral node created by the previous session was still seen by the client that reconnected with a new session id, until the server cleaned it up. If this happens, Curator acquire() would hold the lock.
>> >
>> > * Clearly we could see the race condition (in zookeeper code) between 1) the client reconnecting to the server with a new session id and 2) the server deleting the ephemeral nodes of the client's previous session.
>> > We were able to reproduce this issue using the following approach:
>> > * Artificially break the socket connection between client and server for 30s
>> > * Artificially pause the relevant server code for a minute, then unpause
>> >
>> > * On the above mentioned race condition: if the client manages to reconnect to the server with a new session id before the server cleans up the ephemeral nodes of the client's previous session, the Curator lock acquire() that is trying to acquire the lock will hold the lock, as it still sees its lock node in the zookeeper directory. Eventually the server cleans up the ephemeral nodes, leaving the Curator local lock thread data stale and giving the illusion that the client still holds the lock while its ephemeral node is gone.
>> >
>> > * Timeline of events, described below for better understanding:
>> >   * At t1, Client A and Client B establish zookeeper sessions with session ids A1 and B1 respectively
>> >   * At t2, Client A creates lock node N1 & acquires the lock
>> >   * At t3, Client B creates lock node N2 & blocks in acquire() waiting for the lock
>> >   * At t4, the sessions time out for both clients & the server is about to clean up the old sessions; Client A tries to release the lock
>> >   * At t5, Client A and Client B reconnect to the server with new session ids A2 and B2 respectively, before the server deletes the ephemeral nodes N1 & N2 of the previous client sessions. Client A releases the lock, deleting N1, and tries to acquire it again by creating node N3; Client B, which was blocked on acquire(), acquires the lock based on N2 (the about-to-be-deleted node created by its previous session)
>> >   * At t6, the server cleans up the ephemeral node N2 created by Client B's previous session. Client A acquires the lock with node N3 once its previous sequence node N2 is deleted, whereas Client B, which incorrectly acquired the lock at t5, still holds the lock
>> >
>> > Note:
>> >
>> > * We are not certain whether this race condition we noticed in the zookeeper code is intentional design.
>> >
>> > Questions:
>> >
>> > * Given this race condition seen in the zookeeper code, we would like to hear your recommendations / suggestions for avoiding this issue while using the Curator lock code.
>> >
>> > * We also see that InterProcessMutex has the makeRevocable() API that enables the application to revoke the lock, but it handles the node change event only, not the node deleted event. I understand that it makes sense to handle the node change event alone, so as to let the application user revoke the lock externally from the application side. But would it also be okay to have one for the node delete event, so that the application can register a listener for node deletion? I would like to hear your thoughts.
>> >
>> > Many Thanks,
>> > Viswa
