Karl,

I went through the ZooKeeper lock logic in the code, and the only thing I don't understand is the 'handleEphemeralNodeKeeperException' method, which is called in the catch(KeeperException e) block of every obtain/release lock method of the ZooKeeperConnection class.

This method sets the lockNode param to 'null', recreates a session and recreates the nodes, but it does not restore the lockNode param at the end. As I understand it, when this happens it may cause the lock release error I mentioned, because that error is triggered when the lockNode param is 'null'.
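
A minimal sketch of the sequence described above. It is a simplified model, not the actual ManifoldCF code: the class LockNodeModel, the method handleSessionLoss (a stand-in for handleEphemeralNodeKeeperException) and the path "/locks/example" are hypothetical, and treating lockNode as a field is an assumption.

```java
// Simplified model of the behaviour described in this message.
// NOT the real org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.
class LockNodeModel {
  // Assumption: lockNode is per-connection state; null means "no lock held".
  private String lockNode = null;
  private int sessionId = 1;

  void obtainLock(String path) {
    if (lockNode != null) {
      throw new IllegalStateException("Lock already held");
    }
    lockNode = path + "/session-" + sessionId;
  }

  // Hypothetical stand-in for handleEphemeralNodeKeeperException:
  // clears lockNode, starts a new session, and never restores lockNode.
  void handleSessionLoss() {
    lockNode = null;
    sessionId++;
  }

  void releaseLock() {
    if (lockNode == null) {
      throw new IllegalStateException("Can't release lock we don't hold");
    }
    lockNode = null;
  }
}

class LockNodeDemo {
  // Obtains a lock, simulates a session reset, then tries to release.
  static String run() {
    LockNodeModel connection = new LockNodeModel();
    connection.obtainLock("/locks/example");
    connection.handleSessionLoss();
    try {
      connection.releaseLock();
      return "released";
    } catch (IllegalStateException e) {
      return e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

In this model the caller still believes it holds the lock after the reset, so its later release hits the null check.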

The method is in the class org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection. If you could take a look and tell me what you think, that would be great!

Thanks,

Julien

On 07/12/2021 at 14:40, Julien Massiera wrote:
Yes, I will try the patch then and see whether it works.

Regards,

Julien

On 07/12/2021 at 14:28, Karl Wright wrote:
Yes, this is plausible.  But I'm not sure what the solution is.  If a
ZooKeeper session disappears, according to the documentation everything
associated with that session should also disappear.

So I guess we could catch this error and just ignore it, assuming that the
session must be gone anyway?
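
A minimal sketch of that idea, under stated assumptions: the interface LockHolder and the helper releaseQuietly are hypothetical names, not part of ManifoldCF, and the sketch assumes the IllegalStateException can only mean that the session (and its lock node) is already gone.

```java
// Hypothetical sketch of "catch the error and ignore it".
// None of these names exist in ManifoldCF.
class TolerantRelease {
  // Stand-in for whatever object exposes the release operation.
  interface LockHolder {
    void releaseLock();
  }

  // Returns true if the lock was released normally, false if the release
  // failed because the lock was no longer held.
  static boolean releaseQuietly(LockHolder holder) {
    try {
      holder.releaseLock();
      return true;
    } catch (IllegalStateException e) {
      // Assumption: the session expired, so the server has already
      // removed everything tied to it and there is nothing left to release.
      return false;
    }
  }

  public static void main(String[] args) {
    boolean released = releaseQuietly(() -> {
      throw new IllegalStateException("Can't release lock we don't hold");
    });
    System.out.println("released normally: " + released);
  }
}
```

Returning a boolean instead of silently swallowing the exception lets the caller log the event, which may help confirm whether session loss is really the cause.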

Karl


On Tue, Dec 7, 2021 at 8:21 AM Julien Massiera <
julien.massi...@francelabs.com> wrote:

Hi,

The ZooKeeper lock error mentioned in the second-to-last comment of this
issue, https://issues.apache.org/jira/browse/CONNECTORS-1447:

FATAL 2017-08-04 09:28:25,855 (Agents idle cleanup thread) - Error tossed: Can't release lock we don't hold
java.lang.IllegalStateException: Can't release lock we don't hold
    at org.apache.manifoldcf.core.lockmanager.ZooKeeperConnection.releaseLock(ZooKeeperConnection.java:815)
    at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.clearLock(ZooKeeperLockObject.java:218)
    at org.apache.manifoldcf.core.lockmanager.ZooKeeperLockObject.clearGlobalWriteLockNoWait(ZooKeeperLockObject.java:100)
    at org.apache.manifoldcf.core.lockmanager.LockObject.clearGlobalWriteLock(LockObject.java:160)
    at org.apache.manifoldcf.core.lockmanager.LockObject.leaveWriteLock(LockObject.java:141)
    at org.apache.manifoldcf.core.lockmanager.LockGate.leaveWriteLock(LockGate.java:205)
    at org.apache.manifoldcf.core.lockmanager.BaseLockManager.leaveWrite(BaseLockManager.java:1224)
    at org.apache.manifoldcf.core.lockmanager.BaseLockManager.leaveWriteLock(BaseLockManager.java:771)
    at org.apache.manifoldcf.core.connectorpool.ConnectorPool$Pool.pollAll(ConnectorPool.java:670)
    at org.apache.manifoldcf.core.connectorpool.ConnectorPool.pollAllConnectors(ConnectorPool.java:338)
    at org.apache.manifoldcf.agents.transformationconnectorpool.TransformationConnectorPool.pollAllConnectors(TransformationConnectorPool.java:121)
    at org.apache.manifoldcf.agents.system.IdleCleanupThread.run(IdleCleanupThread.java:91)

is still happening in 2021 with the 2.20 version of MCF.

Karl, you hypothesized that it could be related to ZooKeeper being
restarted while the MCF agent is still running, but after some
investigation, my theory is that it is related to re-established
sessions. Locks are associated not with a process but with a session,
and it could happen that when a session is closed accidentally
(interrupted by an exception, etc.), it does not correctly release the
locks it set. When ZooKeeper creates a new session for the same client,
the locks cannot be released because they belong to the old session, and
the exception is thrown!

Does that seem plausible to you? I have no knowledge of ZooKeeper, but
if it is plausible, then it is worth investigating the code to check
that all locks are released when a session is closed or interrupted by
a problem.

Julien
