Andrea Reale created ZOOKEEPER-3162:
---------------------------------------

             Summary: Broken lock semantics in C client lock-recipe
                 Key: ZOOKEEPER-3162
                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-3162
             Project: ZooKeeper
          Issue Type: Bug
          Components: c client
    Affects Versions: 3.4.13, 3.0.0
            Reporter: Andrea Reale


As reported (but never fixed) in the past by ZOOKEEPER-2409, ZOOKEEPER-2038 and 
(partly) ZOOKEEPER-2878, the C client lock-recipe implementation is broken.

I identified three issues.

The main one (as also reported in the aforementioned reports) is that the logic 
that goes through the lock waiting list is broken. child_floor uses strcmp and 
compares the full node name (i.e., sessionID-sequence) rather than only 
comparing the sequence number. This makes it possible for two different clients 
to hold the lock at the same time: assume two clients, one associated with 
session A, the other with session B, with A < B lexicographically. Now assume 
that at some point a thread in B holds a lock and a thread in A tries to 
acquire the same lock. A will manage to get the lock because of the wrong 
comparison function, so now two guys hold the lock.

The second issue is a possible deadlock inside zkr_lock_operation. 
zkr_lock_operation is always called by holding the mutex associated to the 
client lock. In some cases, zkr_lock_operaton may decide to give-up locking and 
call zkr_lock_unlock to release the lock. When this happens, it will try to 
acquire again the same phtread mutex, which will lead to a deadlock.

The third issue relates to the return value of zkr_lock_lock. According to the 
API docs, the functions returns 0 when no errors. Then it is up to the invoker 
to check when the lock is held by calling zkr_lock_isowner. However, the 
implementation, in case of no error, returns zkr_lock_isowner. This is wrong 
because it becomes impossible to distinguish an error condition from a success 
(but not ownerhsip). Instead the API (as described in the docs, btw) should 
return always 0 when no errors occur.

Shortly I will add the link to a PR fixing the issues.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to