yapeng created CURATOR-716:
------------------------------
Summary: InterProcessMutex has a performance issue when there are
lots of threads trying to acquire the lock
Key: CURATOR-716
URL: https://issues.apache.org/jira/browse/CURATOR-716
Project: Apache Curator
Issue Type: Improvement
Components: Recipes
Affects Versions: 5.5.0
Reporter: yapeng
InterProcessMutex has a performance issue when there are lots of threads trying
to acquire the lock.
For example, suppose 1000 threads are trying to acquire the lock and their
sequence numbers are 1 to 1000.
0ms -------1 gets the lock; 2~1000 are in wait().
100ms -----1 releases the lock and a notifyAll() is sent on the
InterProcessMutex instance; 2~1000 wake from wait().
100ms------2 gets the lock; 3~1000 are blocked trying to enter the synchronized
block. Inside the synchronized block there is a getData request to ZK; assume
it costs 5ms, so it takes 1000 x 5ms = 5000ms for all threads to pass through
the synchronized lock.
200ms------2 releases the lock; 4~44 have already passed through the
synchronized lock and are back in wait(), while 3 is still queued on the
synchronized lock (intrinsic monitor acquisition is not FIFO).
3000ms-----3 gets the synchronized lock, then gets a NoNodeException, tries to
reacquire, and acquires the lock.
........
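The bottleneck above can be reproduced without ZooKeeper. The sketch below is a minimal simulation, not Curator code: N threads park in wait() on a shared monitor, one notifyAll() wakes them all, and each woken thread then holds the monitor for a few milliseconds to stand in for the getData round trip. The class name, thread counts, and the simulated delay are all illustrative assumptions.

```java
import java.util.concurrent.CountDownLatch;

public class ThunderingHerdDemo {

    // Parks `threads` waiters on one monitor, wakes them with a single
    // notifyAll(), and makes each woken thread hold the monitor for
    // `getDataMs` (a stand-in for a ZK getData round trip done while
    // holding the lock). Returns the time, in ms, until the last thread
    // gets through.
    static long runDemo(int threads, long getDataMs) throws InterruptedException {
        final Object monitor = new Object();
        CountDownLatch parked = new CountDownLatch(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                synchronized (monitor) {
                    try {
                        parked.countDown();
                        monitor.wait();          // like threads 2~1000 above
                        Thread.sleep(getDataMs); // simulated getData, serialized by the monitor
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
                done.countDown();
            }).start();
        }
        parked.await(); // every thread has reached wait() (countDown happens inside the monitor)
        long start = System.nanoTime();
        synchronized (monitor) {
            monitor.notifyAll(); // one lock release wakes all waiters at once
        }
        done.await();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        // 50 waiters x 2ms "getData" => the last waiter is delayed ~100ms by
        // monitor contention alone, mirroring the 1000 x 5ms = 5000ms above.
        System.out.println("last thread delayed ms: " + runDemo(50, 2));
    }
}
```

Because every woken thread must reacquire the monitor before doing its simulated getData, the sleeps cannot overlap, so the last thread's delay grows linearly with the number of waiters.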
There are some cases which may make this worse:
# When a thread times out, it deletes its node and another notifyAll() is
sent. This pushes even more threads into the synchronized queue.
# Suppose a candidate ranks first among the ZK ephemeral nodes but is blocked
on the synchronized lock for too long. It then times out, so it never gets the
distributed lock even after queuing for a long time.
# Lock-acquire timeouts cause more ZK node deletions and creations. The ZK
update requests block ZK read requests, which makes threads wait even longer
in the synchronized queue.
The lock is expected to be passed down every 100ms, but in practice it takes
much longer. In my case, it took more than 30s.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)