Also increase the table of contents' depth to 4. --- doc/design-2.1.rst | 82 +++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 81 insertions(+), 1 deletions(-)
diff --git a/doc/design-2.1.rst b/doc/design-2.1.rst index a2617cf..40556c3 100644 --- a/doc/design-2.1.rst +++ b/doc/design-2.1.rst @@ -9,7 +9,7 @@ The 2.1 version will be a relatively small release. Its main aim is to avoid changing too much of the core code, while addressing issues and adding new features and improvements over 2.0, in a timely fashion. -.. contents:: :depth: 3 +.. contents:: :depth: 4 Objective ========= @@ -80,6 +80,86 @@ framework level, but at individual operation level. The goal is that this is a lightweight framework, for abstracting the different storage operation, and not for modelling the storage hierarchy. + +Locking improvements +~~~~~~~~~~~~~~~~~~~~ + +Current State and shortcomings +++++++++++++++++++++++++++++++ + +The class ``LockSet`` (see ``lib/locking.py``) is a container for one or many +``SharedLock`` instances. It provides an interface to add/remove locks and to +acquire and subsequently release any number of those locks contained in it. + +Locks in a ``LockSet`` are always acquired in alphabetic order. Due to the way +we're using locks for nodes and instances (the single cluster lock isn't +affected by this issue) this can lead to long delays when acquiring locks if +another operation tries to acquire multiple locks but has to wait for yet +another operation. + +In the following demonstration we assume to have the instance locks ``inst1``, +``inst2``, ``inst3`` and ``inst4``. + +#. Operation A grabs lock for instance ``inst4``. +#. Operation B wants to acquire all instance locks in alphabetic order, but it + has to wait for ``inst4``. +#. Operation C tries to lock ``inst1``, but it has to wait until + Operation B (which is trying to acquire all locks) releases the lock again. +#. Operation A finishes and releases lock on ``inst4``. Operation B can + continue and eventually releases all locks. +#. Operation C can get ``inst1`` lock and finishes. + +Technically there's no need for Operation C to wait for Operation A, and +subsequently Operation B, to finish. Operation B can't continue until +Operation A is done (it has to wait for ``inst4``), anyway. + +Proposed changes +++++++++++++++++ + +Non-blocking lock acquiring +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Acquiring locks for OpCode execution is always done in blocking mode. They +won't return until the lock has successfully been acquired (or an error +occurred, although we won't cover that case here). + +``SharedLock`` and ``LockSet`` must be able to be acquired in a +non-blocking way. They must support a timeout and abort trying to acquire +the lock(s) after the specified amount of time. + +Retry acquiring locks +^^^^^^^^^^^^^^^^^^^^^ + +To prevent other operations from waiting for a long time, such as described in +the demonstration before, ``LockSet`` must not keep locks for a prolonged period +of time when trying to acquire two or more locks. Instead it should, with an +increasing timeout for acquiring all locks, release all locks again and +sleep some time if it fails to acquire all requested locks. + +A good timeout value needs to be determined. In any case should ``LockSet`` +proceed to acquire locks in blocking mode after a few (unsuccessful) attempts +to acquire all requested locks. + +One proposal for the timeout is to use ``2**tries`` seconds, where ``tries`` +is the number of unsuccessful tries. + +In the demonstration before this would allow Operation C to continue after +Operation B unsuccessfully tried to acquire all locks and released all +acquired locks (``inst1``, ``inst2`` and ``inst3``) again. + +Other solutions discussed ++++++++++++++++++++++++++ + +There was also some discussion on going one step further and extend the job +queue (see ``lib/jqueue.py``) to select the next task for a worker depending on +whether it can acquire the necessary locks. While this may reduce the number of +necessary worker threads and/or increase throughput on large clusters with many +jobs, it also brings many potential problems, such as contention and increased +memory usage, with it. As this would be an extension of the changes proposed +before it could be implemented at a later point in time, but we decided to stay +with the simpler solution for now. + + Feature changes --------------- -- 1.6.4.3
