On Donnerstag, 20. August 2020 12:54:49 CEST Paolo Bonzini wrote: > More seriously: programming with concurrency is first and foremost about > understanding invariants[1]. Locks are relatively simple to reason > about because they enforce invariants at the points where they are > acquired or released.
As a side note, independent of the actual QBL issue discussed, while I agree with you that nested locks should be avoided as much as possible, especially on heterogenous large scale projects like QEMU; please let me correct some of the things you said about recursive locks in general: > However, recursive locks are pure evil. :) That's a common overgeneralization of the potential issues when dealing with recursive locks. Especially this claim ... > but callers have no clue about what invariants are provided around calls > to do_it(). So, understanding complex code that uses recursive mutexes > is effectively impossible. ... is incorrect. It is correct that you can run into deadlocks if you don't know how to deal with nested recursive locks appropriately. It is incorrect though to say they were not managable. There is a golden rule with recursive locks: You always have to preserve a clear hierarchy. Say you have the following recursive mutexes: RecursiveMutex mutex0; RecursiveMutex mutex1; RecursiveMutex mutex2; ... RecursiveMutex mutexN; where the suffix shall identify the hierarchy, i.e. h(mutex0) = 0, h(mutex1) = 1, ... h(mutexN) = N. Then the golden rule is that in any call stack the nested locks must always preserve the same transitive hierarchy, e.g.: h(lock1) <= h(lock2) <= ... <= h(lockK) Example (using lock-guard notation), let's say ascending transitivity is chosen, then the following is Ok, as it does not violate chosen transitivity: synchronized(mutex0) { synchronized(mutex1) { ... synchronized(mutexN) { } } } Likewise, the following is Ok as well: synchronized(mutex3) { synchronized(mutex8) { ... synchronized(mutexN) { } } } whereas this would be illegal: synchronized(mutex3) { synchronized(mutex2) { // < violates chosen transitivity ... synchronized(mutexN) { } } } Now let's say one day somebody accidentally broke that rule and ran into a deadlock. What you can do to resolve the situation is capturing the call stack of each mutex's [last] lock. And when you triggered the deadlock, you know that at least one of the threads violated the lock hierarchy. So you look at the individual call stacks and correct the program flow appropriately to restore the hierarchy. And the latter is BTW independent of (i.e. any side effects) of other threads, so it is really just about looking at what exactly ONE thread is doing. And for the latter reason, there are also more systematic approaches to ensure correctness: for instance a built-in hierarchy check in the individual Mutex implementation which would then e.g. raise an assertaion fault on early testing whenever a developer broke the hierarchy in a nested lock sequence. Another solution would be integrating this hierarchy check into a (e.g. static) sanatizer, as this information can already be derived directly from the AST in most cases. But unfortunatley this does not exist yet in any sanatizer yet AFAIK. For me, a non-recursive mutex makes sense for one use case: if the intention is to lock the mutex on one thread while allowing to unlock it on another thread. For all other use cases I would (personally) prefer a recursive type, as it guards a clear ownership relation and hence allows to guard and prevent many mistakes. Best regards, Christian Schoenebeck