On 22-05-13 19:24, Maarten Lankhorst wrote:
> Hey,
>
> On 22-05-13 18:18, Peter Zijlstra wrote:
>> On Wed, May 22, 2013 at 01:18:14PM +0200, Maarten Lankhorst wrote:
>>
>> Lacking the actual msg atm, I'm going to paste in here...
> Thanks for taking the time to review.
>>> Subject: [PATCH v3 2/3] mutex: add support for wound/wait style locks, v3
>>> From: Maarten Lankhorst <maarten.lankhorst at xxxxxxxxxxxxx>
>>>
>>> Changes since RFC patch v1:
>>>  - Updated to use atomic_long instead of atomic, since the reservation_id was a long.
>>>  - added mutex_reserve_lock_slow and mutex_reserve_lock_intr_slow
>>>  - removed mutex_locked_set_reservation_id (or w/e it was called)
>>> Changes since RFC patch v2:
>>>  - remove use of __mutex_lock_retval_arg, add warnings when using wrong
>>>    combination of mutex_(,reserve_)lock/unlock.
>>> Changes since v1:
>>>  - Add __always_inline to __mutex_lock_common, otherwise reservation paths
>>>    can be triggered from normal locks, because __builtin_constant_p might
>>>    evaluate to false for the constant 0 in that case. Tests for this have
>>>    been added in the next patch.
>>>  - Updated documentation slightly.
>>> Changes since v2:
>>>  - Renamed everything to ww_mutex. (mlankhorst)
>>>  - Added ww_acquire_ctx and ww_class. (mlankhorst)
>>>  - Added a lot of checks for wrong api usage. (mlankhorst)
>>>  - Documentation updates. (danvet)
>>>
>>> Signed-off-by: Maarten Lankhorst <maarten.lankhorst at xxxxxxxxxxxxx>
>>> Signed-off-by: Daniel Vetter <daniel.vetter at xxxxxxxx>
>>> ---
>>>  Documentation/ww-mutex-design.txt |  322 +++++++++++++++++++++++++++
>>>  include/linux/mutex-debug.h       |    1 
>>>  include/linux/mutex.h             |  257 +++++++++++++++++++++
>>>  kernel/mutex.c                    |  445 ++++++++++++++++++++++++++++++++++++-
>>>  lib/debug_locks.c                 |    2 
>>>  5 files changed, 1010 insertions(+), 17 deletions(-)
>>>  create mode 100644 Documentation/ww-mutex-design.txt
>>>
>>> diff --git a/Documentation/ww-mutex-design.txt b/Documentation/ww-mutex-design.txt
>>> new file mode 100644
>>> index 0000000..154bae3
>>> --- /dev/null
>>> +++ b/Documentation/ww-mutex-design.txt
>>> @@ -0,0 +1,322 @@
>>> +Wait/Wound Deadlock-Proof Mutex Design
>>> +======================================
>>> +
>>> +Please read mutex-design.txt first, as it applies to wait/wound mutexes too.
>>> +
>>> +Motivation for WW-Mutexes
>>> +-------------------------
>>> +
>>> +GPUs do operations that commonly involve many buffers.  Those buffers
>>> +can be shared across contexts/processes, exist in different memory
>>> +domains (for example VRAM vs system memory), and so on.  And with
>>> +PRIME / dmabuf, they can even be shared across devices.  So there are
>>> +a handful of situations where the driver needs to wait for buffers to
>>> +become ready.  If you think about this in terms of waiting on a buffer
>>> +mutex for it to become available, this presents a problem because
>>> +there is no way to guarantee that buffers appear in an execbuf/batch in
>>> +the same order in all contexts.  That is directly under control of
>>> +userspace, and a result of the sequence of GL calls that an application
>>> +makes, which results in the potential for deadlock.  The problem gets
>>> +more complex when you consider that the kernel may need to migrate the
>>> +buffer(s) into VRAM before the GPU operates on the buffer(s), which
>>> +may in turn require evicting some other buffers (and you don't want to
>>> +evict other buffers which are already queued up to the GPU), but for a
>>> +simplified understanding of the problem you can ignore this.
>>> +
>>> +The algorithm that TTM came up with for dealing with this problem is quite
>>> +simple.  For each group of buffers (execbuf) that need to be locked, the
>>> +caller would be assigned a unique reservation id/ticket, from a global
>>> +counter.  In case of deadlock while locking all the buffers associated with
>>> +an execbuf, the one with the lowest reservation ticket (i.e. the oldest task)
>>> +wins, and the one with the higher reservation id (i.e. the younger task)
>>> +unlocks all of the buffers that it has already locked, and then tries again.
>>> +
>>> +In the RDBMS literature this deadlock handling approach is called wait/wound:
>>> +The older task waits until it can acquire the contended lock. The younger
>>> +task needs to back off and drop all the locks it is currently holding, i.e.
>>> +the younger task is wounded.
>>> +
>>> +Concepts
>>> +--------
>>> +
>>> +Compared to normal mutexes two additional concepts/objects show up in the
>>> +lock interface for w/w mutexes:
>>> +
>>> +Acquire context: To ensure eventual forward progress it is important that a
>>> +task trying to acquire locks doesn't grab a new reservation id, but keeps the
>>> +one it acquired when starting the lock acquisition. This ticket is stored in
>>> +the acquire context. Furthermore the acquire context keeps track of debugging
>>> +state to catch w/w mutex interface abuse.
>>> +
>>> +W/w class: In contrast to normal mutexes the lock class needs to be explicit
>>> +for w/w mutexes, since it is required to initialize the acquire context.
>>> +
>>> +Furthermore there are three different classes of w/w lock acquire functions:
>>> +- Normal lock acquisition with a context, using ww_mutex_lock
>>> +- Slowpath lock acquisition on the contending lock, used by the wounded task
>>> +  after having dropped all already acquired locks. These functions have the
>>> +  _slow postfix.
>> See below, I don't see the need for this interface.
>>
>>> +- Functions to only acquire a single w/w mutex, which results in the exact
>>> +  same semantics as a normal mutex. These functions have the _single postfix.
>> This is missing rationale.
> trylock_single is useful when iterating over a list and you want to evict a
> bo, but only the first one that can be acquired.
> lock_single is useful when only a single bo needs to be acquired, for example
> to lock a buffer during mmap.
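
Not part of the patch, just to illustrate the trylock_single case above: a
rough sketch with a hypothetical struct bo and evict() helper (trylock follows
the usual mutex_trylock convention of returning nonzero on success).

/* hypothetical bo type, just enough for the sketch */
struct bo {
    struct ww_mutex lock;
    struct list_head lru_node;
    /* backing storage, etc. */
};

/* walk the lru and evict the first bo whose lock can be taken without blocking */
static struct bo *evict_first_unlocked(struct list_head *lru)
{
    struct bo *bo;

    list_for_each_entry(bo, lru, lru_node) {
        if (!ww_mutex_trylock_single(&bo->lock))
            continue;   /* somebody holds it, try the next one */

        evict(bo);      /* hypothetical eviction helper */
        ww_mutex_unlock_single(&bo->lock);
        return bo;
    }
    return NULL;
}
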
>
>>> +
>>> +Of course, all the usual variants for handling wake-ups due to signals are
>>> +also provided.
>>> +
>>> +Usage
>>> +-----
>>> +
>>> +Three different ways to acquire locks within the same w/w class. Common
>>> +definitions for methods 1&2.
>>> +
>>> +static DEFINE_WW_CLASS(ww_class);
>>> +
>>> +struct obj {
>>> +   struct ww_mutex lock;
>>> +   /* obj data */
>>> +};
>>> +
>>> +struct obj_entry {
>>> +   struct list_head head;
>>> +   struct obj *obj;
>>> +};
>>> +
>>> +Method 1, using a list in execbuf->buffers that's not allowed to be reordered.
>>> +This is useful if a list of required objects is already tracked somewhere.
>>> +Furthermore the lock helper can propagate the -EALREADY return code back to
>>> +the caller as a signal that an object is on the list twice. This is useful if
>>> +the list is constructed from userspace input and the ABI requires userspace
>>> +to not have duplicate entries (e.g. for a gpu commandbuffer submission ioctl).
>>> +
>>> +int lock_objs(struct list_head *list, struct ww_acquire_ctx *ctx)
>>> +{
>>> +   struct obj *res_obj = NULL;
>>> +   struct obj_entry *contended_entry = NULL;
>>> +   struct obj_entry *entry;
>>> +   int ret;
>>> +
>>> +   ww_acquire_init(ctx, &ww_class);
>>> +
>>> +retry:
>>> +   list_for_each_entry (entry, list, head) {
>>> +           if (entry->obj == res_obj) {
>>> +                   res_obj = NULL;
>>> +                   continue;
>>> +           }
>>> +           ret = ww_mutex_lock(&entry->obj->lock, ctx);
>>> +           if (ret < 0) {
>>> +                   contended_entry = entry;
>>> +                   goto err;
>>> +           }
>>> +   }
>>> +
>>> +   ww_acquire_done(ctx);
>>> +   return 0;
>>> +
>>> +err:
>>> +   list_for_each_entry_continue_reverse (entry, list, head)
>>> +           ww_mutex_unlock(&entry->obj->lock);
>>> +
>>> +   if (res_obj)
>>> +           ww_mutex_unlock(&res_obj->lock);
>>> +
>>> +   if (ret == -EDEADLK) {
>>> +           /* we lost out in a seqno race, lock and retry.. */
>>> +           ww_mutex_lock_slow(&contended_entry->obj->lock, ctx);
>> I'm missing the need for ww_mutex_lock_slow(). AFAICT we should be able to tell
>> it's the first lock in the ctx and thus we cannot possibly deadlock.
> Theoretically true, but that would require always setting ctx->acquired correctly.
> Plus that would weaken the checks. Without ww_mutex_lock_slow you cannot say
> for sure that all mutexes have been unlocked, and verify that this is really true.
>
>>> +           res_obj = contended_entry->obj;
>>> +           goto retry;
>>> +   }
>>> +   ww_acquire_fini(ctx);
>>> +
>>> +   return ret;
>>> +}
>>> +
>> ... you certainly went all out on documentation.
>>
>>> diff --git a/include/linux/mutex-debug.h b/include/linux/mutex-debug.h
>>> index 731d77d..4ac8b19 100644
>>> --- a/include/linux/mutex-debug.h
>>> +++ b/include/linux/mutex-debug.h
>>> @@ -3,6 +3,7 @@
>>>  
>>>  #include <linux/linkage.h>
>>>  #include <linux/lockdep.h>
>>> +#include <linux/debug_locks.h>
>>>  
>>>  /*
>>>   * Mutexes - debugging helpers:
>>> diff --git a/include/linux/mutex.h b/include/linux/mutex.h
>>> index 9121595..004f863 100644
>>> --- a/include/linux/mutex.h
>>> +++ b/include/linux/mutex.h
>>> @@ -74,6 +74,35 @@ struct mutex_waiter {
>>>  #endif
>>>  };
>>>  
>>> +struct ww_class {
>>> +   atomic_long_t stamp;
>>> +   struct lock_class_key acquire_key;
>>> +   struct lock_class_key mutex_key;
>>> +   const char *acquire_name;
>>> +   const char *mutex_name;
>>> +};
>>> +
>>> +struct ww_acquire_ctx {
>>> +   struct task_struct *task;
>>> +   unsigned long stamp;
>>> +#ifdef CONFIG_DEBUG_MUTEXES
>>> +   unsigned acquired, done_acquire;
>>> +   struct ww_class *ww_class;
>>> +   struct ww_mutex *contending_lock;
>>> +#endif
>>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>>> +   struct lockdep_map dep_map;
>>> +#endif
>>> +};
>>> +
>>> +struct ww_mutex {
>>> +   struct mutex base;
>>> +   struct ww_acquire_ctx *ctx;
>>> +#ifdef CONFIG_DEBUG_MUTEXES
>>> +   struct ww_class *ww_class;
>>> +#endif
>>> +};
>>> +
>>> @@ -167,6 +236,192 @@ extern int __must_check mutex_lock_killable(struct mutex *lock);
>>>   */
>>>  extern int mutex_trylock(struct mutex *lock);
>>>  extern void mutex_unlock(struct mutex *lock);
>>> +
>>> +/**
>>> + * ww_acquire_init - initialize a w/w acquire context
>>> + * @ctx: w/w acquire context to initialize
>>> + * @ww_class: w/w class of the context
>>> + *
>>> + * Initializes a context to acquire multiple mutexes of the given w/w class.
>>> + *
>>> + * Context-based w/w mutex acquiring can be done in any order whatsoever
>>> + * within a given lock class. Deadlocks will be detected and handled with the
>>> + * wait/wound logic.
>>> + *
>>> + * Mixing of context-based w/w mutex acquiring and single w/w mutex locking
>>> + * can result in undetected deadlocks and is hence forbidden. Mixing different
>>> + * contexts for the same w/w class when acquiring mutexes can also result in
>>> + * undetected deadlocks, and is hence also forbidden.
>>> + *
>>> + * Nesting of acquire contexts for _different_ w/w classes is possible,
>>> + * subject to the usual locking rules between different lock classes.
>>> + *
>>> + * An acquire context must be released with ww_acquire_fini by the same task
>>> + * before the memory is freed. It is recommended to allocate the context
>>> + * itself on the stack.
>>> + */
>>> +static inline void ww_acquire_init(struct ww_acquire_ctx *ctx,
>>> +                              struct ww_class *ww_class)
>>> +{
>>> +   ctx->task = current;
>>> +   do {
>>> +           ctx->stamp = atomic_long_inc_return(&ww_class->stamp);
>>> +   } while (unlikely(!ctx->stamp));
>> I suppose we'll figure something out when this becomes a bottleneck. Ideally
>> we'd do something like:
>>
>>  ctx->stamp = local_clock();
>>
>> but for now we cannot guarantee that's not jiffies, and I suppose that's a tad
>> too coarse to work for this.
> This might mess up when 2 cores happen to return exactly the same time; how
> do you choose a winner in that case?
> EDIT: Using the pointer address like you suggested below is fine with me. The
> ctx pointer would be static enough.
>
>> Also, why is 0 special?
> Oops, 0 is no longer special.
>
> I used to set the stamp directly on the lock, so 0 used to mean no ctx set.
>>> +#ifdef CONFIG_DEBUG_MUTEXES
>>> +   ctx->ww_class = ww_class;
>>> +   ctx->acquired = ctx->done_acquire = 0;
>>> +   ctx->contending_lock = NULL;
>>> +#endif
>>> +#ifdef CONFIG_DEBUG_LOCK_ALLOC
>>> +   debug_check_no_locks_freed((void *)ctx, sizeof(*ctx));
>>> +   lockdep_init_map(&ctx->dep_map, ww_class->acquire_name,
>>> +                    &ww_class->acquire_key, 0);
>>> +   mutex_acquire(&ctx->dep_map, 0, 0, _RET_IP_);
>>> +#endif
>>> +}
>>> +/**
>>> + * ww_mutex_trylock_single - tries to acquire the w/w mutex without acquire context
>>> + * @lock: mutex to lock
>>> + *
>>> + * Trylocks a mutex without acquire context, so no deadlock detection is
>>> + * possible. Returns 1 if the mutex has been acquired, 0 on contention.
>>> + *
>>> + * Unlocking the mutex must happen with a call to ww_mutex_unlock_single.
>>> + */
>>> +static inline int __must_check ww_mutex_trylock_single(struct ww_mutex *lock)
>>> +{
>>> +   return mutex_trylock(&lock->base);
>>> +}
>> trylocks can never deadlock since they don't block per definition, so I don't
>> see the point of the _single() thing here.
> I called it single because they weren't annotated into any ctx. I can drop the
> _single suffix though, but you'd still need to unlock with unlock_single, or we
> need to remove that distinction altogether, lose a few lockdep checks and only
> have one unlock function.
>
>>> +/**
>>> + * ww_mutex_lock_single - acquire the w/w mutex without acquire context
>>> + * @lock: mutex to lock
>>> + *
>>> + * Locks a mutex without acquire context, so no deadlock detection is
>>> + * possible.
>>> + *
>>> + * Unlocking the mutex must happen with a call to ww_mutex_unlock_single.
>>> + */
>>> +static inline void ww_mutex_lock_single(struct ww_mutex *lock)
>>> +{
>>> +   mutex_lock(&lock->base);
>>> +}
>> as per the above, I'm missing the rationale for having this.
>>
>>> diff --git a/kernel/mutex.c b/kernel/mutex.c
>>> index 84a5f07..66807c7 100644
>>> --- a/kernel/mutex.c
>>> +++ b/kernel/mutex.c
>>> @@ -127,16 +127,156 @@ void __sched mutex_unlock(struct mutex *lock)
>>>  
>>>  EXPORT_SYMBOL(mutex_unlock);
>>>  
>>> +/**
>>> + * ww_mutex_unlock - release the w/w mutex
>>> + * @lock: the mutex to be released
>>> + *
>>> + * Unlock a mutex that has been locked by this task previously
>>> + * with ww_mutex_lock* using an acquire context. It is forbidden to release
>>> + * the locks after releasing the acquire context.
>>> + *
>>> + * This function must not be used in interrupt context. Unlocking
>>> + * of an unlocked mutex is not allowed.
>>> + *
>>> + * Note that locks acquired with one of the ww_mutex_*lock_single variants
>>> + * must be unlocked with ww_mutex_unlock_single.
>>> + */
>>> +void __sched ww_mutex_unlock(struct ww_mutex *lock)
>>> +{
>>> +   /*
>>> +    * The unlocking fastpath is the 0->1 transition from 'locked'
>>> +    * into 'unlocked' state:
>>> +    */
>>> +#ifdef CONFIG_DEBUG_MUTEXES
>>> +   DEBUG_LOCKS_WARN_ON(!lock->ctx);
>>> +   if (lock->ctx) {
>>> +           DEBUG_LOCKS_WARN_ON(!lock->ctx->acquired);
>>> +           if (lock->ctx->acquired > 0)
>>> +                   lock->ctx->acquired--;
>>> +   }
>>> +#endif
>>> +   lock->ctx = NULL;
>> barriers should always have a comment explaining the exact ordering and the
>> pairing barrier's location.
>>
>>> +   smp_mb__before_atomic_inc();
>>> +
>>> +#ifndef CONFIG_DEBUG_MUTEXES
>>> +   /*
>>> +    * When debugging is enabled we must not clear the owner before time,
>>> +    * the slow path will always be taken, and that clears the owner field
>>> +    * after verifying that it was indeed current.
>>> +    */
>>> +   mutex_clear_owner(&lock->base);
>>> +#endif
>>> +   __mutex_fastpath_unlock(&lock->base.count, __mutex_unlock_slowpath);
>>> +}
>>> +EXPORT_SYMBOL(ww_mutex_unlock);
>>> +
>>> +static inline int __sched
>>> +__mutex_lock_check_stamp(struct mutex *lock, struct ww_acquire_ctx *ctx)
>>> +{
>>> +   struct ww_mutex *ww = container_of(lock, struct ww_mutex, base);
>>> +   struct ww_acquire_ctx *hold_ctx = ACCESS_ONCE(ww->ctx);
>>> +
>>> +   if (!hold_ctx)
>>> +           return 0;
>>> +
>>> +   if (unlikely(ctx->stamp == hold_ctx->stamp))
>>> +           return -EALREADY;
>> Why compare stamps? I expected: ctx == hold_ctx here.
> Because the check just below it compares stamps too; having the same kind of
> check here makes it clear that ctx->stamp - hold_ctx->stamp == 0 is not
> expected there.
>
>>> +
>>> +   if (unlikely(ctx->stamp - hold_ctx->stamp <= LONG_MAX)) {
>> Why not simply write: ctx->stamp > hold_ctx->stamp ?
> To handle the wraparound case on 32-bits?
>> If we need to deal with equal stamps from different contexts we could tie-break
>> based on ctx address or so, but seeing it's a global counter from the class,
>> that shouldn't happen for now.
> Sounds good enough to me.
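
For reference, roughly what the comparison boils down to, as a sketch with a
hypothetical helper name; it also shows the ctx-address tie-break suggested
above, should stamps ever be able to collide.

/*
 * "a is younger than b", working across stamp wraparound.  The unsigned
 * subtraction with a signed test is the same trick as time_after(); the
 * "<= LONG_MAX" test in the patch expresses the same idea.
 */
static inline bool __ww_ctx_stamp_after(struct ww_acquire_ctx *a,
                                        struct ww_acquire_ctx *b)
{
    if (a->stamp != b->stamp)
        return (long)(a->stamp - b->stamp) > 0;
    /* only relevant if stamps can collide: tie-break on the ctx address */
    return a > b;
}
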
>
>>> +#ifdef CONFIG_DEBUG_MUTEXES
>>> +           DEBUG_LOCKS_WARN_ON(ctx->contending_lock);
>>> +           ctx->contending_lock = ww;
>>> +#endif
>>> +           return -EDEADLK;
>>> +   }
>>> +
>>> +   return 0;
>>> +}
>>> +
>>> +static __always_inline void ww_mutex_lock_acquired(struct ww_mutex *ww,
>>> +                                              struct ww_acquire_ctx *ww_ctx,
>>> +                                              bool ww_slow)
>>> +{
>>> +#ifdef CONFIG_DEBUG_MUTEXES
>>> +   /*
>>> +    * If this WARN_ON triggers, you used ww_mutex_lock to acquire,
>>> +    * but released with a normal mutex_unlock in this call.
>>> +    */
>>> +   DEBUG_LOCKS_WARN_ON(ww->ctx);
>>> +
>>> +   /*
>>> +    * Not quite done after ww_acquire_done() ?
>>> +    */
>>> +   DEBUG_LOCKS_WARN_ON(ww_ctx->done_acquire);
>>> +
>>> +   if (ww_slow) {
>> s/ww_slow/!ww_ctx->acquired/
>>
>>> +           DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock != ww);
>>> +           ww_ctx->contending_lock = NULL;
>>> +   } else
>>> +           DEBUG_LOCKS_WARN_ON(ww_ctx->contending_lock);
>>> +
>>> +
>>> +   /*
>>> +    * Naughty, using a different class can lead to undefined behavior!
>>> +    */
>>> +   DEBUG_LOCKS_WARN_ON(ww_ctx->ww_class != ww->ww_class);
>>> +
>>> +   if (ww_slow)
>>> +           DEBUG_LOCKS_WARN_ON(ww_ctx->acquired > 0);
>>> +
>>> +   ww_ctx->acquired++;
>>> +#endif
>>> +}
>>> +
>>> +/*
>>> + * after acquiring lock with fastpath or when we lost out in contested
>>> + * slowpath, set ctx and wake up any waiters so they can recheck.
>>> + *
>>> + * This function is never called when CONFIG_DEBUG_LOCK_ALLOC is set,
>>> + * as the fastpath and opportunistic spinning are disabled in that case.
>>> + */
>>> +static __always_inline void
>>> +ww_mutex_set_context_fastpath(struct ww_mutex *lock,
>>> +                          struct ww_acquire_ctx *ctx)
>>> +{
>>> +   unsigned long flags;
>>> +   struct mutex_waiter *cur;
>>> +
>>> +   ww_mutex_lock_acquired(lock, ctx, false);
>>> +
>>> +   lock->ctx = ctx;
>>  missing comment
> Yeah, this was patched up as per danvet's comment: moved the smp_mb__after
> upwards, and added a full smp_mb after setting lock->ctx.
>>> +   smp_mb__after_atomic_dec();
>>> +
>>> +   /*
>>> +    * Check if lock is contended, if not there is nobody to wake up
>>> +    */
>>> +   if (likely(atomic_read(&lock->base.count) == 0))
>>> +           return;
>>> +
>>> +   /*
>>> +    * Uh oh, we raced in fastpath, wake up everyone in this case,
>>> +    * so they can see the new ctx
>>> +    */
>>> +   spin_lock_mutex(&lock->base.wait_lock, flags);
>>> +   list_for_each_entry(cur, &lock->base.wait_list, list) {
>>> +           debug_mutex_wake_waiter(&lock->base, cur);
>>> +           wake_up_process(cur->task);
>>> +   }
>>> +   spin_unlock_mutex(&lock->base.wait_lock, flags);
>>> +}
>>> +
>>>  /*
>>>   * Lock a mutex (possibly interruptible), slowpath:
>>>   */
>>> -static inline int __sched
>>> +static __always_inline int __sched
>>>  __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>> -               struct lockdep_map *nest_lock, unsigned long ip)
>>> +               struct lockdep_map *nest_lock, unsigned long ip,
>>> +               struct ww_acquire_ctx *ww_ctx, bool ww_slow)
>>>  {
>>>     struct task_struct *task = current;
>>>     struct mutex_waiter waiter;
>>>     unsigned long flags;
>>> +   int ret;
>>>  
>>>     preempt_disable();
>>>     mutex_acquire_nest(&lock->dep_map, subclass, 0, nest_lock, ip);
>>> @@ -163,6 +303,14 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>     for (;;) {
>>>             struct task_struct *owner;
>>>  
>>> +           if (!__builtin_constant_p(ww_ctx == NULL) && !ww_slow) {
>> Since we _know_ ww_ctx isn't NULL, we can trivially do: 
>> s/ww_slow/!ww_ctx->acquired/
>>
>>> +                   struct ww_mutex *ww;
>>> +
>>> +                   ww = container_of(lock, struct ww_mutex, base);
>>> +                   if (ACCESS_ONCE(ww->ctx))
>> What's the point of this ACCESS_ONCE()?
> Break out of the spin_on_owner loop. Without taking spin_lock_mutex there is no
> guarantee that the contents of ww->ctx are valid or sane in any way, so there's
> no way to check if we ought to back off or not.
>
>
>>> +                           break;
>>> +           }
>>> +
>>>             /*
>>>              * If there's an owner, wait for it to either
>>>              * release the lock or go to sleep.
>>> @@ -173,6 +321,13 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>  
>>>             if (atomic_cmpxchg(&lock->count, 1, 0) == 1) {
>>>                     lock_acquired(&lock->dep_map, ip);
>> Should this not also have a __builtin_constant_p(ww_ctx == NULL) ?
> ww_slow should not be set to non-zero when ww_ctx == NULL.
>
>>> +                   if (ww_slow) {
>>> +                           struct ww_mutex *ww;
>>> +                           ww = container_of(lock, struct ww_mutex, base);
>>> +
>>> +                           ww_mutex_set_context_fastpath(ww, ww_ctx);
>>> +                   }
>>> +
>>>                     mutex_set_owner(lock);
>>>                     preempt_enable();
>>>                     return 0;
>>> @@ -228,15 +383,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>>              * TASK_UNINTERRUPTIBLE case.)
>>>              */
>>>             if (unlikely(signal_pending_state(state, task))) {
>>> -                   mutex_remove_waiter(lock, &waiter,
>>> -                                       task_thread_info(task));
>>> -                   mutex_release(&lock->dep_map, 1, ip);
>>> -                   spin_unlock_mutex(&lock->wait_lock, flags);
>>> +                   ret = -EINTR;
>>> +                   goto err;
>>> +           }
>>>  
>>> -                   debug_mutex_free_waiter(&waiter);
>>> -                   preempt_enable();
>>> -                   return -EINTR;
>>> +           if (!__builtin_constant_p(ww_ctx == NULL) && !ww_slow) {
>>> +                   ret = __mutex_lock_check_stamp(lock, ww_ctx);
>>> +                   if (ret)
>>> +                           goto err;
>>>             }
>>> +
>>>             __set_task_state(task, state);
>>>  
>>>             /* didn't get the lock, go to sleep: */
>>> @@ -251,6 +407,30 @@ done:
>>>     mutex_remove_waiter(lock, &waiter, current_thread_info());
>>>     mutex_set_owner(lock);
>>>  
>>> +   if (!__builtin_constant_p(ww_ctx == NULL)) {
>>> +           struct ww_mutex *ww = container_of(lock,
>>> +                                                 struct ww_mutex,
>>> +                                                 base);
>>> +           struct mutex_waiter *cur;
>>> +
>>> +           /*
>>> +            * This branch gets optimized out for the common case,
>>> +            * and is only important for ww_mutex_lock.
>>> +            */
>>> +
>>> +           ww_mutex_lock_acquired(ww, ww_ctx, ww_slow);
>>> +           ww->ctx = ww_ctx;
>>> +
>>> +           /*
>>> +            * Give any possible sleeping processes the chance to wake up,
>>> +            * so they can recheck if they have to back off.
>>> +            */
>>> +           list_for_each_entry(cur, &lock->wait_list, list) {
>>> +                   debug_mutex_wake_waiter(lock, cur);
>>> +                   wake_up_process(cur->task);
>>> +           }
>>> +   }
>>> +
>>>     /* set it to 0 if there are no waiters left: */
>>>     if (likely(list_empty(&lock->wait_list)))
>>>             atomic_set(&lock->count, 0);
>>> @@ -261,6 +441,14 @@ done:
>>>     preempt_enable();
>>>  
>>>     return 0;
>>> +
>>> +err:
>>> +   mutex_remove_waiter(lock, &waiter, task_thread_info(task));
>>> +   spin_unlock_mutex(&lock->wait_lock, flags);
>>> +   debug_mutex_free_waiter(&waiter);
>>> +   mutex_release(&lock->dep_map, 1, ip);
>>> +   preempt_enable();
>>> +   return ret;
>>>  }
>>>  
>>>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>>> @@ -268,7 +456,8 @@ void __sched
>>>  mutex_lock_nested(struct mutex *lock, unsigned int subclass)
>>>  {
>>>     might_sleep();
>>> -   __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, subclass, NULL, _RET_IP_);
>>> +   __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE,
>>> +                       subclass, NULL, _RET_IP_, 0, 0);
>>>  }
>> The pedant in me has to tell you 4x that NULL != 0 :-)
>>
>>> +EXPORT_SYMBOL_GPL(ww_mutex_lock);
>>> +EXPORT_SYMBOL_GPL(ww_mutex_lock_interruptible);
>>> +EXPORT_SYMBOL_GPL(ww_mutex_lock_slow);
>>> +EXPORT_SYMBOL_GPL(ww_mutex_lock_slow_interruptible);
>> Not having to do the _slow stuff saves lines and interface complexity!
> It will also reduce the useful debugging information returned a little. Danvet
> answered it better than me.
>
>>> @@ -401,20 +738,39 @@ __mutex_lock_slowpath(atomic_t *lock_count)
>>>  {
>>>     struct mutex *lock = container_of(lock_count, struct mutex, count);
>>>  
>>> -   __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0, NULL, _RET_IP_);
>>> +   __mutex_lock_common(lock, TASK_UNINTERRUPTIBLE, 0,
>>> +                       NULL, _RET_IP_, 0, 0);
>>>  }
>>>  
>>>  static noinline int __sched
>>>  __mutex_lock_killable_slowpath(struct mutex *lock)
>>>  {
>>> -   return __mutex_lock_common(lock, TASK_KILLABLE, 0, NULL, _RET_IP_);
>>> +   return __mutex_lock_common(lock, TASK_KILLABLE, 0,
>>> +                              NULL, _RET_IP_, 0, 0);
>>>  }
>>>  
>>>  static noinline int __sched
>>>  __mutex_lock_interruptible_slowpath(struct mutex *lock)
>>>  {
>>> -   return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_);
>>> +   return __mutex_lock_common(lock, TASK_INTERRUPTIBLE, 0,
>>> +                              NULL, _RET_IP_, 0, 0);
>>>  }
>> A few more cases where NULL != 0 :-)
> But yeah, so open questions..
>
> 1. Do you still want to get rid of the _single variants, even though doing so 
> would slightly reduce debugging?
This is not as disastrous as I originally thought.
lock_single needs to be kept; unlock_single can be removed without consequences.
trylock_single could then be renamed to trylock, since it's no longer paired
with unlock_single.

It will prevent unlock and unlock_single from being mismatched, because the
same code would get executed.
Calling the mutex_unlock function directly on a ww_mutex must then be
forbidden, as it's only an implementation detail.
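
To make that concrete, a simplified sketch (not the actual patch; it assumes the
single/trylock variants simply leave lock->ctx NULL, and it skips the fastpath
details) of what the single unlock path could look like:

void ww_mutex_unlock(struct ww_mutex *lock)
{
#ifdef CONFIG_DEBUG_MUTEXES
    if (lock->ctx) {
        DEBUG_LOCKS_WARN_ON(!lock->ctx->acquired);
        if (lock->ctx->acquired > 0)
            lock->ctx->acquired--;
    }
#endif
    lock->ctx = NULL;           /* harmless no-op for the "single" case */
    mutex_unlock(&lock->base);  /* same code whether or not a ctx was used */
}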


> 2. Do you really want to drop the *_slow variants?
> Doing so might reduce debugging slightly. I like method #2 in
> ww-mutex-design.txt, it makes it very clear why you would handle the *_slow
> case differently anyway.
As you pointed out, we wouldn't lose much debugging information.
The same checks could be done in the normal variant with
WARN_ON(ctx->lock && ctx->lock != lock);
WARN_ON(ctx->lock && ctx->acquired > 0);

But it boils down to ww_mutex_lock_slow returning void instead of the int
__must_check that ww_mutex_lock returns.

Maybe add inlines for *_slow that use the ww_mutex_lock functions, and check
ctx->lock == lock in debugging mode?
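
Something like this rough sketch of a *_slow inline is what I have in mind
(using the contending_lock field this patch already has, since there is no
ctx->lock today):

static inline void
ww_mutex_lock_slow(struct ww_mutex *lock, struct ww_acquire_ctx *ctx)
{
#ifdef CONFIG_DEBUG_MUTEXES
    DEBUG_LOCKS_WARN_ON(ctx->contending_lock != lock);
    DEBUG_LOCKS_WARN_ON(ctx->acquired > 0);
    ctx->contending_lock = NULL;
#endif
    /*
     * This is the first lock taken with this ctx, so -EDEADLK cannot
     * happen; consume the __must_check result and warn just in case.
     */
    WARN_ON(ww_mutex_lock(lock, ctx) == -EDEADLK);
}
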
> 3. Is an smp_mb needed to serialize lock->ctx with the atomic_read?
>
> (mutex locked in fastpath, which is typically an atomic_dec operation)
> smp_mb__after_atomic_dec();
> lock->ctx = ..;
> smp_mb();
> if (atomic_read(lock->count) == 0) return;
>
> feels a bit like overkill to me.
>
> ~Maarten
