On 04/10/2019 02:44 PM, Peter Zijlstra wrote:
> On Fri, Apr 05, 2019 at 03:21:05PM -0400, Waiman Long wrote:
>> Because of writer lock stealing, it is possible that a constant
>> stream of incoming writers will cause a waiting writer or reader to
>> wait indefinitely leading to lock starvation.
>>
>> The mutex code has a lock handoff mechanism to prevent lock starvation.
>> This patch implements a similar lock handoff mechanism to disable
>> lock stealing and force lock handoff to the first waiter in the queue
>> after at least a 5ms waiting period. The waiting period is used to
>> avoid discouraging lock stealing so much that performance suffers.
> I would say the handoff is not at all similar to the mutex code. It is
> in fact radically different.
>

I mean they are similar in concept. Of course, the implementations are
quite different.

>> @@ -131,6 +138,15 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
>>              adjustment = RWSEM_READER_BIAS;
>>              oldcount = atomic_long_fetch_add(adjustment, &sem->count);
>>              if (unlikely(oldcount & RWSEM_WRITER_MASK)) {
>> +                    /*
>> +                     * Initiate handoff to reader, if applicable.
>> +                     */
>> +                    if (!(oldcount & RWSEM_FLAG_HANDOFF) &&
>> +                        time_after(jiffies, waiter->timeout)) {
>> +                            adjustment -= RWSEM_FLAG_HANDOFF;
>> +                            lockevent_inc(rwsem_rlock_handoff);
>> +                    }
>> +
>>                      atomic_long_sub(adjustment, &sem->count);
>>                      return;
>>              }
> That confuses the heck out of me...
>
> The above seems to rely on __rwsem_mark_wake() to be fully serialized
> (and it is, by ->wait_lock, but that isn't spelled out anywhere) such
> that we don't get double increment of FLAG_HANDOFF.
>
> So there is NO __rwsem_mark_wake() vs __rwsem_mark_wake() race like:
>
>   CPU0                                        CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>                                       oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>   if (!(oldcount & HANDOFF))
>     adjustment -= HANDOFF;
>
>                                       if (!(oldcount & HANDOFF))
>                                         adjustment -= HANDOFF;
>   atomic_long_sub(adjustment)
>                                       atomic_long_sub(adjustment)
>
>
> *whoops* double negative decrement of HANDOFF (aka double increment).

Yes, __rwsem_mark_wake() is always called with wait_lock held. I can add
a lockdep_assert() statement to clarify this point.
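
To make the interleaving above concrete, here is a rough userspace sketch using C11 atomics. It is not the kernel code: the bit values, the `struct waker` type and the `step_*` helpers are made up for illustration, and the handoff timeout is assumed to have already expired. It replays the three steps of the reader path once in serialized order (as happens under ->wait_lock) and once in the interleaved order from the diagram.

```c
#include <assert.h>
#include <stdatomic.h>

/* Illustrative bit layout; these values are assumptions for the sketch. */
#define WRITER_LOCKED 0x001L
#define FLAG_HANDOFF  0x004L
#define READER_BIAS   0x100L

/* Per-caller local state of one (hypothetical) waker. */
struct waker {
	long adjustment;
	long oldcount;
};

/* Step 1: optimistically add a reader bias and remember the old count. */
static void step_fetch_add(struct waker *w, atomic_long *count)
{
	w->adjustment = READER_BIAS;
	w->oldcount = atomic_fetch_add(count, w->adjustment);
}

/* Step 2: request handoff if the bit was not seen (timeout assumed expired). */
static void step_decide(struct waker *w)
{
	if (!(w->oldcount & FLAG_HANDOFF))
		w->adjustment -= FLAG_HANDOFF;
}

/* Step 3: back out the reader bias; a reduced adjustment adds HANDOFF. */
static void step_sub(struct waker *w, atomic_long *count)
{
	atomic_fetch_sub(count, w->adjustment);
}

/* Two wakers run back to back, as they would with ->wait_lock held. */
long run_serialized(void)
{
	atomic_long count = WRITER_LOCKED;
	struct waker a, b;

	step_fetch_add(&a, &count);
	step_decide(&a);
	step_sub(&a, &count);

	step_fetch_add(&b, &count);
	step_decide(&b);
	step_sub(&b, &count);

	return atomic_load(&count);
}

/* The same steps in the interleaved order from the diagram (no lock). */
long run_interleaved(void)
{
	atomic_long count = WRITER_LOCKED;
	struct waker a, b;

	step_fetch_add(&a, &count);
	step_fetch_add(&b, &count);
	step_decide(&a);
	step_decide(&b);
	step_sub(&a, &count);
	step_sub(&b, &count);

	return atomic_load(&count);
}
```

In the serialized run the second waker sees the handoff bit and leaves it alone, so the count ends up with the bit set exactly once. In the interleaved run both wakers add it, and the sum carries into the next bit, which is the double increment described above.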

>
> However there is another site that fiddles with the HANDOFF bit, namely
> __rwsem_down_write_failed_common(), and that does:
>
> +                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
>
> _OUTSIDE_ of ->wait_lock, which would yield:
>
>   CPU0                                        CPU1
>
>   oldcount = atomic_long_fetch_add(adjustment, &sem->count)
>
>                                       atomic_long_or(HANDOFF)
>
>   if (!(oldcount & HANDOFF))
>     adjustment -= HANDOFF;
>
>   atomic_long_sub(adjustment)
>
> *whoops*, incremented HANDOFF on HANDOFF.
>
>
> And there's not a comment in sight that would elucidate if this is
> possible or not.
>

A writer can only set the handoff bit if it is the first waiter in the
queue. If it is the first waiter, a racing __rwsem_mark_wake() will see
that the first waiter is a writer and so won't go into the reader path.
I know I sometimes don't spell out all the conditions that may look
obvious to me but not to others. I will elaborate more in comments.
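
As a rough model of that argument (not the actual rwsem code), the two conditions can be written as predicates on the head of the wait queue. The `struct waiter` layout and both function names below are hypothetical and exist only for this sketch. It also assumes the first waiter stays at the head until it removes itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum waiter_type { WAITER_READER, WAITER_WRITER };

/* Hypothetical wait-queue entry; the real structure has more fields. */
struct waiter {
	enum waiter_type type;
	struct waiter *next;
};

/* A waiting writer may set the handoff bit only when it heads the queue. */
bool writer_may_set_handoff(const struct waiter *head, const struct waiter *me)
{
	return me != NULL && head == me && me->type == WAITER_WRITER;
}

/* The wakeup code enters the reader path only when a reader heads the queue. */
bool mark_wake_takes_reader_path(const struct waiter *head)
{
	return head != NULL && head->type == WAITER_READER;
}
```

For any given queue head, at most one of the two predicates is true, so under these assumptions the writer's atomic_long_or() and the reader-path handoff cannot both touch the bit for the same head of queue.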

> Also:
>
> +                               atomic_long_or(RWSEM_FLAG_HANDOFF, &sem->count);
> +                               first++;
> +
> +                               /*
> +                                * Make sure the handoff bit is seen by
> +                                * others before proceeding.
> +                                */
> +                               smp_mb__after_atomic();
>
> That comment is utter nonsense. smp_mb() doesn't (and cannot) 'make
> visible'. There needs to be order between two memops on both sides.
>
I added that mostly for safety. I will take some time to rethink whether
it is really necessary.
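
For reference, the point about ordering two memory operations on both sides can be sketched in userspace with C11 fences. This is only an analogue, not a statement about what the rwsem code needs: `struct mailbox`, `publish()` and `consume()` are invented names, and the C11 fences stand in for kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>

struct mailbox {
	int payload;      /* first memop on each side: the data */
	atomic_int ready; /* second memop on each side: the flag */
};

/* Writer side: order the payload store before the flag store. */
void publish(struct mailbox *m, int value)
{
	m->payload = value;
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Reader side: order the flag load before the payload load. */
int consume(struct mailbox *m)
{
	while (!atomic_load_explicit(&m->ready, memory_order_relaxed))
		; /* spin until the flag is observed */
	atomic_thread_fence(memory_order_acquire);
	return m->payload;
}
```

The fence in publish() is only useful because consume() has a matching fence between its own two accesses. A barrier on one side alone orders nothing that the other side can rely on, and neither fence makes the flag store visible any sooner.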

Cheers,
Longman

