On 4/24/19 1:01 PM, Peter Zijlstra wrote:
> On Wed, Apr 24, 2019 at 12:49:05PM -0400, Waiman Long wrote:
>> On 4/24/19 3:09 AM, Peter Zijlstra wrote:
>>> On Tue, Apr 23, 2019 at 03:12:16PM -0400, Waiman Long wrote:
>>>> That is true in general, but doing preempt_disable/enable across a
>>>> function boundary is ugly and prone to further problems down the road.
>>> We do worse things in this code, and the thing Linus proposes is
>>> actually quite simple, something like so:
>>>
>>> ---
>>> --- a/kernel/locking/rwsem.c
>>> +++ b/kernel/locking/rwsem.c
>>> @@ -912,7 +904,7 @@ rwsem_down_read_slowpath(struct rw_semap
>>>  			raw_spin_unlock_irq(&sem->wait_lock);
>>>  			break;
>>>  		}
>>> -		schedule();
>>> +		schedule_preempt_disabled();
>>>  		lockevent_inc(rwsem_sleep_reader);
>>>  	}
>>>
>>> @@ -1121,6 +1113,7 @@ static struct rw_semaphore *rwsem_downgr
>>>   */
>>>  inline void __down_read(struct rw_semaphore *sem)
>>>  {
>>> +	preempt_disable();
>>>  	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>>>  			&sem->count) & RWSEM_READ_FAILED_MASK)) {
>>>  		rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
>>> @@ -1129,10 +1122,12 @@ inline void __down_read(struct rw_semaph
>>>  	} else {
>>>  		rwsem_set_reader_owned(sem);
>>>  	}
>>> +	preempt_enable();
>>>  }
>>>
>>>  static inline int __down_read_killable(struct rw_semaphore *sem)
>>>  {
>>> +	preempt_disable();
>>>  	if (unlikely(atomic_long_fetch_add_acquire(RWSEM_READER_BIAS,
>>>  			&sem->count) & RWSEM_READ_FAILED_MASK)) {
>>>  		if (IS_ERR(rwsem_down_read_slowpath(sem, TASK_KILLABLE)))
>>> @@ -1142,6 +1137,7 @@ static inline int __down_read_killable(s
>>>  	} else {
>>>  		rwsem_set_reader_owned(sem);
>>>  	}
>>> +	preempt_enable();
>>>  	return 0;
>>>  }
>>>
>> Making that change will help the slowpath to have fewer preemption points.
> That doesn't matter, right? Either it blocks or it goes through quickly.
>
> If you're worried about a particular spot we can easily put in explicit
> preemption points.
>
>> For an uncontended rwsem, this offers no real benefit. Adding
>> preempt_disable() is more complicated than I originally thought.
> I'm not sure I get your objection?
>
>> Maybe we are too paranoid about the possibility of a large number of
>> preemptions happening just at the right moment. If p is the probability
>> of a preemption in the middle of the inc-check-dec sequence, which I
>> have already moved as close together as possible, then we are talking
>> about a probability of p^32768. Since p will be really small, the
>> compound probability will be infinitesimally small.
> Sure; but we run on many millions of machines every second, so the
> actual accumulated chance of it happening eventually is still fairly
> significant.
>
>> So I would like to not do preemption now for the current patchset. We
>> can restart the discussion later on if there is a real concern that it
>> may actually happen. Please let me know if you still want to add
>> preempt_disable() for the read lock.
> I like provably correct schemes over prayers.
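Just to put a number on the p^32768 figure: even with a fairly
pessimistic per-window preemption probability of, say, p = 10^-3, the
compound probability works out to (10^-3)^32768 = 10^-98304. Still, I
take your point that a provably correct scheme is preferable.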
I am fine with adding preempt_disable(). I just want confirmation that
you want to have that.

> As you noted, distros don't usually ship with PREEMPT=y and therefore
> will not be bothered much by any of this.
>
> The old scheme basically worked by the fact that the total supported
> reader count was higher than the number of addressable pages in the
> system and therefore the overflow could not happen.
>
> We now transition to the number of CPUs, and for that we pay a little
> price with PREEMPT=y kernels. Either that or cmpxchg.

I also thought about switching to a cmpxchg loop for PREEMPT=y kernels.
Let's start with just preempt_disable() for now; we can evaluate the
cmpxchg-loop alternative later on.
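For reference, below is a rough sketch of the kind of cmpxchg loop I
have in mind. It is untested and purely illustrative: the
__down_read_cmpxchg() name is made up, and the real
rwsem_down_read_slowpath() currently expects the reader bias to have
been added before it is called, so the slowpath side would need
adjusting as well.

static inline void __down_read_cmpxchg(struct rw_semaphore *sem)
{
	long cnt = atomic_long_read(&sem->count);

	for (;;) {
		if (cnt & RWSEM_READ_FAILED_MASK) {
			rwsem_down_read_slowpath(sem, TASK_UNINTERRUPTIBLE);
			return;
		}
		/*
		 * Publish the reader bias only when the current count is
		 * valid, so a preempted reader never leaves a transient
		 * overcount behind.
		 */
		if (atomic_long_try_cmpxchg_acquire(&sem->count, &cnt,
					cnt + RWSEM_READER_BIAS)) {
			rwsem_set_reader_owned(sem);
			return;
		}
		/* cnt was reloaded by the failed cmpxchg; just retry */
	}
}

Cheers,
Longman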