Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-31 Thread Huang, Ying
huang ying  writes:

> On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong  
> wrote:
>> On 2016/1/29 17:53, Peter Zijlstra wrote:
>>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>>
 looks good to me, I will try this solution and report the result, thanks 
 everyone.
>>>
>>> Did you get a chance to run with this?
>>>
>>> .
>>>
>>
>> I backported this patch to the 3.10 LTS kernel without changing any
>> logic. So far the patch has worked fine for me and needed no changes,
>> so I think it has no problems. Could you formally submit this patch to
>> the latest kernel? :)
>>
>> Thanks.
>> Ding
>>
>>
>
> The original patch from Tianhong triggered a performance regression
> because it effectively turned off optimistic spinning.  I tested
> Peter's patch with the same configuration and it showed no regression,
> so I think the patch keeps optimistic spinning.  Test result
> details will be in the next email.

Here is the detailed test result:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/lkp-ivb-d01/fstime/unixbench

commit:
  v4.4
  1db66c17114d5437c0757d6792c0d8923192ecd6

            v4.4 1db66c17114d5437c0757d6792
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
    371269 ± 10%     -93.2%      25080 ±  4%  unixbench.time.voluntary_context_switches
    371269 ± 10%     -93.2%      25080 ±  4%  time.voluntary_context_switches
      6189 ±  8%     -76.4%       1463 ±  6%  vmstat.system.cs
      5706 ±  0%      -1.7%       5608 ±  0%  vmstat.system.in
    113680 ± 12%     -73.5%      30086 ±  1%  cpuidle.C1-IVB.usage
   1515925 ± 20%     +68.3%    2552001 ± 11%  cpuidle.C1E-IVB.time
   1227221 ± 20%     -51.7%     592695 ± 17%  cpuidle.C3-IVB.time
      2697 ± 10%     -77.8%     598.33 ± 10%  cpuidle.C3-IVB.usage
     15173 ±  6%     -23.1%      11663 ±  1%  cpuidle.C6-IVB.usage
     34.38 ± 27%     -35.0%      22.33 ±  2%  cpuidle.POLL.usage
     61.92 ±  9%     +14.3%      70.78 ±  7%  sched_debug.cfs_rq:/.load_avg.min
     40.85 ± 29%     +27.3%      52.00 ± 10%  sched_debug.cfs_rq:/.runnable_load_avg.2
     -1949 ±-37%     -64.6%    -690.19 ±-134%  sched_debug.cfs_rq:/.spread0.4
     -1773 ±-29%     -85.6%    -256.00 ±-388%  sched_debug.cfs_rq:/.spread0.7
     -2478 ±-26%     -49.5%      -1251 ±-66%  sched_debug.cfs_rq:/.spread0.min
     61.95 ±  9%     +14.8%      71.11 ±  7%  sched_debug.cfs_rq:/.tg_load_avg_contrib.min
    396962 ± 12%     +27.9%     507573 ± 10%  sched_debug.cpu.avg_idle.0
    432973 ± 18%     +45.3%     629147 ± 12%  sched_debug.cpu.avg_idle.6
    448566 ±  3%     +11.5%         40 ±  0%  sched_debug.cpu.avg_idle.avg
     45.31 ±  5%      -9.5%      41.00 ±  3%  sched_debug.cpu.cpu_load[3].7
     52204 ± 10%     -49.9%      26173 ± 34%  sched_debug.cpu.nr_switches.0
     50383 ± 12%     -57.6%      21353 ± 15%  sched_debug.cpu.nr_switches.1
     45425 ± 16%     -68.5%      14325 ± 28%  sched_debug.cpu.nr_switches.2
     43069 ± 20%     -65.5%      14852 ± 41%  sched_debug.cpu.nr_switches.3
     40285 ± 16%     -70.4%      11905 ± 47%  sched_debug.cpu.nr_switches.4
     40732 ± 13%     -75.8%       9872 ± 39%  sched_debug.cpu.nr_switches.5
     43011 ± 19%     -80.0%       8607 ± 42%  sched_debug.cpu.nr_switches.6
     38076 ± 12%     -75.9%       9167 ± 40%  sched_debug.cpu.nr_switches.7
     44148 ±  7%     -67.1%      14532 ±  6%  sched_debug.cpu.nr_switches.avg
     59877 ±  8%     -51.3%      29146 ± 21%  sched_debug.cpu.nr_switches.max
     33672 ±  9%     -83.0%       5718 ±  4%  sched_debug.cpu.nr_switches.min
     -0.62 ±-1411%  -2212.5%      13.00 ± 16%  sched_debug.cpu.nr_uninterruptible.0
     -1.23 ±-582%   -1318.8%      15.00 ± 35%  sched_debug.cpu.nr_uninterruptible.1
      2.54 ±263%     +267.7%       9.33 ± 91%  sched_debug.cpu.nr_uninterruptible.2
      0.31 ±2966%   -5841.7%     -17.67 ±-19%  sched_debug.cpu.nr_uninterruptible.5
      9.84 ± 19%     +70.4%      16.76 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
    116287 ±  4%     -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.0
     50411 ± 12%     -57.6%      21382 ± 15%  sched_debug.cpu.sched_count.1
     45453 ± 16%     -68.4%      14356 ± 28%  sched_debug.cpu.sched_count.2
     43098 ± 20%     -65.5%      14888 ± 41%  sched_debug.cpu.sched_count.3
     40314 ± 16%     -70.4%      11934 ± 47%  sched_debug.cpu.sched_count.4
     40761 ± 13%     -75.7%       9896 ± 39%  sched_debug.cpu.sched_count.5
     43041 ± 19%     -79.9%       8636 ± 42%  sched_debug.cpu.sched_count.6
     38105 ± 12%     -75.9%       9193 ± 40%  sched_debug.cpu.sched_count.7
     52184 ±  6%     -56.3%      22782 ±  4%  sched_debug.cpu.sched_count.avg
    116288 ±  4%     -20.9%      91972 ±  9%

Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-31 Thread huang ying
On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong  wrote:
> On 2016/1/29 17:53, Peter Zijlstra wrote:
>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>
>>> looks good to me, I will try this solution and report the result, thanks 
>>> everyone.
>>
>> Did you get a chance to run with this?
>>
>> .
>>
>
> I backported this patch to the 3.10 LTS kernel without changing any logic.
> So far the patch has worked fine for me and needed no changes, so I think
> it has no problems. Could you formally submit this patch to the latest
> kernel? :)
>
> Thanks.
> Ding
>
>

The original patch from Tianhong triggered a performance regression
because it effectively turned off optimistic spinning.  I tested
Peter's patch with the same configuration and it showed no regression,
so I think the patch keeps optimistic spinning.  Test result
details will be in the next email.

Best Regards,
Huang, Ying


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-29 Thread Ding Tianhong
On 2016/1/29 17:53, Peter Zijlstra wrote:
> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> 
>> looks good to me, I will try this solution and report the result, thanks 
>> everyone.
> 
> Did you get a chance to run with this?
> 
> .
> 

I backported this patch to the 3.10 LTS kernel without changing any logic.
So far the patch has worked fine for me and needed no changes, so I think it
has no problems. Could you formally submit this patch to the latest kernel? :)

Thanks.
Ding 




Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-29 Thread Peter Zijlstra
On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:

> looks good to me, I will try this solution and report the result, thanks 
> everyone.

Did you get a chance to run with this?



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-24 Thread Ding Tianhong
On 2016/1/22 21:59, Waiman Long wrote:
> On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>>
 There might be other details, but this is the one that stood out.
>>> I think this also does the wrong thing for use_ww_ctx.
>> Something like so?
> 
> I think that should work. My only minor concern is that putting the waiter 
> spinner at the end of the OSQ will take it longer to get the lock, but that 
> shouldn't be a big issue.
> 
>> ---
>>   kernel/locking/mutex.c | 15 +++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c219c40e..070a0ac34aa7 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>   struct task_struct *task = current;
>>   struct mutex_waiter waiter;
>>   unsigned long flags;
>> +bool acquired;
>>   int ret;
>>
>>   preempt_disable();
>> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>   lock_contended(&lock->dep_map, ip);
>>
>>   for (;;) {
>> +acquired = false;
>>   /*
>>* Lets try to take the lock again - this is needed even if
>>* we get here for the first time (shortly after failing to
>> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>   /* didn't get the lock, go to sleep: */
>>   spin_unlock_mutex(&lock->wait_lock, flags);
>>   schedule_preempt_disabled();
>> +
>> +if (mutex_is_locked(lock))
>> +acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>> +
>>   spin_lock_mutex(&lock->wait_lock, flags);
>> +
>> +if (acquired) {
>> +atomic_set(&lock->count, -1);
>> +break;
>> +}
>>   }
>>   __set_task_state(task, TASK_RUNNING);
>>
>> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>   atomic_set(&lock->count, 0);
>>   debug_mutex_free_waiter(&waiter);
>>
>> +if (acquired)
>> +goto unlock;
>> +
>>   skip_wait:
>>   /* got the lock - cleanup and rejoice! */
>>   lock_acquired(&lock->dep_map, ip);
>> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
>> unsigned int subclass,
>>   ww_mutex_set_context_slowpath(ww, ww_ctx);
>>   }
>>
>> +unlock:
>>   spin_unlock_mutex(&lock->wait_lock, flags);
>>   preempt_enable();
>>   return 0;
> 
> Cheers,
> Longman
> 

looks good to me, I will try this solution and report the result, thanks 
everyone.

Ding

> .
> 




Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Davidlohr Bueso

On Fri, 22 Jan 2016, Waiman Long wrote:


The patch that I sent out is just a proof of concept to make sure
that it can fix that particular case. I do plan to refactor it if I
decide to go ahead with an official one. Unlike the OSQ, there can be
no more than one waiter spinner as the wakeup function is directed to
only the first task in the wait list and the spinning won't happen
until the task is first woken up. In the worst case scenario, there
are only 2 spinners spinning on the lock and the owner field, one
from OSQ and one from the wait list. That shouldn't put too much
cacheline contention traffic to the system.


Similarly, I guess we should also wakeup the next waiter in line after
releasing the wait_lock via wake_q. This would allow the woken waiter a
slightly better chance of finding the wait_lock free when continuing to
take the mutex.

Thanks,
Davidlohr
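
As a rough illustration of the wake_q idea (a sketch only, not a patch posted
in this thread; the exact placement in the unlock slowpath is an assumption),
the unlock path could collect the first waiter while still holding wait_lock
and defer the actual wakeup until after it is released:

	unsigned long flags;
	struct wake_q_head wake_q;

	wake_q_init(&wake_q);

	spin_lock_mutex(&lock->wait_lock, flags);
	if (!list_empty(&lock->wait_list)) {
		struct mutex_waiter *waiter =
			list_first_entry(&lock->wait_list,
					 struct mutex_waiter, list);

		/* queue the wakeup, but do not issue it under wait_lock */
		wake_q_add(&wake_q, waiter->task);
	}
	spin_unlock_mutex(&lock->wait_lock, flags);

	/* the woken waiter is now more likely to find wait_lock uncontended */
	wake_up_q(&wake_q);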


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Waiman Long

On 01/22/2016 06:06 AM, Peter Zijlstra wrote:

On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:

On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:


There might be other details, but this is the one that stood out.

I think this also does the wrong thing for use_ww_ctx.

Something like so?


I think that should work. My only minor concern is that putting the 
waiter spinner at the end of the OSQ will take it longer to get the 
lock, but that shouldn't be a big issue.



---
  kernel/locking/mutex.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c219c40e..070a0ac34aa7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned long flags;
+   bool acquired;
int ret;

preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
lock_contended(&lock->dep_map, ip);

for (;;) {
+   acquired = false;
/*
 * Lets try to take the lock again - this is needed even if
 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
/* didn't get the lock, go to sleep: */
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();
+
+   if (mutex_is_locked(lock))
+   acquired = mutex_optimistic_spin(lock, ww_ctx, 
use_ww_ctx);
+
spin_lock_mutex(&lock->wait_lock, flags);
+
+   if (acquired) {
+   atomic_set(&lock->count, -1);
+   break;
+   }
}
__set_task_state(task, TASK_RUNNING);

@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
atomic_set(&lock->count, 0);
debug_mutex_free_waiter(&waiter);

+   if (acquired)
+   goto unlock;
+
  skip_wait:
/* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
ww_mutex_set_context_slowpath(ww, ww_ctx);
}

+unlock:
spin_unlock_mutex(&lock->wait_lock, flags);
preempt_enable();
return 0;


Cheers,
Longman


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Waiman Long

On 01/22/2016 03:54 AM, Peter Zijlstra wrote:

On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.


So I think having the top waiter going back in to contend on the OSQ is
an excellent idea, but I'm not sure the wlh_spinning thing is important.


Yes, that is optional. I put it there just to make it more likely for
the waiter spinner to get the lock. Without that, the chance will be
50/50 on average. I can certainly take that out.



The OSQ itself is FIFO fair, and the waiters retain the wait_list
position. So having the top wait_list entry contending on the OSQ
ensures we cannot starve (I think).

Also, as Davidlohr said, we cannot copy/paste this much code.


As I said in the previous mail, I do intend to refactor it before 
sending out the official patch.


Cheers,
Longman


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Waiman Long

On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:

On Thu, 21 Jan 2016, Waiman Long wrote:


On 01/21/2016 04:29 AM, Ding Tianhong wrote:


I got the vmcore and found that the ifconfig is already in the wait_list
of the rtnl_lock for 120 second, but my process could get and release
the rtnl_lock normally several times in one second, so it means that my
process jump the queue and the ifconfig couldn't get the rtnl all the
time, I check the mutex lock slow path and found that the mutex may spin
on owner ignore whether the wait list is empty, it will cause the task
in the wait list always be cut in line, so add test for wait list in the
mutex_can_spin_on_owner and avoid this problem.


So this has been somewhat always known, at least in theory, until now.
It's the cost of spinning without going through the wait-queue, unlike
other locks.


[...]



From: Waiman Long 
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken 
task in wait list


Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.


And one of the reasons why we never bothered 'fixing' things was the
additional branching out in the slowpath (and lack of real issue,
although this one being so damn pathological). I fear that your approach
is one of those scenarios where the code ends up being bloated, albeit
most of it is actually duplicated and can be refactored *sigh*. So now
we'd spin, then sleep, then try spinning then sleep again... phew. Not
to mention the performance implications, ie losing the benefits of osq
over waiter spinning in scenarios that would otherwise have more osq
spinners as opposed to waiter spinners, or in setups where it is
actually best to block instead of spinning.


The patch that I sent out is just a proof of concept to make sure that 
it can fix that particular case. I do plan to refactor it if I decide to 
go ahead with an official one. Unlike the OSQ, there can be no more than 
one waiter spinner as the wakeup function is directed to only the first 
task in the wait list and the spinning won't happen until the task is 
first woken up. In the worst case scenario, there are only 2 spinners 
spinning on the lock and the owner field, one from OSQ and one from the 
wait list. That shouldn't put too much cacheline contention traffic to 
the system.


Cheers,
Longman
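
For context on "the wakeup function is directed to only the first task in
the wait list": the v4.4-era unlock slowpath wakes only the head of the
wait list, roughly as sketched below (reconstructed for illustration, not a
hunk posted in this thread):

	spin_lock_mutex(&lock->wait_lock, flags);
	mutex_release(&lock->dep_map, nested, _RET_IP_);
	debug_mutex_unlock(lock);

	if (!list_empty(&lock->wait_list)) {
		/* get the first entry from the wait-list: */
		struct mutex_waiter *waiter =
				list_entry(lock->wait_list.next,
					   struct mutex_waiter, list);

		debug_mutex_wake_waiter(lock, waiter);
		wake_up_process(waiter->task);
	}

	spin_unlock_mutex(&lock->wait_lock, flags);

So at most one sleeping waiter becomes runnable per unlock, which is why at
most one waiter can be spinning at any given time in this scheme.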


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Peter Zijlstra
On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
> 
> > There might be other details, but this is the one that stood out.
> 
> I think this also does the wrong thing for use_ww_ctx.

Something like so? 

---
 kernel/locking/mutex.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c219c40e..070a0ac34aa7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
struct task_struct *task = current;
struct mutex_waiter waiter;
unsigned long flags;
+   bool acquired;
int ret;
 
preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
lock_contended(&lock->dep_map, ip);
 
for (;;) {
+   acquired = false;
/*
 * Lets try to take the lock again - this is needed even if
 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
/* didn't get the lock, go to sleep: */
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();
+
+   if (mutex_is_locked(lock))
+   acquired = mutex_optimistic_spin(lock, ww_ctx, 
use_ww_ctx);
+
spin_lock_mutex(&lock->wait_lock, flags);
+
+   if (acquired) {
+   atomic_set(&lock->count, -1);
+   break;
+   }
}
__set_task_state(task, TASK_RUNNING);
 
@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
atomic_set(&lock->count, 0);
debug_mutex_free_waiter(&waiter);
 
+   if (acquired)
+   goto unlock;
+
 skip_wait:
/* got the lock - cleanup and rejoice! */
lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
ww_mutex_set_context_slowpath(ww, ww_ctx);
}
 
+unlock:
spin_unlock_mutex(&lock->wait_lock, flags);
preempt_enable();
return 0;


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Peter Zijlstra
On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:

> There might be other details, but this is the one that stood out.

I think this also does the wrong thing for use_ww_ctx.


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Peter Zijlstra
On Fri, Jan 22, 2016 at 02:20:19AM -0800, Jason Low wrote:

> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, 
> unsigned int subclass,
>   lock_contended(&lock->dep_map, ip);
>  
>   for (;;) {
> + bool acquired = false;
> +
>   /*
>* Lets try to take the lock again - this is needed even if
>* we get here for the first time (shortly after failing to
> @@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, 
> unsigned int subclass,
>   /* didn't get the lock, go to sleep: */
>   spin_unlock_mutex(&lock->wait_lock, flags);
>   schedule_preempt_disabled();
> +
> + if (mutex_is_locked(lock))
> + acquired = mutex_optimistic_spin(lock, ww_ctx, 
> use_ww_ctx);
>   spin_lock_mutex(&lock->wait_lock, flags);
> + if (acquired)
> + break;
>   }
>   __set_task_state(task, TASK_RUNNING);

I think the problem here is that mutex_optimistic_spin() leaves the
mutex->count == 0, even though we have waiters (us at the very least).

But this should be easily fixed, since if we acquired, we should be the
one releasing, so there's no race.

So something like so:

if (acquired) {
atomic_set(&lock->count, -1);
break;
}

Should deal with that -- we'll set it to 0 again a little further down
if the list ends up empty.


There might be other details, but this is the one that stood out.
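
For readers following the count manipulation: in this era of
kernel/locking/mutex.c the counter convention is 1 = unlocked, 0 = locked
with no waiters, and negative (-1) = locked with possible waiters. A brief
annotated restatement of the fix above (the comment is added here for
clarity and is not part of the posted snippet):

	if (acquired) {
		/*
		 * mutex_optimistic_spin() took the lock with a 1 -> 0
		 * cmpxchg; since we are still on the wait_list, restore
		 * the "locked, possible waiters" value so the unlock
		 * slowpath keeps waking up waiters.
		 */
		atomic_set(&lock->count, -1);
		break;
	}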


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Jason Low
On Fri, 2016-01-22 at 09:54 +0100, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> > This patch attempts to fix this live-lock condition by enabling a woken
> > task in the wait list to enter the optimistic spinning loop itself with
> > precedence over the ones in the OSQ. This should prevent the live-lock
> > condition from happening.
> 
> 
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.
> 
> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).

Right, and we can also avoid needing to add that extra field to the
mutex structure. Before calling optimistic spinning, we do want to check
if the lock is available to avoid unnecessary OSQ overhead though.

So maybe the following would be sufficient:

---
 kernel/locking/mutex.c |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..ead0bd1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
lock_contended(&lock->dep_map, ip);
 
for (;;) {
+   bool acquired = false;
+
/*
 * Lets try to take the lock again - this is needed even if
 * we get here for the first time (shortly after failing to
@@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, 
unsigned int subclass,
/* didn't get the lock, go to sleep: */
spin_unlock_mutex(&lock->wait_lock, flags);
schedule_preempt_disabled();
+
+   if (mutex_is_locked(lock))
+   acquired = mutex_optimistic_spin(lock, ww_ctx, 
use_ww_ctx);
spin_lock_mutex(&lock->wait_lock, flags);
+   if (acquired)
+   break;
}
__set_task_state(task, TASK_RUNNING);
 
-- 
1.7.2.5






Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-22 Thread Peter Zijlstra
On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> This patch attempts to fix this live-lock condition by enabling a woken
> task in the wait list to enter the optimistic spinning loop itself with
> precedence over the ones in the OSQ. This should prevent the live-lock
> condition from happening.


So I think having the top waiter going back in to contend on the OSQ is
an excellent idea, but I'm not sure the wlh_spinning thing is important.

The OSQ itself is FIFO fair, and the waiters retain the wait_list
position. So having the top wait_list entry contending on the OSQ
ensures we cannot starve (I think).

Also, as Davidlohr said, we cannot copy/paste this much code.



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Davidlohr Bueso

On Thu, 21 Jan 2016, Waiman Long wrote:


On 01/21/2016 04:29 AM, Ding Tianhong wrote:



I got the vmcore and found that the ifconfig is already in the wait_list of the
rtnl_lock for 120 second, but my process could get and release the rtnl_lock
normally several times in one second, so it means that my process jump the
queue and the ifconfig couldn't get the rtnl all the time, I check the mutex
lock slow path and found that the mutex may spin on owner ignore whether the
wait list is empty, it will cause the task in the wait list always be cut in
line, so add test for wait list in the mutex_can_spin_on_owner and avoid this
problem.


So this has been somewhat always known, at least in theory, until now. It's
the cost of spinning without going through the wait-queue, unlike other locks.


[...]



From: Waiman Long 
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in 
wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.


And one of the reasons why we never bothered 'fixing' things was the
additional branching out in the slowpath (and lack of real issue, although
this one being so damn pathological). I fear that your approach is one of
those scenarios where the code ends up being bloated, albeit most of it is
actually duplicated and can be refactored *sigh*. So now we'd spin, then
sleep, then try spinning then sleep again... phew. Not to mention the
performance implications, ie losing the benefits of osq over waiter
spinning in scenarios that would otherwise have more osq spinners as
opposed to waiter spinners, or in setups where it is actually best to block
instead of spinning.

Thanks,
Davidlohr


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
> 
> >I did some testing, which exposed it to the 0day test robot, which
> >did note some performance differences.  I was hoping that it would
> >clear up some instability from other patches, but no such luck.  ;-)
> 
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email.  ;-)

Thanx, Paul



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Davidlohr Bueso

On Thu, 21 Jan 2016, Paul E. McKenney wrote:


I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)


Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> 
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 0551c21..596b341 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> > *lock)
> > struct task_struct *owner;
> > int retval = 1;
> >  
> > -   if (need_resched())
> > +   if (need_resched() || atomic_read(&lock->count) == -1)
> > return 0;
> >  
> 
> One concern I have is this change will eliminate any optimistic spinning
> as long as there is a waiter.  Is there a middle ground that we
> can allow only one spinner if there are waiters?  
> 
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
> osq lock that queue up the spinners (i.e. no other process doing
> optimistic spinning).
> 
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)

Thanx, Paul



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Waiman Long

On 01/21/2016 04:29 AM, Ding Tianhong wrote:

I build a script to create several process for ioctl loop calling,
the ioctl will calling the kernel function just like:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
The function may sleep several ms, but will not halt, at the same time
another user service may calling ifconfig to change the state of the
ethernet, and after several hours, the hung task thread report this problem:


[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
[149738.042284] 88052449bd40 0082 88053a33f300 
88052449bfd8
[149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 
819e6240
[149738.042288] 819e6244 88053a33f300  
819e6248
[149738.042290] Call Trace:
[149738.042300] [] schedule_preempt_disabled+0x29/0x70
[149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [] mutex_lock+0x1f/0x2f
[149738.042309] [] rtnl_lock+0x15/0x20
[149738.042311] [] dev_ioctl+0xda/0x590
[149738.042314] [] ? __do_page_fault+0x21c/0x560
[149738.042318] [] sock_do_ioctl+0x45/0x50
[149738.042320] [] sock_ioctl+0x1f0/0x2c0
[149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [] ? fget_light+0xa0/0xd0

 cut here 

I got the vmcore and found that the ifconfig is already in the wait_list of the
rtnl_lock for 120 second, but my process could get and release the rtnl_lock
normally several times in one second, so it means that my process jump the
queue and the ifconfig couldn't get the rtnl all the time, I check the mutex
lock slow path and found that the mutex may spin on owner ignore whether the
wait list is empty, it will cause the task in the wait list always be cut in
line, so add test for wait list in the mutex_can_spin_on_owner and avoid this
problem.

Signed-off-by: Ding Tianhong
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Davidlohr Bueso
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Jason Low
Cc: Tim Chen
Cc: Waiman Long
---
  kernel/locking/mutex.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
*lock)
struct task_struct *owner;
int retval = 1;

-   if (need_resched())
+   if (need_resched() || atomic_read(&lock->count) == -1)
return 0;

rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex 
*lock)
  /*
   * Optimistic spinning.
   *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
   *
   * Since this needs the lock owner, and this mutex implementation
   * doesn't track the owner atomically in the lock field, we need to


This patch will largely defeat the performance benefit of optimistic 
spinning. I have an alternative solution to this live-lock problem. 
Would you mind trying out the attached patch to see if it can fix your 
problem?


Cheers,
Longman

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long 
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in 
wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.

Signed-off-by: Waiman Long 
---
 include/linux/mutex.h  |2 +
 kernel/locking/mutex.c |   95 +++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..2c55ecd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -57,6 +57,8 @@ struct mutex {
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
struct optimistic_spin_queue osq; /* Spinner MCS lock */
+   /* 

Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Tim Chen
On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:

> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> *lock)
>   struct task_struct *owner;
>   int retval = 1;
>  
> - if (need_resched())
> + if (need_resched() || atomic_read(&lock->count) == -1)
>   return 0;
>  

One concern I have is this change will eliminate any optimistic spinning
as long as there is a waiter.  Is there a middle ground that we
can allow only one spinner if there are waiters?  

In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
osq lock that queue up the spinners (i.e. no other process doing
optimistic spinning).

This could allow a bit of spinning without starving out the waiters.

Tim
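
A minimal sketch of that middle ground (an illustration only, not a patch
posted in this thread; the placement inside mutex_can_spin_on_owner() and
the use of osq_is_locked() are assumptions):

	static inline int mutex_can_spin_on_owner(struct mutex *lock)
	{
		struct task_struct *owner;
		int retval = 1;

		if (need_resched())
			return 0;

		/*
		 * If there are waiters (count == -1), only spin when nobody
		 * is queued on the OSQ yet, so a single optimistic spinner
		 * can still slip in without a stream of spinners starving
		 * the wait list.
		 */
		if (atomic_read(&lock->count) == -1 && osq_is_locked(&lock->osq))
			return 0;

		rcu_read_lock();
		owner = READ_ONCE(lock->owner);
		if (owner)
			retval = owner->on_cpu;
		rcu_read_unlock();

		return retval;
	}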



Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Ding Tianhong
On 2016/1/21 15:29, Ingo Molnar wrote:
> 
> Cc:-ed other gents who touched the mutex code recently. Mail quoted below.
> 

Ok, thanks.

Ding

> Thanks,
> 
>   Ingo
> 
> * Ding Tianhong  wrote:
> 
>> I build a script to create several process for ioctl loop calling,
>> the ioctl will calling the kernel function just like:
>> xx_ioctl {
>> ...
>> rtnl_lock();
>> function();
>> rtnl_unlock();
>> ...
>> }
>> The function may sleep for several ms but will not halt. At the same time
>> another user service may call ifconfig to change the state of the ethernet
>> device, and after several hours the hung task thread reports this problem:
>>
>> 
>> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
>> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
>> this message.
>> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
>> [149738.042284] 88052449bd40 0082 88053a33f300 
>> 88052449bfd8
>> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 
>> 819e6240
>> [149738.042288] 819e6244 88053a33f300  
>> 819e6248
>> [149738.042290] Call Trace:
>> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
>> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
>> [149738.042305] [] mutex_lock+0x1f/0x2f
>> [149738.042309] [] rtnl_lock+0x15/0x20
>> [149738.042311] [] dev_ioctl+0xda/0x590
>> [149738.042314] [] ? __do_page_fault+0x21c/0x560
>> [149738.042318] [] sock_do_ioctl+0x45/0x50
>> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
>> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
>> [149738.042327] [] ? fget_light+0xa0/0xd0
>>
>>  cut here 
>>
>> I got the vmcore and found that ifconfig had already been on the wait_list
>> of the rtnl_lock for 120 seconds, while my processes could still acquire and
>> release the rtnl_lock several times per second. In other words, my processes
>> kept jumping the queue and ifconfig could never get the rtnl lock. Looking at
>> the mutex lock slow path, I found that the mutex may spin on the owner
>> regardless of whether the wait list is empty, which lets tasks on the wait
>> list be cut in line indefinitely. So add a test for the wait list in
>> mutex_can_spin_on_owner to avoid this problem.
>>
>> Signed-off-by: Ding Tianhong 
>> ---
>>  kernel/locking/mutex.c | 11 ++-
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c21..596b341 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
>> *lock)
>>  struct task_struct *owner;
>>  int retval = 1;
>>  
>> -if (need_resched())
>> +if (need_resched() || atomic_read(&lock->count) == -1)
>>  return 0;
>>  
>>  rcu_read_lock();
>> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex 
>> *lock)
>>  /*
>>   * Optimistic spinning.
>>   *
>> - * We try to spin for acquisition when we find that the lock owner
>> - * is currently running on a (different) CPU and while we don't
>> - * need to reschedule. The rationale is that if the lock owner is
>> - * running, it is likely to release the lock soon.
>> + * We try to spin for acquisition when we find that there are no
>> + * pending waiters and the lock owner is currently running on a
>> + * (different) CPU and while we don't need to reschedule. The
>> + * rationale is that if the lock owner is running, it is likely
>> + * to release the lock soon.
>>   *
>>   * Since this needs the lock owner, and this mutex implementation
>>   * doesn't track the owner atomically in the lock field, we need to
>> -- 
>> 2.5.0
>>
>>
> 
> .
> 




Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Waiman Long

On 01/21/2016 04:29 AM, Ding Tianhong wrote:

I built a script that creates several processes doing an ioctl loop;
the ioctl calls into the kernel roughly like this:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
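
To make the reported pattern concrete, a hypothetical driver-side sketch of
such a handler could look like the following (xx_ioctl and function() are
placeholder names taken from the report, not real kernel symbols):

#include <linux/delay.h>
#include <linux/fs.h>
#include <linux/rtnetlink.h>

static long function(void)
{
        msleep(5);              /* stand-in for driver work that sleeps a few ms */
        return 0;
}

static long xx_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
{
        long ret;

        rtnl_lock();            /* the same mutex that ifconfig needs */
        ret = function();       /* sleeps while rtnl_lock is held */
        rtnl_unlock();

        return ret;
}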
The function() above may sleep for several ms but will not halt. At the same
time another user service may call ifconfig to change the state of the
ethernet device, and after several hours the hung task thread reports this problem:


[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.
[149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
[149738.042284] 88052449bd40 0082 88053a33f300 
88052449bfd8
[149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 
819e6240
[149738.042288] 819e6244 88053a33f300  
819e6248
[149738.042290] Call Trace:
[149738.042300] [] schedule_preempt_disabled+0x29/0x70
[149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [] mutex_lock+0x1f/0x2f
[149738.042309] [] rtnl_lock+0x15/0x20
[149738.042311] [] dev_ioctl+0xda/0x590
[149738.042314] [] ? __do_page_fault+0x21c/0x560
[149738.042318] [] sock_do_ioctl+0x45/0x50
[149738.042320] [] sock_ioctl+0x1f0/0x2c0
[149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [] ? fget_light+0xa0/0xd0

 cut here 

I got the vmcore and found that ifconfig had already been on the wait_list of
the rtnl_lock for 120 seconds, while my processes could still acquire and
release the rtnl_lock several times per second. In other words, my processes
kept jumping the queue and ifconfig could never get the rtnl lock. Looking at
the mutex lock slow path, I found that the mutex may spin on the owner
regardless of whether the wait list is empty, which lets tasks on the wait
list be cut in line indefinitely. So add a test for the wait list in
mutex_can_spin_on_owner to avoid this problem.

Signed-off-by: Ding Tianhong
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Davidlohr Bueso
Cc: Linus Torvalds
Cc: Paul E. McKenney
Cc: Thomas Gleixner
Cc: Will Deacon
Cc: Jason Low
Cc: Tim Chen
Cc: Waiman Long
---
  kernel/locking/mutex.c | 11 ++-
  1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
*lock)
struct task_struct *owner;
int retval = 1;

-   if (need_resched())
+   if (need_resched() || atomic_read(&lock->count) == -1)
return 0;

rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex 
*lock)
  /*
   * Optimistic spinning.
   *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
   *
   * Since this needs the lock owner, and this mutex implementation
   * doesn't track the owner atomically in the lock field, we need to


This patch will largely defeat the performance benefit of optimistic 
spinning. I have an alternative solution to this live-lock problem. 
Would you mind trying out the attached patch to see if it can fix your 
problem?


Cheers,
Longman

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long 
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in 
wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a
woken task in the wait list to enter the optimistic spinning loop
itself, with precedence over the ones in the OSQ. This should prevent
the live-lock condition from happening.

Signed-off-by: Waiman Long 
---
 include/linux/mutex.h  |2 +
 kernel/locking/mutex.c |   95 +++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git 
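
The attached diff is truncated in this archive, so the following is only a
rough sketch of the idea described above, not Waiman's actual patch: once
woken, the waiter retries the lock and, if that fails, spins on a running
owner directly instead of going straight back to sleep, bypassing the OSQ so
it takes precedence over newly arriving optimistic spinners.
mutex_waiter_spin() below is a made-up helper name standing in for that
spinning step.

/* Hypothetical shape of the wait loop in __mutex_lock_common() (sketch) */
bool woken = false;

for (;;) {
        /*
         * Try to take the lock; writing -1 keeps the "waiters present"
         * marker set for everybody else.
         */
        if (atomic_read(&lock->count) >= 0 &&
            atomic_xchg(&lock->count, -1) == 1)
                break;                          /* lock acquired */

        /*
         * Sketch only: once woken, spin on a running owner without
         * queueing on the OSQ, so the head waiter takes precedence
         * over newly arriving optimistic spinners.
         */
        if (woken && mutex_waiter_spin(lock))
                continue;                       /* owner released, retry */

        schedule_preempt_disabled();            /* otherwise sleep again */
        woken = true;
}

Because the woken waiter never queues on the OSQ, a steady stream of incoming
spinners can no longer keep it off the lock indefinitely.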


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Davidlohr Bueso

On Thu, 21 Jan 2016, Paul E. McKenney wrote:


I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)


Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr


Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> 
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 0551c21..596b341 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> > *lock)
> > struct task_struct *owner;
> > int retval = 1;
> >  
> > -   if (need_resched())
> > +   if (need_resched() || atomic_read(&lock->count) == -1)
> > return 0;
> >  
> 
> One concern I have is that this change will eliminate any optimistic spinning
> as long as there is a waiter.  Is there a middle ground where we
> can allow only one spinner if there are waiters?
> 
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
> osq lock queueing up spinners (i.e. no other process doing
> optimistic spinning).
> 
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)

Thanx, Paul



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Paul E. McKenney
On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
> 
> >I did some testing, which exposed it to the 0day test robot, which
> >did note some performance differences.  I was hoping that it would
> >clear up some instability from other patches, but no such luck.  ;-)
> 
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email.  ;-)

Thanx, Paul



Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-21 Thread Davidlohr Bueso

On Thu, 21 Jan 2016, Waiman Long wrote:


On 01/21/2016 04:29 AM, Ding Tianhong wrote:



I got the vmcore and found that ifconfig had already been on the wait_list of
the rtnl_lock for 120 seconds, while my processes could still acquire and
release the rtnl_lock several times per second. In other words, my processes
kept jumping the queue and ifconfig could never get the rtnl lock. Looking at
the mutex lock slow path, I found that the mutex may spin on the owner
regardless of whether the wait list is empty, which lets tasks on the wait
list be cut in line indefinitely. So add a test for the wait list in
mutex_can_spin_on_owner to avoid this problem.


So this has always been somewhat known, at least in theory, until now. It's the
cost of spinning without going through the wait-queue, unlike other locks.


[...]



From: Waiman Long 
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in 
wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a
woken task in the wait list to enter the optimistic spinning loop
itself, with precedence over the ones in the OSQ. This should prevent
the live-lock condition from happening.


And one of the reasons why we never bothered 'fixing' things was the additional
branching in the slowpath (and the lack of a real issue, although this one is
so damn pathological). I fear that your approach is one of those scenarios where
the code ends up being bloated, albeit most of it is actually duplicated and can
be refactored *sigh*. So now we'd spin, then sleep, then try spinning, then
sleep again... phew. Not to mention the performance implications, i.e. losing
the benefits of osq over waiter spinning in scenarios that would otherwise have
more osq spinners than waiter spinners, or in setups where it is actually best
to block instead of spinning.

Thanks,
Davidlohr


Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.

2016-01-20 Thread Ingo Molnar

Cc:-ed other gents who touched the mutex code recently. Mail quoted below.

Thanks,

Ingo

* Ding Tianhong  wrote:

> I built a script that creates several processes doing an ioctl loop;
> the ioctl calls into the kernel roughly like this:
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
> The function may sleep for several ms but will not halt. At the same time
> another user service may call ifconfig to change the state of the ethernet
> device, and after several hours the hung task thread reports this problem:
> 
> 
> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
> [149738.042284] 88052449bd40 0082 88053a33f300 
> 88052449bfd8
> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 
> 819e6240
> [149738.042288] 819e6244 88053a33f300  
> 819e6248
> [149738.042290] Call Trace:
> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [] mutex_lock+0x1f/0x2f
> [149738.042309] [] rtnl_lock+0x15/0x20
> [149738.042311] [] dev_ioctl+0xda/0x590
> [149738.042314] [] ? __do_page_fault+0x21c/0x560
> [149738.042318] [] sock_do_ioctl+0x45/0x50
> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [] ? fget_light+0xa0/0xd0
> 
>  cut here 
> 
> I got the vmcore and found that ifconfig had already been on the wait_list
> of the rtnl_lock for 120 seconds, while my processes could still acquire and
> release the rtnl_lock several times per second. In other words, my processes
> kept jumping the queue and ifconfig could never get the rtnl lock. Looking at
> the mutex lock slow path, I found that the mutex may spin on the owner
> regardless of whether the wait list is empty, which lets tasks on the wait
> list be cut in line indefinitely. So add a test for the wait list in
> mutex_can_spin_on_owner to avoid this problem.
> 
> Signed-off-by: Ding Tianhong 
> ---
>  kernel/locking/mutex.c | 11 ++-
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex 
> *lock)
>   struct task_struct *owner;
>   int retval = 1;
>  
> - if (need_resched())
> + if (need_resched() || atomic_read(&lock->count) == -1)
>   return 0;
>  
>   rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex 
> *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to
> -- 
> 2.5.0
> 
> 

