Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
huang ying writes:

> On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong wrote:
>> On 2016/1/29 17:53, Peter Zijlstra wrote:
>>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>>> looks good to me, I will try this solution and report the result,
>>>> thanks everyone.
>>>
>>> Did you get a chance to run with this?
>>
>> I backported this patch to the 3.10 LTS kernel without changing any
>> logic. So far the patch works fine for me and nothing has needed to
>> change, so I think this patch is fine. Could you formally submit it
>> to the latest kernel? :)
>>
>> Thanks.
>> Ding
>
> The original patch from Tianhong triggered a performance regression
> because it effectively turned off optimistic spinning. I tested
> Peter's patch with the same configuration and it shows no regression,
> so I think the patch keeps optimistic spinning intact. Test result
> details will be in the next email.

Here is the detailed test result:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/lkp-ivb-d01/fstime/unixbench

commit:
  v4.4
  1db66c17114d5437c0757d6792c0d8923192ecd6

            v4.4          1db66c17114d5437c0757d6792
----------------------  --------------------------
     %stddev     %change         %stddev
         \          |                \
    371269 ± 10%     -93.2%      25080 ±  4%  unixbench.time.voluntary_context_switches
    371269 ± 10%     -93.2%      25080 ±  4%  time.voluntary_context_switches
      6189 ±  8%     -76.4%       1463 ±  6%  vmstat.system.cs
      5706 ±  0%      -1.7%       5608 ±  0%  vmstat.system.in
    113680 ± 12%     -73.5%      30086 ±  1%  cpuidle.C1-IVB.usage
   1515925 ± 20%     +68.3%    2552001 ± 11%  cpuidle.C1E-IVB.time
   1227221 ± 20%     -51.7%     592695 ± 17%  cpuidle.C3-IVB.time
      2697 ± 10%     -77.8%     598.33 ± 10%  cpuidle.C3-IVB.usage
     15173 ±  6%     -23.1%      11663 ±  1%  cpuidle.C6-IVB.usage
     34.38 ± 27%     -35.0%      22.33 ±  2%  cpuidle.POLL.usage
     61.92 ±  9%     +14.3%      70.78 ±  7%  sched_debug.cfs_rq:/.load_avg.min
     40.85 ± 29%     +27.3%      52.00 ± 10%  sched_debug.cfs_rq:/.runnable_load_avg.2
     -1949 ±-37%     -64.6%    -690.19 ±-134%  sched_debug.cfs_rq:/.spread0.4
     -1773 ±-29%     -85.6%    -256.00 ±-388%  sched_debug.cfs_rq:/.spread0.7
     -2478 ±-26%     -49.5%      -1251 ±-66%  sched_debug.cfs_rq:/.spread0.min
     61.95 ±  9%     +14.8%      71.11 ±  7%  sched_debug.cfs_rq:/.tg_load_avg_contrib.min
    396962 ± 12%     +27.9%     507573 ± 10%  sched_debug.cpu.avg_idle.0
    432973 ± 18%     +45.3%     629147 ± 12%  sched_debug.cpu.avg_idle.6
    448566 ±  3%     +11.5%         40 ±  0%  sched_debug.cpu.avg_idle.avg
     45.31 ±  5%      -9.5%      41.00 ±  3%  sched_debug.cpu.cpu_load[3].7
     52204 ± 10%     -49.9%      26173 ± 34%  sched_debug.cpu.nr_switches.0
     50383 ± 12%     -57.6%      21353 ± 15%  sched_debug.cpu.nr_switches.1
     45425 ± 16%     -68.5%      14325 ± 28%  sched_debug.cpu.nr_switches.2
     43069 ± 20%     -65.5%      14852 ± 41%  sched_debug.cpu.nr_switches.3
     40285 ± 16%     -70.4%      11905 ± 47%  sched_debug.cpu.nr_switches.4
     40732 ± 13%     -75.8%       9872 ± 39%  sched_debug.cpu.nr_switches.5
     43011 ± 19%     -80.0%       8607 ± 42%  sched_debug.cpu.nr_switches.6
     38076 ± 12%     -75.9%       9167 ± 40%  sched_debug.cpu.nr_switches.7
     44148 ±  7%     -67.1%      14532 ±  6%  sched_debug.cpu.nr_switches.avg
     59877 ±  8%     -51.3%      29146 ± 21%  sched_debug.cpu.nr_switches.max
     33672 ±  9%     -83.0%       5718 ±  4%  sched_debug.cpu.nr_switches.min
     -0.62 ±-1411%  -2212.5%      13.00 ± 16%  sched_debug.cpu.nr_uninterruptible.0
     -1.23 ±-582%   -1318.8%      15.00 ± 35%  sched_debug.cpu.nr_uninterruptible.1
      2.54 ± 263%    +267.7%       9.33 ± 91%  sched_debug.cpu.nr_uninterruptible.2
      0.31 ± 2966%  -5841.7%     -17.67 ±-19%  sched_debug.cpu.nr_uninterruptible.5
      9.84 ± 19%     +70.4%      16.76 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
    116287 ±  4%     -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.0
     50411 ± 12%     -57.6%      21382 ± 15%  sched_debug.cpu.sched_count.1
     45453 ± 16%     -68.4%      14356 ± 28%  sched_debug.cpu.sched_count.2
     43098 ± 20%     -65.5%      14888 ± 41%  sched_debug.cpu.sched_count.3
     40314 ± 16%     -70.4%      11934 ± 47%  sched_debug.cpu.sched_count.4
     40761 ± 13%     -75.7%       9896 ± 39%  sched_debug.cpu.sched_count.5
     43041 ± 19%     -79.9%       8636 ± 42%  sched_debug.cpu.sched_count.6
     38105 ± 12%     -75.9%       9193 ± 40%  sched_debug.cpu.sched_count.7
     52184 ±  6%     -56.3%      22782 ±  4%  sched_debug.cpu.sched_count.avg
    116288 ±  4%     -20.9%      91972 ±  9%
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong wrote:
> On 2016/1/29 17:53, Peter Zijlstra wrote:
>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>
>>> looks good to me, I will try this solution and report the result,
>>> thanks everyone.
>>
>> Did you get a chance to run with this?
>
> I backported this patch to the 3.10 LTS kernel without changing any
> logic. So far the patch works fine for me and nothing has needed to
> change, so I think this patch is fine. Could you formally submit it
> to the latest kernel? :)
>
> Thanks.
> Ding

The original patch from Tianhong triggered a performance regression
because it effectively turned off optimistic spinning. I tested Peter's
patch with the same configuration and it shows no regression, so I
think the patch keeps optimistic spinning intact. Test result details
will be in the next email.

Best Regards,
Huang, Ying
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 2016/1/29 17:53, Peter Zijlstra wrote:
> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>
>> looks good to me, I will try this solution and report the result,
>> thanks everyone.
>
> Did you get a chance to run with this?

I backported this patch to the 3.10 LTS kernel without changing any
logic. So far the patch works fine for me and nothing has needed to
change, so I think this patch is fine. Could you formally submit it to
the latest kernel? :)

Thanks.
Ding
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:

> looks good to me, I will try this solution and report the result,
> thanks everyone.

Did you get a chance to run with this?
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 2016/1/22 21:59, Waiman Long wrote:
> On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>>
>>>> There might be other details, but this is the one that stood out.
>>>
>>> I think this also does the wrong thing for use_ww_ctx.
>>
>> Something like so?
>
> I think that should work. My only minor concern is that putting the
> waiter spinner at the end of the OSQ will make it take longer to get
> the lock, but that shouldn't be a big issue.
>
>> ---
>>  kernel/locking/mutex.c | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c219c40e..070a0ac34aa7 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  	struct task_struct *task = current;
>>  	struct mutex_waiter waiter;
>>  	unsigned long flags;
>> +	bool acquired;
>>  	int ret;
>>
>>  	preempt_disable();
>> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  	lock_contended(&lock->dep_map, ip);
>>
>>  	for (;;) {
>> +		acquired = false;
>>  		/*
>>  		 * Lets try to take the lock again - this is needed even if
>>  		 * we get here for the first time (shortly after failing to
>> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		/* didn't get the lock, go to sleep: */
>>  		spin_unlock_mutex(&lock->wait_lock, flags);
>>  		schedule_preempt_disabled();
>> +
>> +		if (mutex_is_locked(lock))
>> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>> +
>>  		spin_lock_mutex(&lock->wait_lock, flags);
>> +
>> +		if (acquired) {
>> +			atomic_set(&lock->count, -1);
>> +			break;
>> +		}
>>  	}
>>  	__set_task_state(task, TASK_RUNNING);
>>
>> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		atomic_set(&lock->count, 0);
>>  	debug_mutex_free_waiter(&waiter);
>>
>> +	if (acquired)
>> +		goto unlock;
>> +
>>  skip_wait:
>>  	/* got the lock - cleanup and rejoice! */
>>  	lock_acquired(&lock->dep_map, ip);
>> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>  		ww_mutex_set_context_slowpath(ww, ww_ctx);
>>  	}
>>
>> +unlock:
>>  	spin_unlock_mutex(&lock->wait_lock, flags);
>>  	preempt_enable();
>>  	return 0;
>
> Cheers,
> Longman

looks good to me, I will try this solution and report the result,
thanks everyone.

Ding
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Fri, 22 Jan 2016, Waiman Long wrote:

> The patch that I sent out is just a proof of concept to make sure that
> it can fix that particular case. I do plan to refactor it if I decide
> to go ahead with an official one. Unlike the OSQ, there can be no more
> than one waiter spinner, as the wakeup function is directed only at
> the first task in the wait list and the spinning won't happen until
> the task is first woken up. In the worst-case scenario, there are only
> 2 spinners spinning on the lock and the owner field, one from the OSQ
> and one from the wait list. That shouldn't put too much cacheline
> contention traffic on the system.

Similarly, I guess we should also wake up the next waiter in line after
releasing the wait_lock, via wake_q. This would allow the woken waiter
a slightly better chance of finding the wait_lock free when continuing
to take the mutex.

Thanks,
Davidlohr
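As a rough illustration of that suggestion, here is a minimal sketch,
assuming the 4.4-era wake_q API from <linux/sched.h> (WAKE_Q(),
wake_q_add(), wake_up_q()). The function name is hypothetical and this
is not a patch from this thread; the real change would land in
__mutex_unlock_common_slowpath():

static void mutex_unlock_slowpath_wake_q(struct mutex *lock)
{
	WAKE_Q(wake_q);			/* on-stack wake queue */
	unsigned long flags;

	spin_lock_mutex(&lock->wait_lock, flags);
	if (!list_empty(&lock->wait_list)) {
		/* queue the first (FIFO) waiter instead of waking it here */
		struct mutex_waiter *waiter =
			list_first_entry(&lock->wait_list,
					 struct mutex_waiter, list);
		wake_q_add(&wake_q, waiter->task);
	}
	spin_unlock_mutex(&lock->wait_lock, flags);

	/*
	 * The wakeup is issued only after wait_lock is dropped, so the
	 * woken waiter is more likely to find wait_lock uncontended.
	 */
	wake_up_q(&wake_q);
}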
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>
>>> There might be other details, but this is the one that stood out.
>>
>> I think this also does the wrong thing for use_ww_ctx.
>
> Something like so?

I think that should work. My only minor concern is that putting the
waiter spinner at the end of the OSQ will make it take longer to get
the lock, but that shouldn't be a big issue.

> ---
>  kernel/locking/mutex.c | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c219c40e..070a0ac34aa7 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	struct task_struct *task = current;
>  	struct mutex_waiter waiter;
>  	unsigned long flags;
> +	bool acquired;
>  	int ret;
>
>  	preempt_disable();
> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	lock_contended(&lock->dep_map, ip);
>
>  	for (;;) {
> +		acquired = false;
>  		/*
>  		 * Lets try to take the lock again - this is needed even if
>  		 * we get here for the first time (shortly after failing to
> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		/* didn't get the lock, go to sleep: */
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
> +
>  		spin_lock_mutex(&lock->wait_lock, flags);
> +
> +		if (acquired) {
> +			atomic_set(&lock->count, -1);
> +			break;
> +		}
>  	}
>  	__set_task_state(task, TASK_RUNNING);
>
> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		atomic_set(&lock->count, 0);
>  	debug_mutex_free_waiter(&waiter);
>
> +	if (acquired)
> +		goto unlock;
> +
>  skip_wait:
>  	/* got the lock - cleanup and rejoice! */
>  	lock_acquired(&lock->dep_map, ip);
> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		ww_mutex_set_context_slowpath(ww, ww_ctx);
>  	}
>
> +unlock:
>  	spin_unlock_mutex(&lock->wait_lock, flags);
>  	preempt_enable();
>  	return 0;

Cheers,
Longman
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 01/22/2016 03:54 AM, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
>> This patch attempts to fix this live-lock condition by enabling a
>> woken task in the wait list to enter the optimistic spinning loop
>> itself, with precedence over the ones in the OSQ. This should prevent
>> the live-lock condition from happening.
>
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.

Yes, that is optional. I put it there just to make it more likely for
the waiter spinner to get the lock. Without it, the chance will be
50/50 on average. I can certainly take that out.

> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).
>
> Also, as Davidlohr said, we cannot copy/paste this much code.

As I said in the previous mail, I do intend to refactor it before
sending out the official patch.

Cheers,
Longman
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Waiman Long wrote:
>> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
>>> I got the vmcore and found that the ifconfig had already been in the
>>> wait_list of the rtnl_lock for 120 seconds, but my process could get
>>> and release the rtnl_lock normally several times in one second, so
>>> it means that my process jumped the queue and the ifconfig couldn't
>>> get the rtnl lock at all. I checked the mutex lock slow path and
>>> found that the mutex may spin on the owner regardless of whether the
>>> wait list is empty; this causes tasks in the wait list to always be
>>> cut in line, so add a test for the wait list in
>>> mutex_can_spin_on_owner and avoid this problem.
>
> So this has been somewhat always known, at least in theory, until now.
> It's the cost of spinning without going through the wait-queue, unlike
> other locks.
>
> [...]
>
>> From: Waiman Long
>> Date: Thu, 21 Jan 2016 17:53:14 -0500
>> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken
>>  task in wait list
>>
>> Ding Tianhong reported a live-lock situation where a constant stream
>> of incoming optimistic spinners blocked a task in the wait list from
>> getting the mutex.
>>
>> This patch attempts to fix this live-lock condition by enabling a
>> woken task in the wait list to enter the optimistic spinning loop
>> itself, with precedence over the ones in the OSQ. This should prevent
>> the live-lock condition from happening.
>
> And one of the reasons why we never bothered 'fixing' things was the
> additional branching out in the slowpath (and the lack of a real
> issue, although this one is so damn pathological). I fear that your
> approach is one of those scenarios where the code ends up being
> bloated, albeit most of it is actually duplicated and can be
> refactored *sigh*. So now we'd spin, then sleep, then try spinning,
> then sleep again... phew. Not to mention the performance implications,
> ie losing the benefits of OSQ over waiter spinning in scenarios that
> would otherwise have more OSQ spinners as opposed to waiter spinners,
> or in setups where it is actually best to block instead of spinning.

The patch that I sent out is just a proof of concept to make sure that
it can fix that particular case. I do plan to refactor it if I decide
to go ahead with an official one. Unlike the OSQ, there can be no more
than one waiter spinner, as the wakeup function is directed only at the
first task in the wait list and the spinning won't happen until the
task is first woken up. In the worst-case scenario, there are only 2
spinners spinning on the lock and the owner field, one from the OSQ and
one from the wait list. That shouldn't put too much cacheline
contention traffic on the system.

Cheers,
Longman
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>
>> There might be other details, but this is the one that stood out.
>
> I think this also does the wrong thing for use_ww_ctx.

Something like so?

---
 kernel/locking/mutex.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c219c40e..070a0ac34aa7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
+	bool acquired;
 	int ret;

 	preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);

 	for (;;) {
+		acquired = false;
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
+
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (acquired) {
+			atomic_set(&lock->count, -1);
+			break;
+		}
 	}
 	__set_task_state(task, TASK_RUNNING);

@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		atomic_set(&lock->count, 0);
 	debug_mutex_free_waiter(&waiter);

+	if (acquired)
+		goto unlock;
+
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
 	}

+unlock:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
 	return 0;
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:

> There might be other details, but this is the one that stood out.

I think this also does the wrong thing for use_ww_ctx.
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Fri, Jan 22, 2016 at 02:20:19AM -0800, Jason Low wrote:
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	lock_contended(&lock->dep_map, ip);
>
>  	for (;;) {
> +		bool acquired = false;
> +
>  		/*
>  		 * Lets try to take the lock again - this is needed even if
>  		 * we get here for the first time (shortly after failing to
> @@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		/* didn't get the lock, go to sleep: */
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>  		spin_lock_mutex(&lock->wait_lock, flags);
> +		if (acquired)
> +			break;
>  	}
>  	__set_task_state(task, TASK_RUNNING);

I think the problem here is that mutex_optimistic_spin() leaves
mutex->count == 0, even though we have waiters (us at the very least).

But this should be easily fixed, since if we acquired, we should be the
one releasing, so there's no race. So something like so:

	if (acquired) {
		atomic_set(&lock->count, -1);
		break;
	}

Should deal with that -- we'll set it to 0 again a little further down
if the list ends up empty.

There might be other details, but this is the one that stood out.
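For readers following the count manipulation here, a short sketch of
the counter convention the pre-4.8 mutex implementation uses (this is
background recalled from that era's kernel/locking/mutex.c, not part of
any patch in this thread):

/*
 * mutex->count convention assumed throughout this discussion:
 *
 *    1  - unlocked
 *    0  - locked, no waiters
 *   <0  - locked, waiters may be pending
 *
 * mutex_unlock() only takes its slowpath -- the path that wakes the
 * next waiter -- when it sees a negative count. A waiter that steals
 * the lock via mutex_optimistic_spin() while other tasks remain queued
 * must therefore put the count back to -1, otherwise the remaining
 * waiters' wakeup can be lost; that is what the atomic_set() above
 * restores.
 */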
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Fri, 2016-01-22 at 09:54 +0100, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
>> This patch attempts to fix this live-lock condition by enabling a
>> woken task in the wait list to enter the optimistic spinning loop
>> itself, with precedence over the ones in the OSQ. This should prevent
>> the live-lock condition from happening.
>
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.
>
> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).

Right, and we can also avoid needing to add that extra field to the
mutex structure. Before calling optimistic spinning, we do want to
check that the lock is available, though, to avoid unnecessary OSQ
overhead. So maybe the following would be sufficient:

---
 kernel/locking/mutex.c | 7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..ead0bd1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);

 	for (;;) {
+		bool acquired = false;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (acquired)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);

--
1.7.2.5
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> This patch attempts to fix this live-lock condition by enabling a
> woken task in the wait list to enter the optimistic spinning loop
> itself, with precedence over the ones in the OSQ. This should prevent
> the live-lock condition from happening.

So I think having the top waiter going back in to contend on the OSQ is
an excellent idea, but I'm not sure the wlh_spinning thing is important.

The OSQ itself is FIFO fair, and the waiters retain the wait_list
position. So having the top wait_list entry contending on the OSQ
ensures we cannot starve (I think).

Also, as Davidlohr said, we cannot copy/paste this much code.
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, 21 Jan 2016, Waiman Long wrote:

> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
>> I got the vmcore and found that the ifconfig had already been in the
>> wait_list of the rtnl_lock for 120 seconds, but my process could get
>> and release the rtnl_lock normally several times in one second, so it
>> means that my process jumped the queue and the ifconfig couldn't get
>> the rtnl lock at all. I checked the mutex lock slow path and found
>> that the mutex may spin on the owner regardless of whether the wait
>> list is empty; this causes tasks in the wait list to always be cut in
>> line, so add a test for the wait list in mutex_can_spin_on_owner and
>> avoid this problem.

So this has been somewhat always known, at least in theory, until now.
It's the cost of spinning without going through the wait-queue, unlike
other locks.

[...]

> From: Waiman Long
> Date: Thu, 21 Jan 2016 17:53:14 -0500
> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken
>  task in wait list
>
> Ding Tianhong reported a live-lock situation where a constant stream
> of incoming optimistic spinners blocked a task in the wait list from
> getting the mutex.
>
> This patch attempts to fix this live-lock condition by enabling a
> woken task in the wait list to enter the optimistic spinning loop
> itself, with precedence over the ones in the OSQ. This should prevent
> the live-lock condition from happening.

And one of the reasons why we never bothered 'fixing' things was the
additional branching out in the slowpath (and the lack of a real issue,
although this one is so damn pathological). I fear that your approach
is one of those scenarios where the code ends up being bloated, albeit
most of it is actually duplicated and can be refactored *sigh*. So now
we'd spin, then sleep, then try spinning, then sleep again... phew. Not
to mention the performance implications, ie losing the benefits of OSQ
over waiter spinning in scenarios that would otherwise have more OSQ
spinners as opposed to waiter spinners, or in setups where it is
actually best to block instead of spinning.

Thanks,
Davidlohr
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
>
>> I did some testing, which exposed it to the 0day test robot, which
>> did note some performance differences. I was hoping that it would
>> clear up some instability from other patches, but no such luck. ;-)
>
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email. ;-)

							Thanx, Paul
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, 21 Jan 2016, Paul E. McKenney wrote:

> I did some testing, which exposed it to the 0day test robot, which
> did note some performance differences. I was hoping that it would
> clear up some instability from other patches, but no such luck. ;-)

Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c21..596b341 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>>  	struct task_struct *owner;
>>  	int retval = 1;
>>
>> -	if (need_resched())
>> +	if (need_resched() || atomic_read(&lock->count) == -1)
>>  		return 0;
>
> One concern I have is that this change will eliminate any optimistic
> spinning as long as there is a waiter. Is there a middle ground where
> we can allow only one spinner if there are waiters?
>
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
> osq lock that queues up the spinners (i.e. no other process doing
> optimistic spinning).
>
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which did
note some performance differences. I was hoping that it would clear up
some instability from other patches, but no such luck. ;-)

							Thanx, Paul
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 01/21/2016 04:29 AM, Ding Tianhong wrote:
> I build a script to create several processes for ioctl loop calling;
> the ioctl will call a kernel function just like:
>
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
>
> The function may sleep several ms, but will not halt. At the same time
> another user service may call ifconfig to change the state of the
> ethernet, and after several hours, the hung task thread reports this
> problem:
>
> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
> [149738.042284] 88052449bd40 0082 88053a33f300 88052449bfd8
> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 819e6240
> [149738.042288] 819e6244 88053a33f300 819e6248
> [149738.042290] Call Trace:
> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [] mutex_lock+0x1f/0x2f
> [149738.042309] [] rtnl_lock+0x15/0x20
> [149738.042311] [] dev_ioctl+0xda/0x590
> [149738.042314] [] ? __do_page_fault+0x21c/0x560
> [149738.042318] [] sock_do_ioctl+0x45/0x50
> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [] ? fget_light+0xa0/0xd0
>
> ---------------- cut here ----------------
>
> I got the vmcore and found that the ifconfig had already been in the
> wait_list of the rtnl_lock for 120 seconds, but my process could get
> and release the rtnl_lock normally several times in one second, so it
> means that my process jumped the queue and the ifconfig couldn't get
> the rtnl lock at all. I checked the mutex lock slow path and found
> that the mutex may spin on the owner regardless of whether the wait
> list is empty; this causes tasks in the wait list to always be cut in
> line, so add a test for the wait list in mutex_can_spin_on_owner and
> avoid this problem.
>
> Signed-off-by: Ding Tianhong
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Davidlohr Bueso
> Cc: Linus Torvalds
> Cc: Paul E. McKenney
> Cc: Thomas Gleixner
> Cc: Will Deacon
> Cc: Jason Low
> Cc: Tim Chen
> Cc: Waiman Long
> ---
>  kernel/locking/mutex.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>
>  	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to

This patch will largely defeat the performance benefit of optimistic
spinning. I have an alternative solution to this live-lock problem.
Would you mind trying out the attached patch to see if it can fix your
problem?
Cheers,
Longman

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in
 wait list

Ding Tianhong reported a live-lock situation where a constant stream of
incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself, with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.

Signed-off-by: Waiman Long
---
 include/linux/mutex.h  |  2 +
 kernel/locking/mutex.c | 95 +++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..2c55ecd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -57,6 +57,8 @@ struct mutex {
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* Spinner MCS lock */
+	/*
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:

> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;

One concern I have is that this change will eliminate any optimistic
spinning as long as there is a waiter. Is there a middle ground where we
can allow only one spinner if there are waiters?

In other words, we allow spinning when
atomic_read(&lock->count) == -1 but there is no one on the
osq lock that queues up the spinners (i.e. no other process doing
optimistic spinning).

This could allow a bit of spinning without starving out the waiters.

Tim
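As a rough sketch of that middle ground, the check could consult the
OSQ state, assuming the osq_is_locked() helper from <linux/osq_lock.h>.
This is illustrative only, not a patch from this thread, and the OSQ
check is inherently racy, so it can only serve as a heuristic:

static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
	struct task_struct *owner;
	int retval = 1;

	if (need_resched())
		return 0;

	/*
	 * With waiters pending (count == -1), permit spinning only for
	 * a task that finds the OSQ empty, i.e. at most one extra
	 * spinner; everyone else falls through to blocking. A racy
	 * false reading here just sends one more task to sleep.
	 */
	if (atomic_read(&lock->count) == -1 && osq_is_locked(&lock->osq))
		return 0;

	rcu_read_lock();
	owner = READ_ONCE(lock->owner);
	if (owner)
		retval = owner->on_cpu;
	rcu_read_unlock();

	return retval;
}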
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

On 2016/1/21 15:29, Ingo Molnar wrote:
>
> Cc:-ed other gents who touched the mutex code recently. Mail quoted below.

Ok, thanks.

Ding

> Thanks,
>
> 	Ingo
>
> * Ding Tianhong wrote:
>
>> I built a script that creates several processes doing ioctl calls in a
>> loop; the ioctl calls a kernel function shaped like:
>>
>> xx_ioctl {
>> 	...
>> 	rtnl_lock();
>> 	function();
>> 	rtnl_unlock();
>> 	...
>> }
>>
>> The function may sleep for several ms but will not halt. At the same
>> time, another user service may call ifconfig to change the state of the
>> ethernet device, and after several hours the hung task thread reports
>> this problem:
>>
>> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
>> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
>> [149738.042284] 88052449bd40 0082 88053a33f300 88052449bfd8
>> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 819e6240
>> [149738.042288] 819e6244 88053a33f300 819e6248
>> [149738.042290] Call Trace:
>> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
>> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
>> [149738.042305] [] mutex_lock+0x1f/0x2f
>> [149738.042309] [] rtnl_lock+0x15/0x20
>> [149738.042311] [] dev_ioctl+0xda/0x590
>> [149738.042314] [] ? __do_page_fault+0x21c/0x560
>> [149738.042318] [] sock_do_ioctl+0x45/0x50
>> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
>> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
>> [149738.042327] [] ? fget_light+0xa0/0xd0
>>
>> ---- cut here ----
>>
>> I got the vmcore and found that the ifconfig task had already been in the
>> wait_list of the rtnl_lock for 120 seconds, while my processes could still
>> acquire and release the rtnl_lock several times per second, which means my
>> processes kept jumping the queue and ifconfig could never get the rtnl
>> lock. I checked the mutex lock slow path and found that the mutex may spin
>> on the owner regardless of whether the wait list is empty, which lets the
>> tasks in the wait list be cut in line indefinitely, so add a test for the
>> wait list in mutex_can_spin_on_owner to avoid this problem.
>>
>> Signed-off-by: Ding Tianhong
>> ---
>>  kernel/locking/mutex.c | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c21..596b341 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>>  	struct task_struct *owner;
>>  	int retval = 1;
>>
>> -	if (need_resched())
>> +	if (need_resched() || atomic_read(&lock->count) == -1)
>>  		return 0;
>>
>>  	rcu_read_lock();
>> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>>  /*
>>   * Optimistic spinning.
>>   *
>> - * We try to spin for acquisition when we find that the lock owner
>> - * is currently running on a (different) CPU and while we don't
>> - * need to reschedule. The rationale is that if the lock owner is
>> - * running, it is likely to release the lock soon.
>> + * We try to spin for acquisition when we find that there are no
>> + * pending waiters and the lock owner is currently running on a
>> + * (different) CPU and while we don't need to reschedule. The
>> + * rationale is that if the lock owner is running, it is likely
>> + * to release the lock soon.
>>   *
>>   * Since this needs the lock owner, and this mutex implementation
>>   * doesn't track the owner atomically in the lock field, we need to
>> --
>> 2.5.0
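To make Ding's reproducer concrete, a rough userspace approximation is
sketched below. SIOCSIFFLAGS is wrapped in rtnl_lock()/rtnl_unlock()
inside dev_ioctl(), so a no-op "set" of the current flags still
exercises the same lock; the interface name is an assumption, and unlike
Ding's driver function (which sleeps for several ms while holding the
lock) a stock ioctl holds it only briefly, so this only approximates the
incoming-spinner stream.

/*
 * Illustration only: spawn workers that repeatedly take and release
 * rtnl_lock in the kernel via SIOCSIFFLAGS, re-setting the flags that
 * were just read.  "eth0" is an assumption; needs CAP_NET_ADMIN.
 */
#include <net/if.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	struct ifreq ifr;
	int fd = socket(AF_INET, SOCK_DGRAM, 0);
	int i;

	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);

	/* Read the current flags once so the "set" below is a no-op. */
	if (ioctl(fd, SIOCGIFFLAGS, &ifr) < 0) {
		perror("SIOCGIFFLAGS");
		return 1;
	}

	for (i = 0; i < 8; i++)		/* several competing processes */
		if (fork() == 0)
			break;

	for (;;) {
		/* dev_ioctl() brackets this request with rtnl_lock()/rtnl_unlock(). */
		if (ioctl(fd, SIOCSIFFLAGS, &ifr) < 0) {
			perror("SIOCSIFFLAGS");
			return 1;
		}
	}
}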
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On 01/21/2016 04:29 AM, Ding Tianhong wrote:
> I built a script that creates several processes doing ioctl calls in a
> loop; the ioctl calls a kernel function shaped like:
>
> xx_ioctl {
> 	...
> 	rtnl_lock();
> 	function();
> 	rtnl_unlock();
> 	...
> }
>
> The function may sleep for several ms but will not halt. At the same
> time, another user service may call ifconfig to change the state of the
> ethernet device, and after several hours the hung task thread reports
> this problem:
>
> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
> [149738.042284] 88052449bd40 0082 88053a33f300 88052449bfd8
> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 819e6240
> [149738.042288] 819e6244 88053a33f300 819e6248
> [149738.042290] Call Trace:
> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [] mutex_lock+0x1f/0x2f
> [149738.042309] [] rtnl_lock+0x15/0x20
> [149738.042311] [] dev_ioctl+0xda/0x590
> [149738.042314] [] ? __do_page_fault+0x21c/0x560
> [149738.042318] [] sock_do_ioctl+0x45/0x50
> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [] ? fget_light+0xa0/0xd0
>
> ---- cut here ----
>
> I got the vmcore and found that the ifconfig task had already been in the
> wait_list of the rtnl_lock for 120 seconds, while my processes could still
> acquire and release the rtnl_lock several times per second, which means my
> processes kept jumping the queue and ifconfig could never get the rtnl
> lock. I checked the mutex lock slow path and found that the mutex may spin
> on the owner regardless of whether the wait list is empty, which lets the
> tasks in the wait list be cut in line indefinitely, so add a test for the
> wait list in mutex_can_spin_on_owner to avoid this problem.
>
> Signed-off-by: Ding Tianhong
> Cc: Ingo Molnar
> Cc: Peter Zijlstra
> Cc: Davidlohr Bueso
> Cc: Linus Torvalds
> Cc: Paul E. McKenney
> Cc: Thomas Gleixner
> Cc: Will Deacon
> Cc: Jason Low
> Cc: Tim Chen
> Cc: Waiman Long
> ---
>  kernel/locking/mutex.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>
>  	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to

This patch will largely defeat the performance benefit of optimistic
spinning. I have an alternative solution to this live-lock problem.
Would you mind trying out the attached patch to see if it can fix your
problem?

Cheers,
Longman

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in
 wait list

Ding Tianhong reported a live-lock situation where a constant stream of
incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself, with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.

Signed-off-by: Waiman Long
---
 include/linux/mutex.h  |  2 +
 kernel/locking/mutex.c | 95 +++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git
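The attached patch is truncated in this archive; for orientation only,
the shape of the idea is sketched below. The waiter flag on
mutex_optimistic_spin() is a hypothetical parameter for this
illustration, and the wait_lock handling around schedule() is elided;
this is not Waiman's actual code.

/*
 * Schematic of the wait-list loop with the woken-waiter idea: after
 * being woken, a waiter retries optimistic spinning directly, without
 * joining the OSQ, so it takes precedence over incoming spinners.
 */
for (;;) {
	/* Try to grab the lock; leave -1 behind so "waiters" stays visible. */
	if (atomic_read(&lock->count) >= 0 &&
	    atomic_xchg(&lock->count, -1) == 1)
		break;

	schedule_preempt_disabled();

	/*
	 * Just woken from the wait list: spin on a running owner before
	 * going back to sleep, bypassing osq_lock() so a stream of new
	 * spinners cannot starve this waiter again.  The waiter=true
	 * argument is hypothetical.
	 */
	if (mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx, true))
		break;
}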
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, 21 Jan 2016, Paul E. McKenney wrote:

> I did some testing, which exposed it to the 0day test robot, which did
> note some performance differences. I was hoping that it would clear up
> some instability from other patches, but no such luck. ;-)

Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> >
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 0551c21..596b341 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
> >  	struct task_struct *owner;
> >  	int retval = 1;
> >
> > -	if (need_resched())
> > +	if (need_resched() || atomic_read(&lock->count) == -1)
> >  		return 0;
> >
>
> One concern I have is that this change will eliminate any optimistic
> spinning as long as there is a waiter. Is there a middle ground where we
> can allow only one spinner if there are waiters?
>
> In other words, we allow spinning when atomic_read(&lock->count) == -1
> but there is no one on the osq lock that queues up the spinners (i.e. no
> other process doing optimistic spinning).
>
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which did
note some performance differences. I was hoping that it would clear up
some instability from other patches, but no such luck. ;-)

							Thanx, Paul
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
>
> > I did some testing, which exposed it to the 0day test robot, which
> > did note some performance differences. I was hoping that it would
> > clear up some instability from other patches, but no such luck. ;-)
>
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email. ;-)

							Thanx, Paul
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
On Thu, 21 Jan 2016, Waiman Long wrote:

> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
>> I got the vmcore and found that the ifconfig task had already been in
>> the wait_list of the rtnl_lock for 120 seconds, while my processes
>> could still acquire and release the rtnl_lock several times per second,
>> which means my processes kept jumping the queue and ifconfig could
>> never get the rtnl lock. I checked the mutex lock slow path and found
>> that the mutex may spin on the owner regardless of whether the wait
>> list is empty, which lets the tasks in the wait list be cut in line
>> indefinitely, so add a test for the wait list in
>> mutex_can_spin_on_owner to avoid this problem.

So this has been somewhat always known, at least in theory, until now.
It's the cost of spinning without going through the wait-queue, unlike
other locks.

[...]

> From: Waiman Long
> Date: Thu, 21 Jan 2016 17:53:14 -0500
> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task
>  in wait list
>
> Ding Tianhong reported a live-lock situation where a constant stream of
> incoming optimistic spinners blocked a task in the wait list from
> getting the mutex.
>
> This patch attempts to fix this live-lock condition by enabling a woken
> task in the wait list to enter the optimistic spinning loop itself, with
> precedence over the ones in the OSQ. This should prevent the live-lock
> condition from happening.

And one of the reasons why we never bothered 'fixing' things was the
additional branching out in the slowpath (and the lack of a real issue,
although this one is so damn pathological). I fear that your approach is
one of those scenarios where the code ends up being bloated, albeit most
of it is actually duplicated and can be refactored *sigh*. So now we'd
spin, then sleep, then try spinning, then sleep again... phew. Not to
mention the performance implications, ie losing the benefits of osq over
waiter spinning in scenarios that would otherwise have more osq spinners
as opposed to waiter spinners, or in setups where it is actually best to
block instead of spinning.

Thanks,
Davidlohr
Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

Cc:-ed other gents who touched the mutex code recently. Mail quoted below.

Thanks,

	Ingo

* Ding Tianhong wrote:

> I built a script that creates several processes doing ioctl calls in a
> loop; the ioctl calls a kernel function shaped like:
>
> xx_ioctl {
> 	...
> 	rtnl_lock();
> 	function();
> 	rtnl_unlock();
> 	...
> }
>
> The function may sleep for several ms but will not halt. At the same
> time, another user service may call ifconfig to change the state of the
> ethernet device, and after several hours the hung task thread reports
> this problem:
>
> [149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D 88061ec13680 0 11890 11573 0x0080
> [149738.042284] 88052449bd40 0082 88053a33f300 88052449bfd8
> [149738.042286] 88052449bfd8 88052449bfd8 88053a33f300 819e6240
> [149738.042288] 819e6244 88053a33f300 819e6248
> [149738.042290] Call Trace:
> [149738.042300] [] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [] mutex_lock+0x1f/0x2f
> [149738.042309] [] rtnl_lock+0x15/0x20
> [149738.042311] [] dev_ioctl+0xda/0x590
> [149738.042314] [] ? __do_page_fault+0x21c/0x560
> [149738.042318] [] sock_do_ioctl+0x45/0x50
> [149738.042320] [] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [] ? fget_light+0xa0/0xd0
>
> ---- cut here ----
>
> I got the vmcore and found that the ifconfig task had already been in the
> wait_list of the rtnl_lock for 120 seconds, while my processes could still
> acquire and release the rtnl_lock several times per second, which means my
> processes kept jumping the queue and ifconfig could never get the rtnl
> lock. I checked the mutex lock slow path and found that the mutex may spin
> on the owner regardless of whether the wait list is empty, which lets the
> tasks in the wait list be cut in line indefinitely, so add a test for the
> wait list in mutex_can_spin_on_owner to avoid this problem.
>
> Signed-off-by: Ding Tianhong
> ---
>  kernel/locking/mutex.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>
>  	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to
> --
> 2.5.0