Re: Turn SCHED_LOCK() into a mutex
> > This work has been started by art@ more than a decade ago and I'm
> > willing to finish it

This is possibly one of the scariest things you can say in OpenBSD.  I am
now calling my doctor to get a giant bag of flintstones chewable zoloft
prescribed to me just so I can recover from seeing you say this.

(I do support your efforts doing this however.. tread carefully :)
Re: Turn SCHED_LOCK() into a mutex
On Wed, Nov 04, 2020 at 09:13:22AM -0300, Martin Pieuchot wrote:
> Diff below removes the recursive attribute of the SCHED_LOCK() by
> turning it into an IPL_NONE mutex.  I'm not intending to commit it
> yet as it raises multiple questions, see below.
>
> This work has been started by art@ more than a decade ago and I'm
> willing to finish it as I believe it's the easiest way to reduce
> the scope of this lock.  Having a global mutex is the first step to
> have a per runqueue and per sleepqueue mutex.
>
> This is also a way to avoid lock ordering problems exposed by the recent
> races in single_thread_set().
>
> About the diff:
>
> The diff below includes an (ugly) refactoring of rw_exit() to avoid a
> recursion on the SCHED_LOCK().  In this case the lock is used to protect
> the global sleepqueue and is grabbed in sleep_setup().
>
> The same pattern can be observed in single_thread_check().  However in
> this case the lock is used to protect different fields so there's no
> "recursive access" to the same data structure.
>
> assertwaitok() has been moved down in mi_switch() which isn't ideal.
>
> It becomes obvious that the per-CPU and per-thread accounting fields
> updated in mi_switch() won't need a separate mutex as proposed last
> year and that splitting this global mutex will be enough.
>
> It's unclear to me if/how WITNESS should be modified to handle this lock
> change.
>
> This has been tested on sparc64 and amd64.  I'm not convinced it exposed
> all the recursions.  So if you want to give it a go and can break it, it
> is more than welcome.
>
> Comments?  Questions?

I think that's a good direction.  This then allows us to add a few
smaller mutexes and move some users to those.

In the meantime apparently some stuff has changed a little bit.  I had
to fix one reject, which was very easy because only the context around
the diff changed a tiny bit.

I then found one schedlock recursion: single_thread_check() takes the
sched lock, and we call it through sleep_signal_check().  We make that
call twice, once with the sched lock held and once without.  My 'quick'
fix is to introduce a locked version that calls
single_thread_check_locked().  There might be a nicer way to do that,
which I don't know yet, but this diff seems to work fine on my arm64
machine with amdgpu.

Build times increase a little, which might be because mutexes are not
'fair' like mplocks.  If we are looking for 'instant gratification' we
might have to do two steps at once (mplock -> mtx + move contention to
multiple mtxs).

In any case, I believe this is a good step forward and I'll have a look
at reducing the contention.
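To make the pattern concrete, here is a minimal sketch of the
locked/unlocked split described above.  This is not Patrick's actual
diff: the real single_thread_check() body in kern_sig.c is elided, and
the wrapper shape is an assumption.

int
single_thread_check_locked(struct proc *p, int deep)
{
	SCHED_ASSERT_LOCKED();

	/* ... the existing single_thread_check() logic, unchanged ... */
	return (0);
}

int
single_thread_check(struct proc *p, int deep)
{
	int error;

	/* sched_lock is a plain mutex now, so taking it twice deadlocks. */
	mtx_enter(&sched_lock);
	error = single_thread_check_locked(p, deep);
	mtx_leave(&sched_lock);

	return (error);
}

Call sites that already hold the sched lock (the sleep_signal_check()
path mentioned above) would then use the _locked variant directly,
while everything else keeps calling the old name.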
Patrick

diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c
index 1e51f71301a..4e4e1fc2a27 100644
--- a/sys/kern/kern_fork.c
+++ b/sys/kern/kern_fork.c
@@ -671,7 +671,7 @@ void
 proc_trampoline_mp(void)
 {
 	SCHED_ASSERT_LOCKED();
-	__mp_unlock(&sched_lock);
+	mtx_leave(&sched_lock);
 	spl0();
 	SCHED_ASSERT_UNLOCKED();
 	KERNEL_ASSERT_UNLOCKED();
diff --git a/sys/kern/kern_lock.c b/sys/kern/kern_lock.c
index 5cc55bb256a..2206ac53321 100644
--- a/sys/kern/kern_lock.c
+++ b/sys/kern/kern_lock.c
@@ -97,9 +97,6 @@ ___mp_lock_init(struct __mp_lock *mpl, const struct lock_type *type)
 	if (mpl == &kernel_lock)
 		mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
 		    LO_SLEEPABLE | (LO_CLASS_KERNEL_LOCK << LO_CLASSSHIFT);
-	else if (mpl == &sched_lock)
-		mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
-		    LO_RECURSABLE | (LO_CLASS_SCHED_LOCK << LO_CLASSSHIFT);
 	WITNESS_INIT(&mpl->mpl_lock_obj, type);
 #endif
 }
diff --git a/sys/kern/kern_rwlock.c b/sys/kern/kern_rwlock.c
index d79b59748e8..2feeff16943 100644
--- a/sys/kern/kern_rwlock.c
+++ b/sys/kern/kern_rwlock.c
@@ -129,36 +129,6 @@ rw_enter_write(struct rwlock *rwl)
 	}
 }
 
-void
-rw_exit_read(struct rwlock *rwl)
-{
-	unsigned long owner;
-
-	rw_assert_rdlock(rwl);
-	WITNESS_UNLOCK(&rwl->rwl_lock_obj, 0);
-
-	membar_exit_before_atomic();
-	owner = rwl->rwl_owner;
-	if (__predict_false((owner & RWLOCK_WAIT) ||
-	    rw_cas(&rwl->rwl_owner, owner, owner - RWLOCK_READ_INCR)))
-		rw_do_exit(rwl, 0);
-}
-
-void
-rw_exit_write(struct rwlock *rwl)
-{
-	unsigned long owner;
-
-	rw_assert_wrlock(rwl);
-	WITNESS_UNLOCK(&rwl->rwl_lock_obj, LOP_EXCLUSIVE);
-
-	membar_exit_before_atomic();
-	owner = rwl->rwl_owner;
-	if (__predict_false((owner & RWLOCK_WAIT) ||
-	    rw_cas(&rwl->rwl_owner, owner, 0)))
-		rw_do_exit(rwl, RWLOCK_WRLOCK);
-}
-
 #ifdef DIAGNOSTIC
 /*
  * Put the diagnostic functions here to keep the main code free
@@ -313,9 +283,10 @@ retry:
 }
 
 void
-rw_exit(struct rwlock *rwl)
+_rw_exit(struct rwlock *rwl, int locked)
 {
 	unsigned long wrlock;
+	unsigned long owner, set;
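The copy of the diff above ends before the new entry points built on top
of _rw_exit() appear.  As a rough sketch only, one shape they could
take, assuming the locked flag tells _rw_exit() whether the
scheduler/sleepqueue lock is already held so the final wakeup must not
take it again; rw_exit_locked() is a hypothetical name, not something
taken from the diff:

void
rw_exit(struct rwlock *rwl)
{
	/* Normal path: the wakeup inside _rw_exit() may grab sched_lock. */
	_rw_exit(rwl, 0);
}

void
rw_exit_locked(struct rwlock *rwl)
{
	/* For callers that already hold sched_lock. */
	SCHED_ASSERT_LOCKED();
	_rw_exit(rwl, 1);
}

rwsleep(), which drops the rwlock after sleep_setup() has already taken
the sched lock, would presumably be the main user of the locked variant.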
Turn SCHED_LOCK() into a mutex
Diff below removes the recursive attribute of the SCHED_LOCK() by
turning it into an IPL_NONE mutex.  I'm not intending to commit it
yet as it raises multiple questions, see below.

This work has been started by art@ more than a decade ago and I'm
willing to finish it as I believe it's the easiest way to reduce
the scope of this lock.  Having a global mutex is the first step to
have a per runqueue and per sleepqueue mutex.

This is also a way to avoid lock ordering problems exposed by the recent
races in single_thread_set().

About the diff:

The diff below includes an (ugly) refactoring of rw_exit() to avoid a
recursion on the SCHED_LOCK().  In this case the lock is used to protect
the global sleepqueue and is grabbed in sleep_setup().

The same pattern can be observed in single_thread_check().  However in
this case the lock is used to protect different fields so there's no
"recursive access" to the same data structure.

assertwaitok() has been moved down in mi_switch() which isn't ideal.

It becomes obvious that the per-CPU and per-thread accounting fields
updated in mi_switch() won't need a separate mutex as proposed last
year and that splitting this global mutex will be enough.

It's unclear to me if/how WITNESS should be modified to handle this lock
change.

This has been tested on sparc64 and amd64.  I'm not convinced it exposed
all the recursions.  So if you want to give it a go and can break it, it
is more than welcome.

Comments?  Questions?

Index: kern/kern_fork.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.226
diff -u -p -r1.226 kern_fork.c
--- kern/kern_fork.c	25 Oct 2020 01:55:18 -0000	1.226
+++ kern/kern_fork.c	2 Nov 2020 10:50:24 -0000
@@ -665,7 +665,7 @@ void
 proc_trampoline_mp(void)
 {
 	SCHED_ASSERT_LOCKED();
-	__mp_unlock(&sched_lock);
+	mtx_leave(&sched_lock);
 	spl0();
 	SCHED_ASSERT_UNLOCKED();
 	KERNEL_ASSERT_UNLOCKED();
Index: kern/kern_lock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_lock.c,v
retrieving revision 1.71
diff -u -p -r1.71 kern_lock.c
--- kern/kern_lock.c	5 Mar 2020 09:28:31 -0000	1.71
+++ kern/kern_lock.c	2 Nov 2020 10:50:24 -0000
@@ -97,9 +97,6 @@ ___mp_lock_init(struct __mp_lock *mpl, c
 	if (mpl == &kernel_lock)
 		mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
 		    LO_SLEEPABLE | (LO_CLASS_KERNEL_LOCK << LO_CLASSSHIFT);
-	else if (mpl == &sched_lock)
-		mpl->mpl_lock_obj.lo_flags = LO_WITNESS | LO_INITIALIZED |
-		    LO_RECURSABLE | (LO_CLASS_SCHED_LOCK << LO_CLASSSHIFT);
 	WITNESS_INIT(&mpl->mpl_lock_obj, type);
 #endif
 }
Index: kern/kern_rwlock.c
===================================================================
RCS file: /cvs/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.45
diff -u -p -r1.45 kern_rwlock.c
--- kern/kern_rwlock.c	2 Mar 2020 17:07:49 -0000	1.45
+++ kern/kern_rwlock.c	2 Nov 2020 23:13:01 -0000
@@ -128,36 +128,6 @@ rw_enter_write(struct rwlock *rwl)
 	}
 }
 
-void
-rw_exit_read(struct rwlock *rwl)
-{
-	unsigned long owner;
-
-	rw_assert_rdlock(rwl);
-	WITNESS_UNLOCK(&rwl->rwl_lock_obj, 0);
-
-	membar_exit_before_atomic();
-	owner = rwl->rwl_owner;
-	if (__predict_false((owner & RWLOCK_WAIT) ||
-	    rw_cas(&rwl->rwl_owner, owner, owner - RWLOCK_READ_INCR)))
-		rw_do_exit(rwl, 0);
-}
-
-void
-rw_exit_write(struct rwlock *rwl)
-{
-	unsigned long owner;
-
-	rw_assert_wrlock(rwl);
-	WITNESS_UNLOCK(&rwl->rwl_lock_obj, LOP_EXCLUSIVE);
-
-	membar_exit_before_atomic();
-	owner = rwl->rwl_owner;
-	if (__predict_false((owner & RWLOCK_WAIT) ||
-	    rw_cas(&rwl->rwl_owner, owner, 0)))
-		rw_do_exit(rwl, RWLOCK_WRLOCK);
-}
-
 #ifdef DIAGNOSTIC
 /*
  * Put the diagnostic functions here to keep the main code free
@@ -314,9 +284,10 @@ retry:
 }
 
 void
-rw_exit(struct rwlock *rwl)
+_rw_exit(struct rwlock *rwl, int locked)
 {
 	unsigned long wrlock;
+	unsigned long owner, set;
 
 	/* Avoid deadlocks after panic or in DDB */
 	if (panicstr || db_active)
@@ -330,15 +301,6 @@ rw_exit(struct rwlock *rwl)
 	WITNESS_UNLOCK(&rwl->rwl_lock_obj, wrlock ? LOP_EXCLUSIVE : 0);
 
 	membar_exit_before_atomic();
-	rw_do_exit(rwl, wrlock);
-}
-
-/* membar_exit_before_atomic() has to precede call of this function. */
-void
-rw_do_exit(struct rwlock *rwl, unsigned long wrlock)
-{
-	unsigned long owner, set;
-
 	do {
 		owner = rwl->rwl_owner;
 		if (wrlock)
@@ -349,7 +311,13 @@ rw_do_exit(struct rwlock *rwl, unsigned
 	} while (__predict_false(rw_cas(&rwl->rwl_owner, owner, set)));
 
 	if (owner & RWLOCK_WAIT)
-		wakeup(rwl);
+		wakeup_n(rw
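Not visible in the (truncated) hunks above is the declaration side of
the change.  A minimal sketch of what it could look like once sched_lock
is a plain mutex; only mtx_leave(&sched_lock) and the IPL_NONE choice
come from the mail, the macro shapes (and whether they keep the old spl
argument) are assumptions:

/* sys/sched.h, sketch: sched_lock becomes a struct mutex. */
extern struct mutex sched_lock;

#define SCHED_LOCK()		mtx_enter(&sched_lock)
#define SCHED_UNLOCK()		mtx_leave(&sched_lock)
#define SCHED_ASSERT_LOCKED()	MUTEX_ASSERT_LOCKED(&sched_lock)
#define SCHED_ASSERT_UNLOCKED()	MUTEX_ASSERT_UNLOCKED(&sched_lock)

/* kern/sched_bsd.c, sketch: the mutex itself, at the IPL named above. */
struct mutex sched_lock = MUTEX_INITIALIZER(IPL_NONE);

Since the old SCHED_LOCK(s) macro also raised the spl via splsched(), a
real diff has to decide where that spl handling ends up; that question
is left open here, as is the WITNESS handling raised in the original
mail.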