On 7/15/24 14:40, Mateusz Guzik wrote:
On Mon, Jul 15, 2024 at 8:33 PM Mateusz Guzik <mjgu...@gmail.com> wrote:
On Mon, Jul 15, 2024 at 8:21 PM John Baldwin <j...@freebsd.org> wrote:
On 7/15/24 13:59, Mateusz Guzik wrote:
On Mon, Jul 15, 2024 at 6:22 PM John Baldwin <j...@freebsd.org> wrote:
On 7/11/24 07:07, Mateusz Guzik wrote:
The branch main has been updated by mjg:
URL:
https://cgit.FreeBSD.org/src/commit/?id=87ee63bac69dc49291f55590b8baa57cad6c7d85
commit 87ee63bac69dc49291f55590b8baa57cad6c7d85
Author: Mateusz Guzik <m...@freebsd.org>
AuthorDate: 2024-07-11 00:17:27 +0000
Commit: Mateusz Guzik <m...@freebsd.org>
CommitDate: 2024-07-11 11:06:52 +0000
locks: add a runtime check for missing turnstile
There are sometimes bugs which result in the unlock fast path failing,
which in turns causes a not-helpful crash report when dereferencing a
NULL turnstile. Help debugging such cases by pointing out what happened
along with some debug.
Sponsored by: Rubicon Communications, LLC ("Netgate")
---
sys/kern/kern_mutex.c | 4 +++-
sys/kern/kern_rwlock.c | 16 ++++++++++++----
2 files changed, 15 insertions(+), 5 deletions(-)
diff --git a/sys/kern/kern_mutex.c b/sys/kern/kern_mutex.c
index 90361b23c09a..0fa624cc4bb1 100644
--- a/sys/kern/kern_mutex.c
+++ b/sys/kern/kern_mutex.c
@@ -1053,7 +1053,9 @@ __mtx_unlock_sleep(volatile uintptr_t *c, uintptr_t v)
turnstile_chain_lock(&m->lock_object);
_mtx_release_lock_quick(m);
ts = turnstile_lookup(&m->lock_object);
- MPASS(ts != NULL);
+ if (__predict_false(ts == NULL)) {
+ panic("got NULL turnstile on mutex %p v %zx", m, v);
+ }
Hmm, this is just an expanded KASSERT() but always on rather than conditional
on INVARIANTS?
Do you have examples of the type of bugs that cause this? (Is it unlocking a
freed mutex
or the like?) We generally hide all these types of checks under INVARIANTS
rather than
shipping them in release kernels.
Use-after-free, overflow, underflow, bitflip or what have you all can
fail the fast path.
Once that happens and the kernel crashes with a null pointer deref,
here is a crash at netgate which prodded this:
calltrap() at calltrap+0x8/frame 0xfffffe0106720920
--- trap 0xc, rip = 0xffffffff80d5ab70, rsp = 0xfffffe01067209f0, rbp
= 0xfffffe0106720a00 ---
turnstile_broadcast() at turnstile_broadcast+0x40/frame 0xfffffe0106720a00
__rw_wunlock_hard() at __rw_wunlock_hard+0x9e/frame 0xfffffe0106720a30
nd6_resolve_slow() at nd6_resolve_slow+0x2d7/frame 0xfffffe0106720aa0
nd6_resolve() at nd6_resolve+0x125/frame 0xfffffe0106720b10
ether_output() at ether_output+0x4e7/frame 0xfffffe0106720ba0
ip_output_send() at ip_output_send+0xdc/frame 0xfffffe0106720be0
ip_output() at ip_output+0x1295/frame 0xfffffe0106720ce0
ip_forward() at ip_forward+0x3c2/frame 0xfffffe0106720d90
ip_input() at ip_input+0x705/frame 0xfffffe0106720df0
swi_net() at swi_net+0x138/frame 0xfffffe0106720e60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe0106720ef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe0106720f30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe0106720f30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Neither the register dump nor anything in the backtrace indicate what happened.
Since the kernel is going down anyway, one may as well get some debug from it.
If you don't mind the extra branches for sanity checks, why not just run with
INVARIANTS? That is, what makes these particular assertions different from
other assertions such that they should be on unconditionally? The last line
below
applies to pretty much every other assertion in the tree.
This adds a branch in the slowpath, a spot which should relatively
rarely execute compared to the fast path. On top of that the branch at
hand does not do any extra memory accesses or complex arithmetic.
So no, I don't think I may as well run with INVARIANTS.
How about this: if you strongly about this branch, feel free to revert
the commit, I'm just going to keep the change at Netgate.
But should you do it, make sure to not add avoidable branches in your stuff.
It's more that I think this is an unusual case (unconditional assertions rather
than conditional on INVARIANTS) such that it's probably worth being a bit more
explicit about that in the log with the rationale, etc. That is, I don't think
"runtime check" quite communicates that you are intentionally doing an
unconditional assertion, and a bit more detail about the specific types of
bugs might have been useful in the log as well. I think it's fine if we want
to have some checks that are always on, but it's currently quite rare so
needs a bit more rationale in the log than other changes is all.
--
John Baldwin