On Fri, 2025-09-19 at 12:49 +0200, Thomas Weißschuh wrote: > Reactions of the existing reactors are not easily detectable from programs > and also not easily attributable to the triggering task. > This makes it hard to test the monitors themselves programmatically. > > The signal reactors allows applications to validate the correct operations > of monitors either by installing custom signal handlers or by forking a > child and waiting for the expected exit code.
Thanks, this looks like a really nice addition! > For now only SIGBUS is supported, but additional signals can be added. Curious choice of a default signal, is this specific to your use-case? We may want to add some kind of reactors options in the future to allow configuring this, but I'd say it isn't needed for now. > As the reactors are called from tracepoints they need to be able to run in > any context. To avoid taking any of the looks used during signal delivery You probably meant "locks" > from an invalid context, schedule it through task_work. The delivery will > be delayed, for example when then sleep monitor detects an invalid sleep. > > Signed-off-by: Thomas Weißschuh <[email protected]> > --- > kernel/trace/rv/Kconfig | 8 +++ > kernel/trace/rv/Makefile | 1 + > kernel/trace/rv/reactor_signal.c | 114 > +++++++++++++++++++++++++++++++++++++++ > 3 files changed, 123 insertions(+) > > diff --git a/kernel/trace/rv/Kconfig b/kernel/trace/rv/Kconfig > index > 5b4be87ba59d3fa5123a64efa746320c9e8b73b1..dc7b546615a67c811ce94c43bb1db2826cd0 > 0c77 100644 > --- a/kernel/trace/rv/Kconfig > +++ b/kernel/trace/rv/Kconfig > @@ -93,3 +93,11 @@ config RV_REACT_PANIC > help > Enables the panic reactor. The panic reactor emits a printk() > message if an exception is found and panic()s the system. > + > +config RV_REACT_SIGNAL > + bool "Signal reactor" > + depends on RV_REACTORS > + default y > + help > + Enables the signal reactor. The signal reactors sends a signal to > the > + task triggering an exception. This assumption is not always correct, imagine a failure in the sleep monitor caused by the wakeup event. The offending task is not current but the wakee. This is a bit tricky because reactors don't have access to that task, just to keep the same implementation between per-cpu and per-task monitors. We may want to revise that, perhaps like Nam is doing with the monitor_target thing [1]. Besides, I'm assuming this reactor is only meaningful for monitors written to validate userspace tasks (signals and TWA_RESUME are probably meaningless/dangerous for kernel threads). I'm fine with that but you should probably mention it here and/or in the reactor itself, since we have also monitors validating kernel behaviour (see the sched collection). [1] - https://lore.kernel.org/lkml/9a1b5a8c449fcb4f1a671016389c1e4fca49a351.1754900299.git.nam...@linutronix.de > diff --git a/kernel/trace/rv/Makefile b/kernel/trace/rv/Makefile > index > 42ff5d5aa9dde3ed2964e0b8d4ab7b236f498319..adf56bbd03ffa0d48de1f0d86ca5fcce1628 > bdba 100644 > --- a/kernel/trace/rv/Makefile > +++ b/kernel/trace/rv/Makefile > @@ -23,3 +23,4 @@ obj-$(CONFIG_RV_MON_OPID) += monitors/opid/opid.o > obj-$(CONFIG_RV_REACTORS) += rv_reactors.o > obj-$(CONFIG_RV_REACT_PRINTK) += reactor_printk.o > obj-$(CONFIG_RV_REACT_PANIC) += reactor_panic.o > +obj-$(CONFIG_RV_REACT_SIGNAL) += reactor_signal.o > diff --git a/kernel/trace/rv/reactor_signal.c > b/kernel/trace/rv/reactor_signal.c > new file mode 100644 > index > 0000000000000000000000000000000000000000..e123786d94371a240beb63b2d1b2f3044a46 > 6404 > --- /dev/null > +++ b/kernel/trace/rv/reactor_signal.c > @@ -0,0 +1,114 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2025 Thomas Weißschuh, Linutronix GmbH > + * > + * Signal RV reactor: > + * Prints the exception msg to the kernel message log and sends a signal to > the offending task. > + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > + > +#include <linux/cpumask.h> > +#include <linux/init.h> > +#include <linux/mempool.h> > +#include <linux/module.h> > +#include <linux/printk.h> > +#include <linux/rv.h> > +#include <linux/sched/signal.h> > +#include <linux/task_work.h> > + > +struct rv_signal_work { > + struct callback_head twork; > + int signal; > + char message[256]; > +}; > + > +static mempool_t *rv_signal_task_work_pool; > + > +static void rv_signal_force_sig(int signal, const char *message) > +{ > + /* The message already contains a subsystem prefix, so use raw > printk() */ > + printk(KERN_WARNING "%s", message); > + pr_warn("Killing PID %d with signal %d", task_pid_nr(current), > signal); > + force_sig(signal); > +} > + > +static void rv_signal_task_work(struct callback_head *cbh) > +{ > + struct rv_signal_work *work = container_of_const(cbh, struct > rv_signal_work, twork); > + > + rv_signal_force_sig(work->signal, work->message); > + > + mempool_free(work, rv_signal_task_work_pool); > +} > + > +static void rv_reaction_signal(int signal, const char *fmt, va_list args) > +{ > + struct rv_signal_work *work; > + char message[256]; > + > + work = mempool_alloc_preallocated(rv_signal_task_work_pool); > + if (!work) { > + pr_warn_ratelimited("Unable to signal through task_work, > sending directly\n"); > + vsnprintf(message, sizeof(message), fmt, args); > + rv_signal_force_sig(signal, message); > + return; > + } Why do you use the task_work at all instead of signalling directly? If that's something not safe from a (any) tracepoint because it can sleep you should definitely not call it if allocation fails. > + > + init_task_work(&work->twork, rv_signal_task_work); > + work->signal = signal; > + vsnprintf(work->message, sizeof(work->message), fmt, args); > + > + /* > + * The reactor can be called from any context through tracepoints. > + * To avoid any locking or other operations not usable from all > contexts, use TWA_RESUME. > + * The signal might be delayed, but that shouldn't be an issue. > + */ > + task_work_add(current, &work->twork, TWA_RESUME); > +} > + > +__printf(1, 2) > +static void rv_reaction_sigbus(const char *fmt, ...) > +{ > + va_list args; > + > + va_start(args, fmt); > + rv_reaction_signal(SIGBUS, fmt, args); > + va_end(args); > +} > + > +static struct rv_reactor rv_sigbus = { > + .name = "sigbus", > + .description = "Kill the current task with SIGBUS", > + .react = rv_reaction_sigbus, > +}; Let's be consistent and call this reactor "signal", you can use SIGBUS only in the description until/if we allow different signals. What do you think? Thanks, Gabriele
