Re: [PATCH] killswitch: add per-function short-circuit mitigation primitive

Sasha Levin Mon, 11 May 2026 10:22:03 -0700

On Mon, May 11, 2026 at 06:10:20PM +0200, Michal Hocko wrote:

On Mon 11-05-26 11:55:41, Sasha Levin wrote:

On Mon, May 11, 2026 at 04:25:57PM +0200, Michal Hocko wrote:
> On Mon 11-05-26 09:56:30, Sasha Levin wrote:
> > On Mon, May 11, 2026 at 03:49:24PM +0200, Michal Hocko wrote:
> > > On Mon 11-05-26 09:39:32, Sasha Levin wrote:
> > > > On Mon, May 11, 2026 at 03:07:51PM +0200, Michal Hocko wrote:
> > > > In a similar way to how they would know if a given livepatch is safe to apply -
> > > > ideally it would be communicated by the vendor/distro/kernel team.
> > >
> > > You have missed my point. KLP takes extra steps to make sure a
> > > particular function is safe to modify or to put the change into
> > > effect.
> >
> > Safety checks like making sure the patched function is on the stack, or did you
> > mean something else?
>
> Yes, exactly what LP infrastructure already provides.


But do we actually need it here?


If not, then you can simply use systemtap or BPF to inject the code. In
other words, we have several ways to modify the kernel at runtime, so
before yet another interface is provided (with a non-trivial amount of
code and very limited functionality) you should really start by
describing why none of the existing ones fits well.

I do understand your argument that solutions based on loading a module
might have an additional step to deal with (AFAIK you do not need to
build your own kernel to deploy your key). Is that a prohibitive
roadblock? We also have fault injection, which is much less convenient
because of all the existing constraints, but can those be alleviated?

So nothing really against playing with ideas and LLMs to generate a
quick PoC. That is all good, but for this to be considered more seriously
I think we really need to think deeper about whether the existing
infrastructure is really not fitting and, if not, whether it could be
changed to cover more use cases like the one you have mentioned here. I
believe that is something worth thinking about.


Could you describe an existing infrastructure I can use here? Let's look at
this recent "Copy Fail" thing as an example.

I can obviously build my own kernel and enroll my own key, but 99.9% of our
users won't be doing that.

Livepatching, or manually building a module that just injects a kprobe, is out
of the question, as we previously agreed.

systemtap falls into the same bucket as building my own module.

BPF doesn't help because bpf_override_return() requires the target to be on the
same within_error_injection_list() whitelist as fault injection, and the CVE
targets never are. Some of our fleet doesn't even have BPF enabled either, but
that's the smaller objection.

I can't use fault injection because:

 a. It's almost never built in production/distro kernels, and I suspect this
    won't change.
 b. The functions I need are not whitelisted.
 c. Even if (a) and (b) were addressed, fault injection would still need a
    securityfs front-end, a cmdline parser, a module-unload notifier, a taint
    flag, and audit on engage and disengage. By the time those land in
    fail_function and tie into/refactor the fault injection code, the net diff
    is bigger than this proposal.
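
For anyone who wants to check (b) on their own kernel, here is a rough
userspace sketch that scans the error-injection whitelist and reports whether
a given function is on it. It assumes a kernel built with
CONFIG_FUNCTION_ERROR_INJECTION, debugfs mounted at /sys/kernel/debug, and a
"symbol<TAB>type" line format (possibly with a " [module]" suffix after the
symbol). The helper names are made up for illustration and are not part of any
existing tool.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed location of the whitelist; needs debugfs and error injection built in. */
#define EI_LIST_PATH "/sys/kernel/debug/error_injection/list"

/*
 * Return true if `func` is the symbol at the start of any line in `list`.
 * Lines are assumed to look like "symbol\tTYPE" or "symbol [module]\tTYPE".
 */
bool ei_list_contains(FILE *list, const char *func)
{
	char line[512];
	size_t len = strlen(func);

	while (fgets(line, sizeof(line), list)) {
		char next;

		if (strncmp(line, func, len) != 0)
			continue;
		/* Require a full-symbol match, not just a shared prefix. */
		next = line[len];
		if (next == '\t' || next == ' ' || next == '\n' || next == '\0')
			return true;
	}
	return false;
}

/* Returns 1 if whitelisted, 0 if not, -1 if the list cannot be opened. */
int ei_func_is_injectable(const char *func)
{
	FILE *list = fopen(EI_LIST_PATH, "r");
	int ret;

	if (!list)
		return -1;
	ret = ei_list_contains(list, func) ? 1 : 0;
	fclose(list);
	return ret;
}
```

A return of -1 from ei_func_is_injectable() typically means the list is not
there at all (or is not readable without root), which is the situation
described in (a); a return of 0 for the function one wants to short-circuit is
the situation described in (b).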

In my case I can remove the module, but not if I run a distro that shipped with
CONFIG_CRYPTO_USER_API_AEAD=y (like RHEL/SUSE).

I can use the "initcall_blacklist=" hack and reboot, but as things stand today,
I'd need to reboot a few times a day.

Even if I'm okay with rebooting that often (and I really really would prefer
not to), this doesn't solve the issues of a larger fleet of servers that can't
just reboot that often.

What am I missing?

--
Thanks,
Sasha
