On 1/28/26 07:53, Michael Kelley wrote:
From: Mukesh R <[email protected]> Sent: Tuesday, January 27, 2026 11:56 AM
To: Stanislav Kinsburskii <[email protected]>
Cc: [email protected]; [email protected]; [email protected];
[email protected]; [email protected]; [email protected]; linux-
[email protected]
Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC

On 1/27/26 09:47, Stanislav Kinsburskii wrote:
On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
On 1/26/26 16:21, Stanislav Kinsburskii wrote:
On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
On 1/26/26 12:43, Stanislav Kinsburskii wrote:
On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
On 1/25/26 14:39, Stanislav Kinsburskii wrote:
On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
On 1/23/26 14:20, Stanislav Kinsburskii wrote:
The MSHV driver deposits kernel-allocated pages to the hypervisor during
runtime and never withdraws them. This creates a fundamental incompatibility
with KEXEC, as these deposited pages remain unavailable to the new kernel
loaded via KEXEC, leading to potential system crashes upon kernel accessing
hypervisor deposited pages.

Make MSHV mutually exclusive with KEXEC until proper page lifecycle
management is implemented.

Signed-off-by: Stanislav Kinsburskii <[email protected]>
---
       drivers/hv/Kconfig |    1 +
       1 file changed, 1 insertion(+)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index 7937ac0cbd0f..cfd4501db0fa 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -74,6 +74,7 @@ config MSHV_ROOT
        # e.g. When withdrawing memory, the hypervisor gives back 4k pages in
        # no particular order, making it impossible to reassemble larger pages
        depends on PAGE_SIZE_4KB
+       depends on !KEXEC
        select EVENTFD
        select VIRT_XFER_TO_GUEST_WORK
        select HMM_MIRROR



Will this affect CRASH kexec? I see a few CONFIG_CRASH_DUMP in kexec.c
implying that crash dump might be involved. Or did you test kdump
and it was fine?


Yes, it will. Crash kexec depends on normal kexec functionality, so it
will be affected as well.

So not sure I understand the reason for this patch. We can just block
kexec if there are any VMs running, right? Doing this would mean any
further development would be without a very important and major feature,
right?

This is an option. But until it's implemented and merged, a user of the
mshv driver gets into a situation where kexec is broken in a non-obvious way.
The system may crash at any time after kexec, depending on whether the
new kernel touches the pages deposited to hypervisor or not. This is a
bad user experience.

I understand that. But with this we cannot collect core and debug any
crashes. I was thinking there would be a quick way to prohibit kexec
for update via notifier or some other quick hack. Did you already
explore that and didn't find anything, hence this?


This quick hack you mention isn't quick in the upstream kernel as there
is no hook to interrupt the kexec process except the live update one.

That's the one we want to interrupt and block right? crash kexec
is ok and should be allowed. We can document we don't support kexec
for update for now.

I sent an RFC for that one but given today's conversation details it
won't be accepted as is.

Are you talking about this?

          "mshv: Add kexec safety for deposited pages"


Yes.

Making mshv mutually exclusive with kexec is the only viable option for
now given time constraints.
It is intended to be replaced with proper page lifecycle management in
the future.

Yeah, that could take a long time and imo we cannot just disable KEXEC
completely. What we want is just block kexec for updates from some
mshv file for now; we can print during boot that kexec for updates is
not supported on mshv. Hope that makes sense.


The trade-off here is between disabling kexec support and having the
kernel crash after kexec in a non-obvious way. This affects both regular
kexec and crash kexec.

crash kexec on baremetal is not affected, hence disabling that
doesn't make sense as we can't debug crashes then on bm.

Let me think and explore a bit, and if I come up with something, I'll
send a patch here. If nothing, then we can do this as last resort.

Thanks,
-Mukesh

Maybe you've already looked at this, but there's a sysctl parameter
kernel.kexec_load_limit_reboot that prevents loading a kexec
kernel for reboot if the value is zero. Separately, there is
kernel.kexec_load_limit_panic that controls whether a kexec
kernel can be loaded for kdump purposes.

kernel.kexec_load_limit_reboot defaults to -1, which allows a kexec
kernel to be loaded for reboot an unlimited number of times. But the value
can be set to zero with this kernel boot line parameter:

sysctl.kernel.kexec_load_limit_reboot=0

Alternatively, the mshv driver initialization could add code along
the lines of process_sysctl_arg() to open
/proc/sys/kernel/kexec_load_limit_reboot and write a value of zero.
Then there's no dependency on setting the kernel boot line.

The downside to either method is that after Linux in the root partition
is up-and-running, it is possible to change the sysctl to a non-zero value,
and then load a kexec kernel for reboot. So this approach isn't absolute
protection against doing a kexec for reboot. But it makes it harder, and
until there's a mechanism to reclaim the deposited pages, it might be
a viable compromise to allow kdump to still be used.
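
For illustration only, the sysctl write described above could look roughly
like the following from user space. This is a minimal sketch, not the
in-kernel process_sysctl_arg()-style code suggested for the mshv driver.
The helper names write_sysctl_long() and disable_kexec_reboot_load() are
hypothetical; only the /proc path and the value zero come from the
discussion.

```c
#include <errno.h>
#include <stdio.h>

/* Path named in the thread; everything else here is an assumption. */
#define KEXEC_LOAD_LIMIT_REBOOT_PATH "/proc/sys/kernel/kexec_load_limit_reboot"

/*
 * Hypothetical helper: write a decimal value to a sysctl-style file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_sysctl_long(const char *path, long value)
{
	FILE *f = fopen(path, "w");
	int err = 0;

	if (!f)
		return -errno;

	if (fprintf(f, "%ld\n", value) < 0)
		err = -EIO;

	/* fclose() flushes the buffer, so a rejected write shows up here. */
	if (fclose(f) != 0 && err == 0)
		err = -errno;

	return err;
}

/* Hypothetical wrapper: set the reboot kexec load limit to zero. */
int disable_kexec_reboot_load(void)
{
	return write_sysctl_long(KEXEC_LOAD_LIMIT_REBOOT_PATH, 0);
}
```

Calling disable_kexec_reboot_load() early in boot (as root) would be the
user-space counterpart of the boot line parameter; it leaves
kernel.kexec_load_limit_panic untouched, so loading a kdump kernel is not
restricted by it.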

Mmm...eee...weelll... i think i see a much easier way to do this by
just hijacking __kexec_lock. I will resume my normal work tmrw/Fri,
so let me test it out. if it works, will send patch Monday.

Thanks,
-Mukesh



Just a thought ....

Michael



It's a pity we can't apply a quick hack to disable only regular kexec.
However, since crash kexec would hit the same issues, until we have a
proper state transition for deposited pages, the best workaround for now
is to reset the hypervisor state on every kexec, which needs design,
work, and testing.

Disabling kexec is the only consistent way to handle this in the
upstream kernel at the moment.

Thanks, Stanislav

