From: Mukesh R <[email protected]> Sent: Tuesday, January 27, 2026 11:56 AM
> To: Stanislav Kinsburskii <[email protected]>
> Cc: [email protected]; [email protected]; [email protected];
> [email protected]; [email protected]; [email protected];
> linux-
> [email protected]
> Subject: Re: [PATCH] mshv: Make MSHV mutually exclusive with KEXEC
>
> On 1/27/26 09:47, Stanislav Kinsburskii wrote:
> > On Mon, Jan 26, 2026 at 05:39:49PM -0800, Mukesh R wrote:
> >> On 1/26/26 16:21, Stanislav Kinsburskii wrote:
> >>> On Mon, Jan 26, 2026 at 03:07:18PM -0800, Mukesh R wrote:
> >>>> On 1/26/26 12:43, Stanislav Kinsburskii wrote:
> >>>>> On Mon, Jan 26, 2026 at 12:20:09PM -0800, Mukesh R wrote:
> >>>>>> On 1/25/26 14:39, Stanislav Kinsburskii wrote:
> >>>>>>> On Fri, Jan 23, 2026 at 04:16:33PM -0800, Mukesh R wrote:
> >>>>>>>> On 1/23/26 14:20, Stanislav Kinsburskii wrote:
> >>>>>>>>> The MSHV driver deposits kernel-allocated pages to the hypervisor during
> >>>>>>>>> runtime and never withdraws them. This creates a fundamental incompatibility
> >>>>>>>>> with KEXEC, as these deposited pages remain unavailable to the new kernel
> >>>>>>>>> loaded via KEXEC, leading to potential system crashes upon kernel accessing
> >>>>>>>>> hypervisor deposited pages.
> >>>>>>>>>
> >>>>>>>>> Make MSHV mutually exclusive with KEXEC until proper page lifecycle
> >>>>>>>>> management is implemented.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Stanislav Kinsburskii <[email protected]>
> >>>>>>>>> ---
> >>>>>>>>>  drivers/hv/Kconfig | 1 +
> >>>>>>>>>  1 file changed, 1 insertion(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
> >>>>>>>>> index 7937ac0cbd0f..cfd4501db0fa 100644
> >>>>>>>>> --- a/drivers/hv/Kconfig
> >>>>>>>>> +++ b/drivers/hv/Kconfig
> >>>>>>>>> @@ -74,6 +74,7 @@ config MSHV_ROOT
> >>>>>>>>>         # e.g. When withdrawing memory, the hypervisor gives back 4k pages in
> >>>>>>>>>         # no particular order, making it impossible to reassemble larger pages
> >>>>>>>>>         depends on PAGE_SIZE_4KB
> >>>>>>>>> +       depends on !KEXEC
> >>>>>>>>>         select EVENTFD
> >>>>>>>>>         select VIRT_XFER_TO_GUEST_WORK
> >>>>>>>>>         select HMM_MIRROR
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Will this affect CRASH kexec? I see few CONFIG_CRASH_DUMP in kexec.c
> >>>>>>>> implying that crash dump might be involved. Or did you test kdump
> >>>>>>>> and it was fine?
> >>>>>>>>
> >>>>>>>
> >>>>>>> Yes, it will. Crash kexec depends on normal kexec functionality, so it
> >>>>>>> will be affected as well.
> >>>>>>
> >>>>>> So not sure I understand the reason for this patch. We can just block
> >>>>>> kexec if there are any VMs running, right? Doing this would mean any
> >>>>>> further development would be without a very important and major feature,
> >>>>>> right?
> >>>>>
> >>>>> This is an option. But until it's implemented and merged, a user mshv
> >>>>> driver gets into a situation where kexec is broken in a non-obvious way.
> >>>>> The system may crash at any time after kexec, depending on whether the
> >>>>> new kernel touches the pages deposited to hypervisor or not. This is a
> >>>>> bad user experience.
> >>>>
> >>>> I understand that. But with this we cannot collect core and debug any
> >>>> crashes. I was thinking there would be a quick way to prohibit kexec
> >>>> for update via notifier or some other quick hack. Did you already
> >>>> explore that and didn't find anything, hence this?
> >>>>
> >>>
> >>> This quick hack you mention isn't quick in the upstream kernel as there
> >>> is no hook to interrupt kexec process except the live update one.
> >>
> >> That's the one we want to interrupt and block right? crash kexec
> >> is ok and should be allowed. We can document we don't support kexec
> >> for update for now.
> >>
> >>> I sent an RFC for that one but given today's conversation details it
> >>> won't be accepted as is.
> >>
> >> Are you talking about this?
> >>
> >>         "mshv: Add kexec safety for deposited pages"
> >>
> >
> > Yes.
> >
> >>> Making mshv mutually exclusive with kexec is the only viable option for
> >>> now given time constraints.
> >>> It is intended to be replaced with proper page lifecycle management in
> >>> the future.
> >>
> >> Yeah, that could take a long time and imo we cannot just disable KEXEC
> >> completely. What we want is just block kexec for updates from some
> >> mshv file for now, we can print during boot that kexec for updates is
> >> not supported on mshv. Hope that makes sense.
> >>
> >
> > The trade-off here is between disabling kexec support and having the
> > kernel crash after kexec in a non-obvious way. This affects both regular
> > kexec and crash kexec.
>
> crash kexec on baremetal is not affected, hence disabling that
> doesn't make sense as we can't debug crashes then on bm.
>
> Let me think and explore a bit, and if I come up with something, I'll
> send a patch here. If nothing, then we can do this as last resort.
>
> Thanks,
> -Mukesh

Maybe you've already looked at this, but there's a sysctl parameter
kernel.kexec_load_limit_reboot that prevents loading a kexec kernel
for reboot if the value is zero. Separately, there is
kernel.kexec_load_limit_panic that controls whether a kexec kernel
can be loaded for kdump purposes.

kernel.kexec_load_limit_reboot defaults to -1, which allows a kexec
kernel to be loaded for reboot an unlimited number of times. But the
value can be set to zero with this kernel boot line parameter:

    sysctl.kernel.kexec_load_limit_reboot=0

Alternatively, the mshv driver initialization could add code along
the lines of process_sysctl_arg() to open
/proc/sys/kernel/kexec_load_limit_reboot and write a value of zero.
Then there's no dependency on setting the kernel boot line.

The downside to either method is that after Linux in the root
partition is up-and-running, it is possible to change the sysctl to
a non-zero value, and then load a kexec kernel for reboot. So this
approach isn't absolute protection against doing a kexec for reboot.
But it makes it harder, and until there's a mechanism to reclaim the
deposited pages, it might be a viable compromise to allow kdump to
still be used.

Just a thought ....

Michael

> >
> > It's a pity we can't apply a quick hack to disable only regular kexec.
> > However, since crash kexec would hit the same issues, until we have a
> > proper state transition for deposited pages, the best workaround for now
> > is to reset the hypervisor state on every kexec, which needs design,
> > work, and testing.
> >
> > Disabling kexec is the only consistent way to handle this in the
> > upstream kernel at the moment.
> >
> > Thanks,
> > Stanislav
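
For illustration only, here is a rough userspace sketch of the write
described above (setting kernel.kexec_load_limit_reboot to zero through
/proc/sys). It is not the in-kernel, process_sysctl_arg()-style code the
driver would need, and the helper names write_sysctl_value() and
block_kexec_reboot_load() are made up for this example. It assumes a
kernel that exposes the sysctl and a caller with permission to write it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a string to an existing sysctl file.
 * Returns 0 on success or a negative errno value on failure.
 */
int write_sysctl_value(const char *path, const char *value)
{
	size_t len = strlen(value);
	ssize_t written;
	int err = 0;
	int fd;

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, value, len);
	if (written < 0)
		err = -errno;
	else if ((size_t)written != len)
		err = -EIO;	/* treat a short write as a failure */

	if (close(fd) < 0 && err == 0)
		err = -errno;

	return err;
}

/*
 * Hypothetical wrapper: ask the kernel to refuse further kexec loads
 * for reboot. kernel.kexec_load_limit_panic is not touched, so loading
 * a kdump kernel is not restricted by this call.
 */
int block_kexec_reboot_load(void)
{
	return write_sysctl_value("/proc/sys/kernel/kexec_load_limit_reboot",
				  "0\n");
}
```

Calling block_kexec_reboot_load() early in boot (for example from an
init script) would have roughly the same effect as the boot line
parameter, with the same caveats as above.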
