On Fri, Jan 30, 2026 at 01:24:34AM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <[email protected]> Sent:
> Thursday, January 29, 2026 11:10 AM
> >
> > On Thu, Jan 29, 2026 at 05:47:02PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <[email protected]> Sent:
> > > Wednesday, January 21, 2026 2:36 PM
> >
> > <snip>
> >
> > > >  static int __init mshv_root_partition_init(struct device *dev)
> > > >  {
> > > >  	int err;
> > > >
> > > > -	err = root_scheduler_init(dev);
> > > > -	if (err)
> > > > -		return err;
> > > > -
> > > >  	err = register_reboot_notifier(&mshv_reboot_nb);
> > > >  	if (err)
> > > > -		goto root_sched_deinit;
> > > > +		return err;
> > > >
> > > >  	return 0;
> > >
> > > This code is now:
> > >
> > >         if (err)
> > >                 return err;
> > >         return 0;
> > >
> > > which can be simplified to just:
> > >
> > > return err;
> > >
> > > Or drop the local variable 'err' and simplify the entire function to:
> > >
> > > return register_reboot_notifier(&mshv_reboot_nb);
> > >
> > > There's a tangential question here: Why is this reboot notifier
> > > needed in the first place? All it does is remove the cpuhp state
> > > that allocates/frees the per-cpu root_scheduler_input and
> > > root_scheduler_output pages. Removing the state will free
> > > the pages, but if Linux is rebooting, why bother?
> > >
> >
> > This was originally done to support kexec.
> > Here is the original commit message:
> >
> > mshv: perform synic cleanup during kexec
> >
> > Register a reboot notifier that performs synic cleanup when a kexec
> > is in progress.
> >
> > One notable issue this commit fixes is one where, after a kexec, virtio
> > devices are not functional. The Linux root partition receives MMIO
> > doorbell events in the ring buffer in the SIRB synic page. The
> > hypervisor maintains a head pointer where it writes new events into the
> > ring buffer. The root partition maintains a tail pointer to read events
> > from the buffer.
> >
> > Upon kexec reboot, all root data structures are re-initialized, and thus
> > the tail pointer gets reset to zero. The hypervisor, on the other hand,
> > still retains the pre-kexec head pointer, which could be non-zero. This
> > means that when the hypervisor writes new events to the ring buffer, the
> > root partition looks at the wrong place and doesn't find any events, so
> > future doorbell events never get delivered. As a result, virtqueue kicks
> > never get delivered to the host.
> >
> > When the SIRB page is disabled, the hypervisor resets the head pointer.
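> >
> > To make the failure mode concrete, here is a minimal sketch of the
> > producer/consumer state involved (the layout and names below are
> > hypothetical, not the actual SIRB format):
> >
> > 	struct sirb_ring {
> > 		u32 head;	/* write index, owned by the hypervisor */
> > 		u32 tail;	/* read index, owned by the root partition */
> > 		u8  events[];	/* ring of doorbell events */
> > 	};
> >
> > 	/* Pre-kexec:  head == tail == N      (producer and consumer in sync)
> > 	 * Post-kexec: head == N, tail == 0   (root state re-initialized)
> > 	 *
> > 	 * New events are produced at slot N onward, but the rebooted root
> > 	 * consumes from slot 0 and never observes them. Disabling the SIRB
> > 	 * page makes the hypervisor reset head to 0, restoring sync.
> > 	 */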
>
> FWIW, I don't see that commit message anywhere in a public source code
> tree. The calls to register/unregister_reboot_notifier() were in the original
> introduction of mshv_root_main.c in upstream commit 621191d709b14.
> Evidently the code described by that commit message was not submitted
> upstream. And of course, the kexec() topic is now being revisited ....
>
> So to clarify: Do you expect that in the future the reboot notifier will be
> used for something that really is required for resetting hypervisor state
> in the case of a kexec reboot?
>
Yes, for now it's the best we have.
This code can be dropped later if we get a better way to handle kexec.
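For reference, the shape being discussed is roughly the following (a
sketch only; mshv_synic_cleanup() is a hypothetical stand-in for the
actual cleanup work, while kexec_in_progress is the generic kernel flag):

	static int mshv_reboot_notify(struct notifier_block *nb,
				      unsigned long action, void *data)
	{
		/* A cold reboot tears everything down anyway; only a kexec
		 * reboot leaves stale hypervisor state behind for the next
		 * kernel to inherit.
		 */
		if (kexec_in_progress)
			mshv_synic_cleanup();	/* hypothetical helper */

		return NOTIFY_DONE;
	}

	static struct notifier_block mshv_reboot_nb = {
		.notifier_call = mshv_reboot_notify,
	};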
> >
> > > > -root_sched_deinit:
> > > > -	root_scheduler_deinit();
> > > > -	return err;
> > > >  }
> > > >
> > > > -static void mshv_init_vmm_caps(struct device *dev)
> > > > +static int mshv_init_vmm_caps(struct device *dev)
> > > >  {
> > > > -	/*
> > > > -	 * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
> > > > -	 * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
> > > > -	 * case it's valid to proceed as if all vmm_caps are disabled (zero).
> > > > -	 */
> > > > -	if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > -					      HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > -					      0, &mshv_root.vmm_caps,
> > > > -					      sizeof(mshv_root.vmm_caps)))
> > > > -		dev_warn(dev, "Unable to get VMM capabilities\n");
> > > > +	int ret;
> > > > +
> > > > +	ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > +						HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > +						0, &mshv_root.vmm_caps,
> > > > +						sizeof(mshv_root.vmm_caps));
> > > > +	if (ret) {
> > > > +		dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
> > > > +		return ret;
> > > > +	}
> > >
> > > This is a functional change that isn't mentioned in the commit
> > > message. Why is it now appropriate to fail instead of treating the
> > > VMM capabilities as all disabled? Presumably there are older versions
> > > of the hypervisor that don't support the requirements described in
> > > the original comment, but perhaps they are no longer relevant?
> > >
> >
> > Failing is now the only option for the L1VH partition. It must
> > discover the scheduler type; without this information, the partition
> > cannot operate. The core scheduler logic will not work with an
> > integrated scheduler, and vice versa.
> >
> > And yes, older hypervisor versions do not support L1VH.
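> >
> > Roughly, the intent is the following (a sketch; the capability field
> > and the integrated-scheduler init helper are hypothetical names):
> >
> > 	ret = mshv_init_vmm_caps(dev);
> > 	if (ret)
> > 		return ret;	/* L1VH cannot guess the scheduler type */
> >
> > 	if (mshv_root.vmm_caps.integrated_scheduler)	/* hypothetical field */
> > 		ret = integrated_scheduler_init(dev);	/* hypothetical helper */
> > 	else
> > 		ret = root_scheduler_init(dev);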
>
> That makes sense. Your change in v2 of the patch handles this
> nicely. For the non-L1VH case, the v2 behavior is the same as before in
> that the init path won't error out on older hypervisors that don't
> support the requirements described in the original comment. That's
> the case I am concerned about.
>
Yes. Thank you for the review and feedback!
Stanislav
> Michael