On Fri, Jan 30, 2026 at 01:24:34AM +0000, Michael Kelley wrote:
> From: Stanislav Kinsburskii <[email protected]> Sent: Thursday, January 29, 2026 11:10 AM
> > 
> > On Thu, Jan 29, 2026 at 05:47:02PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <[email protected]> Sent: Wednesday, January 21, 2026 2:36 PM
> > 
> > <snip>
> > 
> > > >  static int __init mshv_root_partition_init(struct device *dev)
> > > >  {
> > > >         int err;
> > > >
> > > > -       err = root_scheduler_init(dev);
> > > > -       if (err)
> > > > -               return err;
> > > > -
> > > >         err = register_reboot_notifier(&mshv_reboot_nb);
> > > >         if (err)
> > > > -               goto root_sched_deinit;
> > > > +               return err;
> > > >
> > > >         return 0;
> > >
> > > This code is now:
> > >
> > >   if (err)
> > >           return err;
> > >   return 0;
> > >
> > > which can be simplified to just:
> > >
> > >   return err;
> > >
> > > Or drop the local variable 'err' and simplify the entire function to:
> > >
> > >   return register_reboot_notifier(&mshv_reboot_nb);
> > >
> > > There's a tangential question here: Why is this reboot notifier
> > > needed in the first place? All it does is remove the cpuhp state
> > > that allocates/frees the per-cpu root_scheduler_input and
> > > root_scheduler_output pages. Removing the state will free
> > > the pages, but if Linux is rebooting, why bother?
> > >
> > 
> > This was originally done to support kexec.
> > Here is the original commit message:
> > 
> >     mshv: perform synic cleanup during kexec
> > 
> >     Register a reboot notifier that performs synic cleanup when a kexec
> >     is in progress.
> > 
> >     One notable issue this commit fixes is one where after a kexec, virtio
> >     devices are not functional. Linux root partition receives MMIO doorbell
> >     events in the ring buffer in the SIRB synic page. The hypervisor maintains
> >     a head pointer where it writes new events into the ring buffer. The root
> >     partition maintains a tail pointer to read events from the buffer.
> > 
> >     Upon kexec reboot, all root data structures are re-initialized and thus the
> >     tail pointer gets reset to zero. The hypervisor on the other hand still
> >     retains the pre-kexec head pointer which could be non-zero. This means that
> >     when the hypervisor writes new events to the ring buffer, the root
> >     partition looks at the wrong place and doesn't find any events. So, future
> >     doorbell events never get delivered. As a result, virtqueue kicks never get
> >     delivered to the host.
> > 
> >     When the SIRB page is disabled the hypervisor resets the head pointer.
> 
> FWIW, I don't see that commit message anywhere in a public source code
> tree. The calls to register/unregister_reboot_notifier() were in the original
> introduction of mshv_root_main.c in upstream commit 621191d709b14.
> Evidently the code described by that commit message was not submitted
> upstream. And of course, the kexec() topic is now being revisited ....
> 
> So to clarify: Do you expect that in the future the reboot notifier will be
> used for something that really is required for resetting hypervisor state
> in the case of a kexec reboot?
> 

Yes, for now it's the best we have.
This code can be dropped later if we get a better way to handle kexec.

> > 
> > > > -root_sched_deinit:
> > > > -       root_scheduler_deinit();
> > > > -       return err;
> > > >  }
> > > >
> > > > -static void mshv_init_vmm_caps(struct device *dev)
> > > > +static int mshv_init_vmm_caps(struct device *dev)
> > > >  {
> > > > -       /*
> > > > -        * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
> > > > -        * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
> > > > -        * case it's valid to proceed as if all vmm_caps are disabled (zero).
> > > > -        */
> > > > -       if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > -                                             HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > -                                             0, &mshv_root.vmm_caps,
> > > > -                                             sizeof(mshv_root.vmm_caps)))
> > > > -               dev_warn(dev, "Unable to get VMM capabilities\n");
> > > > +       int ret;
> > > > +
> > > > +       ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > +                                               HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > +                                               0, &mshv_root.vmm_caps,
> > > > +                                               sizeof(mshv_root.vmm_caps));
> > > > +       if (ret) {
> > > > +               dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
> > > > +               return ret;
> > > > +       }
> > >
> > > This is a functional change that isn't mentioned in the commit message.
> > > Why is it now appropriate to fail instead of treating the VMM capabilities
> > > as all disabled? Presumably there are older versions of the hypervisor 
> > > that
> > > don't support the requirements described in the original comment, but
> > > perhaps they are no longer relevant?
> > >
> > 
> > Failing is now the only option for the L1VH partition: it must discover
> > the scheduler type, and without this information the partition cannot
> > operate. The core scheduler logic will not work with an integrated
> > scheduler, and vice versa.
> > 
> > And yes, older hypervisor versions do not support L1VH.
> 
> That makes sense. Your change in v2 of the patch handles this
> nicely. For the non-L1VH case, the v2 behavior is the same as before in
> that the init path won't error out on older hypervisors that don't
> support the requirements described in the original comment. That's
> the case I am concerned about.
> 

Yes. Thank you for the review and feedback!

Stanislav
> Michael
