On Fri, Jan 30, 2026 at 05:30:25PM +0000, Anirudh Rayabharam wrote:
> On Thu, Jan 29, 2026 at 11:09:46AM -0800, Stanislav Kinsburskii wrote:
> > On Thu, Jan 29, 2026 at 05:47:02PM +0000, Michael Kelley wrote:
> > > From: Stanislav Kinsburskii <[email protected]> Sent: Wednesday, January 21, 2026 2:36 PM
> > > > 
> > > > From: Andreea Pintilie <[email protected]>
> > > > 
> > > > Query the hypervisor for integrated scheduler support and use it if
> > > > configured.
> > > > 
> > > > Microsoft Hypervisor originally provided two schedulers: root and core.
> > > > The root scheduler allows the root partition to schedule guest vCPUs
> > > > across physical cores, supporting both time slicing and CPU affinity
> > > > (e.g., via cgroups). In contrast, the core scheduler delegates
> > > > vCPU-to-physical-core scheduling entirely to the hypervisor.
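
[ As a concrete illustration of what root-scheduler semantics make
  meaningful: CPU affinity applied to the userspace thread backing a vCPU
  is actually honored. A minimal sketch using the standard Linux affinity
  API, not mshv-specific; pin_vcpu_thread() and vcpu_tid are hypothetical:

	#define _GNU_SOURCE
	#include <sched.h>
	#include <sys/types.h>

	/* Pin the thread backing a vCPU to physical CPU 2.  Only
	 * meaningful under root/integrated scheduling; under the core
	 * scheduler the hypervisor places vCPUs itself. */
	static int pin_vcpu_thread(pid_t vcpu_tid)
	{
		cpu_set_t set;

		CPU_ZERO(&set);
		CPU_SET(2, &set);
		return sched_setaffinity(vcpu_tid, sizeof(set), &set);
	}
]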
> > > > 
> > > > Direct virtualization introduces a new privileged guest partition
> > > > type, L1 Virtual Host (L1VH), which can create child partitions from
> > > > its own resources. These child partitions are effectively siblings,
> > > > scheduled by the hypervisor's core scheduler. This prevents the L1VH
> > > > parent from setting affinity or time slicing for its own processes or
> > > > guest VPs. While cgroups, CFS, and cpuset controllers can still be
> > > > used, their effectiveness is unpredictable, as the core scheduler
> > > > swaps vCPUs according to its own logic (typically round-robin across
> > > > all allocated physical CPUs). As a result, the system may appear to
> > > > "steal" time from the L1VH and its children.
> > > > 
> > > > To address this, Microsoft Hypervisor introduces the integrated
> > > > scheduler. This allows an L1VH partition to schedule its own vCPUs and
> > > > those of its guests across its "physical" cores, effectively emulating
> > > > root scheduler behavior within the L1VH, while retaining core
> > > > scheduler behavior for the rest of the system.
> > > > 
> > > > The integrated scheduler is controlled by the root partition and gated
> > > > by the vmm_enable_integrated_scheduler capability bit. If set, the
> > > > hypervisor supports the integrated scheduler. The L1VH partition must
> > > > then check whether it is enabled by querying the corresponding extended
> > > > partition property. If this property is true, the L1VH partition must
> > > > use the root scheduler logic; otherwise, it must use the core scheduler
> > > > logic.
> > > > 
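
[ For reference, the discovery flow described above would look roughly
  like the sketch below. hv_call_get_partition_property_ex(), the
  vmm_enable_integrated_scheduler bit, and the property code are taken
  from this patch; the helper name, the u64 property payload, and the
  is_scheduler_integrated flag are made up for illustration:

	/* Sketch: decide which scheduler logic an L1VH partition uses. */
	static int mshv_l1vh_detect_scheduler(void)
	{
		u64 enabled = 0;
		int ret;

		/* Capability not advertised: the hypervisor has no
		 * integrated scheduler, so core scheduler logic applies. */
		if (!mshv_root.vmm_caps.vmm_enable_integrated_scheduler)
			return 0;

		/* Capability present: ask whether it is enabled for us. */
		ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
				HV_PARTITION_PROPERTY_INTEGRATED_SCHEDULER_ENABLED,
				0, &enabled, sizeof(enabled));
		if (ret)
			return ret;

		/* true: root scheduler logic; false: core scheduler logic. */
		is_scheduler_integrated = !!enabled;
		return 0;
	}
]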
> > > > Signed-off-by: Andreea Pintilie <[email protected]>
> > > > Signed-off-by: Stanislav Kinsburskii <[email protected]>
> > > > ---
> > > >  drivers/hv/mshv_root_main.c |   79 +++++++++++++++++++++++++++++--------------
> > > >  include/hyperv/hvhdk_mini.h |    6 +++
> > > >  2 files changed, 58 insertions(+), 27 deletions(-)
> > > > 

 <snip>

> > > > -root_sched_deinit:
> > > > -       root_scheduler_deinit();
> > > > -       return err;
> > > >  }
> > > > 
> > > > -static void mshv_init_vmm_caps(struct device *dev)
> > > > +static int mshv_init_vmm_caps(struct device *dev)
> > > >  {
> > > > -       /*
> > > > -        * This can only fail here if HVCALL_GET_PARTITION_PROPERTY_EX or
> > > > -        * HV_PARTITION_PROPERTY_VMM_CAPABILITIES are not supported. In that
> > > > -        * case it's valid to proceed as if all vmm_caps are disabled (zero).
> > > > -        */
> > > > -       if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > -                                             HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > -                                             0, &mshv_root.vmm_caps,
> > > > -                                             sizeof(mshv_root.vmm_caps)))
> > > > -               dev_warn(dev, "Unable to get VMM capabilities\n");
> > > > +       int ret;
> > > > +
> > > > +       ret = hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
> > > > +                                               HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
> > > > +                                               0, &mshv_root.vmm_caps,
> > > > +                                               sizeof(mshv_root.vmm_caps));
> > > > +       if (ret) {
> > > > +               dev_err(dev, "Failed to get VMM capabilities: %d\n", ret);
> > > > +               return ret;
> > > > +       }
> > > 
> > > This is a functional change that isn't mentioned in the commit message.
> > > Why is it now appropriate to fail instead of treating the VMM capabilities
> > > as all disabled? Presumably there are older versions of the hypervisor that
> > > don't support the requirements described in the original comment, but
> > > perhaps they are no longer relevant?
> > > 
> > 
> > Failing is now the only option for the L1VH partition. It must discover
> > the scheduler type; without this information, the partition cannot
> > operate. The core scheduler logic will not work with an integrated
> > scheduler, and vice versa.
> 
> I don't think we need to fail here. If we don't find vmm caps, that
> means we are on an older hypervisor that supports L1VH but not the
> integrated scheduler (yes, such a version exists). In that case, since
> the integrated scheduler is not supported by the hypervisor, the core
> scheduler logic will work.
> 

The older hypervisor version won't have the integrated scheduler
capability bit, and we can't operate in core scheduler mode if the
integrated scheduler is enabled underneath us.
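
To make the fallback under discussion concrete, it would amount to
something like the sketch below (names taken from the patch; sketch
only), and it is safe only if a failed VMM_CAPABILITIES query really
implies the integrated scheduler cannot be enabled for the partition:

	if (hv_call_get_partition_property_ex(HV_PARTITION_ID_SELF,
					      HV_PARTITION_PROPERTY_VMM_CAPABILITIES,
					      0, &mshv_root.vmm_caps,
					      sizeof(mshv_root.vmm_caps)))
		/* Older hypervisor: treat all vmm_caps as zero, which
		 * implies no integrated scheduler and hence core
		 * scheduler logic. */
		memset(&mshv_root.vmm_caps, 0, sizeof(mshv_root.vmm_caps));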

Thanks,
Stanislav


> Thanks,
> Anirudh.
> 
> > 
> > And yes, older hypervisor versions do not support L1VH.
> > 
> > Thanks,
> > Stanislav
> > 
> > > > 
> > > >         dev_dbg(dev, "vmm_caps = %#llx\n", mshv_root.vmm_caps.as_uint64[0]);
> > > > +
> > > > +       return 0;
> > > >  }
> > > > 
> > > >  static int __init mshv_parent_partition_init(void)
> > > > @@ -2292,6 +2310,10 @@ static int __init mshv_parent_partition_init(void)
> > > > 
> > > >         mshv_cpuhp_online = ret;
> > > > 
> > > > +       ret = mshv_init_vmm_caps(dev);
> > > > +       if (ret)
> > > > +               goto remove_cpu_state;
> > > > +
> > > >         ret = mshv_retrieve_scheduler_type(dev);
> > > >         if (ret)
> > > >                 goto remove_cpu_state;
> > > > @@ -2301,11 +2323,13 @@ static int __init mshv_parent_partition_init(void)
> > > >         if (ret)
> > > >                 goto remove_cpu_state;
> > > > 
> > > > -       mshv_init_vmm_caps(dev);
> > > > +       ret = root_scheduler_init(dev);
> > > > +       if (ret)
> > > > +               goto exit_partition;
> > > > 
> > > >         ret = mshv_irqfd_wq_init();
> > > >         if (ret)
> > > > -               goto exit_partition;
> > > > +               goto deinit_root_scheduler;
> > > > 
> > > >         spin_lock_init(&mshv_root.pt_ht_lock);
> > > >         hash_init(mshv_root.pt_htable);
> > > > @@ -2314,6 +2338,8 @@ static int __init mshv_parent_partition_init(void)
> > > > 
> > > >         return 0;
> > > > 
> > > > +deinit_root_scheduler:
> > > > +       root_scheduler_deinit();
> > > >  exit_partition:
> > > >         if (hv_root_partition())
> > > >                 mshv_root_partition_exit();
> > > > @@ -2332,6 +2358,7 @@ static void __exit mshv_parent_partition_exit(void)
> > > >         mshv_port_table_fini();
> > > >         misc_deregister(&mshv_dev);
> > > >         mshv_irqfd_wq_cleanup();
> > > > +       root_scheduler_deinit();
> > > >         if (hv_root_partition())
> > > >                 mshv_root_partition_exit();
> > > >         cpuhp_remove_state(mshv_cpuhp_online);
> > > > diff --git a/include/hyperv/hvhdk_mini.h b/include/hyperv/hvhdk_mini.h
> > > > index aa03616f965b..0f7178fa88a8 100644
> > > > --- a/include/hyperv/hvhdk_mini.h
> > > > +++ b/include/hyperv/hvhdk_mini.h
> > > > @@ -87,6 +87,9 @@ enum hv_partition_property_code {
> > > >         HV_PARTITION_PROPERTY_PRIVILEGE_FLAGS                   = 0x00010000,
> > > >         HV_PARTITION_PROPERTY_SYNTHETIC_PROC_FEATURES           = 0x00010001,
> > > > 
> > > > +       /* Integrated scheduling properties */
> > > > +       HV_PARTITION_PROPERTY_INTEGRATED_SCHEDULER_ENABLED      = 0x00020005,
> > > > +
> > > >         /* Resource properties */
> > > >         HV_PARTITION_PROPERTY_GPA_PAGE_ACCESS_TRACKING          = 0x00050005,
> > > >         HV_PARTITION_PROPERTY_UNIMPLEMENTED_MSR_ACTION          = 0x00050017,
> > > > @@ -102,7 +105,7 @@ enum hv_partition_property_code {
> > > >  };
> > > > 
> > > >  #define HV_PARTITION_VMM_CAPABILITIES_BANK_COUNT               1
> > > > -#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT  58
> > > > +#define HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT  57
> > > > 
> > > >  struct hv_partition_property_vmm_capabilities {
> > > >         u16 bank_count;
> > > > @@ -120,6 +123,7 @@ struct hv_partition_property_vmm_capabilities {
> > > >  #endif
> > > >                         u64 assignable_synthetic_proc_features: 1;
> > > >                         u64 tag_hv_message_from_child: 1;
> > > > +                       u64 vmm_enable_integrated_scheduler : 1;
> > > >                         u64 reserved0: HV_PARTITION_VMM_CAPABILITIES_RESERVED_BITFIELD_COUNT;
> > > >                 } __packed;
> > > >         };
> > > > 
> > > > 
> > > 
