On Wed, Nov 12, 2025 at 04:27:05PM +0000, Michael Kelley wrote:
> From: Nuno Das Neves <[email protected]> Sent: Tuesday, 
> November 11, 2025 3:20 PM
> > 
> > The existing mshv create partition ioctl does not provide a way to
> > specify which cpu features are enabled in the guest. Instead, it
> > attempts to enable all features and those that are not supported are
> > silently disabled by the hypervisor.
> > 
> > This was done to reduce unnecessary complexity and is sufficient for
> > many cases. However, new scenarios require fine-grained control over
> > these features.
> > 
> > Define a new mshv_create_partition_v2 structure which supports
> > passing the disabled processor and xsave feature bits through to the
> > create partition hypercall directly.
> > 
> > Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables
> > the new structure. If unset, the original mshv_create_partition struct
> > is used, with the old behavior of enabling all features.
> > 
> > Co-developed-by: Jinank Jain <[email protected]>
> > Signed-off-by: Jinank Jain <[email protected]>
> > Signed-off-by: Muminul Islam <[email protected]>
> > Signed-off-by: Nuno Das Neves <[email protected]>
> > ---
> > Changes in v4:
> > - Change BIT() to BIT_ULL() [Michael Kelley]
> > - Enforce pt_num_cpu_fbanks == MSHV_NUM_CPU_FEATURES_BANKS and expect
> >   that number to never change. In future, additional processor banks
> >   will be settable as 'early' partition properties. Remove redundant
> >   code that set default values for unspecified banks [Michael Kelley]
> > - Set xsave features to 0 on arm64 [Michael Kelley]
> > - Add clarifying comments in a few places
> > 
> > Changes in v3:
> > - Remove the new cpu features definitions in hvhdk.h, and retain the
> >   old behavior of enabling all features for the old struct. For the v2
> >   struct, still disable unspecified feature banks, since that makes it
> >   robust to future extensions.
> > - Amend comments and commit message to reflect the above
> > - Fix unused variable on arm64 [kernel test robot]
> > 
> > Changes in v2:
> > - Fix exposure of CONFIG_X86_64 to uapi [kernel test robot]
> > - Fix compilation issue on arm64 [kernel test robot]
> > ---
> >  drivers/hv/mshv_root_main.c | 113 +++++++++++++++++++++++++++++-------
> >  include/uapi/linux/mshv.h   |  34 +++++++++++
> >  2 files changed, 126 insertions(+), 21 deletions(-)
> > 
> > diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> > index d542a0143bb8..9f9438289b60 100644
> > --- a/drivers/hv/mshv_root_main.c
> > +++ b/drivers/hv/mshv_root_main.c
> > @@ -1900,43 +1900,114 @@ add_partition(struct mshv_partition *partition)
> >     return 0;
> >  }
> > 
> > -static long
> > -mshv_ioctl_create_partition(void __user *user_arg, struct device 
> > *module_dev)
> > +static_assert(MSHV_NUM_CPU_FEATURES_BANKS ==
> > +         HV_PARTITION_PROCESSOR_FEATURES_BANKS);
> > +
> > +static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 
> > *pt_flags,
> > +                                   struct hv_partition_creation_properties 
> > *cr_props,
> > +                                   union hv_partition_isolation_properties 
> > *isol_props)
> >  {
> > -   struct mshv_create_partition args;
> > -   u64 creation_flags;
> > -   struct hv_partition_creation_properties creation_properties = {};
> > -   union hv_partition_isolation_properties isolation_properties = {};
> > -   struct mshv_partition *partition;
> > -   struct file *file;
> > -   int fd;
> > -   long ret;
> > +   int i;
> > +   struct mshv_create_partition_v2 args;
> > +   union hv_partition_processor_features *disabled_procs;
> > +   union hv_partition_processor_xsave_features *disabled_xsave;
> > 
> > -   if (copy_from_user(&args, user_arg, sizeof(args)))
> > +   /* First, copy v1 struct in case user is on previous versions */
> > +   if (copy_from_user(&args, user_arg,
> > +                      sizeof(struct mshv_create_partition)))
> >             return -EFAULT;
> > 
> >     if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) ||
> >         args.pt_isolation >= MSHV_PT_ISOLATION_COUNT)
> >             return -EINVAL;
> > 
> > +   disabled_procs = &cr_props->disabled_processor_features;
> > +   disabled_xsave = &cr_props->disabled_processor_xsave_features;
> > +
> > +   /* Check if user provided newer struct with feature fields */
> > +   if (args.pt_flags & BIT_ULL(MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES)) {
> > +           if (copy_from_user(&args, user_arg, sizeof(args)))
> > +                   return -EFAULT;
> 
> There's subtle issue here that I didn't notice previously. This second 
> copy_from_user()
> re-populates the first two fields of the "args" local variable. These two 
> fields were
> validated by code a few lines above. But there's no guarantee that a second 
> read of
> user space will get the same values. User space could have another thread that
> changes the user space values between the two copy_from_user() calls, and 
> thereby
> sneak in some bogus values to be used further down in this function. Because 
> of
> this risk, there's a general rule for kernel code, which is to avoid multiple 
> accesses to
> the same user space values. There are places in the kernel where such double 
> reads
> would be an exploitable security hole.
> 
> The fix would be to validate the pt_flags and pt_isolation fields again, or 
> to have the
> second copy_from_user copy only the additional fields. But it's also the case 
> that the
> way the pt_flags and pt_isolation fields are used further down in this 
> function,
> nothing bad can happen if malicious user space should succeed in sneaking in 
> some
> bogus values.
> 
> Net, as currently coded, there's nothing that needs to be fixed. It would be 
> more
> robust to do one of the two fixes, if for no other reason than to acknowledge
> awareness of the risk of reading user space twice. But I'm not going to insist
> on a respin.

Nuno, I can commit this patch first. If you can post a diff later I can
squash it in.

        /* Re-validate fields after the second copy_from_user */
        if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) ||
            args.pt_isolation >= MSHV_PT_ISOLATION_COUNT)
                return -EINVAL;

Perhaps something like this after the second copy_from_user()?

>> Other than the double read of user space, LGTM.
> 
> Reviewed-by: Michael Kelley <[email protected]>

Thank you for the detailed review!

Wei

Reply via email to