On Thu, Nov 13, 2025 at 11:11:57AM -0800, Nuno Das Neves wrote:
> On 11/13/2025 10:47 AM, Wei Liu wrote:
> > On Wed, Nov 12, 2025 at 04:27:05PM +0000, Michael Kelley wrote:
> >> From: Nuno Das Neves <[email protected]> Sent: Tuesday, 
> >> November 11, 2025 3:20 PM
> >>>
> >>> The existing mshv create partition ioctl does not provide a way to
> >>> specify which cpu features are enabled in the guest. Instead, it
> >>> attempts to enable all features and those that are not supported are
> >>> silently disabled by the hypervisor.
> >>>
> >>> This was done to reduce unnecessary complexity and is sufficient for
> >>> many cases. However, new scenarios require fine-grained control over
> >>> these features.
> >>>
> >>> Define a new mshv_create_partition_v2 structure which supports
> >>> passing the disabled processor and xsave feature bits through to the
> >>> create partition hypercall directly.
> >>>
> >>> Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables
> >>> the new structure. If unset, the original mshv_create_partition struct
> >>> is used, with the old behavior of enabling all features.
> >>>
> >>> Co-developed-by: Jinank Jain <[email protected]>
> >>> Signed-off-by: Jinank Jain <[email protected]>
> >>> Signed-off-by: Muminul Islam <[email protected]>
> >>> Signed-off-by: Nuno Das Neves <[email protected]>
> >>> ---
> >>> Changes in v4:
> >>> - Change BIT() to BIT_ULL() [Michael Kelley]
> >>> - Enforce pt_num_cpu_fbanks == MSHV_NUM_CPU_FEATURES_BANKS and expect
> >>>   that number to never change. In future, additional processor banks
> >>>   will be settable as 'early' partition properties. Remove redundant
> >>>   code that set default values for unspecified banks [Michael Kelley]
> >>> - Set xsave features to 0 on arm64 [Michael Kelley]
> >>> - Add clarifying comments in a few places
> >>>
> >>> Changes in v3:
> >>> - Remove the new cpu features definitions in hvhdk.h, and retain the
> >>>   old behavior of enabling all features for the old struct. For the v2
> >>>   struct, still disable unspecified feature banks, since that makes it
> >>>   robust to future extensions.
> >>> - Amend comments and commit message to reflect the above
> >>> - Fix unused variable on arm64 [kernel test robot]
> >>>
> >>> Changes in v2:
> >>> - Fix exposure of CONFIG_X86_64 to uapi [kernel test robot]
> >>> - Fix compilation issue on arm64 [kernel test robot]
> >>> ---
> >>>  drivers/hv/mshv_root_main.c | 113 +++++++++++++++++++++++++++++-------
> >>>  include/uapi/linux/mshv.h   |  34 +++++++++++
> >>>  2 files changed, 126 insertions(+), 21 deletions(-)
> >>>
> >>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c
> >>> index d542a0143bb8..9f9438289b60 100644
> >>> --- a/drivers/hv/mshv_root_main.c
> >>> +++ b/drivers/hv/mshv_root_main.c
> >>> @@ -1900,43 +1900,114 @@ add_partition(struct mshv_partition *partition)
> >>>   return 0;
> >>>  }
> >>>
> >>> -static long
> >>> -mshv_ioctl_create_partition(void __user *user_arg, struct device 
> >>> *module_dev)
> >>> +static_assert(MSHV_NUM_CPU_FEATURES_BANKS ==
> >>> +       HV_PARTITION_PROCESSOR_FEATURES_BANKS);
> >>> +
> >>> +static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 
> >>> *pt_flags,
> >>> +                                 struct hv_partition_creation_properties 
> >>> *cr_props,
> >>> +                                 union hv_partition_isolation_properties 
> >>> *isol_props)
> >>>  {
> >>> - struct mshv_create_partition args;
> >>> - u64 creation_flags;
> >>> - struct hv_partition_creation_properties creation_properties = {};
> >>> - union hv_partition_isolation_properties isolation_properties = {};
> >>> - struct mshv_partition *partition;
> >>> - struct file *file;
> >>> - int fd;
> >>> - long ret;
> >>> + int i;
> >>> + struct mshv_create_partition_v2 args;
> >>> + union hv_partition_processor_features *disabled_procs;
> >>> + union hv_partition_processor_xsave_features *disabled_xsave;
> >>>
> >>> - if (copy_from_user(&args, user_arg, sizeof(args)))
> >>> + /* First, copy v1 struct in case user is on previous versions */
> >>> + if (copy_from_user(&args, user_arg,
> >>> +                    sizeof(struct mshv_create_partition)))
> >>>           return -EFAULT;
> >>>
> >>>   if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) ||
> >>>       args.pt_isolation >= MSHV_PT_ISOLATION_COUNT)
> >>>           return -EINVAL;
> >>>
> >>> + disabled_procs = &cr_props->disabled_processor_features;
> >>> + disabled_xsave = &cr_props->disabled_processor_xsave_features;
> >>> +
> >>> + /* Check if user provided newer struct with feature fields */
> >>> + if (args.pt_flags & BIT_ULL(MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES)) {
> >>> +         if (copy_from_user(&args, user_arg, sizeof(args)))
> >>> +                 return -EFAULT;
> >>
> >> There's subtle issue here that I didn't notice previously. This second 
> >> copy_from_user()
> >> re-populates the first two fields of the "args" local variable. These two 
> >> fields were
> >> validated by code a few lines above. But there's no guarantee that a 
> >> second read of
> >> user space will get the same values. User space could have another thread 
> >> that
> >> changes the user space values between the two copy_from_user() calls, and 
> >> thereby
> >> sneak in some bogus values to be used further down in this function. 
> >> Because of
> >> this risk, there's a general rule for kernel code, which is to avoid 
> >> multiple accesses to
> >> the same user space values. There are places in the kernel where such 
> >> double reads
> >> would be an exploitable security hole.
> >>
> 
> Good catch Michael! It's something I had read about once before long ago but 
> had forgotten.
> I wonder if there's some kind of automation that could warn about potential 
> issues - i.e.
> copy_from_user() on the same pointer twice.
> 
> >> The fix would be to validate the pt_flags and pt_isolation fields again, 
> >> or to have the
> >> second copy_from_user copy only the additional fields. But it's also the 
> >> case that the
> >> way the pt_flags and pt_isolation fields are used further down in this 
> >> function,
> >> nothing bad can happen if malicious user space should succeed in sneaking 
> >> in some
> >> bogus values.
> >>
> >> Net, as currently coded, there's nothing that needs to be fixed. It would 
> >> be more
> >> robust to do one of the two fixes, if for no other reason than to 
> >> acknowledge
> >> awareness of the risk of reading user space twice. But I'm not going to 
> >> insist
> >> on a respin.
> > 
> > Nuno, I can commit this patch first. If you can post a diff later I can
> > squash it in.
> 
> It might be easier if I just spin a v5 today? I'll send it soon.

Yes, please resend. Thanks.

Wei

Reply via email to