On 11/13/2025 10:47 AM, Wei Liu wrote: > On Wed, Nov 12, 2025 at 04:27:05PM +0000, Michael Kelley wrote: >> From: Nuno Das Neves <[email protected]> Sent: Tuesday, >> November 11, 2025 3:20 PM >>> >>> The existing mshv create partition ioctl does not provide a way to >>> specify which cpu features are enabled in the guest. Instead, it >>> attempts to enable all features and those that are not supported are >>> silently disabled by the hypervisor. >>> >>> This was done to reduce unnecessary complexity and is sufficient for >>> many cases. However, new scenarios require fine-grained control over >>> these features. >>> >>> Define a new mshv_create_partition_v2 structure which supports >>> passing the disabled processor and xsave feature bits through to the >>> create partition hypercall directly. >>> >>> Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables >>> the new structure. If unset, the original mshv_create_partition struct >>> is used, with the old behavior of enabling all features. >>> >>> Co-developed-by: Jinank Jain <[email protected]> >>> Signed-off-by: Jinank Jain <[email protected]> >>> Signed-off-by: Muminul Islam <[email protected]> >>> Signed-off-by: Nuno Das Neves <[email protected]> >>> --- >>> Changes in v4: >>> - Change BIT() to BIT_ULL() [Michael Kelley] >>> - Enforce pt_num_cpu_fbanks == MSHV_NUM_CPU_FEATURES_BANKS and expect >>> that number to never change. In future, additional processor banks >>> will be settable as 'early' partition properties. Remove redundant >>> code that set default values for unspecified banks [Michael Kelley] >>> - Set xsave features to 0 on arm64 [Michael Kelley] >>> - Add clarifying comments in a few places >>> >>> Changes in v3: >>> - Remove the new cpu features definitions in hvhdk.h, and retain the >>> old behavior of enabling all features for the old struct. For the v2 >>> struct, still disable unspecified feature banks, since that makes it >>> robust to future extensions. >>> - Amend comments and commit message to reflect the above >>> - Fix unused variable on arm64 [kernel test robot] >>> >>> Changes in v2: >>> - Fix exposure of CONFIG_X86_64 to uapi [kernel test robot] >>> - Fix compilation issue on arm64 [kernel test robot] >>> --- >>> drivers/hv/mshv_root_main.c | 113 +++++++++++++++++++++++++++++------- >>> include/uapi/linux/mshv.h | 34 +++++++++++ >>> 2 files changed, 126 insertions(+), 21 deletions(-) >>> >>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c >>> index d542a0143bb8..9f9438289b60 100644 >>> --- a/drivers/hv/mshv_root_main.c >>> +++ b/drivers/hv/mshv_root_main.c >>> @@ -1900,43 +1900,114 @@ add_partition(struct mshv_partition *partition) >>> return 0; >>> } >>> >>> -static long >>> -mshv_ioctl_create_partition(void __user *user_arg, struct device >>> *module_dev) >>> +static_assert(MSHV_NUM_CPU_FEATURES_BANKS == >>> + HV_PARTITION_PROCESSOR_FEATURES_BANKS); >>> + >>> +static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 >>> *pt_flags, >>> + struct hv_partition_creation_properties >>> *cr_props, >>> + union hv_partition_isolation_properties >>> *isol_props) >>> { >>> - struct mshv_create_partition args; >>> - u64 creation_flags; >>> - struct hv_partition_creation_properties creation_properties = {}; >>> - union hv_partition_isolation_properties isolation_properties = {}; >>> - struct mshv_partition *partition; >>> - struct file *file; >>> - int fd; >>> - long ret; >>> + int i; >>> + struct mshv_create_partition_v2 args; >>> + union hv_partition_processor_features *disabled_procs; >>> + union hv_partition_processor_xsave_features *disabled_xsave; >>> >>> - if (copy_from_user(&args, user_arg, sizeof(args))) >>> + /* First, copy v1 struct in case user is on previous versions */ >>> + if (copy_from_user(&args, user_arg, >>> + sizeof(struct mshv_create_partition))) >>> return -EFAULT; >>> >>> if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) || >>> args.pt_isolation >= MSHV_PT_ISOLATION_COUNT) >>> return -EINVAL; >>> >>> + disabled_procs = &cr_props->disabled_processor_features; >>> + disabled_xsave = &cr_props->disabled_processor_xsave_features; >>> + >>> + /* Check if user provided newer struct with feature fields */ >>> + if (args.pt_flags & BIT_ULL(MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES)) { >>> + if (copy_from_user(&args, user_arg, sizeof(args))) >>> + return -EFAULT; >> >> There's subtle issue here that I didn't notice previously. This second >> copy_from_user() >> re-populates the first two fields of the "args" local variable. These two >> fields were >> validated by code a few lines above. But there's no guarantee that a second >> read of >> user space will get the same values. User space could have another thread >> that >> changes the user space values between the two copy_from_user() calls, and >> thereby >> sneak in some bogus values to be used further down in this function. Because >> of >> this risk, there's a general rule for kernel code, which is to avoid >> multiple accesses to >> the same user space values. There are places in the kernel where such double >> reads >> would be an exploitable security hole. >>
Good catch Michael! It's something I had read about once before long ago but had forgotten. I wonder if there's some kind of automation that could warn about potential issues - i.e. copy_from_user() on the same pointer twice. >> The fix would be to validate the pt_flags and pt_isolation fields again, or >> to have the >> second copy_from_user copy only the additional fields. But it's also the >> case that the >> way the pt_flags and pt_isolation fields are used further down in this >> function, >> nothing bad can happen if malicious user space should succeed in sneaking in >> some >> bogus values. >> >> Net, as currently coded, there's nothing that needs to be fixed. It would be >> more >> robust to do one of the two fixes, if for no other reason than to acknowledge >> awareness of the risk of reading user space twice. But I'm not going to >> insist >> on a respin. > > Nuno, I can commit this patch first. If you can post a diff later I can > squash it in. It might be easier if I just spin a v5 today? I'll send it soon. > > /* Re-validate fields after the second copy_from_user */ > if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) || > args.pt_isolation >= MSHV_PT_ISOLATION_COUNT) > return -EINVAL; > > Perhaps something like this after the second copy_from_user()? > Yes, that sounds fine. I thought about just copying the second part of the struct but re-checking those fields looks like a simpler and less error-prone way to me. Nuno >>> Other than the double read of user space, LGTM. >> >> Reviewed-by: Michael Kelley <[email protected]> > > Thank you for the detailed review! > > Wei
