On Thu, Nov 13, 2025 at 11:11:57AM -0800, Nuno Das Neves wrote: > On 11/13/2025 10:47 AM, Wei Liu wrote: > > On Wed, Nov 12, 2025 at 04:27:05PM +0000, Michael Kelley wrote: > >> From: Nuno Das Neves <[email protected]> Sent: Tuesday, > >> November 11, 2025 3:20 PM > >>> > >>> The existing mshv create partition ioctl does not provide a way to > >>> specify which cpu features are enabled in the guest. Instead, it > >>> attempts to enable all features and those that are not supported are > >>> silently disabled by the hypervisor. > >>> > >>> This was done to reduce unnecessary complexity and is sufficient for > >>> many cases. However, new scenarios require fine-grained control over > >>> these features. > >>> > >>> Define a new mshv_create_partition_v2 structure which supports > >>> passing the disabled processor and xsave feature bits through to the > >>> create partition hypercall directly. > >>> > >>> Introduce a new flag MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES which enables > >>> the new structure. If unset, the original mshv_create_partition struct > >>> is used, with the old behavior of enabling all features. > >>> > >>> Co-developed-by: Jinank Jain <[email protected]> > >>> Signed-off-by: Jinank Jain <[email protected]> > >>> Signed-off-by: Muminul Islam <[email protected]> > >>> Signed-off-by: Nuno Das Neves <[email protected]> > >>> --- > >>> Changes in v4: > >>> - Change BIT() to BIT_ULL() [Michael Kelley] > >>> - Enforce pt_num_cpu_fbanks == MSHV_NUM_CPU_FEATURES_BANKS and expect > >>> that number to never change. In future, additional processor banks > >>> will be settable as 'early' partition properties. Remove redundant > >>> code that set default values for unspecified banks [Michael Kelley] > >>> - Set xsave features to 0 on arm64 [Michael Kelley] > >>> - Add clarifying comments in a few places > >>> > >>> Changes in v3: > >>> - Remove the new cpu features definitions in hvhdk.h, and retain the > >>> old behavior of enabling all features for the old struct. For the v2 > >>> struct, still disable unspecified feature banks, since that makes it > >>> robust to future extensions. > >>> - Amend comments and commit message to reflect the above > >>> - Fix unused variable on arm64 [kernel test robot] > >>> > >>> Changes in v2: > >>> - Fix exposure of CONFIG_X86_64 to uapi [kernel test robot] > >>> - Fix compilation issue on arm64 [kernel test robot] > >>> --- > >>> drivers/hv/mshv_root_main.c | 113 +++++++++++++++++++++++++++++------- > >>> include/uapi/linux/mshv.h | 34 +++++++++++ > >>> 2 files changed, 126 insertions(+), 21 deletions(-) > >>> > >>> diff --git a/drivers/hv/mshv_root_main.c b/drivers/hv/mshv_root_main.c > >>> index d542a0143bb8..9f9438289b60 100644 > >>> --- a/drivers/hv/mshv_root_main.c > >>> +++ b/drivers/hv/mshv_root_main.c > >>> @@ -1900,43 +1900,114 @@ add_partition(struct mshv_partition *partition) > >>> return 0; > >>> } > >>> > >>> -static long > >>> -mshv_ioctl_create_partition(void __user *user_arg, struct device > >>> *module_dev) > >>> +static_assert(MSHV_NUM_CPU_FEATURES_BANKS == > >>> + HV_PARTITION_PROCESSOR_FEATURES_BANKS); > >>> + > >>> +static long mshv_ioctl_process_pt_flags(void __user *user_arg, u64 > >>> *pt_flags, > >>> + struct hv_partition_creation_properties > >>> *cr_props, > >>> + union hv_partition_isolation_properties > >>> *isol_props) > >>> { > >>> - struct mshv_create_partition args; > >>> - u64 creation_flags; > >>> - struct hv_partition_creation_properties creation_properties = {}; > >>> - union hv_partition_isolation_properties isolation_properties = {}; > >>> - struct mshv_partition *partition; > >>> - struct file *file; > >>> - int fd; > >>> - long ret; > >>> + int i; > >>> + struct mshv_create_partition_v2 args; > >>> + union hv_partition_processor_features *disabled_procs; > >>> + union hv_partition_processor_xsave_features *disabled_xsave; > >>> > >>> - if (copy_from_user(&args, user_arg, sizeof(args))) > >>> + /* First, copy v1 struct in case user is on previous versions */ > >>> + if (copy_from_user(&args, user_arg, > >>> + sizeof(struct mshv_create_partition))) > >>> return -EFAULT; > >>> > >>> if ((args.pt_flags & ~MSHV_PT_FLAGS_MASK) || > >>> args.pt_isolation >= MSHV_PT_ISOLATION_COUNT) > >>> return -EINVAL; > >>> > >>> + disabled_procs = &cr_props->disabled_processor_features; > >>> + disabled_xsave = &cr_props->disabled_processor_xsave_features; > >>> + > >>> + /* Check if user provided newer struct with feature fields */ > >>> + if (args.pt_flags & BIT_ULL(MSHV_PT_BIT_CPU_AND_XSAVE_FEATURES)) { > >>> + if (copy_from_user(&args, user_arg, sizeof(args))) > >>> + return -EFAULT; > >> > >> There's subtle issue here that I didn't notice previously. This second > >> copy_from_user() > >> re-populates the first two fields of the "args" local variable. These two > >> fields were > >> validated by code a few lines above. But there's no guarantee that a > >> second read of > >> user space will get the same values. User space could have another thread > >> that > >> changes the user space values between the two copy_from_user() calls, and > >> thereby > >> sneak in some bogus values to be used further down in this function. > >> Because of > >> this risk, there's a general rule for kernel code, which is to avoid > >> multiple accesses to > >> the same user space values. There are places in the kernel where such > >> double reads > >> would be an exploitable security hole. > >> > > Good catch Michael! It's something I had read about once before long ago but > had forgotten. > I wonder if there's some kind of automation that could warn about potential > issues - i.e. > copy_from_user() on the same pointer twice. > > >> The fix would be to validate the pt_flags and pt_isolation fields again, > >> or to have the > >> second copy_from_user copy only the additional fields. But it's also the > >> case that the > >> way the pt_flags and pt_isolation fields are used further down in this > >> function, > >> nothing bad can happen if malicious user space should succeed in sneaking > >> in some > >> bogus values. > >> > >> Net, as currently coded, there's nothing that needs to be fixed. It would > >> be more > >> robust to do one of the two fixes, if for no other reason than to > >> acknowledge > >> awareness of the risk of reading user space twice. But I'm not going to > >> insist > >> on a respin. > > > > Nuno, I can commit this patch first. If you can post a diff later I can > > squash it in. > > It might be easier if I just spin a v5 today? I'll send it soon.
Yes, please resend. Thanks. Wei
