RE: [RFC PATCH 11/18] virt/mshv: set up synic pages for intercept messages

2021-03-11 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Thursday, March 
11, 2021 11:38 AM
> 
> On 2/8/2021 11:47 AM, Michael Kelley wrote:
> > From: Nuno Das Neves  Sent: Friday, 
> > November
> 20, 2020 4:31 PM
> >>
> >> Same idea as synic setup in drivers/hv/hv.c:hv_synic_enable_regs()
> >> and hv_synic_disable_regs().
> >> Setting up synic registers in both vmbus driver and mshv would clobber
> >> them, but the vmbus driver will not run in the root partition, so this
> >> is safe.
> >>
> >> Co-developed-by: Lillian Grassin-Drake 
> >> Signed-off-by: Lillian Grassin-Drake 
> >> Signed-off-by: Nuno Das Neves 
> >> ---
> >>  arch/x86/include/asm/hyperv-tlfs.h  |  29 ---
> >>  arch/x86/include/uapi/asm/hyperv-tlfs.h | 264 
> >>  include/asm-generic/hyperv-tlfs.h   |  46 +
> >>  include/linux/mshv.h|   1 +
> >>  include/uapi/asm-generic/hyperv-tlfs.h  |  43 
> >>  virt/mshv/mshv_main.c   |  98 -
> >>  6 files changed, 404 insertions(+), 77 deletions(-)
> >>
> >> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> >> b/arch/x86/include/asm/hyperv-tlfs.h
> >> index 4cd44ae9bffb..c34a6bb4f457 100644
> >> --- a/arch/x86/include/asm/hyperv-tlfs.h
> >> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> >> @@ -267,35 +267,6 @@ struct hv_tsc_emulation_status {
> >>  #define HV_X64_MSR_TSC_REFERENCE_ENABLE   0x0001
> >>  #define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT12
> >>
> >> -
> >> -/* Define hypervisor message types. */
> >> -enum hv_message_type {
> >> -  HVMSG_NONE  = 0x,
> >> -
> >> -  /* Memory access messages. */
> >> -  HVMSG_UNMAPPED_GPA  = 0x8000,
> >> -  HVMSG_GPA_INTERCEPT = 0x8001,
> >> -
> >> -  /* Timer notification messages. */
> >> -  HVMSG_TIMER_EXPIRED = 0x8010,
> >> -
> >> -  /* Error messages. */
> >> -  HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
> >> -  HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
> >> -  HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
> >> -
> >> -  /* Trace buffer complete messages. */
> >> -  HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
> >> -
> >> -  /* Platform-specific processor intercept messages. */
> >> -  HVMSG_X64_IOPORT_INTERCEPT  = 0x8001,
> >> -  HVMSG_X64_MSR_INTERCEPT = 0x80010001,
> >> -  HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
> >> -  HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
> >> -  HVMSG_X64_APIC_EOI  = 0x80010004,
> >> -  HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
> >> -};
> >> -
> >>  struct hv_nested_enlightenments_control {
> >>struct {
> >>__u32 directhypercall:1;
> >> diff --git a/arch/x86/include/uapi/asm/hyperv-tlfs.h
> b/arch/x86/include/uapi/asm/hyperv-
> >> tlfs.h
> >> index 2ff655962738..c6a27053f791 100644
> >> --- a/arch/x86/include/uapi/asm/hyperv-tlfs.h
> >> +++ b/arch/x86/include/uapi/asm/hyperv-tlfs.h
> >> @@ -722,4 +722,268 @@ union hv_register_value {
> >>pending_virtualization_fault_event;
> >>  };
> >>
> >> +/* Define hypervisor message types. */
> >> +enum hv_message_type {
> >> +  HVMSG_NONE  = 0x,
> >> +
> >> +  /* Memory access messages. */
> >> +  HVMSG_UNMAPPED_GPA  = 0x8000,
> >> +  HVMSG_GPA_INTERCEPT = 0x8001,
> >> +
> >> +  /* Timer notification messages. */
> >> +  HVMSG_TIMER_EXPIRED = 0x8010,
> >> +
> >> +  /* Error messages. */
> >> +  HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
> >> +  HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
> >> +  HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
> >> +
> >> +  /* Trace buffer complete messages. */
> >> +  HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
> >> +
> >> +  /* Platform-specific processor intercept messages. */
> >> +  HVMSG_X64_IO_PORT_INTERCEPT = 0x8001,
> >> +  HVMSG_X64_MSR_INTERCEPT = 0x80010001,
> >> +  HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
> >> +  HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
> >> +  HVMSG_X64_APIC_EOI  = 0x80010004,
> >> +  HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005,
> >> +  HVMSG_X64_IOMMU_PRQ = 0x80010006,
> >> +  HVMSG_X64_HALT  = 0x80010007,
> >> +  HVMSG_X64_INTERRUPTION_DELIVERABLE  = 0x80010008,
> >> +  HVMSG_X64_SIPI_INTERCEPT= 0x80010009,
> >> +};
> >
> > I have a separate patch series that moves this enum to the
> > asm-generic portion of hyperv-tlfs.h because there's not a good way
> > to separate the arch neutral from arch dependent values.
> >
> 
> Ok, but it should also be changed to #define instead of an enum, right?
> I will do that in this patch.
> This requires a couple of changes in other files in drivers/hv
> where this enum is used.

Because of the other uses of the enum in places that don't depend
on exact structure layouts, I left it 

RE: [RFC PATCH 08/18] virt/mshv: map and unmap guest memory

2021-03-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Monday, March 8, 
2021 11:14 AM
> 
> On 2/8/2021 11:45 AM, Michael Kelley wrote:
> > From: Nuno Das Neves  Sent: Friday, 
> > November
> 20, 2020 4:30 PM
> >>

[snip]

> >> @@ -245,16 +249,318 @@ hv_call_delete_partition(u64 partition_id)
> >>return -hv_status_to_errno(status);
> >>  }
> >>
> >> +static int
> >> +hv_call_map_gpa_pages(u64 partition_id,
> >> +u64 gpa_target,
> >> +u64 page_count, u32 flags,
> >> +struct page **pages)
> >> +{
> >> +  struct hv_map_gpa_pages *input_page;
> >> +  int status;
> >> +  int i;
> >> +  struct page **p;
> >> +  u32 completed = 0;
> >> +  u64 hypercall_status;
> >> +  unsigned long remaining = page_count;
> >> +  int rep_count;
> >> +  unsigned long irq_flags;
> >> +  int ret = 0;
> >> +
> >> +  while (remaining) {
> >> +
> >> +  rep_count = min(remaining, HV_MAP_GPA_BATCH_SIZE);
> >> +
> >> +  local_irq_save(irq_flags);
> >> +  input_page = (struct hv_map_gpa_pages *)(*this_cpu_ptr(
> >> +  hyperv_pcpu_input_arg));
> >> +
> >> +  input_page->target_partition_id = partition_id;
> >> +  input_page->target_gpa_base = gpa_target;
> >> +  input_page->map_flags = flags;
> >> +
> >> +  for (i = 0, p = pages; i < rep_count; i++, p++)
> >> +  input_page->source_gpa_page_list[i] =
> >> +  page_to_pfn(*p) & HV_MAP_GPA_MASK;
> >
> > The masking seems a bit weird.  The mask allows for up to 64G page frames,
> > which is 256 Tbytes of total physical memory, which is probably the current
> > Hyper-V limit on memory size (48 bit physical address space, though 52 bit
> > physical address spaces are coming).  So the masking shouldn't ever be doing
> > anything.   And if it was doing something, that probably should be treated 
> > as
> > an error rather than simply dropping the high bits.
> 
> Good point - It looks like the mask isn't needed.
> 
> >
> > Note that this code does not handle the case where PAGE_SIZE !=
> > HV_HYP_PAGE_SIZE.  But maybe we'll never run the root partition with a
> > page size other than 4K.
> >
> 
> For now on x86 it won't happen, but maybe on ARM?
> It shouldn't be hard to support this case, especially since
> PAGE_SIZE >= HV_HYP_PAGE_SIZE. Do you think we need it in this patch set?

No, from my perspective, this case does not need to be handled in 
this patch set.

> 
> >> +  hypercall_status = hv_do_rep_hypercall(
> >> +  HVCALL_MAP_GPA_PAGES, rep_count, 0, input_page, NULL);
> >> +  local_irq_restore(irq_flags);
> >> +
> >> +  status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
> >> +  completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
> >> +  HV_HYPERCALL_REP_COMP_OFFSET;
> >> +
> >> +  if (status == HV_STATUS_INSUFFICIENT_MEMORY) {
> >> +  ret = hv_call_deposit_pages(NUMA_NO_NODE,
> >> +  partition_id, 256);
> >
> > Why adding 256 pages?  I'm just contrasting with other places that add
> > 1 page at a time.  Maybe a comment to explain 
> >
> 
> Empirically determined. I'll add a #define and comment.
> 
> >> +  if (ret)
> >> +  break;
> >> +  } else if (status != HV_STATUS_SUCCESS) {
> >> +  pr_err("%s: completed %llu out of %llu, %s\n",
> >> + __func__,
> >> + page_count - remaining, page_count,
> >> + hv_status_to_string(status));
> >> +  ret = -hv_status_to_errno(status);
> >> +  break;
> >> +  }
> >> +
> >> +  pages += completed;
> >> +  remaining -= completed;
> >> +  gpa_target += completed;
> >> +  }
> >> +
> >> +  if (ret && completed) {
> >
> > Is the above the right test?  Completed could be zero from the most
> > recent iteration, but still could be partially succeeded based on a previous
> > successful iteration.   I think this needs to check whether remaining equals
> > page_count.
> >
> 
> You're right; I'll change it to (ret && remaining < page_count)
> 
> >> +  pr_err("%s: Partially succeeded; mapped regions may be in 
> >> invalid state",
> >> + __func__);
> >> +  ret = -EBADFD;
> >> +  }
> >> +
> >> +  return ret;
> >> +}
> >> +
> >> +static int
> >> +hv_call_unmap_gpa_pages(u64 partition_id,
> >> +  u64 gpa_target,
> >> +  u64 page_count, u32 flags)
> >> +{
> >> +  struct hv_unmap_gpa_pages *input_page;
> >> +  int status;
> >> +  int ret = 0;
> >> +  u32 completed = 0;
> >> +  u64 hypercall_status;
> >> +  unsigned long remaining = page_count;
> >> +  int rep_count;
> >> +  unsigned long irq_flags;
> >> +
> >> +  local_irq_save(irq_flags);
> >> +  input_page = (struct hv_unmap_gpa_pages *)(*this_cpu_ptr(
> >> +  hyperv_pcp

RE: [RFC PATCH 06/18] virt/mshv: create, initialize, finalize, delete partition hypercalls

2021-03-04 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Thursday, March 
4, 2021 3:49 PM
> 
> On 2/8/2021 11:42 AM, Michael Kelley wrote:
> > From: Nuno Das Neves  Sent: Friday, 
> > November
> 20, 2020 4:30 PM
> >>

[snip]

> >> +
> >> +static int
> >> +hv_call_create_partition(
> >> +  u64 flags,
> >> +  struct hv_partition_creation_properties creation_properties,
> >> +  u64 *partition_id)
> >> +{
> >> +  struct hv_create_partition_in *input;
> >> +  struct hv_create_partition_out *output;
> >> +  int status;
> >> +  int ret;
> >> +  unsigned long irq_flags;
> >> +  int i;
> >> +
> >> +  do {
> >> +  local_irq_save(irq_flags);
> >> +  input = (struct hv_create_partition_in *)(*this_cpu_ptr(
> >> +  hyperv_pcpu_input_arg));
> >> +  output = (struct hv_create_partition_out *)(*this_cpu_ptr(
> >> +  hyperv_pcpu_output_arg));
> >> +
> >> +  input->flags = flags;
> >> +  input->proximity_domain_info.as_uint64 = 0;
> >> +  input->compatibility_version = HV_COMPATIBILITY_MANGANESE;
> >> +  for (i = 0; i < HV_PARTITION_PROCESSOR_FEATURE_BANKS; ++i)
> >> +  input->partition_creation_properties
> >> +  .disabled_processor_features.as_uint64[i] = 0;
> >> +  input->partition_creation_properties
> >> +  .disabled_processor_xsave_features.as_uint64 = 0;
> >> +  input->isolation_properties.as_uint64 = 0;
> >> +
> >> +  status = hv_do_hypercall(HVCALL_CREATE_PARTITION,
> >> +   input, output);
> >
> > hv_do_hypercall returns a u64, which should then be masked with
> > HV_HYPERCALL_RESULT_MASK before checking the result.
> >
> 
> Yes, I'll fix this everywhere.
> 
> >> +  if (status != HV_STATUS_INSUFFICIENT_MEMORY) {
> >> +  if (status == HV_STATUS_SUCCESS)
> >> +  *partition_id = output->partition_id;
> >> +  else
> >> +  pr_err("%s: %s\n",
> >> + __func__, hv_status_to_string(status));
> >> +  local_irq_restore(irq_flags);
> >> +  ret = -hv_status_to_errno(status);
> >> +  break;
> >> +  }
> >> +  local_irq_restore(irq_flags);
> >> +  ret = hv_call_deposit_pages(NUMA_NO_NODE,
> >> +  hv_current_partition_id, 1);
> >> +  } while (!ret);
> >> +
> >> +  return ret;
> >> +}
> >> +

I had a separate thread on the linux-hyperv mailing list about the
inconsistency in how we check hypercall status in current upstream
code, and proposed some helper functions to make it easier and
more consistent.  Joe Salisbury has started work on a patch to
provide those helper functions and to start using them in current
upstream code.  You could coordinate with Joe to get the helper
functions as well and use them as discussed in that thread.  Then
later on we won't have to come back and fix up the uses in this
patch series.

Michael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [RFC PATCH 16/18] virt/mshv: mmap vp register page

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:31 PM
> 
> Introduce mmap interface for a virtual processor, exposing a page for
> setting and getting common registers while the VP is suspended.
> 
> This provides a more performant and convenient way to get and set these
> registers in the context of a vmm's run-loop.
> 
> Co-developed-by: Lillian Grassin-Drake 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Nuno Das Neves 
> ---
>  Documentation/virt/mshv/api.rst | 11 
>  arch/x86/include/uapi/asm/hyperv-tlfs.h | 74 ++
>  include/asm-generic/hyperv-tlfs.h   | 10 +++
>  include/linux/mshv.h|  1 +
>  include/uapi/asm-generic/hyperv-tlfs.h  |  5 ++
>  include/uapi/linux/mshv.h   | 12 
>  virt/mshv/mshv_main.c   | 82 +
>  7 files changed, 195 insertions(+)
> 
> diff --git a/Documentation/virt/mshv/api.rst b/Documentation/virt/mshv/api.rst
> index 7fd75f248eff..89c276a8778f 100644
> --- a/Documentation/virt/mshv/api.rst
> +++ b/Documentation/virt/mshv/api.rst
> @@ -149,3 +149,14 @@ HV_PARTITION_CREATION_FLAG_LAPIC_ENABLED
>  Get/set various vp state. Currently these can be used to get and set
>  emulated LAPIC state, and xsave data.
> 
> +3.10 mmap(vp)
> +-
> +:Type: vp mmap
> +:Parameters: offset should be HV_VP_MMAP_REGISTERS_OFFSET
> +:Returns: 0 on success
> +
> +Maps a page into userspace that can be used to get and set common registers
> +while the vp is suspended.
> +The page is laid out in struct hv_vp_register_page in asm/hyperv-tlfs.h.
> +

I'm assuming there's no support for the corresponding munmap().
What happens if munmap is called?  Does it just fail and the page remains
mapped?

> +
> diff --git a/arch/x86/include/uapi/asm/hyperv-tlfs.h 
> b/arch/x86/include/uapi/asm/hyperv-
> tlfs.h
> index 78758aedf23e..a241178567ff 100644
> --- a/arch/x86/include/uapi/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/uapi/asm/hyperv-tlfs.h
> @@ -1110,4 +1110,78 @@ struct hv_vp_state_data_xsave {
>   union hv_x64_xsave_xfem_register states;
>  };
> 
> +/* Bits for dirty mask of hv_vp_register_page */
> +#define HV_X64_REGISTER_CLASS_GENERAL0
> +#define HV_X64_REGISTER_CLASS_IP 1
> +#define HV_X64_REGISTER_CLASS_XMM2
> +#define HV_X64_REGISTER_CLASS_SEGMENT3
> +#define HV_X64_REGISTER_CLASS_FLAGS  4
> +
> +#define HV_VP_REGISTER_PAGE_VERSION_11u
> +
> +struct hv_vp_register_page {
> + __u16 version;
> + bool isvalid;

Like enum, avoid type "bool" in data structures shared with
Hyper-V.

> + __u8 rsvdz;
> + __u32 dirty;
> + union {
> + struct {
> + __u64 rax;
> + __u64 rcx;
> + __u64 rdx;
> + __u64 rbx;
> + __u64 rsp;
> + __u64 rbp;
> + __u64 rsi;
> + __u64 rdi;
> + __u64 r8;
> + __u64 r9;
> + __u64 r10;
> + __u64 r11;
> + __u64 r12;
> + __u64 r13;
> + __u64 r14;
> + __u64 r15;
> + };
> +
> + __u64 gp_registers[16];
> + };
> + __u64 rip;
> + __u64 rflags;
> + union {
> + struct {
> + struct hv_u128 xmm0;
> + struct hv_u128 xmm1;
> + struct hv_u128 xmm2;
> + struct hv_u128 xmm3;
> + struct hv_u128 xmm4;
> + struct hv_u128 xmm5;
> + };
> +
> + struct hv_u128 xmm_registers[6];
> + };
> + union {
> + struct {
> + struct hv_x64_segment_register es;
> + struct hv_x64_segment_register cs;
> + struct hv_x64_segment_register ss;
> + struct hv_x64_segment_register ds;
> + struct hv_x64_segment_register fs;
> + struct hv_x64_segment_register gs;
> + };
> +
> + struct hv_x64_segment_register segment_registers[6];
> + };
> + /* read only */
> + __u64 cr0;
> + __u64 cr3;
> + __u64 cr4;
> + __u64 cr8;
> + __u64 efer;
> + __u64 dr7;
> + union hv_x64_pending_interruption_register pending_interruption;
> + union hv_x64_interrupt_state_register interrupt_state;
> + __u64 instruction_emulation_hints;
> +};
> +
>  #endif
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 4bc59a0344ce..9eed4b869110 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -837,4 +837,14 @@ struct hv_set_vp_state_in {
>   union hv_input_set_vp_state_data data[];
>  };
> 
> +struct hv_map_vp_state_page_in {
> + u64 partition_id;
> + u32 vp_index;
>

RE: [RFC PATCH 15/18] virt/mshv: get and set vp state ioctls

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:31 PM
> To: linux-hyp...@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org; linux-ker...@vger.kernel.org; 
> Michael Kelley
> ; virem...@linux.microsoft.com; Sunil Muthuswamy
> ; nunodasne...@linux.microsoft.com; 
> wei@kernel.org;
> Lillian Grassin-Drake ; KY Srinivasan
> 
> Subject: [RFC PATCH 15/18] virt/mshv: get and set vp state ioctls
> 
> Introduce ioctls for getting and setting guest vcpu emulated LAPIC
> state, and xsave data.
> 
> Signed-off-by: Nuno Das Neves 
> ---
>  Documentation/virt/mshv/api.rst |   8 +
>  arch/x86/include/uapi/asm/hyperv-tlfs.h |  59 ++
>  include/asm-generic/hyperv-tlfs.h   |  41 
>  include/uapi/asm-generic/hyperv-tlfs.h  |  28 +++
>  include/uapi/linux/mshv.h   |  13 ++
>  virt/mshv/mshv_main.c   | 262 
>  6 files changed, 411 insertions(+)
> 
> diff --git a/Documentation/virt/mshv/api.rst b/Documentation/virt/mshv/api.rst
> index 694f978131f9..7fd75f248eff 100644
> --- a/Documentation/virt/mshv/api.rst
> +++ b/Documentation/virt/mshv/api.rst
> @@ -140,4 +140,12 @@ Assert interrupts in partitions that use Microsoft 
> Hypervisor's
> internal
>  emulated LAPIC. This must be enabled on partition creation with the flag:
>  HV_PARTITION_CREATION_FLAG_LAPIC_ENABLED
> 
> +3.9 MSHV_GET_VP_STATE and MSHV_SET_VP_STATE
> +--
> +:Type: vp ioctl
> +:Parameters: struct mshv_vp_state
> +:Returns: 0 on success
> +
> +Get/set various vp state. Currently these can be used to get and set
> +emulated LAPIC state, and xsave data.
> 
> diff --git a/arch/x86/include/uapi/asm/hyperv-tlfs.h 
> b/arch/x86/include/uapi/asm/hyperv-
> tlfs.h
> index 5478d4943bfc..78758aedf23e 100644
> --- a/arch/x86/include/uapi/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/uapi/asm/hyperv-tlfs.h
> @@ -1051,4 +1051,63 @@ union hv_interrupt_control {
>   __u64 as_uint64;
>  };
> 
> +struct hv_local_interrupt_controller_state {
> + __u32 apic_id;
> + __u32 apic_version;
> + __u32 apic_ldr;
> + __u32 apic_dfr;
> + __u32 apic_spurious;
> + __u32 apic_isr[8];
> + __u32 apic_tmr[8];
> + __u32 apic_irr[8];
> + __u32 apic_esr;
> + __u32 apic_icr_high;
> + __u32 apic_icr_low;
> + __u32 apic_lvt_timer;
> + __u32 apic_lvt_thermal;
> + __u32 apic_lvt_perfmon;
> + __u32 apic_lvt_lint0;
> + __u32 apic_lvt_lint1;
> + __u32 apic_lvt_error;
> + __u32 apic_lvt_cmci;
> + __u32 apic_error_status;
> + __u32 apic_initial_count;
> + __u32 apic_counter_value;
> + __u32 apic_divide_configuration;
> + __u32 apic_remote_read;
> +};
> +
> +#define HV_XSAVE_DATA_NO_XMM_REGISTERS 1
> +
> +union hv_x64_xsave_xfem_register {
> + __u64 as_uint64;
> + struct {
> + __u32 low_uint32;
> + __u32 high_uint32;
> + };
> + struct {
> + __u64 legacy_x87: 1;
> + __u64 legacy_sse: 1;
> + __u64 avx: 1;
> + __u64 mpx_bndreg: 1;
> + __u64 mpx_bndcsr: 1;
> + __u64 avx_512_op_mask: 1;
> + __u64 avx_512_zmmhi: 1;
> + __u64 avx_512_zmm16_31: 1;
> + __u64 rsvd8_9: 2;
> + __u64 pasid: 1;
> + __u64 cet_u: 1;
> + __u64 cet_s: 1;
> + __u64 rsvd13_16: 4;
> + __u64 xtile_cfg: 1;
> + __u64 xtile_data: 1;
> + __u64 rsvd19_63: 45;
> + };
> +};
> +
> +struct hv_vp_state_data_xsave {
> + __u64 flags;
> + union hv_x64_xsave_xfem_register states;
> +};
> +
>  #endif
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 2cd46241c545..4bc59a0344ce 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -167,6 +167,9 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_ASSERT_VIRTUAL_INTERRUPT  0x0094
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> +#define HVCALL_MAP_VP_STATE_PAGE 0x00e1
> +#define HVCALL_GET_VP_STATE  0x00e3
> +#define HVCALL_SET_VP_STATE  0x00e4
> 
>  #define HV_FLUSH_ALL_PROCESSORS  BIT(0)
>  #define HV_FLUSH_ALL_VIRTUAL_ADDRESS_SPACES  BIT(1)
> @@ -796,4 +799,42 @@ struct hv_assert_virtual_interrupt {
>   u16 rsvd_z1;
>  };
> 
> +struct hv_vp_state_data {
> + enum hv_get_set_vp_state_type type;
> + u32 rsvd;
> + struct hv_vp_state_data_xsave xsave;
> +
> +};
> +
> +struct hv_get_vp_state_in {
> + u64 partition_id;
> + u32 vp_index;
> + u8 input_vtl;
> + u8 rsvd0;
> + u16 rsvd1;
> + struct hv_vp_state_data state_data;
> + u64 output_data_pfns[];
> +};
> +
> +union hv_get_vp_state_out {
> + struct hv_local_interrupt_controller_state interrupt_controller_state;
> + /*

RE: [RFC PATCH 11/18] virt/mshv: set up synic pages for intercept messages

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:31 PM
> 
> Same idea as synic setup in drivers/hv/hv.c:hv_synic_enable_regs()
> and hv_synic_disable_regs().
> Setting up synic registers in both vmbus driver and mshv would clobber
> them, but the vmbus driver will not run in the root partition, so this
> is safe.
> 
> Co-developed-by: Lillian Grassin-Drake 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Nuno Das Neves 
> ---
>  arch/x86/include/asm/hyperv-tlfs.h  |  29 ---
>  arch/x86/include/uapi/asm/hyperv-tlfs.h | 264 
>  include/asm-generic/hyperv-tlfs.h   |  46 +
>  include/linux/mshv.h|   1 +
>  include/uapi/asm-generic/hyperv-tlfs.h  |  43 
>  virt/mshv/mshv_main.c   |  98 -
>  6 files changed, 404 insertions(+), 77 deletions(-)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> b/arch/x86/include/asm/hyperv-tlfs.h
> index 4cd44ae9bffb..c34a6bb4f457 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -267,35 +267,6 @@ struct hv_tsc_emulation_status {
>  #define HV_X64_MSR_TSC_REFERENCE_ENABLE  0x0001
>  #define HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT   12
> 
> -
> -/* Define hypervisor message types. */
> -enum hv_message_type {
> - HVMSG_NONE  = 0x,
> -
> - /* Memory access messages. */
> - HVMSG_UNMAPPED_GPA  = 0x8000,
> - HVMSG_GPA_INTERCEPT = 0x8001,
> -
> - /* Timer notification messages. */
> - HVMSG_TIMER_EXPIRED = 0x8010,
> -
> - /* Error messages. */
> - HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
> - HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
> - HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
> -
> - /* Trace buffer complete messages. */
> - HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
> -
> - /* Platform-specific processor intercept messages. */
> - HVMSG_X64_IOPORT_INTERCEPT  = 0x8001,
> - HVMSG_X64_MSR_INTERCEPT = 0x80010001,
> - HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
> - HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
> - HVMSG_X64_APIC_EOI  = 0x80010004,
> - HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005
> -};
> -
>  struct hv_nested_enlightenments_control {
>   struct {
>   __u32 directhypercall:1;
> diff --git a/arch/x86/include/uapi/asm/hyperv-tlfs.h 
> b/arch/x86/include/uapi/asm/hyperv-
> tlfs.h
> index 2ff655962738..c6a27053f791 100644
> --- a/arch/x86/include/uapi/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/uapi/asm/hyperv-tlfs.h
> @@ -722,4 +722,268 @@ union hv_register_value {
>   pending_virtualization_fault_event;
>  };
> 
> +/* Define hypervisor message types. */
> +enum hv_message_type {
> + HVMSG_NONE  = 0x,
> +
> + /* Memory access messages. */
> + HVMSG_UNMAPPED_GPA  = 0x8000,
> + HVMSG_GPA_INTERCEPT = 0x8001,
> +
> + /* Timer notification messages. */
> + HVMSG_TIMER_EXPIRED = 0x8010,
> +
> + /* Error messages. */
> + HVMSG_INVALID_VP_REGISTER_VALUE = 0x8020,
> + HVMSG_UNRECOVERABLE_EXCEPTION   = 0x8021,
> + HVMSG_UNSUPPORTED_FEATURE   = 0x8022,
> +
> + /* Trace buffer complete messages. */
> + HVMSG_EVENTLOG_BUFFERCOMPLETE   = 0x8040,
> +
> + /* Platform-specific processor intercept messages. */
> + HVMSG_X64_IO_PORT_INTERCEPT = 0x8001,
> + HVMSG_X64_MSR_INTERCEPT = 0x80010001,
> + HVMSG_X64_CPUID_INTERCEPT   = 0x80010002,
> + HVMSG_X64_EXCEPTION_INTERCEPT   = 0x80010003,
> + HVMSG_X64_APIC_EOI  = 0x80010004,
> + HVMSG_X64_LEGACY_FP_ERROR   = 0x80010005,
> + HVMSG_X64_IOMMU_PRQ = 0x80010006,
> + HVMSG_X64_HALT  = 0x80010007,
> + HVMSG_X64_INTERRUPTION_DELIVERABLE  = 0x80010008,
> + HVMSG_X64_SIPI_INTERCEPT= 0x80010009,
> +};

I have a separate patch series that moves this enum to the
asm-generic portion of hyperv-tlfs.h because there's not a good way
to separate the arch neutral from arch dependent values.

> +
> +
> +union hv_x64_vp_execution_state {
> + __u16 as_uint16;
> + struct {
> + __u16 cpl:2;
> + __u16 cr0_pe:1;
> + __u16 cr0_am:1;
> + __u16 efer_lma:1;
> + __u16 debug_active:1;
> + __u16 interruption_pending:1;
> + __u16 vtl:4;
> + __u16 enclave_mode:1;
> + __u16 interrupt_shadow:1;
> + __u16 virtualization_fault_active:1;
> + __u16 reserved:2;
> + };
> +};
> +
> +/* Values for intercept_access_type field */
> +#define HV_INTERCEPT_ACCESS_READ 

RE: [RFC PATCH 10/18] virt/mshv: get and set vcpu registers ioctls

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:30 PM
> 
> Add ioctls for getting and setting virtual processor registers.
> 
> Co-developed-by: Lillian Grassin-Drake 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Nuno Das Neves 
> ---
>  Documentation/virt/mshv/api.rst |  11 +
>  arch/x86/include/uapi/asm/hyperv-tlfs.h | 601 
>  include/asm-generic/hyperv-tlfs.h   |  65 +--
>  include/linux/mshv.h|   1 +
>  include/uapi/linux/mshv.h   |  12 +
>  virt/mshv/mshv_main.c   | 258 +-
>  6 files changed, 903 insertions(+), 45 deletions(-)
> 
> diff --git a/Documentation/virt/mshv/api.rst b/Documentation/virt/mshv/api.rst
> index f997f49f8690..20a626ac02d4 100644
> --- a/Documentation/virt/mshv/api.rst
> +++ b/Documentation/virt/mshv/api.rst
> @@ -96,3 +96,14 @@ is backed by physical memory.
>  Create a virtual processor in a guest partition, returning a file descriptor 
> to
>  represent the vp and perform ioctls on.
> 
> +3.5 MSHV_GET_VP_REGISTERS and MSHV_SET_VP_REGISTERS
> +---
> +:Type: vp ioctl
> +:Parameters: struct mshv_vp_registers
> +:Returns: 0 on success
> +
> +Get/set vp registers. See asm/hyperv-tlfs.h for the complete set of 
> registers.
> +Includes general purpose platform registers, MSRs, and virtual registers that
> +are part of Microsoft Hypervisor platform and not directly exposed to the 
> guest.
> +
> +
> diff --git a/arch/x86/include/uapi/asm/hyperv-tlfs.h 
> b/arch/x86/include/uapi/asm/hyperv-
> tlfs.h
> index 72150c25ffe6..2ff655962738 100644
> --- a/arch/x86/include/uapi/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/uapi/asm/hyperv-tlfs.h
> @@ -121,4 +121,605 @@ struct hv_partition_creation_properties {
>   disabled_processor_xsave_features;
>  };
> 
> +enum hv_register_name {
> + /* Suspend Registers */
> + HV_REGISTER_EXPLICIT_SUSPEND= 0x,
> + HV_REGISTER_INTERCEPT_SUSPEND   = 0x0001,
> + HV_REGISTER_INSTRUCTION_EMULATION_HINTS = 0x0002,
> + HV_REGISTER_DISPATCH_SUSPEND= 0x0003,
> + HV_REGISTER_INTERNAL_ACTIVITY_STATE = 0x0004,
> +
> + /* Version */
> + HV_REGISTER_HYPERVISOR_VERSION  = 0x0100, /* 128-bit result same as 
> CPUID 0x4002 */
> +
> + /* Feature Access (registers are 128 bits) - same as CPUID 0x4003 - 
> 0x400B */
> + HV_REGISTER_PRIVILEGES_AND_FEATURES_INFO= 0x0200,
> + HV_REGISTER_FEATURES_INFO   = 0x0201,
> + HV_REGISTER_IMPLEMENTATION_LIMITS_INFO  = 0x0202,
> + HV_REGISTER_HARDWARE_FEATURES_INFO  = 0x0203,
> + HV_REGISTER_CPU_MANAGEMENT_FEATURES_INFO= 0x0204,
> + HV_REGISTER_SVM_FEATURES_INFO   = 0x0205,
> + HV_REGISTER_SKIP_LEVEL_FEATURES_INFO= 0x0206,
> + HV_REGISTER_NESTED_VIRT_FEATURES_INFO   = 0x0207,
> + HV_REGISTER_IPT_FEATURES_INFO   = 0x0208,
> +
> + /* Guest Crash Registers */
> + HV_REGISTER_GUEST_CRASH_P0  = 0x0210,
> + HV_REGISTER_GUEST_CRASH_P1  = 0x0211,
> + HV_REGISTER_GUEST_CRASH_P2  = 0x0212,
> + HV_REGISTER_GUEST_CRASH_P3  = 0x0213,
> + HV_REGISTER_GUEST_CRASH_P4  = 0x0214,
> + HV_REGISTER_GUEST_CRASH_CTL = 0x0215,
> +
> + /* Power State Configuration */
> + HV_REGISTER_POWER_STATE_CONFIG_C1   = 0x0220,
> + HV_REGISTER_POWER_STATE_TRIGGER_C1  = 0x0221,
> + HV_REGISTER_POWER_STATE_CONFIG_C2   = 0x0222,
> + HV_REGISTER_POWER_STATE_TRIGGER_C2  = 0x0223,
> + HV_REGISTER_POWER_STATE_CONFIG_C3   = 0x0224,
> + HV_REGISTER_POWER_STATE_TRIGGER_C3  = 0x0225,
> +
> + /* Frequency Registers */
> + HV_REGISTER_PROCESSOR_CLOCK_FREQUENCY   = 0x0240,
> + HV_REGISTER_INTERRUPT_CLOCK_FREQUENCY   = 0x0241,
> +
> + /* Idle Register */
> + HV_REGISTER_GUEST_IDLE  = 0x0250,
> +
> + /* Guest Debug */
> + HV_REGISTER_DEBUG_DEVICE_OPTIONS= 0x0260,
> +
> + /* Memory Zeroing Conrol Register */
> + HV_REGISTER_MEMORY_ZEROING_CONTROL  = 0x0270,
> +
> + /* Pending Event Register */
> + HV_REGISTER_PENDING_EVENT0  = 0x00010004,
> + HV_REGISTER_PENDING_EVENT1  = 0x00010005,
> +
> + /* Misc */
> + HV_REGISTER_VP_RUNTIME  = 0x0009,
> + HV_REGISTER_GUEST_OS_ID = 0x00090002,
> + HV_REGISTER_VP_INDEX= 0x00090003,
> + HV_REGISTER_TIME_REF_COUNT  = 0x00090004,
> + HV_REGISTER_CPU_MANAGEMENT_VERSION  = 0x00090007,
> + HV_REGISTER_VP_ASSIST_PAGE  = 0x00090013,
> + HV_REGISTER_VP_ROOT_SIGNAL_COUNT= 0x00090014,
> + HV_REGISTER_REFERENCE_TSC   = 0x00090017,
> +
> +

RE: [RFC PATCH 08/18] virt/mshv: map and unmap guest memory

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:30 PM
> 
> Introduce ioctls for mapping and unmapping regions of guest memory.
> 
> Uses a table of memory 'slots' similar to KVM, but the slot
> number is not visible to userspace.
> 
> For now, this simple implementation requires each new mapping to be
> disjoint - the underlying hypercalls have no such restriction, and
> implicitly overwrite any mappings on the pages in the specified regions.
> 
> Co-developed-by: Lillian Grassin-Drake 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Nuno Das Neves 
> ---
>  Documentation/virt/mshv/api.rst|  15 ++
>  include/asm-generic/hyperv-tlfs.h  |  15 ++
>  include/linux/mshv.h   |  14 ++
>  include/uapi/asm-generic/hyperv-tlfs.h |   9 +
>  include/uapi/linux/mshv.h  |  15 ++
>  virt/mshv/mshv_main.c  | 322 -
>  6 files changed, 388 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virt/mshv/api.rst b/Documentation/virt/mshv/api.rst
> index ce651a1738e0..530efc29d354 100644
> --- a/Documentation/virt/mshv/api.rst
> +++ b/Documentation/virt/mshv/api.rst
> @@ -72,3 +72,18 @@ it is open - this ioctl can only be called once per open.
>  This ioctl creates a guest partition, returning a file descriptor to use as a
>  handle for partition ioctls.
> 
> +3.3 MSHV_MAP_GUEST_MEMORY and MSHV_UNMAP_GUEST_MEMORY
> +-
> +:Type: partition ioctl
> +:Parameters: struct mshv_user_mem_region
> +:Returns: 0 on success
> +
> +Create a mapping from a region of process memory to a region of physical 
> memory
> +in a guest partition.

Just to be super explicit:

Create a mapping from memory in the user space of the calling process (running
in the root partition) to a region of guest physical memory in a guest 
partition.

> +
> +Mappings must be disjoint in process address space and guest address space.
> +
> +Note: In the current implementation, this memory is pinned to stop the pages
> +being moved by linux and subsequently clobbered by the hypervisor. So the 
> region
> +is backed by physical memory.

Again to be super explicit:

Note: In the current implementation, this memory is pinned to real physical
memory to stop the pages being moved by Linux in the root partition,
and subsequently being clobbered by the hypervisor.  So the region is backed
by real physical memory.

> +
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 2a49503b7396..6e5072e29897 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -149,6 +149,8 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_GET_PARTITION_ID  0x0046
>  #define HVCALL_DEPOSIT_MEMORY0x0048
>  #define HVCALL_WITHDRAW_MEMORY   0x0049
> +#define HVCALL_MAP_GPA_PAGES 0x004b
> +#define HVCALL_UNMAP_GPA_PAGES   0x004c
>  #define HVCALL_CREATE_VP 0x004e
>  #define HVCALL_GET_VP_REGISTERS  0x0050
>  #define HVCALL_SET_VP_REGISTERS  0x0051
> @@ -827,4 +829,17 @@ struct hv_delete_partition {
>   u64 partition_id;
>  };
> 
> +struct hv_map_gpa_pages {
> + u64 target_partition_id;
> + u64 target_gpa_base;
> + u32 map_flags;

Is there a reserved 32 bit field here?  Hyper-V always aligns
things on 64 bit boundaries.

> + u64 source_gpa_page_list[];
> +};
> +
> +struct hv_unmap_gpa_pages {
> + u64 target_partition_id;
> + u64 target_gpa_base;
> + u32 unmap_flags;

Is there a reserved 32 bit field here?  Hyper-V always aligns
things on 64 bit boundaries.

> +};

Add __packed to the above two structs after sorting out
the alignment issues.

> +
>  #endif
> diff --git a/include/linux/mshv.h b/include/linux/mshv.h
> index fc4f35089b2c..91a742f37440 100644
> --- a/include/linux/mshv.h
> +++ b/include/linux/mshv.h
> @@ -7,13 +7,27 @@
>   */
> 
>  #include 
> +#include 
>  #include 
> 
>  #define MSHV_MAX_PARTITIONS  128
> +#define MSHV_MAX_MEM_REGIONS 64
> +
> +struct mshv_mem_region {
> + u64 size; /* bytes */
> + u64 guest_pfn;
> + u64 userspace_addr; /* start of the userspace allocated memory */
> + struct page **pages;
> +};
> 
>  struct mshv_partition {
>   u64 id;
>   refcount_t ref_count;
> + struct mutex mutex;
> + struct {
> + u32 count;
> + struct mshv_mem_region slots[MSHV_MAX_MEM_REGIONS];
> + } regions;
>  };
> 
>  struct mshv {
> diff --git a/include/uapi/asm-generic/hyperv-tlfs.h 
> b/include/uapi/asm-generic/hyperv-
> tlfs.h
> index 7a858226a9c5..e7b09b9f00de 100644
> --- a/include/uapi/asm-generic/hyperv-tlfs.h
> +++ b/include/uapi/asm-generic/hyperv-tlfs.h
> @@ -12,4 +12,13 @@
>  #define HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED  BIT(4)
>  #define HV_PARTITION_CREATION_FLAG_

RE: [RFC PATCH 07/18] virt/mshv: withdraw memory hypercall

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:30 PM
> 
> Withdraw the memory from a finalized partition and free the pages.
> The partition is now cleaned up correctly when the fd is released.
> 
> Co-developed-by: Lillian Grassin-Drake 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Nuno Das Neves 
> ---
>  include/asm-generic/hyperv-tlfs.h | 10 ++
>  virt/mshv/mshv_main.c | 54 ++-
>  2 files changed, 63 insertions(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index ab6ae6c164f5..2a49503b7396 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -148,6 +148,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_DELETE_PARTITION  0x0043
>  #define HVCALL_GET_PARTITION_ID  0x0046
>  #define HVCALL_DEPOSIT_MEMORY0x0048
> +#define HVCALL_WITHDRAW_MEMORY   0x0049
>  #define HVCALL_CREATE_VP 0x004e
>  #define HVCALL_GET_VP_REGISTERS  0x0050
>  #define HVCALL_SET_VP_REGISTERS  0x0051
> @@ -472,6 +473,15 @@ union hv_proximity_domain_info {
>   u64 as_uint64;
>  };
> 
> +struct hv_withdraw_memory_in {
> + u64 partition_id;
> + union hv_proximity_domain_info proximity_domain_info;
> +};
> +
> +struct hv_withdraw_memory_out {
> + u64 gpa_page_list[0];

For a variable size array, the Linux kernel community has an effort
underway to replace occurrences of [0] and [1] with just [].  I think
[] can be used here.

> +};
> +

Add __packed to the above two structs.

>  struct hv_lp_startup_status {
>   u64 hv_status;
>   u64 substatus1;
> diff --git a/virt/mshv/mshv_main.c b/virt/mshv/mshv_main.c
> index c4130a6508e5..162a1bb42a4a 100644
> --- a/virt/mshv/mshv_main.c
> +++ b/virt/mshv/mshv_main.c
> @@ -14,6 +14,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -57,8 +58,58 @@ static struct miscdevice mshv_dev = {
>   .mode = 600,
>  };
> 
> +#define HV_WITHDRAW_BATCH_SIZE   (PAGE_SIZE / sizeof(u64))

Use HV_HYP_PAGE_SIZE so that we're explicit that the dependency
is on the page size used by Hyper-V, which might be different from the
guest page size (at least on architectures like ARM64).

>  #define HV_INIT_PARTITION_DEPOSIT_PAGES 208
> 
> +static int
> +hv_call_withdraw_memory(u64 count, int node, u64 partition_id)
> +{
> + struct hv_withdraw_memory_in *input_page;
> + struct hv_withdraw_memory_out *output_page;
> + u16 completed;
> + u64 hypercall_status;
> + unsigned long remaining = count;
> + int status;
> + int i;
> + unsigned long flags;
> +
> + while (remaining) {
> + local_irq_save(flags);
> +
> + input_page = (struct hv_withdraw_memory_in *)(*this_cpu_ptr(
> + hyperv_pcpu_input_arg));
> + output_page = (struct hv_withdraw_memory_out *)(*this_cpu_ptr(
> + hyperv_pcpu_output_arg));
> +
> + input_page->partition_id = partition_id;
> + input_page->proximity_domain_info.as_uint64 = 0;
> + hypercall_status = hv_do_rep_hypercall(
> + HVCALL_WITHDRAW_MEMORY,
> + min(remaining, HV_WITHDRAW_BATCH_SIZE), 0, input_page,
> + output_page);
> +
> + completed = (hypercall_status & HV_HYPERCALL_REP_COMP_MASK) >>
> + HV_HYPERCALL_REP_COMP_OFFSET;
> +
> + for (i = 0; i < completed; i++)
> + __free_page(pfn_to_page(output_page->gpa_page_list[i]));
> +
> + local_irq_restore(flags);

Seems like there's some risk that we have interrupts disabled for too long.
We could be calling __free_page() up to 512 times.  It might be better for this
function to allocate its own page to be used as the output page, so that 
interrupts
can be enabled immediately after the hypercall completes.  Then the 
__free_page()
loop can execute with interrupts enabled.   We have the per-cpu input and output
pages to avoid the overhead of allocating/freeing pages for each hypercall, but 
in this
case a private output page might be warranted.

> +
> + status = hypercall_status & HV_HYPERCALL_RESULT_MASK;
> + if (status != HV_STATUS_SUCCESS) {
> + if (status != HV_STATUS_NO_RESOURCES)
> + pr_err("%s: %s\n", __func__,
> +hv_status_to_string(status));
> + break;
> + }
> +
> + remaining -= completed;
> + }
> +
> + return -hv_status_to_errno(status);
> +}
> +
>  static int
>  hv_call_create_partition(
>   u64 flags,
> @@ -230,7 +281,8 @@ destroy_partition(struct mshv_partition *partition)
> 
>   /* Deallocates and unmaps everything inc

RE: [RFC PATCH 06/18] virt/mshv: create, initialize, finalize, delete partition hypercalls

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:30 PM
> 
> Add hypercalls for fully setting up and mostly tearing down a guest
> partition.
> The teardown operation will generate an error as the deposited
> memory has not been withdrawn.
> This is fixed in the next patch.
> 
> Co-developed-by: Lillian Grassin-Drake 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Nuno Das Neves 
> ---
>  include/asm-generic/hyperv-tlfs.h  |  52 +++-
>  include/uapi/asm-generic/hyperv-tlfs.h |   1 +
>  include/uapi/linux/mshv.h  |   1 +
>  virt/mshv/mshv_main.c  | 169 -
>  4 files changed, 220 insertions(+), 3 deletions(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 2ff580780ce4..ab6ae6c164f5 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -142,6 +142,10 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX0x0013
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
>  #define HVCALL_SEND_IPI_EX   0x0015
> +#define HVCALL_CREATE_PARTITION  0x0040
> +#define HVCALL_INITIALIZE_PARTITION  0x0041
> +#define HVCALL_FINALIZE_PARTITION0x0042
> +#define HVCALL_DELETE_PARTITION  0x0043
>  #define HVCALL_GET_PARTITION_ID  0x0046
>  #define HVCALL_DEPOSIT_MEMORY0x0048
>  #define HVCALL_CREATE_VP 0x004e
> @@ -451,7 +455,7 @@ struct hv_get_partition_id {
>  struct hv_deposit_memory {
>   u64 partition_id;
>   u64 gpa_page_list[];
> -} __packed;
> +};

Why remove __packed?

> 
>  struct hv_proximity_domain_flags {
>   u32 proximity_preferred : 1;
> @@ -767,4 +771,50 @@ struct hv_input_unmap_device_interrupt {
>  #define HV_SOURCE_SHADOW_NONE   0x0
>  #define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE   0x1
> 
> +#define HV_MAKE_COMPATIBILITY_VERSION(major_, minor_)
>   \
> + ((u32)((major_) << 8 | (minor_)))
> +
> +enum hv_compatibility_version {
> + HV_COMPATIBILITY_19_H1 = HV_MAKE_COMPATIBILITY_VERSION(0X6, 0X5),
> + HV_COMPATIBILITY_MANGANESE = HV_MAKE_COMPATIBILITY_VERSION(0X6, 0X7),

Avoid use of "Manganese", which is an internal code name.  I'd suggest calling 
it
20_H1 instead, which at least has some broader meaning.

> + HV_COMPATIBILITY_PRERELEASE = HV_MAKE_COMPATIBILITY_VERSION(0XFE, 0X0),
> + HV_COMPATIBILITY_EXPERIMENT = HV_MAKE_COMPATIBILITY_VERSION(0XFF, 0X0),
> +};
> +
> +union hv_partition_isolation_properties {
> + u64 as_uint64;
> + struct {
> + u64 isolation_type: 5;
> + u64 rsvd_z: 7;
> + u64 shared_gpa_boundary_page_number: 52;
> + };
> +};

Add __packed.

> +
> +/* Non-userspace-visible partition creation flags */
> +#define HV_PARTITION_CREATION_FLAG_EXO_PARTITIONBIT(8)
> +
> +struct hv_create_partition_in {
> + u64 flags;
> + union hv_proximity_domain_info proximity_domain_info;
> + enum hv_compatibility_version compatibility_version;

An "enum" is a 32 bit value in gcc and I would presume that
Hyper-V is expecting a 64 bit value.  In general, using an enum in a data
structure with exact layout requirements is problematic because the "C"
language doesn't specify how big an enum is.  In such cases, it's better
to use an integer field with an explicit size (like u64) and #defines for
the possible values.

> + struct hv_partition_creation_properties partition_creation_properties;
> + union hv_partition_isolation_properties isolation_properties;
> +};
> +
> +struct hv_create_partition_out {
> + u64 partition_id;
> +};
> +
> +struct hv_initialize_partition {
> + u64 partition_id;
> +};
> +
> +struct hv_finalize_partition {
> + u64 partition_id;
> +};
> +
> +struct hv_delete_partition {
> + u64 partition_id;
> +};

All of the above should have __packed for consistency with the other
Hyper-V data structures.

> +
>  #endif
> diff --git a/include/uapi/asm-generic/hyperv-tlfs.h 
> b/include/uapi/asm-generic/hyperv-
> tlfs.h
> index 140cc0b4f98f..7a858226a9c5 100644
> --- a/include/uapi/asm-generic/hyperv-tlfs.h
> +++ b/include/uapi/asm-generic/hyperv-tlfs.h
> @@ -6,6 +6,7 @@
>  #define BIT(X)   (1ULL << (X))
>  #endif
> 
> +/* Userspace-visible partition creation flags */

Could this comment be included in the earlier patch with the #defines so
that you avoid the trivial change here?

>  #define HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUESTBIT(0)
>  #define HV_PARTITION_CREATION_FLAG_GPA_LARGE_PAGES_DISABLED BIT(3)
>  #define HV_PARTITION_CREATION_FLAG_GPA_SUPER_PAGES_ENABLED  BIT(4)
> diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h
> index 3788f8bc5caa..4f8da9a6fde2 100644
> --- a/include/uapi/linux/mshv.h
> +++ b/include/uapi/linux/mshv.h
> @@ -9,6 +9,7 @@
>

RE: [RFC PATCH 04/18] virt/mshv: request version ioctl

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:30 PM
> 
> Reserve ioctl number in userpsace-api/ioctl/ioctl-number.rst
> Introduce MSHV_REQUEST_VERSION ioctl.
> Introduce documentation for /dev/mshv in Documentation/virt/mshv
> 
> Signed-off-by: Nuno Das Neves 
> ---
>  .../userspace-api/ioctl/ioctl-number.rst  |  2 +
>  Documentation/virt/mshv/api.rst   | 62 +++
>  include/linux/mshv.h  | 11 
>  include/uapi/linux/mshv.h | 19 ++
>  virt/mshv/mshv_main.c | 49 +++
>  5 files changed, 143 insertions(+)
>  create mode 100644 Documentation/virt/mshv/api.rst
>  create mode 100644 include/linux/mshv.h
>  create mode 100644 include/uapi/linux/mshv.h
> 
> diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst
> b/Documentation/userspace-api/ioctl/ioctl-number.rst
> index 55a2d9b2ce33..13a4d3ecafca 100644
> --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> @@ -343,6 +343,8 @@ Code  Seq#Include File
>Comments
>  0xB5  00-0F  uapi/linux/rpmsg.h  
>  remotep...@vger.kernel.org>
>  0xB6  alllinux/fpga-dfl.h
>  0xB7  alluapi/linux/remoteproc_cdev.h
>  remotep...@vger.kernel.org>
> +0xB8  alluapi/linux/mshv.h   
> Microsoft Hypervisor root partition APIs
> + 
> 
>  0xC0  00-0F  linux/usb/iowarrior.h
>  0xCA  00-0F  uapi/misc/cxl.h
>  0xCA  10-2F  uapi/misc/ocxl.h
> diff --git a/Documentation/virt/mshv/api.rst b/Documentation/virt/mshv/api.rst
> new file mode 100644
> index ..82e32de48d03
> --- /dev/null
> +++ b/Documentation/virt/mshv/api.rst
> @@ -0,0 +1,62 @@
> +.. SPDX-License-Identifier: GPL-2.0
> +
> +=
> +Microsoft Hypervisor Root Partition API Documentation
> +=
> +
> +1. Overview
> +===
> +
> +This document describes APIs for creating and managing guest virtual machines
> +when running Linux as the root partition on the Microsoft Hypervisor.
> +
> +This API is not yet stable.
> +
> +2. Glossary/Terms
> +=
> +
> +hv
> +--
> +Short for Hyper-V. This name is used in the kernel to describe interfaces to
> +the Microsoft Hypervisor.
> +
> +mshv
> +
> +Short for Microsoft Hypervisor. This is the name of the userland API module
> +described in this document.
> +
> +Partition
> +-
> +A virtual machine running on the Microsoft Hypervisor.
> +
> +Root Partition
> +--
> +The partition that is created and assumes control when the machine boots. The
> +root partition can use mshv APIs to create guest partitions.
> +
> +3. API description
> +==
> +
> +The module is named mshv and can be configured with CONFIG_HYPERV_ROOT_API.
> +
> +Mshv is file descriptor-based, following a similar pattern to KVM.
> +
> +To get a handle to the mshv driver, use open("/dev/mshv").
> +
> +3.1 MSHV_REQUEST_VERSION
> +
> +:Type: /dev/mshv ioctl
> +:Parameters: pointer to a u32
> +:Returns: 0 on success
> +
> +Before issuing any other ioctls, a MSHV_REQUEST_VERSION ioctl must be called 
> to
> +establish the interface version with the kernel module.
> +
> +The caller should pass the MSHV_VERSION as an argument.
> +
> +The kernel module will check which interface versions it supports and return > 0
> +if one of them matches.
> +
> +This /dev/mshv file descriptor will remain 'locked' to that version as long 
> as
> +it is open - this ioctl can only be called once per open.

To clarify the wording:

The caller should pass the requested version as an argument.  If the requested
version is one that the kernel module supports, the ioctl will return 0.  If the
requested version is not supported by the kernel module, the caller may try
the ioctl repeatedly to find a version that the caller supports and that the 
kernel
module supports.   Once a match is found, the /dev/mshv file descriptor is
'locked' to that version as long as it is open; i.e., the ioctl can succeed
only once per open.

> +
> diff --git a/include/linux/mshv.h b/include/linux/mshv.h
> new file mode 100644
> index ..a0982fe2c0b8
> --- /dev/null
> +++ b/include/linux/mshv.h
> @@ -0,0 +1,11 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +#ifndef _LINUX_MSHV_H
> +#define _LINUX_MSHV_H
> +
> +/*
> + * Microsoft Hypervisor root partition driver for /dev/mshv
> + */
> +
> +#include 
> +
> +#endif
> diff --git a/include/uapi/linux/mshv.h b/include/uapi/linux/mshv.h
> new file mode 100644
> index ..dd30fc2f0a80
> --- /dev/null
> +++ b/include/uapi/linux/mshv.h
> @@ -0,0 +1,19

RE: [RFC PATCH 00/18] Microsoft Hypervisor root partition ioctl interface

2021-02-08 Thread Michael Kelley via Virtualization
From: Nuno Das Neves  Sent: Friday, November 
20, 2020 4:30 PM
> 
> This patch series provides a userspace interface for creating and running 
> guest
> virtual machines while running on the Microsoft Hypervisor [0].
> 
> Since managing guest machines can only be done when Linux is the root 
> partition,
> this series depends on the RFC already posted by Wei Liu:
> https://lore.kernel.org/linux-hyperv/20201105165814.29233-1-wei@kernel.org/T/#t
> 
> The first two patches provide some helpers for converting hypervisor status
> codes to linux error codes, and easily printing hypervisor status codes to 
> dmesg
> for debugging.
> 
> Hyper-V related headers asm-generic/hyperv-tlfs.h and x86/asm/hyperv-tlfs.h 
> are
> split into uapi and non-uapi. The uapi versions contain structures used in 
> both
> the ioctl interface and the kernel.
> 
> The mshv API is introduced in virt/mshv/mshv_main.c. As each interface is
> introduced, documentation is added in Documentation/virt/mshv/api.rst.
> The API is file-desciptor based, like KVM. The entry point is /dev/mshv.
> 
> /dev/mshv ioctls:
> MSHV_REQUEST_VERSION
> MSHV_CREATE_PARTITION
> 
> Partition (vm) ioctls:
> MSHV_MAP_GUEST_MEMORY, MSHV_UNMAP_GUEST_MEMORY
> MSHV_INSTALL_INTERCEPT
> MSHV_ASSERT_INTERRUPT
> MSHV_GET_PARTITION_PROPERTY, MSHV_SET_PARTITION_PROPERTY
> MSHV_CREATE_VP
> 
> Vp (vcpu) ioctls:
> MSHV_GET_VP_REGISTERS, MSHV_SET_VP_REGISTERS
> MSHV_RUN_VP
> MSHV_GET_VP_STATE, MSHV_SET_VP_STATE
> mmap() (register page)
> 
> [0] Hyper-V is more well-known, but it really refers to the whole stack
> including the hypervisor and other components that run in Windows kernel
> and userspace.
> 
> Nuno Das Neves (18):
>   x86/hyperv: convert hyperv statuses to linux error codes
>   asm-generic/hyperv: convert hyperv statuses to strings
>   virt/mshv: minimal mshv module (/dev/mshv/)
>   virt/mshv: request version ioctl
>   virt/mshv: create partition ioctl
>   virt/mshv: create, initialize, finalize, delete partition hypercalls
>   virt/mshv: withdraw memory hypercall
>   virt/mshv: map and unmap guest memory
>   virt/mshv: create vcpu ioctl
>   virt/mshv: get and set vcpu registers ioctls
>   virt/mshv: set up synic pages for intercept messages
>   virt/mshv: run vp ioctl and isr
>   virt/mshv: install intercept ioctl
>   virt/mshv: assert interrupt ioctl
>   virt/mshv: get and set vp state ioctls
>   virt/mshv: mmap vp register page
>   virt/mshv: get and set partition property ioctls
>   virt/mshv: Add enlightenment bits to create partition ioctl
> 
>  .../userspace-api/ioctl/ioctl-number.rst  |2 +
>  Documentation/virt/mshv/api.rst   |  173 ++
>  arch/x86/Kconfig  |2 +
>  arch/x86/hyperv/Kconfig   |   22 +
>  arch/x86/hyperv/Makefile  |4 +
>  arch/x86/hyperv/hv_init.c |2 +-
>  arch/x86/hyperv/hv_proc.c |   40 +-
>  arch/x86/include/asm/hyperv-tlfs.h|   44 +-
>  arch/x86/include/asm/mshyperv.h   |1 +
>  arch/x86/include/uapi/asm/hyperv-tlfs.h   | 1312 +++
>  arch/x86/kernel/cpu/mshyperv.c|   16 +
>  include/asm-generic/hyperv-tlfs.h |  324 ++-
>  include/asm-generic/mshyperv.h|3 +
>  include/linux/mshv.h  |   61 +
>  include/uapi/asm-generic/hyperv-tlfs.h|  160 ++
>  include/uapi/linux/mshv.h |  109 +
>  virt/mshv/mshv_main.c | 2054 +
>  17 files changed, 4178 insertions(+), 151 deletions(-)
>  create mode 100644 Documentation/virt/mshv/api.rst
>  create mode 100644 arch/x86/hyperv/Kconfig
>  create mode 100644 arch/x86/include/uapi/asm/hyperv-tlfs.h
>  create mode 100644 include/linux/mshv.h
>  create mode 100644 include/uapi/asm-generic/hyperv-tlfs.h
>  create mode 100644 include/uapi/linux/mshv.h
>  create mode 100644 virt/mshv/mshv_main.c
> 
> --
> 2.25.1

I finally made it through reviewing this patch series.  Nice
work -- to you, and to Lillian as the original author of significant
portions!  There's a lot code, but it is well organized for reviewing
and overall is done well.

I have a three general comments:

1) Historically we have very precisely specified the layout of data
structures that are shared with Hyper-V.  Each field has an explicit
width (i.e., u16, u32, u64, etc.) and we have avoided field types that
lack an explicit width (int, enum, bool, etc.).  These patches make
liberal use of enum types in the Hyper-V data structures, and I saw
one occurrence of bool.  While treating enum and bool as 32 bits
works, I have a concern that such specifications aren't consistent
with the original rigor we tried to use.

Related, there are several places where the proper layout depends
on the compiler inserting padding (and not inserting padding in the
wrong places) to achieve the needed alignment.  In my view, we
should be explicitly a

RE: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Thursday, February 4, 2021 9:57 AM
> 
> On Thu, Feb 04, 2021 at 05:43:16PM +, Michael Kelley wrote:
> [...]
> > >  remove_cpuhp_state:
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > new file mode 100644
> > > index ..117f17e8c88a
> > > --- /dev/null
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -0,0 +1,362 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +
> > > +/*
> > > + * for Linux to run as the root partition on Microsoft Hypervisor.
> >
> > Nit:  Looks like the initial word "Irqdomain" got dropped from the above
> > comment line.  But don't respin just for this.
> >
> 
> I've added it back. Thanks.
> 
> > > +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> > > + int cpu, int vector, struct hv_interrupt_entry *entry)
> > > +{
> > > + struct hv_input_map_device_interrupt *input;
> > > + struct hv_output_map_device_interrupt *output;
> > > + struct hv_device_interrupt_descriptor *intr_desc;
> > > + unsigned long flags;
> > > + u64 status;
> > > + cpumask_t mask = CPU_MASK_NONE;
> > > + int nr_bank, var_size;
> > > +
> > > + local_irq_save(flags);
> > > +
> > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > +
> > > + intr_desc = &input->interrupt_descriptor;
> > > + memset(input, 0, sizeof(*input));
> > > + input->partition_id = hv_current_partition_id;
> > > + input->device_id = device_id.as_uint64;
> > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > + intr_desc->vector_count = 1;
> > > + intr_desc->target.vector = vector;
> > > +
> > > + if (level)
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > + else
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > + cpumask_set_cpu(cpu, &mask);
> > > + intr_desc->target.vp_set.valid_bank_mask = 0;
> > > + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> > > + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> >
> > There's a function get_cpu_mask() that returns a pointer to a cpumask with 
> > only
> > the specified cpu set in the mask.  It returns a const pointer to the 
> > correct entry
> > in a pre-allocated array of all such cpumasks, so it's a lot more efficient 
> > than
> > allocating and initializing a local cpumask instance on the stack.
> >
> 
> That's nice.
> 
> I've got the following diff to fix both issues. If you're happy with the
> changes, can you give your Reviewed-by? That saves a round of posting.
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 0cabc9aece38..fa71db798465 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
> 
>  /*
> - * for Linux to run as the root partition on Microsoft Hypervisor.
> + * Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
>   *
>   * Authors:
>   *  Sunil Muthuswamy 
> @@ -20,7 +20,7 @@ static int hv_map_interrupt(union hv_device_id device_id, 
> bool level,
> struct hv_device_interrupt_descriptor *intr_desc;
> unsigned long flags;
> u64 status;
> -   cpumask_t mask = CPU_MASK_NONE;
> +   const cpumask_t *mask;
> int nr_bank, var_size;
> 
> local_irq_save(flags);
> @@ -41,10 +41,10 @@ static int hv_map_interrupt(union hv_device_id device_id, 
> bool
> level,
> else
> intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> 
> -   cpumask_set_cpu(cpu, &mask);
> +   mask = cpumask_of(cpu);
> intr_desc->target.vp_set.valid_bank_mask = 0;
> intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> -   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);
> +   nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), mask);

Can you just do the following and get rid of the 'mask' local entirely?

nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), cpumask_of(cpu));

Either way,

Reviewed-by: Michael Kelley 

> if (nr_bank < 0) {
> local_irq_restore(flags);
> pr_err("%s: unable to generate VP set\n", __func__);
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 15/16] x86/hyperv: implement an MSI domain for root partition

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:05 AM
> 
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped.  Linux will need to explicitly map and unmap
> interrupts for hardware.
> 
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
> 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Use u64 status.
> 2. Use vpset instead of bitmap.
> 3. Factor out hv_map_interrupt
> 4. Address other misc comments.
> 
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.
> v2: This patch is simplified due to upstream changes.
> ---
>  arch/x86/hyperv/Makefile|   2 +-
>  arch/x86/hyperv/hv_init.c   |   9 +
>  arch/x86/hyperv/irqdomain.c | 362 
>  arch/x86/include/asm/mshyperv.h |   2 +
>  4 files changed, 374 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/irqdomain.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 11c5997691f4..894ce899f0cb 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -483,6 +483,15 @@ void __init hyperv_init(void)
> 
>   BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> 
> +#ifdef CONFIG_PCI_MSI
> + /*
> +  * If we're running as root, we want to create our own PCI MSI domain.
> +  * We can't set this in hv_pci_init because that would be too late.
> +  */
> + if (hv_root_partition)
> + x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
>   return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index ..117f17e8c88a
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,362 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +/*
> + * for Linux to run as the root partition on Microsoft Hypervisor.

Nit:  Looks like the initial word "Irqdomain" got dropped from the above
comment line.  But don't respin just for this.

> + *
> + * Authors:
> + *  Sunil Muthuswamy 
> + *  Wei Liu 
> + */
> +
> +#include 
> +#include 
> +#include 
> +
> +static int hv_map_interrupt(union hv_device_id device_id, bool level,
> + int cpu, int vector, struct hv_interrupt_entry *entry)
> +{
> + struct hv_input_map_device_interrupt *input;
> + struct hv_output_map_device_interrupt *output;
> + struct hv_device_interrupt_descriptor *intr_desc;
> + unsigned long flags;
> + u64 status;
> + cpumask_t mask = CPU_MASK_NONE;
> + int nr_bank, var_size;
> +
> + local_irq_save(flags);
> +
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> +
> + intr_desc = &input->interrupt_descriptor;
> + memset(input, 0, sizeof(*input));
> + input->partition_id = hv_current_partition_id;
> + input->device_id = device_id.as_uint64;
> + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> + intr_desc->vector_count = 1;
> + intr_desc->target.vector = vector;
> +
> + if (level)
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> + else
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> + cpumask_set_cpu(cpu, &mask);
> + intr_desc->target.vp_set.valid_bank_mask = 0;
> + intr_desc->target.vp_set.format = HV_GENERIC_SET_SPARSE_4K;
> + nr_bank = cpumask_to_vpset(&(intr_desc->target.vp_set), &mask);

There's a function get_cpu_mask() that returns a pointer to a cpumask with only
the specified cpu set in the mask.  It returns a const pointer to the correct 
entry
in a pre-allocated array of all such cpumasks, so it's a lot more efficient than
allocating and initializing a local cpumask instance on the stack.

> + if (nr_bank < 0) {
> + local_irq_restore(flags);
> + pr_err("%s: unable to generate VP set\n", __func__);
> + return EINVAL;
> + }
> + intr_desc->target.flags = HV_DEVICE_INTERRUPT_TARGET_PROCESSOR_SET;
> +
> + /*
> +  * var-sized hypercall, var-size starts after vp_mask (thus
> +  * vp_set.format does not count, but vp_set.valid_bank_mask
> +  * does).
> +  */
> + var_size = nr_bank + 1;
> +
> + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, var_size,
> + input, output);

RE: [PATCH v6 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:05 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Simplify code due to changes in a previous patch.
> ---
>  arch/x86/hyperv/irqdomain.c |  25 +
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 177 +++-
>  3 files changed, 203 insertions(+), 3 deletions(-)
> 

Reviewed-by: Michael Kelley 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:05 AM
> 
> We will need to identify the device we want Microsoft Hypervisor to
> manipulate.  Introduce the data structures for that purpose.
> 
> They will be used in a later patch.
> 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Add reserved0 as field name.
> ---
>  include/asm-generic/hyperv-tlfs.h | 79 +++
>  1 file changed, 79 insertions(+)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 94c7d77bbf68..ce53c0db28ae 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
>   } element[];
>  } __packed;
> 
> +enum hv_device_type {
> + HV_DEVICE_TYPE_LOGICAL = 0,
> + HV_DEVICE_TYPE_PCI = 1,
> + HV_DEVICE_TYPE_IOAPIC = 2,
> + HV_DEVICE_TYPE_ACPI = 3,
> +};
> +
> +typedef u16 hv_pci_rid;
> +typedef u16 hv_pci_segment;
> +typedef u64 hv_logical_device_id;
> +union hv_pci_bdf {
> + u16 as_uint16;
> +
> + struct {
> + u8 function:3;
> + u8 device:5;
> + u8 bus;
> + };
> +} __packed;
> +
> +union hv_pci_bus_range {
> + u16 as_uint16;
> +
> + struct {
> + u8 subordinate_bus;
> + u8 secondary_bus;
> + };
> +} __packed;
> +
> +union hv_device_id {
> + u64 as_uint64;
> +
> + struct {
> + u64 reserved0:62;
> + u64 device_type:2;
> + };
> +
> + /* HV_DEVICE_TYPE_LOGICAL */
> + struct {
> + u64 id:62;
> + u64 device_type:2;
> + } logical;
> +
> + /* HV_DEVICE_TYPE_PCI */
> + struct {
> + union {
> + hv_pci_rid rid;
> + union hv_pci_bdf bdf;
> + };
> +
> + hv_pci_segment segment;
> + union hv_pci_bus_range shadow_bus_range;
> +
> + u16 phantom_function_bits:2;
> + u16 source_shadow:1;
> +
> + u16 rsvdz0:11;
> + u16 device_type:2;
> + } pci;
> +
> + /* HV_DEVICE_TYPE_IOAPIC */
> + struct {
> + u8 ioapic_id;
> + u8 rsvdz0;
> + u16 rsvdz1;
> + u16 rsvdz2;
> +
> + u16 rsvdz3:14;
> + u16 device_type:2;
> + } ioapic;
> +
> + /* HV_DEVICE_TYPE_ACPI */
> + struct {
> + u32 input_mapping_base;
> + u32 input_mapping_count:30;
> + u32 device_type:2;
> + } acpi;
> +} __packed;
> +
>  #endif
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 09/16] x86/hyperv: provide a bunch of helper functions

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:04 AM
> 
> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: Nuno Das Neves 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Co-Developed-by: Nuno Das Neves 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Address Michael's comments.
> 
> v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.
> 
> v3:
> 1. Add __packed to structures.
> 2. Drop unnecessary exports.
> 
> v2:
> 1. Adapt to hypervisor side changes
> 2. Address Vitaly's comments
> 
> use u64 status
> 
> pages
> 
> major comments
> 
> minor comments
> 
> rely on acpi code
> ---
>  arch/x86/hyperv/Makefile  |   2 +-
>  arch/x86/hyperv/hv_proc.c | 219 ++
>  arch/x86/include/asm/mshyperv.h   |   4 +
>  include/asm-generic/hyperv-tlfs.h |  67 +
>  4 files changed, 291 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/hv_proc.c
> 

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 08/16] ACPI / NUMA: add a stub function for node_to_pxm()

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:04 AM
> 
> There is already a stub function for pxm_to_node but conversion to the
> other direction is missing.
> 
> It will be used by Microsoft Hypervisor code later.
> 
> Signed-off-by: Wei Liu 
> ---
> v6: new
> ---
>  include/acpi/acpi_numa.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/acpi/acpi_numa.h b/include/acpi/acpi_numa.h
> index a4c6ef809e27..40a91ce87e04 100644
> --- a/include/acpi/acpi_numa.h
> +++ b/include/acpi/acpi_numa.h
> @@ -30,6 +30,10 @@ static inline int pxm_to_node(int pxm)
>  {
>   return 0;
>  }
> +static inline int node_to_pxm(int node)
> +{
> + return 0;
> +}
>  #endif   /* CONFIG_ACPI_NUMA */
> 
>  #ifdef CONFIG_ACPI_HMAT
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 06/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:04 AM
> 
> We will need the partition ID for executing some hypercalls later.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> v6:
> 1. Use u64 status.
> 
> v3:
> 1. Make hv_get_partition_id static.
> 2. Change code structure a bit.
> ---
>  arch/x86/hyperv/hv_init.c | 26 ++
>  arch/x86/include/asm/mshyperv.h   |  2 ++
>  include/asm-generic/hyperv-tlfs.h |  6 ++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6f4cb40e53fe..5b90a7290177 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -26,6 +26,9 @@
>  #include 
>  #include 
> 
> +u64 hv_current_partition_id = ~0ull;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> @@ -331,6 +334,24 @@ static struct syscore_ops hv_syscore_ops = {
>   .resume = hv_resume,
>  };
> 
> +static void __init hv_get_partition_id(void)
> +{
> + struct hv_get_partition_id *output_page;
> + u64 status;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
> + if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
> + /* No point in proceeding if this failed */
> + pr_err("Failed to get partition ID: %lld\n", status);
> + BUG();
> + }
> + hv_current_partition_id = output_page->partition_id;
> + local_irq_restore(flags);
> +}
> +
>  /*
>   * This function is to be invoked early in the boot sequence after the
>   * hypervisor has been detected.
> @@ -426,6 +447,11 @@ void __init hyperv_init(void)
> 
>   register_syscore_ops(&hv_syscore_ops);
> 
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> + hv_get_partition_id();
> +
> + BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> +
>   return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 62d9390f1ddf..67f5d35a73d3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
>  extern void  __percpu  **hyperv_pcpu_output_arg;
> 
> +extern u64 hv_current_partition_id;
> +
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>   u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX0x0013
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
>  #define HVCALL_SEND_IPI_EX   0x0015
> +#define HVCALL_GET_PARTITION_ID  0x0046
>  #define HVCALL_GET_VP_REGISTERS  0x0050
>  #define HVCALL_SET_VP_REGISTERS  0x0051
>  #define HVCALL_POST_MESSAGE  0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
>   u64 gva_list[];
>  } __packed;
> 
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> + u64 partition_id;
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>   u64 as_uint64;
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 05/16] x86/hyperv: allocate output arg pages if required

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:04 AM
> 
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
> 
> Allocate pages for storing results when Linux runs as the root
> partition.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Lillian Grassin-Drake 
> Signed-off-by: Wei Liu 
> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
>  arch/x86/hyperv/hv_init.c   | 35 -
>  arch/x86/include/asm/mshyperv.h |  1 +
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>  void  __percpu **hyperv_pcpu_input_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> 
> +void  __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> 
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
>   void **input_arg;
>   struct page *pg;
> 
> - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>   /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 
> hv_root_partition ? 1 : 0);
>   if (unlikely(!pg))
>   return -ENOMEM;
> +
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>   *input_arg = page_address(pg);
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(pg + 1);
> + }
> 
>   hv_get_vp_index(msr_vp_index);
> 
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
>   unsigned int new_cpu;
>   unsigned long flags;
>   void **input_arg;
> - void *input_pg = NULL;
> + void *pg;
> 
>   local_irq_save(flags);
>   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - input_pg = *input_arg;
> + pg = *input_arg;
>   *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = NULL;
> + }
> +
>   local_irq_restore(flags);
> - free_page((unsigned long)input_pg);
> +
> + free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
> 
>   if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>   wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
> 
>   BUG_ON(hyperv_pcpu_input_arg == NULL);
> 
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void *);
> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
>   /* Allocate percpu VP index */
>   hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>   GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
> +extern void  __percpu  **hyperv_pcpu_output_arg;
> 
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v6 02/16] x86/hyperv: detect if Linux is the root partition

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 7:04 AM
> 
> For now we can use the privilege flag to check. Stash the value to be
> used later.
> 
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
> 
> Signed-off-by: Wei Liu 
> Reviewed-by: Pavel Tatashin 
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 10 ++
>  arch/x86/include/asm/mshyperv.h|  2 ++
>  arch/x86/kernel/cpu/mshyperv.c | 20 
>  3 files changed, 32 insertions(+)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
>  #define HYPERV_CPUID_FEATURES0x4003
>  #define HYPERV_CPUID_ENLIGHTMENT_INFO0x4004
>  #define HYPERV_CPUID_IMPLEMENT_LIMITS0x4005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x4007
>  #define HYPERV_CPUID_NESTED_FEATURES 0x400A
> 
>  #define HYPERV_CPUID_VIRT_STACK_INTERFACE0x4081
> @@ -110,6 +111,15 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED  BIT(14)
> 
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR   BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNC  BIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT BIT(31)
> +
>  /*
>   * Virtual processor will never share a physical core with another virtual
>   * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
>   struct hv_guest_mapping_flush_list *flush,
>   u64 start_gfn, u64 end_gfn);
> 
> +extern bool hv_root_partition;
> +
>  #ifdef CONFIG_X86_64
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
>  #include 
>  #include 
> 
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
> 
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
>   pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
>ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
> 
> + /*
> +  * Check CPU management privilege.
> +  *
> +  * To mirror what Windows does we should extract CPU management
> +  * features and use the ReservedIdentityBit to detect if Linux is the
> +  * root partition. But that requires negotiating CPU management
> +  * interface (a process to be finalized).
> +  *
> +  * For now, use the privilege flag as the indicator for running as
> +  * root.
> +  */
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {
> + hv_root_partition = true;
> + pr_info("Hyper-V: running as root partition\n");
> + }
> +
>   /*
>* Extract host information.
>*/
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 6:09 AM
> 
> On Wed, Feb 03, 2021 at 02:49:53PM +0100, Arnd Bergmann wrote:
> > On Wed, Feb 3, 2021 at 2:26 PM Wei Liu  wrote:
> > > On Tue, Feb 02, 2021 at 05:02:48PM +, Wei Liu wrote:
> > > > On Tue, Jan 26, 2021 at 01:26:52AM +, Michael Kelley wrote:
> > > > > From: Wei Liu  Sent: Wednesday, January 20, 2021 
> > > > > 4:01 AM
> > > > > > +union hv_device_id {
> > > > > > + u64 as_uint64;
> > > > > > +
> > > > > > + struct {
> > > > > > + u64 :62;
> > > > > > + u64 device_type:2;
> > > > > > + };
> > > > >
> > > > > Are the above 4 lines extraneous junk?
> > > > > If not, a comment would be helpful.  And we
> > > > > would normally label the 62 bit field as
> > > > > "reserved0" or something similar.
> > > > >
> > > >
> > > > No. It is not junk. I got this from a header in tree.
> > > >
> > > > I am inclined to just drop this hunk. If that breaks things, I will use
> > > > "reserved0".
> > > >
> > >
> > > It turns out adding reserved0 is required. Dropping this hunk does not
> > > work.
> >
> > Generally speaking, bitfields are not great for specifying binary 
> > interfaces,
> > as the actual bit order can differ by architecture. The normal way we get
> > around it in the kernel is to use basic integer types and define macros
> > for bit masks. Ideally, each such field should also be marked with a
> > particular endianess as __le64 or __be64, in case this is ever used with
> > an Arm guest running a big-endian kernel.
> 
> Thanks for the information.
> 
> I think we will need to wait until Microsoft Hypervisor clearly defines
> the endianess in its header(s) before we can make changes to the copy in
> Linux.
> 
> >
> > That said, if you do not care about the specific order of the bits, having
> > anonymous bitfields for the reserved members is fine, I don't see a
> > reason to name it as reserved.
> 
> Michael, let me know what you think. I'm not too fussed either way.
> 
> Wei.

I'm OK either way.  In the Hyper-V code we've typically given such
fields a name rather than leave them anonymous, which is why it stuck
out.

Michael

> 
> >
> >   Arnd
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, February 3, 2021 4:47 AM
> 
> On Wed, Jan 27, 2021 at 05:47:08AM +, Michael Kelley wrote:
> > From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> > > Hypervisor when Linux runs as the root partition. Implement an IRQ
> > > domain to handle mapping and unmapping of IO-APIC interrupts.
> > >
> > > Signed-off-by: Wei Liu 
> > > ---
> > >  arch/x86/hyperv/irqdomain.c |  54 ++
> > >  arch/x86/include/asm/mshyperv.h |   4 +
> > >  drivers/iommu/hyperv-iommu.c| 179 +++-
> > >  3 files changed, 233 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> > > index 19637cd60231..8e2b4e478b70 100644
> > > --- a/arch/x86/hyperv/irqdomain.c
> > > +++ b/arch/x86/hyperv/irqdomain.c
> > > @@ -330,3 +330,57 @@ struct irq_domain * __init 
> > > hv_create_pci_msi_domain(void)
> > >  }
> > >
> > >  #endif /* CONFIG_PCI_MSI */
> > > +
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > > *entry)
> > > +{
> > > + union hv_device_id device_id;
> > > +
> > > + device_id.as_uint64 = 0;
> > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > + return hv_unmap_interrupt(device_id.as_uint64, entry) &
> HV_HYPERCALL_RESULT_MASK;
> >
> > The masking is already done in hv_unmap_interrupt.
> 
> Fixed.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> > > +
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > > vector,
> > > + struct hv_interrupt_entry *entry)
> > > +{
> > > + unsigned long flags;
> > > + struct hv_input_map_device_interrupt *input;
> > > + struct hv_output_map_device_interrupt *output;
> > > + union hv_device_id device_id;
> > > + struct hv_device_interrupt_descriptor *intr_desc;
> > > + u16 status;
> > > +
> > > + device_id.as_uint64 = 0;
> > > + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> > > + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> > > +
> > > + local_irq_save(flags);
> > > + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> > > + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > + memset(input, 0, sizeof(*input));
> > > + intr_desc = &input->interrupt_descriptor;
> > > + input->partition_id = hv_current_partition_id;
> > > + input->device_id = device_id.as_uint64;
> > > + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> > > + intr_desc->target.vector = vector;
> > > + intr_desc->vector_count = 1;
> > > +
> > > + if (level)
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> > > + else
> > > + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> > > +
> > > + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> > > +
> > > + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input,
> output) &
> > > +  HV_HYPERCALL_RESULT_MASK;
> > > + local_irq_restore(flags);
> > > +
> > > + *entry = output->interrupt_entry;
> > > +
> > > + return status;
> >
> > As a cross-check, I was comparing this code against hv_map_msi_interrupt(). 
> >  They are
> > mostly parallel, though some of the assignments are done in a different 
> > order.  It's a nit,
> > but making them as parallel as possible would be nice. :-)
> >
> 
> Indeed. I will see about factoring out a function.

If factoring out a separate helper function is clumsy, just having the parallel 
code
in the two functions be as similar as possible makes it easier to see what's the
same and what's different.

> 
> > Same 64 vCPU comment applies here as well.
> >
> 
> This is changed to use vpset instead. Took me a bit of time to get it
> working because document is a bit lacking.
> 
> >
> > > +}
> > > +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> > > diff --git a/arch/x86/include/asm/mshyperv.h 
> > > b/arch/x86/include/asm/mshyperv.h
> > > index ccc849e25d5e..345d7c6f8c37 100644
> > > --- a/arch/x86/include/asm/mshyperv.h
> > > +++ b/arch/x86/include/asm/mshyperv.h
> > > @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> > > hv_msi_entry *msi_entry,
> > >
> > >  struct irq_domain *hv_create_pci_msi_domain(void);
> > >
> > > +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int 
> > > vector,
> > > + struct hv_interrupt_entry *entry);
> > > +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> > > *entry);
> > > +
> > >  #else /* CONFIG_HYPERV */
> > >  static inline void hyperv_init(void) {}
> > >  static inline void hyperv_setup_mmu_ops(void) {}
> > > diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> > > index b7db6024e65c..6d35e4c303c6 100644
> > > --- a/drivers/iommu/hyperv-iommu.c
> > > +++ b/drivers/iommu/hyperv-iommu.c
> > > @@ -116,30 +116,43 @@ static const struct irq_domain_ops 
> > > hyperv_ir_domain_o

RE: [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

2021-02-04 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Tuesday, February 2, 2021 7:04 AM
> 
> On Tue, Jan 26, 2021 at 12:48:37AM +, Michael Kelley wrote:
> > From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > We will need the partition ID for executing some hypercalls later.
> > >
> > > Signed-off-by: Lillian Grassin-Drake 
> > > Co-Developed-by: Sunil Muthuswamy 
> > > Signed-off-by: Wei Liu 
> > > ---
> > > v3:
> > > 1. Make hv_get_partition_id static.
> > > 2. Change code structure a bit.
> > > ---
> > >  arch/x86/hyperv/hv_init.c | 27 +++
> > >  arch/x86/include/asm/mshyperv.h   |  2 ++
> > >  include/asm-generic/hyperv-tlfs.h |  6 ++
> > >  3 files changed, 35 insertions(+)
> > >
> > > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > > index 6f4cb40e53fe..fc9941bd8653 100644
> > > --- a/arch/x86/hyperv/hv_init.c
> > > +++ b/arch/x86/hyperv/hv_init.c
> > > @@ -26,6 +26,9 @@
> > >  #include 
> > >  #include 
> > >
> > > +u64 hv_current_partition_id = ~0ull;
> > > +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> > > +
> > >  void *hv_hypercall_pg;
> > >  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> > >
> > > @@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = {
> > >   .resume = hv_resume,
> > >  };
> > >
> > > +static void __init hv_get_partition_id(void)
> > > +{
> > > + struct hv_get_partition_id *output_page;
> > > + u16 status;
> > > + unsigned long flags;
> > > +
> > > + local_irq_save(flags);
> > > + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> > > + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> > > + HV_HYPERCALL_RESULT_MASK;
> > > + if (status != HV_STATUS_SUCCESS) {
> >
> > Across the Hyper-V code in Linux, the way we check the hypercall result
> > is very inconsistent.  IMHO, the and'ing of hv_do_hypercall() with
> > HV_HYPERCALL_RESULT_MASK so that status can be a u16 is stylistically
> > a bit unusual.
> >
> > I'd like to see the hypercall result being stored into a u64 local variable.
> > Then the subsequent test for the status should 'and' the u64 with
> > HV_HYPERCALL_RESULT_MASK to determine the result code.
> > I've made a note to go fix the places that aren't doing it that way.
> >
> 
> I will fold in the following diff in the next version. I will also check
> if there are other instances in this patch series that need fixing.
> Pretty sure there are a few.
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index fc9941bd8653..6064f64a1295 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -337,14 +337,13 @@ static struct syscore_ops hv_syscore_ops = {
>  static void __init hv_get_partition_id(void)
>  {
> struct hv_get_partition_id *output_page;
> -   u16 status;
> +   u64 status;
> unsigned long flags;
> 
> local_irq_save(flags);
> output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> -   status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> -   HV_HYPERCALL_RESULT_MASK;
> -   if (status != HV_STATUS_SUCCESS) {
> +   status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page);
> +   if ((status & HV_HYPERCALL_RESULT_MASK) != HV_STATUS_SUCCESS) {
> /* No point in proceeding if this failed */
> pr_err("Failed to get partition ID: %d\n", status);
> BUG();
> > > + /* No point in proceeding if this failed */
> > > + pr_err("Failed to get partition ID: %d\n", status);
> > > + BUG();
> > > + }
> > > + hv_current_partition_id = output_page->partition_id;
> > > + local_irq_restore(flags);
> > > +}
> > > +
> > >  /*
> > >   * This function is to be invoked early in the boot sequence after the
> > >   * hypervisor has been detected.
> > > @@ -426,6 +448,11 @@ void __init hyperv_init(void)
> > >
> > >   register_syscore_ops(&hv_syscore_ops);
> > >
> > > + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> > > + hv_get_partition_id();
> >
> > Another place where the EBX value saved into the ms_hyperv structure
> > could be used.
> 
> If you're okay with my response earlier, this will be handled later in
> another patch (series).
> 

Yes, that's OK.  Andrea Parri's patch series for Isolated VMs is capturing the
EBX value as well.

Michael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition

2021-02-02 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Tuesday, February 2, 2021 9:32 AM
> 
> On Wed, Jan 27, 2021 at 05:47:04AM +, Michael Kelley wrote:
> > From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> > >
> > > When Linux runs as the root partition on Microsoft Hypervisor, its
> > > interrupts are remapped.  Linux will need to explicitly map and unmap
> > > interrupts for hardware.
> > >
> > > Implement an MSI domain to issue the correct hypercalls. And initialize
> > > this irqdomain as the default MSI irq domain.
> > >
> > > Signed-off-by: Sunil Muthuswamy 
> > > Co-Developed-by: Sunil Muthuswamy 
> > > Signed-off-by: Wei Liu 
> > > ---
> > > v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> > > v3: build irqdomain.o for 32bit as well.
> >
> > I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is 
> > built
> > only for 64-bit, I'm assuming running Linux in the root partition
> > is only functional for 64-bit builds.  So is the goal simply that 32-bit
> > builds will compile correctly?  Seems like maybe there should be
> > a CONFIG option for running Linux in the root partition, and that
> > option would force 64-bit.
> 
> To ensure 32 bit kernel builds and 32 bit guests still work.
> 
> The config option ROOT_API is to be introduced by Nuno's /dev/mshv
> series. We can use that option to gate some objects when that's
> available.
> 

But just so I'm 100% clear, is there intent to run 32-bit Linux in the root
partition?  I'm assuming not.

Michael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 16/16] iommu/hyperv: setup an IO-APIC IRQ remapping domain for root partition

2021-01-26 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Just like MSI/MSI-X, IO-APIC interrupts are remapped by Microsoft
> Hypervisor when Linux runs as the root partition. Implement an IRQ
> domain to handle mapping and unmapping of IO-APIC interrupts.
> 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/hyperv/irqdomain.c |  54 ++
>  arch/x86/include/asm/mshyperv.h |   4 +
>  drivers/iommu/hyperv-iommu.c| 179 +++-
>  3 files changed, 233 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> index 19637cd60231..8e2b4e478b70 100644
> --- a/arch/x86/hyperv/irqdomain.c
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -330,3 +330,57 @@ struct irq_domain * __init hv_create_pci_msi_domain(void)
>  }
> 
>  #endif /* CONFIG_PCI_MSI */
> +
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> *entry)
> +{
> + union hv_device_id device_id;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + return hv_unmap_interrupt(device_id.as_uint64, entry) & 
> HV_HYPERCALL_RESULT_MASK;

The masking is already done in hv_unmap_interrupt.

> +}
> +EXPORT_SYMBOL_GPL(hv_unmap_ioapic_interrupt);
> +
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry)
> +{
> + unsigned long flags;
> + struct hv_input_map_device_interrupt *input;
> + struct hv_output_map_device_interrupt *output;
> + union hv_device_id device_id;
> + struct hv_device_interrupt_descriptor *intr_desc;
> + u16 status;
> +
> + device_id.as_uint64 = 0;
> + device_id.device_type = HV_DEVICE_TYPE_IOAPIC;
> + device_id.ioapic.ioapic_id = (u8)ioapic_id;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> + output = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + memset(input, 0, sizeof(*input));
> + intr_desc = &input->interrupt_descriptor;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = device_id.as_uint64;
> + intr_desc->interrupt_type = HV_X64_INTERRUPT_TYPE_FIXED;
> + intr_desc->target.vector = vector;
> + intr_desc->vector_count = 1;
> +
> + if (level)
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_LEVEL;
> + else
> + intr_desc->trigger_mode = HV_INTERRUPT_TRIGGER_MODE_EDGE;
> +
> + __set_bit(vcpu, (unsigned long *)&intr_desc->target.vp_mask);
> +
> + status = hv_do_rep_hypercall(HVCALL_MAP_DEVICE_INTERRUPT, 0, 0, input, 
> output) &
> +  HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + *entry = output->interrupt_entry;
> +
> + return status;

As a cross-check, I was comparing this code against hv_map_msi_interrupt().  
They are
mostly parallel, though some of the assignments are done in a different order.  
It's a nit,
but making them as parallel as possible would be nice. :-)

Same 64 vCPU comment applies here as well.


> +}
> +EXPORT_SYMBOL_GPL(hv_map_ioapic_interrupt);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ccc849e25d5e..345d7c6f8c37 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -263,6 +263,10 @@ static inline void hv_set_msi_entry_from_desc(union
> hv_msi_entry *msi_entry,
> 
>  struct irq_domain *hv_create_pci_msi_domain(void);
> 
> +int hv_map_ioapic_interrupt(int ioapic_id, bool level, int vcpu, int vector,
> + struct hv_interrupt_entry *entry);
> +int hv_unmap_ioapic_interrupt(int ioapic_id, struct hv_interrupt_entry 
> *entry);
> +
>  #else /* CONFIG_HYPERV */
>  static inline void hyperv_init(void) {}
>  static inline void hyperv_setup_mmu_ops(void) {}
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index b7db6024e65c..6d35e4c303c6 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -116,30 +116,43 @@ static const struct irq_domain_ops hyperv_ir_domain_ops 
> = {
>   .free = hyperv_irq_remapping_free,
>  };
> 
> +static const struct irq_domain_ops hyperv_root_ir_domain_ops;
>  static int __init hyperv_prepare_irq_remapping(void)
>  {
>   struct fwnode_handle *fn;
>   int i;
> + const char *name;
> + const struct irq_domain_ops *ops;
> 
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>   x86_init.hyper.msi_ext_dest_id() ||
> - !x2apic_supported() || hv_root_partition)
> + !x2apic_supported())

Any reason that the check for hv_root_partition was added
in patch #4  of this series, and then removed here?  Could
patch #4 just be dropped?

>   return -ENODEV;
> 
> - fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> + if (hv_root_partition) {
> + name = "HYPERV-ROOT-IR";
> + ops = &h

RE: [PATCH v5 15/16] x86/hyperv: implement an MSI domain for root partition

2021-01-26 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition on Microsoft Hypervisor, its
> interrupts are remapped.  Linux will need to explicitly map and unmap
> interrupts for hardware.
> 
> Implement an MSI domain to issue the correct hypercalls. And initialize
> this irqdomain as the default MSI irq domain.
> 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> v4: Fix compilation issue when CONFIG_PCI_MSI is not set.
> v3: build irqdomain.o for 32bit as well.

I'm not clear on the intent for 32-bit builds.  Given that hv_proc.c is built
only for 64-bit, I'm assuming running Linux in the root partition
is only functional for 64-bit builds.  So is the goal simply that 32-bit
builds will compile correctly?  Seems like maybe there should be
a CONFIG option for running Linux in the root partition, and that
option would force 64-bit.

> v2: This patch is simplified due to upstream changes.
> ---
>  arch/x86/hyperv/Makefile|   2 +-
>  arch/x86/hyperv/hv_init.c   |   9 +
>  arch/x86/hyperv/irqdomain.c | 332 
>  arch/x86/include/asm/mshyperv.h |   2 +
>  4 files changed, 344 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/irqdomain.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 565358020921..48e2c51464e8 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0-only
> -obj-y:= hv_init.o mmu.o nested.o
> +obj-y:= hv_init.o mmu.o nested.o irqdomain.o
>  obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index ad8e77859b32..1cb2f7d1850a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -484,6 +484,15 @@ void __init hyperv_init(void)
> 
>   BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> 
> +#ifdef CONFIG_PCI_MSI
> + /*
> +  * If we're running as root, we want to create our own PCI MSI domain.
> +  * We can't set this in hv_pci_init because that would be too late.
> +  */
> + if (hv_root_partition)
> + x86_init.irqs.create_pci_msi_domain = hv_create_pci_msi_domain;
> +#endif
> +
>   return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/hyperv/irqdomain.c b/arch/x86/hyperv/irqdomain.c
> new file mode 100644
> index ..19637cd60231
> --- /dev/null
> +++ b/arch/x86/hyperv/irqdomain.c
> @@ -0,0 +1,332 @@
> +// SPDX-License-Identifier: GPL-2.0
> +//
> +// Irqdomain for Linux to run as the root partition on Microsoft Hypervisor.
> +//
> +// Authors:
> +//   Sunil Muthuswamy 
> +//   Wei Liu 

I think the // comment style should only be used for the SPDX line.

> +
> +#include 
> +#include 
> +#include 
> +
> +static int hv_unmap_interrupt(u64 id, struct hv_interrupt_entry *old_entry)
> +{
> + unsigned long flags;
> + struct hv_input_unmap_device_interrupt *input;
> + struct hv_interrupt_entry *intr_entry;
> + u16 status;
> +
> + local_irq_save(flags);
> + input = *this_cpu_ptr(hyperv_pcpu_input_arg);
> +
> + memset(input, 0, sizeof(*input));
> + intr_entry = &input->interrupt_entry;
> + input->partition_id = hv_current_partition_id;
> + input->device_id = id;
> + *intr_entry = *old_entry;
> +
> + status = hv_do_rep_hypercall(HVCALL_UNMAP_DEVICE_INTERRUPT, 0, 0, 
> input, NULL) &
> +  HV_HYPERCALL_RESULT_MASK;
> + local_irq_restore(flags);
> +
> + return status;
> +}
> +
> +#ifdef CONFIG_PCI_MSI
> +struct rid_data {
> + struct pci_dev *bridge;
> + u32 rid;
> +};
> +
> +static int get_rid_cb(struct pci_dev *pdev, u16 alias, void *data)
> +{
> + struct rid_data *rd = data;
> + u8 bus = PCI_BUS_NUM(rd->rid);
> +
> + if (pdev->bus->number != bus || PCI_BUS_NUM(alias) != bus) {
> + rd->bridge = pdev;
> + rd->rid = alias;
> + }
> +
> + return 0;
> +}
> +
> +static union hv_device_id hv_build_pci_dev_id(struct pci_dev *dev)
> +{
> + union hv_device_id dev_id;
> + struct rid_data data = {
> + .bridge = NULL,
> + .rid = PCI_DEVID(dev->bus->number, dev->devfn)
> + };
> +
> + pci_for_each_dma_alias(dev, get_rid_cb, &data);
> +
> + dev_id.as_uint64 = 0;
> + dev_id.device_type = HV_DEVICE_TYPE_PCI;
> + dev_id.pci.segment = pci_domain_nr(dev->bus);
> +
> + dev_id.pci.bdf.bus = PCI_BUS_NUM(data.rid);
> + dev_id.pci.bdf.device = PCI_SLOT(data.rid);
> + dev_id.pci.bdf.function = PCI_FUNC(data.rid);
> + dev_id.pci.source_shadow = HV_SOURCE_SHADOW_NONE;
> +
> + if (data.bridge) {
> + int pos;
> +
> + /*
> +  * Microsoft Hypervisor requires a bus range when the bridge is
> +  * running in PCI-X mode.
>

RE: [PATCH v5 12/16] asm-generic/hyperv: update hv_interrupt_entry

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will soon use the same structure to handle IO-APIC interrupts as
> well. Introduce an enum to identify the source and a data structure for
> IO-APIC RTE.
> 
> While at it, update pci-hyperv.c to use the enum.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 
> Acked-by: Rob Herring 
> ---
>  drivers/pci/controller/pci-hyperv.c |  2 +-
>  include/asm-generic/hyperv-tlfs.h   | 36 +++--
>  2 files changed, 35 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/pci/controller/pci-hyperv.c 
> b/drivers/pci/controller/pci-hyperv.c
> index 6db8d96a78eb..87aa62ee0368 100644
> --- a/drivers/pci/controller/pci-hyperv.c
> +++ b/drivers/pci/controller/pci-hyperv.c
> @@ -1216,7 +1216,7 @@ static void hv_irq_unmask(struct irq_data *data)
>   params = &hbus->retarget_msi_interrupt_params;
>   memset(params, 0, sizeof(*params));
>   params->partition_id = HV_PARTITION_ID_SELF;
> - params->int_entry.source = 1; /* MSI(-X) */
> + params->int_entry.source = HV_INTERRUPT_SOURCE_MSI;
>   hv_set_msi_entry_from_desc(¶ms->int_entry.msi_entry, msi_desc);
>   params->device_id = (hbus->hdev->dev_instance.b[5] << 24) |
>  (hbus->hdev->dev_instance.b[4] << 16) |
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 7e103be42799..8423bf53c237 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -480,6 +480,11 @@ struct hv_create_vp {
>   u64 flags;
>  } __packed;
> 
> +enum hv_interrupt_source {
> + HV_INTERRUPT_SOURCE_MSI = 1, /* MSI and MSI-X */
> + HV_INTERRUPT_SOURCE_IOAPIC,
> +};
> +
>  union hv_msi_address_register {
>   u32 as_uint32;
>   struct {
> @@ -513,10 +518,37 @@ union hv_msi_entry {
>   } __packed;
>  };
> 
> +union hv_ioapic_rte {
> + u64 as_uint64;
> +
> + struct {
> + u32 vector:8;
> + u32 delivery_mode:3;
> + u32 destination_mode:1;
> + u32 delivery_status:1;
> + u32 interrupt_polarity:1;
> + u32 remote_irr:1;
> + u32 trigger_mode:1;
> + u32 interrupt_mask:1;
> + u32 reserved1:15;
> +
> + u32 reserved2:24;
> + u32 destination_id:8;
> + };
> +
> + struct {
> + u32 low_uint32;
> + u32 high_uint32;
> + };
> +} __packed;
> +
>  struct hv_interrupt_entry {
> - u32 source; /* 1 for MSI(-X) */
> + u32 source;
>   u32 reserved1;
> - union hv_msi_entry msi_entry;
> + union {
> + union hv_msi_entry msi_entry;
> + union hv_ioapic_rte ioapic_rte;
> + };
>  } __packed;
> 
>  /*
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 09/16] x86/hyperv: provide a bunch of helper functions

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> They are used to deposit pages into Microsoft Hypervisor and bring up
> logical and virtual processors.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: Nuno Das Neves 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Co-Developed-by: Nuno Das Neves 
> Signed-off-by: Wei Liu 
> ---
> v4: Fix compilation issue when CONFIG_ACPI_NUMA is not set.
> 
> v3:
> 1. Add __packed to structures.
> 2. Drop unnecessary exports.
> 
> v2:
> 1. Adapt to hypervisor side changes
> 2. Address Vitaly's comments
> ---
>  arch/x86/hyperv/Makefile  |   2 +-
>  arch/x86/hyperv/hv_proc.c | 225 ++
>  arch/x86/include/asm/mshyperv.h   |   4 +
>  include/asm-generic/hyperv-tlfs.h |  67 +
>  4 files changed, 297 insertions(+), 1 deletion(-)
>  create mode 100644 arch/x86/hyperv/hv_proc.c
> 
> diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
> index 89b1f74d3225..565358020921 100644
> --- a/arch/x86/hyperv/Makefile
> +++ b/arch/x86/hyperv/Makefile
> @@ -1,6 +1,6 @@
>  # SPDX-License-Identifier: GPL-2.0-only
>  obj-y:= hv_init.o mmu.o nested.o
> -obj-$(CONFIG_X86_64) += hv_apic.o
> +obj-$(CONFIG_X86_64) += hv_apic.o hv_proc.o
> 
>  ifdef CONFIG_X86_64
>  obj-$(CONFIG_PARAVIRT_SPINLOCKS) += hv_spinlock.o
> diff --git a/arch/x86/hyperv/hv_proc.c b/arch/x86/hyperv/hv_proc.c
> new file mode 100644
> index ..706097160e2f
> --- /dev/null
> +++ b/arch/x86/hyperv/hv_proc.c
> @@ -0,0 +1,225 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define HV_DEPOSIT_MAX_ORDER (8)
> +#define HV_DEPOSIT_MAX (1 << HV_DEPOSIT_MAX_ORDER)

Is there any reason to not let the maximum be 511, which is
how many entries will fit on the hypercall input page?  The
max could be define in terms of HY_HYP_PAGE_SIZE so that
the logical dependency is fully expressed.  

> +
> +/*
> + * Deposits exact number of pages
> + * Must be called with interrupts enabled
> + * Max 256 pages
> + */
> +int hv_call_deposit_pages(int node, u64 partition_id, u32 num_pages)
> +{
> + struct page **pages;
> + int *counts;
> + int num_allocations;
> + int i, j, page_count;
> + int order;
> + int desired_order;
> + u16 status;
> + int ret;
> + u64 base_pfn;
> + struct hv_deposit_memory *input_page;
> + unsigned long flags;
> +
> + if (num_pages > HV_DEPOSIT_MAX)
> + return -E2BIG;
> + if (!num_pages)
> + return 0;
> +
> + /* One buffer for page pointers and counts */
> + pages = page_address(alloc_page(GFP_KERNEL));
> + if (!pages)

Does the above check work?  If alloc_pages() returns NULL, it looks like
page_address() might fault.

> + return -ENOMEM;
> +
> + counts = kcalloc(HV_DEPOSIT_MAX, sizeof(int), GFP_KERNEL);
> + if (!counts) {
> + free_page((unsigned long)pages);
> + return -ENOMEM;
> + }
> +
> + /* Allocate all the pages before disabling interrupts */
> + num_allocations = 0;
> + i = 0;
> + order = HV_DEPOSIT_MAX_ORDER;
> +
> + while (num_pages) {
> + /* Find highest order we can actually allocate */
> + desired_order = 31 - __builtin_clz(num_pages);
> + order = min(desired_order, order);

The above seems redundant since request sizes larger than the
max have already been rejected.

> + do {
> + pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
> + if (!pages[i]) {
> + if (!order) {
> + ret = -ENOMEM;
> + goto err_free_allocations;
> + }
> + --order;
> + }
> + } while (!pages[i]);

The duplicative test of !pages[i] is somewhat annoying.  How about
this:

while{!pages[i] = alloc_pages_node(node, GFP_KERNEL, order) {
if (!order) {
ret = -ENOMEM;
goto err_free_allocations;
}
--order;
}

or if you don't like doing an assignment in the while test:

while(1) {
pages[i] = alloc_pages_node(node, GFP_KERNEL, order);
if (page[i])
break;
if (!order) {
ret = -ENOMEM;
goto err_free_allocations;
}
--order;
}

> +
> + split_

RE: [PATCH v5 10/16] x86/hyperv: implement and use hv_smp_prepare_cpus

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Microsoft Hypervisor requires the root partition to make a few
> hypercalls to setup application processors before they can be used.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> CPU hotplug and unplug is not yet supported in this setup, so those
> paths remain untouched.
> 
> v3: Always call native SMP preparation function.
> ---
>  arch/x86/kernel/cpu/mshyperv.c | 29 +
>  1 file changed, 29 insertions(+)
> 
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index c376d191a260..13d3b6dd21a3 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /* Is Linux running as the root partition? */
>  bool hv_root_partition;
> @@ -212,6 +213,32 @@ static void __init hv_smp_prepare_boot_cpu(void)
>   hv_init_spinlocks();
>  #endif
>  }
> +
> +static void __init hv_smp_prepare_cpus(unsigned int max_cpus)
> +{
> +#ifdef CONFIG_X86_64
> + int i;
> + int ret;
> +#endif
> +
> + native_smp_prepare_cpus(max_cpus);
> +
> +#ifdef CONFIG_X86_64
> + for_each_present_cpu(i) {
> + if (i == 0)
> + continue;
> + ret = hv_call_add_logical_proc(numa_cpu_node(i), i, 
> cpu_physical_id(i));
> + BUG_ON(ret);
> + }
> +
> + for_each_present_cpu(i) {
> + if (i == 0)
> + continue;
> + ret = hv_call_create_vp(numa_cpu_node(i), 
> hv_current_partition_id, i, i);
> + BUG_ON(ret);
> + }
> +#endif
> +}
>  #endif
> 
>  static void __init ms_hyperv_init_platform(void)
> @@ -368,6 +395,8 @@ static void __init ms_hyperv_init_platform(void)
> 
>  # ifdef CONFIG_SMP
>   smp_ops.smp_prepare_boot_cpu = hv_smp_prepare_boot_cpu;
> + if (hv_root_partition)
> + smp_ops.smp_prepare_cpus = hv_smp_prepare_cpus;
>  # endif
> 
>   /*
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 11/16] asm-generic/hyperv: update hv_msi_entry

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will soon need to access fields inside the MSI address and MSI data
> fields. Introduce hv_msi_address_register and hv_msi_data_register.
> 
> Fix up one user of hv_msi_entry in mshyperv.h.
> 
> No functional change expected.
> 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/include/asm/mshyperv.h   |  4 ++--
>  include/asm-generic/hyperv-tlfs.h | 28 ++--
>  2 files changed, 28 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 4e590a167160..cbee72550a12 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -257,8 +257,8 @@ static inline void hv_apic_init(void) {}
>  static inline void hv_set_msi_entry_from_desc(union hv_msi_entry *msi_entry,
> struct msi_desc *msi_desc)
>  {
> - msi_entry->address = msi_desc->msg.address_lo;
> - msi_entry->data = msi_desc->msg.data;
> + msi_entry->address.as_uint32 = msi_desc->msg.address_lo;
> + msi_entry->data.as_uint32 = msi_desc->msg.data;
>  }
> 
>  #else /* CONFIG_HYPERV */
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index ec53570102f0..7e103be42799 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -480,12 +480,36 @@ struct hv_create_vp {
>   u64 flags;
>  } __packed;
> 
> +union hv_msi_address_register {
> + u32 as_uint32;
> + struct {
> + u32 reserved1:2;
> + u32 destination_mode:1;
> + u32 redirection_hint:1;
> + u32 reserved2:8;
> + u32 destination_id:8;
> + u32 msi_base:12;
> + };
> +} __packed;
> +
> +union hv_msi_data_register {
> + u32 as_uint32;
> + struct {
> + u32 vector:8;
> + u32 delivery_mode:3;
> + u32 reserved1:3;
> + u32 level_assert:1;
> + u32 trigger_mode:1;
> + u32 reserved2:16;
> + };
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>   u64 as_uint64;
>   struct {
> - u32 address;
> - u32 data;
> + union hv_msi_address_register address;
> + union hv_msi_data_register data;
>   } __packed;
>  };
> 
> --
> 2.20.1

Reviewed-by: Michael Kelley 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 06/16] x86/hyperv: allocate output arg pages if required

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition, it will need to make hypercalls
> which return data from the hypervisor.
> 
> Allocate pages for storing results when Linux runs as the root
> partition.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Lillian Grassin-Drake 
> Signed-off-by: Wei Liu 
> ---
> v3: Fix hv_cpu_die to use free_pages.
> v2: Address Vitaly's comments
> ---
>  arch/x86/hyperv/hv_init.c   | 35 -
>  arch/x86/include/asm/mshyperv.h |  1 +
>  2 files changed, 31 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index e04d90af4c27..6f4cb40e53fe 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -41,6 +41,9 @@ EXPORT_SYMBOL_GPL(hv_vp_assist_page);
>  void  __percpu **hyperv_pcpu_input_arg;
>  EXPORT_SYMBOL_GPL(hyperv_pcpu_input_arg);
> 
> +void  __percpu **hyperv_pcpu_output_arg;
> +EXPORT_SYMBOL_GPL(hyperv_pcpu_output_arg);
> +
>  u32 hv_max_vp_index;
>  EXPORT_SYMBOL_GPL(hv_max_vp_index);
> 
> @@ -73,12 +76,19 @@ static int hv_cpu_init(unsigned int cpu)
>   void **input_arg;
>   struct page *pg;
> 
> - input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>   /* hv_cpu_init() can be called with IRQs disabled from hv_resume() */
> - pg = alloc_page(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL);
> + pg = alloc_pages(irqs_disabled() ? GFP_ATOMIC : GFP_KERNEL, 
> hv_root_partition ?
> 1 : 0);
>   if (unlikely(!pg))
>   return -ENOMEM;
> +
> + input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
>   *input_arg = page_address(pg);
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = page_address(pg + 1);
> + }
> 
>   hv_get_vp_index(msr_vp_index);
> 
> @@ -205,14 +215,23 @@ static int hv_cpu_die(unsigned int cpu)
>   unsigned int new_cpu;
>   unsigned long flags;
>   void **input_arg;
> - void *input_pg = NULL;
> + void *pg;
> 
>   local_irq_save(flags);
>   input_arg = (void **)this_cpu_ptr(hyperv_pcpu_input_arg);
> - input_pg = *input_arg;
> + pg = *input_arg;
>   *input_arg = NULL;
> +
> + if (hv_root_partition) {
> + void **output_arg;
> +
> + output_arg = (void **)this_cpu_ptr(hyperv_pcpu_output_arg);
> + *output_arg = NULL;
> + }
> +
>   local_irq_restore(flags);
> - free_page((unsigned long)input_pg);
> +
> + free_pages((unsigned long)pg, hv_root_partition ? 1 : 0);
> 
>   if (hv_vp_assist_page && hv_vp_assist_page[cpu])
>   wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> @@ -346,6 +365,12 @@ void __init hyperv_init(void)
> 
>   BUG_ON(hyperv_pcpu_input_arg == NULL);
> 
> + /* Allocate the per-CPU state for output arg for root */
> + if (hv_root_partition) {
> + hyperv_pcpu_output_arg = alloc_percpu(void *);
> + BUG_ON(hyperv_pcpu_output_arg == NULL);
> + }
> +
>   /* Allocate percpu VP index */
>   hv_vp_index = kmalloc_array(num_possible_cpus(), sizeof(*hv_vp_index),
>   GFP_KERNEL);
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ac2b0d110f03..62d9390f1ddf 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -76,6 +76,7 @@ static inline void hv_disable_stimer0_percpu_irq(int irq) {}
>  #if IS_ENABLED(CONFIG_HYPERV)
>  extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
> +extern void  __percpu  **hyperv_pcpu_output_arg;
> 
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
> --
> 2.20.1

I think this all works OK.  But a meta question:  Do we need a separate
per-cpu output argument page?  From the Hyper-V hypercall standpoint, I
don't think input and output args need to be in separate pages.  They both
just need to not cross a page boundary.  As long as we don't have a hypercall
where the sum of the sizes of the input and output args exceeds a page,
we could just have a single page, and split it up in any manner that works
for the particular hypercall.

Thoughts?

Michael


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 14/16] asm-generic/hyperv: import data structures for mapping device interrupts

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 13 +++
>  include/asm-generic/hyperv-tlfs.h  | 36 ++
>  2 files changed, 49 insertions(+)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> b/arch/x86/include/asm/hyperv-tlfs.h
> index 204010350604..ab7d6cde548d 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -533,6 +533,19 @@ struct hv_partition_assist_pg {
>   u32 tlb_lock_count;
>  };
> 
> +enum hv_interrupt_type {
> + HV_X64_INTERRUPT_TYPE_FIXED = 0x,
> + HV_X64_INTERRUPT_TYPE_LOWESTPRIORITY= 0x0001,
> + HV_X64_INTERRUPT_TYPE_SMI   = 0x0002,
> + HV_X64_INTERRUPT_TYPE_REMOTEREAD= 0x0003,
> + HV_X64_INTERRUPT_TYPE_NMI   = 0x0004,
> + HV_X64_INTERRUPT_TYPE_INIT  = 0x0005,
> + HV_X64_INTERRUPT_TYPE_SIPI  = 0x0006,
> + HV_X64_INTERRUPT_TYPE_EXTINT= 0x0007,
> + HV_X64_INTERRUPT_TYPE_LOCALINT0 = 0x0008,
> + HV_X64_INTERRUPT_TYPE_LOCALINT1 = 0x0009,
> + HV_X64_INTERRUPT_TYPE_MAXIMUM   = 0x000A,
> +};
> 
>  #include 
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 42ff1326c6bd..07efe0131fe3 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -152,6 +152,8 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_RETRIEVE_DEBUG_DATA   0x006a
>  #define HVCALL_RESET_DEBUG_SESSION   0x006b
>  #define HVCALL_ADD_LOGICAL_PROCESSOR 0x0076
> +#define HVCALL_MAP_DEVICE_INTERRUPT  0x007c
> +#define HVCALL_UNMAP_DEVICE_INTERRUPT0x007d
>  #define HVCALL_RETARGET_INTERRUPT0x007e
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_SPACE 0x00af
>  #define HVCALL_FLUSH_GUEST_PHYSICAL_ADDRESS_LIST 0x00b0
> @@ -702,4 +704,38 @@ union hv_device_id {
>   } acpi;
>  } __packed;
> 
> +enum hv_interrupt_trigger_mode {
> + HV_INTERRUPT_TRIGGER_MODE_EDGE = 0,
> + HV_INTERRUPT_TRIGGER_MODE_LEVEL = 1,
> +};
> +
> +struct hv_device_interrupt_descriptor {
> + u32 interrupt_type;
> + u32 trigger_mode;
> + u32 vector_count;
> + u32 reserved;
> + struct hv_device_interrupt_target target;
> +} __packed;
> +
> +struct hv_input_map_device_interrupt {
> + u64 partition_id;
> + u64 device_id;
> + u64 flags;
> + struct hv_interrupt_entry logical_interrupt_entry;
> + struct hv_device_interrupt_descriptor interrupt_descriptor;
> +} __packed;
> +
> +struct hv_output_map_device_interrupt {
> + struct hv_interrupt_entry interrupt_entry;
> +} __packed;
> +
> +struct hv_input_unmap_device_interrupt {
> + u64 partition_id;
> + u64 device_id;
> + struct hv_interrupt_entry interrupt_entry;
> +} __packed;
> +
> +#define HV_SOURCE_SHADOW_NONE   0x0
> +#define HV_SOURCE_SHADOW_BRIDGE_BUS_RANGE   0x1
> +
>  #endif
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 13/16] asm-generic/hyperv: introduce hv_device_id and auxiliary structures

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will need to identify the device we want Microsoft Hypervisor to
> manipulate.  Introduce the data structures for that purpose.
> 
> They will be used in a later patch.
> 
> Signed-off-by: Sunil Muthuswamy 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
>  include/asm-generic/hyperv-tlfs.h | 79 +++
>  1 file changed, 79 insertions(+)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index 8423bf53c237..42ff1326c6bd 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -623,4 +623,83 @@ struct hv_set_vp_registers_input {
>   } element[];
>  } __packed;
> 
> +enum hv_device_type {
> + HV_DEVICE_TYPE_LOGICAL = 0,
> + HV_DEVICE_TYPE_PCI = 1,
> + HV_DEVICE_TYPE_IOAPIC = 2,
> + HV_DEVICE_TYPE_ACPI = 3,
> +};
> +
> +typedef u16 hv_pci_rid;
> +typedef u16 hv_pci_segment;
> +typedef u64 hv_logical_device_id;
> +union hv_pci_bdf {
> + u16 as_uint16;
> +
> + struct {
> + u8 function:3;
> + u8 device:5;
> + u8 bus;
> + };
> +} __packed;
> +
> +union hv_pci_bus_range {
> + u16 as_uint16;
> +
> + struct {
> + u8 subordinate_bus;
> + u8 secondary_bus;
> + };
> +} __packed;
> +
> +union hv_device_id {
> + u64 as_uint64;
> +
> + struct {
> + u64 :62;
> + u64 device_type:2;
> + };

Are the above 4 lines extraneous junk? 
If not, a comment would be helpful.  And we
would normally label the 62 bit field as 
"reserved0" or something similar.

> +
> + /* HV_DEVICE_TYPE_LOGICAL */
> + struct {
> + u64 id:62;
> + u64 device_type:2;
> + } logical;
> +
> + /* HV_DEVICE_TYPE_PCI */
> + struct {
> + union {
> + hv_pci_rid rid;
> + union hv_pci_bdf bdf;
> + };
> +
> + hv_pci_segment segment;
> + union hv_pci_bus_range shadow_bus_range;
> +
> + u16 phantom_function_bits:2;
> + u16 source_shadow:1;
> +
> + u16 rsvdz0:11;
> + u16 device_type:2;
> + } pci;
> +
> + /* HV_DEVICE_TYPE_IOAPIC */
> + struct {
> + u8 ioapic_id;
> + u8 rsvdz0;
> + u16 rsvdz1;
> + u16 rsvdz2;
> +
> + u16 rsvdz3:14;
> + u16 device_type:2;
> + } ioapic;
> +
> + /* HV_DEVICE_TYPE_ACPI */
> + struct {
> + u32 input_mapping_base;
> + u32 input_mapping_count:30;
> + u32 device_type:2;
> + } acpi;
> +} __packed;
> +
>  #endif
> --
> 2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 08/16] x86/hyperv: handling hypercall page setup for root

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux is running as the root partition, the hypercall page will
> have already been setup by Hyper-V. Copy the content over to the
> allocated page.
> 
> Add checks to hv_suspend & co to bail early because they are not
> supported in this setup yet.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Signed-off-by: Sunil Muthuswamy 
> Signed-off-by: Nuno Das Neves 
> Co-Developed-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Co-Developed-by: Nuno Das Neves 
> Signed-off-by: Wei Liu 
> ---
> v3:
> 1. Use HV_HYP_PAGE_SIZE.
> 2. Add checks to hv_suspend & co.
> ---
>  arch/x86/hyperv/hv_init.c | 37 ++---
>  1 file changed, 34 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index fc9941bd8653..ad8e77859b32 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  u64 hv_current_partition_id = ~0ull;
>  EXPORT_SYMBOL_GPL(hv_current_partition_id);
> @@ -283,6 +284,9 @@ static int hv_suspend(void)
>   union hv_x64_msr_hypercall_contents hypercall_msr;
>   int ret;
> 
> + if (hv_root_partition)
> + return -EPERM;
> +
>   /*
>* Reset the hypercall page as it is going to be invalidated
>* accross hibernation. Setting hv_hypercall_pg to NULL ensures
> @@ -433,8 +437,35 @@ void __init hyperv_init(void)
> 
>   rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
>   hypercall_msr.enable = 1;
> - hypercall_msr.guest_physical_address = vmalloc_to_pfn(hv_hypercall_pg);
> - wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> + if (hv_root_partition) {
> + struct page *pg;
> + void *src, *dst;
> +
> + /*
> +  * For the root partition, the hypervisor will set up its
> +  * hypercall page. The hypervisor guarantees it will not show
> +  * up in the root's address space. The root can't change the
> +  * location of the hypercall page.
> +  *
> +  * Order is important here. We must enable the hypercall page
> +  * so it is populated with code, then copy the code to an
> +  * executable page.
> +  */
> + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> +
> + pg = vmalloc_to_page(hv_hypercall_pg);
> + dst = kmap(pg);
> + src = memremap(hypercall_msr.guest_physical_address << 
> PAGE_SHIFT,
> PAGE_SIZE,
> + MEMREMAP_WB);
> + BUG_ON(!(src && dst));
> + memcpy(dst, src, HV_HYP_PAGE_SIZE);
> + memunmap(src);
> + kunmap(pg);
> + } else {
> + hypercall_msr.guest_physical_address = 
> vmalloc_to_pfn(hv_hypercall_pg);
> + wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
> + }
> 
>   /*
>* Ignore any errors in setting up stimer clockevents
> @@ -577,6 +608,6 @@ EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
> 
>  bool hv_is_hibernation_supported(void)
>  {
> - return acpi_sleep_state_supported(ACPI_STATE_S4);
> + return !hv_root_partition && acpi_sleep_state_supported(ACPI_STATE_S4);
>  }
>  EXPORT_SYMBOL_GPL(hv_is_hibernation_supported);
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 07/16] x86/hyperv: extract partition ID from Microsoft Hypervisor if necessary

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> We will need the partition ID for executing some hypercalls later.
> 
> Signed-off-by: Lillian Grassin-Drake 
> Co-Developed-by: Sunil Muthuswamy 
> Signed-off-by: Wei Liu 
> ---
> v3:
> 1. Make hv_get_partition_id static.
> 2. Change code structure a bit.
> ---
>  arch/x86/hyperv/hv_init.c | 27 +++
>  arch/x86/include/asm/mshyperv.h   |  2 ++
>  include/asm-generic/hyperv-tlfs.h |  6 ++
>  3 files changed, 35 insertions(+)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 6f4cb40e53fe..fc9941bd8653 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -26,6 +26,9 @@
>  #include 
>  #include 
> 
> +u64 hv_current_partition_id = ~0ull;
> +EXPORT_SYMBOL_GPL(hv_current_partition_id);
> +
>  void *hv_hypercall_pg;
>  EXPORT_SYMBOL_GPL(hv_hypercall_pg);
> 
> @@ -331,6 +334,25 @@ static struct syscore_ops hv_syscore_ops = {
>   .resume = hv_resume,
>  };
> 
> +static void __init hv_get_partition_id(void)
> +{
> + struct hv_get_partition_id *output_page;
> + u16 status;
> + unsigned long flags;
> +
> + local_irq_save(flags);
> + output_page = *this_cpu_ptr(hyperv_pcpu_output_arg);
> + status = hv_do_hypercall(HVCALL_GET_PARTITION_ID, NULL, output_page) &
> + HV_HYPERCALL_RESULT_MASK;
> + if (status != HV_STATUS_SUCCESS) {

Across the Hyper-V code in Linux, the way we check the hypercall result
is very inconsistent.  IMHO, the and'ing of hv_do_hypercall() with 
HV_HYPERCALL_RESULT_MASK so that status can be a u16 is stylistically
a bit unusual.

I'd like to see the hypercall result being stored into a u64 local variable.
Then the subsequent test for the status should 'and' the u64 with
HV_HYPERCALL_RESULT_MASK to determine the result code.
I've made a note to go fix the places that aren't doing it that way.

> + /* No point in proceeding if this failed */
> + pr_err("Failed to get partition ID: %d\n", status);
> + BUG();
> + }
> + hv_current_partition_id = output_page->partition_id;
> + local_irq_restore(flags);
> +}
> +
>  /*
>   * This function is to be invoked early in the boot sequence after the
>   * hypervisor has been detected.
> @@ -426,6 +448,11 @@ void __init hyperv_init(void)
> 
>   register_syscore_ops(&hv_syscore_ops);
> 
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_ACCESS_PARTITION_ID)
> + hv_get_partition_id();

Another place where the EBX value saved into the ms_hyperv structure
could be used.

> +
> + BUG_ON(hv_root_partition && hv_current_partition_id == ~0ull);
> +
>   return;
> 
>  remove_cpuhp_state:
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 62d9390f1ddf..67f5d35a73d3 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -78,6 +78,8 @@ extern void *hv_hypercall_pg;
>  extern void  __percpu  **hyperv_pcpu_input_arg;
>  extern void  __percpu  **hyperv_pcpu_output_arg;
> 
> +extern u64 hv_current_partition_id;
> +
>  static inline u64 hv_do_hypercall(u64 control, void *input, void *output)
>  {
>   u64 input_address = input ? virt_to_phys(input) : 0;
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index e6903589a82a..87b1a79b19eb 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -141,6 +141,7 @@ struct ms_hyperv_tsc_page {
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_SPACE_EX0x0013
>  #define HVCALL_FLUSH_VIRTUAL_ADDRESS_LIST_EX 0x0014
>  #define HVCALL_SEND_IPI_EX   0x0015
> +#define HVCALL_GET_PARTITION_ID  0x0046
>  #define HVCALL_GET_VP_REGISTERS  0x0050
>  #define HVCALL_SET_VP_REGISTERS  0x0051
>  #define HVCALL_POST_MESSAGE  0x005c
> @@ -407,6 +408,11 @@ struct hv_tlb_flush_ex {
>   u64 gva_list[];
>  } __packed;
> 
> +/* HvGetPartitionId hypercall (output only) */
> +struct hv_get_partition_id {
> + u64 partition_id;
> +} __packed;
> +
>  /* HvRetargetDeviceInterrupt hypercall */
>  union hv_msi_entry {
>   u64 as_uint64;
> --
> 2.20.1

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 01/16] asm-generic/hyperv: change HV_CPU_POWER_MANAGEMENT to HV_CPU_MANAGEMENT

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> This makes the name match Hyper-V TLFS.
> 
> Signed-off-by: Wei Liu 
> Reviewed-by: Vitaly Kuznetsov 
> ---
>  include/asm-generic/hyperv-tlfs.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/asm-generic/hyperv-tlfs.h 
> b/include/asm-generic/hyperv-tlfs.h
> index e73a11850055..e6903589a82a 100644
> --- a/include/asm-generic/hyperv-tlfs.h
> +++ b/include/asm-generic/hyperv-tlfs.h
> @@ -88,7 +88,7 @@
>  #define HV_CONNECT_PORT  BIT(7)
>  #define HV_ACCESS_STATS  BIT(8)
>  #define HV_DEBUGGING BIT(11)
> -#define HV_CPU_POWER_MANAGEMENT  BIT(12)
> +#define HV_CPU_MANAGEMENTBIT(12)
> 
> 
>  /*
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 05/16] clocksource/hyperv: use MSR-based access if running as root

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> When Linux runs as the root partition, the setup required for TSC page
> is different. Luckily Linux also has access to the MSR based
> clocksource. We can just disable the TSC page clocksource if Linux is
> the root partition.
> 
> Signed-off-by: Wei Liu 
> Acked-by: Daniel Lezcano 
> ---
>  drivers/clocksource/hyperv_timer.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..269a691bd2c4 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -426,6 +426,9 @@ static bool __init hv_init_tsc_clocksource(void)
>   if (!(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE))
>   return false;
> 
> + if (hv_root_partition)
> + return false;
> +
>   hv_read_reference_counter = read_hv_clock_tsc;
>   phys_addr = virt_to_phys(hv_get_tsc_page());
> 
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 04/16] iommu/hyperv: don't setup IRQ remapping when running as root

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> The IOMMU code needs more work. We're sure for now the IRQ remapping
> hooks are not applicable when Linux is the root partition.
> 
> Signed-off-by: Wei Liu 
> Acked-by: Joerg Roedel 
> Reviewed-by: Vitaly Kuznetsov 
> ---
>  drivers/iommu/hyperv-iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
> index 1d21a0b5f724..b7db6024e65c 100644
> --- a/drivers/iommu/hyperv-iommu.c
> +++ b/drivers/iommu/hyperv-iommu.c
> @@ -20,6 +20,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  #include "irq_remapping.h"
> 
> @@ -122,7 +123,7 @@ static int __init hyperv_prepare_irq_remapping(void)
> 
>   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV) ||
>   x86_init.hyper.msi_ext_dest_id() ||
> - !x2apic_supported())
> + !x2apic_supported() || hv_root_partition)
>   return -ENODEV;
> 
>   fn = irq_domain_alloc_named_id_fwnode("HYPERV-IR", 0);
> --
> 2.20.1

Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 03/16] Drivers: hv: vmbus: skip VMBus initialization if Linux is root

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> There is no VMBus and the other infrastructures initialized in
> hv_acpi_init when Linux is running as the root partition.
> 
> Signed-off-by: Wei Liu 
> ---
> v3: Return 0 instead of -ENODEV.
> ---
>  drivers/hv/vmbus_drv.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
> index 502f8cd95f6d..ee27b3670a51 100644
> --- a/drivers/hv/vmbus_drv.c
> +++ b/drivers/hv/vmbus_drv.c
> @@ -2620,6 +2620,9 @@ static int __init hv_acpi_init(void)
>   if (!hv_is_hyperv_initialized())
>   return -ENODEV;
> 
> + if (hv_root_partition)
> + return 0;
> +
>   init_completion(&probe_event);
> 
>   /*
> --
> 2.20.1

Reviewed-by: Michael Kelley 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v5 02/16] x86/hyperv: detect if Linux is the root partition

2021-01-25 Thread Michael Kelley via Virtualization
From: Wei Liu  Sent: Wednesday, January 20, 2021 4:01 AM
> 
> For now we can use the privilege flag to check. Stash the value to be
> used later.
> 
> Put in a bunch of defines for future use when we want to have more
> fine-grained detection.
> 
> Signed-off-by: Wei Liu 
> ---
> v3: move hv_root_partition to mshyperv.c
> ---
>  arch/x86/include/asm/hyperv-tlfs.h | 10 ++
>  arch/x86/include/asm/mshyperv.h|  2 ++
>  arch/x86/kernel/cpu/mshyperv.c | 20 
>  3 files changed, 32 insertions(+)
> 
> diff --git a/arch/x86/include/asm/hyperv-tlfs.h 
> b/arch/x86/include/asm/hyperv-tlfs.h
> index 6bf42aed387e..204010350604 100644
> --- a/arch/x86/include/asm/hyperv-tlfs.h
> +++ b/arch/x86/include/asm/hyperv-tlfs.h
> @@ -21,6 +21,7 @@
>  #define HYPERV_CPUID_FEATURES0x4003
>  #define HYPERV_CPUID_ENLIGHTMENT_INFO0x4004
>  #define HYPERV_CPUID_IMPLEMENT_LIMITS0x4005
> +#define HYPERV_CPUID_CPU_MANAGEMENT_FEATURES 0x4007
>  #define HYPERV_CPUID_NESTED_FEATURES 0x400A
> 
>  #define HYPERV_CPUID_VIRT_STACK_INTERFACE0x4081
> @@ -110,6 +111,15 @@
>  /* Recommend using enlightened VMCS */
>  #define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED  BIT(14)
> 
> +/*
> + * CPU management features identification.
> + * These are HYPERV_CPUID_CPU_MANAGEMENT_FEATURES.EAX bits.
> + */
> +#define HV_X64_START_LOGICAL_PROCESSOR   BIT(0)
> +#define HV_X64_CREATE_ROOT_VIRTUAL_PROCESSOR BIT(1)
> +#define HV_X64_PERFORMANCE_COUNTER_SYNC  BIT(2)
> +#define HV_X64_RESERVED_IDENTITY_BIT BIT(31)
> +

I wonder if these bit definitions should go in the asm-generic part of
hyperv-tlfs.h instead of the X64 specific part.  They look very architecture
neutral (in which case the X64 should be dropped from the name
as well).  Of course, they can be moved later when/if we get to that point
and have a firmer understanding of what is and isn't arch neutral.

>  /*
>   * Virtual processor will never share a physical core with another virtual
>   * processor, except for virtual processors that are reported as sibling SMT
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..ac2b0d110f03 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -237,6 +237,8 @@ int hyperv_fill_flush_guest_mapping_list(
>   struct hv_guest_mapping_flush_list *flush,
>   u64 start_gfn, u64 end_gfn);
> 
> +extern bool hv_root_partition;
> +
>  #ifdef CONFIG_X86_64
>  void hv_apic_init(void);
>  void __init hv_init_spinlocks(void);
> diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
> index f628e3dc150f..c376d191a260 100644
> --- a/arch/x86/kernel/cpu/mshyperv.c
> +++ b/arch/x86/kernel/cpu/mshyperv.c
> @@ -32,6 +32,10 @@
>  #include 
>  #include 
> 
> +/* Is Linux running as the root partition? */
> +bool hv_root_partition;
> +EXPORT_SYMBOL_GPL(hv_root_partition);
> +
>  struct ms_hyperv_info ms_hyperv;
>  EXPORT_SYMBOL_GPL(ms_hyperv);
> 
> @@ -237,6 +241,22 @@ static void __init ms_hyperv_init_platform(void)
>   pr_debug("Hyper-V: max %u virtual processors, %u logical processors\n",
>ms_hyperv.max_vp_index, ms_hyperv.max_lp_index);
> 
> + /*
> +  * Check CPU management privilege.
> +  *
> +  * To mirror what Windows does we should extract CPU management
> +  * features and use the ReservedIdentityBit to detect if Linux is the
> +  * root partition. But that requires negotiating CPU management
> +  * interface (a process to be finalized).
> +  *
> +  * For now, use the privilege flag as the indicator for running as
> +  * root.
> +  */
> + if (cpuid_ebx(HYPERV_CPUID_FEATURES) & HV_CPU_MANAGEMENT) {

Should the EBX value be captured in the ms_hyperv structure with the
other similar values, and then used from there?

Michael

> + hv_root_partition = true;
> + pr_info("Hyper-V: running as root partition\n");
> + }
> +
>   /*
>* Extract host information.
>*/
> --
> 2.20.1


___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v4 07/15] x86/paravirt: switch time pvops functions to use static_call()

2021-01-24 Thread Michael Kelley via Virtualization
From: Juergen Gross  Sent: Wednesday, January 20, 2021 5:56 AM
> 
> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Signed-off-by: Juergen Gross 
> ---
> V4:
> - drop paravirt_time.h again
> - don't move Hyper-V code (Michael Kelley)
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/include/asm/mshyperv.h   |  2 +-
>  arch/x86/include/asm/paravirt.h   | 17 ++---
>  arch/x86/include/asm/paravirt_types.h |  6 --
>  arch/x86/kernel/cpu/vmware.c  |  5 +++--
>  arch/x86/kernel/kvm.c |  2 +-
>  arch/x86/kernel/kvmclock.c|  2 +-
>  arch/x86/kernel/paravirt.c| 16 
>  arch/x86/kernel/tsc.c |  2 +-
>  arch/x86/xen/time.c   | 11 ---
>  drivers/clocksource/hyperv_timer.c|  5 +++--
>  drivers/xen/time.c|  2 +-
>  12 files changed, 42 insertions(+), 29 deletions(-)
> 

[snip]

> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index 30f76b966857..b4ee331d29a7 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -63,7 +63,7 @@ typedef int (*hyperv_fill_flush_list_func)(
>  static __always_inline void hv_setup_sched_clock(void *sched_clock)
>  {
>  #ifdef CONFIG_PARAVIRT
> - pv_ops.time.sched_clock = sched_clock;
> + paravirt_set_sched_clock(sched_clock);
>  #endif
>  }
> 

This looks fine.

[snip]

> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..bf3bf20bc6bd 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -18,6 +18,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_tsc);
> + paravirt_set_sched_clock(read_hv_sched_clock_tsc);
> 
>   return true;
>  }
> @@ -470,6 +471,6 @@ void __init hv_init_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_msr);
> + static_call_update(pv_sched_clock, read_hv_sched_clock_msr);
>  }
>  EXPORT_SYMBOL_GPL(hv_init_clocksource);

The changes to hyperv_timer.c aren't needed and shouldn't be
there, so as to preserve hyperv_timer.c as architecture neutral.  With
your update to hv_setup_sched_clock() in mshyperv.h, the original
code works correctly.  While there are two call sites for
hv_setup_sched_clock(), only one is called.  And once the sched clock
function is set, it is never changed or overridden.

Michael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v3 06/15] x86/paravirt: switch time pvops functions to use static_call()

2020-12-17 Thread Michael Kelley via Virtualization
From: Juergen Gross  Sent: Thursday, December 17, 2020 1:31 AM

> The time pvops functions are the only ones left which might be
> used in 32-bit mode and which return a 64-bit value.
> 
> Switch them to use the static_call() mechanism instead of pvops, as
> this allows quite some simplification of the pvops implementation.
> 
> Due to include hell this requires to split out the time interfaces
> into a new header file.
> 
> Signed-off-by: Juergen Gross 
> ---
>  arch/x86/Kconfig  |  1 +
>  arch/x86/include/asm/mshyperv.h   | 11 
>  arch/x86/include/asm/paravirt.h   | 14 --
>  arch/x86/include/asm/paravirt_time.h  | 38 +++
>  arch/x86/include/asm/paravirt_types.h |  6 -
>  arch/x86/kernel/cpu/vmware.c  |  5 ++--
>  arch/x86/kernel/kvm.c |  3 ++-
>  arch/x86/kernel/kvmclock.c|  3 ++-
>  arch/x86/kernel/paravirt.c| 16 ---
>  arch/x86/kernel/tsc.c |  3 ++-
>  arch/x86/xen/time.c   | 12 -
>  drivers/clocksource/hyperv_timer.c|  5 ++--
>  drivers/xen/time.c|  3 ++-
>  kernel/sched/sched.h  |  1 +
>  14 files changed, 71 insertions(+), 50 deletions(-)
>  create mode 100644 arch/x86/include/asm/paravirt_time.h
>

[snip]
 
> diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
> index ffc289992d1b..45942d420626 100644
> --- a/arch/x86/include/asm/mshyperv.h
> +++ b/arch/x86/include/asm/mshyperv.h
> @@ -56,17 +56,6 @@ typedef int (*hyperv_fill_flush_list_func)(
>  #define hv_get_raw_timer() rdtsc_ordered()
>  #define hv_get_vector() HYPERVISOR_CALLBACK_VECTOR
> 
> -/*
> - * Reference to pv_ops must be inline so objtool
> - * detection of noinstr violations can work correctly.
> - */
> -static __always_inline void hv_setup_sched_clock(void *sched_clock)
> -{
> -#ifdef CONFIG_PARAVIRT
> - pv_ops.time.sched_clock = sched_clock;
> -#endif
> -}
> -
>  void hyperv_vector_handler(struct pt_regs *regs);
> 
>  static inline void hv_enable_stimer0_percpu_irq(int irq) {}

[snip]

> diff --git a/drivers/clocksource/hyperv_timer.c 
> b/drivers/clocksource/hyperv_timer.c
> index ba04cb381cd3..1ed79993fc50 100644
> --- a/drivers/clocksource/hyperv_timer.c
> +++ b/drivers/clocksource/hyperv_timer.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  static struct clock_event_device __percpu *hv_clock_event;
>  static u64 hv_sched_clock_offset __ro_after_init;
> @@ -445,7 +446,7 @@ static bool __init hv_init_tsc_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_tsc, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_tsc);
> + paravirt_set_sched_clock(read_hv_sched_clock_tsc);
> 
>   return true;
>  }
> @@ -470,6 +471,6 @@ void __init hv_init_clocksource(void)
>   clocksource_register_hz(&hyperv_cs_msr, NSEC_PER_SEC/100);
> 
>   hv_sched_clock_offset = hv_read_reference_counter();
> - hv_setup_sched_clock(read_hv_sched_clock_msr);
> + static_call_update(pv_sched_clock, read_hv_sched_clock_msr);
>  }
>  EXPORT_SYMBOL_GPL(hv_init_clocksource);

These Hyper-V changes are problematic as we want to keep hyperv_timer.c
architecture independent.  While only the code for x86/x64 is currently
accepted upstream, code for ARM64 support is in progress.   So we need
to use hv_setup_sched_clock() in hyperv_timer.c, and have the per-arch
implementation in mshyperv.h.

Michael
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


RE: [PATCH v3 4/9] x86/mm/tlb: Flush remote and local TLBs concurrently

2019-07-30 Thread Michael Kelley via Virtualization
From: Nadav Amit  Sent: Thursday, July 18, 2019 5:59 PM
> 
> To improve TLB shootdown performance, flush the remote and local TLBs
> concurrently. Introduce flush_tlb_multi() that does so. Introduce
> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
> and hyper-v are only compile-tested).
> 
> While the updated smp infrastructure is capable of running a function on
> a single local core, it is not optimized for this case. The multiple
> function calls and the indirect branch introduce some overhead, and
> might make local TLB flushes slower than they were before the recent
> changes.
> 
> Before calling the SMP infrastructure, check if only a local TLB flush
> is needed to restore the lost performance in this common case. This
> requires to check mm_cpumask() one more time, but unless this mask is
> updated very frequently, this should impact performance negatively.
> 
> Cc: "K. Y. Srinivasan" 
> Cc: Haiyang Zhang 
> Cc: Stephen Hemminger 
> Cc: Sasha Levin 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Borislav Petkov 
> Cc: x...@kernel.org
> Cc: Juergen Gross 
> Cc: Paolo Bonzini 
> Cc: Dave Hansen 
> Cc: Andy Lutomirski 
> Cc: Peter Zijlstra 
> Cc: Boris Ostrovsky 
> Cc: linux-hyp...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: k...@vger.kernel.org
> Cc: xen-de...@lists.xenproject.org
> Signed-off-by: Nadav Amit 
> ---
>  arch/x86/hyperv/mmu.c | 10 +++---
>  arch/x86/include/asm/paravirt.h   |  6 ++--
>  arch/x86/include/asm/paravirt_types.h |  4 +--
>  arch/x86/include/asm/tlbflush.h   |  8 ++---
>  arch/x86/include/asm/trace/hyperv.h   |  2 +-
>  arch/x86/kernel/kvm.c | 11 +--
>  arch/x86/kernel/paravirt.c|  2 +-
>  arch/x86/mm/tlb.c | 47 ++-
>  arch/x86/xen/mmu_pv.c | 11 +++
>  include/trace/events/xen.h|  2 +-
>  10 files changed, 62 insertions(+), 41 deletions(-)
> 

For the Hyper-V parts --
Reviewed-by: Michael Kelley 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization