Re: [PATCH kvmtool v3 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-04-04 Thread Leo Yan
Hi Jean-Philippe,

On Thu, Apr 04, 2019 at 12:06:51PM +0100, Jean-Philippe Brucker wrote:
> On 26/03/2019 07:41, Leo Yan wrote:
> > Since PCI forbids enabling INTx, MSI or MSIX at the same time, it's by
> > default to disable INTx mode when enable MSI/MSIX mode; but this logic is
> > easily broken if the guest PCI driver detects the MSI/MSIX cannot work as
> > expected and tries to rollback to use INTx mode.  In this case, the INTx
> > mode has been disabled and has no chance to re-enable it, thus both INTx
> > mode and MSI/MSIX mode cannot work in vfio.
> > 
> > Below shows the detailed flow for introducing this issue:
> > 
> >   vfio_pci_configure_dev_irqs()
> > `-> vfio_pci_enable_intx()
> > 
> >   vfio_pci_enable_msis()
> > `-> vfio_pci_disable_intx()
> > 
> >   vfio_pci_disable_msis()   => Guest PCI driver disables MSI
> > 
> > To fix this issue, when disable MSI/MSIX we need to check if INTx mode
> > is available for this device or not; if the device can support INTx then
> > re-enable it so that the device can fallback to use it.
> > 
> > Since vfio_pci_disable_intx() / vfio_pci_enable_intx() pair functions
> > may be called for multiple times, this patch uses 'intx_fd == -1' to
> > denote the INTx is disabled, the pair functions can directly bail out
> > when detect INTx has been disabled and enabled respectively.
> > 
> > Suggested-by: Jean-Philippe Brucker 
> > Signed-off-by: Leo Yan 
> > ---
> >  vfio/pci.c | 41 ++---
> >  1 file changed, 30 insertions(+), 11 deletions(-)
> > 
> > diff --git a/vfio/pci.c b/vfio/pci.c
> > index 3c39844..3b2b1e7 100644
> > --- a/vfio/pci.c
> > +++ b/vfio/pci.c
> > @@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
> > msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
> >  
> >  static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device 
> > *vdev);
> > +static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
> >  
> >  static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
> > bool msix)
> > @@ -50,17 +51,14 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct 
> > vfio_device *vdev,
> > if (!msi_is_enabled(msis->virt_state))
> > return 0;
> >  
> > -   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
> > -   /*
> > -* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> > -* time. Since INTx has to be enabled from the start (we don't
> > -* have a reliable way to know when the user starts using it),
> > -* disable it now.
> > -*/
> > +   /*
> > +* PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> > +* time. Since INTx has to be enabled from the start (after enabling
> > +* 'pdev->intx_fd' will be assigned to an eventfd and doesn't equal
> > +* to the init value -1), disable it now.
> > +*/
> 
> I don't think the comment change is useful, we don't need that much
> detail. The text that you replaced was trying to explain why we enable
> INTx from the start, and would still apply (although it should have been
> s/user/guest/)
> 
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> > vfio_pci_disable_intx(kvm, vdev);
> > -   /* Permanently disable INTx */
> > -   pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
> > -   }
> >  
> > eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
> >  
> > @@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, 
> > struct vfio_device *vdev,
> > msi_set_enabled(msis->phys_state, false);
> > msi_set_empty(msis->phys_state, true);
> >  
> > -   return 0;
> > +   /*
> > +* When MSI or MSIX is disabled, this might be called when
> > +* PCI driver detects the MSI interrupt failure and wants to
> > +* rollback to INTx mode.  Thus enable INTx if the device
> > +* supports INTx mode in this case.
> > +*/
> > +   if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> > +   ret = vfio_pci_enable_intx(kvm, vdev);
> > +
> > +   return ret >= 0 ? 0 : ret;
> >  }
> >  
> >  static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device 
> > *vdev,
> > @@ -1002,6 +1009,10 @@ static void vfio_pci_disable_intx(struct kvm *kvm, 
> > struct vfio_device *vdev)
> > .index  = VFIO_PCI_INTX_IRQ_INDEX,
> > };
> >  
> > +   /* INTx mode has been disabled */
> 
> Here as well, the comments on intx_fd seem unnecessary. But these are
> only nits, the code is fine and I tested for regressions on my hardware, so:
> 
> Reviewed-by: Jean-Philippe Brucker 

Thanks for reviewing.  I will spin a new patch set to drop unnecessary
comment to let changes as simple as possible.

Thanks,
Leo Yan
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 06/27] arm64/sve: Clarify role of the VQ map maintenance functions

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:31PM +, Dave Martin wrote:
> The roles of sve_init_vq_map(), sve_update_vq_map() and
> sve_verify_vq_map() are highly non-obvious to anyone who has not dug
> through cpufeatures.c in detail.
> 
> Since the way these functions interact with each other is more
> important here than a full understanding of the cpufeatures code, this
> patch adds comments to make the functions' roles clearer.
> 
> No functional change.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Reviewed-by: Julien Grall 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] This patch is useful for explaining the previous
>patch (as per the v5 ordering), and is anyway non-functional.
>Swapped it with the previous patch to provide a more logical reading
>order for the series.
> ---
>  arch/arm64/kernel/fpsimd.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 62c37f0..f59ea67 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -647,6 +647,10 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, 
> SVE_VQ_MAX))
>   }
>  }
>  
> +/*
> + * Initialise the set of known supported VQs for the boot CPU.
> + * This is called during kernel boot, before secondary CPUs are brought up.
> + */
>  void __init sve_init_vq_map(void)
>  {
>   sve_probe_vqs(sve_vq_map);
> @@ -655,6 +659,7 @@ void __init sve_init_vq_map(void)
>  /*
>   * If we haven't committed to the set of supported VQs yet, filter out
>   * those not supported by the current CPU.
> + * This function is called during the bring-up of early secondary CPUs only.
>   */
>  void sve_update_vq_map(void)
>  {
> @@ -662,7 +667,10 @@ void sve_update_vq_map(void)
>   bitmap_and(sve_vq_map, sve_vq_map, sve_secondary_vq_map, SVE_VQ_MAX);
>  }
>  
> -/* Check whether the current CPU supports all VQs in the committed set */
> +/*
> + * Check whether the current CPU supports all VQs in the committed set.
> + * This function is called during the bring-up of late secondary CPUs only.
> + */
>  int sve_verify_vq_map(void)
>  {
>   int ret = 0;
> -- 
> 2.1.4

Reviewed-by: Andrew Jones 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 07/27] arm64/sve: Check SVE virtualisability

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:32PM +, Dave Martin wrote:
> Due to the way the effective SVE vector length is controlled and
> trapped at different exception levels, certain mismatches in the
> sets of vector lengths supported by different physical CPUs in the
> system may prevent straightforward virtualisation of SVE at parity
> with the host.
> 
> This patch analyses the extent to which SVE can be virtualised
> safely without interfering with migration of vcpus between physical
> CPUs, and rejects late secondary CPUs that would erode the
> situation further.
> 
> It is left up to KVM to decide what to do with this information.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> QUESTION: The final structure of this code makes it quite natural to
> clamp the vector length for KVM guests to the maximum value supportable
> across all CPUs; such a value is guaranteed to exist, but may be
> surprisingly small on a given hardware platform.
> 
> Should we be stricter and actually refuse to support KVM at all on such
> hardware?  This may help to force people to fix Linux (or the
> architecture) if/when this issue comes up.

Blocking KVM would be too harsh, since the users of the host may not
care about guests with SVE, but still care about guests.

> 
> For now, I stick with pr_warn() and make do with a limited SVE vector
> length for guests.

I think that's probably the best we can do.

> 
> Changes since v5:
> 
>  * pr_info() about the presence of unvirtualisable vector lengths in
>sve_setup() upgrade to pr_warn(), for consistency with
>sve_verify_vq_map().
> ---
>  arch/arm64/include/asm/fpsimd.h |  1 +
>  arch/arm64/kernel/cpufeature.c  |  2 +-
>  arch/arm64/kernel/fpsimd.c  | 86 
> ++---
>  3 files changed, 73 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index dd1ad39..964adc9 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -87,6 +87,7 @@ extern void sve_kernel_enable(const struct 
> arm64_cpu_capabilities *__unused);
>  extern u64 read_zcr_features(void);
>  
>  extern int __ro_after_init sve_max_vl;
> +extern int __ro_after_init sve_max_virtualisable_vl;
>  
>  #ifdef CONFIG_ARM64_SVE
>  
> diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
> index 4061de1..7f8cc51 100644
> --- a/arch/arm64/kernel/cpufeature.c
> +++ b/arch/arm64/kernel/cpufeature.c
> @@ -1863,7 +1863,7 @@ static void verify_sve_features(void)
>   unsigned int len = zcr & ZCR_ELx_LEN_MASK;
>  
>   if (len < safe_len || sve_verify_vq_map()) {
> - pr_crit("CPU%d: SVE: required vector length(s) missing\n",
> + pr_crit("CPU%d: SVE: vector length support mismatch\n",
>   smp_processor_id());
>   cpu_die_early();
>   }
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index f59ea67..b219796a 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -18,6 +18,7 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -48,6 +49,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #define FPEXC_IOF(1 << 0)
>  #define FPEXC_DZF(1 << 1)
> @@ -130,14 +132,18 @@ static int sve_default_vl = -1;
>  
>  /* Maximum supported vector length across all CPUs (initially poisoned) */
>  int __ro_after_init sve_max_vl = SVE_VL_MIN;
> +int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN;
>  /* Set of available vector lengths, as vq_to_bit(vq): */
>  static __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +/* Set of vector lengths present on at least one cpu: */
> +static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
>  static void __percpu *efi_sve_state;
>  
>  #else /* ! CONFIG_ARM64_SVE */
>  
>  /* Dummy declaration for code that will be optimised out: */
>  extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +extern __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
>  extern void __percpu *efi_sve_state;
>  
>  #endif /* ! CONFIG_ARM64_SVE */
> @@ -623,12 +629,6 @@ int sve_get_current_vl(void)
>   return sve_prctl_status(0);
>  }
>  
> -/*
> - * Bitmap for temporary storage of the per-CPU set of supported vector 
> lengths
> - * during secondary boot.
> - */
> -static DECLARE_BITMAP(sve_secondary_vq_map, SVE_VQ_MAX);
> -
>  static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
>  {
>   unsigned int vq, vl;
> @@ -654,6 +654,7 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
>  void __init sve_init_vq_map(void)
>  {
>   sve_probe_vqs(sve_vq_map);
> + bitmap_copy(sve_vq_partial_map, sve_vq_map, SVE_VQ_MAX);
>  }
>  
>  /*
> @@ -663,8 +664,11 @@ void __init sve_init_vq_map(void)
>   */
>  void sve_update_vq_map(void)
>  {
> - sve_probe_vqs(sve_secondary_vq_map);
> -

Re: [PATCH v7 09/27] KVM: arm64: Add a vcpu flag to control SVE visibility for the guest

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:34PM +, Dave Martin wrote:
> Since SVE will be enabled or disabled on a per-vcpu basis, a flag
> is needed in order to track which vcpus have it enabled.
> 
> This patch adds a suitable flag and a helper for checking it.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 
> Tested-by: zhang.lei 
> ---
>  arch/arm64/include/asm/kvm_host.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 6d10100..ad4f7f0 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -328,6 +328,10 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_FP_HOST(1 << 2) /* host FP regs loaded */
>  #define KVM_ARM64_HOST_SVE_IN_USE(1 << 3) /* backup for host TIF_SVE */
>  #define KVM_ARM64_HOST_SVE_ENABLED   (1 << 4) /* SVE enabled for EL0 */
> +#define KVM_ARM64_GUEST_HAS_SVE  (1 << 5) /* SVE exposed to 
> guest */
> +
> +#define vcpu_has_sve(vcpu) (system_supports_sve() && \
> + ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
>  
>  #define vcpu_gp_regs(v)  (&(v)->arch.ctxt.gp_regs)
>  
> -- 
> 2.1.4
>

Reviewed-by: Andrew Jones 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 10/27] KVM: arm64: Propagate vcpu into read_id_reg()

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:35PM +, Dave Martin wrote:
> Architecture features that are conditionally visible to the guest
> will require run-time checks in the ID register accessor functions.
> In particular, read_id_reg() will need to perform checks in order
> to generate the correct emulated value for certain ID register
> fields such as ID_AA64PFR0_EL1.SVE for example.
> 
> This patch propagates vcpu into read_id_reg() so that future
> patches can add run-time checks on the guest configuration here.
> 
> For now, there is no functional change.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 
> Tested-by: zhang.lei 
> ---
>  arch/arm64/kvm/sys_regs.c | 23 +--
>  1 file changed, 13 insertions(+), 10 deletions(-)
>

Reviewed-by: Andrew Jones 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 14/27] KVM: Allow 2048-bit register access via ioctl interface

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:39PM +, Dave Martin wrote:
> The Arm SVE architecture defines registers that are up to 2048 bits
> in size (with some possibility of further future expansion).
> 
> In order to avoid the need for an excessively large number of
> ioctls when saving and restoring a vcpu's registers, this patch
> adds a #define to make support for individual 2048-bit registers
> through the KVM_{GET,SET}_ONE_REG ioctl interface official.  This
> will allow each SVE register to be accessed in a single call.
> 
> There are sufficient spare bits in the register id size field for
> this change, so there is no ABI impact, providing that
> KVM_GET_REG_LIST does not enumerate any 2048-bit register unless
> userspace explicitly opts in to the relevant architecture-specific
> features.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 
> Tested-by: zhang.lei 
> ---
>  include/uapi/linux/kvm.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 6d4ea4b..dc77a5a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1145,6 +1145,7 @@ struct kvm_dirty_tlb {
>  #define KVM_REG_SIZE_U2560x0050ULL
>  #define KVM_REG_SIZE_U5120x0060ULL
>  #define KVM_REG_SIZE_U1024   0x0070ULL
> +#define KVM_REG_SIZE_U2048   0x0080ULL
>  
>  struct kvm_reg_list {
>   __u64 n; /* number of regs */
> -- 
> 2.1.4
>

Reviewed-by: Andrew Jones 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 27/27] KVM: arm64/sve: Document KVM API extensions for SVE

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:52PM +, Dave Martin wrote:
> This patch adds sections to the KVM API documentation describing
> the extensions for supporting the Scalable Vector Extension (SVE)
> in guests.
> 
> Signed-off-by: Dave Martin 
> 
> ---
> 
> Changes since v5:
> 
>  * Document KVM_ARM_VCPU_FINALIZE and its interactions with SVE.
> ---
>  Documentation/virtual/kvm/api.txt | 132 
> +-
>  1 file changed, 129 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index cd920dd..68509de 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1873,6 +1873,7 @@ Parameters: struct kvm_one_reg (in)
>  Returns: 0 on success, negative value on failure
>  Errors:
>    ENOENT:   no such register
> +  EPERM:    register access forbidden for architecture-dependent reasons
>    EINVAL:   other errors, such as bad size encoding for a known register
>  
>  struct kvm_one_reg {
> @@ -2127,13 +2128,20 @@ Specifically:
>0x6030  0010 004c SPSR_UND64  spsr[KVM_SPSR_UND]
>0x6030  0010 004e SPSR_IRQ64  spsr[KVM_SPSR_IRQ]
>0x6060  0010 0050 SPSR_FIQ64  spsr[KVM_SPSR_FIQ]
> -  0x6040  0010 0054 V0 128  fp_regs.vregs[0]
> -  0x6040  0010 0058 V1 128  fp_regs.vregs[1]
> +  0x6040  0010 0054 V0 128  fp_regs.vregs[0](*)
> +  0x6040  0010 0058 V1 128  fp_regs.vregs[1](*)
>  ...
> -  0x6040  0010 00d0 V31128  fp_regs.vregs[31]
> +  0x6040  0010 00d0 V31128  fp_regs.vregs[31]   (*)
>0x6020  0010 00d4 FPSR32  fp_regs.fpsr
>0x6020  0010 00d5 FPCR32  fp_regs.fpcr
>  
> +(*) These encodings are not accepted for SVE-enabled vcpus.  See
> +KVM_ARM_VCPU_INIT.
> +
> +The equivalent register content can be accessed via bits [127:0] of
> +the corresponding SVE Zn registers instead for vcpus that have SVE
> +enabled (see below).
> +
>  arm64 CCSIDR registers are demultiplexed by CSSELR value:
>0x6020  0011 00 
>  
> @@ -2143,6 +2151,61 @@ arm64 system registers have the following id bit 
> patterns:
>  arm64 firmware pseudo-registers have the following bit pattern:
>0x6030  0014 
>  
> +arm64 SVE registers have the following bit patterns:
> +  0x6080  0015 00 Zn bits[2048*slice + 2047 : 
> 2048*slice]
> +  0x6050  0015 04 Pn bits[256*slice + 255 : 256*slice]
> +  0x6050  0015 060 FFR bits[256*slice + 255 : 256*slice]
> +  0x6060  0015  KVM_REG_ARM64_SVE_VLS pseudo-register
> +
> +Access to slices beyond the maximum vector length configured for the
> +vcpu (i.e., where 16 * slice >= max_vq (**)) will fail with ENOENT.

I've been doing pretty well keeping track of the 16's, 128's, VL's and
VQ's, but this example lost me. Also, should it be >= or just > ?

> +
> +These registers are only accessible on vcpus for which SVE is enabled.
> +See KVM_ARM_VCPU_INIT for details.
> +
> +In addition, except for KVM_REG_ARM64_SVE_VLS, these registers are not
> +accessible until the vcpu's SVE configuration has been finalized
> +using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).  See KVM_ARM_VCPU_INIT
> +and KVM_ARM_VCPU_FINALIZE for more information about this procedure.
> +
> +KVM_REG_ARM64_SVE_VLS is a pseudo-register that allows the set of vector
> +lengths supported by the vcpu to be discovered and configured by
> +userspace.  When transferred to or from user memory via KVM_GET_ONE_REG
> +or KVM_SET_ONE_REG, the value of this register is of type __u64[8], and
> +encodes the set of vector lengths as follows:
> +
> +__u64 vector_lengths[8];
> +
> +if (vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX &&
> +((vector_lengths[(vq - 1) / 64] >> ((vq - 1) % 64)) & 1))
> + /* Vector length vq * 16 bytes supported */
> +else
> + /* Vector length vq * 16 bytes not supported */
> +
> +(**) The maximum value vq for which the above condition is true is
> +max_vq.  This is the maximum vector length available to the guest on
> +this vcpu, and determines which register slices are visible through
> +this ioctl interface.
> +
> +(See Documentation/arm64/sve.txt for an explanation of the "vq"
> +nomenclature.)
> +
> +KVM_REG_ARM64_SVE_VLS is only accessible after KVM_ARM_VCPU_INIT.
> +KVM_ARM_VCPU_INIT initialises it to the best set of vector lengths that
> +the host supports.
> +
> +Userspace may subsequently modify it if desired until the vcpu's SVE
> +configuration is finalized using KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
> +
> +Apart from simply removing all vector lengths from the host set that
> +exceed some value, support for arbitrarily chosen sets of vector lengths
> +is hardware-dependent and may not be available.  Attempting to configure
> +an invalid set of vector lengths via KVM_SET_ONE_REG will fail with
> +EINVAL.
> +
> +After the vcpu's SVE configuration is finalized, further 

Re: [PATCH v7 25/27] KVM: arm64: Add a capability to advertise SVE support

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:50PM +, Dave Martin wrote:
> To provide a uniform way to check for KVM SVE support amongst other
> features, this patch adds a suitable capability KVM_CAP_ARM_SVE,
> and reports it as present when SVE is available.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] Strip out has_vhe() sanity-check, which wasn't in
>the most logical place, and anyway doesn't really belong in this
>patch.
> 
>Moved to KVM: arm64/sve: Allow userspace to enable SVE for vcpus
>instead.
> ---
>  arch/arm64/kvm/reset.c   | 3 +++
>  include/uapi/linux/kvm.h | 1 +
>  2 files changed, 4 insertions(+)
> 
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index 32c5ac0..f13378d 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -98,6 +98,9 @@ int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long 
> ext)
>   case KVM_CAP_ARM_VM_IPA_SIZE:
>   r = kvm_ipa_limit;
>   break;
> + case KVM_CAP_ARM_SVE:
> + r = system_supports_sve();
> + break;
>   default:
>   r = 0;
>   }
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index c3b8e7a..1d56444 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -988,6 +988,7 @@ struct kvm_ppc_resize_hpt {
>  #define KVM_CAP_ARM_VM_IPA_SIZE 165
>  #define KVM_CAP_MANUAL_DIRTY_LOG_PROTECT 166
>  #define KVM_CAP_HYPERV_CPUID 167
> +#define KVM_CAP_ARM_SVE 168
>  
>  #ifdef KVM_CAP_IRQ_ROUTING
>  
> -- 
> 2.1.4
>

Reviewed-by: Andrew Jones 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 24/27] KVM: arm64/sve: Allow userspace to enable SVE for vcpus

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:49PM +, Dave Martin wrote:
> Now that all the pieces are in place, this patch offers a new flag
> KVM_ARM_VCPU_SVE that userspace can pass to KVM_ARM_VCPU_INIT to
> turn on SVE for the guest, on a per-vcpu basis.
> 
> As part of this, support for initialisation and reset of the SVE
> vector length set and registers is added in the appropriate places,
> as well as finally setting the KVM_ARM64_GUEST_HAS_SVE vcpu flag,
> to turn on the SVE support code.
> 
> Allocation of the SVE register storage in vcpu->arch.sve_state is
> deferred until the SVE configuration is finalized, by which time
> the size of the registers is known.
> 
> Setting the vector lengths supported by the vcpu is considered
> configuration of the emulated hardware rather than runtime
> configuration, so no support is offered for changing the vector
> lengths available to an existing vcpu across reset.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v6:
> 
>  * [Kristina Martšenko] Amend comments explaining the
>kvm_arm_vcpu_sve_finalized() versus !kvm_arm_vcpu_sve_finalized()
>cases in kvm_reset_vcpu().
> 
>Actually, I've just deleted the comments, since if anything they're
>more confusing than the code they're supposed to describe.
> 
> Changes since v5:
> 
>  * Refactored to make the code flow clearer and clarify responsiblity
>for the various initialisation phases/checks.
> 
>In place of the previous, confusingly dual-purpose kvm_reset_sve(),
>enabling and resetting of SVE are split into separate functions and
>called as appropriate from kvm_reset_vcpu().
> 
>To avoid interactions with preempt_disable(), memory allocation is
>done in the kvm_vcpu_first_fun_init() path instead.  To achieve
>this, the SVE memory allocation is moved to kvm_arm_vcpu_finalize(),
>which now takes on the role of actually doing deferred setup instead
>of just setting a flag to indicate that the setup was done.
> 
>  * Add has_vhe() sanity-check into kvm_vcpu_enable_sve(), since it
>makes more sense here than when resetting the vcpu.
> 
>  * When checking for SVE finalization in kvm_reset_vcpu(), call the new
>SVE-specific function kvm_arm_vcpu_sve_finalized().  The new generic
>check kvm_arm_vcpu_is_finalized() is unnecessarily broad here: using
>the appropriate specific check makes the code more self-describing.
> 
>  * Definition of KVM_ARM_VCPU_SVE moved to KVM: arm64/sve: Add pseudo-
>register for the guest's vector lengths (which needs it for the
>KVM_ARM_VCPU_FINALIZE ioctl).
> ---
>  arch/arm64/include/asm/kvm_host.h |  3 +--
>  arch/arm64/kvm/reset.c| 43 
> ++-
>  2 files changed, 43 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 5475cc4..9d57cf8 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -49,8 +49,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> -/* Will be incremented when KVM_ARM_VCPU_SVE is fully implemented: */
> -#define KVM_VCPU_MAX_FEATURES 4
> +#define KVM_VCPU_MAX_FEATURES 5
>  
>  #define KVM_REQ_SLEEP \
>   KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
> diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
> index e7f9c06..32c5ac0 100644
> --- a/arch/arm64/kvm/reset.c
> +++ b/arch/arm64/kvm/reset.c
> @@ -20,10 +20,12 @@
>   */
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  #include 
> @@ -37,6 +39,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /* Maximum phys_shift supported for any VM on this host */
>  static u32 kvm_ipa_limit;
> @@ -130,6 +133,27 @@ int kvm_arm_init_arch_resources(void)
>   return 0;
>  }
>  
> +static int kvm_vcpu_enable_sve(struct kvm_vcpu *vcpu)
> +{
> + if (!system_supports_sve())
> + return -EINVAL;
> +
> + /* Verify that KVM startup enforced this when SVE was detected: */
> + if (WARN_ON(!has_vhe()))
> + return -EINVAL;
> +
> + vcpu->arch.sve_max_vl = kvm_sve_max_vl;
> +
> + /*
> +  * Userspace can still customize the vector lengths by writing
> +  * KVM_REG_ARM64_SVE_VLS.  Allocation is deferred until
> +  * kvm_arm_vcpu_finalize(), which freezes the configuration.
> +  */
> + vcpu->arch.flags |= KVM_ARM64_GUEST_HAS_SVE;
> +
> + return 0;
> +}
> +
>  /*
>   * Finalize vcpu's maximum SVE vector length, allocating
>   * vcpu->arch.sve_state as necessary.
> @@ -188,13 +212,20 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>   kfree(vcpu->arch.sve_state);
>  }
>  
> +static void kvm_vcpu_reset_sve(struct kvm_vcpu *vcpu)
> +{
> + if (vcpu_has_sve(vcpu))
> + memset(vcpu->arch.sve_state, 0, vcpu_sve_state_size(vcpu));
> +}
> +
>  /**
>   * 

Re: [PATCH v7 23/27] KVM: arm64/sve: Add pseudo-register for the guest's vector lengths

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:48PM +, Dave Martin wrote:
> This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> allow userspace to set and query the set of vector lengths visible
> to the guest.
> 
> In the future, multiple register slices per SVE register may be
> visible through the ioctl interface.  Once the set of slices has
> been determined we would not be able to allow the vector length set
> to be changed any more, in order to avoid userspace seeing
> inconsistent sets of registers.  For this reason, this patch adds
> support for explicit finalization of the SVE configuration via the
> KVM_ARM_VCPU_FINALIZE ioctl.
> 
> Finalization is the proper place to allocate the SVE register state
> storage in vcpu->arch.sve_state, so this patch adds that as
> appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> was previously a no-op on arm64.
> 
> To simplify the logic for determining what vector lengths can be
> supported, some code is added to KVM init to work this out, in the
> kvm_arm_init_arch_resources() hook.
> 
> The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> Subsequent patches will allow SVE to be turned on for guest vcpus,
> making it visible.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] Delete overzealous BUILD_BUG_ON() checks.
>It also turns out that these could cause kernel build failures in
>some configurations, even though the checked condition is compile-
>time constant.
> 
>Because of the way the affected functions are called, the checks
>are superfluous, so the simplest option is simply to get rid of
>them.
> 
>  * [Julien Thierry] Free vcpu->arch.sve_state (if any) in
>kvm_arch_vcpu_uninit() (which is currently a no-op).
> 
>This was accidentally lost during a previous rebase.
> 
>  * Add kvm_arm_init_arch_resources() hook, and use it to probe SVE
>configuration for KVM, to avoid duplicating the logic elsewhere.
>We only need to do this once.
> 
>  * Move sve_state buffer allocation to kvm_arm_vcpu_finalize().
> 
>As well as making the code more straightforward, this avoids the
>need to allocate memory in kvm_reset_vcpu(), the meat of which is
>non-preemptible since commit 358b28f09f0a ("arm/arm64: KVM: Allow a
>VCPU to fully reset itself").
> 
>The refactoring means that if this has not been done by the time
>we hit KVM_RUN, then this allocation will happen on the
>kvm_arm_first_run_init() path, where preemption remains enabled.
> 
>  * Add a couple of comments in {get,set}_sve_reg() to make the handling
>of the KVM_REG_ARM64_SVE_VLS special case a little clearer.
> 
>  * Move mis-split rework to avoid put_user() being the correct size
>by accident in KVM_GET_REG_LIST to KVM: arm64: Enumerate SVE register
>indices for KVM_GET_REG_LIST.
> 
>  * Fix wrong WARN_ON() check sense when checking whether the
>implementation may needs move SVE register slices than KVM can
>support.
> 
>  * Fix erroneous setting of vcpu->arch.sve_max_vl based on stale loop
>control veriable vq.
> 
>  * Move definition of KVM_ARM_VCPU_SVE from KVM: arm64/sve: Allow
>userspace to enable SVE for vcpus.
> 
>  * Migrate to explicit finalization of the SVE configuration, using
>KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
> ---
>  arch/arm64/include/asm/kvm_host.h |  15 +++--
>  arch/arm64/include/uapi/asm/kvm.h |   5 ++
>  arch/arm64/kvm/guest.c| 114 
> +-
>  arch/arm64/kvm/reset.c|  89 +
>  4 files changed, 215 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 98658f7..5475cc4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -23,7 +23,6 @@
>  #define __ARM64_KVM_HOST_H__
>  
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -50,6 +49,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> +/* Will be incremented when KVM_ARM_VCPU_SVE is fully implemented: */
>  #define KVM_VCPU_MAX_FEATURES 4
>  
>  #define KVM_REQ_SLEEP \
> @@ -59,10 +59,12 @@
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> -static inline int kvm_arm_init_arch_resources(void) { return 0; }
> +extern unsigned int kvm_sve_max_vl;
> +int kvm_arm_init_arch_resources(void);
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
>  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t 
> idmap_start);
>  
> @@ -353,6 +355,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_HOST_SVE_IN_USE(1 << 3) /* backup for host TIF_SVE */
>  #define KVM_ARM64_HOST_SVE_ENABLED   (1 << 4) /* SVE enabled for 

Re: [PATCH v7 23/27] KVM: arm64/sve: Add pseudo-register for the guest's vector lengths

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:48PM +, Dave Martin wrote:
> This patch adds a new pseudo-register KVM_REG_ARM64_SVE_VLS to
> allow userspace to set and query the set of vector lengths visible
> to the guest.
> 
> In the future, multiple register slices per SVE register may be
> visible through the ioctl interface.  Once the set of slices has
> been determined we would not be able to allow the vector length set
> to be changed any more, in order to avoid userspace seeing
> inconsistent sets of registers.  For this reason, this patch adds
> support for explicit finalization of the SVE configuration via the
> KVM_ARM_VCPU_FINALIZE ioctl.
> 
> Finalization is the proper place to allocate the SVE register state
> storage in vcpu->arch.sve_state, so this patch adds that as
> appropriate.  The data is freed via kvm_arch_vcpu_uninit(), which
> was previously a no-op on arm64.
> 
> To simplify the logic for determining what vector lengths can be
> supported, some code is added to KVM init to work this out, in the
> kvm_arm_init_arch_resources() hook.
> 
> The KVM_REG_ARM64_SVE_VLS pseudo-register is not exposed yet.
> Subsequent patches will allow SVE to be turned on for guest vcpus,
> making it visible.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] Delete overzealous BUILD_BUG_ON() checks.
>It also turns out that these could cause kernel build failures in
>some configurations, even though the checked condition is compile-
>time constant.
> 
>Because of the way the affected functions are called, the checks
>are superfluous, so the simplest option is simply to get rid of
>them.
> 
>  * [Julien Thierry] Free vcpu->arch.sve_state (if any) in
>kvm_arch_vcpu_uninit() (which is currently a no-op).
> 
>This was accidentally lost during a previous rebase.
> 
>  * Add kvm_arm_init_arch_resources() hook, and use it to probe SVE
>configuration for KVM, to avoid duplicating the logic elsewhere.
>We only need to do this once.
> 
>  * Move sve_state buffer allocation to kvm_arm_vcpu_finalize().
> 
>As well as making the code more straightforward, this avoids the
>need to allocate memory in kvm_reset_vcpu(), the meat of which is
>non-preemptible since commit 358b28f09f0a ("arm/arm64: KVM: Allow a
>VCPU to fully reset itself").
> 
>The refactoring means that if this has not been done by the time
>we hit KVM_RUN, then this allocation will happen on the
>kvm_arm_first_run_init() path, where preemption remains enabled.
> 
>  * Add a couple of comments in {get,set}_sve_reg() to make the handling
>of the KVM_REG_ARM64_SVE_VLS special case a little clearer.
> 
>  * Move mis-split rework to avoid put_user() being the correct size
>by accident in KVM_GET_REG_LIST to KVM: arm64: Enumerate SVE register
>indices for KVM_GET_REG_LIST.
> 
>  * Fix wrong WARN_ON() check sense when checking whether the
>implementation may needs move SVE register slices than KVM can
>support.
> 
>  * Fix erroneous setting of vcpu->arch.sve_max_vl based on stale loop
>control veriable vq.
> 
>  * Move definition of KVM_ARM_VCPU_SVE from KVM: arm64/sve: Allow
>userspace to enable SVE for vcpus.
> 
>  * Migrate to explicit finalization of the SVE configuration, using
>KVM_ARM_VCPU_FINALIZE(KVM_ARM_VCPU_SVE).
> ---
>  arch/arm64/include/asm/kvm_host.h |  15 +++--
>  arch/arm64/include/uapi/asm/kvm.h |   5 ++
>  arch/arm64/kvm/guest.c| 114 
> +-
>  arch/arm64/kvm/reset.c|  89 +
>  4 files changed, 215 insertions(+), 8 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 98658f7..5475cc4 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -23,7 +23,6 @@
>  #define __ARM64_KVM_HOST_H__
>  
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -50,6 +49,7 @@
>  
>  #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS
>  
> +/* Will be incremented when KVM_ARM_VCPU_SVE is fully implemented: */
>  #define KVM_VCPU_MAX_FEATURES 4
>  
>  #define KVM_REQ_SLEEP \
> @@ -59,10 +59,12 @@
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> -static inline int kvm_arm_init_arch_resources(void) { return 0; }
> +extern unsigned int kvm_sve_max_vl;
> +int kvm_arm_init_arch_resources(void);
>  
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> +void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
>  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
>  void __extended_idmap_trampoline(phys_addr_t boot_pgd, phys_addr_t 
> idmap_start);
>  
> @@ -353,6 +355,7 @@ struct kvm_vcpu_arch {
>  #define KVM_ARM64_HOST_SVE_IN_USE(1 << 3) /* backup for host TIF_SVE */
>  #define KVM_ARM64_HOST_SVE_ENABLED   (1 << 4) /* SVE enabled for 

Re: [PATCH v12 4/8] arm64: arm_pmu: Add !VHE support for exclude_host/exclude_guest attributes

2019-04-04 Thread Andrew Murray
On Thu, Apr 04, 2019 at 05:34:00PM +0100, Will Deacon wrote:
> On Thu, Mar 28, 2019 at 10:37:27AM +, Andrew Murray wrote:
> > Add support for the :G and :H attributes in perf by handling the
> > exclude_host/exclude_guest event attributes.
> > 
> > We notify KVM of counters that we wish to be enabled or disabled on
> > guest entry/exit and thus defer from starting or stopping events based
> > on their event attributes.
> > 
> > With !VHE we switch the counters between host/guest at EL2. We are able
> > to eliminate counters counting host events on the boundaries of guest
> > entry/exit when using :G by filtering out EL2 for exclude_host. When
> > using !exclude_hv there is a small blackout window at the guest
> > entry/exit where host events are not captured.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/kernel/perf_event.c | 43 --
> >  1 file changed, 36 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index cccf4fc86d67..6bb28aaf5aea 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -26,6 +26,7 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -528,11 +529,21 @@ static inline int armv8pmu_enable_counter(int idx)
> >  
> >  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
> >  {
> > +   struct perf_event_attr *attr = >attr;
> > int idx = event->hw.idx;
> > +   u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> >  
> > -   armv8pmu_enable_counter(idx);
> > if (armv8pmu_event_is_chained(event))
> > -   armv8pmu_enable_counter(idx - 1);
> > +   counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > +
> > +   kvm_set_pmu_events(counter_bits, attr);
> > +
> > +   /* We rely on the hypervisor switch code to enable guest counters */
> > +   if (!kvm_pmu_counter_deferred(attr)) {
> > +   armv8pmu_enable_counter(idx);
> > +   if (armv8pmu_event_is_chained(event))
> > +   armv8pmu_enable_counter(idx - 1);
> > +   }
> >  }
> >  
> >  static inline int armv8pmu_disable_counter(int idx)
> > @@ -545,11 +556,21 @@ static inline int armv8pmu_disable_counter(int idx)
> >  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
> >  {
> > struct hw_perf_event *hwc = >hw;
> > +   struct perf_event_attr *attr = >attr;
> > int idx = hwc->idx;
> > +   u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> >  
> > if (armv8pmu_event_is_chained(event))
> > -   armv8pmu_disable_counter(idx - 1);
> > -   armv8pmu_disable_counter(idx);
> > +   counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > +
> > +   kvm_clr_pmu_events(counter_bits);
> > +
> > +   /* We rely on the hypervisor switch code to disable guest counters */
> > +   if (!kvm_pmu_counter_deferred(attr)) {
> > +   if (armv8pmu_event_is_chained(event))
> > +   armv8pmu_disable_counter(idx - 1);
> > +   armv8pmu_disable_counter(idx);
> > +   }
> >  }
> >  
> >  static inline int armv8pmu_enable_intens(int idx)
> > @@ -829,11 +850,16 @@ static int armv8pmu_set_event_filter(struct 
> > hw_perf_event *event,
> > if (!attr->exclude_kernel)
> > config_base |= ARMV8_PMU_INCLUDE_EL2;
> > } else {
> > -   if (attr->exclude_kernel)
> > -   config_base |= ARMV8_PMU_EXCLUDE_EL1;
> > -   if (!attr->exclude_hv)
> > +   if (!attr->exclude_hv && !attr->exclude_host)
> > config_base |= ARMV8_PMU_INCLUDE_EL2;
> 
> FWIW, that doesn't align with my personal view of what the host is, but
> I'm not going to argue if Marc and Christoffer are happy with it.

Ideally armv8pmu_set_event_filter needs no knowledge of exclude_guest
and exclude_host as we leave the config_base alone and instead rely on
turning the counters on/off at guest entry/exit...

The addition of "&& !attr->exclude_host" here (!VHE) and likewise for VHE in
"arm64: KVM: Enable VHE support for :G/:H perf event modifiers" (in case
you missed it) is to eliminate counters counting host events on the boundaries
of guest entry/exit when counting for guest_only (:G). Thus this is really
an optimisation for more accurate counting.

Consider the case where a user wants to record guest events only - we switch
over to the guest counters at EL2 on world-switch (both VHE and !VHE host) -
now there is a small period of time between when we start the guest counters
and when we actually jump into the guest. This period of time in the host
shouldn't be counted - therefore so long as exclude_host is set we can filter
EL2 to remove these unwanted counts.

Therefore the above hunks likely won't describe what our view of the host is.

> 
> Anyway, thanks for persevering with this:

It's fun even though it hurts my head every time each time I come back to this.

> 
> 

Re: [PATCH v12 2/8] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data

2019-04-04 Thread Andrew Murray
On Thu, Apr 04, 2019 at 05:14:16PM +0100, Will Deacon wrote:
> On Thu, Apr 04, 2019 at 05:09:58PM +0100, Will Deacon wrote:
> > On Thu, Mar 28, 2019 at 10:37:25AM +, Andrew Murray wrote:
> > > The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
> > > a typedef to kvm_cpu_context and is used to store host cpu context. The
> > > kvm_cpu_context structure is also used elsewhere to hold vcpu context.
> > > In order to use the percpu to hold additional future host information we
> > > encapsulate kvm_cpu_context in a new structure and rename the typedef and
> > > percpu to match.
> > > 
> > > Signed-off-by: Andrew Murray 
> > > ---
> > >  arch/arm/include/asm/kvm_host.h   |  8 ++--
> > >  arch/arm64/include/asm/kvm_asm.h  |  3 ++-
> > >  arch/arm64/include/asm/kvm_host.h | 16 ++--
> > >  arch/arm64/kernel/asm-offsets.c   |  1 +
> > >  virt/kvm/arm/arm.c| 14 --
> > >  5 files changed, 27 insertions(+), 15 deletions(-)
> > 
> > As per usual, I'll need an Ack from Christoffer and/or Marc on this one :)
> 
> Actually... given that most of this series is now firmly in KVM territory,
> it probably makes more sense for the kvm/arm tree to take this on a separate
> branch, which I can also pull into the arm64 perf tree if I run into
> conflicts on the non-kvm parts.

No problem.

Thanks,

Andrew Murray

> 
> Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v12 8/8] arm64: docs: document perf event attributes

2019-04-04 Thread Andrew Murray
On Thu, Apr 04, 2019 at 05:21:28PM +0100, Will Deacon wrote:
> On Thu, Mar 28, 2019 at 10:37:31AM +, Andrew Murray wrote:
> > The interaction between the exclude_{host,guest} flags,
> > exclude_{user,kernel,hv} flags and presence of VHE can result in
> > different exception levels being filtered by the ARMv8 PMU. As this
> > can be confusing let's document how they work on arm64.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  Documentation/arm64/perf.txt | 74 
> >  1 file changed, 74 insertions(+)
> >  create mode 100644 Documentation/arm64/perf.txt
> > 
> > diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt
> > new file mode 100644
> > index ..604446c1f720
> > --- /dev/null
> > +++ b/Documentation/arm64/perf.txt
> > @@ -0,0 +1,74 @@
> > +Perf Event Attributes
> > +=
> > +
> > +Author: Andrew Murray 
> > +Date: 2019-03-06
> > +
> > +exclude_user
> > +
> > +
> > +This attribute excludes userspace.
> > +
> > +Userspace always runs at EL0 and thus this attribute will exclude EL0.
> > +
> > +
> > +exclude_kernel
> > +--
> > +
> > +This attribute excludes the kernel.
> > +
> > +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> > +at EL1.
> > +
> > +This attribute will exclude EL1 and additionally EL2 on a VHE system.
> 
> I find this last sentence a bit confusing, because it can be read to imply
> that if you don't set exclude_kernel and you're in a guest on a VHE system,
> then you can profile EL2.

Yes this could be misleading.

However from the perspective of the guest, when exclude_kernel is not set we
do indeed allow the guest to program it's PMU with ARMV8_PMU_INCLUDE_EL2 - and
thus the statement above is correct in terms of what the kernel believes it is
doing.

I think these statements are less confusing if we treat the exception levels
as those 'detected' by the running context (e.g. consider the impact of nested
virt here) - and we if ignore what the hypervisor (KVM) does outside (e.g.
stops counting upon switching between guest/host, translating PMU filters in
kvm_pmu_set_counter_event_type etc, etc). This then makes this document useful
for those wishing to change this logic (which is the intent) rather than those
trying to understand how we filter for EL levels as seen bare-metal.

With regards to the example you gave (exclude_kernel, EL2) - yes we want the
kernel to believe it can count EL2 - because one day we may want to update
KVM to allow the guest to count it's hypervisor overhead (e.g. host kernel
time associated with the guest).

I could write some preface that describes this outlook. Alternatively I could
just spell out what happens on a guest, e.g.

"For the host this attribute will exclude EL1 and additionally EL2 on a VHE
system.

For the guest this attribute will exclude EL1."

Though I'm less comfortable with this, as the last statement "For the guest this
attribute will exclude EL1." describes the product of both
kvm_pmu_set_counter_event_type and armv8pmu_set_event_filter which is confusing
to work out and also makes an assumption that we don't have nested virt (true
for now at least) and also reasons about bare-metal EL levels which probably
aren't that useful for someone changing this logic or understanding what the
flags do for there performance analysis.

Do you have a preference for how this is improved?

> 
> > +exclude_hv
> > +--
> > +
> > +This attribute excludes the hypervisor, we ignore this flag on a VHE system
> > +as we consider the host kernel to be the hypervisor.
> 
> Similar comment as the above: I don't think this makes sense when you look
> at things from the guest perspective.
> 
> > +On a non-VHE system we consider the hypervisor to be any code that runs at
> > +EL2 which is predominantly used for guest/host transitions.
> > +
> > +This attribute will exclude EL2 on a non-VHE system.
> > +
> > +
> > +exclude_host / exclude_guest
> > +
> > +
> > +This attribute excludes the KVM host.
> 
> But there are two attributes...

Oh, I'll have to update this.

Thanks for the review,

Andrew Murray

> 
> Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v8 7/9] KVM: arm/arm64: context-switch ptrauth registers

2019-04-04 Thread Kristina Martsenko
On 02/04/2019 03:27, Amit Daniel Kachhap wrote:
> From: Mark Rutland 
> 
> When pointer authentication is supported, a guest may wish to use it.
> This patch adds the necessary KVM infrastructure for this to work, with
> a semi-lazy context switch of the pointer auth state.

[...]

> diff --git a/arch/arm64/include/asm/kvm_ptrauth_asm.h 
> b/arch/arm64/include/asm/kvm_ptrauth_asm.h
> new file mode 100644
> index 000..65f99e9
> --- /dev/null
> +++ b/arch/arm64/include/asm/kvm_ptrauth_asm.h
> @@ -0,0 +1,106 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/* arch/arm64/include/asm/kvm_ptrauth_asm.h: Guest/host ptrauth save/restore
> + * Copyright 2019 Arm Limited
> + * Author: Mark Rutland 
> + * Amit Daniel Kachhap 
> + */
> +
> +#ifndef __ASM_KVM_PTRAUTH_ASM_H
> +#define __ASM_KVM_PTRAUTH_ASM_H
> +
> +#ifndef __ASSEMBLY__
> +
> +#define __ptrauth_save_key(regs, key)
> \
> +({   
> \
> + regs[key ## KEYLO_EL1] = read_sysreg_s(SYS_ ## key ## KEYLO_EL1);   
> \
> + regs[key ## KEYHI_EL1] = read_sysreg_s(SYS_ ## key ## KEYHI_EL1);   
> \
> +})
> +
> +#define __ptrauth_save_state(ctxt)   
> \
> +({   
> \
> + __ptrauth_save_key(ctxt->sys_regs, APIA);   
> \
> + __ptrauth_save_key(ctxt->sys_regs, APIB);   
> \
> + __ptrauth_save_key(ctxt->sys_regs, APDA);   
> \
> + __ptrauth_save_key(ctxt->sys_regs, APDB);   
> \
> + __ptrauth_save_key(ctxt->sys_regs, APGA);   
> \
> +})
> +
> +#else /* __ASSEMBLY__ */
> +
> +#include 
> +
> +#ifdef   CONFIG_ARM64_PTR_AUTH
> +
> +#define PTRAUTH_REG_OFFSET(x)(x - CPU_APIAKEYLO_EL1)
> +
> +/*
> + * CPU_AP*_EL1 values exceed immediate offset range (512) for stp instruction
> + * so below macros takes CPU_APIAKEYLO_EL1 as base and calculates the offset 
> of
> + * the keys from this base to avoid an extra add instruction. These macros
> + * assumes the keys offsets are aligned in a specific increasing order.
> + */
> +.macro   ptrauth_save_state base, reg1, reg2
> + mrs_s   \reg1, SYS_APIAKEYLO_EL1
> + mrs_s   \reg2, SYS_APIAKEYHI_EL1
> + stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
> + mrs_s   \reg1, SYS_APIBKEYLO_EL1
> + mrs_s   \reg2, SYS_APIBKEYHI_EL1
> + stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
> + mrs_s   \reg1, SYS_APDAKEYLO_EL1
> + mrs_s   \reg2, SYS_APDAKEYHI_EL1
> + stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
> + mrs_s   \reg1, SYS_APDBKEYLO_EL1
> + mrs_s   \reg2, SYS_APDBKEYHI_EL1
> + stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
> + mrs_s   \reg1, SYS_APGAKEYLO_EL1
> + mrs_s   \reg2, SYS_APGAKEYHI_EL1
> + stp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
> +.endm
> +
> +.macro   ptrauth_restore_state base, reg1, reg2
> + ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIAKEYLO_EL1)]
> + msr_s   SYS_APIAKEYLO_EL1, \reg1
> + msr_s   SYS_APIAKEYHI_EL1, \reg2
> + ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APIBKEYLO_EL1)]
> + msr_s   SYS_APIBKEYLO_EL1, \reg1
> + msr_s   SYS_APIBKEYHI_EL1, \reg2
> + ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDAKEYLO_EL1)]
> + msr_s   SYS_APDAKEYLO_EL1, \reg1
> + msr_s   SYS_APDAKEYHI_EL1, \reg2
> + ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APDBKEYLO_EL1)]
> + msr_s   SYS_APDBKEYLO_EL1, \reg1
> + msr_s   SYS_APDBKEYHI_EL1, \reg2
> + ldp \reg1, \reg2, [\base, #PTRAUTH_REG_OFFSET(CPU_APGAKEYLO_EL1)]
> + msr_s   SYS_APGAKEYLO_EL1, \reg1
> + msr_s   SYS_APGAKEYHI_EL1, \reg2
> +.endm
> +
> +.macro ptrauth_switch_to_guest g_ctxt, reg1, reg2, reg3
> + ldr \reg1, [\g_ctxt, #CPU_HCR_EL2]
> + and \reg1, \reg1, #(HCR_API | HCR_APK)
> + cbz \reg1, 1f
> + add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
> + ptrauth_restore_state   \reg1, \reg2, \reg3
> +1:

Nit: the label in assembly macros is usually a larger number (see
assembler.h or alternative.h for example). I think this is to avoid
future accidents like

cbz x0, 1f
ptrauth_switch_to_guest x1, x2, x3, x4
add x5, x5, x6
1:
...

where the code would incorrectly branch to the label inside
ptrauth_switch_to_guest, instead of the one after it.

Thanks,
Kristina

> +.endm
> +
> +.macro ptrauth_switch_to_host g_ctxt, h_ctxt, reg1, reg2, reg3
> + ldr \reg1, [\g_ctxt, #CPU_HCR_EL2]
> + and \reg1, \reg1, #(HCR_API | HCR_APK)
> + cbz \reg1, 2f
> + add \reg1, \g_ctxt, #CPU_APIAKEYLO_EL1
> + 

[PATCH] KVM: arm/arm64: ensure vcpu target is unset on reset failure

2019-04-04 Thread Andrew Jones
A failed KVM_ARM_VCPU_INIT, should not set the vcpu target,
as the vcpu target is used by kvm_vcpu_initialized() to
determine if other vcpu ioctls may proceed. We need to set
the target before calling kvm_reset_vcpu(), but if that call
fails, we should then unset it.

Signed-off-by: Andrew Jones 
---
 virt/kvm/arm/arm.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
index 99c37384ba7b..7e5724ae1efd 100644
--- a/virt/kvm/arm/arm.c
+++ b/virt/kvm/arm/arm.c
@@ -934,7 +934,7 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct 
kvm_irq_level *irq_level,
 static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
   const struct kvm_vcpu_init *init)
 {
-   unsigned int i;
+   unsigned int i, ret;
int phys_target = kvm_target_cpu();
 
if (init->target != phys_target)
@@ -969,9 +969,15 @@ static int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
vcpu->arch.target = phys_target;
 
/* Now we know what it is, we can reset it. */
-   return kvm_reset_vcpu(vcpu);
-}
+   ret = kvm_reset_vcpu(vcpu);
+   if (ret) {
+   vcpu->arch.target = -1;
+   bitmap_zero(vcpu->arch.features, KVM_VCPU_MAX_FEATURES);
+   return ret;
+   }
 
+   return 0;
+}
 
 static int kvm_arch_vcpu_ioctl_vcpu_init(struct kvm_vcpu *vcpu,
 struct kvm_vcpu_init *init)
-- 
2.20.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 18/27] KVM: arm64/sve: Add SVE support to register access ioctl interface

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 06:25:39PM +0200, Andrew Jones wrote:
> On Thu, Apr 04, 2019 at 03:50:56PM +0100, Dave Martin wrote:
> > On Thu, Apr 04, 2019 at 03:57:06PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:43PM +, Dave Martin wrote:
> > > > This patch adds the following registers for access via the
> > > > KVM_{GET,SET}_ONE_REG interface:
> > > > 
> > > >  * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
> > > >  * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
> > > >  * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)
> > > > 
> > > > In order to adapt gracefully to future architectural extensions,
> > > > the registers are logically divided up into slices as noted above:
> > > > the i parameter denotes the slice index.
> > > > 
> > > > This allows us to reserve space in the ABI for future expansion of
> > > > these registers.  However, as of today the architecture does not
> > > > permit registers to be larger than a single slice, so no code is
> > > > needed in the kernel to expose additional slices, for now.  The
> > > > code can be extended later as needed to expose them up to a maximum
> > > > of 32 slices (as carved out in the architecture itself) if they
> > > > really exist someday.
> > > > 
> > > > The registers are only visible for vcpus that have SVE enabled.
> > > > They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
> > > > have SVE.
> > > > 
> > > > Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not
> > > > allowed for SVE-enabled vcpus: SVE-aware userspace can use the
> > > > KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
> > > > register state.  This avoids some complex and pointless emulation
> > > > in the kernel to convert between the two views of these aliased
> > > > registers.
> > > > 
> > > > Signed-off-by: Dave Martin 
> > > > Reviewed-by: Julien Thierry 
> > > > Tested-by: zhang.lei 

[...]

> > > > +#define KVM_REG_ARM64_SVE_FFR(i)   KVM_REG_ARM64_SVE_PREG(16, i)
> > > 
> > > Since this is user api and a user may want to construct their own register
> > > indices, then shouldn't we instead provide KVM_REG_ARM64_SVE_FFR_BASE as
> > > 
> > >  #define KVM_REG_ARM64_SVE_FFR_BASE KVM_REG_ARM64_SVE_PREG_BASE | (16 << 
> > > 5)
> > 
> > Can do, or just
> > 
> > #define KVM_REG_ARM64_SVE_FFR_BASE KVM_REG_ARM64_SVE_PREG(0, 0)
> 
> I don't see how this would work for an FFR base.

Err yes, scratch that.  But I'm happy to have it, however defined.

[...]

> > > > +/* Get sanitised bounds for user/kernel SVE register copy */
> > > > +static int sve_reg_to_region(struct sve_state_reg_region *region,
> > > > +struct kvm_vcpu *vcpu,
> > > > +const struct kvm_one_reg *reg)
> > > > +{

[...]

> > > > +   sve_state_size = vcpu_sve_state_size(vcpu);
> > > > +   if (!sve_state_size)
> > > > +   return -EINVAL;
> > > > +
> > > > +   region->koffset = array_index_nospec(reqoffset, sve_state_size);
> > > > +   region->klen = min(maxlen, reqlen);
> > > > +   region->upad = reqlen - region->klen;
> > > > +
> > > > +   return 0;
> > > > +}
> > > > +
> > > > +static int get_sve_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg 
> > > > *reg)
> > > > +{
> > > > +   struct sve_state_reg_region region;
> > > > +   char __user *uptr = (char __user *)reg->addr;
> > > > +
> > > > +   if (!vcpu_has_sve(vcpu) || sve_reg_to_region(, vcpu, 
> > > > reg))
> > > > +   return -ENOENT;
> > > 
> > > sve_reg_to_region() can return EINVAL, but here it would get changed to
> > > ENOENT.
> > 
> > Hmm, I'd say the affected code in sve_reg_to_region() should really be
> > a WARN_ON(): we're not supposed to hit it because we can't get here
> > until the vcpu is finalized.  It's really just a defensive check before
> > dividing by some potentially invalid value.  In such a case, it's
> > reasonable to have that EINVAL show through to userspace.
> 
> Adding the WARN_ON is a good idea. The thing is that the EINVAL is *not*
> going to show through to userspace. ENOENT will. Which might be fine,
> but if so, then it would clear things up to just return ENOENT in
> sve_reg_to_region() as well.

I meant that we can propagate the actual return value back.

It might be better just to merge the vcpu_has_sve() check into 
sve_reg_to_region(), and simply have

int ret;

ret = sve_reg_to_region(...);
if (ret)
return ret;

here.

Currently we return -ENOENT for a non-SVE-enabled vcpu, even if the reg
ID is complete garbage.  It would probably be useful to tidy that up at
the same time: -EINVAL would probably be more appropriate for such
cases.

[...]

> > > >  int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct 
> > > > kvm_regs *regs)
> > > >  {
> > > > return -EINVAL;
> > > > @@ -346,12 +461,12 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const 
> > > > struct kvm_one_reg 

Re: [PATCH v7 22/27] KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 05:07:09PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:47PM +, Dave Martin wrote:
> > Some aspects of vcpu configuration may be too complex to be
> > completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
> > requirement for userspace to do some additional configuration
> > before various other ioctls will work in a consistent way.
> > 
> > In particular this will be the case for SVE, where userspace will
> > need to negotiate the set of vector lengths to be made available to
> > the guest before the vcpu becomes fully usable.
> > 
> > In order to provide an explicit way for userspace to confirm that
> > it has finished setting up a particular vcpu feature, this patch
> > adds a new ioctl KVM_ARM_VCPU_FINALIZE.
> > 
> > When userspace has opted into a feature that requires finalization,
> > typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
> > matching call to KVM_ARM_VCPU_FINALIZE is now required before
> > KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
> > impose additional restrictions where appropriate.
> > 
> > No existing vcpu features are affected by this, so current
> > userspace implementations will continue to work exactly as before,
> > with no need to issue KVM_ARM_VCPU_FINALIZE.
> > 
> > As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
> > placeholder: no finalizable features exist yet, so ioctl is not
> > required and will always yield EINVAL.  Subsequent patches will add
> > the finalization logic to make use of this ioctl for SVE.
> > 
> > No functional change for existing userspace.
> > 
> > Signed-off-by: Dave Martin 
> > Reviewed-by: Julien Thierry 
> > Tested-by: zhang.lei 
> > 
> > ---
> > 
> > Changes since v5:
> > 
> >  * Commit message, including subject line, rewritten.
> > 
> >This patch is a rework of "KVM: arm/arm64: Add hook to finalize the
> >vcpu configuration".  The old subject line and commit message no
> >longer accurately described what the patch does.  However, the code
> >is an evolution of the previous patch rather than a wholesale
> >rewrite.
> > 
> >  * Added an explicit KVM_ARM_VCPU_FINALIZE ioctl, rather than just
> >providing internal hooks in the kernel to finalize the vcpu
> >configuration implicitly.  This allows userspace to confirm exactly
> >when it has finished configuring the vcpu and is ready to use it.
> > 
> >This results in simpler (and hopefully more maintainable) ioctl
> >ordering rules.
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  4 
> >  arch/arm64/include/asm/kvm_host.h |  4 
> >  include/uapi/linux/kvm.h  |  3 +++
> >  virt/kvm/arm/arm.c| 18 ++
> >  4 files changed, 29 insertions(+)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index a49ee01..e80cfc1 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -19,6 +19,7 @@
> >  #ifndef __ARM_KVM_HOST_H__
> >  #define __ARM_KVM_HOST_H__
> >  
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -411,4 +412,7 @@ static inline int kvm_arm_setup_stage2(struct kvm *kvm, 
> > unsigned long type)
> > return 0;
> >  }
> >  
> > +#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
> > +#define kvm_arm_vcpu_is_finalized(vcpu) true
> > +
> 
> We usually use inline functions for the stubs.

I guess we could.

The vcpu_has_sve() circular include problem applies here too if we put
the actual function bodies here, which is why I ended up with this.  Now
that the bodies (for arm64) are out of line, it actually doesn't matter.

> >  #endif /* __ARM_KVM_HOST_H__ */
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 3e89509..98658f7 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -23,6 +23,7 @@
> >  #define __ARM64_KVM_HOST_H__
> >  
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -625,4 +626,7 @@ void kvm_arch_free_vm(struct kvm *kvm);
> >  
> >  int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
> >  
> > +#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
> > +#define kvm_arm_vcpu_is_finalized(vcpu) true
> 
> Same as above.

Ditto

> > +
> >  #endif /* __ARM64_KVM_HOST_H__ */
> > diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> > index dc77a5a..c3b8e7a 100644
> > --- a/include/uapi/linux/kvm.h
> > +++ b/include/uapi/linux/kvm.h
> > @@ -1441,6 +1441,9 @@ struct kvm_enc_region {
> >  /* Available with KVM_CAP_HYPERV_CPUID */
> >  #define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2)
> >  
> > +/* Available with KVM_CAP_ARM_SVE */
> > +#define KVM_ARM_VCPU_FINALIZE_IOW(KVMIO,  0xc2, int)
> > +
> >  /* Secure Encrypted Virtualization command */
> >  enum sev_cmd_id {
> > /* Guest initialization commands */
> > diff --git 

Re: [PATCH] KVM: arm/arm64: vgic: Restrict setting irq->targets only in GICv2

2019-04-04 Thread Zenghui Yu
Hi Andre,

Thanks for looking into this.

On Thu, Apr 4, 2019 at 9:17 PM Andre Przywara  wrote:
>
> On Thu, 4 Apr 2019 12:30:15 +
> Zenghui Yu  wrote:
>
> Hi,
>
> > Commit ad275b8bb1e6 ("KVM: arm/arm64: vgic-new: vgic_init: implement
> > vgic_init") had set irq->targets in kvm_vgic_vcpu_init(), regardless of
> > the GIC architecture (v2 or v3). When the number of vcpu reaches 32
> > (in v3), UBSAN will complain about it.
>
> The first part looks similar to this one:
> http://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/637209.html

Yes. I have not noticed this, sorry.

>
> >  
> > 
> >  UBSAN: Undefined behaviour in 
> > arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:223:21
> >  shift exponent 32 is too large for 32-bit type 'unsigned int'
> >  CPU: 13 PID: 833 Comm: CPU 32/KVM Kdump: loaded Not tainted 5.1.0-rc1+ #16
> >  Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
> >  Call trace:
> >   dump_backtrace+0x0/0x190
> >   show_stack+0x24/0x30
> >   dump_stack+0xc8/0x114
> >   ubsan_epilogue+0x14/0x50
> >   __ubsan_handle_shift_out_of_bounds+0x118/0x188
> >   kvm_vgic_vcpu_init+0x1d4/0x200
> >   kvm_arch_vcpu_init+0x3c/0x48
> >   kvm_vcpu_init+0xa8/0x100
> >   kvm_arch_vcpu_create+0x94/0x120
> >   kvm_vm_ioctl+0x57c/0xe58
> >   do_vfs_ioctl+0xc4/0x7f0
> >   ksys_ioctl+0x8c/0xa0
> >   __arm64_sys_ioctl+0x28/0x38
> >   el0_svc_common+0xa0/0x190
> >   el0_svc_handler+0x38/0x78
> >   el0_svc+0x8/0xc
> >  
> > 
> >
> > This patch Restricts setting irq->targets in GICv2, which only supports
> > a maximum of eight PEs, to keep UBSAN quiet. And since irq->mpidr will
> > only be used by SPI in GICv3, we decided to set mpidr to 0 for SGI and
> > PPI.
> >
> > Like commit ab2d5eb03dbb ("KVM: arm/arm64: vgic: Always initialize the
> > group of private IRQs"), we should also take the creating order of the
> > VGIC and VCPUs into consideration.
> >
> > Cc: Eric Auger 
> > Cc: Marc Zyngier 
> > Cc: Andre Przywara 
> > Cc: Christoffer Dall 
> > Signed-off-by: Zenghui Yu 
> > ---
> >  virt/kvm/arm/vgic/vgic-init.c | 16 +++-
> >  1 file changed, 11 insertions(+), 5 deletions(-)
> >
> > diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> > index 3bdb31e..0cba92e 100644
> > --- a/virt/kvm/arm/vgic/vgic-init.c
> > +++ b/virt/kvm/arm/vgic/vgic-init.c
> > @@ -220,7 +220,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
> >   irq->intid = i;
> >   irq->vcpu = NULL;
> >   irq->target_vcpu = vcpu;
> > - irq->targets = 1U << vcpu->vcpu_id;
> >   kref_init(>refcount);
> >   if (vgic_irq_is_sgi(i)) {
> >   /* SGIs */
> > @@ -231,10 +230,14 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
> >   irq->config = VGIC_CONFIG_LEVEL;
> >   }
> >
> > - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> > + if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
> >   irq->group = 1;
> > - else
> > + irq->mpidr = 0;
> > + } else {
> >   irq->group = 0;
> > + if (vcpu->vcpu_id < VGIC_V2_MAX_CPUS)
> > + irq->targets = 1U << vcpu->vcpu_id;
> > + }
> >   }
> >
> >   if (!irqchip_in_kernel(vcpu->kvm))
> > @@ -297,10 +300,13 @@ int vgic_init(struct kvm *kvm)
> >
> >   for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
> >   struct vgic_irq *irq = _cpu->private_irqs[i];
> > - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> > + if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
> >   irq->group = 1;
> > - else
> > + irq->mpidr = 0;
> > + } else {
> >   irq->group = 0;
> > + irq->targets = 1U << vcpu->vcpu_id;
> > + }
>
> So why would you need this part? That does the same as above, doesn't it?

This idea comes from commit ab2d5eb03dbb. As Christoffer said, "we have no
enforced ordering of creating the VGIC and creating VCPUs". Without this
part, the VCPUs created before VGIC might still end up with the wrong
"mpidr (target)" set, since they don't know the actual vGIC model.

If we're using QEMU to boot a vGIC-v3 guest, we'll still find the incorrect
TARGET value from debugfs. That's QEMU will create and intialize all of the
VCPUs before VGIC.

A detailed explanation can be found at:

https://marc.info/?l=android-virt=154713085226516=2


thanks,

zenghui

>
> Cheers,
> Andre.
>
>
> >   }
> >   }
> >
>
>
> ___
> linux-arm-kernel 

Re: [PATCH v12 4/8] arm64: arm_pmu: Add !VHE support for exclude_host/exclude_guest attributes

2019-04-04 Thread Will Deacon
On Thu, Mar 28, 2019 at 10:37:27AM +, Andrew Murray wrote:
> Add support for the :G and :H attributes in perf by handling the
> exclude_host/exclude_guest event attributes.
> 
> We notify KVM of counters that we wish to be enabled or disabled on
> guest entry/exit and thus defer from starting or stopping events based
> on their event attributes.
> 
> With !VHE we switch the counters between host/guest at EL2. We are able
> to eliminate counters counting host events on the boundaries of guest
> entry/exit when using :G by filtering out EL2 for exclude_host. When
> using !exclude_hv there is a small blackout window at the guest
> entry/exit where host events are not captured.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm64/kernel/perf_event.c | 43 --
>  1 file changed, 36 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index cccf4fc86d67..6bb28aaf5aea 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -26,6 +26,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -528,11 +529,21 @@ static inline int armv8pmu_enable_counter(int idx)
>  
>  static inline void armv8pmu_enable_event_counter(struct perf_event *event)
>  {
> + struct perf_event_attr *attr = >attr;
>   int idx = event->hw.idx;
> + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>  
> - armv8pmu_enable_counter(idx);
>   if (armv8pmu_event_is_chained(event))
> - armv8pmu_enable_counter(idx - 1);
> + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> + kvm_set_pmu_events(counter_bits, attr);
> +
> + /* We rely on the hypervisor switch code to enable guest counters */
> + if (!kvm_pmu_counter_deferred(attr)) {
> + armv8pmu_enable_counter(idx);
> + if (armv8pmu_event_is_chained(event))
> + armv8pmu_enable_counter(idx - 1);
> + }
>  }
>  
>  static inline int armv8pmu_disable_counter(int idx)
> @@ -545,11 +556,21 @@ static inline int armv8pmu_disable_counter(int idx)
>  static inline void armv8pmu_disable_event_counter(struct perf_event *event)
>  {
>   struct hw_perf_event *hwc = >hw;
> + struct perf_event_attr *attr = >attr;
>   int idx = hwc->idx;
> + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
>  
>   if (armv8pmu_event_is_chained(event))
> - armv8pmu_disable_counter(idx - 1);
> - armv8pmu_disable_counter(idx);
> + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> +
> + kvm_clr_pmu_events(counter_bits);
> +
> + /* We rely on the hypervisor switch code to disable guest counters */
> + if (!kvm_pmu_counter_deferred(attr)) {
> + if (armv8pmu_event_is_chained(event))
> + armv8pmu_disable_counter(idx - 1);
> + armv8pmu_disable_counter(idx);
> + }
>  }
>  
>  static inline int armv8pmu_enable_intens(int idx)
> @@ -829,11 +850,16 @@ static int armv8pmu_set_event_filter(struct 
> hw_perf_event *event,
>   if (!attr->exclude_kernel)
>   config_base |= ARMV8_PMU_INCLUDE_EL2;
>   } else {
> - if (attr->exclude_kernel)
> - config_base |= ARMV8_PMU_EXCLUDE_EL1;
> - if (!attr->exclude_hv)
> + if (!attr->exclude_hv && !attr->exclude_host)
>   config_base |= ARMV8_PMU_INCLUDE_EL2;

FWIW, that doesn't align with my personal view of what the host is, but
I'm not going to argue if Marc and Christoffer are happy with it.

Anyway, thanks for persevering with this:

Acked-by: Will Deacon 

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 21/27] KVM: arm/arm64: Add hook for arch-specific KVM initialisation

2019-04-04 Thread Andrew Jones
On Thu, Apr 04, 2019 at 03:53:55PM +0100, Dave Martin wrote:
> On Thu, Apr 04, 2019 at 04:25:28PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:46PM +, Dave Martin wrote:
> > > This patch adds a kvm_arm_init_arch_resources() hook to perform
> > > subarch-specific initialisation when starting up KVM.
> > > 
> > > This will be used in a subsequent patch for global SVE-related
> > > setup on arm64.
> > > 
> > > No functional change.
> > > 
> > > Signed-off-by: Dave Martin 
> > > Reviewed-by: Julien Thierry 
> > > Tested-by: zhang.lei 
> > > ---
> > >  arch/arm/include/asm/kvm_host.h   | 2 ++
> > >  arch/arm64/include/asm/kvm_host.h | 2 ++
> > >  virt/kvm/arm/arm.c| 4 
> > >  3 files changed, 8 insertions(+)
> > > 
> > > diff --git a/arch/arm/include/asm/kvm_host.h 
> > > b/arch/arm/include/asm/kvm_host.h
> > > index 770d732..a49ee01 100644
> > > --- a/arch/arm/include/asm/kvm_host.h
> > > +++ b/arch/arm/include/asm/kvm_host.h
> > > @@ -53,6 +53,8 @@
> > >  
> > >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> > >  
> > > +static inline int kvm_arm_init_arch_resources(void) { return 0; }
> > > +
> > >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> > >  int __attribute_const__ kvm_target_cpu(void);
> > >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 205438a..3e89509 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -58,6 +58,8 @@
> > >  
> > >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> > >  
> > > +static inline int kvm_arm_init_arch_resources(void) { return 0; }
> > > +
> > >  int __attribute_const__ kvm_target_cpu(void);
> > >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > >  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
> > > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > > index 99c3738..c69e137 100644
> > > --- a/virt/kvm/arm/arm.c
> > > +++ b/virt/kvm/arm/arm.c
> > > @@ -1664,6 +1664,10 @@ int kvm_arch_init(void *opaque)
> > >   if (err)
> > >   return err;
> > >  
> > > + err = kvm_arm_init_arch_resources();
> > > + if (err)
> > > + return err;
> > > +
> > >   if (!in_hyp_mode) {
> > >   err = init_hyp_mode();
> > >   if (err)
> > > -- 
> > > 2.1.4
> > >
> > 
> > It's not clear to me from the commit message why init_common_resources()
> > won't work for this. Maybe it'll be more clear as I continue the review.
> 
> init_common_resources() is for stuff common to arm and arm64.

Well currently init_common_resources() only calls kvm_set_ipa_limit(),
which isn't implemented for arm. So if there was a plan to only use
that function for init that actually does something on both, it doesn't.

> 
> kvm_arm_init_arch_resources() is intended for stuff specific to the
> particular arch backend.

I'm not sure we need that yet. We just need an arm setup sve stub like
arm's kvm_set_ipa_limit() stub. OTOH, if we have to keep adding stubs
to arm we should probably have something like
kvm_arm_init_arch_resources() instead, and kvm_set_ipa_limit() should
be moved inside the arm64 one and the arm ipa limit stub can go. Then
since init_common_resources() would no longer be used, it could just
be deleted.

> 
> Maybe there is a better name for this.
>

The name is fine for me.

Thanks,
drew 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 18/27] KVM: arm64/sve: Add SVE support to register access ioctl interface

2019-04-04 Thread Andrew Jones
On Thu, Apr 04, 2019 at 03:50:56PM +0100, Dave Martin wrote:
> On Thu, Apr 04, 2019 at 03:57:06PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:43PM +, Dave Martin wrote:
> > > This patch adds the following registers for access via the
> > > KVM_{GET,SET}_ONE_REG interface:
> > > 
> > >  * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
> > >  * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
> > >  * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)
> > > 
> > > In order to adapt gracefully to future architectural extensions,
> > > the registers are logically divided up into slices as noted above:
> > > the i parameter denotes the slice index.
> > > 
> > > This allows us to reserve space in the ABI for future expansion of
> > > these registers.  However, as of today the architecture does not
> > > permit registers to be larger than a single slice, so no code is
> > > needed in the kernel to expose additional slices, for now.  The
> > > code can be extended later as needed to expose them up to a maximum
> > > of 32 slices (as carved out in the architecture itself) if they
> > > really exist someday.
> > > 
> > > The registers are only visible for vcpus that have SVE enabled.
> > > They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
> > > have SVE.
> > > 
> > > Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not
> > > allowed for SVE-enabled vcpus: SVE-aware userspace can use the
> > > KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
> > > register state.  This avoids some complex and pointless emulation
> > > in the kernel to convert between the two views of these aliased
> > > registers.
> > > 
> > > Signed-off-by: Dave Martin 
> > > Reviewed-by: Julien Thierry 
> > > Tested-by: zhang.lei 
> > > 
> > > ---
> > > 
> > > Changes since v5:
> > > 
> > >  * [Julien Thierry] rename sve_reg_region() to sve_reg_to_region() to
> > >make its purpose a bit clearer.
> > > 
> > >  * [Julien Thierry] rename struct sve_state_region to
> > >sve_state_reg_region to make it clearer this this struct only
> > >describes the bounds of (part of) a single register within
> > >sve_state.
> > > 
> > >  * [Julien Thierry] Add a comment to clarify the purpose of struct
> > >sve_state_reg_region.
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |  14 
> > >  arch/arm64/include/uapi/asm/kvm.h |  17 +
> > >  arch/arm64/kvm/guest.c| 139 
> > > ++
> > >  3 files changed, 158 insertions(+), 12 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 4fabfd2..205438a 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -329,6 +329,20 @@ struct kvm_vcpu_arch {
> > >  #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + 
> > > \
> > > sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> > >  
> > > +#define vcpu_sve_state_size(vcpu) ({ 
> > > \
> > > + size_t __size_ret;  \
> > > + unsigned int __vcpu_vq; \
> > > + \
> > > + if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) {  \
> > > + __size_ret = 0; \
> > > + } else {\
> > > + __vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl);\
> > > + __size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq);  \
> > > + }   \
> > > + \
> > > + __size_ret; \
> > > +})
> > 
> > I know why this is a macro instead of an inline :)
> 
> Yep
> 
> I can't claim this one is a one-liner :/
> 
> > > +
> > >  /* vcpu_arch flags field values: */
> > >  #define KVM_ARM64_DEBUG_DIRTY(1 << 0)
> > >  #define KVM_ARM64_FP_ENABLED (1 << 1) /* guest FP regs 
> > > loaded */
> > > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > > b/arch/arm64/include/uapi/asm/kvm.h
> > > index 97c3478..ced760c 100644
> > > --- a/arch/arm64/include/uapi/asm/kvm.h
> > > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > > @@ -226,6 +226,23 @@ struct kvm_vcpu_events {
> > >KVM_REG_ARM_FW | ((r) & 0x))
> > >  #define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0)
> > >  
> > > +/* SVE registers */
> > > +#define KVM_REG_ARM64_SVE(0x15 << 
> > > KVM_REG_ARM_COPROC_SHIFT)
> > > +
> > > +/* Z- and P-regs occupy blocks at the following offsets within this 
> > > range: */
> > > +#define KVM_REG_ARM64_SVE_ZREG_BASE  0
> > > +#define KVM_REG_ARM64_SVE_PREG_BASE  

Re: [PATCH v12 8/8] arm64: docs: document perf event attributes

2019-04-04 Thread Will Deacon
On Thu, Mar 28, 2019 at 10:37:31AM +, Andrew Murray wrote:
> The interaction between the exclude_{host,guest} flags,
> exclude_{user,kernel,hv} flags and presence of VHE can result in
> different exception levels being filtered by the ARMv8 PMU. As this
> can be confusing let's document how they work on arm64.
> 
> Signed-off-by: Andrew Murray 
> ---
>  Documentation/arm64/perf.txt | 74 
>  1 file changed, 74 insertions(+)
>  create mode 100644 Documentation/arm64/perf.txt
> 
> diff --git a/Documentation/arm64/perf.txt b/Documentation/arm64/perf.txt
> new file mode 100644
> index ..604446c1f720
> --- /dev/null
> +++ b/Documentation/arm64/perf.txt
> @@ -0,0 +1,74 @@
> +Perf Event Attributes
> +=
> +
> +Author: Andrew Murray 
> +Date: 2019-03-06
> +
> +exclude_user
> +
> +
> +This attribute excludes userspace.
> +
> +Userspace always runs at EL0 and thus this attribute will exclude EL0.
> +
> +
> +exclude_kernel
> +--
> +
> +This attribute excludes the kernel.
> +
> +The kernel runs at EL2 with VHE and EL1 without. Guest kernels always run
> +at EL1.
> +
> +This attribute will exclude EL1 and additionally EL2 on a VHE system.

I find this last sentence a bit confusing, because it can be read to imply
that if you don't set exclude_kernel and you're in a guest on a VHE system,
then you can profile EL2.

> +exclude_hv
> +--
> +
> +This attribute excludes the hypervisor, we ignore this flag on a VHE system
> +as we consider the host kernel to be the hypervisor.

Similar comment as the above: I don't think this makes sense when you look
at things from the guest perspective.

> +On a non-VHE system we consider the hypervisor to be any code that runs at
> +EL2 which is predominantly used for guest/host transitions.
> +
> +This attribute will exclude EL2 on a non-VHE system.
> +
> +
> +exclude_host / exclude_guest
> +
> +
> +This attribute excludes the KVM host.

But there are two attributes...

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v12 2/8] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data

2019-04-04 Thread Will Deacon
On Thu, Apr 04, 2019 at 05:09:58PM +0100, Will Deacon wrote:
> On Thu, Mar 28, 2019 at 10:37:25AM +, Andrew Murray wrote:
> > The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
> > a typedef to kvm_cpu_context and is used to store host cpu context. The
> > kvm_cpu_context structure is also used elsewhere to hold vcpu context.
> > In order to use the percpu to hold additional future host information we
> > encapsulate kvm_cpu_context in a new structure and rename the typedef and
> > percpu to match.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm/include/asm/kvm_host.h   |  8 ++--
> >  arch/arm64/include/asm/kvm_asm.h  |  3 ++-
> >  arch/arm64/include/asm/kvm_host.h | 16 ++--
> >  arch/arm64/kernel/asm-offsets.c   |  1 +
> >  virt/kvm/arm/arm.c| 14 --
> >  5 files changed, 27 insertions(+), 15 deletions(-)
> 
> As per usual, I'll need an Ack from Christoffer and/or Marc on this one :)

Actually... given that most of this series is now firmly in KVM territory,
it probably makes more sense for the kvm/arm tree to take this on a separate
branch, which I can also pull into the arm64 perf tree if I run into
conflicts on the non-kvm parts.

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v12 2/8] arm64: KVM: encapsulate kvm_cpu_context in kvm_host_data

2019-04-04 Thread Will Deacon
On Thu, Mar 28, 2019 at 10:37:25AM +, Andrew Murray wrote:
> The virt/arm core allocates a kvm_cpu_context_t percpu, at present this is
> a typedef to kvm_cpu_context and is used to store host cpu context. The
> kvm_cpu_context structure is also used elsewhere to hold vcpu context.
> In order to use the percpu to hold additional future host information we
> encapsulate kvm_cpu_context in a new structure and rename the typedef and
> percpu to match.
> 
> Signed-off-by: Andrew Murray 
> ---
>  arch/arm/include/asm/kvm_host.h   |  8 ++--
>  arch/arm64/include/asm/kvm_asm.h  |  3 ++-
>  arch/arm64/include/asm/kvm_host.h | 16 ++--
>  arch/arm64/kernel/asm-offsets.c   |  1 +
>  virt/kvm/arm/arm.c| 14 --
>  5 files changed, 27 insertions(+), 15 deletions(-)

As per usual, I'll need an Ack from Christoffer and/or Marc on this one :)

Will
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 22/27] KVM: arm/arm64: Add KVM_ARM_VCPU_FINALIZE ioctl

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:47PM +, Dave Martin wrote:
> Some aspects of vcpu configuration may be too complex to be
> completed inside KVM_ARM_VCPU_INIT.  Thus, there may be a
> requirement for userspace to do some additional configuration
> before various other ioctls will work in a consistent way.
> 
> In particular this will be the case for SVE, where userspace will
> need to negotiate the set of vector lengths to be made available to
> the guest before the vcpu becomes fully usable.
> 
> In order to provide an explicit way for userspace to confirm that
> it has finished setting up a particular vcpu feature, this patch
> adds a new ioctl KVM_ARM_VCPU_FINALIZE.
> 
> When userspace has opted into a feature that requires finalization,
> typically by means of a feature flag passed to KVM_ARM_VCPU_INIT, a
> matching call to KVM_ARM_VCPU_FINALIZE is now required before
> KVM_RUN or KVM_GET_REG_LIST is allowed.  Individual features may
> impose additional restrictions where appropriate.
> 
> No existing vcpu features are affected by this, so current
> userspace implementations will continue to work exactly as before,
> with no need to issue KVM_ARM_VCPU_FINALIZE.
> 
> As implemented in this patch, KVM_ARM_VCPU_FINALIZE is currently a
> placeholder: no finalizable features exist yet, so ioctl is not
> required and will always yield EINVAL.  Subsequent patches will add
> the finalization logic to make use of this ioctl for SVE.
> 
> No functional change for existing userspace.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * Commit message, including subject line, rewritten.
> 
>This patch is a rework of "KVM: arm/arm64: Add hook to finalize the
>vcpu configuration".  The old subject line and commit message no
>longer accurately described what the patch does.  However, the code
>is an evolution of the previous patch rather than a wholesale
>rewrite.
> 
>  * Added an explicit KVM_ARM_VCPU_FINALIZE ioctl, rather than just
>providing internal hooks in the kernel to finalize the vcpu
>configuration implicitly.  This allows userspace to confirm exactly
>when it has finished configuring the vcpu and is ready to use it.
> 
>This results in simpler (and hopefully more maintainable) ioctl
>ordering rules.
> ---
>  arch/arm/include/asm/kvm_host.h   |  4 
>  arch/arm64/include/asm/kvm_host.h |  4 
>  include/uapi/linux/kvm.h  |  3 +++
>  virt/kvm/arm/arm.c| 18 ++
>  4 files changed, 29 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index a49ee01..e80cfc1 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -19,6 +19,7 @@
>  #ifndef __ARM_KVM_HOST_H__
>  #define __ARM_KVM_HOST_H__
>  
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -411,4 +412,7 @@ static inline int kvm_arm_setup_stage2(struct kvm *kvm, 
> unsigned long type)
>   return 0;
>  }
>  
> +#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
> +#define kvm_arm_vcpu_is_finalized(vcpu) true
> +

We usually use inline functions for the stubs.

>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 3e89509..98658f7 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -23,6 +23,7 @@
>  #define __ARM64_KVM_HOST_H__
>  
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -625,4 +626,7 @@ void kvm_arch_free_vm(struct kvm *kvm);
>  
>  int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type);
>  
> +#define kvm_arm_vcpu_finalize(vcpu, what) (-EINVAL)
> +#define kvm_arm_vcpu_is_finalized(vcpu) true

Same as above.

> +
>  #endif /* __ARM64_KVM_HOST_H__ */
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index dc77a5a..c3b8e7a 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -1441,6 +1441,9 @@ struct kvm_enc_region {
>  /* Available with KVM_CAP_HYPERV_CPUID */
>  #define KVM_GET_SUPPORTED_HV_CPUID _IOWR(KVMIO, 0xc1, struct kvm_cpuid2)
>  
> +/* Available with KVM_CAP_ARM_SVE */
> +#define KVM_ARM_VCPU_FINALIZE  _IOW(KVMIO,  0xc2, int)
> +
>  /* Secure Encrypted Virtualization command */
>  enum sev_cmd_id {
>   /* Guest initialization commands */
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index c69e137..9edbf0f 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -545,6 +545,9 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>   if (likely(vcpu->arch.has_run_once))
>   return 0;
>  
> + if (!kvm_arm_vcpu_is_finalized(vcpu))
> + return -EPERM;
> +
>   vcpu->arch.has_run_once = true;
>  
>   if (likely(irqchip_in_kernel(kvm))) {
> @@ -1116,6 +1119,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>   if 

Re: [PATCH v7 21/27] KVM: arm/arm64: Add hook for arch-specific KVM initialisation

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 04:25:28PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:46PM +, Dave Martin wrote:
> > This patch adds a kvm_arm_init_arch_resources() hook to perform
> > subarch-specific initialisation when starting up KVM.
> > 
> > This will be used in a subsequent patch for global SVE-related
> > setup on arm64.
> > 
> > No functional change.
> > 
> > Signed-off-by: Dave Martin 
> > Reviewed-by: Julien Thierry 
> > Tested-by: zhang.lei 
> > ---
> >  arch/arm/include/asm/kvm_host.h   | 2 ++
> >  arch/arm64/include/asm/kvm_host.h | 2 ++
> >  virt/kvm/arm/arm.c| 4 
> >  3 files changed, 8 insertions(+)
> > 
> > diff --git a/arch/arm/include/asm/kvm_host.h 
> > b/arch/arm/include/asm/kvm_host.h
> > index 770d732..a49ee01 100644
> > --- a/arch/arm/include/asm/kvm_host.h
> > +++ b/arch/arm/include/asm/kvm_host.h
> > @@ -53,6 +53,8 @@
> >  
> >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> >  
> > +static inline int kvm_arm_init_arch_resources(void) { return 0; }
> > +
> >  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 205438a..3e89509 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -58,6 +58,8 @@
> >  
> >  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
> >  
> > +static inline int kvm_arm_init_arch_resources(void) { return 0; }
> > +
> >  int __attribute_const__ kvm_target_cpu(void);
> >  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> >  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
> > diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> > index 99c3738..c69e137 100644
> > --- a/virt/kvm/arm/arm.c
> > +++ b/virt/kvm/arm/arm.c
> > @@ -1664,6 +1664,10 @@ int kvm_arch_init(void *opaque)
> > if (err)
> > return err;
> >  
> > +   err = kvm_arm_init_arch_resources();
> > +   if (err)
> > +   return err;
> > +
> > if (!in_hyp_mode) {
> > err = init_hyp_mode();
> > if (err)
> > -- 
> > 2.1.4
> >
> 
> It's not clear to me from the commit message why init_common_resources()
> won't work for this. Maybe it'll be more clear as I continue the review.

init_common_resources() is for stuff common to arm and arm64.

kvm_arm_init_arch_resources() is intended for stuff specific to the
particular arch backend.

Maybe there is a better name for this.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 18/27] KVM: arm64/sve: Add SVE support to register access ioctl interface

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 03:57:06PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:43PM +, Dave Martin wrote:
> > This patch adds the following registers for access via the
> > KVM_{GET,SET}_ONE_REG interface:
> > 
> >  * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
> >  * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
> >  * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)
> > 
> > In order to adapt gracefully to future architectural extensions,
> > the registers are logically divided up into slices as noted above:
> > the i parameter denotes the slice index.
> > 
> > This allows us to reserve space in the ABI for future expansion of
> > these registers.  However, as of today the architecture does not
> > permit registers to be larger than a single slice, so no code is
> > needed in the kernel to expose additional slices, for now.  The
> > code can be extended later as needed to expose them up to a maximum
> > of 32 slices (as carved out in the architecture itself) if they
> > really exist someday.
> > 
> > The registers are only visible for vcpus that have SVE enabled.
> > They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
> > have SVE.
> > 
> > Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not
> > allowed for SVE-enabled vcpus: SVE-aware userspace can use the
> > KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
> > register state.  This avoids some complex and pointless emulation
> > in the kernel to convert between the two views of these aliased
> > registers.
> > 
> > Signed-off-by: Dave Martin 
> > Reviewed-by: Julien Thierry 
> > Tested-by: zhang.lei 
> > 
> > ---
> > 
> > Changes since v5:
> > 
> >  * [Julien Thierry] rename sve_reg_region() to sve_reg_to_region() to
> >make its purpose a bit clearer.
> > 
> >  * [Julien Thierry] rename struct sve_state_region to
> >sve_state_reg_region to make it clearer this this struct only
> >describes the bounds of (part of) a single register within
> >sve_state.
> > 
> >  * [Julien Thierry] Add a comment to clarify the purpose of struct
> >sve_state_reg_region.
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  14 
> >  arch/arm64/include/uapi/asm/kvm.h |  17 +
> >  arch/arm64/kvm/guest.c| 139 
> > ++
> >  3 files changed, 158 insertions(+), 12 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 4fabfd2..205438a 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -329,6 +329,20 @@ struct kvm_vcpu_arch {
> >  #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
> >   sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> >  
> > +#define vcpu_sve_state_size(vcpu) ({   
> > \
> > +   size_t __size_ret;  \
> > +   unsigned int __vcpu_vq; \
> > +   \
> > +   if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) {  \
> > +   __size_ret = 0; \
> > +   } else {\
> > +   __vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl);\
> > +   __size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq);  \
> > +   }   \
> > +   \
> > +   __size_ret; \
> > +})
> 
> I know why this is a macro instead of an inline :)

Yep

I can't claim this one is a one-liner :/

> > +
> >  /* vcpu_arch flags field values: */
> >  #define KVM_ARM64_DEBUG_DIRTY  (1 << 0)
> >  #define KVM_ARM64_FP_ENABLED   (1 << 1) /* guest FP regs 
> > loaded */
> > diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> > b/arch/arm64/include/uapi/asm/kvm.h
> > index 97c3478..ced760c 100644
> > --- a/arch/arm64/include/uapi/asm/kvm.h
> > +++ b/arch/arm64/include/uapi/asm/kvm.h
> > @@ -226,6 +226,23 @@ struct kvm_vcpu_events {
> >  KVM_REG_ARM_FW | ((r) & 0x))
> >  #define KVM_REG_ARM_PSCI_VERSION   KVM_REG_ARM_FW_REG(0)
> >  
> > +/* SVE registers */
> > +#define KVM_REG_ARM64_SVE  (0x15 << KVM_REG_ARM_COPROC_SHIFT)
> > +
> > +/* Z- and P-regs occupy blocks at the following offsets within this range: 
> > */
> > +#define KVM_REG_ARM64_SVE_ZREG_BASE0
> > +#define KVM_REG_ARM64_SVE_PREG_BASE0x400
> > +
> > +#define KVM_REG_ARM64_SVE_ZREG(n, i)   (KVM_REG_ARM64 | 
> > KVM_REG_ARM64_SVE | \
> > +KVM_REG_ARM64_SVE_ZREG_BASE |  \
> > +KVM_REG_SIZE_U2048 |   \
> > +  

Re: [PATCH v7 21/27] KVM: arm/arm64: Add hook for arch-specific KVM initialisation

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:46PM +, Dave Martin wrote:
> This patch adds a kvm_arm_init_arch_resources() hook to perform
> subarch-specific initialisation when starting up KVM.
> 
> This will be used in a subsequent patch for global SVE-related
> setup on arm64.
> 
> No functional change.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> ---
>  arch/arm/include/asm/kvm_host.h   | 2 ++
>  arch/arm64/include/asm/kvm_host.h | 2 ++
>  virt/kvm/arm/arm.c| 4 
>  3 files changed, 8 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 770d732..a49ee01 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -53,6 +53,8 @@
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> +static inline int kvm_arm_init_arch_resources(void) { return 0; }
> +
>  u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 205438a..3e89509 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -58,6 +58,8 @@
>  
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>  
> +static inline int kvm_arm_init_arch_resources(void) { return 0; }
> +
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>  int kvm_arch_vm_ioctl_check_extension(struct kvm *kvm, long ext);
> diff --git a/virt/kvm/arm/arm.c b/virt/kvm/arm/arm.c
> index 99c3738..c69e137 100644
> --- a/virt/kvm/arm/arm.c
> +++ b/virt/kvm/arm/arm.c
> @@ -1664,6 +1664,10 @@ int kvm_arch_init(void *opaque)
>   if (err)
>   return err;
>  
> + err = kvm_arm_init_arch_resources();
> + if (err)
> + return err;
> +
>   if (!in_hyp_mode) {
>   err = init_hyp_mode();
>   if (err)
> -- 
> 2.1.4
>

It's not clear to me from the commit message why init_common_resources()
won't work for this. Maybe it'll be more clear as I continue the review.

drew 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 20/27] arm64/sve: In-kernel vector length availability query interface

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:45PM +, Dave Martin wrote:
> KVM will need to interrogate the set of SVE vector lengths
> available on the system.
> 
> This patch exposes the relevant bits to the kernel, along with a
> sve_vq_available() helper to check whether a particular vector
> length is supported.
> 
> __vq_to_bit() and __bit_to_vq() are not intended for use outside
> these functions: now that these are exposed outside fpsimd.c, they
> are prefixed with __ in order to provide an extra hint that they
> are not intended for general-purpose use.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Alex Bennée 
> Tested-by: zhang.lei 
> ---
>  arch/arm64/include/asm/fpsimd.h | 29 +
>  arch/arm64/kernel/fpsimd.c  | 35 ---
>  2 files changed, 37 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h
> index df7a143..ad6d2e4 100644
> --- a/arch/arm64/include/asm/fpsimd.h
> +++ b/arch/arm64/include/asm/fpsimd.h
> @@ -24,10 +24,13 @@
>  
>  #ifndef __ASSEMBLY__
>  
> +#include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #if defined(__KERNEL__) && defined(CONFIG_COMPAT)
>  /* Masks for extracting the FPSR and FPCR from the FPSCR */
> @@ -89,6 +92,32 @@ extern u64 read_zcr_features(void);
>  
>  extern int __ro_after_init sve_max_vl;
>  extern int __ro_after_init sve_max_virtualisable_vl;
> +/* Set of available vector lengths, as vq_to_bit(vq): */

s/as/for use with/ ?
s/vq_to_bit/__vq_to_bit/

> +extern __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +
> +/*
> + * Helpers to translate bit indices in sve_vq_map to VQ values (and
> + * vice versa).  This allows find_next_bit() to be used to find the
> + * _maximum_ VQ not exceeding a certain value.
> + */
> +static inline unsigned int __vq_to_bit(unsigned int vq)
> +{

Why not have the same WARN_ON and clamping here as we do
in __bit_to_vq. Here a vq > SVE_VQ_MAX will wrap around
to a super high bit.

> + return SVE_VQ_MAX - vq;
> +}
> +
> +static inline unsigned int __bit_to_vq(unsigned int bit)
> +{
> + if (WARN_ON(bit >= SVE_VQ_MAX))
> + bit = SVE_VQ_MAX - 1;
> +
> + return SVE_VQ_MAX - bit;
> +}
> +
> +/* Ensure vq >= SVE_VQ_MIN && vq <= SVE_VQ_MAX before calling this function 
> */

Are we avoiding putting these tests and WARN_ONs in this function to
keep it fast?

> +static inline bool sve_vq_available(unsigned int vq)
> +{
> + return test_bit(__vq_to_bit(vq), sve_vq_map);
> +}
>  
>  #ifdef CONFIG_ARM64_SVE
>  
> diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c
> index 8a93afa..577296b 100644
> --- a/arch/arm64/kernel/fpsimd.c
> +++ b/arch/arm64/kernel/fpsimd.c
> @@ -136,7 +136,7 @@ static int sve_default_vl = -1;
>  int __ro_after_init sve_max_vl = SVE_VL_MIN;
>  int __ro_after_init sve_max_virtualisable_vl = SVE_VL_MIN;
>  /* Set of available vector lengths, as vq_to_bit(vq): */
> -static __ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
> +__ro_after_init DECLARE_BITMAP(sve_vq_map, SVE_VQ_MAX);
>  /* Set of vector lengths present on at least one cpu: */
>  static __ro_after_init DECLARE_BITMAP(sve_vq_partial_map, SVE_VQ_MAX);
>  static void __percpu *efi_sve_state;
> @@ -270,25 +270,6 @@ void fpsimd_save(void)
>  }
>  
>  /*
> - * Helpers to translate bit indices in sve_vq_map to VQ values (and
> - * vice versa).  This allows find_next_bit() to be used to find the
> - * _maximum_ VQ not exceeding a certain value.
> - */
> -
> -static unsigned int vq_to_bit(unsigned int vq)
> -{
> - return SVE_VQ_MAX - vq;
> -}
> -
> -static unsigned int bit_to_vq(unsigned int bit)
> -{
> - if (WARN_ON(bit >= SVE_VQ_MAX))
> - bit = SVE_VQ_MAX - 1;
> -
> - return SVE_VQ_MAX - bit;
> -}
> -
> -/*
>   * All vector length selection from userspace comes through here.
>   * We're on a slow path, so some sanity-checks are included.
>   * If things go wrong there's a bug somewhere, but try to fall back to a
> @@ -309,8 +290,8 @@ static unsigned int find_supported_vector_length(unsigned 
> int vl)
>   vl = max_vl;
>  
>   bit = find_next_bit(sve_vq_map, SVE_VQ_MAX,
> - vq_to_bit(sve_vq_from_vl(vl)));
> - return sve_vl_from_vq(bit_to_vq(bit));
> + __vq_to_bit(sve_vq_from_vl(vl)));
> + return sve_vl_from_vq(__bit_to_vq(bit));
>  }
>  
>  #ifdef CONFIG_SYSCTL
> @@ -648,7 +629,7 @@ static void sve_probe_vqs(DECLARE_BITMAP(map, SVE_VQ_MAX))
>   write_sysreg_s(zcr | (vq - 1), SYS_ZCR_EL1); /* self-syncing */
>   vl = sve_get_vl();
>   vq = sve_vq_from_vl(vl); /* skip intervening lengths */
> - set_bit(vq_to_bit(vq), map);
> + set_bit(__vq_to_bit(vq), map);
>   }
>  }
>  
> @@ -717,7 +698,7 @@ int sve_verify_vq_map(void)
>* Mismatches above sve_max_virtualisable_vl are fine, since
>* no 

Re: [PATCH v7 19/27] KVM: arm64: Enumerate SVE register indices for KVM_GET_REG_LIST

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:44PM +, Dave Martin wrote:
> This patch includes the SVE register IDs in the list returned by
> KVM_GET_REG_LIST, as appropriate.
> 
> On a non-SVE-enabled vcpu, no new IDs are added.
> 
> On an SVE-enabled vcpu, IDs for the FPSIMD V-registers are removed
> from the list, since userspace is required to access the Z-
> registers instead in order to access the V-register content.  For
> the variably-sized SVE registers, the appropriate set of slice IDs
> are enumerated, depending on the maximum vector length for the
> vcpu.
> 
> As it currently stands, the SVE architecture never requires more
> than one slice to exist per register, so this patch adds no
> explicit support for enumerating multiple slices.  The code can be
> extended straightforwardly to support this in the future, if
> needed.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v6:
> 
>  * [Julien Thierry] Add a #define to replace the magic "slices = 1",
>and add a comment explaining to maintainers what needs to happen if
>this is updated in the future.
> 
> Changes since v5:
> 
> (Dropped Julien Thierry's Reviewed-by due to non-trivial rebasing)
> 
>  * Move mis-split reword to prevent put_user()s being accidentally the
>correct size from KVM: arm64/sve: Add pseudo-register for the guest's
>vector lengths.
> ---
>  arch/arm64/kvm/guest.c | 63 
> ++
>  1 file changed, 63 insertions(+)
> 
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 736d8cb..2aa80a5 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -222,6 +222,13 @@ static int set_core_reg(struct kvm_vcpu *vcpu, const 
> struct kvm_one_reg *reg)
>  #define KVM_SVE_ZREG_SIZE KVM_REG_SIZE(KVM_REG_ARM64_SVE_ZREG(0, 0))
>  #define KVM_SVE_PREG_SIZE KVM_REG_SIZE(KVM_REG_ARM64_SVE_PREG(0, 0))
>  
> +/*
> + * number of register slices required to cover each whole SVE register on 
> vcpu

s/number/Number/
s/on vcpu//

> + * NOTE: If you are tempted to modify this, you must also to rework

s/to rework/rework/

> + * sve_reg_to_region() to match:
> + */
> +#define vcpu_sve_slices(vcpu) 1
> +
>  /* Bounds of a single SVE register slice within vcpu->arch.sve_state */
>  struct sve_state_reg_region {
>   unsigned int koffset;   /* offset into sve_state in kernel memory */
> @@ -411,6 +418,56 @@ static int get_timer_reg(struct kvm_vcpu *vcpu, const 
> struct kvm_one_reg *reg)
>   return copy_to_user(uaddr, , KVM_REG_SIZE(reg->id)) ? -EFAULT : 0;
>  }
>  
> +static unsigned long num_sve_regs(const struct kvm_vcpu *vcpu)
> +{
> + /* Only the first slice ever exists, for now */

I'd move this comment up into the one above vcpu_sve_slices(),
and then nothing needs to change here when more slices come.

> + const unsigned int slices = vcpu_sve_slices(vcpu);
> +
> + if (!vcpu_has_sve(vcpu))
> + return 0;
> +
> + return slices * (SVE_NUM_PREGS + SVE_NUM_ZREGS + 1 /* FFR */);
> +}
> +
> +static int copy_sve_reg_indices(const struct kvm_vcpu *vcpu,
> + u64 __user *uindices)
> +{
> + /* Only the first slice ever exists, for now */

Same comment as above.

> + const unsigned int slices = vcpu_sve_slices(vcpu);
> + u64 reg;
> + unsigned int i, n;
> + int num_regs = 0;
> +
> + if (!vcpu_has_sve(vcpu))
> + return 0;
> +
> + for (i = 0; i < slices; i++) {
> + for (n = 0; n < SVE_NUM_ZREGS; n++) {
> + reg = KVM_REG_ARM64_SVE_ZREG(n, i);
> + if (put_user(reg, uindices++))
> + return -EFAULT;
> +
> + num_regs++;
> + }
> +
> + for (n = 0; n < SVE_NUM_PREGS; n++) {
> + reg = KVM_REG_ARM64_SVE_PREG(n, i);
> + if (put_user(reg, uindices++))
> + return -EFAULT;
> +
> + num_regs++;
> + }
> +
> + reg = KVM_REG_ARM64_SVE_FFR(i);
> + if (put_user(reg, uindices++))
> + return -EFAULT;
> +
> + num_regs++;
> + }

nit: the extra blank lines above the num_regs++'s give the code an odd
 look (to me)

> +
> + return num_regs;
> +}
> +
>  /**
>   * kvm_arm_num_regs - how many registers do we present via KVM_GET_ONE_REG
>   *
> @@ -421,6 +478,7 @@ unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu)
>   unsigned long res = 0;
>  
>   res += num_core_regs(vcpu);
> + res += num_sve_regs(vcpu);
>   res += kvm_arm_num_sys_reg_descs(vcpu);
>   res += kvm_arm_get_fw_num_regs(vcpu);
>   res += NUM_TIMER_REGS;
> @@ -442,6 +500,11 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 
> __user *uindices)
>   return ret;
>   uindices += ret;
>  
> + ret = copy_sve_reg_indices(vcpu, uindices);
> + if 

Re: [PATCH v7 18/27] KVM: arm64/sve: Add SVE support to register access ioctl interface

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:43PM +, Dave Martin wrote:
> This patch adds the following registers for access via the
> KVM_{GET,SET}_ONE_REG interface:
> 
>  * KVM_REG_ARM64_SVE_ZREG(n, i) (n = 0..31) (in 2048-bit slices)
>  * KVM_REG_ARM64_SVE_PREG(n, i) (n = 0..15) (in 256-bit slices)
>  * KVM_REG_ARM64_SVE_FFR(i) (in 256-bit slices)
> 
> In order to adapt gracefully to future architectural extensions,
> the registers are logically divided up into slices as noted above:
> the i parameter denotes the slice index.
> 
> This allows us to reserve space in the ABI for future expansion of
> these registers.  However, as of today the architecture does not
> permit registers to be larger than a single slice, so no code is
> needed in the kernel to expose additional slices, for now.  The
> code can be extended later as needed to expose them up to a maximum
> of 32 slices (as carved out in the architecture itself) if they
> really exist someday.
> 
> The registers are only visible for vcpus that have SVE enabled.
> They are not enumerated by KVM_GET_REG_LIST on vcpus that do not
> have SVE.
> 
> Accesses to the FPSIMD registers via KVM_REG_ARM_CORE is not
> allowed for SVE-enabled vcpus: SVE-aware userspace can use the
> KVM_REG_ARM64_SVE_ZREG() interface instead to access the same
> register state.  This avoids some complex and pointless emulation
> in the kernel to convert between the two views of these aliased
> registers.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * [Julien Thierry] rename sve_reg_region() to sve_reg_to_region() to
>make its purpose a bit clearer.
> 
>  * [Julien Thierry] rename struct sve_state_region to
>sve_state_reg_region to make it clearer this this struct only
>describes the bounds of (part of) a single register within
>sve_state.
> 
>  * [Julien Thierry] Add a comment to clarify the purpose of struct
>sve_state_reg_region.
> ---
>  arch/arm64/include/asm/kvm_host.h |  14 
>  arch/arm64/include/uapi/asm/kvm.h |  17 +
>  arch/arm64/kvm/guest.c| 139 
> ++
>  3 files changed, 158 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 4fabfd2..205438a 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -329,6 +329,20 @@ struct kvm_vcpu_arch {
>  #define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
> sve_ffr_offset((vcpu)->arch.sve_max_vl)))
>  
> +#define vcpu_sve_state_size(vcpu) ({ \
> + size_t __size_ret;  \
> + unsigned int __vcpu_vq; \
> + \
> + if (WARN_ON(!sve_vl_valid((vcpu)->arch.sve_max_vl))) {  \
> + __size_ret = 0; \
> + } else {\
> + __vcpu_vq = sve_vq_from_vl((vcpu)->arch.sve_max_vl);\
> + __size_ret = SVE_SIG_REGS_SIZE(__vcpu_vq);  \
> + }   \
> + \
> + __size_ret; \
> +})

I know why this is a macro instead of an inline :)

> +
>  /* vcpu_arch flags field values: */
>  #define KVM_ARM64_DEBUG_DIRTY(1 << 0)
>  #define KVM_ARM64_FP_ENABLED (1 << 1) /* guest FP regs loaded */
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index 97c3478..ced760c 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -226,6 +226,23 @@ struct kvm_vcpu_events {
>KVM_REG_ARM_FW | ((r) & 0x))
>  #define KVM_REG_ARM_PSCI_VERSION KVM_REG_ARM_FW_REG(0)
>  
> +/* SVE registers */
> +#define KVM_REG_ARM64_SVE(0x15 << KVM_REG_ARM_COPROC_SHIFT)
> +
> +/* Z- and P-regs occupy blocks at the following offsets within this range: */
> +#define KVM_REG_ARM64_SVE_ZREG_BASE  0
> +#define KVM_REG_ARM64_SVE_PREG_BASE  0x400
> +
> +#define KVM_REG_ARM64_SVE_ZREG(n, i) (KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
> +  KVM_REG_ARM64_SVE_ZREG_BASE |  \
> +  KVM_REG_SIZE_U2048 |   \
> +  ((n) << 5) | (i))
> +#define KVM_REG_ARM64_SVE_PREG(n, i) (KVM_REG_ARM64 | KVM_REG_ARM64_SVE | \
> +  KVM_REG_ARM64_SVE_PREG_BASE |  \
> +  KVM_REG_SIZE_U256 |\
> +  ((n) << 5) | 

Re: [PATCH kvmtool 06/16] pci: Fix ioport allocation size

2019-04-04 Thread Andre Przywara
On Thu, 7 Mar 2019 08:36:07 +
Julien Thierry  wrote:

> The PCI Local Bus Specification, Rev. 3.0,
> Section 6.2.5.1. "Address Maps" states:
> "Devices that map control functions into I/O Space must not consume more
> than 256 bytes per I/O Base Address register."
> 
> Yet all the PCI devices allocate IO ports of IOPORT_SIZE (= 1024 bytes).
> 
> Fix this by having PCI devices use 256 bytes ports for IO BARs.

Good catch!

> Signed-off-by: Julien Thierry 

Reviewed-by: Andre Przywara 

Cheers,
Andre.

> ---
>  hw/pci-shmem.c   |  4 ++--
>  hw/vesa.c|  4 ++--
>  include/kvm/ioport.h |  1 -
>  pci.c|  2 +-
>  virtio/pci.c | 14 +++---
>  5 files changed, 12 insertions(+), 13 deletions(-)
> 
> diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
> index a0c5ba8..2e1474b 100644
> --- a/hw/pci-shmem.c
> +++ b/hw/pci-shmem.c
> @@ -357,8 +357,8 @@ int pci_shmem__init(struct kvm *kvm)
>   return 0;
>  
>   /* Register MMIO space for MSI-X */
> - r = pci_get_io_port_block(IOPORT_SIZE);
> - r = ioport__register(kvm, r, _pci__io_ops, IOPORT_SIZE, NULL);
> + r = pci_get_io_port_block(PCI_IO_SIZE);
> + r = ioport__register(kvm, r, _pci__io_ops, PCI_IO_SIZE, NULL);
>   if (r < 0)
>   return r;
>   ivshmem_registers = (u16)r;
> diff --git a/hw/vesa.c b/hw/vesa.c
> index 404a8a3..71935d5 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -60,8 +60,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>   if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>   return NULL;
> - r = pci_get_io_space_block(IOPORT_SIZE);
> - r = ioport__register(kvm, r, _io_ops, IOPORT_SIZE, NULL);
> + r = pci_get_io_space_block(PCI_IO_SIZE);
> + r = ioport__register(kvm, r, _io_ops, PCI_IO_SIZE, NULL);
>   if (r < 0)
>   return ERR_PTR(r);
>  
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index b10fcd5..8c86b71 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -14,7 +14,6 @@
>  
>  /* some ports we reserve for own use */
>  #define IOPORT_DBG   0xe0
> -#define IOPORT_SIZE  0x400
>  
>  struct kvm;
>  
> diff --git a/pci.c b/pci.c
> index cd749db..228a628 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -23,7 +23,7 @@ static u16 io_port_blocks   = PCI_IOPORT_START;
>  
>  u16 pci_get_io_port_block(u32 size)
>  {
> - u16 port = ALIGN(io_port_blocks, IOPORT_SIZE);
> + u16 port = ALIGN(io_port_blocks, PCI_IO_SIZE);
>   io_port_blocks = port + size;
>   return port;
>  }
> diff --git a/virtio/pci.c b/virtio/pci.c
> index c8e16dd..5a6c0d0 100644
> --- a/virtio/pci.c
> +++ b/virtio/pci.c
> @@ -406,7 +406,7 @@ static void virtio_pci__io_mmio_callback(struct kvm_cpu 
> *vcpu,
>  {
>   struct virtio_pci *vpci = ptr;
>   int direction = is_write ? KVM_EXIT_IO_OUT : KVM_EXIT_IO_IN;
> - u16 port = vpci->port_addr + (addr & (IOPORT_SIZE - 1));
> + u16 port = vpci->port_addr + (addr & (PCI_IO_SIZE - 1));
>  
>   kvm__emulate_io(vcpu, port, data, direction, len, 1);
>  }
> @@ -420,14 +420,14 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct 
> virtio_device *vdev,
>   vpci->kvm = kvm;
>   vpci->dev = dev;
>  
> - r = pci_get_io_port_block(IOPORT_SIZE);
> - r = ioport__register(kvm, r, _pci__io_ops, IOPORT_SIZE, vdev);
> + r = pci_get_io_port_block(PCI_IO_SIZE);
> + r = ioport__register(kvm, r, _pci__io_ops, PCI_IO_SIZE, vdev);
>   if (r < 0)
>   return r;
>   vpci->port_addr = (u16)r;
>  
> - vpci->mmio_addr = pci_get_io_space_block(IOPORT_SIZE);
> - r = kvm__register_mmio(kvm, vpci->mmio_addr, IOPORT_SIZE, false,
> + vpci->mmio_addr = pci_get_io_space_block(PCI_IO_SIZE);
> + r = kvm__register_mmio(kvm, vpci->mmio_addr, PCI_IO_SIZE, false,
>  virtio_pci__io_mmio_callback, vpci);
>   if (r < 0)
>   goto free_ioport;
> @@ -457,8 +457,8 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct 
> virtio_device *vdev,
>   | 
> PCI_BASE_ADDRESS_SPACE_MEMORY),
>   .status = cpu_to_le16(PCI_STATUS_CAP_LIST),
>   .capabilities   = (void *)>pci_hdr.msix - (void 
> *)>pci_hdr,
> - .bar_size[0]= cpu_to_le32(IOPORT_SIZE),
> - .bar_size[1]= cpu_to_le32(IOPORT_SIZE),
> + .bar_size[0]= cpu_to_le32(PCI_IO_SIZE),
> + .bar_size[1]= cpu_to_le32(PCI_IO_SIZE),
>   .bar_size[2]= cpu_to_le32(PCI_IO_SIZE*2),
>   };
>  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 05/16] ioport: pci: Move port allocations to PCI devices

2019-04-04 Thread Andre Przywara
On Thu, 7 Mar 2019 08:36:06 +
Julien Thierry  wrote:

Hi,

> The dynamic ioport allocation with IOPORT_EMPTY is currently only used
> by PCI devices. Other devices use fixed ports for which they request
> registration to the ioport API.
> 
> PCI ports need to be in the PCI IO space and there is no reason ioport
> API should know a PCI port is being allocated and needs to be placed in
> PCI IO space. This currently just happens to be the case.
> 
> Move the responsability of dynamic allocation of ioports from the ioport
> API to PCI.
> 
> In the future, if other types of devices also need dynamic ioport
> allocation, they'll have to figure out the range of ports they are
> allowed to use.
> 
> Signed-off-by: Julien Thierry 
> ---
>  hw/pci-shmem.c   |  3 ++-
>  hw/vesa.c|  4 ++--
>  include/kvm/ioport.h |  3 ---
>  include/kvm/pci.h|  2 ++
>  ioport.c | 18 --
>  pci.c|  8 
>  vfio/core.c  |  6 --
>  virtio/pci.c |  3 ++-
>  8 files changed, 20 insertions(+), 27 deletions(-)
> 
> diff --git a/hw/pci-shmem.c b/hw/pci-shmem.c
> index f92bc75..a0c5ba8 100644
> --- a/hw/pci-shmem.c
> +++ b/hw/pci-shmem.c
> @@ -357,7 +357,8 @@ int pci_shmem__init(struct kvm *kvm)
>   return 0;
>  
>   /* Register MMIO space for MSI-X */
> - r = ioport__register(kvm, IOPORT_EMPTY, _pci__io_ops, 
> IOPORT_SIZE, NULL);
> + r = pci_get_io_port_block(IOPORT_SIZE);
> + r = ioport__register(kvm, r, _pci__io_ops, IOPORT_SIZE, NULL);
>   if (r < 0)
>   return r;
>   ivshmem_registers = (u16)r;
> diff --git a/hw/vesa.c b/hw/vesa.c
> index f3c5114..404a8a3 100644
> --- a/hw/vesa.c
> +++ b/hw/vesa.c
> @@ -60,8 +60,8 @@ struct framebuffer *vesa__init(struct kvm *kvm)
>  
>   if (!kvm->cfg.vnc && !kvm->cfg.sdl && !kvm->cfg.gtk)
>   return NULL;
> -
> - r = ioport__register(kvm, IOPORT_EMPTY, _io_ops, IOPORT_SIZE, 
> NULL);
> + r = pci_get_io_space_block(IOPORT_SIZE);

I am confused. This is still registering I/O ports, right? And this
(misnamed) function is about MMIO?
So should it read r = pci_get_io_port_block(IOPORT_SIZE); ?

> + r = ioport__register(kvm, r, _io_ops, IOPORT_SIZE, NULL);
>   if (r < 0)
>   return ERR_PTR(r);
>  
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index db52a47..b10fcd5 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -14,11 +14,8 @@
>  
>  /* some ports we reserve for own use */
>  #define IOPORT_DBG   0xe0
> -#define IOPORT_START 0x6200
>  #define IOPORT_SIZE  0x400
>  
> -#define IOPORT_EMPTY USHRT_MAX
> -
>  struct kvm;
>  
>  struct ioport {
> diff --git a/include/kvm/pci.h b/include/kvm/pci.h
> index a86c15a..bdbd183 100644
> --- a/include/kvm/pci.h
> +++ b/include/kvm/pci.h
> @@ -19,6 +19,7 @@
>  #define PCI_CONFIG_DATA  0xcfc
>  #define PCI_CONFIG_BUS_FORWARD   0xcfa
>  #define PCI_IO_SIZE  0x100
> +#define PCI_IOPORT_START 0x6200
>  #define PCI_CFG_SIZE (1ULL << 24)
>  
>  struct kvm;
> @@ -153,6 +154,7 @@ int pci__init(struct kvm *kvm);
>  int pci__exit(struct kvm *kvm);
>  struct pci_device_header *pci__find_dev(u8 dev_num);
>  u32 pci_get_io_space_block(u32 size);

So I think this was already misnamed, but with your new function below
becomes utterly confusing. Can we rename this to pci_get_mmio_space_block?

> +u16 pci_get_io_port_block(u32 size);
>  void pci__assign_irq(struct device_header *dev_hdr);
>  void pci__config_wr(struct kvm *kvm, union pci_config_address addr, void 
> *data, int size);
>  void pci__config_rd(struct kvm *kvm, union pci_config_address addr, void 
> *data, int size);
> diff --git a/ioport.c b/ioport.c
> index a6dc65e..a72e403 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -16,24 +16,8 @@
>  
>  #define ioport_node(n) rb_entry(n, struct ioport, node)
>  
> -DEFINE_MUTEX(ioport_mutex);
> -
> -static u16   free_io_port_idx; /* protected by ioport_mutex 
> */
> -
>  static struct rb_rootioport_tree = RB_ROOT;
>  
> -static u16 ioport__find_free_port(void)
> -{
> - u16 free_port;
> -
> - mutex_lock(_mutex);
> - free_port = IOPORT_START + free_io_port_idx * IOPORT_SIZE;
> - free_io_port_idx++;
> - mutex_unlock(_mutex);
> -
> - return free_port;
> -}
> -
>  static struct ioport *ioport_search(struct rb_root *root, u64 addr)
>  {
>   struct rb_int_node *node;
> @@ -85,8 +69,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct 
> ioport_operations *ops, i
>   int r;
>  
>   br_write_lock(kvm);
> - if (port == IOPORT_EMPTY)
> - port = ioport__find_free_port();
>  
>   entry = ioport_search(_tree, port);
>   if (entry) {
> diff --git a/pci.c b/pci.c
> index 9edefa5..cd749db 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -19,6 +19,14 @@ static u32 pci_config_address_bits;
>   * 

Re: [PATCH kvmtool 04/16] pci: Fix BAR resource sizing arbitration

2019-04-04 Thread Andre Przywara
On Thu, 7 Mar 2019 08:36:05 +
Julien Thierry  wrote:

Hi,

> From: Sami Mujawar 
> 
> According to the 'PCI Local Bus Specification, Revision 3.0,
> February 3, 2004, Section 6.2.5.1, Implementation Notes, page 227'
> 
> "Software saves the original value of the Base Address register,
> writes 0  h to the register, then reads it back. Size
> calculation can be done from the 32-bit value read by first
> clearing encoding information bits (bit 0 for I/O, bits 0-3 for
> memory), inverting all 32 bits (logical NOT), then incrementing
> by 1. The resultant 32-bit value is the memory/I/O range size
> decoded by the register. Note that the upper 16 bits of the result
> is ignored if the Base Address register is for I/O and bits 16-31
> returned zero upon read."
> 
> kvmtool was returning the actual BAR resource size which would be
> incorrect as the software software drivers would invert all 32 bits
> (logical NOT), then incrementing by 1. This ends up with a very large
> resource size (in some cases more than 4GB) due to which drivers
> assert/fail to work.
> 
> e.g if the BAR resource size was 0x1000, kvmtool would return 0x1000
> instead of 0xF00x.

Ouch!

> 
> Fixed pci__config_wr() to return the size of the BAR in accordance with
> the PCI Local Bus specification, Implementation Notes.
> 
> Signed-off-by: Sami Mujawar 
> Signed-off-by: Julien Thierry 
> ---
>  pci.c | 51 ---
>  1 file changed, 48 insertions(+), 3 deletions(-)
> 
> diff --git a/pci.c b/pci.c
> index 689869c..9edefa5 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -8,6 +8,9 @@
>  #include 
>  #include 
>  
> +/* Macro to check that a value is a power of 2 */
> +#define power_of_2(pow) (((pow) != 0) && (((pow) & ((pow) - 1)) == 0))

That is probably correct, but a bit hard to read. What about defining a
static function instead? That would allow this to be written in multiple
lines. And to improve readability, the prototype should probably be:
static bool is_power_of_two(u32 value)

> +
>  static u32 pci_config_address_bits;
>  
>  /* This is within our PCI gap - in an unused area.
> @@ -173,9 +176,51 @@ void pci__config_wr(struct kvm *kvm, union 
> pci_config_address addr, void *data,
>* BAR there next time it reads from it. When the kernel got the size it
>* would write the address back.
>*/
> - if (bar < 6 && ioport__read32(data) == 0x) {
> - u32 sz = pci_hdr->bar_size[bar];
> - memcpy(base + offset, , sizeof(sz));
> + if (bar < 6) {
> + /*
> +  * According to the PCI local bus specification REV 3.0:
> +  * The number of upper bits that a device actually implements
> +  * depends on how much of the address space the device will
> +  * respond to. A device that wants a 1 MB memory address space
> +  * (using a 32-bit base address register) would build the top
> +  * 12 bits of the address register, hardwiring the other bits
> +  * to 0.
> +  * Furthermore software can determine how much address space the
> +  * device requires by writing a value of all 1's to the register
> +  * and then reading the value back. The device will return 0's 
> in
> +  * all don't-care address bits, effectively specifying the 
> address
> +  * space required.
> +  *
> +  * The following code emulates this by storing the value written
> +  * to the BAR, applying the size mask to clear the lower bits,
> +  * restoring the information bits and then updating the BAR 
> value.
> +  */
> + u32 bar_value;
> + /* Get the size mask */
> + u32 sz = ~(pci_hdr->bar_size[bar] - 1);

It's a bit confusing to read just "sz", when it's actually a mask.
Actually there is only one user, so to make the comment there more
meaningful, you can just use the mask expression there, and get rid of
"sz" altogether.

> + /* Extract the info bits */
> + u32 info = pci_hdr->bar[bar] & 0xF;

Just a nit, but I find it a bit confusing to see those variable
definitions mixed with comments, especially with those short type names
and after the big introductory comment. Can we just put the comment on the
same line as the declaration, at least?

> +
> + /* Store the value written by software */
> + memcpy(base + offset, data, size);
> +
> + /* Apply the size mask to the bar value to clear the lower bits 
> */
> + bar_value = pci_hdr->bar[bar] & sz;
> +
> + /* Warn if the bar size is not a power of 2 */
> + WARN_ON(!power_of_2(pci_hdr->bar_size[bar]));

Is this actually useful? This would put out a warning to the console, but
is there something an admin could do about it? Looks more like a kvmtool
issue (for 

Re: [PATCH kvmtool 03/16] brlock: fix build with KVM_BRLOCK_DEBUG

2019-04-04 Thread Andre Przywara
On Thu, 7 Mar 2019 08:36:04 +
Julien Thierry  wrote:

> Build breaks when using KVM_BRLOCK_DEBUG because the header was seamingly
> conceived to be included in a single .c file...
> 
> Fix this by moving the definition of the read/write lock into the kvm
> struct.
> 
> Signed-off-by: Julien Thierry 

Reviewed-by: Andre Przywara 

Cheers,
Andre.

> ---
>  include/kvm/brlock.h | 10 --
>  include/kvm/kvm.h|  4 
>  kvm.c|  4 
>  3 files changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/include/kvm/brlock.h b/include/kvm/brlock.h
> index 29f72e0..1862210 100644
> --- a/include/kvm/brlock.h
> +++ b/include/kvm/brlock.h
> @@ -21,13 +21,11 @@
>  
>  #include "kvm/rwsem.h"
>  
> -DECLARE_RWSEM(brlock_sem);
> +#define br_read_lock(kvm)down_read(&(kvm)->brlock_sem);
> +#define br_read_unlock(kvm)  up_read(&(kvm)->brlock_sem);
>  
> -#define br_read_lock(kvm)down_read(_sem);
> -#define br_read_unlock(kvm)  up_read(_sem);
> -
> -#define br_write_lock(kvm)   down_write(_sem);
> -#define br_write_unlock(kvm) up_write(_sem);
> +#define br_write_lock(kvm)   down_write(&(kvm)->brlock_sem);
> +#define br_write_unlock(kvm) up_write(&(kvm)->brlock_sem);
>  
>  #else
>  
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 1edacfd..7a73818 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -81,6 +81,10 @@ struct kvm {
>   int nr_disks;
>  
>   int vm_state;
> +
> +#ifdef KVM_BRLOCK_DEBUG
> + pthread_rwlock_tbrlock_sem;
> +#endif
>  };
>  
>  void kvm__set_dir(const char *fmt, ...);
> diff --git a/kvm.c b/kvm.c
> index d5249a0..57c4ff9 100644
> --- a/kvm.c
> +++ b/kvm.c
> @@ -160,6 +160,10 @@ struct kvm *kvm__new(void)
>   kvm->sys_fd = -1;
>   kvm->vm_fd = -1;
>  
> +#ifdef KVM_BRLOCK_DEBUG
> + kvm->brlock_sem = (pthread_rwlock_t) PTHREAD_RWLOCK_INITIALIZER;
> +#endif
> +
>   return kvm;
>  }
>  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 02/16] brlock: Always pass argument to br_read_lock/unlock

2019-04-04 Thread Andre Przywara
On Thu, 7 Mar 2019 08:36:03 +
Julien Thierry  wrote:

> The kvm argument is not passed to br_read_lock/unlock, this works for
> the barrier implementation because the argument is not used. This ever
> breaks if another lock implementation is used.
> 
> Signed-off-by: Julien Thierry 

Reviewed-by: Andre Przywara 

Cheers,
Andre.

> ---
>  ioport.c | 4 ++--
>  mmio.c   | 4 ++--
>  2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/ioport.c b/ioport.c
> index 505e822..a6dc65e 100644
> --- a/ioport.c
> +++ b/ioport.c
> @@ -184,7 +184,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void 
> *data, int direction,
>   void *ptr = data;
>   struct kvm *kvm = vcpu->kvm;
>  
> - br_read_lock();
> + br_read_lock(kvm);
>   entry = ioport_search(_tree, port);
>   if (!entry)
>   goto out;
> @@ -201,7 +201,7 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void 
> *data, int direction,
>   }
>  
>  out:
> - br_read_unlock();
> + br_read_unlock(kvm);
>  
>   if (ret)
>   return true;
> diff --git a/mmio.c b/mmio.c
> index c648bec..61e1d47 100644
> --- a/mmio.c
> +++ b/mmio.c
> @@ -124,7 +124,7 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 
> phys_addr, u8 *data, u32 len, u
>  {
>   struct mmio_mapping *mmio;
>  
> - br_read_lock();
> + br_read_lock(vcpu->kvm);
>   mmio = mmio_search(_tree, phys_addr, len);
>  
>   if (mmio)
> @@ -135,7 +135,7 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 
> phys_addr, u8 *data, u32 len, u
>   to_direction(is_write),
>   (unsigned long long)phys_addr, len);
>   }
> - br_read_unlock();
> + br_read_unlock(vcpu->kvm);
>  
>   return true;
>  }

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 01/16] Makefile: Only compile vesa for archs that need it

2019-04-04 Thread Andre Przywara
On Thu, 7 Mar 2019 08:36:02 +
Julien Thierry  wrote:

> The vesa framebuffer is only used by architectures that explicitly
> require it (i.e. x86). Compile it out for architectures not using it, as
> its current implementation might not work for them.
> 
> Signed-off-by: Julien Thierry 

Reviewed-by: Andre Przywara 

Cheers,
Andre.

> ---
>  Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/Makefile b/Makefile
> index c4faff6..288e467 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -94,7 +94,6 @@ OBJS+= util/read-write.o
>  OBJS += util/util.o
>  OBJS += virtio/9p.o
>  OBJS += virtio/9p-pdu.o
> -OBJS += hw/vesa.o
>  OBJS += hw/pci-shmem.o
>  OBJS += kvm-ipc.o
>  OBJS += builtin-sandbox.o
> @@ -219,6 +218,8 @@ else
>  endif
>  
>  ifeq (y,$(ARCH_HAS_FRAMEBUFFER))
> + OBJS+= hw/vesa.o
> +
>   CFLAGS_GTK3 := $(shell pkg-config --cflags gtk+-3.0 2>/dev/null)
>   LDFLAGS_GTK3 := $(shell pkg-config --libs gtk+-3.0 2>/dev/null)
>   ifeq ($(call try-build,$(SOURCE_GTK3),$(CFLAGS) 
> $(CFLAGS_GTK3),$(LDFLAGS) $(LDFLAGS_GTK3)),y)

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH] KVM: arm/arm64: vgic: Restrict setting irq->targets only in GICv2

2019-04-04 Thread Andre Przywara
On Thu, 4 Apr 2019 12:30:15 +
Zenghui Yu  wrote:

Hi,

> Commit ad275b8bb1e6 ("KVM: arm/arm64: vgic-new: vgic_init: implement
> vgic_init") had set irq->targets in kvm_vgic_vcpu_init(), regardless of
> the GIC architecture (v2 or v3). When the number of vcpu reaches 32
> (in v3), UBSAN will complain about it.

The first part looks similar to this one:
http://lists.infradead.org/pipermail/linux-arm-kernel/2019-March/637209.html

>  
> 
>  UBSAN: Undefined behaviour in 
> arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:223:21
>  shift exponent 32 is too large for 32-bit type 'unsigned int'
>  CPU: 13 PID: 833 Comm: CPU 32/KVM Kdump: loaded Not tainted 5.1.0-rc1+ #16
>  Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
>  Call trace:
>   dump_backtrace+0x0/0x190
>   show_stack+0x24/0x30
>   dump_stack+0xc8/0x114
>   ubsan_epilogue+0x14/0x50
>   __ubsan_handle_shift_out_of_bounds+0x118/0x188
>   kvm_vgic_vcpu_init+0x1d4/0x200
>   kvm_arch_vcpu_init+0x3c/0x48
>   kvm_vcpu_init+0xa8/0x100
>   kvm_arch_vcpu_create+0x94/0x120
>   kvm_vm_ioctl+0x57c/0xe58
>   do_vfs_ioctl+0xc4/0x7f0
>   ksys_ioctl+0x8c/0xa0
>   __arm64_sys_ioctl+0x28/0x38
>   el0_svc_common+0xa0/0x190
>   el0_svc_handler+0x38/0x78
>   el0_svc+0x8/0xc
>  
> 
> 
> This patch Restricts setting irq->targets in GICv2, which only supports
> a maximum of eight PEs, to keep UBSAN quiet. And since irq->mpidr will
> only be used by SPI in GICv3, we decided to set mpidr to 0 for SGI and
> PPI.
> 
> Like commit ab2d5eb03dbb ("KVM: arm/arm64: vgic: Always initialize the
> group of private IRQs"), we should also take the creating order of the
> VGIC and VCPUs into consideration.
> 
> Cc: Eric Auger 
> Cc: Marc Zyngier 
> Cc: Andre Przywara 
> Cc: Christoffer Dall 
> Signed-off-by: Zenghui Yu 
> ---
>  virt/kvm/arm/vgic/vgic-init.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
> index 3bdb31e..0cba92e 100644
> --- a/virt/kvm/arm/vgic/vgic-init.c
> +++ b/virt/kvm/arm/vgic/vgic-init.c
> @@ -220,7 +220,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>   irq->intid = i;
>   irq->vcpu = NULL;
>   irq->target_vcpu = vcpu;
> - irq->targets = 1U << vcpu->vcpu_id;
>   kref_init(>refcount);
>   if (vgic_irq_is_sgi(i)) {
>   /* SGIs */
> @@ -231,10 +230,14 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
>   irq->config = VGIC_CONFIG_LEVEL;
>   }
>  
> - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> + if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>   irq->group = 1;
> - else
> + irq->mpidr = 0;
> + } else {
>   irq->group = 0;
> + if (vcpu->vcpu_id < VGIC_V2_MAX_CPUS)
> + irq->targets = 1U << vcpu->vcpu_id;
> + }
>   }
>  
>   if (!irqchip_in_kernel(vcpu->kvm))
> @@ -297,10 +300,13 @@ int vgic_init(struct kvm *kvm)
>  
>   for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
>   struct vgic_irq *irq = _cpu->private_irqs[i];
> - if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
> + if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
>   irq->group = 1;
> - else
> + irq->mpidr = 0;
> + } else {
>   irq->group = 0;
> + irq->targets = 1U << vcpu->vcpu_id;
> + }

So why would you need this part? That does the same as above, doesn't it?

Cheers,
Andre.


>   }
>   }
>  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH] KVM: arm/arm64: vgic: Restrict setting irq->targets only in GICv2

2019-04-04 Thread Zenghui Yu
Commit ad275b8bb1e6 ("KVM: arm/arm64: vgic-new: vgic_init: implement
vgic_init") had set irq->targets in kvm_vgic_vcpu_init(), regardless of
the GIC architecture (v2 or v3). When the number of vcpu reaches 32
(in v3), UBSAN will complain about it.

 

 UBSAN: Undefined behaviour in 
arch/arm64/kvm/../../../virt/kvm/arm/vgic/vgic-init.c:223:21
 shift exponent 32 is too large for 32-bit type 'unsigned int'
 CPU: 13 PID: 833 Comm: CPU 32/KVM Kdump: loaded Not tainted 5.1.0-rc1+ #16
 Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.58 10/24/2018
 Call trace:
  dump_backtrace+0x0/0x190
  show_stack+0x24/0x30
  dump_stack+0xc8/0x114
  ubsan_epilogue+0x14/0x50
  __ubsan_handle_shift_out_of_bounds+0x118/0x188
  kvm_vgic_vcpu_init+0x1d4/0x200
  kvm_arch_vcpu_init+0x3c/0x48
  kvm_vcpu_init+0xa8/0x100
  kvm_arch_vcpu_create+0x94/0x120
  kvm_vm_ioctl+0x57c/0xe58
  do_vfs_ioctl+0xc4/0x7f0
  ksys_ioctl+0x8c/0xa0
  __arm64_sys_ioctl+0x28/0x38
  el0_svc_common+0xa0/0x190
  el0_svc_handler+0x38/0x78
  el0_svc+0x8/0xc
 


This patch Restricts setting irq->targets in GICv2, which only supports
a maximum of eight PEs, to keep UBSAN quiet. And since irq->mpidr will
only be used by SPI in GICv3, we decided to set mpidr to 0 for SGI and
PPI.

Like commit ab2d5eb03dbb ("KVM: arm/arm64: vgic: Always initialize the
group of private IRQs"), we should also take the creating order of the
VGIC and VCPUs into consideration.

Cc: Eric Auger 
Cc: Marc Zyngier 
Cc: Andre Przywara 
Cc: Christoffer Dall 
Signed-off-by: Zenghui Yu 
---
 virt/kvm/arm/vgic/vgic-init.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/arm/vgic/vgic-init.c b/virt/kvm/arm/vgic/vgic-init.c
index 3bdb31e..0cba92e 100644
--- a/virt/kvm/arm/vgic/vgic-init.c
+++ b/virt/kvm/arm/vgic/vgic-init.c
@@ -220,7 +220,6 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
irq->intid = i;
irq->vcpu = NULL;
irq->target_vcpu = vcpu;
-   irq->targets = 1U << vcpu->vcpu_id;
kref_init(>refcount);
if (vgic_irq_is_sgi(i)) {
/* SGIs */
@@ -231,10 +230,14 @@ int kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu)
irq->config = VGIC_CONFIG_LEVEL;
}
 
-   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
irq->group = 1;
-   else
+   irq->mpidr = 0;
+   } else {
irq->group = 0;
+   if (vcpu->vcpu_id < VGIC_V2_MAX_CPUS)
+   irq->targets = 1U << vcpu->vcpu_id;
+   }
}
 
if (!irqchip_in_kernel(vcpu->kvm))
@@ -297,10 +300,13 @@ int vgic_init(struct kvm *kvm)
 
for (i = 0; i < VGIC_NR_PRIVATE_IRQS; i++) {
struct vgic_irq *irq = _cpu->private_irqs[i];
-   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
+   if (dist->vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) {
irq->group = 1;
-   else
+   irq->mpidr = 0;
+   } else {
irq->group = 0;
+   irq->targets = 1U << vcpu->vcpu_id;
+   }
}
}
 
-- 
1.8.3.1


___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3 2/3] vfio-pci: Add new function for INTx one-time initialisation

2019-04-04 Thread Jean-Philippe Brucker
On 26/03/2019 07:41, Leo Yan wrote:
> To support INTx enabling for multiple times, we need firstly to extract
> one-time initialisation and move the related code into a new function
> vfio_pci_init_intx(); if later disable and re-enable the INTx, we can
> skip these one-time operations.
> 
> This patch move below three main operations for INTx one-time
> initialisation from function vfio_pci_enable_intx() into function
> vfio_pci_init_intx():
> 
> - Reserve 2 FDs for INTx;
> - Sanity check with ioctl VFIO_DEVICE_GET_IRQ_INFO;
> - Setup pdev->intx_gsi.
> 
> Suggested-by: Jean-Philippe Brucker 
> Signed-off-by: Leo Yan 

Thanks for the patches

Reviewed-by: Jean-Philippe Brucker 

> ---
>  vfio/pci.c | 67 --
>  1 file changed, 40 insertions(+), 27 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 5224fee..3c39844 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -1018,30 +1018,7 @@ static int vfio_pci_enable_intx(struct kvm *kvm, 
> struct vfio_device *vdev)
>   struct vfio_irq_eventfd trigger;
>   struct vfio_irq_eventfd unmask;
>   struct vfio_pci_device *pdev = >pci;
> - int gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
> -
> - struct vfio_irq_info irq_info = {
> - .argsz = sizeof(irq_info),
> - .index = VFIO_PCI_INTX_IRQ_INDEX,
> - };
> -
> - vfio_pci_reserve_irq_fds(2);
> -
> - ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, _info);
> - if (ret || irq_info.count == 0) {
> - vfio_dev_err(vdev, "no INTx reported by VFIO");
> - return -ENODEV;
> - }
> -
> - if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
> - vfio_dev_err(vdev, "interrupt not eventfd capable");
> - return -EINVAL;
> - }
> -
> - if (!(irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
> - vfio_dev_err(vdev, "INTx interrupt not AUTOMASKED");
> - return -EINVAL;
> - }
> + int gsi = pdev->intx_gsi;
>  
>   /*
>* PCI IRQ is level-triggered, so we use two eventfds. trigger_fd
> @@ -1097,8 +1074,6 @@ static int vfio_pci_enable_intx(struct kvm *kvm, struct 
> vfio_device *vdev)
>  
>   pdev->intx_fd = trigger_fd;
>   pdev->unmask_fd = unmask_fd;
> - /* Guest is going to ovewrite our irq_line... */
> - pdev->intx_gsi = gsi;
>  
>   return 0;
>  
> @@ -1117,6 +1092,39 @@ err_close:
>   return ret;
>  }
>  
> +static int vfio_pci_init_intx(struct kvm *kvm, struct vfio_device *vdev)
> +{
> + int ret;
> + struct vfio_pci_device *pdev = >pci;
> + struct vfio_irq_info irq_info = {
> + .argsz = sizeof(irq_info),
> + .index = VFIO_PCI_INTX_IRQ_INDEX,
> + };
> +
> + vfio_pci_reserve_irq_fds(2);
> +
> + ret = ioctl(vdev->fd, VFIO_DEVICE_GET_IRQ_INFO, _info);
> + if (ret || irq_info.count == 0) {
> + vfio_dev_err(vdev, "no INTx reported by VFIO");
> + return -ENODEV;
> + }
> +
> + if (!(irq_info.flags & VFIO_IRQ_INFO_EVENTFD)) {
> + vfio_dev_err(vdev, "interrupt not eventfd capable");
> + return -EINVAL;
> + }
> +
> + if (!(irq_info.flags & VFIO_IRQ_INFO_AUTOMASKED)) {
> + vfio_dev_err(vdev, "INTx interrupt not AUTOMASKED");
> + return -EINVAL;
> + }
> +
> + /* Guest is going to ovewrite our irq_line... */
> + pdev->intx_gsi = pdev->hdr.irq_line - KVM_IRQ_OFFSET;
> +
> + return 0;
> +}
> +
>  static int vfio_pci_configure_dev_irqs(struct kvm *kvm, struct vfio_device 
> *vdev)
>  {
>   int ret = 0;
> @@ -1142,8 +1150,13 @@ static int vfio_pci_configure_dev_irqs(struct kvm 
> *kvm, struct vfio_device *vdev
>   return ret;
>   }
>  
> - if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> + if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
> + ret = vfio_pci_init_intx(kvm, vdev);
> + if (ret)
> + return ret;
> +
>   ret = vfio_pci_enable_intx(kvm, vdev);
> + }
>  
>   return ret;
>  }
> 

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool v3 3/3] vfio-pci: Re-enable INTx mode when disable MSI/MSIX

2019-04-04 Thread Jean-Philippe Brucker
On 26/03/2019 07:41, Leo Yan wrote:
> Since PCI forbids enabling INTx, MSI or MSIX at the same time, it's by
> default to disable INTx mode when enable MSI/MSIX mode; but this logic is
> easily broken if the guest PCI driver detects the MSI/MSIX cannot work as
> expected and tries to rollback to use INTx mode.  In this case, the INTx
> mode has been disabled and has no chance to re-enable it, thus both INTx
> mode and MSI/MSIX mode cannot work in vfio.
> 
> Below shows the detailed flow for introducing this issue:
> 
>   vfio_pci_configure_dev_irqs()
> `-> vfio_pci_enable_intx()
> 
>   vfio_pci_enable_msis()
> `-> vfio_pci_disable_intx()
> 
>   vfio_pci_disable_msis()   => Guest PCI driver disables MSI
> 
> To fix this issue, when disable MSI/MSIX we need to check if INTx mode
> is available for this device or not; if the device can support INTx then
> re-enable it so that the device can fallback to use it.
> 
> Since vfio_pci_disable_intx() / vfio_pci_enable_intx() pair functions
> may be called for multiple times, this patch uses 'intx_fd == -1' to
> denote the INTx is disabled, the pair functions can directly bail out
> when detect INTx has been disabled and enabled respectively.
> 
> Suggested-by: Jean-Philippe Brucker 
> Signed-off-by: Leo Yan 
> ---
>  vfio/pci.c | 41 ++---
>  1 file changed, 30 insertions(+), 11 deletions(-)
> 
> diff --git a/vfio/pci.c b/vfio/pci.c
> index 3c39844..3b2b1e7 100644
> --- a/vfio/pci.c
> +++ b/vfio/pci.c
> @@ -28,6 +28,7 @@ struct vfio_irq_eventfd {
>   msi_update_state(state, val, VFIO_PCI_MSI_STATE_EMPTY)
>  
>  static void vfio_pci_disable_intx(struct kvm *kvm, struct vfio_device *vdev);
> +static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev);
>  
>  static int vfio_pci_enable_msis(struct kvm *kvm, struct vfio_device *vdev,
>   bool msix)
> @@ -50,17 +51,14 @@ static int vfio_pci_enable_msis(struct kvm *kvm, struct 
> vfio_device *vdev,
>   if (!msi_is_enabled(msis->virt_state))
>   return 0;
>  
> - if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX) {
> - /*
> -  * PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> -  * time. Since INTx has to be enabled from the start (we don't
> -  * have a reliable way to know when the user starts using it),
> -  * disable it now.
> -  */
> + /*
> +  * PCI (and VFIO) forbids enabling INTx, MSI or MSIX at the same
> +  * time. Since INTx has to be enabled from the start (after enabling
> +  * 'pdev->intx_fd' will be assigned to an eventfd and doesn't equal
> +  * to the init value -1), disable it now.
> +  */

I don't think the comment change is useful, we don't need that much
detail. The text that you replaced was trying to explain why we enable
INTx from the start, and would still apply (although it should have been
s/user/guest/)

> + if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
>   vfio_pci_disable_intx(kvm, vdev);
> - /* Permanently disable INTx */
> - pdev->irq_modes &= ~VFIO_PCI_IRQ_MODE_INTX;
> - }
>  
>   eventfds = (void *)msis->irq_set + sizeof(struct vfio_irq_set);
>  
> @@ -162,7 +160,16 @@ static int vfio_pci_disable_msis(struct kvm *kvm, struct 
> vfio_device *vdev,
>   msi_set_enabled(msis->phys_state, false);
>   msi_set_empty(msis->phys_state, true);
>  
> - return 0;
> + /*
> +  * When MSI or MSIX is disabled, this might be called when
> +  * PCI driver detects the MSI interrupt failure and wants to
> +  * rollback to INTx mode.  Thus enable INTx if the device
> +  * supports INTx mode in this case.
> +  */
> + if (pdev->irq_modes & VFIO_PCI_IRQ_MODE_INTX)
> + ret = vfio_pci_enable_intx(kvm, vdev);
> +
> + return ret >= 0 ? 0 : ret;
>  }
>  
>  static int vfio_pci_update_msi_entry(struct kvm *kvm, struct vfio_device 
> *vdev,
> @@ -1002,6 +1009,10 @@ static void vfio_pci_disable_intx(struct kvm *kvm, 
> struct vfio_device *vdev)
>   .index  = VFIO_PCI_INTX_IRQ_INDEX,
>   };
>  
> + /* INTx mode has been disabled */

Here as well, the comments on intx_fd seem unnecessary. But these are
only nits, the code is fine and I tested for regressions on my hardware, so:

Reviewed-by: Jean-Philippe Brucker 

> + if (pdev->intx_fd == -1)
> + return;
> +
>   pr_debug("user requested MSI, disabling INTx %d", gsi);
>  
>   ioctl(vdev->fd, VFIO_DEVICE_SET_IRQS, _set);
> @@ -1009,6 +1020,7 @@ static void vfio_pci_disable_intx(struct kvm *kvm, 
> struct vfio_device *vdev)
>  
>   close(pdev->intx_fd);
>   close(pdev->unmask_fd);
> + pdev->intx_fd = -1;
>  }
>  
>  static int vfio_pci_enable_intx(struct kvm *kvm, struct vfio_device *vdev)
> @@ -1020,6 +1032,10 @@ static int vfio_pci_enable_intx(struct kvm *kvm, 
> struct vfio_device *vdev)
>   

Re: [RFC] Question about enable doorbell irq and halt_poll process

2019-04-04 Thread Marc Zyngier
On Thu, 04 Apr 2019 11:07:59 +0100,
"Tangnianyao (ICT)"  wrote:
> 
> 
> 
> On 2019/3/30 17:43, Marc Zyngier wrote:
> 
> Hi, Marc
> 
> > On Sat, 30 Mar 2019 08:42:58 +,
> > "Tangnianyao (ICT)"  wrote:
> >>
> >> Hi, Marc
> >>
> >> On 2019/3/21 1:02, Marc Zyngier Wrote:
> >>> On Tue, 19 Mar 2019 21:25:47 +0800
> >>> "Tangnianyao (ICT)"  wrote:
> >>>
>  Hi, all
> 
>  Using gicv4, when guest is waiting for irq, it sends wfi and traps to 
>  kvm.
>  When vlpi is forwarded to PE after its_vpe_deschedule, before halt_poll 
>  in
>  kvm_vcpu_block, halt_poll may increase latency for this vlpi getting to 
>  guest.
>  In halt_poll process, it checks if there's pending irq for vcpu using 
>  pending_last.
>  However, doorbell is not enable at this moment and vlpi or doorbell can 
>  not set
>  pending_last true, to stop halt_poll. It will run until halt_poll time 
>  ends, if
>  there's no other physical irq coming in the meantime. And then vcpu is 
>  scheduled out.
>  This pending vlpi has to wait for vcpu getting schedule in next time.
> 
>  Should we enable doorbell before halt_poll process ?
> >>>
> >>> Enabling doorbells can be quite expensive. Depending on the HW, this is
> >>> either:
> >>>
> >>> - a write to memory (+DSB, potential cache maintenance), a write to the
> >>>   INVLPI register, and a poll of the SYNC register
> >>> - a write to memory (+DSB, potential cache maintenance), potentially
> >>>   a string of DISCARD+SYNC+MAPI+SYNC commands, and an INV+SYNC command
> >>>
> >> I have tested average cost of kvm_vgic_v4_enable_doorbell in our machine.
> >> When gic_rdists->has_direct_lpi is 1, it costs 0.35 us.
> >> When gic_rdists->has_direct_lpi is 0, it costs 1.4 us.
> > 
> > This looks pretty low. Which HW is that on? How about on something
> > like D05?
> 
> I tested it on D06.
> D05 doesn't not support gicv4 and I haven't tested on D05.

I'm afraid you're mistaken. D05 does have GICv4. I have a pretty good
idea it does because that's the system I used to write the SW
support. So please dig one out of the cupboard and test on it. I'm
pretty sure you're at the right place to do so.

> 
> > 
> >> Compared to default halt_poll_ns, 50ns, enabling doorbells is not
> >> large at all.
> > 
> > Sure. But I'm not sure this is a universal figure.
> > 
> >>
> >>> Frankly, you want to be careful with that. I'd rather enable them late
> >>> and have a chance of not blocking because of another (virtual)
> >>> interrupt, which saves us the doorbell business.
> >>>
> >>> I wonder if you wouldn't be in a better position by drastically
> >>> reducing halt_poll_ns for vcpu that can have directly injected
> >>> interrupts.
> >>>
> >>
> >> If we set halt_poll_ns to a small value, the chance of
> >> not blocking and vcpu scheduled out becomes larger. The cost
> >> of vcpu scheduled out is quite expensive.
> >> In many cases, one pcpu is assigned to only
> >> one vcpu, and halt_poll_ns is set quite large, to increase
> >> chance of halt_poll process got terminated. This is the
> >> reason we want to set halt_poll_ns large. And We hope vlpi
> >> stop halt_poll process in time.
> > 
> > Fair enough. It is certainly realistic to strictly partition the
> > system when GICv4 is in use, so I can see some potential benefit.
> > 
> >> Though it will check whether vcpu is runnable again by
> >> kvm_vcpu_check_block. Vlpi can prevent scheduling vcpu out.
> >> However it's somewhat later if halt_poll_ns is larger.
> >>
> >>> In any case, this is something that we should measure, not guess.
> > 
> > Please post results of realistic benchmarks that we can reproduce,
> > with and without this change. I'm willing to entertain the idea, but I
> > need more than just a vague number.
> > 
> > Thanks,
> > 
> > M.
> > 
> 
> I tested it with and without change (patch attached in the last).
> halt_poll_ns is keep default, 50ns.
> I have merged the patch "arm64: KVM: Always set ICH_HCR_EL2.EN if GICv4 is 
> enabled"
> to my test kernel, to make sure ICH_HCR_EL2.EN=1 in guest.
> 
> netperf result:
> D06 as server, intel 8180 server as client
> with change:
> package 512 bytes - 5400 Mbits/s
> package 64 bytes - 740 Mbits/s
> without change:
> package 512 bytes - 5000 Mbits/s
> package 64 bytes - 710 Mbits/s
> 
> Also I have tested D06 as client, intel machine as server,
> with the change, the result remains the same.

So anywhere between 4% and 8% improvement. Please post results for D05
once you've found one.

> 
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 55fe8e2..0f56904 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2256,6 +2256,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
>   if (vcpu->halt_poll_ns) {
>   ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);
> 
> +#ifdef CONFIG_ARM64
> + /*
> +  * When using gicv4, enable doorbell before halt pool wait
> 

Re: [RFC] Question about enable doorbell irq and halt_poll process

2019-04-04 Thread Tangnianyao (ICT)



On 2019/3/30 17:43, Marc Zyngier wrote:

Hi, Marc

> On Sat, 30 Mar 2019 08:42:58 +,
> "Tangnianyao (ICT)"  wrote:
>>
>> Hi, Marc
>>
>> On 2019/3/21 1:02, Marc Zyngier Wrote:
>>> On Tue, 19 Mar 2019 21:25:47 +0800
>>> "Tangnianyao (ICT)"  wrote:
>>>
 Hi, all

 Using gicv4, when guest is waiting for irq, it sends wfi and traps to kvm.
 When vlpi is forwarded to PE after its_vpe_deschedule, before halt_poll in
 kvm_vcpu_block, halt_poll may increase latency for this vlpi getting to 
 guest.
 In halt_poll process, it checks if there's pending irq for vcpu using 
 pending_last.
 However, doorbell is not enable at this moment and vlpi or doorbell can 
 not set
 pending_last true, to stop halt_poll. It will run until halt_poll time 
 ends, if
 there's no other physical irq coming in the meantime. And then vcpu is 
 scheduled out.
 This pending vlpi has to wait for vcpu getting schedule in next time.

 Should we enable doorbell before halt_poll process ?
>>>
>>> Enabling doorbells can be quite expensive. Depending on the HW, this is
>>> either:
>>>
>>> - a write to memory (+DSB, potential cache maintenance), a write to the
>>>   INVLPI register, and a poll of the SYNC register
>>> - a write to memory (+DSB, potential cache maintenance), potentially
>>>   a string of DISCARD+SYNC+MAPI+SYNC commands, and an INV+SYNC command
>>>
>> I have tested average cost of kvm_vgic_v4_enable_doorbell in our machine.
>> When gic_rdists->has_direct_lpi is 1, it costs 0.35 us.
>> When gic_rdists->has_direct_lpi is 0, it costs 1.4 us.
> 
> This looks pretty low. Which HW is that on? How about on something
> like D05?

I tested it on D06.
D05 doesn't not support gicv4 and I haven't tested on D05.

> 
>> Compared to default halt_poll_ns, 50ns, enabling doorbells is not
>> large at all.
> 
> Sure. But I'm not sure this is a universal figure.
> 
>>
>>> Frankly, you want to be careful with that. I'd rather enable them late
>>> and have a chance of not blocking because of another (virtual)
>>> interrupt, which saves us the doorbell business.
>>>
>>> I wonder if you wouldn't be in a better position by drastically
>>> reducing halt_poll_ns for vcpu that can have directly injected
>>> interrupts.
>>>
>>
>> If we set halt_poll_ns to a small value, the chance of
>> not blocking and vcpu scheduled out becomes larger. The cost
>> of vcpu scheduled out is quite expensive.
>> In many cases, one pcpu is assigned to only
>> one vcpu, and halt_poll_ns is set quite large, to increase
>> chance of halt_poll process got terminated. This is the
>> reason we want to set halt_poll_ns large. And We hope vlpi
>> stop halt_poll process in time.
> 
> Fair enough. It is certainly realistic to strictly partition the
> system when GICv4 is in use, so I can see some potential benefit.
> 
>> Though it will check whether vcpu is runnable again by
>> kvm_vcpu_check_block. Vlpi can prevent scheduling vcpu out.
>> However it's somewhat later if halt_poll_ns is larger.
>>
>>> In any case, this is something that we should measure, not guess.
> 
> Please post results of realistic benchmarks that we can reproduce,
> with and without this change. I'm willing to entertain the idea, but I
> need more than just a vague number.
> 
> Thanks,
> 
>   M.
> 

I tested it with and without change (patch attached in the last).
halt_poll_ns is keep default, 50ns.
I have merged the patch "arm64: KVM: Always set ICH_HCR_EL2.EN if GICv4 is 
enabled"
to my test kernel, to make sure ICH_HCR_EL2.EN=1 in guest.

netperf result:
D06 as server, intel 8180 server as client
with change:
package 512 bytes - 5400 Mbits/s
package 64 bytes - 740 Mbits/s
without change:
package 512 bytes - 5000 Mbits/s
package 64 bytes - 710 Mbits/s

Also I have tested D06 as client, intel machine as server,
with the change, the result remains the same.


diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 55fe8e2..0f56904 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -2256,6 +2256,16 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu)
if (vcpu->halt_poll_ns) {
ktime_t stop = ktime_add_ns(ktime_get(), vcpu->halt_poll_ns);

+#ifdef CONFIG_ARM64
+   /*
+* When using gicv4, enable doorbell before halt pool wait
+* so that direct vlpi can stop halt poll.
+*/
+   if (vcpu->arch.vgic_cpu.vgic_v3.its_vpe.its_vm) {
+   kvm_vgic_v4_enable_doorbell(vcpu);
+   }
+#endif
+
++vcpu->stat.halt_attempted_poll;
do {
/*


Thanks,
Nianyao Tang




___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 26/27] KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG

2019-04-04 Thread Andrew Jones
On Thu, Apr 04, 2019 at 09:38:49AM +, Dave P Martin wrote:
> On Thu, Apr 04, 2019 at 10:34:15AM +0100, Andrew Jones wrote:
> > On Thu, Apr 04, 2019 at 09:35:26AM +0100, Dave Martin wrote:
> > > On Wed, Apr 03, 2019 at 10:27:46PM +0200, Andrew Jones wrote:
> > > > On Fri, Mar 29, 2019 at 01:00:51PM +, Dave Martin wrote:
> > > > > KVM_GET_ONE_REG and KVM_SET_ONE_REG return some error codes that
> > > > > are not documented (but hopefully not surprising either).  To give
> > > > > an indication of what these may mean, this patch adds brief
> > > > > documentation.
> > > > >
> > > > > Signed-off-by: Dave Martin 
> > > > > ---
> > > > >  Documentation/virtual/kvm/api.txt | 6 ++
> > > > >  1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > > b/Documentation/virtual/kvm/api.txt
> > > > > index 2d4f7ce..cd920dd 100644
> > > > > --- a/Documentation/virtual/kvm/api.txt
> > > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > > @@ -1871,6 +1871,9 @@ Architectures: all
> > > > >  Type: vcpu ioctl
> > > > >  Parameters: struct kvm_one_reg (in)
> > > > >  Returns: 0 on success, negative value on failure
> > > > > +Errors:
> > > > > +  ENOENT:   no such register
> > > > > +  EINVAL:   other errors, such as bad size encoding for a known 
> > > > > register
> > > > >
> > > > >  struct kvm_one_reg {
> > > > > __u64 id;
> > > > > @@ -2192,6 +2195,9 @@ Architectures: all
> > > > >  Type: vcpu ioctl
> > > > >  Parameters: struct kvm_one_reg (in and out)
> > > > >  Returns: 0 on success, negative value on failure
> > > > > +Errors:
> > > > > +  ENOENT:   no such register
> > > > > +  EINVAL:   other errors, such as bad size encoding for a known 
> > > > > register
> > > > >
> > > > >  This ioctl allows to receive the value of a single register 
> > > > > implemented
> > > > >  in a vcpu. The register to read is indicated by the "id" field of the
> > > > > --
> > > > > 2.1.4
> > > > >
> > > >
> > > > Are we sure all architectures have these, and only these errors? A quick
> > > > grep indicates not. I'm not sure we can document this easily here due to
> > > > it addressing all architectures at once. Maybe we could add 
> > > > arch-specific
> > > > subsections, but I'm not sure it's worth it.
> > >
> > > Error codes are generally indicative at best, and rarely mutually
> > > exclusive in a given situation.
> > >
> > > Writing a caveat to say that you shouldn't make assumptions about
> > > precisely what error code will be returned in a given situation would
> > > look odd: that really applies widely all over the kernel/user ABI, not
> > > just here.
> > >
> > > _Maybe_ everything is exhaustively correct in this file already, but
> > > given the size of it, and the fact that many things are implemented
> > > per-arch, the chance of this seems zero.
> > >
> > > I could add "invalid register ID" for EINVAL.  This allows some
> > > ambiguity about which error code applies for a nonexistent register.
> > >
> > > Alternatively, we could revert this documentation instead.
> >
> > Yeah, I vote we just drop this patch. It could add more confusion
> > when people start here to determine what to grep and then, when
> > grepping for it, can't find any occurrences, as would be the case
> > for ENOENT on at least a couple arches.
> 
> I guess so.  It's not like we're claiming this ioctl _can't_ fail.
> So leaving interpretation of the errno values to common sense is
> reasonable.
> 
> > > It seemed appropriate to document the EPERM error for finalization
> > > ordering (added later), but since correct userspace code should never
> > > see this maybe it's reasonable to leave it undocumented(?)
> 
> I assume you mean to remove this too?

I haven't reviewed that stuff yet, so I don't yet have an opinion :)
I'll get through it all today, but I'm trashing on mail and bugzilla
this morning, so progress on the review hasn't started back up since
last night yet...

drew

> 
> I don't think correctly written userspace should be relying on seeing
> this error code (as opposed to e.g., KVM_GET_REG_LIST's E2BIG which you
> must observe in order to allocate the right amount of memory for the reg
> list).
> 
> Cheers
> ---Dave
> IMPORTANT NOTICE: The contents of this email and any attachments are 
> confidential and may also be privileged. If you are not the intended 
> recipient, please notify the sender immediately and do not disclose the 
> contents to any other person, use it for any purpose, or store or copy the 
> information in any medium. Thank you.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 26/27] KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG

2019-04-04 Thread Dave P Martin
On Thu, Apr 04, 2019 at 10:34:15AM +0100, Andrew Jones wrote:
> On Thu, Apr 04, 2019 at 09:35:26AM +0100, Dave Martin wrote:
> > On Wed, Apr 03, 2019 at 10:27:46PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:51PM +, Dave Martin wrote:
> > > > KVM_GET_ONE_REG and KVM_SET_ONE_REG return some error codes that
> > > > are not documented (but hopefully not surprising either).  To give
> > > > an indication of what these may mean, this patch adds brief
> > > > documentation.
> > > >
> > > > Signed-off-by: Dave Martin 
> > > > ---
> > > >  Documentation/virtual/kvm/api.txt | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > > b/Documentation/virtual/kvm/api.txt
> > > > index 2d4f7ce..cd920dd 100644
> > > > --- a/Documentation/virtual/kvm/api.txt
> > > > +++ b/Documentation/virtual/kvm/api.txt
> > > > @@ -1871,6 +1871,9 @@ Architectures: all
> > > >  Type: vcpu ioctl
> > > >  Parameters: struct kvm_one_reg (in)
> > > >  Returns: 0 on success, negative value on failure
> > > > +Errors:
> > > > +  ENOENT:   no such register
> > > > +  EINVAL:   other errors, such as bad size encoding for a known 
> > > > register
> > > >
> > > >  struct kvm_one_reg {
> > > > __u64 id;
> > > > @@ -2192,6 +2195,9 @@ Architectures: all
> > > >  Type: vcpu ioctl
> > > >  Parameters: struct kvm_one_reg (in and out)
> > > >  Returns: 0 on success, negative value on failure
> > > > +Errors:
> > > > +  ENOENT:   no such register
> > > > +  EINVAL:   other errors, such as bad size encoding for a known 
> > > > register
> > > >
> > > >  This ioctl allows to receive the value of a single register implemented
> > > >  in a vcpu. The register to read is indicated by the "id" field of the
> > > > --
> > > > 2.1.4
> > > >
> > >
> > > Are we sure all architectures have these, and only these errors? A quick
> > > grep indicates not. I'm not sure we can document this easily here due to
> > > it addressing all architectures at once. Maybe we could add arch-specific
> > > subsections, but I'm not sure it's worth it.
> >
> > Error codes are generally indicative at best, and rarely mutually
> > exclusive in a given situation.
> >
> > Writing a caveat to say that you shouldn't make assumptions about
> > precisely what error code will be returned in a given situation would
> > look odd: that really applies widely all over the kernel/user ABI, not
> > just here.
> >
> > _Maybe_ everything is exhaustively correct in this file already, but
> > given the size of it, and the fact that many things are implemented
> > per-arch, the chance of this seems zero.
> >
> > I could add "invalid register ID" for EINVAL.  This allows some
> > ambiguity about which error code applies for a nonexistent register.
> >
> > Alternatively, we could revert this documentation instead.
>
> Yeah, I vote we just drop this patch. It could add more confusion
> when people start here to determine what to grep and then, when
> grepping for it, can't find any occurrences, as would be the case
> for ENOENT on at least a couple arches.

I guess so.  It's not like we're claiming this ioctl _can't_ fail.
So leaving interpretation of the errno values to common sense is
reasonable.

> > It seemed appropriate to document the EPERM error for finalization
> > ordering (added later), but since correct userspace code should never
> > see this maybe it's reasonable to leave it undocumented(?)

I assume you mean to remove this too?

I don't think correctly written userspace should be relying on seeing
this error code (as opposed to e.g., KVM_GET_REG_LIST's E2BIG which you
must observe in order to allocate the right amount of memory for the reg
list).

Cheers
---Dave
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 26/27] KVM: Document errors for KVM_GET_ONE_REG and KVM_SET_ONE_REG

2019-04-04 Thread Andrew Jones
On Thu, Apr 04, 2019 at 09:35:26AM +0100, Dave Martin wrote:
> On Wed, Apr 03, 2019 at 10:27:46PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:51PM +, Dave Martin wrote:
> > > KVM_GET_ONE_REG and KVM_SET_ONE_REG return some error codes that
> > > are not documented (but hopefully not surprising either).  To give
> > > an indication of what these may mean, this patch adds brief
> > > documentation.
> > > 
> > > Signed-off-by: Dave Martin 
> > > ---
> > >  Documentation/virtual/kvm/api.txt | 6 ++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/Documentation/virtual/kvm/api.txt 
> > > b/Documentation/virtual/kvm/api.txt
> > > index 2d4f7ce..cd920dd 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1871,6 +1871,9 @@ Architectures: all
> > >  Type: vcpu ioctl
> > >  Parameters: struct kvm_one_reg (in)
> > >  Returns: 0 on success, negative value on failure
> > > +Errors:
> > > +  ENOENT:   no such register
> > > +  EINVAL:   other errors, such as bad size encoding for a known register
> > >  
> > >  struct kvm_one_reg {
> > > __u64 id;
> > > @@ -2192,6 +2195,9 @@ Architectures: all
> > >  Type: vcpu ioctl
> > >  Parameters: struct kvm_one_reg (in and out)
> > >  Returns: 0 on success, negative value on failure
> > > +Errors:
> > > +  ENOENT:   no such register
> > > +  EINVAL:   other errors, such as bad size encoding for a known register
> > >  
> > >  This ioctl allows to receive the value of a single register implemented
> > >  in a vcpu. The register to read is indicated by the "id" field of the
> > > -- 
> > > 2.1.4
> > >
> > 
> > Are we sure all architectures have these, and only these errors? A quick
> > grep indicates not. I'm not sure we can document this easily here due to
> > it addressing all architectures at once. Maybe we could add arch-specific
> > subsections, but I'm not sure it's worth it.
> 
> Error codes are generally indicative at best, and rarely mutually
> exclusive in a given situation.
> 
> Writing a caveat to say that you shouldn't make assumptions about
> precisely what error code will be returned in a given situation would
> look odd: that really applies widely all over the kernel/user ABI, not
> just here.
> 
> _Maybe_ everything is exhaustively correct in this file already, but
> given the size of it, and the fact that many things are implemented
> per-arch, the chance of this seems zero.
> 
> I could add "invalid register ID" for EINVAL.  This allows some
> ambiguity about which error code applies for a nonexistent register.
> 
> Alternatively, we could revert this documentation instead.

Yeah, I vote we just drop this patch. It could add more confusion
when people start here to determine what to grep and then, when
grepping for it, can't find any occurrences, as would be the case
for ENOENT on at least a couple arches.

Thanks,
drew

> 
> It seemed appropriate to document the EPERM error for finalization
> ordering (added later), but since correct userspace code should never
> see this maybe it's reasonable to leave it undocumented(?)
> 
> Cheers
> ---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 12/27] KVM: arm64/sve: System register context switch and access support

2019-04-04 Thread Andrew Jones
On Fri, Mar 29, 2019 at 01:00:37PM +, Dave Martin wrote:
> This patch adds the necessary support for context switching ZCR_EL1
> for each vcpu.
> 
> ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
> sense for it to be handled as part of the guest FPSIMD/SVE context
> for context switch purposes instead of handling it as a general
> system register.  This means that it can be switched in lazily at
> the appropriate time.  No effort is made to track host context for
> this register, since SVE requires VHE: thus the hosts's value for
> this register lives permanently in ZCR_EL2 and does not alias the
> guest's value at any time.
> 
> The Hyp switch and fpsimd context handling code is extended
> appropriately.
> 
> Accessors are added in sys_regs.c to expose the SVE system
> registers and ID register fields.  Because these need to be
> conditionally visible based on the guest configuration, they are
> implemented separately for now rather than by use of the generic
> system register helpers.  This may be abstracted better later on
> when/if there are more features requiring this model.
> 
> ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
> guest, but for compatibility with non-SVE aware KVM implementations
> the register should not be enumerated at all for KVM_GET_REG_LIST
> in this case.  For consistency we also reject ioctl access to the
> register.  This ensures that a non-SVE-enabled guest looks the same
> to userspace, irrespective of whether the kernel KVM implementation
> supports SVE.
> 
> Signed-off-by: Dave Martin 
> Reviewed-by: Julien Thierry 
> Tested-by: zhang.lei 
> 
> ---
> 
> Changes since v5:
> 
>  * Port to the renamed visibility() framework.
> 
>  * Swap visiblity() helpers so that they appear by the relevant accessor
>functions.
> 
>  * [Julien Grall] With the visibility() checks, {get,set}_zcr_el1()
>degenerate to doing exactly what the common code does, so drop them.
> 
>The ID_AA64ZFR0_EL1 handlers are still needed to provide contitional
>RAZ behaviour.  This could be moved to the common code too, but since
>this is a one-off case I don't do this for now.  We can address this
>later if other regs need to follow the same pattern.
> 
>  * [Julien Thierry] Reset ZCR_EL1 to a fixed value using reset_val
>instead of using relying on reset_unknown() honouring set bits in val
>as RES0.
> 
>Most of the bits in ZCR_EL1 are RES0 anyway, and many implementations
>of SVE will support larger vectors than 128 bits, so 0 seems as good
>a value as any to expose guests that forget to initialise this
>register properly.
> ---
>  arch/arm64/include/asm/kvm_host.h |  1 +
>  arch/arm64/include/asm/sysreg.h   |  3 ++
>  arch/arm64/kvm/fpsimd.c   |  9 -
>  arch/arm64/kvm/hyp/switch.c   |  3 ++
>  arch/arm64/kvm/sys_regs.c | 83 
> ---
>  5 files changed, 93 insertions(+), 6 deletions(-)
>

Reviewed-by: Andrew Jones 
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 12/27] KVM: arm64/sve: System register context switch and access support

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 10:32:18AM +0200, Andrew Jones wrote:
> On Thu, Apr 04, 2019 at 09:06:58AM +0100, Dave Martin wrote:
> > On Wed, Apr 03, 2019 at 09:39:43PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:37PM +, Dave Martin wrote:

[...]

> > > > +static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
> > > > +   const struct sys_reg_desc *rd,
> > > > +   const struct kvm_one_reg *reg, void __user *uaddr)
> > > > +{
> > > > +   u64 val;
> > > > +
> > > > +   if (!vcpu_has_sve(vcpu))
> > > > +   return -ENOENT;
> > > 
> > > This shouldn't be necessary. The visibility check in
> > > kvm_arm_sys_reg_get_reg already covers it.
> > > 
> > > > +
> > > > +   val = guest_id_aa64zfr0_el1(vcpu);
> > > > +   return reg_to_user(uaddr, , reg->id);
> > > > +}
> > > > +
> > > > +static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
> > > > +   const struct sys_reg_desc *rd,
> > > > +   const struct kvm_one_reg *reg, void __user *uaddr)
> > > > +{
> > > > +   const u64 id = sys_reg_to_index(rd);
> > > > +   int err;
> > > > +   u64 val;
> > > > +
> > > > +   if (!vcpu_has_sve(vcpu))
> > > > +   return -ENOENT;
> > > 
> > > Also not necessary.
> > 
> > Hmm, true.  Because the logic is a bit spread out I felt uneasy with
> > simply deleting these checks, but if they fire, something has
> > definitely gone wrong elsewhere.
> > 
> > In its current form the code makes it look like it could be legitimate
> > to get here with !vcpu_has_sve(vcpu), which is misleading.
> > 
> > What if we demote these to WARN_ON()?  This isn't a fast path.
> 
> A WARN_ON sounds good to me.

OK.  Are you happy to give your Reviewed-by on the following?

--8<--

>From 88aeb182ba787102e627e33728a08b70f467758c Mon Sep 17 00:00:00 2001
From: Dave Martin 
Date: Thu, 4 Apr 2019 09:41:46 +0100
Subject: [PATCH] KVM: arm64/sve: Demote redundant sysreg vcpu_has_sve() checks
 to WARNs

Because of the logic in kvm_arm_sys_reg_{get,set}_reg() and
sve_id_visibility(), we should never call
{get,set}_id_aa64zfr0_el1() for a vcpu where !vcpu_has_sve(vcpu).

To avoid the code giving the impression that it is valid for these
functions to be called in this situation, and to help the compiler
make the right optimisation decisions, this patch adds WARN_ON()
for these cases.

Given the way the logic is spread out, this seems preferable to
dropping the checks altogether.

Signed-off-by: Dave Martin 
---
 arch/arm64/kvm/sys_regs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
index 09e9b06..7046c76 100644
--- a/arch/arm64/kvm/sys_regs.c
+++ b/arch/arm64/kvm/sys_regs.c
@@ -1144,7 +1144,7 @@ static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
 {
u64 val;
 
-   if (!vcpu_has_sve(vcpu))
+   if (WARN_ON(!vcpu_has_sve(vcpu)))
return -ENOENT;
 
val = guest_id_aa64zfr0_el1(vcpu);
@@ -1159,7 +1159,7 @@ static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
int err;
u64 val;
 
-   if (!vcpu_has_sve(vcpu))
+   if (WARN_ON(!vcpu_has_sve(vcpu)))
return -ENOENT;
 
err = reg_from_user(, uaddr, id);
-- 
2.1.4

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 13/27] KVM: arm64/sve: Context switch the SVE registers

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 10:35:02AM +0200, Andrew Jones wrote:
> On Thu, Apr 04, 2019 at 09:10:08AM +0100, Dave Martin wrote:
> > On Wed, Apr 03, 2019 at 10:01:45PM +0200, Andrew Jones wrote:
> > > On Fri, Mar 29, 2019 at 01:00:38PM +, Dave Martin wrote:
> > > > In order to give each vcpu its own view of the SVE registers, this
> > > > patch adds context storage via a new sve_state pointer in struct
> > > > vcpu_arch.  An additional member sve_max_vl is also added for each
> > > > vcpu, to determine the maximum vector length visible to the guest
> > > > and thus the value to be configured in ZCR_EL2.LEN while the vcpu
> > > > is active.  This also determines the layout and size of the storage
> > > > in sve_state, which is read and written by the same backend
> > > > functions that are used for context-switching the SVE state for
> > > > host tasks.
> > > > 
> > > > On SVE-enabled vcpus, SVE access traps are now handled by switching
> > > > in the vcpu's SVE context and disabling the trap before returning
> > > > to the guest.  On other vcpus, the trap is not handled and an exit
> > > > back to the host occurs, where the handle_sve() fallback path
> > > > reflects an undefined instruction exception back to the guest,
> > > > consistently with the behaviour of non-SVE-capable hardware (as was
> > > > done unconditionally prior to this patch).
> > > > 
> > > > No SVE handling is added on non-VHE-only paths, since VHE is an
> > > > architectural and Kconfig prerequisite of SVE.
> > > > 
> > > > Signed-off-by: Dave Martin 
> > > > Reviewed-by: Julien Thierry 
> > > > Tested-by: zhang.lei 
> > > > 
> > > > ---
> > > > 
> > > > Changes since v5:
> > > > 
> > > >  * [Julien Thierry, Julien Grall] Commit message typo fixes
> > > > 
> > > >  * [Mark Rutland] Rename trap_class to hsr_ec, for consistency with
> > > >existing code.
> > > > 
> > > >  * [Mark Rutland] Simplify condition for refusing to handle an
> > > >FPSIMD/SVE trap, using multiple if () statements for clarity.  The
> > > >previous condition was a bit tortuous, and how that the static_key
> > > >checks have been hoisted out, it makes little difference to the
> > > >compiler how we express the condition here.
> > > > ---
> > > >  arch/arm64/include/asm/kvm_host.h |  6 
> > > >  arch/arm64/kvm/fpsimd.c   |  5 +--
> > > >  arch/arm64/kvm/hyp/switch.c   | 75 
> > > > +--
> > > >  3 files changed, 66 insertions(+), 20 deletions(-)
> > > > 
> > > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > > b/arch/arm64/include/asm/kvm_host.h
> > > > index 22cf484..4fabfd2 100644
> > > > --- a/arch/arm64/include/asm/kvm_host.h
> > > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > > @@ -228,6 +228,8 @@ struct vcpu_reset_state {
> > > >  
> > > >  struct kvm_vcpu_arch {
> > > > struct kvm_cpu_context ctxt;
> > > > +   void *sve_state;
> > > > +   unsigned int sve_max_vl;
> > > >  
> > > > /* HYP configuration */
> > > > u64 hcr_el2;
> > > > @@ -323,6 +325,10 @@ struct kvm_vcpu_arch {
> > > > bool sysregs_loaded_on_cpu;
> > > >  };
> > > >  
> > > > +/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> > > > +#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) 
> > > > + \
> > > > + 
> > > > sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> > > 
> > > Maybe an inline function instead?
> > 
> > I tried, but that requires the definition of struct kvm_vcpu to be
> > visible.  I failed to get that here without circular #include problems,
> > and it looked tricky to fix.
> 
> Ah, OK
> 
> > 
> > Since this is a small bit of code which is unlikely to get used by
> > accident, I decided it was OK to keep it as a macro.
> > 
> > Can you see another way around this?
> 
> Nope

OK.  If someone eventually solves this, I'd be happy to change to an
inline function.

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 13/27] KVM: arm64/sve: Context switch the SVE registers

2019-04-04 Thread Andrew Jones
On Thu, Apr 04, 2019 at 09:10:08AM +0100, Dave Martin wrote:
> On Wed, Apr 03, 2019 at 10:01:45PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:38PM +, Dave Martin wrote:
> > > In order to give each vcpu its own view of the SVE registers, this
> > > patch adds context storage via a new sve_state pointer in struct
> > > vcpu_arch.  An additional member sve_max_vl is also added for each
> > > vcpu, to determine the maximum vector length visible to the guest
> > > and thus the value to be configured in ZCR_EL2.LEN while the vcpu
> > > is active.  This also determines the layout and size of the storage
> > > in sve_state, which is read and written by the same backend
> > > functions that are used for context-switching the SVE state for
> > > host tasks.
> > > 
> > > On SVE-enabled vcpus, SVE access traps are now handled by switching
> > > in the vcpu's SVE context and disabling the trap before returning
> > > to the guest.  On other vcpus, the trap is not handled and an exit
> > > back to the host occurs, where the handle_sve() fallback path
> > > reflects an undefined instruction exception back to the guest,
> > > consistently with the behaviour of non-SVE-capable hardware (as was
> > > done unconditionally prior to this patch).
> > > 
> > > No SVE handling is added on non-VHE-only paths, since VHE is an
> > > architectural and Kconfig prerequisite of SVE.
> > > 
> > > Signed-off-by: Dave Martin 
> > > Reviewed-by: Julien Thierry 
> > > Tested-by: zhang.lei 
> > > 
> > > ---
> > > 
> > > Changes since v5:
> > > 
> > >  * [Julien Thierry, Julien Grall] Commit message typo fixes
> > > 
> > >  * [Mark Rutland] Rename trap_class to hsr_ec, for consistency with
> > >existing code.
> > > 
> > >  * [Mark Rutland] Simplify condition for refusing to handle an
> > >FPSIMD/SVE trap, using multiple if () statements for clarity.  The
> > >previous condition was a bit tortuous, and how that the static_key
> > >checks have been hoisted out, it makes little difference to the
> > >compiler how we express the condition here.
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h |  6 
> > >  arch/arm64/kvm/fpsimd.c   |  5 +--
> > >  arch/arm64/kvm/hyp/switch.c   | 75 
> > > +--
> > >  3 files changed, 66 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 22cf484..4fabfd2 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -228,6 +228,8 @@ struct vcpu_reset_state {
> > >  
> > >  struct kvm_vcpu_arch {
> > >   struct kvm_cpu_context ctxt;
> > > + void *sve_state;
> > > + unsigned int sve_max_vl;
> > >  
> > >   /* HYP configuration */
> > >   u64 hcr_el2;
> > > @@ -323,6 +325,10 @@ struct kvm_vcpu_arch {
> > >   bool sysregs_loaded_on_cpu;
> > >  };
> > >  
> > > +/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> > > +#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + 
> > > \
> > > +   sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> > 
> > Maybe an inline function instead?
> 
> I tried, but that requires the definition of struct kvm_vcpu to be
> visible.  I failed to get that here without circular #include problems,
> and it looked tricky to fix.

Ah, OK

> 
> Since this is a small bit of code which is unlikely to get used by
> accident, I decided it was OK to keep it as a macro.
> 
> Can you see another way around this?

Nope

drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 12/27] KVM: arm64/sve: System register context switch and access support

2019-04-04 Thread Andrew Jones
On Thu, Apr 04, 2019 at 09:06:58AM +0100, Dave Martin wrote:
> On Wed, Apr 03, 2019 at 09:39:43PM +0200, Andrew Jones wrote:
> > On Fri, Mar 29, 2019 at 01:00:37PM +, Dave Martin wrote:
> > > This patch adds the necessary support for context switching ZCR_EL1
> > > for each vcpu.
> > > 
> > > ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
> > > sense for it to be handled as part of the guest FPSIMD/SVE context
> > > for context switch purposes instead of handling it as a general
> > > system register.  This means that it can be switched in lazily at
> > > the appropriate time.  No effort is made to track host context for
> > > this register, since SVE requires VHE: thus the hosts's value for
> > > this register lives permanently in ZCR_EL2 and does not alias the
> > > guest's value at any time.
> > > 
> > > The Hyp switch and fpsimd context handling code is extended
> > > appropriately.
> > > 
> > > Accessors are added in sys_regs.c to expose the SVE system
> > > registers and ID register fields.  Because these need to be
> > > conditionally visible based on the guest configuration, they are
> > > implemented separately for now rather than by use of the generic
> > > system register helpers.  This may be abstracted better later on
> > > when/if there are more features requiring this model.
> > > 
> > > ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
> > > guest, but for compatibility with non-SVE aware KVM implementations
> > > the register should not be enumerated at all for KVM_GET_REG_LIST
> > > in this case.  For consistency we also reject ioctl access to the
> > > register.  This ensures that a non-SVE-enabled guest looks the same
> > > to userspace, irrespective of whether the kernel KVM implementation
> > > supports SVE.
> > > 
> > > Signed-off-by: Dave Martin 
> > > Reviewed-by: Julien Thierry 
> > > Tested-by: zhang.lei 
> > > 
> > > ---
> > > 
> > > Changes since v5:
> > > 
> > >  * Port to the renamed visibility() framework.
> > > 
> > >  * Swap visiblity() helpers so that they appear by the relevant accessor
> > >functions.
> > > 
> > >  * [Julien Grall] With the visibility() checks, {get,set}_zcr_el1()
> > >degenerate to doing exactly what the common code does, so drop them.
> > > 
> > >The ID_AA64ZFR0_EL1 handlers are still needed to provide contitional
> > >RAZ behaviour.  This could be moved to the common code too, but since
> > >this is a one-off case I don't do this for now.  We can address this
> > >later if other regs need to follow the same pattern.
> > > 
> > >  * [Julien Thierry] Reset ZCR_EL1 to a fixed value using reset_val
> > >instead of using relying on reset_unknown() honouring set bits in val
> > >as RES0.
> > > 
> > >Most of the bits in ZCR_EL1 are RES0 anyway, and many implementations
> > >of SVE will support larger vectors than 128 bits, so 0 seems as good
> > >a value as any to expose guests that forget to initialise this
> > >register properly.
> > > ---
> 
> [...]
> 
> > > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > > index 3563fe6..9d46066 100644
> > > --- a/arch/arm64/kvm/hyp/switch.c
> > > +++ b/arch/arm64/kvm/hyp/switch.c
> 
> [...]
> 
> > > +static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
> > > + const struct sys_reg_desc *rd,
> > > + const struct kvm_one_reg *reg, void __user *uaddr)
> > > +{
> > > + u64 val;
> > > +
> > > + if (!vcpu_has_sve(vcpu))
> > > + return -ENOENT;
> > 
> > This shouldn't be necessary. The visibility check in
> > kvm_arm_sys_reg_get_reg already covers it.
> > 
> > > +
> > > + val = guest_id_aa64zfr0_el1(vcpu);
> > > + return reg_to_user(uaddr, , reg->id);
> > > +}
> > > +
> > > +static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
> > > + const struct sys_reg_desc *rd,
> > > + const struct kvm_one_reg *reg, void __user *uaddr)
> > > +{
> > > + const u64 id = sys_reg_to_index(rd);
> > > + int err;
> > > + u64 val;
> > > +
> > > + if (!vcpu_has_sve(vcpu))
> > > + return -ENOENT;
> > 
> > Also not necessary.
> 
> Hmm, true.  Because the logic is a bit spread out I felt uneasy with
> simply deleting these checks, but if they fire, something has
> definitely gone wrong elsewhere.
> 
> In its current form the code makes it look like it could be legitimate
> to get here with !vcpu_has_sve(vcpu), which is misleading.
> 
> What if we demote these to WARN_ON()?  This isn't a fast path.

A WARN_ON sounds good to me.

Thanks,
drew
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 13/27] KVM: arm64/sve: Context switch the SVE registers

2019-04-04 Thread Dave Martin
On Wed, Apr 03, 2019 at 10:01:45PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:38PM +, Dave Martin wrote:
> > In order to give each vcpu its own view of the SVE registers, this
> > patch adds context storage via a new sve_state pointer in struct
> > vcpu_arch.  An additional member sve_max_vl is also added for each
> > vcpu, to determine the maximum vector length visible to the guest
> > and thus the value to be configured in ZCR_EL2.LEN while the vcpu
> > is active.  This also determines the layout and size of the storage
> > in sve_state, which is read and written by the same backend
> > functions that are used for context-switching the SVE state for
> > host tasks.
> > 
> > On SVE-enabled vcpus, SVE access traps are now handled by switching
> > in the vcpu's SVE context and disabling the trap before returning
> > to the guest.  On other vcpus, the trap is not handled and an exit
> > back to the host occurs, where the handle_sve() fallback path
> > reflects an undefined instruction exception back to the guest,
> > consistently with the behaviour of non-SVE-capable hardware (as was
> > done unconditionally prior to this patch).
> > 
> > No SVE handling is added on non-VHE-only paths, since VHE is an
> > architectural and Kconfig prerequisite of SVE.
> > 
> > Signed-off-by: Dave Martin 
> > Reviewed-by: Julien Thierry 
> > Tested-by: zhang.lei 
> > 
> > ---
> > 
> > Changes since v5:
> > 
> >  * [Julien Thierry, Julien Grall] Commit message typo fixes
> > 
> >  * [Mark Rutland] Rename trap_class to hsr_ec, for consistency with
> >existing code.
> > 
> >  * [Mark Rutland] Simplify condition for refusing to handle an
> >FPSIMD/SVE trap, using multiple if () statements for clarity.  The
> >previous condition was a bit tortuous, and how that the static_key
> >checks have been hoisted out, it makes little difference to the
> >compiler how we express the condition here.
> > ---
> >  arch/arm64/include/asm/kvm_host.h |  6 
> >  arch/arm64/kvm/fpsimd.c   |  5 +--
> >  arch/arm64/kvm/hyp/switch.c   | 75 
> > +--
> >  3 files changed, 66 insertions(+), 20 deletions(-)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 22cf484..4fabfd2 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -228,6 +228,8 @@ struct vcpu_reset_state {
> >  
> >  struct kvm_vcpu_arch {
> > struct kvm_cpu_context ctxt;
> > +   void *sve_state;
> > +   unsigned int sve_max_vl;
> >  
> > /* HYP configuration */
> > u64 hcr_el2;
> > @@ -323,6 +325,10 @@ struct kvm_vcpu_arch {
> > bool sysregs_loaded_on_cpu;
> >  };
> >  
> > +/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
> > +#define vcpu_sve_pffr(vcpu) ((void *)((char *)((vcpu)->arch.sve_state) + \
> > + sve_ffr_offset((vcpu)->arch.sve_max_vl)))
> 
> Maybe an inline function instead?

I tried, but that requires the definition of struct kvm_vcpu to be
visible.  I failed to get that here without circular #include problems,
and it looked tricky to fix.

Since this is a small bit of code which is unlikely to get used by
accident, I decided it was OK to keep it as a macro.

Can you see another way around this?

[...]

> Reviewed-by: Andrew Jones 

Thanks

---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v7 12/27] KVM: arm64/sve: System register context switch and access support

2019-04-04 Thread Dave Martin
On Wed, Apr 03, 2019 at 09:39:43PM +0200, Andrew Jones wrote:
> On Fri, Mar 29, 2019 at 01:00:37PM +, Dave Martin wrote:
> > This patch adds the necessary support for context switching ZCR_EL1
> > for each vcpu.
> > 
> > ZCR_EL1 is trapped alongside the FPSIMD/SVE registers, so it makes
> > sense for it to be handled as part of the guest FPSIMD/SVE context
> > for context switch purposes instead of handling it as a general
> > system register.  This means that it can be switched in lazily at
> > the appropriate time.  No effort is made to track host context for
> > this register, since SVE requires VHE: thus the hosts's value for
> > this register lives permanently in ZCR_EL2 and does not alias the
> > guest's value at any time.
> > 
> > The Hyp switch and fpsimd context handling code is extended
> > appropriately.
> > 
> > Accessors are added in sys_regs.c to expose the SVE system
> > registers and ID register fields.  Because these need to be
> > conditionally visible based on the guest configuration, they are
> > implemented separately for now rather than by use of the generic
> > system register helpers.  This may be abstracted better later on
> > when/if there are more features requiring this model.
> > 
> > ID_AA64ZFR0_EL1 is RO-RAZ for MRS/MSR when SVE is disabled for the
> > guest, but for compatibility with non-SVE aware KVM implementations
> > the register should not be enumerated at all for KVM_GET_REG_LIST
> > in this case.  For consistency we also reject ioctl access to the
> > register.  This ensures that a non-SVE-enabled guest looks the same
> > to userspace, irrespective of whether the kernel KVM implementation
> > supports SVE.
> > 
> > Signed-off-by: Dave Martin 
> > Reviewed-by: Julien Thierry 
> > Tested-by: zhang.lei 
> > 
> > ---
> > 
> > Changes since v5:
> > 
> >  * Port to the renamed visibility() framework.
> > 
> >  * Swap visiblity() helpers so that they appear by the relevant accessor
> >functions.
> > 
> >  * [Julien Grall] With the visibility() checks, {get,set}_zcr_el1()
> >degenerate to doing exactly what the common code does, so drop them.
> > 
> >The ID_AA64ZFR0_EL1 handlers are still needed to provide contitional
> >RAZ behaviour.  This could be moved to the common code too, but since
> >this is a one-off case I don't do this for now.  We can address this
> >later if other regs need to follow the same pattern.
> > 
> >  * [Julien Thierry] Reset ZCR_EL1 to a fixed value using reset_val
> >instead of using relying on reset_unknown() honouring set bits in val
> >as RES0.
> > 
> >Most of the bits in ZCR_EL1 are RES0 anyway, and many implementations
> >of SVE will support larger vectors than 128 bits, so 0 seems as good
> >a value as any to expose guests that forget to initialise this
> >register properly.
> > ---

[...]

> > diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> > index 3563fe6..9d46066 100644
> > --- a/arch/arm64/kvm/hyp/switch.c
> > +++ b/arch/arm64/kvm/hyp/switch.c

[...]

> > +static int get_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
> > +   const struct sys_reg_desc *rd,
> > +   const struct kvm_one_reg *reg, void __user *uaddr)
> > +{
> > +   u64 val;
> > +
> > +   if (!vcpu_has_sve(vcpu))
> > +   return -ENOENT;
> 
> This shouldn't be necessary. The visibility check in
> kvm_arm_sys_reg_get_reg already covers it.
> 
> > +
> > +   val = guest_id_aa64zfr0_el1(vcpu);
> > +   return reg_to_user(uaddr, , reg->id);
> > +}
> > +
> > +static int set_id_aa64zfr0_el1(struct kvm_vcpu *vcpu,
> > +   const struct sys_reg_desc *rd,
> > +   const struct kvm_one_reg *reg, void __user *uaddr)
> > +{
> > +   const u64 id = sys_reg_to_index(rd);
> > +   int err;
> > +   u64 val;
> > +
> > +   if (!vcpu_has_sve(vcpu))
> > +   return -ENOENT;
> 
> Also not necessary.

Hmm, true.  Because the logic is a bit spread out I felt uneasy with
simply deleting these checks, but if they fire, something has
definitely gone wrong elsewhere.

In its current form the code makes it look like it could be legitimate
to get here with !vcpu_has_sve(vcpu), which is misleading.

What if we demote these to WARN_ON()?  This isn't a fast path.

[...]

> >  /*
> >   * cpufeature ID register user accessors
> >   *
> > @@ -1346,7 +1418,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> > ID_SANITISED(ID_AA64PFR1_EL1),
> > ID_UNALLOCATED(4,2),
> > ID_UNALLOCATED(4,3),
> > -   ID_UNALLOCATED(4,4),
> > +   { SYS_DESC(SYS_ID_AA64ZFR0_EL1), access_id_aa64zfr0_el1, .get_user = 
> > get_id_aa64zfr0_el1, .set_user = set_id_aa64zfr0_el1, .visibility = 
> > sve_id_visibility },
> > ID_UNALLOCATED(4,5),
> > ID_UNALLOCATED(4,6),
> > ID_UNALLOCATED(4,7),
> > @@ -1383,6 +1455,7 @@ static const struct sys_reg_desc sys_reg_descs[] = {
> >  
> > { SYS_DESC(SYS_SCTLR_EL1), access_vm_reg, reset_val, SCTLR_EL1, 
> > 0x00C50078 },
> > { SYS_DESC(SYS_CPACR_EL1), 

Re: [PATCH v7 09/27] KVM: arm64: Add a vcpu flag to control SVE visibility for the guest

2019-04-04 Thread Dave Martin
On Thu, Apr 04, 2019 at 04:17:24AM +0100, Marc Zyngier wrote:
> On Wed, 03 Apr 2019 20:14:13 +0100,
> Andrew Jones  wrote:
> > 
> > On Fri, Mar 29, 2019 at 01:00:34PM +, Dave Martin wrote:
> > > Since SVE will be enabled or disabled on a per-vcpu basis, a flag
> > > is needed in order to track which vcpus have it enabled.
> > > 
> > > This patch adds a suitable flag and a helper for checking it.
> > > 
> > > Signed-off-by: Dave Martin 
> > > Reviewed-by: Alex Bennée 
> > > Tested-by: zhang.lei 
> > > ---
> > >  arch/arm64/include/asm/kvm_host.h | 4 
> > >  1 file changed, 4 insertions(+)
> > > 
> > > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > > b/arch/arm64/include/asm/kvm_host.h
> > > index 6d10100..ad4f7f0 100644
> > > --- a/arch/arm64/include/asm/kvm_host.h
> > > +++ b/arch/arm64/include/asm/kvm_host.h
> > > @@ -328,6 +328,10 @@ struct kvm_vcpu_arch {
> > >  #define KVM_ARM64_FP_HOST(1 << 2) /* host FP regs loaded 
> > > */
> > >  #define KVM_ARM64_HOST_SVE_IN_USE(1 << 3) /* backup for host 
> > > TIF_SVE */
> > >  #define KVM_ARM64_HOST_SVE_ENABLED   (1 << 4) /* SVE enabled for EL0 
> > > */
> > > +#define KVM_ARM64_GUEST_HAS_SVE  (1 << 5) /* SVE exposed to 
> > > guest */
> > > +
> > > +#define vcpu_has_sve(vcpu) (system_supports_sve() && \
> > > + ((vcpu)->arch.flags & KVM_ARM64_GUEST_HAS_SVE))
> > 
> > We shouldn't need the system_supports_sve() here. vcpu->arch.flags can
> > only have the KVM_ARM64_GUEST_HAS_SVE flag set when system_supports_sve()
> > is true, and it can't change later.
> 
> This is a performance optimisation. system_supports_sve() results in a
> static key being emitted, and that avoids a number of loads on the
> fast path for non-SVE systems (which is likely to be the absolute
> majority for the foreseeable future).

Exactly so.  The same applies to many of the other call sites for
system_supports_sve().

Cheers
---Dave
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v6 09/22] vfio: VFIO_IOMMU_BIND/UNBIND_MSI

2019-04-04 Thread Auger Eric
Hi Marc, Robin, Alex,

On 4/3/19 7:38 PM, Alex Williamson wrote:
> On Wed, 3 Apr 2019 16:30:15 +0200
> Auger Eric  wrote:
> 
>> Hi Alex,
>>
>> On 3/22/19 11:09 PM, Alex Williamson wrote:
>>> On Fri, 22 Mar 2019 10:30:02 +0100
>>> Auger Eric  wrote:
>>>   
 Hi Alex,
 On 3/22/19 12:01 AM, Alex Williamson wrote:  
> On Sun, 17 Mar 2019 18:22:19 +0100
> Eric Auger  wrote:
> 
>> This patch adds the VFIO_IOMMU_BIND/UNBIND_MSI ioctl which aim
>> to pass/withdraw the guest MSI binding to/from the host.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v3 -> v4:
>> - add UNBIND
>> - unwind on BIND error
>>
>> v2 -> v3:
>> - adapt to new proto of bind_guest_msi
>> - directly use vfio_iommu_for_each_dev
>>
>> v1 -> v2:
>> - s/vfio_iommu_type1_guest_msi_binding/vfio_iommu_type1_bind_guest_msi
>> ---
>>  drivers/vfio/vfio_iommu_type1.c | 58 +
>>  include/uapi/linux/vfio.h   | 29 +
>>  2 files changed, 87 insertions(+)
>>
>> diff --git a/drivers/vfio/vfio_iommu_type1.c 
>> b/drivers/vfio/vfio_iommu_type1.c
>> index 12a40b9db6aa..66513679081b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -1710,6 +1710,25 @@ static int vfio_cache_inv_fn(struct device *dev, 
>> void *data)
>>  return iommu_cache_invalidate(d, dev, >info);
>>  }
>>  
>> +static int vfio_bind_msi_fn(struct device *dev, void *data)
>> +{
>> +struct vfio_iommu_type1_bind_msi *ustruct =
>> +(struct vfio_iommu_type1_bind_msi *)data;
>> +struct iommu_domain *d = iommu_get_domain_for_dev(dev);
>> +
>> +return iommu_bind_guest_msi(d, dev, ustruct->iova,
>> +ustruct->gpa, ustruct->size);
>> +}
>> +
>> +static int vfio_unbind_msi_fn(struct device *dev, void *data)
>> +{
>> +dma_addr_t *iova = (dma_addr_t *)data;
>> +struct iommu_domain *d = iommu_get_domain_for_dev(dev);
>
> Same as previous, we can encapsulate domain in our own struct to avoid
> a lookup.
> 
>> +
>> +iommu_unbind_guest_msi(d, dev, *iova);
>
> Is it strange that iommu-core is exposing these interfaces at a device
> level if every one of them requires us to walk all the devices?  Thanks,  
>   

 Hum this per device API was devised in response of Robin's comments on

 [RFC v2 12/20] dma-iommu: Implement NESTED_MSI cookie.

 "
 But that then seems to reveal a somewhat bigger problem - if the callers
 are simply registering IPAs, and relying on the ITS driver to grab an
 entry and fill in a PA later, then how does either one know *which* PA
 is supposed to belong to a given IPA in the case where you have multiple
 devices with different ITS targets assigned to the same guest? (and if
 it's possible to assume a guest will use per-device stage 1 mappings and
 present it with a single vITS backed by multiple pITSes, I think things
 start breaking even harder.)
 "

 However looking back into the problem I wonder if there was an issue
 with the iommu_domain based API.

 If my understanding is correct, when assigned devices are protected by a
 vIOMMU then they necessarily end up in separate host iommu domains even
 if they belong to the same iommu_domain on the guest. And there can only
 be a single device in this iommu_domain.  
>>>
>>> Don't forget that a container represents the IOMMU context in a vfio
>>> environment, groups are associated with containers and a group may
>>> contain one or more devices.  When a vIOMMU comes into play, we still
>>> only have an IOMMU context per container.  If we have multiple devices
>>> in a group, we run into problems with vIOMMU.  We can resolve this by
>>> requiring that the user ignore all but one device in the group,
>>> or making sure that the devices in the group have the same IOMMU
>>> context.  The latter we could do in QEMU if PCIe-to-PCI bridges there
>>> masked the per-device address space as it does on real hardware (ie.
>>> there is no requester ID on conventional PCI, all transactions appear to
>>> the IOMMU with the bridge requester ID).  So I raise this question
>>> because vfio's minimum domain granularity is a group.
>>>   
 If this is confirmed, there is a non ambiguous association between 1
 physical iommu_domain, 1 device, 1 S1 mapping and 1 physical MSI
 controller.

 I added the device handle handle to disambiguate those associations. The
 gIOVA ->gDB mapping is associated with a device handle. Then when the
 host needs a stage 1 mapping for this device, to build the nested
 mapping towards the physical DB it can easily grab the gIOVA->gDB stage
 1 mapping registered