Re: [PATCH v2 7/7] KVM: arm: implement kvm_set_msi by gsi direct mapping

2015-07-10 Thread Andre Przywara
On 09/07/15 09:22, Eric Auger wrote:
> If the ITS modality is not available, let's simply support MSI
> injection by transforming the MSI.data into an SPI ID.
> 
> This makes it possible to use the KVM_SIGNAL_MSI ioctl on arm too.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v1 -> v2:
> - introduce vgic_v2m_inject_msi in vgic-v2-emul.c following Andre's
>   advice
> ---
>  arch/arm/kvm/Kconfig        |  1 +
>  virt/kvm/arm/vgic-v2-emul.c | 12 ++++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 151e710..0f58baf 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -31,6 +31,7 @@ config KVM
>   select KVM_VFIO
>   select HAVE_KVM_EVENTFD
>   select HAVE_KVM_IRQFD
> + select HAVE_KVM_MSI

I wonder if this requires some more code to only advertise
KVM_CAP_SIGNAL_MSI if userland actually sets up a GICv2M?
Otherwise userland could get the idea of being able to inject MSIs
without the guest actually being prepared for that (because the GICv2M
driver did not initialize).

Cheers,
Andre.

>   select HAVE_KVM_IRQCHIP
>   select HAVE_KVM_IRQ_ROUTING
>   depends on ARM_VIRT_EXT && ARM_LPAE && ARM_ARCH_TIMER
> diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
> index 1390797..43013cc 100644
> --- a/virt/kvm/arm/vgic-v2-emul.c
> +++ b/virt/kvm/arm/vgic-v2-emul.c
> @@ -478,6 +478,17 @@ static bool vgic_v2_queue_sgi(struct kvm_vcpu *vcpu, int irq)
>  }
>  
>  /**
> + * Emulates GICv2M MSI injection by injecting the SPI ID matching
> + * the msi data
> + * @kvm: pointer to the kvm struct
> + * @msi: the msi struct handle
> + */
> +static int vgic_v2m_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
> +{
> + return kvm_vgic_inject_irq(kvm, 0, msi->data, 1);
> +}
> +
> +/**
>   * kvm_vgic_map_resources - Configure global VGIC state before running any VCPUs
>   * @kvm: pointer to the kvm struct
>   *
> @@ -566,6 +577,7 @@ void vgic_v2_init_emulation(struct kvm *kvm)
>   dist->vm_ops.add_sgi_source = vgic_v2_add_sgi_source;
>   dist->vm_ops.init_model = vgic_v2_init_model;
>   dist->vm_ops.map_resources = vgic_v2_map_resources;
> + dist->vm_ops.inject_msi = vgic_v2m_inject_msi;
>  
>   kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 6/7] KVM: arm/arm64: enable MSI routing

2015-07-10 Thread Andre Przywara
On 09/07/15 09:22, Eric Auger wrote:
> Up to now, only irqchip routing entries could be set. This patch
> adds the capability to insert MSI routing entries, with or without
> device id. Although standard MSI entries can be set, their
> injection still is not supported. For ARM64, let's also increase
> KVM_MAX_IRQ_ROUTES to 4096: include SPI irqchip flat routes plus
> MSI routes. In the future this might be extended.
> 
> The new MSI routing entry type also must be managed similarly to
> legacy KVM_IRQ_ROUTING_MSI in eventfd irqfd_wakeup and irqfd_update.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v1 -> v2:
> - adapt to new routing entry types
> 
> RFC -> PATCH:
> - move api MSI routing updates into that patch file
> - use new devid field of user api struct
> ---
>  Documentation/virtual/kvm/api.txt | 10 ++++++++++
>  include/linux/kvm_host.h          |  2 ++
>  virt/kvm/arm/vgic.c               | 13 +++++++++++++
>  virt/kvm/eventfd.c                |  6 ++++--
>  4 files changed, 29 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 9276dac..686b121 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1431,6 +1431,11 @@ Sets the GSI routing table entries, overwriting any previously set entries.
>  On arm/arm64, GSI routing has the following limitation:
>  - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD.
>  
> +On arm/arm64, MSI routing through the in-kernel GICv3 ITS must use the
> +KVM_IRQ_ROUTING_EXTENDED_MSI routing type, and the device ID must be set
> +in the msi struct. Otherwise, KVM_IRQ_ROUTING_MSI must be used without
> +populating the msi devid field.
> +
>  struct kvm_irq_routing {
>   __u32 nr;
>   __u32 flags;
> @@ -2342,6 +2347,11 @@ On arm/arm64, gsi routing being supported, the following can happen:
>  - in case no routing entry is associated to this gsi, injection fails
>  - in case the gsi is associated to an irqchip routing entry,
>irqchip.pin + 32 corresponds to the injected SPI ID.
> +- in case the gsi is associated to an MSI routing entry,
> +  * without GICv3 ITS in-kernel emulation, MSI data matches the SPI ID
> +of the injected SPI
> +  * with GICv3 ITS in-kernel emulation, the MSI message and device ID
> +are translated into an LPI.
>  
>  4.76 KVM_PPC_ALLOCATE_HTAB
>  
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index cac0fe4..ea1a810 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -976,6 +976,8 @@ static inline int mmu_notifier_retry(struct kvm *kvm, unsigned long mmu_seq)
>  
>  #ifdef CONFIG_S390
>  #define KVM_MAX_IRQ_ROUTES 4096 //FIXME: we can have more than that...
> +#elif defined(CONFIG_ARM64)
> +#define KVM_MAX_IRQ_ROUTES 4096
>  #else
>  #define KVM_MAX_IRQ_ROUTES 1024
>  #endif
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 63be67e..c5546d8 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2252,6 +2252,19 @@ int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
>   (e->irqchip.irqchip >= KVM_NR_IRQCHIPS))
>   goto out;
>   break;

what about:

+   case KVM_IRQ_ROUTING_EXTENDED_MSI:
+   e->devid = ue->u.msi.devid;
+   /* fall through */
> + case KVM_IRQ_ROUTING_MSI:
...

Also, if we avoid KVM_IRQ_ROUTING_EXTENDED_MSI in the kernel, we could add:
+   e->type = KVM_IRQ_ROUTING_MSI;
(before the fall through) and then drop the rest of this patch entirely.

Cheers,
Andre.

> + e->set = kvm_set_msi;
> + e->msi.address_lo = ue->u.msi.address_lo;
> + e->msi.address_hi = ue->u.msi.address_hi;
> + e->msi.data = ue->u.msi.data;
> + break;



> + case KVM_IRQ_ROUTING_EXTENDED_MSI:
> + e->set = kvm_set_msi;
> + e->msi.address_lo = ue->u.msi.address_lo;
> + e->msi.address_hi = ue->u.msi.address_hi;
> + e->msi.data = ue->u.msi.data;
> + e->devid = ue->u.msi.devid;
> + break;
>   default:
>   goto out;
>   }
> diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> index 9ff4193..d76d05d 100644
> --- a/virt/kvm/eventfd.c
> +++ b/virt/kvm/eventfd.c
> @@ -238,7 +238,8 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key)
>   irq = irqfd->irq_entry;
>   } while (read_seqcount_retry(&irqfd->irq_entry_sc, seq));
>   /* An event has been signaled, inject an interrupt */
> - if (irq.type == KVM_IRQ_ROUTING_MSI)
> + if (irq.type == KVM_IRQ_ROUTING_MSI ||
> + irq.type == KVM_IRQ_ROUTING_EXTENDED_MSI)
>   kvm_set_msi(&irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1,
>   false);
>   else
> @@ -294,7 +295,8 @@ static void irqfd_updat

Re: [PATCH v2 3/7] KVM: irqchip: convey devid to kvm_set_msi

2015-07-10 Thread Andre Przywara
On 09/07/15 09:22, Eric Auger wrote:
> On ARM, a devid field is populated in the kvm_msi struct when the
> KVM_MSI_VALID_DEVID flag is set. Let's populate the corresponding
> kvm_kernel_irq_routing_entry devid field and set the msi type to
> KVM_IRQ_ROUTING_EXTENDED_MSI.
> 
> Signed-off-by: Eric Auger 
> ---
>  virt/kvm/irqchip.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 21c1424..e678f8a 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -72,9 +72,17 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
>  {
>   struct kvm_kernel_irq_routing_entry route;
>  
> - if (!irqchip_in_kernel(kvm) || msi->flags != 0)
> + if (!irqchip_in_kernel(kvm))
>   return -EINVAL;
>  
> + if (msi->flags & KVM_MSI_VALID_DEVID) {
> + route.devid = msi->devid;
> + route.type = KVM_IRQ_ROUTING_EXTENDED_MSI;
> + } else if (!msi->flags)
> + return -EINVAL;

I think we can get away without using the extended type on the kernel side.
Within the kernel we don't have an ABI that we have to stick to forever,
so we can simplify things by re-using the existing type (no need to
check for both MSI types later).
So we could always set the device ID: the only code that looks at it later
is the ITS emulation anyway, and every other code path simply ignores it.

To be honest, I am not 100% sure that is actually better, but I had
hacked it in a way where I didn't need it.
Also have a look at the other comments in the following patches for
clarification.

Cheers,
Andre.


> +
> + /* historically the route.type was not set */
> +
>   route.msi.address_lo = msi->address_lo;
>   route.msi.address_hi = msi->address_hi;
>   route.msi.data = msi->data;
> 



Re: [PATCH v2 4/7] KVM: arm/arm64: enable irqchip routing

2015-07-10 Thread Andre Przywara
Hi Eric,

On 09/07/15 09:22, Eric Auger wrote:
> This patch adds compilation and link against irqchip.
> 
> On ARM, irqchip routing is not really useful since there is
> a single irqchip. However, the main motivation behind using the irqchip
> code is to enable the MSI routing code. With the support of in-kernel
> GICv3 ITS emulation, it now seems to be a MUST HAVE requirement.
> 



> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 3630971..6c6c25e 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -2215,44 +2215,65 @@ out_free_irq:
>   return ret;
>  }
>  
> -int kvm_irq_map_gsi(struct kvm *kvm,
> - struct kvm_kernel_irq_routing_entry *entries,
> - int gsi)
> +int vgic_irqfd_set_irq(struct kvm_kernel_irq_routing_entry *e,
> + struct kvm *kvm, int irq_source_id,
> + int level, bool line_status)
>  {
> - return 0;
> -}
> -
> -int kvm_irq_map_chip_pin(struct kvm *kvm, unsigned irqchip, unsigned pin)
> -{
> - return pin;
> -}
> -
> -int kvm_set_irq(struct kvm *kvm, int irq_source_id,
> - u32 irq, int level, bool line_status)
> -{
> - unsigned int spi = irq + VGIC_NR_PRIVATE_IRQS;
> + unsigned int spi_id = e->irqchip.pin + VGIC_NR_PRIVATE_IRQS;
>  
> - trace_kvm_set_irq(irq, level, irq_source_id);
> + trace_kvm_set_irq(spi_id, level, irq_source_id);
>  
>   BUG_ON(!vgic_initialized(kvm));
>  
> - return kvm_vgic_inject_irq(kvm, 0, spi, level);
> + if (spi_id > min(kvm->arch.vgic.nr_irqs, 1020))
> + return -EINVAL;
> + return kvm_vgic_inject_irq(kvm, 0, spi_id, level);
> +}
> +
> +/**
> + * Populates a kvm routing entry from a user routing entry
> + * @e: kvm internal formatted entry
> + * @ue: user api formatted entry
> + * return 0 on success, -EINVAL on errors.
> + */
> +int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
> +   const struct kvm_irq_routing_entry *ue)
> +{
> + int r = -EINVAL;
> +
> + switch (ue->type) {
> + case KVM_IRQ_ROUTING_IRQCHIP:
> + e->set = vgic_irqfd_set_irq;
> + e->irqchip.irqchip = ue->u.irqchip.irqchip;
> + e->irqchip.pin = ue->u.irqchip.pin;
> + if ((e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS) ||
> + (e->irqchip.irqchip >= KVM_NR_IRQCHIPS))
> + goto out;
> + break;
> + default:
> + goto out;
> + }
> + r = 0;
> +out:
> + return r;
>  }
>  
> -/* MSI not implemented yet */
>  int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>   struct kvm *kvm, int irq_source_id,
>   int level, bool line_status)
>  {
> - return 0;
> -}
> + struct kvm_msi msi;
> +
> + msi.address_lo = e->msi.address_lo;
> + msi.address_hi = e->msi.address_hi;
> + msi.data = e->msi.data;
> + if (e->type == KVM_IRQ_ROUTING_EXTENDED_MSI) {
> + msi.devid = e->devid;
> + msi.flags = KVM_MSI_VALID_DEVID;
> + }

Can't we make the assignment unconditional?
The GICv2m MSI code does not care about the devid and the ITS code
requires it.
This simplifies things quite a bit in the following patches.
(This refers to the idea of not using the extended type in the kernel.)

>  
> -#ifdef CONFIG_HAVE_KVM_MSI
> -int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
> -{
>   if (kvm->arch.vgic.vm_ops.inject_msi)
> - return kvm->arch.vgic.vm_ops.inject_msi(kvm, msi);
> + return kvm->arch.vgic.vm_ops.inject_msi(kvm, &msi);
>   else
>   return -ENODEV;
>  }
> -#endif
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index e678f8a..f26cadd 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -29,7 +29,9 @@
>  #include 
>  #include 
>  #include 
> +#if !defined(CONFIG_ARM) && !defined(CONFIG_ARM64)
>  #include "irq.h"
> +#endif

Which irq.h is that referring to? And why is ARM not allowed to
include it?

Cheers,
Andre.

>  
>  struct kvm_irq_routing_table {
>   int chip[KVM_NR_IRQCHIPS][KVM_IRQCHIP_NUM_PINS];
> 
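The bounds check in vgic_irqfd_set_irq() can be written as a standalone helper. This is a sketch, assuming `nr_irqs` counts every interrupt ID including the 32 private ones, so valid SPI IDs are 32 through min(nr_irqs, 1020) - 1; under that assumption the comparison needs to be `>=` rather than `>`. `model_pin_to_spi()` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>

#define VGIC_NR_PRIVATE_IRQS 32u  /* 16 SGIs + 16 PPIs */
#define GICV2_INTID_LIMIT    1020u /* IDs 1020-1023 are special */

/* Translate an irqchip routing pin to an SPI ID, or return -EINVAL. */
static int model_pin_to_spi(unsigned int pin, unsigned int nr_irqs)
{
	unsigned int limit;

	assert(nr_irqs >= VGIC_NR_PRIVATE_IRQS);
	limit = nr_irqs < GICV2_INTID_LIMIT ? nr_irqs : GICV2_INTID_LIMIT;

	/* Compare the pin first so pin + 32 can never wrap around. */
	if (pin >= limit - VGIC_NR_PRIVATE_IRQS)
		return -EINVAL;
	return (int)(pin + VGIC_NR_PRIVATE_IRQS);
}
```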


Re: [PATCH] vhost-blk: Add vhost-blk support v6

2015-07-10 Thread Ming Lin
On Sat, Dec 1, 2012 at 5:33 PM, Asias He  wrote:
> vhost-blk is an in-kernel virtio-blk device accelerator.
>
> Due to the lack of a proper in-kernel AIO interface, this version converts
> the guest's I/O requests to bios and uses submit_bio() to submit I/O directly.
> So this version only supports raw block devices as the guest's disk image,
> e.g. /dev/sda, /dev/ram0. We can add file-based image support to
> vhost-blk once we have an in-kernel AIO interface. There is some work in
> progress on an in-kernel AIO interface by Dave Kleikamp and Zach Brown:
>
>http://marc.info/?l=linux-fsdevel&m=133312234313122
>
> Performance evaluation:
> -----------------------
> LKVM: Fio with libaio ioengine on 1 Fusion IO device
> IOPS(k)      Before  After  Improvement
> seq-read     107     121    +13.0%
> seq-write    130     179    +37.6%
> rnd-read     102     122    +19.6%
> rnd-write    125     159    +27.0%
>
> QEMU: Fio with libaio ioengine on 1 Fusion IO device
> IOPS(k)      Before  After  Improvement
> seq-read     76      123    +61.8%
> seq-write    139     173    +24.4%
> rnd-read     73      120    +64.3%
> rnd-write    75      156    +108.0%
>
> QEMU: Fio with libaio ioengine on 1 Ramdisk device
> IOPS(k)      Before  After  Improvement
> seq-read     138     437    +216%
> seq-write    191     436    +128%
> rnd-read     137     426    +210%
> rnd-write    140     415    +196%
>
> QEMU: Fio with libaio ioengine on 8 Ramdisk device
> 50% read + 50% write
> IOPS(k)      Before  After    Improvement
> randrw       64/64   189/189  +195%/+195%
>
> Userspace bits:
> ---------------
> 1) LKVM
> The latest vhost-blk userspace bits for kvm tool can be found here:
> g...@github.com:asias/linux-kvm.git blk.vhost-blk
>
> 2) QEMU
> The latest vhost-blk userspace prototype for QEMU can be found here:
> g...@github.com:asias/qemu.git blk.vhost-blk
>
> Changes in v6:
> - Use inline req_page_list to reduce kmalloc
> - Switch to single thread model, thanks mst!
> - Wait until requests fired before vhost_blk_flush to be finished
>
> Changes in v5:
> - Do not assume the buffer layout
> - Fix wakeup race
>
> Changes in v4:
> - Mark req->status as userspace pointer
> - Use __copy_to_user() instead of copy_to_user() in vhost_blk_set_status()
> - Add if (need_resched()) schedule() in blk thread
> - Kill vhost_blk_stop_vq() and move it into vhost_blk_stop()
> - Use vq_err() instead of pr_warn()
> - Fail on unsupported requests
> - Add flush in vhost_blk_set_features()
>
> Changes in v3:
> - Sending REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
> - Check file passed by user is a raw block device file
>
> Acked-by: David S. Miller 
> Signed-off-by: Asias He 

Hi Asias,

Is this still under development or stopped?
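The Improvement column in the vhost-blk tables is the relative IOPS change, (After - Before) / Before * 100. For example, the QEMU Fusion IO seq-read row gives (123 - 76) / 76, roughly 61.8%. A small helper (hypothetical name) reproduces the calculation:

```c
#include <assert.h>

/* Relative IOPS change in percent: (after - before) / before * 100. */
static double improvement_pct(double before_kiops, double after_kiops)
{
	assert(before_kiops > 0.0);
	return (after_kiops - before_kiops) / before_kiops * 100.0;
}
```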


Re: [PATCH v2 1/7] KVM: api: introduce KVM_IRQ_ROUTING_EXTENDED_MSI

2015-07-10 Thread Andre Przywara
On 09/07/15 09:22, Eric Auger wrote:
> On ARM, the MSI msg (address and data) comes along with
> out-of-band device ID information. The device ID encodes the
> device that writes the MSI msg. Let's convey the device id in
> kvm_irq_routing_msi and use a new routing entry type to
> indicate the devid is populated.
> 
> Signed-off-by: Eric Auger 

Reviewed-by: Andre Przywara 

Just a forward-looking remark: if this gets based on top of my ITS
emulation v2 series, the documentation should state that the new
KVM_CAP_MSI_DEVID capability tells userland that it needs to provide
the device ID.

Cheers,
Andre.

> 
> ---
> 
> v1 -> v2:
> - devid id passed in kvm_irq_routing_msi instead of in
>   kvm_irq_routing_entry
> 
> RFC -> PATCH
> - remove kvm_irq_routing_extended_msi and use union instead
> ---
>  Documentation/virtual/kvm/api.txt | 10 +++++++++-
>  include/uapi/linux/kvm.h          |  6 +++++-
>  2 files changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index bea8de7..3d920cd 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -1453,6 +1453,7 @@ struct kvm_irq_routing_entry {
>  #define KVM_IRQ_ROUTING_IRQCHIP 1
>  #define KVM_IRQ_ROUTING_MSI 2
>  #define KVM_IRQ_ROUTING_S390_ADAPTER 3
> +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
>  
>  No flags are specified so far, the corresponding field must be set to zero.
>  
> @@ -1465,9 +1466,16 @@ struct kvm_irq_routing_msi {
>   __u32 address_lo;
>   __u32 address_hi;
>   __u32 data;
> - __u32 pad;
> + union {
> + __u32 pad;
> + __u32 devid;
> + };
>  };
>  
> +For the KVM_IRQ_ROUTING_EXTENDED_MSI routing entry type, the kvm_irq_routing_msi
> +routing entry is used and devid is populated with the ID of the device that wrote
> +the MSI message. For PCI, this is usually a BDF identifier in the lower 16 bits.
> +
>  struct kvm_irq_routing_s390_adapter {
>   __u64 ind_addr;
>   __u64 summary_addr;
> diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
> index 9a261e5..39ec164 100644
> --- a/include/uapi/linux/kvm.h
> +++ b/include/uapi/linux/kvm.h
> @@ -829,7 +829,10 @@ struct kvm_irq_routing_msi {
>   __u32 address_lo;
>   __u32 address_hi;
>   __u32 data;
> - __u32 pad;
> + union {
> + __u32 pad;
> + __u32 devid;
> + };
>  };
>  
>  struct kvm_irq_routing_s390_adapter {
> @@ -844,6 +847,7 @@ struct kvm_irq_routing_s390_adapter {
>  #define KVM_IRQ_ROUTING_IRQCHIP 1
>  #define KVM_IRQ_ROUTING_MSI 2
>  #define KVM_IRQ_ROUTING_S390_ADAPTER 3
> +#define KVM_IRQ_ROUTING_EXTENDED_MSI 4
>  
>  struct kvm_irq_routing_entry {
>   __u32 gsi;
> 
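Turning `pad` into an anonymous union with `devid` is meant to leave the ABI unchanged. The sketch below checks that with C11 static assertions on a local copy of the struct (using uint32_t in place of __u32; `model_irq_routing_msi` is a hypothetical name), and adds a hypothetical `pci_bdf_devid()` helper showing the usual 16-bit PCI requester-ID layout behind the "BDF in the lower 16 bits" wording: bus in bits 15:8, device in bits 7:3, function in bits 2:0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of kvm_irq_routing_msi as modified by this patch. */
struct model_irq_routing_msi {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	union {
		uint32_t pad;
		uint32_t devid;
	};
};

/* The union must not change the ABI: same size, devid where pad was. */
static_assert(sizeof(struct model_irq_routing_msi) == 16, "size changed");
static_assert(offsetof(struct model_irq_routing_msi, devid) == 12,
	      "devid must overlay pad");

/* Encode a PCI bus/device/function triple as a 16-bit requester ID. */
static uint32_t pci_bdf_devid(unsigned int bus, unsigned int dev,
			      unsigned int fn)
{
	return ((bus & 0xffu) << 8) | ((dev & 0x1fu) << 3) | (fn & 0x7u);
}
```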



Re: [PATCH] virt: IRQ bypass manager

2015-07-10 Thread Paolo Bonzini


On 10/07/2015 19:52, Alex Williamson wrote:
> Perhaps if a second consumer comes along that would be justification for
> tying it elsewhere in the build system.  ARM will obviously need to do
> similar.  Are there better options?
> 
> Also, there's no maintainer for the top level virt/ directory.  Paolo,
> would you feel comfortable taking this, maybe with some additional acks?

That's okay; alternatively, we can share it since after all you wrote
most of it.

Paolo

> That would probably be the most convenient for merging the consumer code.
> Thanks,


[PATCH] virt: IRQ bypass manager

2015-07-10 Thread Alex Williamson
When a physical I/O device is assigned to a virtual machine through
facilities like VFIO and KVM, the interrupt for the device generally
bounces through the host system before being injected into the VM.
However, hardware technologies exist that often allow the host to be
bypassed for some of these scenarios.  Intel Posted Interrupts allow
the specified physical edge interrupts to be directly injected into a
guest when delivered to a physical processor while the vCPU is
running.  ARM IRQ Forwarding allows the hypervisor to handle level
triggered device interrupts as edge interrupts, by giving the guest
control of de-asserting and unmasking the interrupt line.

The IRQ bypass manager here is meant to provide the shim to connect
interrupt producers, generally the host physical device driver, with
interrupt consumers, generally the hypervisor, in order to configure
these bypass mechanisms.  To do this, we base the connection on a
shared, opaque token.  For KVM-VFIO this is expected to be an
eventfd_ctx since this is the connection we already use to connect an
eventfd to an irqfd on the in-kernel path.  When a producer and
consumer with matching tokens are found, callbacks via both registered
participants allow the bypass facilities to be automatically enabled.

Signed-off-by: Alex Williamson 
Signed-off-by: Eric Auger 
---

Changes:
 - Moved to virt/lib/
 - Dropped update callback
 - Filled in missing documentation
 - @resume callback renamed to @stop
 - Only @start/@stop are optional

One of the difficulties with moving this code to virt/lib is that nobody
builds it by default.  Thinking about this for a bit, it really needs a
consumer to be useful and KVM is currently the only consumer, so I tested
with the following:

 --- a/arch/x86/kvm/Kconfig
 +++ b/arch/x86/kvm/Kconfig
 @@ -100,5 +101,6 @@ config KVM_DEVICE_ASSIGNMENT
  # the virtualization menu.
  source drivers/vhost/Kconfig
  source drivers/lguest/Kconfig
 +source virt/lib/Kconfig
  
  endif # VIRTUALIZATION
 --- a/arch/x86/kvm/Makefile
 +++ b/arch/x86/kvm/Makefile
 @@ -20,3 +20,5 @@ kvm-amd-y	+= svm.o
  obj-$(CONFIG_KVM) += kvm.o
  obj-$(CONFIG_KVM_INTEL)   += kvm-intel.o
  obj-$(CONFIG_KVM_AMD) += kvm-amd.o
 +
 +obj-y += ../../../virt/lib/

Perhaps if a second consumer comes along that would be justification for
tying it elsewhere in the build system.  ARM will obviously need to do
similar.  Are there better options?

Also, there's no maintainer for the top level virt/ directory.  Paolo,
would you feel comfortable taking this, maybe with some additional acks?
That would probably be the most convenient for merging the consumer code.
Thanks,
Alex

 include/linux/irqbypass.h |   90 +++
 virt/lib/Kconfig          |    2 +
 virt/lib/Makefile         |    1 +
 virt/lib/irqbypass.c      |  212 +
 4 files changed, 305 insertions(+)
 create mode 100644 include/linux/irqbypass.h
 create mode 100644 virt/lib/Kconfig
 create mode 100644 virt/lib/Makefile
 create mode 100644 virt/lib/irqbypass.c

diff --git a/include/linux/irqbypass.h b/include/linux/irqbypass.h
new file mode 100644
index 000..41df18d
--- /dev/null
+++ b/include/linux/irqbypass.h
@@ -0,0 +1,90 @@
+/*
+ * IRQ offload/bypass manager
+ *
+ * Copyright (C) 2015 Red Hat, Inc.
+ * Copyright (c) 2015 Linaro Ltd.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+#ifndef IRQBYPASS_H
+#define IRQBYPASS_H
+
+#include 
+
+struct irq_bypass_consumer;
+
+/*
+ * Theory of operation
+ *
+ * The IRQ bypass manager is a simple set of lists and callbacks that allows
+ * IRQ producers (ex. physical interrupt sources) to be matched to IRQ
+ * consumers (ex. virtualization hardware that allows IRQ bypass or offload)
+ * via a shared token (ex. eventfd_ctx).  Producers and consumers register
+ * independently.  When a token match is found, the optional @stop callback
+ * will be called for each participant.  The pair will then be connected via
+ * the @add_* callbacks, and finally the optional @start callback will allow
+ * any final coordination.  When either participant is unregistered, the
+ * process is repeated using the @del_* callbacks in place of the @add_*
+ * callbacks.  Match tokens must be unique per producer/consumer, 1:N pairings
+ * are not supported.
+ */
+
+/**
+ * struct irq_bypass_producer - IRQ bypass producer definition
+ * @node: IRQ bypass manager private list management
+ * @token: opaque token to match between producer and consumer
+ * @irq: Linux IRQ number for the producer device
+ * @add_consumer: Connect the IRQ producer to an IRQ consumer
+ * @del_consumer: Disconnect the IRQ producer from an IRQ consumer
+ * @stop: Perform any quiesce operations necessary prior to add/del (optional)
+ * @start: Perform any startup operations necessary after add/del (optional)
+ 
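The connection sequence described in the theory-of-operation comment can be modeled in a few lines. Everything here is hypothetical and simplified: a single `struct model_part` stands in for both the producer and consumer structures, there are no lists or locking, and rollback through the @del_* callbacks after a failed add is omitted. The `demo_*` callbacks only record the order in which they run.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a producer or consumer; not the real API. */
struct model_part {
	const void *token;                  /* opaque match token, e.g. an eventfd_ctx */
	void (*stop)(struct model_part *);  /* optional */
	int (*add)(struct model_part *self, struct model_part *peer);
	void (*start)(struct model_part *); /* optional */
};

/* Pair a producer with a consumer whose token matches: stop both,
 * connect both through their add callbacks, then start both. */
static int model_connect(struct model_part *prod, struct model_part *cons)
{
	int ret;

	if (prod->token == NULL || prod->token != cons->token)
		return -1; /* no match: nothing is called */
	if (prod->stop)
		prod->stop(prod);
	if (cons->stop)
		cons->stop(cons);
	ret = prod->add(prod, cons);
	if (ret == 0)
		ret = cons->add(cons, prod);
	if (cons->start)
		cons->start(cons);
	if (prod->start)
		prod->start(prod);
	return ret;
}

/* Demo callbacks that append their step name to a trace buffer. */
static char trace[64];

static void record(const char *step)
{
	strncat(trace, step, sizeof(trace) - strlen(trace) - 1);
}

static void demo_stop(struct model_part *p) { (void)p; record("stop;"); }
static void demo_start(struct model_part *p) { (void)p; record("start;"); }

static int demo_add(struct model_part *self, struct model_part *peer)
{
	(void)self;
	(void)peer;
	record("add;");
	return 0;
}
```

With a producer that has both optional callbacks and a consumer that has neither, a successful match records `stop;add;add;start;`, and a consumer with a different token triggers no callbacks at all.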

[kvm-unit-tests PATCH 4/9] run_tests.sh: allow default unittests.cfg override

2015-07-10 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 run_tests.sh | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/run_tests.sh b/run_tests.sh
index 4246f1b60a733..4a9c54b0817cd 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -110,13 +110,14 @@ function usage()
 {
 cat > config.mak
 ;;
+u)
+config=$OPTARG
+;;
 *)
 exit
 ;;
-- 
2.4.3



[kvm-unit-tests PATCH 8/9] Makefile: change 'make install' to install standalone tests

2015-07-10 Thread Andrew Jones
make install wasn't very useful without also installing scripts for
the qemu command lines. I suspect 'make install' was never used.
Installing standalone tests could be useful though, so let's do
that instead.

Signed-off-by: Andrew Jones 
---
 Makefile | 6 +++---
 config/config-arm-common.mak | 2 --
 config/config-x86-common.mak | 2 --
 3 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/Makefile b/Makefile
index 7b1986b9133ad..def8ea25372e9 100644
--- a/Makefile
+++ b/Makefile
@@ -5,7 +5,7 @@ endif
 
 include config.mak
 
-DESTDIR := $(PREFIX)/share/qemu/tests
+DESTDIR := $(PREFIX)/share/kvm-unit-tests/tests
 
 .PHONY: arch_clean clean distclean cscope
 
@@ -67,9 +67,9 @@ $(LIBFDT_archive): $(addprefix $(LIBFDT_objdir)/,$(LIBFDT_OBJS))
 standalone: all
@scripts/mkallstandalone.sh
 
-install:
+install: standalone
mkdir -p $(DESTDIR)
-   install $(tests_and_config) $(DESTDIR)
+   install tests/* $(DESTDIR)
 
 clean: arch_clean
$(RM) lib/.*.d $(libcflat) $(cflatobjs)
diff --git a/config/config-arm-common.mak b/config/config-arm-common.mak
index 0674daaa476d7..698555d6a676f 100644
--- a/config/config-arm-common.mak
+++ b/config/config-arm-common.mak
@@ -64,8 +64,6 @@ arm_clean: libfdt_clean asm_offsets_clean
 
 ##
 
-tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg
-
 generated_files = $(asm-offsets)
 
 test_cases: $(generated_files) $(tests-common) $(tests)
diff --git a/config/config-x86-common.mak b/config/config-x86-common.mak
index d45c9a8e454d8..2ee7b7724bbaf 100644
--- a/config/config-x86-common.mak
+++ b/config/config-x86-common.mak
@@ -43,8 +43,6 @@ tests-common += api/dirty-log
 tests-common += api/dirty-log-perf
 endif
 
-tests_and_config = $(TEST_DIR)/*.flat $(TEST_DIR)/unittests.cfg
-
 test_cases: $(tests-common) $(tests)
 
 $(TEST_DIR)/%.o: CFLAGS += -std=gnu99 -ffreestanding -I lib -I lib/x86
-- 
2.4.3



[kvm-unit-tests PATCH 6/9] arm/unittests.cfg: make test names more friendly

2015-07-10 Thread Andrew Jones
Prepares for the mkstandalone script coming in a later patch.

Signed-off-by: Andrew Jones 
---
 arm/unittests.cfg | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index 32fb4bb6c146b..8fae8a714a83b 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -11,26 +11,26 @@
 # that the configured amount of memory (-m ) are correctly setup
 # by the framework.
 #
-[selftest::setup]
+[selftest-setup]
 file = selftest.flat
 smp = 2
 extra_params = -m 256 -append "setup smp=2 mem=256"
 groups = selftest
 
 # Test vector setup and exception handling (kernel mode).
-[selftest::vectors-kernel]
+[selftest-vectors-kernel]
 file = selftest.flat
 extra_params = -append "vectors-kernel"
 groups = selftest
 
 # Test vector setup and exception handling (user mode).
-[selftest::vectors-user]
+[selftest-vectors-user]
 file = selftest.flat
 extra_params = -append "vectors-user"
 groups = selftest
 
 # Test SMP support
-[selftest::smp]
+[selftest-smp]
 file = selftest.flat
 smp = $(getconf _NPROCESSORS_CONF)
 extra_params = -append "smp"
-- 
2.4.3



[kvm-unit-tests PATCH 5/9] unittests.cfg: use double quotes

2015-07-10 Thread Andrew Jones
Quotes, bash, and bash generation from bash are painful enough
without having to handle both single and double quotes. Let's
just handle double. This prepares for the mkstandalone script
coming in a later patch.

Signed-off-by: Andrew Jones 
---
 arm/unittests.cfg |  8 
 x86/unittests.cfg | 16 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/arm/unittests.cfg b/arm/unittests.cfg
index ee655b2678a4e..32fb4bb6c146b 100644
--- a/arm/unittests.cfg
+++ b/arm/unittests.cfg
@@ -14,24 +14,24 @@
 [selftest::setup]
 file = selftest.flat
 smp = 2
-extra_params = -m 256 -append 'setup smp=2 mem=256'
+extra_params = -m 256 -append "setup smp=2 mem=256"
 groups = selftest
 
 # Test vector setup and exception handling (kernel mode).
 [selftest::vectors-kernel]
 file = selftest.flat
-extra_params = -append 'vectors-kernel'
+extra_params = -append "vectors-kernel"
 groups = selftest
 
 # Test vector setup and exception handling (user mode).
 [selftest::vectors-user]
 file = selftest.flat
-extra_params = -append 'vectors-user'
+extra_params = -append "vectors-user"
 groups = selftest
 
 # Test SMP support
 [selftest::smp]
 file = selftest.flat
 smp = $(getconf _NPROCESSORS_CONF)
-extra_params = -append 'smp'
+extra_params = -append "smp"
 groups = selftest
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index a38544f77c056..bc49d33670287 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -27,44 +27,44 @@ smp = 3
 
 [vmexit_cpuid]
 file = vmexit.flat
-extra_params = -append 'cpuid'
+extra_params = -append "cpuid"
 groups = vmexit
 
 [vmexit_vmcall]
 file = vmexit.flat
-extra_params = -append 'vmcall'
+extra_params = -append "vmcall"
 groups = vmexit
 
 [vmexit_mov_from_cr8]
 file = vmexit.flat
-extra_params = -append 'mov_from_cr8'
+extra_params = -append "mov_from_cr8"
 groups = vmexit
 
 [vmexit_mov_to_cr8]
 file = vmexit.flat
-extra_params = -append 'mov_to_cr8'
+extra_params = -append "mov_to_cr8"
 groups = vmexit
 
 [vmexit_inl_pmtimer]
 file = vmexit.flat
-extra_params = -append 'inl_from_pmtimer'
+extra_params = -append "inl_from_pmtimer"
 groups = vmexit
 
 [vmexit_ipi]
 file = vmexit.flat
 smp = 2
-extra_params = -append 'ipi'
+extra_params = -append "ipi"
 groups = vmexit
 
 [vmexit_ipi_halt]
 file = vmexit.flat
 smp = 2
-extra_params = -append 'ipi_halt'
+extra_params = -append "ipi_halt"
 groups = vmexit
 
 [vmexit_ple_round_robin]
 file = vmexit.flat
-extra_params = -append 'ple_round_robin'
+extra_params = -append "ple_round_robin"
 groups = vmexit
 
 [access]
-- 
2.4.3



[kvm-unit-tests PATCH 3/9] run_tests.sh: add '-d' for dry-run

2015-07-10 Thread Andrew Jones
Add an option that allows us to just get the qemu command line
for each test. Running ./run_tests.sh -d will result in each
test saying it passed, and test.log will contain a list of each
qemu command line that would have been executed, had it not been
a dry-run.
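For illustration only, here is the run-script behavior described above sketched in C: always log the qemu command line, execute it only when DRYRUN is not "yes", and report success in the dry-run case. The function name `run_or_dry_run` and the use of system() are assumptions of this sketch. The patch itself does this in bash in arm/run and x86/run. It also passes DRYRUN through config.mak rather than the environment, so the getenv() call here is a simplification.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: log cmdline, then run it unless DRYRUN=yes.
 * Returns 0 for a dry run; otherwise qemu's exit status when it exited
 * normally, or the raw status from system() if it did not.
 */
static int run_or_dry_run(const char *cmdline, FILE *log)
{
    const char *dryrun = getenv("DRYRUN");
    int status;

    assert(cmdline != NULL && log != NULL);
    fprintf(log, "%s\n", cmdline);          /* like: echo $command "$@" */

    if (dryrun != NULL && strcmp(dryrun, "yes") == 0)
        return 0;                           /* dry run: report a pass */

    status = system(cmdline);               /* like: $command "$@" */
    if (status != -1 && WIFEXITED(status))
        status = WEXITSTATUS(status);       /* like: ret=$? */
    printf("Return value from qemu: %d\n", status);
    return status;
}
```

Because the dry-run branch returns 0 for every test, each test reports a pass, and the log ends up holding only the command lines.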

Signed-off-by: Andrew Jones 
---
 arm/run  | 12 +++-
 run_tests.sh | 41 +
 x86/run  | 11 +++
 3 files changed, 39 insertions(+), 25 deletions(-)

diff --git a/arm/run b/arm/run
index e50709dcd12f4..3eacf3b77d625 100755
--- a/arm/run
+++ b/arm/run
@@ -48,9 +48,11 @@ fi
 
 command="$qemu $M -cpu $processor $chr_testdev"
 command+=" -display none -serial stdio -kernel"
-
 echo $command "$@"
-$command "$@"
-ret=$?
-echo Return value from qemu: $ret
-exit $ret
+
+if [ "$DRYRUN" != "yes" ]; then
+   $command "$@"
+   ret=$?
+   echo Return value from qemu: $ret
+   exit $ret
+fi
diff --git a/run_tests.sh b/run_tests.sh
index 6f00aa495744c..4246f1b60a733 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -28,22 +28,24 @@ function run()
 return
 fi
 
-if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
-echo "skip $1 ($arch only)"
-return
-fi
-
-# check a file for a particular value before running a test
-# the check line can contain multiple files to check separated by a space
-# but each check parameter needs to be of the form =
-for check_param in ${check[@]}; do
-path=${check_param%%=*}
-value=${check_param#*=}
-if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
-echo "skip $1 ($path not equal to $value)"
+if [ "$DRYRUN" != "yes" ]; then
+if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
+echo "skip $1 ($arch only)"
 return
 fi
-done
+
+# Check a file for a particular value before running a test. The
+# check line can contain multiple files to check, separated by a space,
+# but each check parameter needs to be of the form =
+for check_param in ${check[@]}; do
+path=${check_param%%=*}
+value=${check_param#*=}
+if [ "$path" ] && [ "$(cat $path)" != "$value" ]; then
+echo "skip $1 ($path not equal to $value)"
+return
+fi
+done
+fi
 
 cmdline="./$TEST_DIR-run $kernel -smp $smp $opts"
 if [ $verbose != 0 ]; then
@@ -108,11 +110,13 @@ function usage()
 {
 cat > config.mak
+;;
 *)
 exit
 ;;
 esac
 done
 
+source config.mak # reload, parsing options may have changed it
 run_all $config
+sed -i '/^DRYRUN=.*/d' config.mak
diff --git a/x86/run b/x86/run
index d00e8fece4b6d..38b56e9a6b531 100755
--- a/x86/run
+++ b/x86/run
@@ -54,7 +54,10 @@ fi
 
 command="${qemu} -enable-kvm $pc_testdev -vnc none -serial stdio $pci_testdev -kernel"
 echo ${command} "$@"
-${command} "$@"
-ret=$?
-echo Return value from qemu: $ret
-exit $ret
+
+if [ "$DRYRUN" != "yes" ]; then
+   ${command} "$@"
+   ret=$?
+   echo Return value from qemu: $ret
+   exit $ret
+fi
-- 
2.4.3



[kvm-unit-tests PATCH 7/9] scripts: introduce mk[all]standalone.sh

2015-07-10 Thread Andrew Jones
This is a super ugly bash script, wrapped around a couple of ugly
bash scripts, and it generates an ugly bash script. But it gets
the job done. What's the job? Take a unit test and generate a
standalone script that can be run on any appropriate system with
an appropriate qemu. This makes distributing single, pre-compiled
unit tests easy, even through email, which can be useful for
getting test results from a variety of users, or even for deploying
the unit test as a utility (if it works that way) in a distribution.
It works like shar, but doesn't use shar, as it's better to avoid the
dependency. mkstandalone.sh creates just one unit test;
mkallstandalone.sh does all of them (run with 'make standalone'). The
standalone test(s) are placed in ./tests.
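The generated script carries the test kernel inline as text (mkstandalone.sh runs `gzip - < $kernel | base64`), and that is what lets a binary survive being sent through email. The sketch below covers only the base64 step and is not code from the patch. It implements the standard RFC 4648 encoding in C, and `b64_encode` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/*
 * Encode len bytes from in as base64 into out, NUL-terminated.
 * out must have room for 4 * ((len + 2) / 3) + 1 bytes.
 * Returns the number of characters written (excluding the NUL).
 */
static size_t b64_encode(const unsigned char *in, size_t len, char *out)
{
    size_t i = 0, o = 0;

    assert(out != NULL && (in != NULL || len == 0));

    /* Each full 3-byte group becomes 4 output characters. */
    for (; i + 2 < len; i += 3) {
        unsigned long v = ((unsigned long)in[i] << 16) |
                          ((unsigned long)in[i + 1] << 8) | in[i + 2];
        out[o++] = b64_alphabet[(v >> 18) & 0x3f];
        out[o++] = b64_alphabet[(v >> 12) & 0x3f];
        out[o++] = b64_alphabet[(v >> 6) & 0x3f];
        out[o++] = b64_alphabet[v & 0x3f];
    }

    /* A trailing group of 1 or 2 bytes is padded with '='. */
    if (i < len) {
        unsigned long v = (unsigned long)in[i] << 16;

        if (i + 1 < len)
            v |= (unsigned long)in[i + 1] << 8;
        out[o++] = b64_alphabet[(v >> 18) & 0x3f];
        out[o++] = b64_alphabet[(v >> 12) & 0x3f];
        out[o++] = (i + 1 < len) ? b64_alphabet[(v >> 6) & 0x3f] : '=';
        out[o++] = '=';
    }

    out[o] = '\0';
    return o;
}
```

Every 3 input bytes become 4 printable characters, so the embedded kernel grows by about a third after compression. In exchange, the script stays plain text.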

Signed-off-by: Andrew Jones 
---
 Makefile   |   4 ++
 scripts/mkallstandalone.sh |  36 +++
 scripts/mkstandalone.sh| 110 +
 3 files changed, 150 insertions(+)
 create mode 100755 scripts/mkallstandalone.sh
 create mode 100755 scripts/mkstandalone.sh

diff --git a/Makefile b/Makefile
index 4f28f072ae3d7..7b1986b9133ad 100644
--- a/Makefile
+++ b/Makefile
@@ -64,6 +64,9 @@ $(LIBFDT_archive): $(addprefix $(LIBFDT_objdir)/,$(LIBFDT_OBJS))
 
 -include */.*.d */*/.*.d
 
+standalone: all
+   @scripts/mkallstandalone.sh
+
 install:
mkdir -p $(DESTDIR)
install $(tests_and_config) $(DESTDIR)
@@ -78,6 +81,7 @@ libfdt_clean:
 
 distclean: clean libfdt_clean
$(RM) lib/asm config.mak $(TEST_DIR)-run test.log msr.out cscope.*
+   $(RM) -r tests
 
 cscope: common_dirs = lib lib/libfdt lib/asm lib/asm-generic
 cscope:
diff --git a/scripts/mkallstandalone.sh b/scripts/mkallstandalone.sh
new file mode 100755
index 0..028ae79543b50
--- /dev/null
+++ b/scripts/mkallstandalone.sh
@@ -0,0 +1,36 @@
+#!/bin/bash
+
+kernel=
+testname=
+
+if [ ! -f config.mak ]; then
+   echo "run ./configure && make first. See ./configure -h"
+   exit
+fi
+source config.mak
+
+exec {fd}<$TEST_DIR/unittests.cfg
+
+while read -u $fd line; do
+   if [[ "$line" =~ ^\[(.*)\]$ ]]; then
+   if [ "$testname" ]; then
+   printf "%-40s(%s)\n" $testname $kernel
+   scripts/mkstandalone.sh $kernel $testname
+   fi
+   testname=${BASH_REMATCH[1]}
+   kernel=""
+   arch=""
+   elif [[ $line =~ ^file\ *=\ *(.*)$ ]]; then
+   kernel=$TEST_DIR/${BASH_REMATCH[1]}
+   elif [[ $line =~ ^arch\ *=\ *(.*)$ ]]; then
+   arch=${BASH_REMATCH[1]}
+   if [ -n "$arch" ] && [ "$arch" != "$ARCH" ]; then
+   testname=""
+   kernel=""
+   arch=""
+   fi
+   fi
+done
+[ "$testname" ] && printf "%-40s(%s)\n" $testname $kernel
+[ "$testname" ] && scripts/mkstandalone.sh $kernel $testname
+exec {fd}<&-
diff --git a/scripts/mkstandalone.sh b/scripts/mkstandalone.sh
new file mode 100755
index 0..b288c7cf9aaba
--- /dev/null
+++ b/scripts/mkstandalone.sh
@@ -0,0 +1,110 @@
+#!/bin/bash
+
+if [ -z "$1" ]; then
+   echo "usage: mkstandalone.sh  [testname]"
+   exit 1
+fi
+
+if [ ! -f config.mak ]; then
+   echo "run ./configure && make first. See ./configure -h"
+   exit
+fi
+source config.mak
+
+kernel=$1
+kernel_base=$(basename $kernel)
+testname=$2
+if [ -z "$testname" ]; then
+   testname=${kernel_base%.*}
+fi
+standalone="tests/${testname}"
+bin=$(mktemp)
+cfg=$(mktemp)
+cfg2=$(mktemp)
+arch=
+check=
+opts=
+
+mkdir -p tests
+trap "{ rm -f $bin $cfg $cfg2; exit 1; }" INT TERM
+trap "{ rm -f $bin $cfg $cfg2; }" EXIT
+
+gzip - < $kernel | base64 > $bin
+
+if grep -q "\[$testname\]" $TEST_DIR/unittests.cfg; then
+   sed -n "/\\[$testname\\]/,/^\\[/p" $TEST_DIR/unittests.cfg \
+   | awk '!/^\[/ || NR == 1' > $cfg
+   arch="$(grep ^arch $cfg | cut -d= -f2- | sed 's/^ *//')"
+   check="$(grep ^check $cfg | cut -d= -f2- | sed 's/^ *//')"
+   opts="$(grep ^extra_params $cfg | cut -d= -f2- | sed 's/^ *//')"
+   grep -ve ^arch -e ^check -e ^extra_params $cfg > $cfg2
+   cp -f $cfg2 $cfg
+else
+   echo "[$testname]" > $cfg
+   echo "file = $kernel_base" >> $cfg
+fi
+./run_tests.sh -d -u $cfg >& /dev/null
+qemu=$(cut -d' ' -f1 < test.log)
+cmdline=$(cut -d' ' -f2- < test.log)
+
+cat < $standalone
+#!/bin/bash
+scr=\$(mktemp)
+bin1=\$(mktemp)
+bin2=\$(mktemp)
+trap '{ rm -f \$scr \$bin1 \$bin2; exit 1; }' INT TERM
+trap '{ rm -f \$scr \$bin1 \$bin2; }' EXIT
+# = SCR ==
+cat - > \$scr << 'SHAR_EOF' &&
+#!/bin/bash
+
+standalone=\$1
+kernel=\$2
+
+ARCH=\`uname -m | sed -e s/i.86/i386/ | sed -e 's/arm.*/arm/'\`
+[ "\$ARCH" = "aarch64" ] && ARCH="arm64"
+
+qemu="\${QEMU:-$qemu}"
+cmdline_orig='$cmdline $opts'
+cmdline="\$(echo '$cmdline $opts' | sed s%$kernel%\$kernel%)"
+arch="$arch"
+check="$check"
+
+if [ -n "\$arch" ] && [ "\$arch" != "\$ARCH" ]; then
+

[kvm-unit-tests PATCH 1/9] x86/run: source config.mak

2015-07-10 Thread Andrew Jones
We'll use a variable from config.mak in a later patch. Try not to break
users who have gotten used to running 'cd x86; ./run test.flat', by also
looking in the parent directory. While at it, add this parent directory
check to arm too.

Signed-off-by: Andrew Jones 
---
 arm/run |  8 ++--
 x86/run | 10 ++
 2 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/arm/run b/arm/run
index 662a8564674a3..e50709dcd12f4 100755
--- a/arm/run
+++ b/arm/run
@@ -1,10 +1,14 @@
 #!/bin/bash
 
-if [ ! -f config.mak ]; then
+if [ ! -f config.mak ] && [ ! -f ../config.mak ]; then
echo run ./configure first. See ./configure -h
exit 2
 fi
-source config.mak
+if [ -f config.mak ]; then
+   source config.mak
+else
+   source ../config.mak
+fi
 processor="$PROCESSOR"
 
 qemu="${QEMU:-qemu-system-$ARCH_NAME}"
diff --git a/x86/run b/x86/run
index 5281fca2125d8..d00e8fece4b6d 100755
--- a/x86/run
+++ b/x86/run
@@ -2,6 +2,16 @@
 NOTFOUND=1
 TESTDEVNOTSUPP=2
 
+if [ ! -f config.mak ] && [ ! -f ../config.mak ]; then
+   echo run ./configure first. See ./configure -h
+   exit 2
+fi
+if [ -f config.mak ]; then
+   source config.mak
+else
+   source ../config.mak
+fi
+
 qemubinarysearch="${QEMU:-qemu-kvm qemu-system-x86_64}"
 
 for qemucmd in ${qemubinarysearch}
-- 
2.4.3



[kvm-unit-tests PATCH 0/9] Generate standalone tests

2015-07-10 Thread Andrew Jones
Add support to convert unit tests to standalone scripts that
can be run outside the framework. This is almost an RFC, but
it doesn't impact the current framework (except for 'make install',
but was that ever used?). The scripting is ugly, but I see value
in having easily distributable unit tests.

Testing: if you run all standalone tests, concatenating all output
to a file, then that file will match test.log after running
run_tests.sh. Additionally, all prechecks are preserved, i.e.
specific arch and 'check' conditions from unittests.cfg.

Thanks,
drew

Andrew Jones (9):
  x86/run: source config.mak
  run_tests.sh: remove blank line from start of log
  run_tests.sh: add '-d' for dry-run
  run_tests.sh: allow default unittests.cfg override
  unittests.cfg: use double quotes
  arm/unittests.cfg: make test names more friendly
  scripts: introduce mk[all]standalone.sh
  Makefile: change 'make install' to install standalone tests
  standalone: add documentation to README

 Makefile |  10 ++--
 README   |  10 
 arm/run  |  20 +---
 arm/unittests.cfg|  16 +++
 config/config-arm-common.mak |   2 -
 config/config-x86-common.mak |   2 -
 run_tests.sh |  47 +++---
 scripts/mkallstandalone.sh   |  36 ++
 scripts/mkstandalone.sh  | 110 +++
 x86/run  |  21 +++--
 x86/unittests.cfg|  16 +++
 11 files changed, 239 insertions(+), 51 deletions(-)
 create mode 100755 scripts/mkallstandalone.sh
 create mode 100755 scripts/mkstandalone.sh

-- 
2.4.3



[kvm-unit-tests PATCH 2/9] run_tests.sh: remove blank line from start of log

2015-07-10 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 run_tests.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/run_tests.sh b/run_tests.sh
index e48f1db049f81..6f00aa495744c 100755
--- a/run_tests.sh
+++ b/run_tests.sh
@@ -120,7 +120,7 @@ specify the appropriate qemu binary for ARCH-run.
 EOF
 }
 
-echo > test.log
+>test.log
 while getopts "g:hv" opt; do
 case $opt in
 g)
-- 
2.4.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[kvm-unit-tests PATCH 9/9] standalone: add documentation to README

2015-07-10 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 README | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/README b/README
index e9869d12bfa20..8bc966bdf1284 100644
--- a/README
+++ b/README
@@ -16,6 +16,16 @@ To select a specific qemu binary, specify the QEMU=
 environment variable, e.g.
   QEMU=/tmp/qemu/x86_64-softmmu/qemu-system-x86_64 ./x86-run ./x86/msr.flat
 
+To create/use standalone tests do
+  ./configure
+  make standalone
+  (send tests/some-test anywhere)
+  (go to anywhere)
+  ./some-test
+
+'make install' will install all tests in $PREFIX/share/kvm-unit-tests/tests,
+each as a standalone test.
+
 Directory structure:
 .: configure script, top-level Makefile, and run_tests.sh
 ./config:  collection of architecture dependent makefiles
-- 
2.4.3



Re: [PATCH] KVM: svm: remove KVM_QUIRK_CD_NW_CLEARED quirk

2015-07-10 Thread Xiao Guangrong



On 07/10/2015 08:01 PM, Paolo Bonzini wrote:

We can disable CD unconditionally when there is no assigned device.
KVM now forces guest PAT to all-writeback in that case, so it makes
sense to also force CR0.CD=0.

When there are assigned devices, emulate cache-disabled operation
through the page tables.  This behavior is consistent with VMX,
where CD/NW are not touched by vmentry/vmexit.

Note that buggy firmware that does not clear CD/NW is _seriously_
old: SeaBIOS for example has been doing it since October 2008.


Reviewed-by: Xiao Guangrong 


Re: [PATCH 2/4] KVM: SVM: use NPT page attributes

2015-07-10 Thread Xiao Guangrong



On 07/10/2015 06:47 PM, Paolo Bonzini wrote:



On 10/07/2015 03:19, Xiao Guangrong wrote:

yes, this is correct.  QEMU still does not have support for disabling
"quirks", so gCR0.CD is currently hidden on SVM.  I would like to
include this series in 4.2, while for 4.3 I will disable the quirk above
altogether (it is superseded by the way PAT is forced to all-WB).


That plan sounds good to me.

Will you drop disabled_quirks completely, or just enable it in QEMU? :)


I will drop this quirk completely.  Other quirks (well, there's just
one) will remain.


Makes sense.

The whole patchset looks good to me!

Reviewed-by: Xiao Guangrong 


Re: [PATCH] kvm/x86: add support for MONITOR_TRAP_FLAG

2015-07-10 Thread Mihai Donțu
On Friday 10 July 2015 13:28:26 Paolo Bonzini wrote:
> On 09/07/2015 21:49, Jan Kiszka wrote:
> >> >  CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
> >> > -CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
> >> > +CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_TRAP_FLAG 
> >> > | CPU_BASED_MONITOR_EXITING |
> > Overlong line.
> 
> Fixed and applied.

Thank you!

-- 
Mihai Donțu


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-10 Thread Bandan Das
Laszlo Ersek  writes:

> On 07/10/15 16:59, Paolo Bonzini wrote:
>> 
>> 
>> On 10/07/2015 16:57, Laszlo Ersek wrote:
> ... In any case, please understand that I'm not campaigning for this
> warning :) IIRC the warning was your (very welcome!) idea after I
> reported the problem; I'm just trying to ensure that the warning match
> the exact issue I encountered.

 Yup.  I think the right thing to do would be to hide memory above the
 limit.
>>> How so?
>>>
>>> - The stack would not be doing what the user asks for. Pass -m ,
>>> and the guest would silently see less memory. If the user found out,
>>> he'd immediately ask (or set out debugging) why. I think if the user's
>>> request cannot be satisfied, the stack should fail hard.
>> 
>> That's another possibility.  I think both of them are wrong depending on
>> _why_ you're using "-m " in the first place.
>> 
>> Considering that this really happens (on Xeons) only for 1TB+ guests,
>
> I reported this issue because I ran into it with a ~64GB guest. From my
> /proc/cpuinfo:
>
> model name  : Intel(R) Core(TM) i7 CPU   M 620  @ 2.67GHz
> address sizes   : 36 bits physical, 48 bits virtual
>
> I was specifically developing 64GB+ support for OVMF, and this
> limitation caused me to think that there was a bug in my OVMF patches.
> (There wasn't.) An error message from QEMU, advising me to turn off EPT,
> would have saved me many hours.

Right, I specifically reserved a system with 36 physical address bits to
reproduce this, and it was very easy to reproduce. If it's a hardware bug,
I would say it's a very annoying one (if not a serious one). I wonder if
Intel folks can chime in.

> Thanks
> Laszlo
>
>> it's probably just for debugging and then hiding the memory makes some
>> sense.
Actually, I agree with Laszlo here. Hiding memory amounts to forcing the
user to use less for the -m argument, just as failing does. But failing,
and letting the user fix it himself, can save hours of debugging.

Regards,
The confused teenager who can't make up his mind.

>> Paolo
>> 


Re: [PATCH] KVM: svm: remove KVM_QUIRK_CD_NW_CLEARED quirk

2015-07-10 Thread Joerg Roedel
On Fri, Jul 10, 2015 at 02:01:33PM +0200, Paolo Bonzini wrote:
> We can disable CD unconditionally when there is no assigned device.
> KVM now forces guest PAT to all-writeback in that case, so it makes
> sense to also force CR0.CD=0.
> 
> When there are assigned devices, emulate cache-disabled operation
> through the page tables.  This behavior is consistent with VMX,
> where CD/NW are not touched by vmentry/vmexit.
> 
> Note that buggy firmware that does not clear CD/NW is _seriously_
> old: SeaBIOS for example has been doing it since October 2008.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/svm.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)

Looks good to me.

Reviewed-by: Joerg Roedel 



Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-10 Thread Laszlo Ersek
On 07/10/15 16:59, Paolo Bonzini wrote:
> 
> 
> On 10/07/2015 16:57, Laszlo Ersek wrote:
 ... In any case, please understand that I'm not campaigning for this
 warning :) IIRC the warning was your (very welcome!) idea after I
 reported the problem; I'm just trying to ensure that the warning match
 the exact issue I encountered.
>>>
>>> Yup.  I think the right thing to do would be to hide memory above the
>>> limit.
>> How so?
>>
>> - The stack would not be doing what the user asks for. Pass -m ,
>> and the guest would silently see less memory. If the user found out,
>> he'd immediately ask (or set out debugging) why. I think if the user's
>> request cannot be satisfied, the stack should fail hard.
> 
> That's another possibility.  I think both of them are wrong depending on
> _why_ you're using "-m " in the first place.
> 
> Considering that this really happens (on Xeons) only for 1TB+ guests,

I reported this issue because I ran into it with a ~64GB guest. From my
/proc/cpuinfo:

model name  : Intel(R) Core(TM) i7 CPU   M 620  @ 2.67GHz
address sizes   : 36 bits physical, 48 bits virtual

I was specifically developing 64GB+ support for OVMF, and this
limitation caused me to think that there was a bug in my OVMF patches.
(There wasn't.) An error message from QEMU, advising me to turn off EPT,
would have saved me many hours.
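Here are the numbers for the case above. 36 physical address bits cover 2^36 bytes = 64 GiB of physical address space, which is roughly the size of the guest being tested. Suppose the PC machine type relocates part of the RAM above 4 GiB to make room for the PCI hole. That is an assumption about the board layout, not something stated in this thread. In that case the guest's highest physical address ends up above 64 GiB even though its RAM is only about 64 GB. The C sketch below uses hypothetical helper names (not a KVM or QEMU interface). It parses the /proc/cpuinfo line quoted above and checks a guest-physical range against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse a /proc/cpuinfo line such as
 * "address sizes\t: 36 bits physical, 48 bits virtual".
 * Returns the number of physical address bits, or 0 if the line does
 * not have that form.
 */
static unsigned parse_phys_bits(const char *line)
{
    unsigned phys = 0;

    /* Whitespace in the format also matches the tab after "sizes". */
    if (sscanf(line, "address sizes : %u bits physical", &phys) != 1)
        return 0;
    return phys;
}

/*
 * True if all guest-physical addresses in [0, top_gpa) are below
 * 2^phys_bits, i.e. representable with phys_bits bits.
 */
static bool gpa_range_fits(uint64_t top_gpa, unsigned phys_bits)
{
    assert(phys_bits > 0 && phys_bits < 64);
    return top_gpa <= (UINT64_C(1) << phys_bits);
}
```

This only reflects the host CPU. As Paolo notes elsewhere in this thread, a limit reported by KVM would differ under shadow paging (e.g. 48), so a CPUID- or cpuinfo-based check like this one is only meaningful when nested paging is in use.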

Thanks
Laszlo

> it's probably just for debugging and then hiding the memory makes some
> sense.
> 
> Paolo
> 



Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-10 Thread Paolo Bonzini


On 10/07/2015 16:57, Laszlo Ersek wrote:
> > > ... In any case, please understand that I'm not campaigning for this
> > > warning :) IIRC the warning was your (very welcome!) idea after I
> > > reported the problem; I'm just trying to ensure that the warning match
> > > the exact issue I encountered.
> > 
> > Yup.  I think the right thing to do would be to hide memory above the
> > limit.
> How so?
> 
> - The stack would not be doing what the user asks for. Pass -m ,
> and the guest would silently see less memory. If the user found out,
> he'd immediately ask (or set out debugging) why. I think if the user's
> request cannot be satisfied, the stack should fail hard.

That's another possibility.  I think both of them are wrong depending on
_why_ you're using "-m " in the first place.

Considering that this really happens (on Xeons) only for 1TB+ guests,
it's probably just for debugging and then hiding the memory makes some
sense.

Paolo


Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-10 Thread Laszlo Ersek
On 07/10/15 16:13, Paolo Bonzini wrote:
> 
> 
> On 09/07/2015 20:57, Laszlo Ersek wrote:
>>> Without EPT, you don't
>>> hit the processor limitation with your setup, but the user should
>>> nevertheless still be notified.
>>
>> I disagree.
> 
> FWIW, I also disagree (and it looks like Bandan disagrees with himself
> now :)).
> 
>>> In fact, I think shadow paging code should also emulate
>>> this behavior if the gpa is out of range.
>>
>> I disagree.
> 
> Same here.
> 
>> There is no "out of range" gpa. QEMU allocates enough memory, and it
>> should be completely transparent to the guest. The fact that it silently
>> breaks with nested paging if the host processor doesn't have enough
>> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
>> sure, but I suspect it's a hardware bug).
> 
> It's a hardware bug, possibly due to some limitations in the physical
> addresses that the TLB can store?  I guess KVM could detect the
> situation and fall back to slow shadow paging.
> 
>> ... In any case, please understand that I'm not campaigning for this
>> warning :) IIRC the warning was your (very welcome!) idea after I
>> reported the problem; I'm just trying to ensure that the warning match
>> the exact issue I encountered.
> 
> Yup.  I think the right thing to do would be to hide memory above the
> limit.

How so?

- The stack would not be doing what the user asks for. Pass -m ,
and the guest would silently see less memory. If the user found out,
he'd immediately ask (or set out debugging) why. I think if the user's
request cannot be satisfied, the stack should fail hard.

- Assuming the user didn't find out, and the guest just worked (with
less memory than the user asked for), the hidden portion of the
memory (which QEMU allocated nonetheless) would simply be wasted on
the host system, especially with overcommit_memory=2 (the most
prudent setting).

Thanks
Laszlo

>  A kernel patch to query the limit is definitely necessary, but
> it needs to return e.g. 48 for shadow paging (otherwise you could just
> use CPUID).  I'm not sure if the rest is possible with just QEMU, or it
> requires help from the firmware.  Probably yes.
> 
> Paolo
> 



[PATCH v2 15/15] KVM: arm64: enable ITS emulation as a virtual MSI controller

2015-07-10 Thread Andre Przywara
If userspace has provided a base address for the ITS register frame,
we enable the bits that advertise LPIs in the GICv3.
When the guest has enabled LPIs and the ITS, we enable the emulation
part by initializing the ITS data structures and trapping on ITS
register frame accesses by the guest.
Also we enable the KVM_SIGNAL_MSI feature to allow userland to inject
MSIs into the guest. Not having enabled the ITS emulation will lead
to -ENODEV when trying to inject an MSI.
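The error path described here, combined with the per-model inject_msi hook added in patch 14/15, amounts to a dispatch through an operations table. The userspace sketch below shows that shape using hypothetical sim_* types (not the kernel structures). A GICv3 VM without an ITS fails with -ENODEV, as this mail states. The sketch also returns -ENODEV for a model that has no MSI hook at all (GICv2 in this series), but that second error code is an assumption: patch 14/15 only says "an error".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kvm_msi. */
struct sim_msi {
    uint32_t data;    /* event ID */
    uint32_t devid;   /* device ID */
    uint32_t flags;
};

struct sim_kvm;

/* Per-emulation-model hooks, in the spirit of struct vgic_vm_ops. */
struct sim_vm_ops {
    int (*inject_msi)(struct sim_kvm *kvm, const struct sim_msi *msi);
};

struct sim_kvm {
    struct sim_vm_ops ops;
    int has_its;          /* userspace provided an ITS base address */
    unsigned injected;    /* number of MSIs accepted */
};

/* GICv3 model hook: fails unless the ITS emulation was set up. */
static int sim_v3_inject_msi(struct sim_kvm *kvm, const struct sim_msi *msi)
{
    (void)msi;            /* translation omitted in this sketch */
    if (!kvm->has_its)
        return -ENODEV;
    kvm->injected++;
    return 0;
}

/* Generic entry point for a KVM_SIGNAL_MSI-style request. */
static int sim_signal_msi(struct sim_kvm *kvm, const struct sim_msi *msi)
{
    assert(kvm != NULL && msi != NULL);
    if (kvm->ops.inject_msi == NULL)
        return -ENODEV;   /* assumed error for models without MSI support */
    return kvm->ops.inject_msi(kvm, msi);
}
```

Keeping the hook per model means the generic ioctl path needs no knowledge of ITS details. A model gains MSI support just by filling in the pointer.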

Signed-off-by: Andre Przywara 
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/arm64/kvm/Kconfig|  1 +
 arch/arm64/kvm/reset.c|  6 ++
 include/kvm/arm_vgic.h|  6 ++
 virt/kvm/arm/its-emul.c   | 10 +-
 virt/kvm/arm/vgic-v3-emul.c   | 20 ++--
 virt/kvm/arm/vgic.c   |  8 
 7 files changed, 45 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index cb04095..1b53155 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2134,7 +2134,7 @@ after pausing the vcpu, but before it is resumed.
 4.71 KVM_SIGNAL_MSI
 
 Capability: KVM_CAP_SIGNAL_MSI
-Architectures: x86
+Architectures: x86 arm64
 Type: vm ioctl
 Parameters: struct kvm_msi (in)
 Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index bfffe8f..ff9722f 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -31,6 +31,7 @@ config KVM
select KVM_VFIO
select HAVE_KVM_EVENTFD
select HAVE_KVM_IRQFD
+   select HAVE_KVM_MSI
---help---
  Support hosting virtualized guest machines.
 
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 866502b..aff209e 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -64,6 +64,12 @@ int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
case KVM_CAP_ARM_EL1_32BIT:
r = cpu_has_32bit_el1();
break;
+   case KVM_CAP_MSI_DEVID:
+   if (!kvm)
+   r = -EINVAL;
+   else
+   r = kvm->arch.vgic.msis_require_devid;
+   break;
default:
r = 0;
}
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9e1abf9..f50081c 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -162,6 +162,7 @@ struct vgic_io_device {
 
 struct vgic_its {
boolenabled;
+   struct vgic_io_device   iodev;
spinlock_t  lock;
u64 cbaser;
int creadr;
@@ -180,6 +181,9 @@ struct vgic_dist {
/* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
u32 vgic_model;
 
+   /* Do injected MSIs require an additional device ID? */
+   boolmsis_require_devid;
+
int nr_cpus;
int nr_irqs;
 
@@ -371,4 +375,6 @@ static inline int vgic_v3_probe(struct device_node *vgic_node,
 }
 #endif
 
+int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 #endif
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index a1c12bb..b6caefd 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -1073,6 +1073,7 @@ int vits_init(struct kvm *kvm)
 {
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_its *its = &dist->its;
+   int ret;
 
dist->pendbaser = kmalloc(sizeof(u64) * dist->nr_cpus, GFP_KERNEL);
if (!dist->pendbaser)
@@ -1087,9 +1088,16 @@ int vits_init(struct kvm *kvm)
INIT_LIST_HEAD(&its->device_list);
INIT_LIST_HEAD(&its->collection_list);
 
+   ret = vgic_register_kvm_io_dev(kvm, dist->vgic_its_base,
+  KVM_VGIC_V3_ITS_SIZE, vgicv3_its_ranges,
+  -1, &its->iodev);
+   if (ret)
+   return ret;
+
its->enabled = false;
+   dist->msis_require_devid = true;
 
-   return -ENXIO;
+   return 0;
 }
 
 void vits_destroy(struct kvm *kvm)
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 30bf7035..9fd1238 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -8,7 +8,6 @@
  *
  * Limitations of the emulation:
  * (RAZ/WI: read as zero, write ignore, RAO/WI: read as one, write ignore)
- * - We do not support LPIs (yet). TYPER.LPIS is reported as 0 and is RAZ/WI.
  * - We do not support the message based interrupts (MBIs) triggered by
  *   writes to the GICD_{SET,CLR}SPI_* registers. TYPER.MBIS is reported as 0.
  * - We do not support the (optional) backwards compatibility feature.
@@ -87,10 +86,10 @@ static bool handle_mmio_ctlr(struct kvm_vcpu *vcpu,
 /*
  * As this implementation does not provide compati

[PATCH v2 14/15] KVM: arm64: implement MSI injection in ITS emulation

2015-07-10 Thread Andre Przywara
When userland wants to inject an MSI into the guest, we have to use
our data structures to find the LPI number and the VCPU to receive
the interrupt.
Use the wrapper functions to iterate the linked lists and find the
proper Interrupt Translation Table Entry. Then set the pending bit
in this ITTE to be later picked up by the LR handling code. Kick
the VCPU which is meant to handle this interrupt.
We provide a VGIC emulation model specific routine for the actual
MSI injection. The wrapper functions return an error for models not
(yet) implementing MSIs (like the GICv2 emulation).
We also provide the handler for the ITS "INT" command, which allows a
guest to trigger an MSI via the ITS command queue.
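The translation steps described above can be modeled in userspace C. The sketch below uses hypothetical sim_* types, with a linear search standing in for find_itte(). It mirrors part of the order of checks in vits_inject_msi() in this patch:

- A disabled ITS (or disabled LPIs, folded into one flag here) returns -EAGAIN.
- Unmapped events are silently dropped.
- The pending flag is set even for a disabled LPI.
- Only an enabled LPI produces a VCPU to kick.

The earlier -ENODEV and -EINVAL checks are omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_NR_VCPUS 4

/* Hypothetical, simplified stand-ins for the ITS structures. */
struct sim_collection {
    int target_vcpu;                 /* redistributor == VCPU index */
};

struct sim_itte {
    uint32_t devid;                  /* device ID */
    uint32_t eventid;                /* event ID (the MSI data) */
    uint32_t lpi;                    /* translated LPI number */
    bool enabled;                    /* LPI enabled by the guest */
    struct sim_collection *collection;
    bool pending[SIM_NR_VCPUS];      /* per-VCPU pending state */
};

/*
 * Translate (devid, eventid) and mark the LPI pending. On success,
 * *kick_vcpu is the VCPU to kick, or -1 if none needs kicking.
 */
static int sim_inject_msi(struct sim_itte *ittes, size_t n,
                          uint32_t devid, uint32_t eventid,
                          bool its_enabled, int *kick_vcpu)
{
    *kick_vcpu = -1;
    if (!its_enabled)
        return -EAGAIN;              /* ITS or LPIs not enabled yet */

    for (size_t i = 0; i < n; i++) {
        struct sim_itte *itte = &ittes[i];
        int cpu;

        if (itte->devid != devid || itte->eventid != eventid)
            continue;
        if (itte->collection == NULL)
            return 0;                /* unmapped: silently dropped */

        cpu = itte->collection->target_vcpu;
        assert(cpu >= 0 && cpu < SIM_NR_VCPUS);
        itte->pending[cpu] = true;   /* latched even if disabled */
        if (itte->enabled)
            *kick_vcpu = cpu;        /* caller would kick this VCPU */
        return 0;
    }
    return 0;                        /* no ITTE: silently dropped */
}
```

In the patch, the MSI data field plays the role of the event ID (find_itte(kvm, msi->devid, msi->data)), and the pending state lives in the ITTE itself, as patch 11/15 describes.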

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h  |  1 +
 virt/kvm/arm/its-emul.c | 65 +
 virt/kvm/arm/its-emul.h |  2 ++
 virt/kvm/arm/vgic-v3-emul.c |  1 +
 4 files changed, 69 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 323c33a..9e1abf9 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -149,6 +149,7 @@ struct vgic_vm_ops {
int (*map_resources)(struct kvm *, const struct vgic_params *);
bool(*queue_lpis)(struct kvm_vcpu *);
void(*unqueue_lpi)(struct kvm_vcpu *, int irq);
+   int (*inject_msi)(struct kvm *, struct kvm_msi *);
 };
 
 struct vgic_io_device {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 89534c6..a1c12bb 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -323,6 +323,55 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
 }
 
 /*
+ * Translates an incoming MSI request into the redistributor (=VCPU) and
+ * the associated LPI number. Sets the LPI pending bit and also marks the
+ * VCPU as having a pending interrupt.
+ */
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   struct vgic_its *its = &dist->its;
+   struct its_itte *itte;
+   int cpuid;
+   bool inject = false;
+   int ret = 0;
+
+   if (!vgic_has_its(kvm))
+   return -ENODEV;
+
+   if (!(msi->flags & KVM_MSI_VALID_DEVID))
+   return -EINVAL;
+
+   spin_lock(&its->lock);
+
+   if (!its->enabled || !dist->lpis_enabled) {
+   ret = -EAGAIN;
+   goto out_unlock;
+   }
+
+   itte = find_itte(kvm, msi->devid, msi->data);
+   /* Triggering an unmapped IRQ gets silently dropped. */
+   if (!itte || !itte->collection)
+   goto out_unlock;
+
+   cpuid = itte->collection->target_addr;
+   __set_bit(cpuid, itte->pending);
+   inject = itte->enabled;
+
+out_unlock:
+   spin_unlock(&its->lock);
+
+   if (inject) {
+   spin_lock(&dist->lock);
+   __set_bit(cpuid, dist->irq_pending_on_cpu);
+   spin_unlock(&dist->lock);
+   kvm_vcpu_kick(kvm_get_vcpu(kvm, cpuid));
+   }
+
+   return ret;
+}
+
+/*
  * Find all enabled and pending LPIs and queue them into the list
  * registers.
  * The dist lock is held by the caller.
@@ -787,6 +836,19 @@ static int vits_cmd_handle_movall(struct kvm *kvm, u64 *its_cmd)
return 0;
 }
 
+/* The INT command injects the LPI associated with that DevID/EvID pair. */
+static int vits_cmd_handle_int(struct kvm *kvm, u64 *its_cmd)
+{
+   struct kvm_msi msi = {
+   .data = its_cmd_get_id(its_cmd),
+   .devid = its_cmd_get_deviceid(its_cmd),
+   .flags = KVM_MSI_VALID_DEVID,
+   };
+
+   vits_inject_msi(kvm, &msi);
+   return 0;
+}
+
 static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd)
 {
u8 cmd = its_cmd_get_command(its_cmd);
@@ -817,6 +879,9 @@ static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd)
case GITS_CMD_MOVALL:
ret = vits_cmd_handle_movall(vcpu->kvm, its_cmd);
break;
+   case GITS_CMD_INT:
+   ret = vits_cmd_handle_int(vcpu->kvm, its_cmd);
+   break;
case GITS_CMD_INV:
ret = vits_cmd_handle_inv(vcpu->kvm, its_cmd);
break;
diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h
index 830524a..95e56a7 100644
--- a/virt/kvm/arm/its-emul.h
+++ b/virt/kvm/arm/its-emul.h
@@ -36,6 +36,8 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 int vits_init(struct kvm *kvm);
 void vits_destroy(struct kvm *kvm);
 
+int vits_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
+
 bool vits_queue_lpis(struct kvm_vcpu *vcpu);
 void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq);
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 4132c26..30bf7035 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -948,6 +948,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.init_model = vgic_v3_init_model;

[PATCH v2 11/15] KVM: arm64: handle pending bit for LPIs in ITS emulation

2015-07-10 Thread Andre Przywara
LPI numbers in a guest can be quite high, but they are usually
allocated very sparsely, so bitmaps and arrays indexed by LPI number
would waste a lot of memory for storing the virtual interrupt status.
Instead we use our equivalent of the "Interrupt Translation Table Entry"
(ITTE) to hold this extra status information for a virtual LPI.
As the normal VGIC code cannot use its fancy bitmaps to manage
pending LPIs, we provide a hook in the VGIC code to let the ITS
emulation handle the list register queueing itself.
LPIs are located in a separate number range (>= 8192), so
distinguishing them is easy. Since LPIs are always edge-triggered, we
get away with simpler IRQ handling.
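To illustrate the approach, here is a minimal userspace sketch (not the
patch code; struct sketch_itte, MAX_VCPUS and the helper names are
hypothetical): the ITTE carries one pending bit per VCPU, LPIs are
recognized purely by their number, and taking the pending state for
queueing only succeeds for enabled entries, so a disabled LPI stays
pending until it gets enabled.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

#define GIC_LPI_OFFSET 8192u /* first LPI INTID in GICv3 */
#define MAX_VCPUS 64u        /* hypothetical limit for this sketch */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical, simplified ITTE: one pending bit per VCPU. */
struct sketch_itte {
	uint32_t lpi;
	bool enabled;
	unsigned long pending[(MAX_VCPUS + BITS_PER_WORD - 1) / BITS_PER_WORD];
};

static bool is_lpi(uint32_t intid)
{
	return intid >= GIC_LPI_OFFSET;
}

static void itte_set_pending(struct sketch_itte *itte, unsigned int vcpu)
{
	assert(vcpu < MAX_VCPUS);
	itte->pending[vcpu / BITS_PER_WORD] |= 1UL << (vcpu % BITS_PER_WORD);
}

/*
 * Test-and-clear used when moving an LPI into a list register.
 * A disabled LPI keeps its pending bit and is not queued.
 */
static bool itte_take_pending(struct sketch_itte *itte, unsigned int vcpu)
{
	unsigned long mask = 1UL << (vcpu % BITS_PER_WORD);
	unsigned long *word = &itte->pending[vcpu / BITS_PER_WORD];

	if (!itte->enabled || !(*word & mask))
		return false;
	*word &= ~mask;
	return true;
}
```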

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h  |  2 ++
 virt/kvm/arm/its-emul.c | 71 
 virt/kvm/arm/its-emul.h |  3 ++
 virt/kvm/arm/vgic-v3-emul.c |  2 ++
 virt/kvm/arm/vgic.c | 72 ++---
 5 files changed, 133 insertions(+), 17 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 1648668..2a67a10 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -147,6 +147,8 @@ struct vgic_vm_ops {
int (*init_model)(struct kvm *);
void (*destroy_model)(struct kvm *);
int (*map_resources)(struct kvm *, const struct vgic_params *);
+   bool (*queue_lpis)(struct kvm_vcpu *);
+   void (*unqueue_lpi)(struct kvm_vcpu *, int irq);
 };
 
 struct vgic_io_device {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 7f217fa..b9c40d7 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -50,8 +50,26 @@ struct its_itte {
struct its_collection *collection;
u32 lpi;
u32 event_id;
+   bool enabled;
+   unsigned long *pending;
 };
 
+#define for_each_lpi(dev, itte, kvm) \
+   list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
+   list_for_each_entry(itte, &(dev)->itt, itte_list)
+
+static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
+{
+   struct its_device *device;
+   struct its_itte *itte;
+
+   for_each_lpi(device, itte, kvm) {
+   if (itte->lpi == lpi)
+   return itte;
+   }
+   return NULL;
+}
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
 /* The distributor lock is held by the VGIC MMIO handler. */
@@ -145,6 +163,59 @@ static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
return false;
 }
 
+/*
+ * Find all enabled and pending LPIs and queue them into the list
+ * registers.
+ * The dist lock is held by the caller.
+ */
+bool vits_queue_lpis(struct kvm_vcpu *vcpu)
+{
+   struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
+   struct its_device *device;
+   struct its_itte *itte;
+   bool ret = true;
+
+   if (!vgic_has_its(vcpu->kvm))
+   return true;
+   if (!its->enabled || !vcpu->kvm->arch.vgic.lpis_enabled)
+   return true;
+
+   spin_lock(&its->lock);
+   for_each_lpi(device, itte, vcpu->kvm) {
+   if (!itte->enabled || !test_bit(vcpu->vcpu_id, itte->pending))
+   continue;
+
+   if (!itte->collection)
+   continue;
+
+   if (itte->collection->target_addr != vcpu->vcpu_id)
+   continue;
+
+   __clear_bit(vcpu->vcpu_id, itte->pending);
+
+   ret &= vgic_queue_irq(vcpu, 0, itte->lpi);
+   }
+
+   spin_unlock(&its->lock);
+   return ret;
+}
+
+/* Called with the distributor lock held by the caller. */
+void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
+{
+   struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
+   struct its_itte *itte;
+
+   spin_lock(&its->lock);
+
+   /* Find the right ITTE and put the pending state back in there */
+   itte = find_itte_by_lpi(vcpu->kvm, lpi);
+   if (itte)
+   __set_bit(vcpu->vcpu_id, itte->pending);
+
+   spin_unlock(&its->lock);
+}
+
 static int vits_handle_command(struct kvm_vcpu *vcpu, u64 *its_cmd)
 {
return -ENODEV;
diff --git a/virt/kvm/arm/its-emul.h b/virt/kvm/arm/its-emul.h
index 472a6d0..cc5d5ff 100644
--- a/virt/kvm/arm/its-emul.h
+++ b/virt/kvm/arm/its-emul.h
@@ -33,4 +33,7 @@ void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 int vits_init(struct kvm *kvm);
 void vits_destroy(struct kvm *kvm);
 
+bool vits_queue_lpis(struct kvm_vcpu *vcpu);
+void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int irq);
+
 #endif
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 49be3c3..4132c26 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -948,6 +948,8 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.init_model = vgic_v3_init_model;
dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resource

[PATCH v2 03/15] KVM: arm/arm64: add emulation model specific destroy function

2015-07-10 Thread Andre Przywara
Currently we destroy the VGIC emulation in one function that takes care
of all emulated models. To be on par with init_model (which is model
specific), let's introduce a per-emulation-model destroy method, too.
Use it for a small piece of GICv3-specific code already; later it will
be handy for the ITS emulation.
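A minimal sketch of the optional-hook pattern described above, in plain
userspace C (sketch_vm, sketch_vm_ops and v3_destroy_model are
hypothetical stand-ins for the vgic structures): the generic teardown
only calls destroy_model if the emulation model set one, and the GICv3
hook clears its pointer after freeing it, so a second call is harmless.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_vm;

/* Hypothetical per-model operations, mirroring the init/destroy pairing. */
struct sketch_vm_ops {
	int (*init_model)(struct sketch_vm *vm);
	void (*destroy_model)(struct sketch_vm *vm); /* optional */
};

struct sketch_vm {
	struct sketch_vm_ops ops;
	unsigned int *irq_spi_mpidr; /* only allocated by the GICv3 model */
};

static void v3_destroy_model(struct sketch_vm *vm)
{
	free(vm->irq_spi_mpidr);
	vm->irq_spi_mpidr = NULL; /* avoid a double free on repeated teardown */
}

/* Generic teardown: call the model hook only if one is provided. */
static void sketch_destroy_model(struct sketch_vm *vm)
{
	assert(vm != NULL);
	if (vm->ops.destroy_model)
		vm->ops.destroy_model(vm);
}
```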

Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 
---
 include/kvm/arm_vgic.h  |  1 +
 virt/kvm/arm/vgic-v3-emul.c |  9 +
 virt/kvm/arm/vgic.c | 11 ++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 2ccfa9a..b18e2c5 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -144,6 +144,7 @@ struct vgic_vm_ops {
bool (*queue_sgi)(struct kvm_vcpu *, int irq);
void (*add_sgi_source)(struct kvm_vcpu *, int irq, int source);
int (*init_model)(struct kvm *);
+   void (*destroy_model)(struct kvm *);
int (*map_resources)(struct kvm *, const struct vgic_params *);
 };
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index e661e7f..d2eeb20 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -862,6 +862,14 @@ static int vgic_v3_init_model(struct kvm *kvm)
return 0;
 }
 
+static void vgic_v3_destroy_model(struct kvm *kvm)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+
+   kfree(dist->irq_spi_mpidr);
+   dist->irq_spi_mpidr = NULL;
+}
+
 /* GICv3 does not keep track of SGI sources anymore. */
 static void vgic_v3_add_sgi_source(struct kvm_vcpu *vcpu, int irq, int source)
 {
@@ -874,6 +882,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.queue_sgi = vgic_v3_queue_sgi;
dist->vm_ops.add_sgi_source = vgic_v3_add_sgi_source;
dist->vm_ops.init_model = vgic_v3_init_model;
+   dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resources = vgic_v3_map_resources;
 
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 394622c..cc8f5ed 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -100,6 +100,14 @@ int kvm_vgic_map_resources(struct kvm *kvm)
return kvm->arch.vgic.vm_ops.map_resources(kvm, vgic);
 }
 
+static void vgic_destroy_model(struct kvm *kvm)
+{
+   struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops;
+
+   if (vm_ops->destroy_model)
+   vm_ops->destroy_model(kvm);
+}
+
 /*
  * struct vgic_bitmap contains a bitmap made of unsigned longs, but
  * extracts u32s out of them.
@@ -1629,6 +1637,8 @@ void kvm_vgic_destroy(struct kvm *kvm)
struct kvm_vcpu *vcpu;
int i;
 
+   vgic_destroy_model(kvm);
+
kvm_for_each_vcpu(i, vcpu, kvm)
kvm_vgic_vcpu_destroy(vcpu);
 
@@ -1645,7 +1655,6 @@ void kvm_vgic_destroy(struct kvm *kvm)
}
kfree(dist->irq_sgi_sources);
kfree(dist->irq_spi_cpu);
-   kfree(dist->irq_spi_mpidr);
kfree(dist->irq_spi_target);
kfree(dist->irq_pending_on_cpu);
kfree(dist->irq_active_on_cpu);
-- 
2.3.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 10/15] KVM: arm64: add data structures to model ITS interrupt translation

2015-07-10 Thread Andre Przywara
The GICv3 Interrupt Translation Service (ITS) uses tables in memory
to allow sophisticated interrupt routing. It features device tables,
an interrupt table per device and a table connecting "collections" to
actual CPUs (a.k.a. redistributors in GICv3 lingo).
Since the interrupt numbers for the LPIs are allocated quite sparsely
and the range can be huge (8192 LPIs being the minimum), using
bitmaps or arrays for storing information is a waste of memory.
We use linked lists instead, which we iterate linearly. This works
very well as long as the actual number of LPIs/MSIs in the guest is
quite low. Should the number of LPIs grow beyond the point where
iterating through lists is acceptable, we can revisit this later and
use more efficient data structures.
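The lookup strategy can be sketched with plain singly linked lists
(hypothetical types; the patch itself uses the kernel's struct
list_head): finding the ITTE for an LPI walks every device and every
ITTE, so the cost grows with the total number of mapped interrupts
rather than with the size of the LPI number range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the patch's its_itte / its_device. */
struct sketch_itte {
	struct sketch_itte *next;
	uint32_t event_id;
	uint32_t lpi;
};

struct sketch_device {
	struct sketch_device *next;
	uint32_t device_id;
	struct sketch_itte *itt; /* this device's interrupt translation table */
};

/* Linear search over all devices and their ITTEs: O(total ITTEs). */
static struct sketch_itte *sketch_find_itte_by_lpi(struct sketch_device *devs,
						    uint32_t lpi)
{
	for (struct sketch_device *dev = devs; dev; dev = dev->next)
		for (struct sketch_itte *itte = dev->itt; itte; itte = itte->next)
			if (itte->lpi == lpi)
				return itte;
	return NULL;
}
```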

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h  |  3 +++
 virt/kvm/arm/its-emul.c | 48 
 2 files changed, 51 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index b432055..1648668 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define VGIC_NR_IRQS_LEGACY    256
 #define VGIC_NR_SGIS   16
@@ -162,6 +163,8 @@ struct vgic_its {
u64 cbaser;
int creadr;
int cwriter;
+   struct list_head device_list;
+   struct list_head collection_list;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index b498f06..7f217fa 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,6 +33,25 @@
 #include "vgic.h"
 #include "its-emul.h"
 
+struct its_device {
+   struct list_head dev_list;
+   struct list_head itt;
+   u32 device_id;
+};
+
+struct its_collection {
+   struct list_head coll_list;
+   u32 collection_id;
+   u32 target_addr;
+};
+
+struct its_itte {
+   struct list_head itte_list;
+   struct its_collection *collection;
+   u32 lpi;
+   u32 event_id;
+};
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
 /* The distributor lock is held by the VGIC MMIO handler. */
@@ -311,6 +331,9 @@ int vits_init(struct kvm *kvm)
 
spin_lock_init(&its->lock);
 
+   INIT_LIST_HEAD(&its->device_list);
+   INIT_LIST_HEAD(&its->collection_list);
+
its->enabled = false;
 
return -ENXIO;
@@ -320,11 +343,36 @@ void vits_destroy(struct kvm *kvm)
 {
struct vgic_dist *dist = &kvm->arch.vgic;
struct vgic_its *its = &dist->its;
+   struct its_device *dev;
+   struct its_itte *itte;
+   struct list_head *dev_cur, *dev_temp;
+   struct list_head *cur, *temp;
 
if (!vgic_has_its(kvm))
return;
 
+   if (!its->device_list.next)
+   return;
+
+   spin_lock(&its->lock);
+   list_for_each_safe(dev_cur, dev_temp, &its->device_list) {
+   dev = container_of(dev_cur, struct its_device, dev_list);
+   list_for_each_safe(cur, temp, &dev->itt) {
+   itte = (container_of(cur, struct its_itte, itte_list));
+   list_del(cur);
+   kfree(itte);
+   }
+   list_del(dev_cur);
+   kfree(dev);
+   }
+
+   list_for_each_safe(cur, temp, &its->collection_list) {
+   list_del(cur);
+   kfree(container_of(cur, struct its_collection, coll_list));
+   }
+
kfree(dist->pendbaser);
 
its->enabled = false;
+   spin_unlock(&its->lock);
 }
-- 
2.3.5



[PATCH v2 06/15] KVM: arm64: Introduce new MMIO region for the ITS base address

2015-07-10 Thread Andre Przywara
The ARM GICv3 ITS controller requires a separate register frame to
cover ITS specific registers. Add a new VGIC address type and store
the address in a field in the vgic_dist structure.
Provide a function to check whether userland has provided the address,
so ITS functionality can be guarded by that check.
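A sketch of the checks the ITS base address would plausibly go through,
assuming the 64K alignment and 64 KByte size documented in this patch;
the "set only once" (-EEXIST) and wrap-around checks are assumptions
modeled on the existing VGIC address handling, and all names here are
hypothetical:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_ADDR_UNDEF UINT64_MAX /* hypothetical "not set yet" marker */
#define SKETCH_SZ_64K 0x10000ULL
#define SKETCH_ITS_SIZE SKETCH_SZ_64K

/* Record addr as the ITS base, or return a negative errno value. */
static int sketch_set_its_base(uint64_t *its_base, uint64_t addr)
{
	if (addr & (SKETCH_SZ_64K - 1))
		return -EINVAL; /* must be 64K aligned */
	if (addr + SKETCH_ITS_SIZE < addr)
		return -EINVAL; /* region would wrap around */
	if (*its_base != SKETCH_ADDR_UNDEF)
		return -EEXIST; /* may only be set once */
	*its_base = addr;
	return 0;
}

/* Counterpart of the "has userland provided the address" check. */
static bool sketch_has_its(uint64_t its_base)
{
	return its_base != SKETCH_ADDR_UNDEF;
}
```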

Signed-off-by: Andre Przywara 
---
 Documentation/virtual/kvm/devices/arm-vgic.txt |  9 +
 arch/arm64/include/uapi/asm/kvm.h  |  2 ++
 include/kvm/arm_vgic.h |  3 +++
 virt/kvm/arm/vgic-v3-emul.c|  2 ++
 virt/kvm/arm/vgic.c| 16 
 virt/kvm/arm/vgic.h|  1 +
 6 files changed, 33 insertions(+)

diff --git a/Documentation/virtual/kvm/devices/arm-vgic.txt b/Documentation/virtual/kvm/devices/arm-vgic.txt
index 3fb9054..ec715f9e 100644
--- a/Documentation/virtual/kvm/devices/arm-vgic.txt
+++ b/Documentation/virtual/kvm/devices/arm-vgic.txt
@@ -39,6 +39,15 @@ Groups:
   Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
   This address needs to be 64K aligned.
 
+KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit)
+  Base address in the guest physical address space of the GICv3 ITS
+  control register frame. The ITS allows MSI(-X) interrupts to be
+  injected into guests. This extension is optional, if the kernel
+  does not support the ITS, the call returns -ENODEV.
+  This memory is solely for the guest to access the ITS control
+  registers and does not cover the ITS translation register.
+  Only valid for KVM_DEV_TYPE_ARM_VGIC_V3.
+  This address needs to be 64K aligned and the region covers 64 KByte.
 
   KVM_DEV_ARM_VGIC_GRP_DIST_REGS
   Attributes:
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index d268320..a89b407c 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -81,9 +81,11 @@ struct kvm_regs {
 /* Supported VGICv3 address types  */
 #define KVM_VGIC_V3_ADDR_TYPE_DIST 2
 #define KVM_VGIC_V3_ADDR_TYPE_REDIST   3
+#define KVM_VGIC_V3_ADDR_TYPE_ITS  4
 
 #define KVM_VGIC_V3_DIST_SIZE  SZ_64K
 #define KVM_VGIC_V3_REDIST_SIZE    (2 * SZ_64K)
+#define KVM_VGIC_V3_ITS_SIZE   SZ_64K
 
 #define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */
 #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index b18e2c5..3ee063b 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -178,6 +178,9 @@ struct vgic_dist {
phys_addr_t vgic_redist_base;
};
 
+   /* The base address of the ITS control register frame */
+   phys_addr_t vgic_its_base;
+
/* Distributor enabled */
u32 enabled;
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index 1f42348..a8cf669 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -887,6 +887,7 @@ void vgic_v3_init_emulation(struct kvm *kvm)
 
dist->vgic_dist_base = VGIC_ADDR_UNDEF;
dist->vgic_redist_base = VGIC_ADDR_UNDEF;
+   dist->vgic_its_base = VGIC_ADDR_UNDEF;
 
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
 }
@@ -1059,6 +1060,7 @@ static int vgic_v3_has_attr(struct kvm_device *dev,
return -ENXIO;
case KVM_VGIC_V3_ADDR_TYPE_DIST:
case KVM_VGIC_V3_ADDR_TYPE_REDIST:
+   case KVM_VGIC_V3_ADDR_TYPE_ITS:
return 0;
}
break;
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 59f1801..15e447f 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -930,6 +930,16 @@ int vgic_register_kvm_io_dev(struct kvm *kvm, gpa_t base, int len,
return ret;
 }
 
+bool vgic_has_its(struct kvm *kvm)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+
+   if (dist->vgic_model != KVM_DEV_TYPE_ARM_VGIC_V3)
+   return false;
+
+   return !IS_VGIC_ADDR_UNDEF(dist->vgic_its_base);
+}
+
 static int vgic_nr_shared_irqs(struct vgic_dist *dist)
 {
return dist->nr_irqs - VGIC_NR_PRIVATE_IRQS;
@@ -1927,6 +1937,12 @@ int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write)
block_size = KVM_VGIC_V3_REDIST_SIZE;
alignment = SZ_64K;
break;
+   case KVM_VGIC_V3_ADDR_TYPE_ITS:
+   type_needed = KVM_DEV_TYPE_ARM_VGIC_V3;
+   addr_ptr = &vgic->vgic_its_base;
+   block_size = KVM_VGIC_V3_ITS_SIZE;
+   alignment = SZ_64K;
+   break;
 #endif
default:
r = -ENODEV;
diff --git a/virt/kvm/arm/vgic.h b/virt/kvm/arm/vgic.h
index 0df74cb..a093f5c 100644
--- a/virt/kvm/arm/vgic.h
+++ b/virt/kvm/arm/vgic.h
@@ -136,5 +136,6 @@ int vgic_get_common_attr(struct kvm_device *dev, struct 
k

[PATCH v2 12/15] KVM: arm64: sync LPI configuration and pending tables

2015-07-10 Thread Andre Przywara
The LPI configuration and pending tables of the GICv3 LPIs are held
in tables in (guest) memory. To achieve reasonable performance, we
cache this data in our own data structures, so we need to sync those
two views from time to time. This behaviour is well described in the
GICv3 spec and is also exercised by hardware, so the sync points are
well known.

Provide functions that read the guest memory and store the
information from the configuration and pending tables in the kernel.
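As a sketch of what one byte of the LPI configuration table carries (per
the GICv3 architecture: priority in bits [7:2], enable in bit 0), and of
how an LPI number maps to a CHUNK_SIZE-sized read when the table is
indexed starting at LPI 8192; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define GIC_LPI_OFFSET 8192u
#define SKETCH_PROP_ENABLED 0x01u   /* bit 0: LPI enabled */
#define SKETCH_PROP_PRIO_MASK 0xfcu /* bits [7:2]: priority */
#define CHUNK_SIZE 4096u            /* same chunk size as the patch */

struct sketch_lpi_cfg {
	uint8_t priority;
	bool enabled;
};

/* Decode one byte of the LPI configuration (property) table. */
static struct sketch_lpi_cfg decode_lpi_prop(uint8_t prop)
{
	struct sketch_lpi_cfg cfg = {
		.priority = prop & SKETCH_PROP_PRIO_MASK,
		.enabled = (prop & SKETCH_PROP_ENABLED) != 0,
	};
	return cfg;
}

/* Which chunk, and which byte within it, holds this LPI's config byte. */
static void lpi_prop_location(uint32_t lpi, size_t *chunk, size_t *offset)
{
	size_t idx;

	assert(lpi >= GIC_LPI_OFFSET); /* the table starts at the first LPI */
	idx = lpi - GIC_LPI_OFFSET;
	*chunk = idx / CHUNK_SIZE;
	*offset = idx % CHUNK_SIZE;
}
```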

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h  |   2 +
 virt/kvm/arm/its-emul.c | 124 
 virt/kvm/arm/its-emul.h |   3 ++
 3 files changed, 129 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 2a67a10..323c33a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -167,6 +167,8 @@ struct vgic_its {
int cwriter;
struct list_head device_list;
struct list_head collection_list;
+   /* memory used for buffering guest's memory */
+   void *buffer_page;
 };
 
 struct vgic_dist {
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index b9c40d7..05245cb 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -50,6 +50,7 @@ struct its_itte {
struct its_collection *collection;
u32 lpi;
u32 event_id;
+   u8 priority;
bool enabled;
unsigned long *pending;
 };
@@ -70,8 +71,124 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
return NULL;
 }
 
+#define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
+#define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
+
+/* stores the priority and enable bit for a given LPI */
+static void update_lpi_config(struct kvm *kvm, struct its_itte *itte, u8 prop)
+{
+   itte->priority = LPI_PROP_PRIORITY(prop);
+   itte->enabled  = LPI_PROP_ENABLE_BIT(prop);
+}
+
+#define GIC_LPI_OFFSET 8192
+
+/* We scan the table in chunks the size of the smallest page size */
+#define CHUNK_SIZE 4096U
+
 #define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
 
+static int nr_idbits_propbase(u64 propbaser)
+{
+   int nr_idbits = (1U << (propbaser & 0x1f)) + 1;
+
+   return max(nr_idbits, INTERRUPT_ID_BITS_ITS);
+}
+
+/*
+ * Scan the whole LPI configuration table and put the LPI configuration
+ * data in our own data structures. This relies on the LPI being
+ * mapped before.
+ */
+static bool its_update_lpis_configuration(struct kvm *kvm)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   u8 *prop = dist->its.buffer_page;
+   u32 tsize;
+   gpa_t propbase;
+   int lpi = GIC_LPI_OFFSET;
+   struct its_itte *itte;
+   struct its_device *device;
+   int ret;
+
+   propbase = BASER_BASE_ADDRESS(dist->propbaser);
+   tsize = nr_idbits_propbase(dist->propbaser);
+
+   while (tsize > 0) {
+   int chunksize = min(tsize, CHUNK_SIZE);
+
+   ret = kvm_read_guest(kvm, propbase, prop, chunksize);
+   if (ret)
+   return false;
+
+   spin_lock(&dist->its.lock);
+   /*
+* Updating the status for all allocated LPIs. We catch
+* those LPIs that get disabled. We really don't care
+* about unmapped LPIs, as they need to be updated
+* later manually anyway once they get mapped.
+*/
+   for_each_lpi(device, itte, kvm) {
+   if (itte->lpi < lpi || itte->lpi >= lpi + chunksize)
+   continue;
+
+   update_lpi_config(kvm, itte, prop[itte->lpi - lpi]);
+   }
+   spin_unlock(&dist->its.lock);
+   tsize -= chunksize;
+   lpi += chunksize;
+   propbase += chunksize;
+   }
+
+   return true;
+}
+
+/*
+ * Scan the whole LPI pending table and sync the pending bit in there
+ * with our own data structures. This relies on the LPI being
+ * mapped before.
+ */
+static bool its_sync_lpi_pending_table(struct kvm_vcpu *vcpu)
+{
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   unsigned long *pendmask = dist->its.buffer_page;
+   u32 nr_lpis = VITS_NR_LPIS;
+   gpa_t pendbase;
+   int lpi = 0;
+   struct its_itte *itte;
+   struct its_device *device;
+   int ret;
+   int lpi_bit, nr_bits;
+
+   pendbase = BASER_BASE_ADDRESS(dist->pendbaser[vcpu->vcpu_id]);
+
+   while (nr_lpis > 0) {
+   nr_bits = min(nr_lpis, CHUNK_SIZE * 8);
+
+   ret = kvm_read_guest(vcpu->kvm, pendbase, pendmask,
+nr_bits / 8);
+   if (ret)
+   return false;
+
+   spin_lock(&dist->its.lock);
+   for_each_lpi(device, itte, vcpu->kvm) {
+   lpi_bit = itte->lpi - lpi;
+ 

[PATCH v2 09/15] KVM: arm64: implement basic ITS register handlers

2015-07-10 Thread Andre Przywara
Add emulation for some basic MMIO registers used in the ITS emulation.
This includes:
- GITS_{CTLR,TYPER,IIDR}
- ID registers
- GITS_{CBASER,CREADR,CWRITER}
  those implement the ITS command buffer handling

Most of the handlers are pretty straightforward, but CWRITER goes
some extra miles to allow fine-grained locking. The idea here
is to let only the first instance iterate through the command ring
buffer; CWRITER accesses on other VCPUs in the meantime will be picked
up by that first instance and handled as well. The ITS lock is thus only
held for very short periods of time and is dropped before the actual
command handler is called.
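The ring buffer walk itself, stripped of the locking scheme described
above, might look like the following single-threaded sketch (hypothetical
names; it assumes 32-byte commands and the GITS_CBASER.Size encoding of
"number of 4 KB pages minus one" in bits [7:0]):

```c
#include <stddef.h>
#include <stdint.h>

#define ITS_CMD_SIZE 32u /* each ITS command is 32 bytes (4 x u64) */

/* Queue size in bytes derived from CBASER.Size. */
static size_t its_cmd_buffer_size(uint64_t cbaser)
{
	return (size_t)((cbaser & 0xff) + 1) * 4096;
}

typedef void (*its_cmd_fn)(size_t offset, void *ctx);

/*
 * Handle every command between creadr and cwriter, wrapping at the end
 * of the ring, and return the new read offset. Out-of-range or
 * misaligned offsets are rejected and creadr is returned unchanged.
 */
static size_t its_process_ring(uint64_t cbaser, size_t creadr, size_t cwriter,
			       its_cmd_fn handle, void *ctx)
{
	size_t size = its_cmd_buffer_size(cbaser);

	if (creadr >= size || cwriter >= size ||
	    creadr % ITS_CMD_SIZE || cwriter % ITS_CMD_SIZE)
		return creadr;

	while (creadr != cwriter) {
		handle(creadr, ctx);
		creadr = (creadr + ITS_CMD_SIZE) % size;
	}
	return creadr;
}

/* Example handler that just counts the commands it sees. */
static void count_cmd(size_t offset, void *ctx)
{
	(void)offset;
	++*(unsigned int *)ctx;
}
```

With a single 4 KB page, starting at the last slot (offset 4064) and
writing up to offset 64 handles three commands: 4064, 0 and 32.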

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h |   3 +
 include/linux/irqchip/arm-gic-v3.h |   8 ++
 virt/kvm/arm/its-emul.c| 205 +
 virt/kvm/arm/its-emul.h|   1 +
 virt/kvm/arm/vgic-v3-emul.c|   2 +
 5 files changed, 219 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9e9d4aa..b432055 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -159,6 +159,9 @@ struct vgic_io_device {
 struct vgic_its {
bool enabled;
spinlock_t  lock;
+   u64 cbaser;
+   int creadr;
+   int cwriter;
 };
 
 struct vgic_dist {
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index df4e527..0b450c7 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -179,15 +179,23 @@
 #define GITS_BASER 0x0100
 #define GITS_IDREGS_BASE   0xffd0
 #define GITS_PIDR2 GICR_PIDR2
+#define GITS_PIDR4 0xffd0
+#define GITS_CIDR0 0xfff0
+#define GITS_CIDR1 0xfff4
+#define GITS_CIDR2 0xfff8
+#define GITS_CIDR3 0xfffc
 
 #define GITS_TRANSLATER    0x10040
 
 #define GITS_CTLR_ENABLE   (1U << 0)
 #define GITS_CTLR_QUIESCENT    (1U << 31)
 
+#define GITS_TYPER_PLPIS   (1UL << 0)
+#define GITS_TYPER_IDBITS_SHIFT    8
 #define GITS_TYPER_DEVBITS_SHIFT   13
 #define GITS_TYPER_DEVBITS(r)  ((((r) >> GITS_TYPER_DEVBITS_SHIFT) & 0x1f) + 1)
 #define GITS_TYPER_PTA (1UL << 19)
+#define GITS_TYPER_HWCOLLCNT_SHIFT 24
 
 #define GITS_CBASER_VALID  (1UL << 63)
 #define GITS_CBASER_nCnB   (0UL << 59)
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 659dd39..b498f06 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -32,10 +32,62 @@
 #include "vgic.h"
 #include "its-emul.h"
 
+#define BASER_BASE_ADDRESS(x) ((x) & 0xf000ULL)
+
+/* The distributor lock is held by the VGIC MMIO handler. */
 static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
  struct kvm_exit_mmio *mmio,
  phys_addr_t offset)
 {
+   struct vgic_its *its = &vcpu->kvm->arch.vgic.its;
+   u32 reg;
+   bool was_enabled;
+
+   switch (offset & ~3) {
+   case 0x00:  /* GITS_CTLR */
+   /* We never defer any command execution. */
+   reg = GITS_CTLR_QUIESCENT;
+   if (its->enabled)
+   reg |= GITS_CTLR_ENABLE;
+   was_enabled = its->enabled;
+   vgic_reg_access(mmio, &reg, offset & 3,
+   ACCESS_READ_VALUE | ACCESS_WRITE_VALUE);
+   its->enabled = !!(reg & GITS_CTLR_ENABLE);
+   return !was_enabled && its->enabled;
+   case 0x04:  /* GITS_IIDR */
+   reg = (PRODUCT_ID_KVM << 24) | (IMPLEMENTER_ARM << 0);
+   vgic_reg_access(mmio, ®, offset & 3,
+   ACCESS_READ_VALUE | ACCESS_WRITE_IGNORED);
+   break;
+   case 0x08:  /* GITS_TYPER */
+   /*
+* We use linear CPU numbers for redistributor addressing,
+* so GITS_TYPER.PTA is 0.
+* To avoid memory waste on the guest side, we keep the
+* number of IDBits and DevBits low for the time being.
+* This could later be made configurable by userland.
+* Since we have all collections in linked list, we claim
+* that we can hold all of the collection tables in our
+* own memory and that the ITT entry size is 1 byte (the
+* smallest possible one).
+*/
+   reg = GITS_TYPER_PLPIS;
+   reg |= 0xff << GITS_TYPER_HWCOLLCNT_SHIFT;
+   reg |= 0x0f << GITS_TYPER_DEVBITS_SHIFT;
+   reg |= 0x0f << GITS_TYPER_IDBITS_SHIFT;
+   vgic_reg_access(mmio, &reg, offset & 3,
+

[PATCH v2 13/15] KVM: arm64: implement ITS command queue command handlers

2015-07-10 Thread Andre Przywara
The connection between a device, an event ID, the LPI number and the
allocated CPU is stored in in-memory tables in a GICv3, but their
format is not specified by the spec. Instead, software uses a command
queue in a ring buffer to let the ITS implementation use its own
format.
Implement handlers for the various ITS commands and let them store
the requested relation into our own data structures.
To avoid kmallocs inside the ITS spinlock, we preallocate possibly
needed memory outside of the lock and free it if it turns out not to
be needed (mostly in error handling).
Error handling is very basic at this point, as we don't have a good
way of communicating errors to the guest (usually via an SError).
The INT command handler is missing at this point, as we only gain the
capability of actually injecting MSIs into the guest later on.
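For reference, here is a userspace sketch of the field extraction the
command handlers rely on. It assembles each 64-bit command word from
little-endian bytes instead of using le64_to_cpu(), and follows the same
word/shift/size layout as the its_cmd_get_*() macros in this patch; the
function and macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read 64-bit word 'word' (0..3) of a 32-byte command stored little-endian. */
static uint64_t its_cmd_word(const uint8_t cmd[32], int word)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--) /* most significant byte first */
		v = (v << 8) | cmd[word * 8 + i];
	return v;
}

/* Extract 'size' bits starting at bit 'shift' of the given word. */
static uint64_t its_cmd_field(const uint8_t cmd[32], int word, int shift,
			      int size)
{
	assert(size > 0 && size < 64);
	return (its_cmd_word(cmd, word) >> shift) & ((UINT64_C(1) << size) - 1);
}

#define sketch_get_command(c)    its_cmd_field(c, 0,  0,  8)
#define sketch_get_deviceid(c)   its_cmd_field(c, 0, 32, 32)
#define sketch_get_id(c)         its_cmd_field(c, 1,  0, 32)
#define sketch_get_collection(c) its_cmd_field(c, 2,  0, 16)
#define sketch_get_validbit(c)   its_cmd_field(c, 2, 63,  1)
```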

Signed-off-by: Andre Przywara 
---
 include/linux/irqchip/arm-gic-v3.h |   5 +-
 virt/kvm/arm/its-emul.c| 497 -
 virt/kvm/arm/its-emul.h|  11 +
 3 files changed, 511 insertions(+), 2 deletions(-)

diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index 0b450c7..80db4f6 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -253,7 +253,10 @@
  */
 #define GITS_CMD_MAPD  0x08
 #define GITS_CMD_MAPC  0x09
-#define GITS_CMD_MAPVI 0x0a
+#define GITS_CMD_MAPTI 0x0a
+/* older GIC documentation used MAPVI for this command */
+#define GITS_CMD_MAPVI GITS_CMD_MAPTI
+#define GITS_CMD_MAPI  0x0b
 #define GITS_CMD_MOVI  0x01
 #define GITS_CMD_DISCARD   0x0f
 #define GITS_CMD_INV   0x0c
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
index 05245cb..89534c6 100644
--- a/virt/kvm/arm/its-emul.c
+++ b/virt/kvm/arm/its-emul.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -55,6 +56,34 @@ struct its_itte {
unsigned long *pending;
 };
 
+static struct its_device *find_its_device(struct kvm *kvm, u32 device_id)
+{
+   struct vgic_its *its = &kvm->arch.vgic.its;
+   struct its_device *device;
+
+   list_for_each_entry(device, &its->device_list, dev_list)
+   if (device_id == device->device_id)
+   return device;
+
+   return NULL;
+}
+
+static struct its_itte *find_itte(struct kvm *kvm, u32 device_id, u32 event_id)
+{
+   struct its_device *device;
+   struct its_itte *itte;
+
+   device = find_its_device(kvm, device_id);
+   if (device == NULL)
+   return NULL;
+
+   list_for_each_entry(itte, &device->itt, itte_list)
+   if (itte->event_id == event_id)
+   return itte;
+
+   return NULL;
+}
+
 #define for_each_lpi(dev, itte, kvm) \
list_for_each_entry(dev, &(kvm)->arch.vgic.its.device_list, dev_list) \
list_for_each_entry(itte, &(dev)->itt, itte_list)
@@ -71,6 +100,19 @@ static struct its_itte *find_itte_by_lpi(struct kvm *kvm, int lpi)
return NULL;
 }
 
+static struct its_collection *find_collection(struct kvm *kvm, int coll_id)
+{
+   struct its_collection *collection;
+
+   list_for_each_entry(collection, &kvm->arch.vgic.its.collection_list,
+   coll_list) {
+   if (coll_id == collection->collection_id)
+   return collection;
+   }
+
+   return NULL;
+}
+
 #define LPI_PROP_ENABLE_BIT(p) ((p) & LPI_PROP_ENABLED)
 #define LPI_PROP_PRIORITY(p)   ((p) & 0xfc)
 
@@ -333,9 +375,461 @@ void vits_unqueue_lpi(struct kvm_vcpu *vcpu, int lpi)
spin_unlock(&its->lock);
 }
 
+static u64 its_cmd_mask_field(u64 *its_cmd, int word, int shift, int size)
+{
+   return (le64_to_cpu(its_cmd[word]) >> shift) & (BIT_ULL(size) - 1);
+}
+
+#define its_cmd_get_command(cmd)   its_cmd_mask_field(cmd, 0,  0,  8)
+#define its_cmd_get_deviceid(cmd)  its_cmd_mask_field(cmd, 0, 32, 32)
+#define its_cmd_get_id(cmd)    its_cmd_mask_field(cmd, 1,  0, 32)
+#define its_cmd_get_physical_id(cmd)   its_cmd_mask_field(cmd, 1, 32, 32)
+#define its_cmd_get_collection(cmd)    its_cmd_mask_field(cmd, 2,  0, 16)
+#define its_cmd_get_target_addr(cmd)   its_cmd_mask_field(cmd, 2, 16, 32)
+#define its_cmd_get_validbit(cmd)  its_cmd_mask_field(cmd, 2, 63,  1)
+
+/* The DISCARD command frees an Interrupt Translation Table Entry (ITTE). */
+static int vits_cmd_handle_discard(struct kvm *kvm, u64 *its_cmd)
+{
+   struct vgic_its *its = &kvm->arch.vgic.its;
+   u32 device_id;
+   u32 event_id;
+   struct its_itte *itte;
+   int ret = 0;
+
+   device_id = its_cmd_get_deviceid(its_cmd);
+   event_id = its_cmd_get_id(its_cmd);
+
+   spin_lock(&its->lock);
+   itte = find_itte(kvm, device_id, event_id);
+   if (!itte || !itte->collection) {
+  

[PATCH v2 04/15] KVM: arm/arm64: extend arch CAP checks to allow per-VM capabilities

2015-07-10 Thread Andre Przywara
KVM capabilities can be a per-VM property, though ARM/ARM64 currently
does not pass on the VM pointer to the architecture-specific
capability handlers.
Add a "struct kvm *" parameter to those functions to later allow proper
per-VM capability reporting.
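A hypothetical sketch of what the extra parameter enables
(sketch_kvm, SKETCH_CAP_MSI_DEVID and the has_its flag are made up for
illustration): the answer can now depend on how this particular VM was
configured, while a NULL VM pointer, which we assume corresponds to a
query on the /dev/kvm system fd, reports the capability as absent.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_CAP_MSI_DEVID 1000 /* hypothetical capability number */

struct sketch_kvm {
	bool has_its; /* e.g. userland configured an ITS for this VM */
};

/* Per-VM capability check; kvm may be NULL for system-wide queries. */
static int sketch_check_extension(const struct sketch_kvm *kvm, long ext)
{
	switch (ext) {
	case SKETCH_CAP_MSI_DEVID:
		return kvm && kvm->has_its;
	default:
		return 0;
	}
}
```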

Signed-off-by: Andre Przywara 
---
 arch/arm/include/asm/kvm_host.h   | 2 +-
 arch/arm/kvm/arm.c| 2 +-
 arch/arm64/include/asm/kvm_host.h | 2 +-
 arch/arm64/kvm/reset.c| 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index e896d2c..56cac05 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -213,7 +213,7 @@ static inline void __cpu_init_hyp_mode(phys_addr_t boot_pgd_ptr,
kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr);
 }
 
-static inline int kvm_arch_dev_ioctl_check_extension(long ext)
+static inline int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
return 0;
 }
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index bc738d2..7c65353 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -196,7 +196,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
r = KVM_MAX_VCPUS;
break;
default:
-   r = kvm_arch_dev_ioctl_check_extension(ext);
+   r = kvm_arch_dev_ioctl_check_extension(kvm, ext);
break;
}
return r;
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 2709db2..8d78a72 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -47,7 +47,7 @@
 
 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
-int kvm_arch_dev_ioctl_check_extension(long ext);
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
 
 struct kvm_arch {
/* The VMID generation used for the virt. memory system */
diff --git a/arch/arm64/kvm/reset.c b/arch/arm64/kvm/reset.c
index 0b43265..866502b 100644
--- a/arch/arm64/kvm/reset.c
+++ b/arch/arm64/kvm/reset.c
@@ -56,7 +56,7 @@ static bool cpu_has_32bit_el1(void)
return !!(pfr0 & 0x20);
 }
 
-int kvm_arch_dev_ioctl_check_extension(long ext)
+int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext)
 {
int r;
 
-- 
2.3.5



[PATCH v2 02/15] KVM: extend struct kvm_msi to hold a 32-bit device ID

2015-07-10 Thread Andre Przywara
The ARM GICv3 ITS MSI controller requires a device ID to be able to
assign the proper interrupt vector. On real hardware, this ID is
sampled from the bus. To be able to emulate an ITS controller, extend
the KVM MSI interface to let userspace provide such a device ID. For
PCI devices, the device ID is simply the 16-bit bus-device-function
triplet, which should be easily available to the userland tool.

Also, there is a new KVM capability which advertises whether the
current VM requires a device ID to be set along with the MSI data.
For now this capability is reported as unavailable everywhere; it will
be enabled later when ITS emulation is used.
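For PCI, the device ID is the 16-bit bus/device/function triplet. The sketch below is a minimal illustration that assumes a local copy of the struct kvm_msi layout proposed in this patch; the sketch_ names are hypothetical and are not the real <linux/kvm.h> definitions. It packs a BDF into devid and checks that the new field reuses four of the former 16 padding bytes, so the structure stays 32 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copy of the flag and struct proposed in this patch. */
#define SKETCH_MSI_VALID_DEVID (1U << 0)

struct sketch_kvm_msi {
	uint32_t address_lo;
	uint32_t address_hi;
	uint32_t data;
	uint32_t flags;
	uint32_t devid;		/* new: takes 4 of the former 16 pad bytes */
	uint8_t  pad[12];
};

/* The ioctl ABI must not change size or move existing fields. */
_Static_assert(sizeof(struct sketch_kvm_msi) == 32, "kvm_msi stays 32 bytes");
_Static_assert(offsetof(struct sketch_kvm_msi, devid) == 16, "devid follows flags");

/* PCI requester ID: bus in bits 15:8, device in 7:3, function in 2:0. */
static uint32_t sketch_pci_devid(unsigned bus, unsigned dev, unsigned fn)
{
	assert(bus < 256 && dev < 32 && fn < 8);
	return (bus << 8) | (dev << 3) | fn;
}

/* Fill an MSI request that carries a valid device ID. */
static struct sketch_kvm_msi sketch_make_msi(uint64_t addr, uint32_t data,
					     unsigned bus, unsigned dev,
					     unsigned fn)
{
	struct sketch_kvm_msi msi = {0};

	msi.address_lo = (uint32_t)addr;
	msi.address_hi = (uint32_t)(addr >> 32);
	msi.data = data;
	msi.flags = SKETCH_MSI_VALID_DEVID;
	msi.devid = sketch_pci_devid(bus, dev, fn);
	return msi;
}
```

For example, device 00:03.0 yields devid 0x18. A userland tool would then hand such a structure to the KVM_SIGNAL_MSI ioctl, setting the flag only if the VM reports KVM_CAP_MSI_DEVID.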

Signed-off-by: Andre Przywara 
Reviewed-by: Eric Auger 
---
 Documentation/virtual/kvm/api.txt | 12 ++--
 include/uapi/linux/kvm.h  |  5 -
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index a7926a9..cb04095 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2147,10 +2147,18 @@ struct kvm_msi {
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
-No flags are defined so far. The corresponding field must be 0.
+flags: KVM_MSI_VALID_DEVID: devid contains a valid value
+devid: If KVM_MSI_VALID_DEVID is set, contains a unique device identifier
+   for the device that wrote the MSI message.
+   For PCI, this is usually a BDF identifier in the lower 16 bits.
+
+The per-VM KVM_CAP_MSI_DEVID capability advertises the need to provide
+the device ID. If this capability is not set, userland cannot rely on
+the kernel to accept the KVM_MSI_VALID_DEVID flag.
 
 
 4.71 KVM_CREATE_PIT2
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 716ad4a..1c48def 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -817,6 +817,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_DISABLE_QUIRKS 116
 #define KVM_CAP_X86_SMM 117
 #define KVM_CAP_MULTI_ADDRESS_SPACE 118
+#define KVM_CAP_MSI_DEVID 119
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -968,12 +969,14 @@ struct kvm_one_reg {
__u64 addr;
 };
 
+#define KVM_MSI_VALID_DEVID	(1U << 0)
 struct kvm_msi {
__u32 address_lo;
__u32 address_hi;
__u32 data;
__u32 flags;
-   __u8  pad[16];
+   __u32 devid;
+   __u8  pad[12];
 };
 
 struct kvm_arm_device_addr {
-- 
2.3.5



[PATCH v2 00/15] KVM: arm64: GICv3 ITS emulation

2015-07-10 Thread Andre Przywara
Hi,

this respin tries to address all comments I got so far from the list.
Thanks to Eric, Pavel and Christoffer for the review!
The major change in this series is the reworked locking. The current
implementation is now much more fine-grained and avoids any calls to
functions that could possibly sleep when the ITS spinlock is held.
For many command handlers this change was straightforward; for some,
memory is pre-allocated to avoid kmalloc calls. The function that
actually triggers the command processing required some more trickery
(see patch 09/15).
Patches 01, 03, 06, 07 and 10 are mostly unchanged; see the changelog
below for a list of changes in the rest. On [1] there is a branch
called its-emul/v1-updated, which is v1 rebased on top of 4.2-rc1,
with all the changes from v2 in one single commit on top of it.

Please have a look and feel free to ask questions or give comments.
Also holler if I missed addressing a previous comment.
For this drop I haven't considered Eric's IRQ routing series too
much, but we should definitely coordinate those two series from this
point on.

Cheers,
Andre.

Changelog v1..v2
- fix issues when using non-ITS GICv3 emulation
- streamline frame address initialization (new patch 05/15)
- preallocate buffer memory for reading from guest's memory
- move locking into the actual command handlers
  - preallocate memory for new structures if needed
- use non-atomic __set_bit() and __clear_bit() when under the lock
- add INT command handler to allow LPI injection from the guest
- rewrite CWRITER handler to align with new locking scheme
- remove unneeded CONFIG_HAVE_KVM_MSI #ifdefs
- check memory table size against our LPI limit (65536 interrupts)
- observe initial gap of 1024 interrupts in pending table
- use term "configuration table" to be in line with the spec
- clarify and extend documentation on API extensions
- introduce new KVM_CAP_MSI_DEVID capability to advertise device ID requirement
- update, fix and add many comments
- minor style changes as requested by reviewers

---

The GICv3 ITS (Interrupt Translation Service) is a part of the
ARM GICv3 interrupt controller [3] used for implementing MSIs.
It specifies a new kind of interrupt (LPIs), which are mapped to
establish a connection between a device, its MSI payload value and
the target processor the IRQ is eventually delivered to.
In order to allow using MSIs in an ARM64 KVM guest, we emulate this
ITS widget in the kernel.
The ITS works by reading commands written by software (from the guest
in our case) into a (guest allocated) memory region and establishing
the mapping between a device, the MSI payload and the target CPU.
We parse these commands and update our internal data structures to
reflect those changes. On an MSI injection we iterate those
structures to learn the LPI number we have to inject.
For the time being we use simple lists to hold the data; this is
good enough for the small number of entries each of the components
currently has. Should this become a performance bottleneck in the
future, the lists can be replaced by arrays or trees.
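The list-based lookup on MSI injection can be pictured with a minimal sketch. The structures and names here are hypothetical and simplified: the sketch stores the target vCPU directly in each entry, whereas the ITS architecture routes it through a collection, and the real its-emul.c code may be organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical translation entry: one per (device, event) mapping. */
struct sketch_itte {
	uint32_t event_id;		/* MSI payload written by the device */
	uint32_t lpi;			/* LPI number to inject */
	uint32_t target_vcpu;		/* real ITS: resolved via a collection */
	struct sketch_itte *next;
};

/* Hypothetical device entry holding its list of translation entries. */
struct sketch_device {
	uint32_t device_id;
	struct sketch_itte *ittes;
	struct sketch_device *next;
};

/*
 * Walk the device list, then that device's entry list.
 * Returns 0 and fills *lpi / *vcpu on a hit, -1 if nothing is mapped.
 */
static int sketch_its_translate(const struct sketch_device *devs,
				uint32_t device_id, uint32_t event_id,
				uint32_t *lpi, uint32_t *vcpu)
{
	const struct sketch_device *dev;
	const struct sketch_itte *ite;

	assert(lpi && vcpu);
	for (dev = devs; dev; dev = dev->next) {
		if (dev->device_id != device_id)
			continue;
		for (ite = dev->ittes; ite; ite = ite->next) {
			if (ite->event_id == event_id) {
				*lpi = ite->lpi;
				*vcpu = ite->target_vcpu;
				return 0;
			}
		}
		return -1;	/* device mapped, event not mapped */
	}
	return -1;		/* device not mapped */
}
```

Two nested linear walks are cheap while each list holds only a handful of entries, which is the trade-off the paragraph above describes.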

Most of the code lives in a separate source file (its-emul.c), though
there are some changes necessary both in vgic.c and vgic-v3-emul.c.

Patch 01/15 gets rid of the internal tracking of the used LR for
an injected IRQ, see the commit message for more details.
Patch 02/15 extends the KVM MSI ioctl to hold a device ID.
Patches 03-05 make small changes to the existing VGIC code which make
later adaptations for the ITS easier.
The rest of the patches implement the ITS functionality step by step.
For more details see the respective commit messages.

For the time being this series gives us the ability to use emulated
PCI devices that can use MSIs in the guest. Those have to be
triggered by letting the userland device emulation simulate the MSI
write with the KVM_SIGNAL_MSI ioctl. This will be translated into
the proper LPI by the ITS emulation and injected into the guest in
the usual way (just with a higher IRQ number).

This series is based on 4.2-rc1 and can be found at the its-emul/v2
branch of this repository [1].
To use this you need a GICv3 host machine (a fast model would
do), though it does not rely on any host ITS bits (neither in hardware
nor in software).

To test this you can use the kvmtool patches available in the "its"
branch here [2].
Start a guest with: "$ lkvm run --irqchip=gicv3-its --force-pci"
and see the ITS being used for instance by the virtio devices.

[1]: git://linux-arm.org/linux-ap.git
 http://www.linux-arm.org/git?p=linux-ap.git;a=log;h=refs/heads/its-emul/v2
[2]: git://linux-arm.org/kvmtool.git
 http://www.linux-arm.org/git?p=kvmtool.git;a=log;h=refs/heads/its
[3]: http://arminfo.emea.arm.com/help/topic/com.arm.doc.ihi0069a/IHI0069A_gic_architecture_specification.pdf

Andre Przywara (15):
  KVM: arm/arm64: VGIC: don't track used LRs in the distributor
  KVM: extend struct kvm_msi to hold a 32-bit device I

[PATCH v2 01/15] KVM: arm/arm64: VGIC: don't track used LRs in the distributor

2015-07-10 Thread Andre Przywara
Currently we track which IRQ has been mapped to which VGIC list
register and also have to synchronize both. We used to do this
to hold some extra state (for instance the active bit).
It turns out that this extra state in the LRs is no longer needed and
this extra tracking causes some pain later.
Remove the tracking feature (lr_map and lr_used) and get rid of
quite a bit of code along the way.
On a guest exit we pick up all still pending IRQs from the LRs and put
them back in the distributor. We don't care about active-only IRQs,
so we keep them in the LRs. They will be retired either by our
vgic_process_maintenance() routine or by the GIC hardware in case of
edge-triggered interrupts.
In places where we scan LRs we now use our shadow copy of the ELRSR
register directly.
This code change means we lose the "piggy-back" optimization, which
would re-use an active-only LR to inject the pending state on top of
it. Tracing with various workloads shows that this actually occurred
very rarely: the ballpark figure is about once every 10,000 exits
in a disk I/O heavy workload. Also, the list registers don't seem to
be as scarce as assumed, with all 4 LRs on the popular implementations
used less than once every 100,000 exits.
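Scanning the ELRSR shadow copy instead of lr_used means treating a clear bit as an occupied list register, as the for_each_clear_bit() loop in vgic_unqueue_irqs() does, and initializing the shadow to ~0 marks every LR as free. A plain-C sketch of that convention, with hypothetical helper names and assuming at most 64 LRs so the register fits in a u64:

```c
#include <assert.h>
#include <stdint.h>

/* ELRSR convention: bit n set = list register n is empty. */
static int sketch_count_used_lrs(uint64_t elrsr, int nr_lr)
{
	int i, used = 0;

	assert(nr_lr > 0 && nr_lr <= 64);
	for (i = 0; i < nr_lr; i++)
		if (!(elrsr & (1ULL << i)))	/* clear bit: LR in use */
			used++;
	return used;
}

/* First empty LR, or -1 when all nr_lr registers are occupied. */
static int sketch_first_free_lr(uint64_t elrsr, int nr_lr)
{
	int i;

	assert(nr_lr > 0 && nr_lr <= 64);
	for (i = 0; i < nr_lr; i++)
		if (elrsr & (1ULL << i))
			return i;
	return -1;
}
```

With the ~0 reset value, sketch_count_used_lrs() reports 0 and sketch_first_free_lr() reports LR 0, which is why no separate used-LR bitmap is needed.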

This has been briefly tested on Midway, Juno and the model (the latter
both with GICv2 and GICv3 guests).

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h |   6 ---
 virt/kvm/arm/vgic-v2.c |   1 +
 virt/kvm/arm/vgic-v3.c |   1 +
 virt/kvm/arm/vgic.c| 143 ++---
 4 files changed, 66 insertions(+), 85 deletions(-)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 133ea00..2ccfa9a 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -279,9 +279,6 @@ struct vgic_v3_cpu_if {
 };
 
 struct vgic_cpu {
-   /* per IRQ to LR mapping */
-   u8  *vgic_irq_lr_map;
-
/* Pending/active/both interrupts on this VCPU */
DECLARE_BITMAP( pending_percpu, VGIC_NR_PRIVATE_IRQS);
DECLARE_BITMAP( active_percpu, VGIC_NR_PRIVATE_IRQS);
@@ -292,9 +289,6 @@ struct vgic_cpu {
unsigned long   *active_shared;
unsigned long   *pend_act_shared;
 
-   /* Bitmap of used/free list registers */
-   DECLARE_BITMAP( lr_used, VGIC_V2_MAX_LRS);
-
/* Number of list registers on this CPU */
int nr_lr;
 
diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
index f9b9c7c..f723710 100644
--- a/virt/kvm/arm/vgic-v2.c
+++ b/virt/kvm/arm/vgic-v2.c
@@ -144,6 +144,7 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vcpu->arch.vgic_cpu.vgic_v2.vgic_vmcr = 0;
+   vcpu->arch.vgic_cpu.vgic_v2.vgic_elrsr = ~0;
 
/* Get the show on the road... */
vcpu->arch.vgic_cpu.vgic_v2.vgic_hcr = GICH_HCR_EN;
diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
index dff0602..21e5d28 100644
--- a/virt/kvm/arm/vgic-v3.c
+++ b/virt/kvm/arm/vgic-v3.c
@@ -178,6 +178,7 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
 * anyway.
 */
vgic_v3->vgic_vmcr = 0;
+   vgic_v3->vgic_elrsr = ~0;
 
/*
 * If we are emulating a GICv3, we do it in an non-GICv2-compatible
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index bc40137..394622c 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -79,7 +79,6 @@
 #include "vgic.h"
 
 static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu);
-static void vgic_retire_lr(int lr_nr, int irq, struct kvm_vcpu *vcpu);
 static struct vgic_lr vgic_get_lr(const struct kvm_vcpu *vcpu, int lr);
 static void vgic_set_lr(struct kvm_vcpu *vcpu, int lr, struct vgic_lr lr_desc);
 
@@ -647,6 +646,17 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
return false;
 }
 
+static void vgic_sync_lr_elrsr(struct kvm_vcpu *vcpu, int lr,
+  struct vgic_lr vlr)
+{
+   vgic_ops->sync_lr_elrsr(vcpu, lr, vlr);
+}
+
+static inline u64 vgic_get_elrsr(struct kvm_vcpu *vcpu)
+{
+   return vgic_ops->get_elrsr(vcpu);
+}
+
 /**
  * vgic_unqueue_irqs - move pending/active IRQs from LRs to the distributor
  * @vgic_cpu: Pointer to the vgic_cpu struct holding the LRs
@@ -658,9 +668,11 @@ bool vgic_handle_cfg_reg(u32 *reg, struct kvm_exit_mmio *mmio,
 void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 {
struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   u64 elrsr = vgic_get_elrsr(vcpu);
+   unsigned long *elrsr_ptr = u64_to_bitmask(&elrsr);
int i;
 
-   for_each_set_bit(i, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+   for_each_clear_bit(i, elrsr_ptr, vgic_cpu->nr_lr) {
struct vgic_lr lr = vgic_get_lr(vcpu, i);
 
/*
@@ -703,7 +715,7 @@ void vgic_unqueue_irqs(struct kvm_vcpu *vcpu)
 * Mark the LR as free for other use.
 */
BUG_ON(lr.state & LR_STATE_MASK);
-   vgic_retire_lr(i, lr.irq, vc

[PATCH v2 05/15] KVM: arm/arm64: make GIC frame address initialization model specific

2015-07-10 Thread Andre Przywara
Currently we initialize all the possible GIC frame addresses in one
function, without looking at the specific GIC model we instantiate
for the guest.
As this gets confusing when adding another VGIC model later, let's
move these initializations into the respective model's init functions.

Signed-off-by: Andre Przywara 
---
 virt/kvm/arm/vgic-v2-emul.c | 3 +++
 virt/kvm/arm/vgic-v3-emul.c | 3 +++
 virt/kvm/arm/vgic.c | 3 ---
 3 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/virt/kvm/arm/vgic-v2-emul.c b/virt/kvm/arm/vgic-v2-emul.c
index 1390797..8faa28c 100644
--- a/virt/kvm/arm/vgic-v2-emul.c
+++ b/virt/kvm/arm/vgic-v2-emul.c
@@ -567,6 +567,9 @@ void vgic_v2_init_emulation(struct kvm *kvm)
dist->vm_ops.init_model = vgic_v2_init_model;
dist->vm_ops.map_resources = vgic_v2_map_resources;
 
+   dist->vgic_cpu_base = VGIC_ADDR_UNDEF;
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
+
kvm->arch.max_vcpus = VGIC_V2_MAX_CPUS;
 }
 
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index d2eeb20..1f42348 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -885,6 +885,9 @@ void vgic_v3_init_emulation(struct kvm *kvm)
dist->vm_ops.destroy_model = vgic_v3_destroy_model;
dist->vm_ops.map_resources = vgic_v3_map_resources;
 
+   dist->vgic_dist_base = VGIC_ADDR_UNDEF;
+   dist->vgic_redist_base = VGIC_ADDR_UNDEF;
+
kvm->arch.max_vcpus = KVM_MAX_VCPUS;
 }
 
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index cc8f5ed..59f1801 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -1830,9 +1830,6 @@ int kvm_vgic_create(struct kvm *kvm, u32 type)
kvm->arch.vgic.in_kernel = true;
kvm->arch.vgic.vgic_model = type;
kvm->arch.vgic.vctrl_base = vgic->vctrl_base;
-   kvm->arch.vgic.vgic_dist_base = VGIC_ADDR_UNDEF;
-   kvm->arch.vgic.vgic_cpu_base = VGIC_ADDR_UNDEF;
-   kvm->arch.vgic.vgic_redist_base = VGIC_ADDR_UNDEF;
 
 out_unlock:
for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-- 
2.3.5



[PATCH v2 08/15] KVM: arm64: introduce ITS emulation file with stub functions

2015-07-10 Thread Andre Przywara
The ARM GICv3 ITS emulation code goes into a separate file, but
needs to be connected to the GICv3 emulation, of which it is an
option.
Introduce the skeleton with function stubs to be filled later.
Introduce the basic ITS data structure and initialize it, but don't
return any success yet, as we are not yet ready for the show.
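The stub handlers are wired up through a table of MMIO ranges. As a rough sketch of how such a table can be searched, the struct, handler type and lookup function below are hypothetical stand-ins rather than the actual vgic_io_range code, and the lengths for GITS_CWRITER and GITS_CREADR are assumed because the table is cut off in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type; the real handlers take vcpu, mmio and offset. */
typedef bool (*sketch_mmio_fn)(uint64_t offset);

struct sketch_io_range {
	uint64_t base;
	uint64_t len;
	sketch_mmio_fn handle_mmio;
};

/* Stub in the spirit of the patch: nothing is handled yet. */
static bool sketch_stub(uint64_t offset)
{
	(void)offset;
	return false;
}

/* Offsets as in arm-gic-v3.h; CWRITER/CREADR lengths are assumed. */
static const struct sketch_io_range sketch_its_ranges[] = {
	{ 0x0000, 0x10, sketch_stub },	/* misc registers from GITS_CTLR */
	{ 0x0080, 0x08, sketch_stub },	/* GITS_CBASER */
	{ 0x0088, 0x08, sketch_stub },	/* GITS_CWRITER */
	{ 0x0090, 0x08, sketch_stub },	/* GITS_CREADR */
	{ 0, 0, NULL },			/* terminator: len == 0 */
};

/* Linear search for the range containing offset; NULL if none does. */
static const struct sketch_io_range *
sketch_find_range(const struct sketch_io_range *ranges, uint64_t offset)
{
	assert(ranges != NULL);
	for (; ranges->len; ranges++)
		if (offset >= ranges->base && offset - ranges->base < ranges->len)
			return ranges;
	return NULL;
}
```

An access at offset 0x84 lands in the GITS_CBASER entry, while an unlisted offset such as 0x40 finds no range at all.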

Signed-off-by: Andre Przywara 
---
 arch/arm64/kvm/Makefile|   1 +
 include/kvm/arm_vgic.h |   6 ++
 include/linux/irqchip/arm-gic-v3.h |   1 +
 virt/kvm/arm/its-emul.c| 125 +
 virt/kvm/arm/its-emul.h|  35 +++
 virt/kvm/arm/vgic-v3-emul.c|  24 ++-
 6 files changed, 189 insertions(+), 3 deletions(-)
 create mode 100644 virt/kvm/arm/its-emul.c
 create mode 100644 virt/kvm/arm/its-emul.h

diff --git a/arch/arm64/kvm/Makefile b/arch/arm64/kvm/Makefile
index f90f4aa..9803307 100644
--- a/arch/arm64/kvm/Makefile
+++ b/arch/arm64/kvm/Makefile
@@ -25,5 +25,6 @@ kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v2-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v2-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o
+kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/its-emul.o
 kvm-$(CONFIG_KVM_ARM_HOST) += vgic-v3-switch.o
 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 8c6cb0e..9e9d4aa 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -156,6 +156,11 @@ struct vgic_io_device {
struct kvm_io_device dev;
 };
 
+struct vgic_its {
+   bool        enabled;
+   spinlock_t  lock;
+};
+
 struct vgic_dist {
spinlock_t  lock;
bool        in_kernel;
@@ -264,6 +269,7 @@ struct vgic_dist {
u64 *pendbaser;
 
bool        lpis_enabled;
+   struct vgic_its its;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/include/linux/irqchip/arm-gic-v3.h b/include/linux/irqchip/arm-gic-v3.h
index ffbc034..df4e527 100644
--- a/include/linux/irqchip/arm-gic-v3.h
+++ b/include/linux/irqchip/arm-gic-v3.h
@@ -177,6 +177,7 @@
 #define GITS_CWRITER   0x0088
 #define GITS_CREADR    0x0090
 #define GITS_BASER 0x0100
+#define GITS_IDREGS_BASE   0xffd0
 #define GITS_PIDR2 GICR_PIDR2
 
 #define GITS_TRANSLATER    0x10040
diff --git a/virt/kvm/arm/its-emul.c b/virt/kvm/arm/its-emul.c
new file mode 100644
index 000..659dd39
--- /dev/null
+++ b/virt/kvm/arm/its-emul.c
@@ -0,0 +1,125 @@
+/*
+ * GICv3 ITS emulation
+ *
+ * Copyright (C) 2015 ARM Ltd.
+ * Author: Andre Przywara 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "vgic.h"
+#include "its-emul.h"
+
+static bool handle_mmio_misc_gits(struct kvm_vcpu *vcpu,
+ struct kvm_exit_mmio *mmio,
+ phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_idregs(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_cbaser(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_cwriter(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   return false;
+}
+
+static bool handle_mmio_gits_creadr(struct kvm_vcpu *vcpu,
+   struct kvm_exit_mmio *mmio,
+   phys_addr_t offset)
+{
+   return false;
+}
+
+static const struct vgic_io_range vgicv3_its_ranges[] = {
+   {
+   .base           = GITS_CTLR,
+   .len            = 0x10,
+   .bits_per_irq   = 0,
+   .handle_mmio    = handle_mmio_misc_gits,
+   },
+   {
+   .base           = GITS_CBASER,
+   .len            = 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio  

[PATCH v2 07/15] KVM: arm64: handle ITS related GICv3 redistributor registers

2015-07-10 Thread Andre Przywara
In the GICv3 redistributor there are the PENDBASER and PROPBASER
registers, which we did not emulate so far, as they only make sense
when an ITS is present. In preparation for that, emulate those MMIO
accesses by storing the 64-bit data written to them in variables
which we later read in the ITS emulation.
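The 64-bit base registers are accessed by the guest as two 32-bit words. A minimal sketch of that split, assuming only aligned 32-bit accesses (the patch's vgic_handle_base_register() additionally passes the sub-word offset to vgic_reg_access()) and using hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Apply a 32-bit guest write to one half of a 64-bit base register.
 * offset is relative to the register: 0 = low word, 4 = high word.
 * Writes are dropped once LPIs are enabled, mirroring the
 * ACCESS_WRITE_IGNORED choice made in the patch.
 */
static void sketch_write_base_half(uint64_t *basereg, unsigned offset,
				   uint32_t val, bool lpis_enabled)
{
	assert(offset == 0 || offset == 4);
	if (lpis_enabled)
		return;
	if (offset == 0)
		*basereg = (*basereg & 0xffffffff00000000ULL) | val;
	else
		*basereg = (*basereg & 0x00000000ffffffffULL) |
			   ((uint64_t)val << 32);
}

/* Return the requested 32-bit half of the stored register value. */
static uint32_t sketch_read_base_half(uint64_t basereg, unsigned offset)
{
	assert(offset == 0 || offset == 4);
	return offset == 0 ? (uint32_t)basereg : (uint32_t)(basereg >> 32);
}
```

Writing the low word and then the high word leaves the other half untouched each time, so a guest can program the full 64-bit value with two 32-bit stores.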

Signed-off-by: Andre Przywara 
---
 include/kvm/arm_vgic.h  |  8 
 virt/kvm/arm/vgic-v3-emul.c | 44 
 virt/kvm/arm/vgic.c | 35 +++
 virt/kvm/arm/vgic.h |  4 
 4 files changed, 91 insertions(+)

diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 3ee063b..8c6cb0e 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -256,6 +256,14 @@ struct vgic_dist {
struct vgic_vm_ops  vm_ops;
struct vgic_io_device   dist_iodev;
struct vgic_io_device   *redist_iodevs;
+
+   /* Address of LPI configuration table shared by all redistributors */
+   u64 propbaser;
+
+   /* Addresses of LPI pending tables per redistributor */
+   u64 *pendbaser;
+
+   bool        lpis_enabled;
 };
 
 struct vgic_v2_cpu_if {
diff --git a/virt/kvm/arm/vgic-v3-emul.c b/virt/kvm/arm/vgic-v3-emul.c
index a8cf669..5269ad1 100644
--- a/virt/kvm/arm/vgic-v3-emul.c
+++ b/virt/kvm/arm/vgic-v3-emul.c
@@ -651,6 +651,38 @@ static bool handle_mmio_cfg_reg_redist(struct kvm_vcpu *vcpu,
return vgic_handle_cfg_reg(reg, mmio, offset);
 }
 
+/* We don't trigger any actions here, just store the register value */
+static bool handle_mmio_propbaser_redist(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int mode = ACCESS_READ_VALUE;
+
+   /* Storing a value with LPIs already enabled is undefined */
+   mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
+   vgic_handle_base_register(vcpu, mmio, offset, &dist->propbaser, mode);
+
+   return false;
+}
+
+/* We don't trigger any actions here, just store the register value */
+static bool handle_mmio_pendbaser_redist(struct kvm_vcpu *vcpu,
+struct kvm_exit_mmio *mmio,
+phys_addr_t offset)
+{
+   struct kvm_vcpu *rdvcpu = mmio->private;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int mode = ACCESS_READ_VALUE;
+
+   /* Storing a value with LPIs already enabled is undefined */
+   mode |= dist->lpis_enabled ? ACCESS_WRITE_IGNORED : ACCESS_WRITE_VALUE;
+   vgic_handle_base_register(vcpu, mmio, offset,
+ &dist->pendbaser[rdvcpu->vcpu_id], mode);
+
+   return false;
+}
+
 #define SGI_base(x) ((x) + SZ_64K)
 
 static const struct vgic_io_range vgic_redist_ranges[] = {
@@ -679,6 +711,18 @@ static const struct vgic_io_range vgic_redist_ranges[] = {
.handle_mmio    = handle_mmio_raz_wi,
},
{
+   .base           = GICR_PENDBASER,
+   .len            = 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio    = handle_mmio_pendbaser_redist,
+   },
+   {
+   .base           = GICR_PROPBASER,
+   .len            = 0x08,
+   .bits_per_irq   = 0,
+   .handle_mmio    = handle_mmio_propbaser_redist,
+   },
+   {
.base           = GICR_IDREGS,
.len            = 0x30,
.bits_per_irq   = 0,
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 15e447f..49ee92b 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -446,6 +446,41 @@ void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
}
 }
 
+/* handle a 64-bit register access */
+void vgic_handle_base_register(struct kvm_vcpu *vcpu,
+  struct kvm_exit_mmio *mmio,
+  phys_addr_t offset, u64 *basereg,
+  int mode)
+{
+   u32 reg;
+   u64 breg;
+
+   switch (offset & ~3) {
+   case 0x00:
+   breg = *basereg;
+   reg = lower_32_bits(breg);
vgic_reg_access(mmio, &reg, offset & 3, mode);
+   if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) {
+   breg &= GENMASK_ULL(63, 32);
+   breg |= reg;
+   *basereg = breg;
+   }
+   break;
+   case 0x04:
+   breg = *basereg;
+   reg = upper_32_bits(breg);
vgic_reg_access(mmio, &reg, offset & 3, mode);
+   if (mmio->is_write && (mode & ACCESS_WRITE_VALUE)) {
+   breg  = lower_32_bits(breg);
+   breg |= (u64)reg << 32;
+  

Re: [PATCH] KVM: x86: Add host physical address width capability

2015-07-10 Thread Paolo Bonzini


On 09/07/2015 20:57, Laszlo Ersek wrote:
>> Without EPT, you don't hit the processor limitation with your setup,
>> but the user should nevertheless still be notified.
> 
> I disagree.

FWIW, I also disagree (and it looks like Bandan disagrees with himself
now :)).

>> In fact, I think shadow paging code should also emulate
>> this behavior if the gpa is out of range.
> 
> I disagree.

Same here.

> There is no "out of range" gpa. QEMU allocates enough memory, and it
> should be completely transparent to the guest. The fact that it silently
> breaks with nested paging if the host processor doesn't have enough
> address bits is a bug (maybe a hardware bug, maybe a KVM bug; I'm not
> sure, but I suspect it's a hardware bug).

It's a hardware bug, possibly due to some limitations in the physical
addresses that the TLB can store?  I guess KVM could detect the
situation and fall back to slow shadow paging.

> ... In any case, please understand that I'm not campaigning for this
> warning :) IIRC the warning was your (very welcome!) idea after I
> reported the problem; I'm just trying to ensure that the warning match
> the exact issue I encountered.

Yup.  I think the right thing to do would be to hide memory above the
limit.  A kernel patch to query the limit is definitely necessary, but
it needs to return e.g. 48 for shadow paging (otherwise you could just
use CPUID).  I'm not sure whether the rest is possible with just QEMU
or whether it requires help from the firmware.  Probably the latter.
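For the hardware-paging case, the limit is the physical address width that CPUID already reports. A minimal sketch, assuming the raw EAX value of CPUID leaf 0x80000008 has already been read (bits 7:0 hold the physical address width) and using hypothetical helper names, that decodes the width and checks whether a guest memory range lies entirely below it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID 0x80000008: EAX[7:0] = physical address bits, EAX[15:8] = linear. */
static unsigned sketch_phys_addr_bits(uint32_t eax)
{
	return eax & 0xff;
}

/* True if every byte of [gpa, gpa + size) lies below 2^phys_bits. */
static bool sketch_gpa_range_fits(uint64_t gpa, uint64_t size,
				  unsigned phys_bits)
{
	uint64_t limit;

	assert(phys_bits > 0 && phys_bits < 64);
	limit = 1ULL << phys_bits;
	/* Compare against limit - gpa to avoid overflowing gpa + size. */
	return gpa < limit && size <= limit - gpa;
}
```

Any range for which this returns false is the memory that would have to be hidden from the guest. For shadow paging, the value fed in would be whatever fixed width KVM chooses to report (e.g. 48) rather than CPUID's.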

Paolo


Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd

2015-07-10 Thread Alex Williamson
On Fri, 2015-07-10 at 14:47 +0200, Paolo Bonzini wrote:
> 
> On 10/07/2015 10:28, Wu, Feng wrote:
> > > Yes, you are right. All we need is the producer information which has been
> > > passed in the register routine. And we can easily make this update logic
> > > inside the consumer. Thanks for your comments!
> > 
> > BTW, Paolo & Alex, in VFIO framework, how can we know a vCPU or a guest
> > has assigned devices to it?
> 
> See here:
> http://article.gmane.org/gmane.comp.emulators.kvm.devel/137930/raw


In general, VFIO has zero visibility into KVM.  VFIO doesn't know or
care what the userspace driver is, whether it's QEMU/KVM, a set of ruby
bindings for VFIO, a DPDK library, etc.  As Paolo points out, KVM does
have ways to be told about assigned devices from userspace and probe
some properties, like whether the IOMMU allows non-coherent DMA.  These
are handled by the KVM-VFIO pseudo device (virt/kvm/vfio.c).  Thanks,

Alex



Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd

2015-07-10 Thread Paolo Bonzini


On 10/07/2015 10:28, Wu, Feng wrote:
> > Yes, you are right. All we need is the producer information which has been
> > passed in the register routine. And we can easily make this update logic
> > inside the consumer. Thanks for your comments!
> 
> BTW, Paolo & Alex, in VFIO framework, how can we know a vCPU or a guest
> has assigned devices to it?

See here:
http://article.gmane.org/gmane.comp.emulators.kvm.devel/137930/raw

Paolo


[PATCH] KVM: svm: remove KVM_QUIRK_CD_NW_CLEARED quirk

2015-07-10 Thread Paolo Bonzini
We can disable CD unconditionally when there is no assigned device.
KVM now forces guest PAT to all-writeback in that case, so it makes
sense to also force CR0.CD=0.

When there are assigned devices, emulate cache-disabled operation
through the page tables.  This behavior is consistent with VMX,
where CD/NW are not touched by vmentry/vmexit.

Note that buggy firmware that does not clear CD/NW is _seriously_
old: SeaBIOS for example has been doing it since October 2008.
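The resulting policy can be summarized in a small, self-contained sketch. The bit positions match X86_CR0_NW (bit 29) and X86_CR0_CD (bit 30). The enum and function names are hypothetical; the three outcomes stand in for the values svm_get_mt_mask() returns in the patch (0, _PAGE_NOCACHE, or the MTRR-derived protection bits).

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_CR0_NW (1UL << 29)	/* same bit as X86_CR0_NW */
#define SKETCH_CR0_CD (1UL << 30)	/* same bit as X86_CR0_CD */

/* Hypothetical stand-ins for the three svm_get_mt_mask() outcomes. */
enum sketch_memtype {
	SKETCH_MT_DEFAULT,	/* returns 0: guest PAT already forced to WB */
	SKETCH_MT_UC,		/* returns _PAGE_NOCACHE */
	SKETCH_MT_FROM_MTRR,	/* returns mtrr2protval[guest MTRR type] */
};

/* CR0 as written to the VMCB: CD and NW are always cleared. */
static unsigned long sketch_hw_cr0(unsigned long guest_cr0)
{
	return guest_cr0 & ~(SKETCH_CR0_CD | SKETCH_CR0_NW);
}

/* Memory type choice for one guest page, following the patched code. */
static enum sketch_memtype sketch_mt(unsigned long guest_cr0, bool is_mmio,
				     bool has_assigned_device)
{
	if (!is_mmio && !has_assigned_device)
		return SKETCH_MT_DEFAULT;
	if (guest_cr0 & SKETCH_CR0_CD)	/* emulate cache-disabled mode */
		return SKETCH_MT_UC;
	return SKETCH_MT_FROM_MTRR;
}
```

In the sketch, sketch_mt() takes the guest's CR0 (as kvm_read_cr0() does in the patch), while sketch_hw_cr0() models the value stored in svm->vmcb->save.cr0.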

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/svm.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index bbc678a66b18..9b1513065a6a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1173,6 +1173,9 @@ static u64 svm_get_mt_mask(struct kvm_vcpu *vcpu, gfn_t gfn, bool is_mmio)
if (!is_mmio && !kvm_arch_has_assigned_device(vcpu->kvm))
return 0;
 
+   if (kvm_read_cr0(vcpu) & X86_CR0_CD)
+   return _PAGE_NOCACHE;
+
mtrr = kvm_mtrr_get_guest_memory_type(vcpu, gfn);
return mtrr2protval[mtrr];
 }
@@ -1667,13 +1670,10 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0)
 
if (!vcpu->fpu_active)
cr0 |= X86_CR0_TS;
-   /*
-* re-enable caching here because the QEMU bios
-* does not do it - this results in some delay at
-* reboot
-*/
-   if (!(vcpu->kvm->arch.disabled_quirks & KVM_QUIRK_CD_NW_CLEARED))
-   cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
+
+   /* These are emulated via page tables.  */
+   cr0 &= ~(X86_CR0_CD | X86_CR0_NW);
+
svm->vmcb->save.cr0 = cr0;
mark_dirty(svm->vmcb, VMCB_CR);
update_cr0_intercept(svm);
-- 
1.8.3.1



Re: [PATCH] kvm/x86: add support for MONITOR_TRAP_FLAG

2015-07-10 Thread Paolo Bonzini


On 09/07/2015 21:49, Jan Kiszka wrote:
>> >CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
>> > -  CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
>> > +  CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_TRAP_FLAG | 
>> > CPU_BASED_MONITOR_EXITING |
> Overlong line.

Fixed and applied.

Paolo


Re: [PATCH 07/13] arm64: KVM: VHE: Patch out use of HVC

2015-07-10 Thread Paolo Bonzini


On 08/07/2015 19:54, Marc Zyngier wrote:
> Alternatively, I could move it to arch/arm64/include/asm (renamed to
> kvm_vhe_macros.h?), which would solve this issue. I just gave it a go,
> and that seems sensible enough.

Yes, that would do.  Alternatively:

* create kvm_hyp_macros.h and put more stuff in there from kvm_mmu.h

* just put it in kvm_host.h

Whatever looks better to you.

Paolo

> Thanks for the suggestion!


Re: [PATCH 2/4] KVM: SVM: use NPT page attributes

2015-07-10 Thread Paolo Bonzini


On 10/07/2015 03:19, Xiao Guangrong wrote:
>> yes, this is correct.  QEMU still does not have support for disabling
>> "quirks", so gCR0.CD is currently hidden on SVM.  I would like to
>> include this series in 4.2, while for 4.3 I will disable the quirk above
>> altogether (it is superseded by the way PAT is forced to all-WB).
> 
> That plan sounds good to me.
> 
> You will drop disabled_quirks completely or just enable it in Qemu? :)

I will drop this quirk completely.  Other quirks (well, there's just
one) will remain.

Paolo


RE: [RFC v1 5/5] Call irqbypass update routine when updating irqfd

2015-07-10 Thread Wu, Feng


> -Original Message-
> From: Wu, Feng
> Sent: Friday, July 10, 2015 12:13 PM
> To: Alex Williamson
> Cc: kvm@vger.kernel.org; pbonz...@redhat.com; j...@8bytes.org;
> avi.kiv...@gmail.com; eric.au...@linaro.org; Wu, Feng
> Subject: RE: [RFC v1 5/5] Call irqbypass update routine when updating irqfd
> 
> 
> 
> > -Original Message-
> > From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
> > Behalf Of Alex Williamson
> > Sent: Friday, July 10, 2015 11:26 AM
> > To: Wu, Feng
> > Cc: kvm@vger.kernel.org; pbonz...@redhat.com; j...@8bytes.org;
> > avi.kiv...@gmail.com; eric.au...@linaro.org
> > Subject: Re: [RFC v1 5/5] Call irqbypass update routine when updating irqfd
> >
> > On Fri, 2015-07-10 at 11:00 +0800, Feng Wu wrote:
> > > Call update routine when updating irqfd, this can update the
> > > IRTE for Intel posted-interrupts.
> > >
> > > Signed-off-by: Feng Wu 
> > > ---
> > >  virt/kvm/eventfd.c | 4 +++-
> > >  1 file changed, 3 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
> > > index a32cf6c..1226835 100644
> > > --- a/virt/kvm/eventfd.c
> > > +++ b/virt/kvm/eventfd.c
> > > @@ -570,8 +570,10 @@ void kvm_irq_routing_update(struct kvm *kvm)
> > >
> > >   spin_lock_irq(&kvm->irqfds.lock);
> > >
> > > - list_for_each_entry(irqfd, &kvm->irqfds.items, list)
> > > + list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
> > >   irqfd_update(kvm, irqfd);
> > > + irqfd->consumer.update(&irqfd->consumer);
> > > + }
> > >
> > >   spin_unlock_irq(&kvm->irqfds.lock);
> > >  }
> >
> > I don't understand why the irq bypass manager needs to know about this
> > update callback.  We could just as easily make it be a function pointer
> > on the irqfd structure or maybe just open code it.  It's defined by the
> > consumer and called by the consumer, the irq bypass manager shouldn't
> > know about it.  Thanks,
> 
> Yes, you are right. All we need is the producer information, which has already been
> passed in the register routine, and we can easily move this update logic
> into the consumer. Thanks for your comments!
> 
> Thanks,
> Feng

BTW, Paolo & Alex, in the VFIO framework, how can we tell whether a vCPU or
a guest has devices assigned to it?

Thanks,
Feng
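Alex's suggestion, keeping the update hook on the consumer side rather than in the irq bypass manager, can be sketched as a simplified userspace model. Assumptions: `struct irqfd` here is a hypothetical stand-in with only the fields needed, the hook name `irqfd_update_bypass()` is made up, and the `irqfds.lock` spinlock that the real `kvm_irq_routing_update()` holds is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified irqfd: the update hook lives on the consumer
 * side, so the irq bypass manager never needs to know about it. */
struct irqfd {
	int gsi;
	int update_count;			/* for illustration only */
	void (*update)(struct irqfd *irqfd);	/* optional, consumer-defined */
};

/* Example hook: where posted-interrupt IRTE reprogramming could go. */
static void irqfd_update_bypass(struct irqfd *irqfd)
{
	irqfd->update_count++;
}

/* Walk all irqfds after a routing change; call the hook only if it is set. */
static void routing_update(struct irqfd *irqfds, size_t n)
{
	assert(irqfds != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		/* the existing irqfd_update() step would run here first */
		if (irqfds[i].update)
			irqfds[i].update(&irqfds[i]);
	}
}
```

Because the hook is optional and owned by the irqfd, consumers without a bypass path can leave it NULL, and the bypass manager's interface need not grow.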

> 
> >
> > Alex
> >

Re: [PATCH 4/6] irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts

2015-07-10 Thread Marc Zyngier
On 10/07/15 09:17, Jiang Liu wrote:
> On 2015/7/10 15:52, Marc Zyngier wrote:
>> On 09/07/15 22:33, Thomas Gleixner wrote:
>>> On Thu, 9 Jul 2015, Marc Zyngier wrote:
>>>
>>>> When used as a primary interrupt controller, the GIC driver uses
>>>> irq_data->chip_data to extract controller-specific information.
>>>>
>>>> When used as a secondary interrupt controller, it uses handler_data
>>>> instead. As this difference is relatively pointless and only creates
>>>> confusion, change the secondary path to match what the rest of the
>>>> driver does.
>>>>
>>>> Signed-off-by: Marc Zyngier 
>>>> ---
>>>>  drivers/irqchip/irq-gic.c | 4 ++--
>>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>>>> index e264675..3c7f3a4 100644
>>>> --- a/drivers/irqchip/irq-gic.c
>>>> +++ b/drivers/irqchip/irq-gic.c
>>>> @@ -298,7 +298,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
>>>>  
>>>>  static void gic_handle_cascade_irq(unsigned int irq, struct irq_desc *desc)
>>>>  {
>>>> -	struct gic_chip_data *chip_data = irq_get_handler_data(irq);
>>>> +	struct gic_chip_data *chip_data = irq_get_chip_data(irq);
>>>>  	struct irq_chip *chip = irq_get_chip(irq);
>>>
>>> You should make that
>>>
>>> chip_data = irq_desc_get_chip_data(desc);
>>> chip = irq_desc_get_chip(desc);
>>>
>>> That avoids two pointless lookups of irqdesc 
>>
>> Ah, very good point.
>>
>> And it turns out that these constructs (use of irq_get_* when the
>> irq_desc is readily available) are actually fairly common in a number of
>> irqchip implementations used as secondary interrupt controllers.
>>
>> Time for another cleanup series, I believe... ;-)
> Hi Marc,
>   I'm ahead of your request and expect a Reviewed-by from
> you then :). Please refer to http://lkml.org/lkml/2015/6/4/743 for
> what we have done to clean up these interfaces.

Ah, let me dig into this now (yeah, I've been slacking off on reviewing
lately...). Looks awesome! :-)

Thanks!

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH 4/6] irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts

2015-07-10 Thread Jiang Liu
On 2015/7/10 15:52, Marc Zyngier wrote:
> On 09/07/15 22:33, Thomas Gleixner wrote:
>> On Thu, 9 Jul 2015, Marc Zyngier wrote:
>>
>>> When used as a primary interrupt controller, the GIC driver uses
>>> irq_data->chip_data to extract controller-specific information.
>>>
>>> When used as a secondary interrupt controller, it uses handler_data
>>> instead. As this difference is relatively pointless and only creates
>>> confusion, change the secondary path to match what the rest of the
>>> driver does.
>>>
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>  drivers/irqchip/irq-gic.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>>> index e264675..3c7f3a4 100644
>>> --- a/drivers/irqchip/irq-gic.c
>>> +++ b/drivers/irqchip/irq-gic.c
>>> @@ -298,7 +298,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
>>>  
>>>  static void gic_handle_cascade_irq(unsigned int irq, struct irq_desc *desc)
>>>  {
>>> -   struct gic_chip_data *chip_data = irq_get_handler_data(irq);
>>> +   struct gic_chip_data *chip_data = irq_get_chip_data(irq);
>>> struct irq_chip *chip = irq_get_chip(irq);
>>
>> You should make that
>>
>> chip_data = irq_desc_get_chip_data(desc);
>> chip = irq_desc_get_chip(desc);
>>
>> That avoids two pointless lookups of irqdesc 
> 
> Ah, very good point.
> 
> And it turns out that these constructs (use of irq_get_* when the
> irq_desc is readily available) are actually fairly common in a number of
> irqchip implementations used as secondary interrupt controllers.
> 
> Time for another cleanup series, I believe... ;-)
Hi Marc,
I'm ahead of your request and expect a Reviewed-by from
you then :). Please refer to http://lkml.org/lkml/2015/6/4/743 for
what we have done to clean up these interfaces.
Thanks!
Gerry


Re: [PATCH] vfio: Drop owner assignment from platform_driver

2015-07-10 Thread Baptiste Reynal
Thanks for pointing this out, Krzysztof.

Acked-by: Baptiste Reynal 

On Fri, Jul 10, 2015 at 8:37 AM, Krzysztof Kozlowski
 wrote:
> platform_driver does not need to set an owner because
> platform_driver_register() will set it.
>
> Signed-off-by: Krzysztof Kozlowski 
>
> ---
>
> The coccinelle script which generated the patch was sent here:
> http://www.spinics.net/lists/kernel/msg2029903.html
> ---
>  drivers/vfio/platform/vfio_platform.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/vfio/platform/vfio_platform.c b/drivers/vfio/platform/vfio_platform.c
> index cef645c83996..09a8caa4eda9 100644
> --- a/drivers/vfio/platform/vfio_platform.c
> +++ b/drivers/vfio/platform/vfio_platform.c
> @@ -91,7 +91,6 @@ static struct platform_driver vfio_platform_driver = {
> .remove = vfio_platform_remove,
> .driver = {
> .name   = "vfio-platform",
> -   .owner  = THIS_MODULE,
> },
>  };
>
> --
> 1.9.1
>
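The `.owner` line is redundant because `platform_driver_register()` is a macro that hands `THIS_MODULE` to the underlying registration function, which stores it in `driver.owner`. The userspace model below sketches that pattern. All types and the `do_platform_driver_register()` name are simplified, hypothetical stand-ins for the kernel's `__platform_driver_register()`, which also performs bus setup and driver-core registration that is not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal stand-ins for the kernel types involved. */
struct module { const char *name; };
struct device_driver { const char *name; struct module *owner; };
struct platform_driver { struct device_driver driver; };

static struct module this_module = { "vfio_platform" };
#define THIS_MODULE (&this_module)

/* The registration function receives the owner explicitly... */
static int do_platform_driver_register(struct platform_driver *drv,
				       struct module *owner)
{
	assert(drv != NULL);
	drv->driver.owner = owner;	/* overwrites any .owner set statically */
	return 0;
}

/* ...and the public macro supplies THIS_MODULE on the caller's behalf. */
#define platform_driver_register(drv) \
	do_platform_driver_register(drv, THIS_MODULE)
```

Since the owner is filled in at registration time from the caller's own module, a static `.owner = THIS_MODULE` initializer would only be overwritten with the same value.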


Re: [PATCH 4/6] irqchip: GIC: Use chip_data instead of handler_data for cascaded interrupts

2015-07-10 Thread Marc Zyngier
On 09/07/15 22:33, Thomas Gleixner wrote:
> On Thu, 9 Jul 2015, Marc Zyngier wrote:
> 
>> When used as a primary interrupt controller, the GIC driver uses
>> irq_data->chip_data to extract controller-specific information.
>>
>> When used as a secondary interrupt controller, it uses handler_data
>> instead. As this difference is relatively pointless and only creates
>> confusion, change the secondary path to match what the rest of the
>> driver does.
>>
>> Signed-off-by: Marc Zyngier 
>> ---
>>  drivers/irqchip/irq-gic.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
>> index e264675..3c7f3a4 100644
>> --- a/drivers/irqchip/irq-gic.c
>> +++ b/drivers/irqchip/irq-gic.c
>> @@ -298,7 +298,7 @@ static void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
>>  
>>  static void gic_handle_cascade_irq(unsigned int irq, struct irq_desc *desc)
>>  {
>> -struct gic_chip_data *chip_data = irq_get_handler_data(irq);
>> +struct gic_chip_data *chip_data = irq_get_chip_data(irq);
>>  struct irq_chip *chip = irq_get_chip(irq);
> 
> You should make that
> 
> chip_data = irq_desc_get_chip_data(desc);
> chip = irq_desc_get_chip(desc);
> 
> That avoids two pointless lookups of irqdesc 

Ah, very good point.

And it turns out that these constructs (use of irq_get_* when the
irq_desc is readily available) are actually fairly common in a number of
irqchip implementations used as secondary interrupt controllers.

Time for another cleanup series, I believe... ;-)
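Thomas's remark is about where the descriptor comes from: the cascade handler already receives `desc`, whereas `irq_get_chip_data(irq)` and `irq_get_chip(irq)` each map the IRQ number back to its descriptor first. The model below counts those lookups. The types and function names are hypothetical simplifications; `lookup_desc()` only stands in for `irq_to_desc()`, which may be a radix-tree walk when CONFIG_SPARSE_IRQ is enabled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the genirq objects. */
struct irq_desc {
	void *chip;
	void *chip_data;
};

#define NR_IRQS 64
static struct irq_desc irq_descs[NR_IRQS];
static unsigned int desc_lookups;	/* counts number-to-descriptor lookups */

/* Stand-in for irq_to_desc(): maps an IRQ number to its descriptor. */
static struct irq_desc *lookup_desc(unsigned int irq)
{
	desc_lookups++;
	return irq < NR_IRQS ? &irq_descs[irq] : NULL;
}

/* irq_get_*-style accessors: every call repeats the lookup. */
static void *get_chip_data_by_irq(unsigned int irq)
{
	struct irq_desc *desc = lookup_desc(irq);

	return desc ? desc->chip_data : NULL;
}

static void *get_chip_by_irq(unsigned int irq)
{
	struct irq_desc *desc = lookup_desc(irq);

	return desc ? desc->chip : NULL;
}

/* irq_desc_get_*-style accessors: plain dereferences, no lookup at all. */
static void *desc_get_chip_data(struct irq_desc *desc)
{
	assert(desc != NULL);
	return desc->chip_data;
}

static void *desc_get_chip(struct irq_desc *desc)
{
	assert(desc != NULL);
	return desc->chip;
}
```

In the handler from the patch, switching both accessors to the `desc`-based form removes the two lookups while returning the same pointers.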

M.
-- 
Jazz is not dead. It just smells funny...