Re: [PATCH] kvm: arm/arm64: fix VM hanging at startup time

2018-11-22 Thread peng.hao2
>Hi,
>
>On Wed, Nov 21, 2018 at 04:56:54PM +0800, peng.h...@zte.com.cn wrote:
>> >On 19/11/2018 09:10, Mark Rutland wrote:
>> >> On Sat, Nov 17, 2018 at 10:58:37AM +0800, peng.h...@zte.com.cn wrote:
>>  On 16/11/18 00:23, peng.h...@zte.com.cn wrote:
>> >> Hi,
>> >>> When the virtual machine starts, it hangs.
>> >>
>> >> I take it you mean the *guest* hangs? Because it doesn't get a timer
>> >> interrupt?
>> >>
>> >>> The guest kernel version
>> >>> is 4.16. The host supports vgic_v3.
>> >>
>> >> Your host kernel is something recent, I guess?
>> >>
>> >>> It is mainly due to the vgic_irq's (intid=27) incorrect group value
>> >>> during interrupt injection. When kvm_vgic_vcpu_init is called,
>> >>> dist is not yet initialized, so we cannot determine whether the vgic
>> >>> is V3 or V2, and group is not set.
>> >>
>> >> Mmh, that shouldn't happen with (v)GICv3. Do you use QEMU (which
>> >> version?) or some other userland tool?
>> >>
>> >
>> > QEMU emulator version 3.0.50.
>> >
>> >>> group is set to 1 when vgic_mmio_write_group is invoked at some
>> >>> later point.
>> >>> While irq->group=0 (intid=27), the ICH_LR_GROUP flag is not set and
>> >>> interrupt injection fails.
>> >>>
>> >>> Signed-off-by: Peng Hao 
>> >>> ---
>> >>>   virt/kvm/arm/vgic/vgic-v3.c | 2 +-
>> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
>> >>>
>> >>> diff --git a/virt/kvm/arm/vgic/vgic-v3.c 
>> >>> b/virt/kvm/arm/vgic/vgic-v3.c
>> >>> index 9c0dd23..d101000 100644
>> >>> --- a/virt/kvm/arm/vgic/vgic-v3.c
>> >>> +++ b/virt/kvm/arm/vgic/vgic-v3.c
>> >>> @@ -198,7 +198,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr)
>> >>>  	if (vgic_irq_is_mapped_level(irq) && (val & ICH_LR_PENDING_BIT))
>> >>>  		irq->line_level = false;
>> >>>
>> >>> -if (irq->group)
>> >>> +if (model == KVM_DEV_TYPE_ARM_VGIC_V3)
>> >>
>> >> This is not the right fix, not only because it basically reverts the
>> >> GICv3 part of 87322099052 (KVM: arm/arm64: vgic: Signal IRQs using
>> >> their configured group).
>> >>
>> >> Can you try to work out why kvm_vgic_vcpu_init() is apparently called
>> >> before dist->vgic_model is set, also what value it has?
>> >> If I understand the code correctly, that shouldn't happen for a GICv3.
>> >>
>> > Even if the value of group is correctly assigned in
>> > kvm_vgic_vcpu_init, the group is then written to 0 through
>> > vgic_mmio_write_group.
>> > If the interrupt arrives at this time, interrupt injection fails.
>> 
>>  Does that mean that the guest is configuring its interrupts as Group0?
>>  That sounds wrong, Linux should configure all its interrupts as
>>  non-secure group1.
>> >>>
>> >>> No, I think that UEFI does this, not Linux.
>> >>> 1. kvm_vgic_vcpu_init
>> >>> 2. vgic_create
>> >>> 3. kvm_vgic_dist_init
>> >>> 4. vgic_mmio_write_group: UEFI as guest, writes group=0
>> >>> 5. vgic_mmio_write_group: Linux as guest, writes group=1
>> >>
>> >> Is this the same issue fixed by EDK2 commit:
>> >>
>> >> 66127011a544b90e ("ArmPkg/ArmGicDxe ARM: fix encoding for GICv3 interrupt acknowledge")
>> >>
>> >> ... where EDK2 would try to use IAR0 rather than IAR1?
>> >>
>> >> The commit messages notes this lead to a boot-time hang.
>> >
>> >I managed to trigger an issue with a really old EFI implementation that
>> >doesn't configure its interrupts as Group1, and yet tries to ACK its
>> >interrupts using the Group1 accessor. Guess what? It is not going to work.
>> >
>> >Commit c7fefb690661f2e38afcb8200bd318ecf38ab961 in the edk2 tree seems
>> >to be the fix (I only assume it does, I haven't actually checked). A
>> >recent build, as found in Debian Buster, works perfectly (tested with
>> >both QEMU v2.12 and tip of tree).
>> >
>> >Now, I really don't get what you're saying about Linux not getting
>> >interrupts. How do you get to booting Linux if EFI is not making any
>> >forward progress? Are you trying them independently?
>> >
>> I start Linux directly, bypassing UEFI; the printed info is the same.
>> [507107.748908]  vgic_mmio_write_group:## intid/27 group=0
>> [507107.752185]  vgic_mmio_write_group:## intid/27 group=0
>> [507107.899566]  vgic_mmio_write_group:## intid/27 group=1
>> [507107.907370]  vgic_mmio_write_group:## intid/27 group=1
>> the command line is like this:
>> /home/qemu-patch/linshi/qemu/aarch64-softmmu/qemu-system-aarch64  -machine 
>> virt-3.1,accel=kvm,usb=off,dump-guest-core=off,gic-version=3  -kernel 
>> /home/kernelboot/vmlinuz-4.16.0+ -initrd 
>> /home/kernelboot/initramfs-4.16.0+.img -append root=/dev/mapper/cla-root ro 
>> crashkernel=auto rd.lvm.lv=cla/root rd.lvm.lv=cla/swap.UTF-8  -drive 
>> file=/home/centos74-ph/boot.qcow2,format=qcow2,if=none,id=drive-scsi0-0-0-0 
>> -device 
>> 

[PATCH v5 6/7] iommu/virtio: Add probe request

2018-11-22 Thread Jean-Philippe Brucker
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter an MSI doorbell region, set it up as an IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 156 --
 include/uapi/linux/virtio_iommu.h |  38 
 2 files changed, 188 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 7540dab9c8dc..0c7a7fa2628d 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -46,6 +46,7 @@ struct viommu_dev {
struct iommu_domain_geometry	geometry;
u64 pgsize_bitmap;
u8  domain_bits;
+   u32 probe_size;
 };
 
 struct viommu_mapping {
@@ -67,8 +68,10 @@ struct viommu_domain {
 };
 
 struct viommu_endpoint {
+   struct device   *dev;
struct viommu_dev   *viommu;
struct viommu_domain	*vdomain;
+   struct list_head	resv_regions;
 };
 
 struct viommu_request {
@@ -119,6 +122,9 @@ static off_t viommu_get_write_desc_offset(struct viommu_dev *viommu,
 {
size_t tail_size = sizeof(struct virtio_iommu_req_tail);
 
+   if (req->type == VIRTIO_IOMMU_T_PROBE)
+   return len - viommu->probe_size - tail_size;
+
return len - tail_size;
 }
 
@@ -393,6 +399,110 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain)
return ret;
 }
 
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   size_t size;
+   u64 start64, end64;
+   phys_addr_t start, end;
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   start = start64 = le64_to_cpu(mem->start);
+   end = end64 = le64_to_cpu(mem->end);
+   size = end64 - start64 + 1;
+
+   /* Catch any overflow, including the unlikely end64 - start64 + 1 = 0 */
+   if (start != start64 || end != end64 || size < end64 - start64)
+   return -EOVERFLOW;
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   default:
+   dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
+mem->subtype);
+   /* Fall-through */
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   region = iommu_alloc_resv_region(start, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(start, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   }
+   if (!region)
+   return -ENOMEM;
+
+   list_add(&vdev->resv_regions, &region->list);
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   size_t probe_len;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev->iommu_fwspec;
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   return -EINVAL;
+
+   probe_len = sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail);
+   probe = kzalloc(probe_len, GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe, probe_len);
+   if (ret)
+   goto out_free;
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length) + sizeof(*prop);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
+   break;
+   default:
+   dev_err(dev, "unknown viommu prop 0x%x\n", type);
+   }
+
+   if (ret)
+   dev_err(dev, "failed to parse viommu prop 0x%x\n", 

[PATCH v5 7/7] iommu/virtio: Add event queue

2018-11-22 Thread Jean-Philippe Brucker
The event queue offers a way for the device to report access faults from
endpoints. It is implemented on virtqueue #1. Whenever the host needs to
signal a fault, it fills one of the buffers offered by the guest and
interrupts it.

Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 115 +++---
 include/uapi/linux/virtio_iommu.h |  19 +
 2 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 0c7a7fa2628d..e6ff515d41c0 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -29,7 +29,8 @@
#define MSI_IOVA_LENGTH		0x100000
 
 #define VIOMMU_REQUEST_VQ  0
-#define VIOMMU_NR_VQS  1
+#define VIOMMU_EVENT_VQ		1
+#define VIOMMU_NR_VQS  2
 
 struct viommu_dev {
struct iommu_device iommu;
@@ -41,6 +42,7 @@ struct viommu_dev {
struct virtqueue	*vqs[VIOMMU_NR_VQS];
spinlock_t	request_lock;
struct list_head	requests;
+   void	*evts;
 
/* Device configuration */
struct iommu_domain_geometry	geometry;
@@ -82,6 +84,15 @@ struct viommu_request {
char	buf[];
 };
 
+#define VIOMMU_FAULT_RESV_MASK 0xffffff00
+
+struct viommu_event {
+   union {
+   u32 head;
+   struct virtio_iommu_fault fault;
+   };
+};
+
 #define to_viommu_domain(domain)   \
container_of(domain, struct viommu_domain, domain)
 
@@ -503,6 +514,68 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
return ret;
 }
 
+static int viommu_fault_handler(struct viommu_dev *viommu,
+   struct virtio_iommu_fault *fault)
+{
+   char *reason_str;
+
+   u8 reason   = fault->reason;
+   u32 flags   = le32_to_cpu(fault->flags);
+   u32 endpoint= le32_to_cpu(fault->endpoint);
+   u64 address = le64_to_cpu(fault->address);
+
+   switch (reason) {
+   case VIRTIO_IOMMU_FAULT_R_DOMAIN:
+   reason_str = "domain";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_MAPPING:
+   reason_str = "page";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
+   default:
+   reason_str = "unknown";
+   break;
+   }
+
+   /* TODO: find EP by ID and report_iommu_fault */
+   if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
+   dev_err_ratelimited(viommu->dev,
+   "%s fault from EP %u at %#llx [%s%s%s]\n",
+   reason_str, endpoint, address,
+   flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "",
+   flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : "");
+   else
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
+   reason_str, endpoint);
+   return 0;
+}
+
+static void viommu_event_handler(struct virtqueue *vq)
+{
+   int ret;
+   unsigned int len;
+   struct scatterlist sg[1];
+   struct viommu_event *evt;
+   struct viommu_dev *viommu = vq->vdev->priv;
+
+   while ((evt = virtqueue_get_buf(vq, &len)) != NULL) {
+   if (len > sizeof(*evt)) {
+   dev_err(viommu->dev,
+   "invalid event buffer (len %u != %zu)\n",
+   len, sizeof(*evt));
+   } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) {
+   viommu_fault_handler(viommu, &evt->fault);
+   }
+
+   sg_init_one(sg, evt, sizeof(*evt));
+   ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC);
+   if (ret)
+   dev_err(viommu->dev, "could not add event buffer\n");
+   }
+
+   virtqueue_kick(vq);
+}
+
 /* IOMMU API */
 
 static struct iommu_domain *viommu_domain_alloc(unsigned type)
@@ -885,16 +958,35 @@ static struct iommu_ops viommu_ops = {
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
struct virtio_device *vdev = dev_to_virtio(viommu->dev);
-   const char *name = "request";
-   void *ret;
+   const char *names[] = { "request", "event" };
+   vq_callback_t *callbacks[] = {
+   NULL, /* No async requests */
+   viommu_event_handler,
+   };
 
-   ret = virtio_find_single_vq(vdev, NULL, name);
-   if (IS_ERR(ret)) {
-   dev_err(viommu->dev, "cannot find VQ\n");
-   return PTR_ERR(ret);
-   }
+   return virtio_find_vqs(vdev, VIOMMU_NR_VQS, viommu->vqs, callbacks,
+  names, 

[PATCH v5 1/7] dt-bindings: virtio-mmio: Add IOMMU description

2018-11-22 Thread Jean-Philippe Brucker
The nature of a virtio-mmio node is discovered by the virtio driver at
probe time. However the DMA relation between devices must be described
statically. When a virtio-mmio node is a virtio-iommu device, it needs an
"#iommu-cells" property as specified by bindings/iommu/iommu.txt.

Otherwise, the virtio-mmio device may perform DMA through an IOMMU, which
requires an "iommus" property. Describe these requirements in the
device-tree bindings documentation.

Reviewed-by: Rob Herring 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/virtio/mmio.txt   | 30 +++
 1 file changed, 30 insertions(+)

diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt 
b/Documentation/devicetree/bindings/virtio/mmio.txt
index 5069c1b8e193..21af30fbb81f 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.txt
+++ b/Documentation/devicetree/bindings/virtio/mmio.txt
@@ -8,10 +8,40 @@ Required properties:
 - reg: control registers base address and size including configuration 
space
 - interrupts:  interrupt generated by the device
 
+Required properties for virtio-iommu:
+
+- #iommu-cells:	When the node corresponds to a virtio-iommu device, it is
+   linked to DMA masters using the "iommus" or "iommu-map"
+   properties [1][2]. #iommu-cells specifies the size of the
+   "iommus" property. For virtio-iommu #iommu-cells must be
+   1, each cell describing a single endpoint ID.
+
+Optional properties:
+
+- iommus:  If the device accesses memory through an IOMMU, it should
+   have an "iommus" property [1]. Since virtio-iommu itself
+   does not access memory through an IOMMU, the "virtio,mmio"
+   node cannot have both an "#iommu-cells" and an "iommus"
+   property.
+
 Example:
 
virtio_block@3000 {
compatible = "virtio,mmio";
reg = <0x3000 0x100>;
interrupts = <41>;
+
+   /* Device has endpoint ID 23 */
+   iommus = <&viommu 23>
}
+
+   viommu: iommu@3100 {
+   compatible = "virtio,mmio";
+   reg = <0x3100 0x100>;
+   interrupts = <42>;
+
+   #iommu-cells = <1>
+   }
+
+[1] Documentation/devicetree/bindings/iommu/iommu.txt
+[2] Documentation/devicetree/bindings/pci/pci-iommu.txt
-- 
2.19.1

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


[PATCH v5 2/7] dt-bindings: virtio: Add virtio-pci-iommu node

2018-11-22 Thread Jean-Philippe Brucker
Some systems implement virtio-iommu as a PCI endpoint. The operating
system needs to discover the relationship between IOMMU and masters long
before the PCI endpoint gets probed. Add a PCI child node to describe the
virtio-iommu device.

The virtio-pci-iommu is conceptually split between a PCI programming
interface and a translation component on the parent bus. The latter
doesn't have a node in the device tree. The virtio-pci-iommu node
describes both, by linking the PCI endpoint to "iommus" property of DMA
master nodes and to "iommu-map" properties of bus nodes.

Reviewed-by: Rob Herring 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/virtio/iommu.txt  | 66 +++
 1 file changed, 66 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt

diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
b/Documentation/devicetree/bindings/virtio/iommu.txt
new file mode 100644
index ..2407fea0651c
--- /dev/null
+++ b/Documentation/devicetree/bindings/virtio/iommu.txt
@@ -0,0 +1,66 @@
+* virtio IOMMU PCI device
+
+When virtio-iommu uses the PCI transport, its programming interface is
+discovered dynamically by the PCI probing infrastructure. However the
+device tree statically describes the relation between IOMMU and DMA
+masters. Therefore, the PCI root complex that hosts the virtio-iommu
+contains a child node representing the IOMMU device explicitly.
+
+Required properties:
+
+- compatible:  Should be "virtio,pci-iommu"
+- reg: PCI address of the IOMMU. As defined in the PCI Bus
+   Binding reference [1], the reg property is a five-cell
+   address encoded as (phys.hi phys.mid phys.lo size.hi
+   size.lo). phys.hi should contain the device's BDF as
+   0b00000000 bbbbbbbb dddddfff 00000000. The other cells
+   should be zero.
+- #iommu-cells:	Each platform DMA master managed by the IOMMU is assigned
+   an endpoint ID, described by the "iommus" property [2].
+   For virtio-iommu, #iommu-cells must be 1.
+
+Notes:
+
+- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
+  virtio-iommu node doesn't have an "iommus" property, and is omitted from
+  the iommu-map property of the root complex.
+
+Example:
+
+pcie@10000000 {
+   compatible = "pci-host-ecam-generic";
+   ...
+
+   /* The IOMMU programming interface uses slot 00:01.0 */
+   iommu0: iommu@0008 {
+   compatible = "virtio,pci-iommu";
+   reg = <0x0800 0 0 0 0>;
+   #iommu-cells = <1>;
+   };
+
+   /*
+* The IOMMU manages all functions in this PCI domain except
+* itself. Omit BDF 00:01.0.
+*/
+   iommu-map = <0x0 &iommu0 0x0 0x8>
+   <0x9 &iommu0 0x9 0xfff7>;
+};
+
+pcie@20000000 {
+   compatible = "pci-host-ecam-generic";
+   ...
+   /*
+* The IOMMU also manages all functions from this domain,
+* with endpoint IDs 0x10000 - 0x1ffff
+*/
+   iommu-map = <0x0 &iommu0 0x10000 0x10000>;
+};
+
+ethernet@fe001000 {
+   ...
+   /* The IOMMU manages this platform device with endpoint ID 0x20000 */
+   iommus = <&iommu0 0x20000>;
+};
+
+[1] Documentation/devicetree/bindings/pci/pci.txt
+[2] Documentation/devicetree/bindings/iommu/iommu.txt
-- 
2.19.1



[PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-11-22 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device that allows sending IOMMU
requests such as map/unmap over the virtio transport without emulating
page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables.

Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   7 +
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 916 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 104 
 6 files changed, 1040 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 1689dcfec800..3d8550c76f4a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -15946,6 +15946,13 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO IOMMU DRIVER
+M: Jean-Philippe Brucker 
+L: virtualizat...@lists.linux-foundation.org
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
 VIRTUAL BOX GUEST DEVICE DRIVER
 M: Hans de Goede 
 M: Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index bf2bbfa2a399..db5f2b8c23f5 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -464,4 +464,15 @@ config QCOM_IOMMU
help
  Support for IOMMU on certain Qualcomm SoCs.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO=y
+   select IOMMU_API
+   select INTERVAL_TREE
+   select ARM_DMA_USE_IOMMU if ARM
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 5481e5fe1f95..bd7e55751d09 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -36,3 +36,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
 obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..7540dab9c8dc
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,916 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2018 Arm Limited
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/amba/bus.h>
+#include <linux/delay.h>
+#include <linux/dma-iommu.h>
+#include <linux/freezer.h>
+#include <linux/interval_tree.h>
+#include <linux/iommu.h>
+#include <linux/module.h>
+#include <linux/of_iommu.h>
+#include <linux/of_platform.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_ids.h>
+#include <linux/wait.h>
+
+#include <uapi/linux/virtio_iommu.h>
+
+#define MSI_IOVA_BASE		0x8000000
+#define MSI_IOVA_LENGTH		0x100000
+
+#define VIOMMU_REQUEST_VQ  0
+#define VIOMMU_NR_VQS  1
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device	*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue	*vqs[VIOMMU_NR_VQS];
+   spinlock_t	request_lock;
+   struct list_head	requests;
+
+   /* Device configuration */
+   struct iommu_domain_geometry	geometry;
+   u64 pgsize_bitmap;
+   u8  domain_bits;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   u32 flags;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutex	mutex; /* protects viommu pointer */
+   unsigned int	id;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   unsigned long   nr_endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain	*vdomain;
+};
+
+struct viommu_request {
+   struct list_head	list;
+   void	*writeback;
+   unsigned int	write_offset;
+   unsigned int	len;
+   char	buf[];
+};
+
+#define to_viommu_domain(domain)   \
+   container_of(domain, struct viommu_domain, domain)
+
+static int viommu_get_req_errno(void *buf, size_t len)
+{
+   struct virtio_iommu_req_tail *tail = buf + len - sizeof(*tail);
+
+   switch (tail->status) {
+   case 

[PATCH v5 0/7] Add virtio-iommu driver

2018-11-22 Thread Jean-Philippe Brucker
Implement the virtio-iommu driver, following specification v0.9 [1].

Since v4 [2] I fixed the issues reported by Eric, and added Reviewed-by
from Eric and Rob. Thanks!

I changed the specification to fix one inconsistency discussed in v4.
That the device fills the probe buffer with zeroes is now a "SHOULD"
instead of a "MAY", since it's the only way for the driver to know if
the device wrote the status. Existing devices already do this. In
addition the device now needs to fill the three padding bytes at the
tail with zeroes.

You can find Linux driver and kvmtool device on branches
virtio-iommu/v0.9 [3]. I also lightly tested with Eric's latest QEMU
device [4].

[1] Virtio-iommu specification v0.9, sources, pdf and diff from v0.8
git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf

http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.8-v0.9.pdf

[2] [PATCH v4 0/7] Add virtio-iommu driver
https://lists.linuxfoundation.org/pipermail/iommu/2018-November/031074.html

[3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9
git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9

[4] [RFC v9 00/17] VIRTIO-IOMMU device
https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html

Jean-Philippe Brucker (7):
  dt-bindings: virtio-mmio: Add IOMMU description
  dt-bindings: virtio: Add virtio-pci-iommu node
  of: Allow the iommu-map property to omit untranslated devices
  PCI: OF: Initialize dev->fwnode appropriately
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue

 .../devicetree/bindings/virtio/iommu.txt  |   66 +
 .../devicetree/bindings/virtio/mmio.txt   |   30 +
 MAINTAINERS   |7 +
 drivers/iommu/Kconfig |   11 +
 drivers/iommu/Makefile|1 +
 drivers/iommu/virtio-iommu.c  | 1157 +
 drivers/of/base.c |   10 +-
 drivers/pci/of.c  |7 +
 include/uapi/linux/virtio_ids.h   |1 +
 include/uapi/linux/virtio_iommu.h |  161 +++
 10 files changed, 1448 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.19.1



[PATCH v5 4/7] PCI: OF: Initialize dev->fwnode appropriately

2018-11-22 Thread Jean-Philippe Brucker
For PCI devices that have an OF node, set the fwnode as well. This way
drivers that rely on fwnode don't need the special case described by
commit f94277af03ea ("of/platform: Initialise dev->fwnode appropriately").

Acked-by: Bjorn Helgaas 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/pci/of.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 4c4217d0c3f1..c272ecfcd038 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -21,12 +21,15 @@ void pci_set_of_node(struct pci_dev *dev)
return;
dev->dev.of_node = of_pci_find_child_device(dev->bus->dev.of_node,
dev->devfn);
+   if (dev->dev.of_node)
+   dev->dev.fwnode = &dev->dev.of_node->fwnode;
 }
 
 void pci_release_of_node(struct pci_dev *dev)
 {
of_node_put(dev->dev.of_node);
dev->dev.of_node = NULL;
+   dev->dev.fwnode = NULL;
 }
 
 void pci_set_bus_of_node(struct pci_bus *bus)
@@ -35,12 +38,16 @@ void pci_set_bus_of_node(struct pci_bus *bus)
bus->dev.of_node = pcibios_get_phb_of_node(bus);
else
bus->dev.of_node = of_node_get(bus->self->dev.of_node);
+
+   if (bus->dev.of_node)
+   bus->dev.fwnode = &bus->dev.of_node->fwnode;
 }
 
 void pci_release_bus_of_node(struct pci_bus *bus)
 {
of_node_put(bus->dev.of_node);
bus->dev.of_node = NULL;
+   bus->dev.fwnode = NULL;
 }
 
 struct device_node * __weak pcibios_get_phb_of_node(struct pci_bus *bus)
-- 
2.19.1



[PATCH v5 3/7] of: Allow the iommu-map property to omit untranslated devices

2018-11-22 Thread Jean-Philippe Brucker
In PCI root complex nodes, the iommu-map property describes the IOMMU that
translates each endpoint. On some platforms, the IOMMU itself is presented
as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). This isn't supported
by the current OF driver, which expects all endpoints to have an IOMMU.
Allow the iommu-map property to have gaps.

Relaxing of_map_rid() also allows the msi-map property to have gaps, which
is invalid since MSIs always reach an MSI controller. In that case
pci_msi_setup_msi_irqs() will return an error when attempting to find the
device's MSI domain.

Reviewed-by: Rob Herring 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/of/base.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 09692c9b32a7..99f6bfa9b898 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -2237,8 +2237,12 @@ int of_map_rid(struct device_node *np, u32 rid,
return 0;
}
 
-   pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
-   np, map_name, rid, target && *target ? *target : NULL);
-   return -EFAULT;
+   pr_info("%pOF: no %s translation for rid 0x%x on %pOF\n", np, map_name,
+   rid, target && *target ? *target : NULL);
+
+   /* Bypasses translation */
+   if (id_out)
+   *id_out = rid;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(of_map_rid);
-- 
2.19.1



Re: [PATCH 4/4] arm64: KVM: Implement workaround for Cortex-A76 erratum 1165522

2018-11-22 Thread Marc Zyngier
On 08/11/2018 20:10, Christoffer Dall wrote:
> On Thu, Nov 08, 2018 at 06:05:55PM +, Marc Zyngier wrote:
>> On 06/11/18 08:15, Christoffer Dall wrote:
>>> On Mon, Nov 05, 2018 at 02:36:16PM +, Marc Zyngier wrote:
Early versions of Cortex-A76 can end up with corrupt TLBs if they
speculate an AT instruction during a guest switch while the
S1/S2 system registers are in an inconsistent state.

 Work around it by:
 - Mandating VHE
 - Make sure that S1 and S2 system registers are consistent before
   clearing HCR_EL2.TGE, which allows AT to target the EL1 translation
   regime

 These two things together ensure that we cannot hit this erratum.

 Signed-off-by: Marc Zyngier 
 ---
  Documentation/arm64/silicon-errata.txt |  1 +
  arch/arm64/Kconfig | 12 
  arch/arm64/include/asm/cpucaps.h   |  3 ++-
  arch/arm64/include/asm/kvm_host.h  |  3 +++
  arch/arm64/include/asm/kvm_hyp.h   |  6 ++
  arch/arm64/kernel/cpu_errata.c |  8 
  arch/arm64/kvm/hyp/switch.c| 14 ++
  7 files changed, 46 insertions(+), 1 deletion(-)

 diff --git a/Documentation/arm64/silicon-errata.txt 
 b/Documentation/arm64/silicon-errata.txt
 index 76ccded8b74c..04f0bc4690c6 100644
 --- a/Documentation/arm64/silicon-errata.txt
 +++ b/Documentation/arm64/silicon-errata.txt
 @@ -57,6 +57,7 @@ stable kernels.
| ARM            | Cortex-A73      | #858921         | ARM64_ERRATUM_858921        |
| ARM            | Cortex-A55      | #1024718        | ARM64_ERRATUM_1024718       |
| ARM            | Cortex-A76      | #1188873        | ARM64_ERRATUM_1188873       |
+| ARM            | Cortex-A76      | #1165522        | ARM64_ERRATUM_1165522       |
| ARM            | MMU-500         | #841119,#826419 | N/A                         |
|                |                 |                 |                             |
| Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375        |
 diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
 index 787d7850e064..a68bc6cc2167 100644
 --- a/arch/arm64/Kconfig
 +++ b/arch/arm64/Kconfig
 @@ -497,6 +497,18 @@ config ARM64_ERRATUM_1188873
  
  If unsure, say Y.
  
 +config ARM64_ERRATUM_1165522
+  bool "Cortex-A76: Speculative AT instruction using out-of-context translation regime could cause subsequent request to generate an incorrect translation"
 +  default y
 +  help
 +This option adds work arounds for ARM Cortex-A76 erratum 1165522
 +
 +Affected Cortex-A76 cores (r0p0, r1p0, r2p0) could end-up with
 +corrupted TLBs by speculating an AT instruction during a guest
 +context switch.
 +
 +If unsure, say Y.
 +
  config CAVIUM_ERRATUM_22375
bool "Cavium erratum 22375, 24313"
default y
 diff --git a/arch/arm64/include/asm/cpucaps.h 
 b/arch/arm64/include/asm/cpucaps.h
 index 6e2d254c09eb..62d8cd15fdf2 100644
 --- a/arch/arm64/include/asm/cpucaps.h
 +++ b/arch/arm64/include/asm/cpucaps.h
 @@ -54,7 +54,8 @@
  #define ARM64_HAS_CRC32   33
  #define ARM64_SSBS34
  #define ARM64_WORKAROUND_1188873  35
 +#define ARM64_WORKAROUND_1165522  36
  
 -#define ARM64_NCAPS   36
 +#define ARM64_NCAPS   37
  
  #endif /* __ASM_CPUCAPS_H */
 diff --git a/arch/arm64/include/asm/kvm_host.h 
 b/arch/arm64/include/asm/kvm_host.h
 index 7d6e974d024a..8f486026ac87 100644
 --- a/arch/arm64/include/asm/kvm_host.h
 +++ b/arch/arm64/include/asm/kvm_host.h
 @@ -435,6 +435,9 @@ static inline bool kvm_arch_sve_requires_vhe(void)
  static inline bool kvm_arch_impl_requires_vhe(void)
  {
/* Some implementations have defects that confine them to VHE */
 +  if (cpus_have_cap(ARM64_WORKAROUND_1165522))
 +  return true;
 +
return false;
  }
  
 diff --git a/arch/arm64/include/asm/kvm_hyp.h 
 b/arch/arm64/include/asm/kvm_hyp.h
 index 23aca66767f9..9681360d0959 100644
 --- a/arch/arm64/include/asm/kvm_hyp.h
 +++ b/arch/arm64/include/asm/kvm_hyp.h
@@ -163,6 +163,12 @@ static __always_inline void __hyp_text __load_guest_stage2(struct kvm *kvm)
  {
write_sysreg(kvm->arch.vtcr, vtcr_el2);
write_sysreg(kvm->arch.vttbr, vttbr_el2);
 +
 +  /*
 +   * ARM erratum 1165522 requires the actual execution of the
 +   * above before we can switch to the guest translation regime.
 +   */
>>>
>>> Is it about a guest translation 'regime' or should this just say before
>>> we can enable stage 2 

Re: [RFC PATCH v2 00/23] KVM: arm64: Initial support for SVE guests

2018-11-22 Thread Alex Bennée

Dave Martin  writes:

> This series implements basic support for allowing KVM guests to use the
> Arm Scalable Vector Extension (SVE).
>
> The patches are based on v4.19-rc5.
>
> The patches are also available on a branch for reviewer convenience. [1]
>
> This is a significant overhaul of the previous preliminary series [2],
> with the major changes outlined below, and additional minor updates in
> response to review feedback (see the individual patches for those).
>
> In the interest of getting this series out for review, this series is
> **completely untested**.

Richard is currently working on VHE support for QEMU and we already have
SVE system emulation support as of 3.1 so hopefully QEMU will be able to
test this soon (and probably shake out a few of our own bugs ;-).

> Reviewers should focus on the proposed API (but any other comments are
> of course welcome!)


I've finished my pass for this revision. Sorry it took so long to get to
it.

--
Alex Bennée
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH v2 23/23] KVM: arm64/sve: Document KVM API extensions for SVE

2018-11-22 Thread Alex Bennée

Dave Martin  writes:

> This patch adds sections to the KVM API documentation describing
> the extensions for supporting the Scalable Vector Extension (SVE)
> in guests.
>
> Signed-off-by: Dave Martin 
> ---
>  Documentation/virtual/kvm/api.txt | 142 +-
>  1 file changed, 139 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index a58067b..b8257d4 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2054,13 +2054,21 @@ Specifically:
>0x6030  0010 004c SPSR_UND64  spsr[KVM_SPSR_UND]
>0x6030  0010 004e SPSR_IRQ64  spsr[KVM_SPSR_IRQ]
>0x6060  0010 0050 SPSR_FIQ64  spsr[KVM_SPSR_FIQ]
> -  0x6040  0010 0054 V0 128  fp_regs.vregs[0]
> -  0x6040  0010 0058 V1 128  fp_regs.vregs[1]
> +  0x6040  0010 0054 V0 128  fp_regs.vregs[0](*)
> +  0x6040  0010 0058 V1 128  fp_regs.vregs[1](*)
>  ...
> -  0x6040  0010 00d0 V31128  fp_regs.vregs[31]
> +  0x6040  0010 00d0 V31128  fp_regs.vregs[31]   (*)
>0x6020  0010 00d4 FPSR32  fp_regs.fpsr
>0x6020  0010 00d5 FPCR32  fp_regs.fpcr
>
> +(*) These encodings are not accepted for SVE-enabled vcpus.  See
> +KVM_ARM_SVE_CONFIG for details of how SVE support is configured for
> +a vcpu.
> +
> +The equivalent register content can be accessed via bits [2047:0] of

You mean [127:0] I think.

> +the corresponding SVE Zn registers instead for vcpus that have SVE
> +enabled (see below).
> +
>  arm64 CCSIDR registers are demultiplexed by CSSELR value:
>0x6020  0011 00 
>
> @@ -2070,6 +2078,14 @@ arm64 system registers have the following id bit patterns:
>  arm64 firmware pseudo-registers have the following bit pattern:
>0x6030  0014 
>
> +arm64 SVE registers have the following bit patterns:
> +  0x6080  0015 00 Zn bits[2048*slice + 2047 : 2048*slice]
> +  0x6050  0015 04 Pn bits[256*slice + 255 : 256*slice]
> +  0x6050  0015 060 FFR bits[256*slice + 255 : 256*slice]
> +
> +  These registers are only accessible on SVE-enabled vcpus.  See
> +  KVM_ARM_SVE_CONFIG for details.
> +
>
>  MIPS registers are mapped using the lower 32 bits.  The upper 16 of that is
>  the register group type:
> @@ -3700,6 +3716,126 @@ Returns: 0 on success, -1 on error
>  This copies the vcpu's kvm_nested_state struct from userspace to the kernel.
>  For the definition of struct kvm_nested_state, see KVM_GET_NESTED_STATE.
>
> +4.116 KVM_ARM_SVE_CONFIG
> +
> +Capability: KVM_CAP_ARM_SVE
> +Architectures: arm64
> +Type: vm and vcpu ioctl
> +Parameters: struct kvm_sve_vls (in/out)
> +Returns: 0 on success
> +Errors:
> +  EINVAL:Unrecognised subcommand or bad arguments
> +  EBADFD:vcpu in wrong state for request
> + (KVM_ARM_SVE_CONFIG_SET, KVM_ARM_SVE_CONFIG_GET)
> +  ENOMEM:Out of memory
> +  EFAULT:Bad user address
> +
> +struct kvm_sve_vls {
> + __u16 cmd;
> + __u16 max_vq;
> + __u16 _reserved[2];
> + __u64 required_vqs[8];
> +};
> +
> +General:
> +
> +cmd: This ioctl supports a few different subcommands, selected by the
> +value of cmd (described in detail in the following sections).
> +
> +_reserved[]: these fields may be meaningful to later kernels.  For
> +forward compatibility, they must be zeroed before invoking this ioctl
> +for the first time on a given struct kvm_sve_vls object.  (So, memset()
> +it to zero before first use, or allocate with calloc() for example.)
> +
> +max_vq, required_vqs[]: encode a set of SVE vector lengths.  The set is
> +encoded as follows:
> +
> +If (a * 64 + b + 1) <= max_vq, then the bit represented by
> +
> +required_vqs[a] & ((__u64)1 << b)
> +
> +(where a is in the range 0..7 and b is in the range 0..63)
> +indicates that the vector length (a * 64 + b + 1) * 128 bits is
> +supported (KVM_ARM_SVE_CONFIG_QUERY, KVM_ARM_SVE_CONFIG_GET) or required
> +(KVM_ARM_SVE_CONFIG_SET).
> +
> +If (a * 64 + b + 1) > max_vq, then the vector length
> +(a * 64 + b + 1) * 128 bits is unsupported or prohibited respectively.
> +In other words, only the first max_vq bits in required_vqs[] are
> +significant; remaining bits are implicitly treated as if they were zero.
> +
> +max_vq must be in the range SVE_VQ_MIN (1) to SVE_VQ_MAX (512).
> +
> +See Documentation/arm64/sve.txt for an explanation of vector lengths and
> +the meaning associated with "VQ".
> +
> +Subcommands:
> +
> +/* values for cmd: */
> +#define KVM_ARM_SVE_CONFIG_QUERY 0 /* query what the host can support */
> +#define KVM_ARM_SVE_CONFIG_SET   1 /* enable SVE for vcpu and set VLs */
> +#define KVM_ARM_SVE_CONFIG_GET   2 /* read the set of VLs for a vcpu */
> +
> +Subcommand details:
> +
> +4.116.1 KVM_ARM_SVE_CONFIG_QUERY
> +Type: vm and vcpu
> +
> +Retrieve the full set of 

Re: [RFC PATCH v2 21/23] KVM: arm64/sve: allow KVM_ARM_SVE_CONFIG_QUERY on vm fd

2018-11-22 Thread Alex Bennée

Dave Martin  writes:

> Since userspace may need to decide on the set of vector lengths for
> the guest before setting up a vm, it is onerous to require a vcpu
> fd to be available first.  KVM_ARM_SVE_CONFIG_QUERY is not
> vcpu-dependent anyway, so this patch wires up KVM_ARM_SVE_CONFIG to
> be usable on a vm fd where appropriate.
>
> Subcommands that are vcpu-dependent (currently
> KVM_ARM_SVE_CONFIG_SET, KVM_ARM_SVE_CONFIG_GET) will return -EINVAL
> if invoked on a vm fd.
>
> Signed-off-by: Dave Martin 

Apropos comments on last patch, this could go away if we went with just
reporting the max VL via the SVE capability probe which works on the
vmfd.

> ---
>  arch/arm64/kvm/guest.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index f066b17..2313c22 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -574,6 +574,9 @@ static int kvm_vcpu_set_sve_vls(struct kvm_vcpu *vcpu, struct kvm_sve_vls *vls,
>   unsigned int vq, max_vq;
>   int ret;
>
> + if (!vcpu)
> + return -EINVAL; /* per-vcpu operation on vm fd */
> +
>   if (vcpu->arch.has_run_once || vcpu_has_sve(vcpu))
>   return -EBADFD; /* too late, or already configured */
>
> @@ -659,6 +662,9 @@ static int kvm_vcpu_query_sve_vls(struct kvm_vcpu *vcpu, struct kvm_sve_vls *vls
>  static int kvm_vcpu_get_sve_vls(struct kvm_vcpu *vcpu, struct kvm_sve_vls *vls,
>   struct kvm_sve_vls __user *userp)
>  {
> + if (!vcpu)
> + return -EINVAL; /* per-vcpu operation on vm fd */
> +
>   if (!vcpu_has_sve(vcpu))
>   return -EBADFD; /* not configured yet */
>
> @@ -668,6 +674,7 @@ static int kvm_vcpu_get_sve_vls(struct kvm_vcpu *vcpu, struct kvm_sve_vls *vls,
>   sve_vq_from_vl(vcpu->arch.sve_max_vl), userp);
>  }
>
> +/* vcpu may be NULL if this is called via a vm fd */
>  static int kvm_vcpu_sve_config(struct kvm_vcpu *vcpu,
>  struct kvm_sve_vls __user *userp)
>  {
> @@ -717,7 +724,15 @@ int kvm_arm_arch_vcpu_ioctl(struct kvm_vcpu *vcpu,
>  int kvm_arm_arch_vm_ioctl(struct kvm *kvm,
> unsigned int ioctl, unsigned long arg)
>  {
> - return -EINVAL;
> + void __user *userp = (void __user *)arg;
> +
> + switch (ioctl) {
> + case KVM_ARM_SVE_CONFIG:
> + return kvm_vcpu_sve_config(NULL, userp);
> +
> + default:
> + return -EINVAL;
> + }
>  }
>
>  int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)


--
Alex Bennée


Re: [RFC PATCH v2 19/23] KVM: arm64/sve: Report and enable SVE API extensions for userspace

2018-11-22 Thread Alex Bennée

Dave Martin  writes:

> This patch adds the necessary API extensions to allow userspace to
> detect SVE support for guests and enable it.
>
> A new capability KVM_CAP_ARM_SVE is defined to allow userspace to
> detect the availability of the KVM SVE API extensions in the usual
> way.
>
> Userspace needs to enable SVE explicitly per vcpu and configure the
> set of SVE vector lengths available to the guest before the vcpu is
> allowed to run.  For these purposes, a new arm64-specific vcpu
> ioctl KVM_ARM_SVE_CONFIG is added, with the following subcommands
> (in rough order of expected use):
>
> KVM_ARM_SVE_CONFIG_QUERY: report the set of vector lengths
> supported by this host.
>
> The resulting set can be supplied directly to
> KVM_ARM_SVE_CONFIG_SET in order to obtain the maximal possible
> set, or used to inform userspace's decision on the appropriate
> set of vector lengths (possibly taking into account the
> configuration of other nodes in the cluster so that the VM can
> migrate freely).
>
> KVM_ARM_SVE_CONFIG_SET: enable SVE for this vcpu and configure the
> set of vector lengths it offers to the guest.
>
> This can only be done once, before the vcpu is run.
>
> KVM_ARM_SVE_CONFIG_GET: report the set of vector lengths available
> to the guest on this vcpu (for use when snapshotting or
> migrating a VM).
>
> Signed-off-by: Dave Martin 
> ---
>
> Changes since RFCv1:
>
>  * The new feature bit for PREFERRED_TARGET / VCPU_INIT is gone in
>favour of a capability and a new ioctl to enable/configure SVE.
>
>Perhaps the SVE configuration could be done via device attributes,
>but it still has to be done early, so crowbarring support for this
>behind a generic API may cause more trouble than it solves.
>
>This is still up for discussion if anybody feels strongly about it.
>
>  * An ioctl KVM_ARM_SVE_CONFIG has been added to report the set of
>vector lengths available and configure SVE for a vcpu.
>
>To reduce ioctl namespace pollution the new operations are grouped
>as subcommands under a single ioctl, since they use the same
>argument format anyway.
> ---
>  arch/arm64/include/asm/kvm_host.h |   8 +-
>  arch/arm64/include/uapi/asm/kvm.h |  14 
>  arch/arm64/kvm/guest.c| 164 +-
>  arch/arm64/kvm/reset.c|  50 
>  include/uapi/linux/kvm.h  |   4 +
>  5 files changed, 238 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
> index bbde597..5225485 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -52,6 +52,12 @@
>
>  DECLARE_STATIC_KEY_FALSE(userspace_irqchip_in_use);
>
> +#ifdef CONFIG_ARM64_SVE
> +bool kvm_sve_supported(void);
> +#else
> +static inline bool kvm_sve_supported(void) { return false; }
> +#endif
> +
>  int __attribute_const__ kvm_target_cpu(void);
>  int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
>  int kvm_arch_dev_ioctl_check_extension(struct kvm *kvm, long ext);
> @@ -441,7 +447,7 @@ static inline void kvm_arch_sync_events(struct kvm *kvm) {}
>  static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
>  static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
>
> -static inline void kvm_arm_arch_vcpu_uninit(struct kvm_vcpu *vcpu) {}
> +void kvm_arm_arch_vcpu_uninit(struct kvm_vcpu *vcpu);
>
>  void kvm_arm_init_debug(void);
>  void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
> index 1ff68fa..94f6932 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -32,6 +32,7 @@
>  #define KVM_NR_SPSR  5
>
>  #ifndef __ASSEMBLY__
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -108,6 +109,19 @@ struct kvm_vcpu_init {
>   __u32 features[7];
>  };
>
> +/* Vector length set for KVM_ARM_SVE_CONFIG */
> +struct kvm_sve_vls {
> + __u16 cmd;
> + __u16 max_vq;
> + __u16 _reserved[2];
> + __u64 required_vqs[__KERNEL_DIV_ROUND_UP(SVE_VQ_MAX - SVE_VQ_MIN + 1, 64)];
> +};
> +
> +/* values for cmd: */
> +#define KVM_ARM_SVE_CONFIG_QUERY 0 /* query what the host can support */
> +#define KVM_ARM_SVE_CONFIG_SET   1 /* enable SVE for vcpu and set VLs */
> +#define KVM_ARM_SVE_CONFIG_GET   2 /* read the set of VLs for a vcpu */
> +
>  struct kvm_sregs {
>  };
>
> diff --git a/arch/arm64/kvm/guest.c b/arch/arm64/kvm/guest.c
> index 331b85e..d96145a 100644
> --- a/arch/arm64/kvm/guest.c
> +++ b/arch/arm64/kvm/guest.c
> @@ -26,6 +26,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -56,6 +59,11 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
>   return 0;
>  }
>
> +void kvm_arm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
> +{
> + 

Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Christoffer Dall
On Thu, Nov 22, 2018 at 01:32:37PM +0100, Dave P Martin wrote:
> On Thu, Nov 22, 2018 at 11:27:53AM +, Alex Bennée wrote:
> > 
> > Christoffer Dall  writes:
> > 
> > > [Adding Peter and Alex for their view on the QEMU side]
> > >
> > > On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> > >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote:
> > >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote:
> > >> > > KVM_GET_REG_LIST should only enumerate registers that are actually
> > >> > > accessible, so it is necessary to filter out any register that is
> > >> > > not exposed to the guest.  For features that are configured at
> > >> > > runtime, this will require a dynamic check.
> > >> > >
> > >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden
> > >> > > if SVE is not enabled for the guest.
> > >> >
> > >> > This implies that userspace can never access this interface for a vcpu
> > >> > before having decided whether such features are enabled for the guest 
> > >> > or
> > >> > not, since otherwise userspace will see different states for a VCPU
> > >> > depending on sequencing of the API, which sounds fragile to me.
> > >> >
> > >> > That should probably be documented somewhere, and I hope the
> > >> > enable/disable API for SVE in guests already takes that into account.
> > >> >
> > >> > Not sure if there's an action to take here, but it was the best place I
> > >> > could raise this concern.
> > >>
> > >> Fair point.  I struggled to come up with something better that solves
> > >> all problems.
> > >>
> > >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> > >> creating the vcpu, so that if issued at all for a vcpu, it is issued
> > >> very soon after KVM_VCPU_INIT.
> > >>
> > >> I think this worked OK with the current structure of kvmtool and I
> > >> seem to remember discussing this with Peter Maydell re qemu -- but
> > >> it sounds like I should double-check.
> > >
> > > QEMU does some thing around enumerating all the system registers exposed
> > > by KVM and saving/restoring them as part of its startup, but I don't
> > > remember the exact sequence.
> > 
> > QEMU does this for each vCPU as part of its start-up sequence:
> > 
> >   kvm_init_vcpu
> > kvm_get_cpu (-> KVM_CREATE_VCPU)
> > KVM_GET_VCPU_MMAP_SIZE
> > kvm_arch_init_vcpu
> >   kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT)
> >   kvm_get_one_reg(ARM_CPU_ID_MPIDR)
> >   kvm_arm_init_debug (chk for KVM_CAP 
> > SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS)
> >   kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR)
> >   kvm_arm_init_cpreg_list (KVM_GET_REG_LIST)
> > 
> > At this point we have the register list we need for
> > kvm_arch_get_registers which is what we call every time we want to
> > synchronise state. We only really do this for debug events, crashes and
> > at some point when migrating.
> 
> So we would need to insert KVM_ARM_SVE_CONFIG_SET into this sequence,
> meaning that the new capability is not strictly necessary.
> 
> I sympathise with Christoffer's view though that without the capability
> mechanism it may be too easy for software to make mistakes: code
> refactoring might swap the KVM_GET_REG_LIST and KVM_ARM_SVE_CONFIG ioctls
> over and then things would go wrong with no immediate error indication.
> 
> In effect, the SVE regs would be missing from the list yielded by
> KVM_GET_REG_LIST, possibly leading to silent migration failures.
> 
> I'm a bit uneasy about that.  Am I being too paranoid now?
> 

No, we've made decisions in the past where we didn't enforce ordering
which ended up being a huge pain (vgic lazy init, as a clear example of
something really bad).  Of course, it's a tradeoff.  If it's a huge pain
to implement, maybe things will be ok, but if it's just a read/write
capability handshake, I think it's worth doing.


Thanks,

Christoffer


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Peter Maydell
On 22 November 2018 at 12:34, Christoffer Dall  wrote:
> So on migration, will you have the required information for
> KVM_ARM_VCPU_INIT before setting the registers from the migration
> stream?
>
> (I assume so, because presumably this comes from a command-line switch
> or from the machine definition, which must match the source.)

Yes. QEMU always sets up the VCPU completely before doing any
inbound migration.

> Therefore, I don't think there's an issue with this patch, but from
> bitter experience I think we should enforce ordering if possible.

Yes, if there are semantic ordering constraints on the various
calls it would be nice to have the kernel enforce them.

thanks
-- PMM


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Christoffer Dall
On Thu, Nov 22, 2018 at 11:13:51AM +, Peter Maydell wrote:
> On 22 November 2018 at 10:53, Christoffer Dall  
> wrote:
> > [Adding Peter and Alex for their view on the QEMU side]
> >
> > On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> >> creating the vcpu, so that if issued at all for a vcpu, it is issued
> >> very soon after KVM_VCPU_INIT.
> >>
> >> I think this worked OK with the current structure of kvmtool and I
> >> seem to remember discussing this with Peter Maydell re qemu -- but
> >> it sounds like I should double-check.
> >
> > QEMU does some thing around enumerating all the system registers exposed
> > by KVM and saving/restoring them as part of its startup, but I don't
> > remember the exact sequence.
> 
> This all happens in kvm_arch_init_vcpu(), which does:
>  * KVM_ARM_VCPU_INIT ioctl (with the appropriate kvm_init_features set)
>  * read the guest MPIDR with GET_ONE_REG so we know what KVM
>is doing with MPIDR assignment across CPUs
>  * check for interesting extensions like KVM_CAP_SET_GUEST_DEBUG
>  * get and cache a list of what system registers the vcpu has,
>using KVM_GET_REG_LIST. This is where we do the "size must
>be U32 or U64" sanity check.
> 
> So if there's something we can't do by setting kvm_init_features
> for KVM_ARM_VCPU_INIT but have to do immediately afterwards,
> that is straightforward.
> 
> The major requirement for QEMU is that if we don't specifically
> enable SVE in the VCPU then we must not see any registers
> in the KVM_GET_REG_LIST that are not u32 or u64 -- otherwise
> QEMU will refuse to start.
> 

So on migration, will you have the required information for
KVM_ARM_VCPU_INIT before setting the registers from the migration
stream?

(I assume so, because presumably this comes from a command-line switch
or from the machine definition, which must match the source.)

Therefore, I don't think there's an issue with this patch, but from
bitter experience I think we should enforce ordering if possible.


Thanks,

Christoffer


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Dave P Martin
On Thu, Nov 22, 2018 at 11:27:53AM +, Alex Bennée wrote:
>
> Christoffer Dall  writes:
>
> > [Adding Peter and Alex for their view on the QEMU side]
> >
> > On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> >> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote:
> >> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote:
> >> > > KVM_GET_REG_LIST should only enumerate registers that are actually
> >> > > accessible, so it is necessary to filter out any register that is
> >> > > not exposed to the guest.  For features that are configured at
> >> > > runtime, this will require a dynamic check.
> >> > >
> >> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden
> >> > > if SVE is not enabled for the guest.
> >> >
> >> > This implies that userspace can never access this interface for a vcpu
> >> > before having decided whether such features are enabled for the guest or
> >> > not, since otherwise userspace will see different states for a VCPU
> >> > depending on sequencing of the API, which sounds fragile to me.
> >> >
> >> > That should probably be documented somewhere, and I hope the
> >> > enable/disable API for SVE in guests already takes that into account.
> >> >
> >> > Not sure if there's an action to take here, but it was the best place I
> >> > could raise this concern.
> >>
> >> Fair point.  I struggled to come up with something better that solves
> >> all problems.
> >>
> >> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> >> creating the vcpu, so that if issued at all for a vcpu, it is issued
> >> very soon after KVM_VCPU_INIT.
> >>
> >> I think this worked OK with the current structure of kvmtool and I
> >> seem to remember discussing this with Peter Maydell re qemu -- but
> >> it sounds like I should double-check.
> >
> > QEMU does some thing around enumerating all the system registers exposed
> > by KVM and saving/restoring them as part of its startup, but I don't
> > remember the exact sequence.
>
> QEMU does this for each vCPU as part of its start-up sequence:
>
>   kvm_init_vcpu
> kvm_get_cpu (-> KVM_CREATE_VCPU)
> KVM_GET_VCPU_MMAP_SIZE
> kvm_arch_init_vcpu
>   kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT)
>   kvm_get_one_reg(ARM_CPU_ID_MPIDR)
>   kvm_arm_init_debug (chk for KVM_CAP 
> SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS)
>   kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR)
>   kvm_arm_init_cpreg_list (KVM_GET_REG_LIST)
>
> At this point we have the register list we need for
> kvm_arch_get_registers which is what we call every time we want to
> synchronise state. We only really do this for debug events, crashes and
> at some point when migrating.

So we would need to insert KVM_ARM_SVE_CONFIG_SET into this sequence,
meaning that the new capability is not strictly necessary.

I sympathise with Christoffer's view though that without the capability
mechanism it may be too easy for software to make mistakes: code
refactoring might swap the KVM_GET_REG_LIST and KVM_ARM_SVE_CONFIG ioctls
over and then things would go wrong with no immediate error indication.

In effect, the SVE regs would be missing from the list yielded by
KVM_GET_REG_LIST, possibly leading to silent migration failures.

I'm a bit uneasy about that.  Am I being too paranoid now?

Cheers
---Dave


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Alex Bennée

Christoffer Dall  writes:

> [Adding Peter and Alex for their view on the QEMU side]
>
> On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
>> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote:
>> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote:
>> > > KVM_GET_REG_LIST should only enumerate registers that are actually
>> > > accessible, so it is necessary to filter out any register that is
>> > > not exposed to the guest.  For features that are configured at
>> > > runtime, this will require a dynamic check.
>> > >
>> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden
>> > > if SVE is not enabled for the guest.
>> >
>> > This implies that userspace can never access this interface for a vcpu
>> > before having decided whether such features are enabled for the guest or
>> > not, since otherwise userspace will see different states for a VCPU
>> > depending on sequencing of the API, which sounds fragile to me.
>> >
>> > That should probably be documented somewhere, and I hope the
>> > enable/disable API for SVE in guests already takes that into account.
>> >
>> > Not sure if there's an action to take here, but it was the best place I
>> > could raise this concern.
>>
>> Fair point.  I struggled to come up with something better that solves
>> all problems.
>>
>> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
>> creating the vcpu, so that if issued at all for a vcpu, it is issued
>> very soon after KVM_VCPU_INIT.
>>
>> I think this worked OK with the current structure of kvmtool and I
>> seem to remember discussing this with Peter Maydell re qemu -- but
>> it sounds like I should double-check.
>
> QEMU does some thing around enumerating all the system registers exposed
> by KVM and saving/restoring them as part of its startup, but I don't
> remember the exact sequence.

QEMU does this for each vCPU as part of its start-up sequence:

  kvm_init_vcpu
kvm_get_cpu (-> KVM_CREATE_VCPU)
KVM_GET_VCPU_MMAP_SIZE
kvm_arch_init_vcpu
  kvm_arm_vcpu_init (-> KVM_ARM_VCPU_INIT)
  kvm_get_one_reg(ARM_CPU_ID_MPIDR)
  kvm_arm_init_debug (chk for KVM_CAP SET_GUEST_DEBUG/GUEST_DEBUG_HW_WPS/BPS)
  kvm_arm_init_serror_injection (chk KVM_CAP_ARM_INJECT_SERROR_ESR)
  kvm_arm_init_cpreg_list (KVM_GET_REG_LIST)

At this point we have the register list we need for
kvm_arch_get_registers which is what we call every time we want to
synchronise state. We only really do this for debug events, crashes and
at some point when migrating.

>
>>
>> Either way, you're right, this needs to be clearly documented.
>>
>>
>> If we want to be more robust, maybe we should add a capability too,
>> so that userspace that enables this capability promises to call
>> KVM_ARM_SVE_CONFIG_SET for each vcpu, and affected ioctls (KVM_RUN,
>> KVM_GET_REG_LIST etc.) are forbidden until that is done?
>>
>> That should help avoid accidents.
>>
>> I could add a special meaning for an empty kvm_sve_vls, such that
>> it doesn't enable SVE on the affected vcpu.  That retains the ability
>> to create heterogeneous guests while still following the above flow.
>>
> I think making sure that userspace can ever only see the same list of
> available system regiters is going to cause us less pain going forward.
>
> If the separate ioctl and capability check is the easiest way of doing
> that, then I think that sounds good.  (I had wished we could have just
> added some data to KVM_CREATE_VCPU, but that doesn't seem to be the
> case.)
>
>
> Thanks,
>
> Christoffer


--
Alex Bennée


[PATCH v3 0/4] arm64: Support perf event modifiers :G and :H

2018-11-22 Thread Andrew Murray
This patchset provides support for perf event modifiers :G and :H which
allows for filtering of PMU events between host and guests when used
with KVM.

As the underlying hardware cannot distinguish between guest and host
context, the performance counters must be stopped and started upon
entry/exit to the guest. This is performed at EL2 in a way that
minimizes overhead and improves accuracy of recording events that only
occur in the requested context.

This has been tested with VHE and non-VHE kernels with a KVM guest.

Changes from v2:

 - Ensured that exclude_kernel works for guest
 - Removed unnecessary exclusion of EL2 with exclude_host on !VHE
 - Renamed kvm_clr_set_host_pmu_events to reflect args order
 - Added additional information to isb patch

Changes from v1:

 - Removed unnecessary exclusion of EL1 with exclude_guest on VHE
 - Removed unnecessary isb from existing perf_event.c driver
 - Folded perf_event.c patches together
 - Added additional information to last patch commit message


Andrew Murray (4):
  arm64: arm_pmu: remove unnecessary isb instruction
  arm64: KVM: add accessors to track guest/host only counters
  arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes
  arm64: KVM: Enable support for :G/:H perf event modifiers

 arch/arm64/include/asm/kvm_host.h | 20 +++
 arch/arm64/kernel/perf_event.c| 53 +--
 arch/arm64/kvm/hyp/switch.c   | 38 
 3 files changed, 103 insertions(+), 8 deletions(-)

-- 
2.7.4



[PATCH v3 2/4] arm64: KVM: add accessors to track guest/host only counters

2018-11-22 Thread Andrew Murray
In order to efficiently enable/disable guest/host only perf counters
at guest entry/exit we add bitfields to kvm_cpu_context for guest and
host only events as well as accessors for updating them.

Signed-off-by: Andrew Murray 
---
 arch/arm64/include/asm/kvm_host.h | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1550192..4a828eb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -203,6 +203,8 @@ struct kvm_cpu_context {
};
 
struct kvm_vcpu *__hyp_running_vcpu;
+   u32 events_host_only;
+   u32 events_guest_only;
 };
 
 typedef struct kvm_cpu_context kvm_cpu_context_t;
@@ -472,6 +474,24 @@ static inline int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu)
 {
return kvm_arch_vcpu_run_map_fp(vcpu);
 }
+static inline void kvm_clr_set_host_pmu_events(u32 clr, u32 set)
+{
+   kvm_cpu_context_t *ctx = this_cpu_ptr(&kvm_host_cpu_state);
+
+   ctx->events_host_only &= ~clr;
+   ctx->events_host_only |= set;
+}
+
+static inline void kvm_clr_set_guest_pmu_events(u32 clr, u32 set)
+{
+   kvm_cpu_context_t *ctx = this_cpu_ptr(&kvm_host_cpu_state);
+
+   ctx->events_guest_only &= ~clr;
+   ctx->events_guest_only |= set;
+}
+#else
+static inline void kvm_clr_set_host_pmu_events(u32 clr, u32 set) {}
+static inline void kvm_clr_set_guest_pmu_events(u32 clr, u32 set) {}
 #endif
 
 static inline void kvm_arm_vhe_guest_enter(void)
-- 
2.7.4



[PATCH v3 4/4] arm64: KVM: Enable support for :G/:H perf event modifiers

2018-11-22 Thread Andrew Murray
Enable/disable event counters as appropriate when entering and exiting
the guest to enable support for guest or host only event counting.

For both VHE and non-VHE we switch the counters between host/guest at
EL2. EL2 is filtered out by the PMU when we are using the :G modifier.

The PMU may be on when we change which counters are enabled however
we avoid adding an isb as we instead rely on existing context
synchronisation events: the isb in kvm_arm_vhe_guest_exit for VHE and
the eret from the hvc in kvm_call_hyp.

Signed-off-by: Andrew Murray 
---
 arch/arm64/kvm/hyp/switch.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
index d496ef5..ebf0aac 100644
--- a/arch/arm64/kvm/hyp/switch.c
+++ b/arch/arm64/kvm/hyp/switch.c
@@ -373,6 +373,32 @@ static bool __hyp_text __hyp_switch_fpsimd(struct kvm_vcpu 
*vcpu)
return true;
 }
 
+static bool __hyp_text __pmu_switch_to_guest(struct kvm_cpu_context *host_ctxt)
+{
+   u32 host_only = host_ctxt->events_host_only;
+   u32 guest_only = host_ctxt->events_guest_only;
+
+   if (host_only)
+   write_sysreg(host_only, pmcntenclr_el0);
+
+   if (guest_only)
+   write_sysreg(guest_only, pmcntenset_el0);
+
+   return (host_only || guest_only);
+}
+
+static void __hyp_text __pmu_switch_to_host(struct kvm_cpu_context *host_ctxt)
+{
+   u32 host_only = host_ctxt->events_host_only;
+   u32 guest_only = host_ctxt->events_guest_only;
+
+   if (guest_only)
+   write_sysreg(guest_only, pmcntenclr_el0);
+
+   if (host_only)
+   write_sysreg(host_only, pmcntenset_el0);
+}
+
 /*
  * Return true when we were able to fixup the guest exit and should return to
  * the guest, false when we should restore the host state and return to the
@@ -488,12 +514,15 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
+   bool pmu_switch_needed;
u64 exit_code;
 
host_ctxt = vcpu->arch.host_cpu_context;
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
 
+   pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
+
sysreg_save_host_state_vhe(host_ctxt);
 
__activate_traps(vcpu);
@@ -524,6 +553,9 @@ int kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu)
 
__debug_switch_to_host(vcpu);
 
+   if (pmu_switch_needed)
+   __pmu_switch_to_host(host_ctxt);
+
return exit_code;
 }
 
@@ -532,6 +564,7 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 {
struct kvm_cpu_context *host_ctxt;
struct kvm_cpu_context *guest_ctxt;
+   bool pmu_switch_needed;
u64 exit_code;
 
vcpu = kern_hyp_va(vcpu);
@@ -540,6 +573,8 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
host_ctxt->__hyp_running_vcpu = vcpu;
guest_ctxt = &vcpu->arch.ctxt;
 
+   pmu_switch_needed = __pmu_switch_to_guest(host_ctxt);
+
__sysreg_save_state_nvhe(host_ctxt);
 
__activate_traps(vcpu);
@@ -586,6 +621,9 @@ int __hyp_text __kvm_vcpu_run_nvhe(struct kvm_vcpu *vcpu)
 */
__debug_switch_to_host(vcpu);
 
+   if (pmu_switch_needed)
+   __pmu_switch_to_host(host_ctxt);
+
return exit_code;
 }
 
-- 
2.7.4



[PATCH v3 3/4] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2018-11-22 Thread Andrew Murray
Add support for the :G and :H attributes in perf by handling the
exclude_host/exclude_guest event attributes.

We notify KVM of the counters that we wish to be enabled or disabled on
guest entry/exit, and thus defer starting or stopping :G events
according to the event's exclude_host attribute.

With both VHE and non-VHE we switch the counters between host/guest
at EL2. When using :G we eliminate counters counting host events on
the boundaries of guest entry/exit by filtering out EL2 via
exclude_host. However, when using :H on non-VHE, unless exclude_hv is
also set there is a small blackout window at guest entry/exit where
host events are not captured.

Signed-off-by: Andrew Murray 
---
 arch/arm64/kernel/perf_event.c | 52 --
 1 file changed, 45 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index de564ae..e0619ba 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -26,6 +26,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -647,11 +648,23 @@ static inline int armv8pmu_enable_counter(int idx)
 
 static inline void armv8pmu_enable_event_counter(struct perf_event *event)
 {
+   struct perf_event_attr *attr = &event->attr;
int idx = event->hw.idx;
+   u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
 
-   armv8pmu_enable_counter(idx);
if (armv8pmu_event_is_chained(event))
-   armv8pmu_enable_counter(idx - 1);
+   counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
+
+   if (attr->exclude_host)
+   kvm_clr_set_guest_pmu_events(0, counter_bits);
+   if (attr->exclude_guest)
+   kvm_clr_set_host_pmu_events(0, counter_bits);
+
+   if (!attr->exclude_host) {
+   armv8pmu_enable_counter(idx);
+   if (armv8pmu_event_is_chained(event))
+   armv8pmu_enable_counter(idx - 1);
+   }
 }
 
 static inline int armv8pmu_disable_counter(int idx)
@@ -664,11 +677,23 @@ static inline int armv8pmu_disable_counter(int idx)
 static inline void armv8pmu_disable_event_counter(struct perf_event *event)
 {
struct hw_perf_event *hwc = &event->hw;
+   struct perf_event_attr *attr = &event->attr;
int idx = hwc->idx;
+   u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
 
if (armv8pmu_event_is_chained(event))
-   armv8pmu_disable_counter(idx - 1);
-   armv8pmu_disable_counter(idx);
+   counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
+
+   if (attr->exclude_host)
+   kvm_clr_set_guest_pmu_events(counter_bits, 0);
+   if (attr->exclude_guest)
+   kvm_clr_set_host_pmu_events(counter_bits, 0);
+
+   if (!attr->exclude_host) {
+   if (armv8pmu_event_is_chained(event))
+   armv8pmu_disable_counter(idx - 1);
+   armv8pmu_disable_counter(idx);
+   }
 }
 
 static inline int armv8pmu_enable_intens(int idx)
@@ -943,16 +968,25 @@ static int armv8pmu_set_event_filter(struct hw_perf_event 
*event,
 * Therefore we ignore exclude_hv in this configuration, since
 * there's no hypervisor to sample anyway. This is consistent
 * with other architectures (x86 and Power).
+*
+* To eliminate counting host events on the boundaries of
+* guest entry/exit we ensure EL2 is not included in hyp mode
+* with !exclude_host.
 */
if (is_kernel_in_hyp_mode()) {
-   if (!attr->exclude_kernel)
+   if (!attr->exclude_kernel && !attr->exclude_host)
config_base |= ARMV8_PMU_INCLUDE_EL2;
} else {
-   if (attr->exclude_kernel)
-   config_base |= ARMV8_PMU_EXCLUDE_EL1;
if (!attr->exclude_hv)
config_base |= ARMV8_PMU_INCLUDE_EL2;
}
+
+   /*
+* Filter out !VHE kernels and guest kernels
+*/
+   if (attr->exclude_kernel)
+   config_base |= ARMV8_PMU_EXCLUDE_EL1;
+
if (attr->exclude_user)
config_base |= ARMV8_PMU_EXCLUDE_EL0;
 
@@ -976,6 +1010,10 @@ static void armv8pmu_reset(void *info)
armv8pmu_disable_intens(idx);
}
 
+   /* Clear the counters we flip at guest entry/exit */
+   kvm_clr_set_host_pmu_events(U32_MAX, 0);
+   kvm_clr_set_guest_pmu_events(U32_MAX, 0);
+
/*
 * Initialize & Reset PMNC. Request overflow interrupt for
 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
-- 
2.7.4



[PATCH v3 1/4] arm64: arm_pmu: remove unnecessary isb instruction

2018-11-22 Thread Andrew Murray
The armv8pmu_enable_event_counter function issues an isb instruction
after enabling a pair of counters - this doesn't provide any value
and is inconsistent with armv8pmu_disable_event_counter.

In any case armv8pmu_enable_event_counter is always called with the
PMU stopped. Starting the PMU with armv8pmu_start results in an isb
instruction being issued prior to writing to PMCR_EL0.

Let's remove the unnecessary isb instruction.

Signed-off-by: Andrew Murray 
Reviewed-by: Suzuki K Poulose 
Acked-by: Mark Rutland 
---
 arch/arm64/kernel/perf_event.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 8e38d52..de564ae 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -652,7 +652,6 @@ static inline void armv8pmu_enable_event_counter(struct 
perf_event *event)
armv8pmu_enable_counter(idx);
if (armv8pmu_event_is_chained(event))
armv8pmu_enable_counter(idx - 1);
-   isb();
 }
 
 static inline int armv8pmu_disable_counter(int idx)
-- 
2.7.4



Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Peter Maydell
On 22 November 2018 at 10:53, Christoffer Dall  wrote:
> [Adding Peter and Alex for their view on the QEMU side]
>
> On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
>> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
>> creating the vcpu, so that if issued at all for a vcpu, it is issued
>> very soon after KVM_VCPU_INIT.
>>
>> I think this worked OK with the current structure of kvmtool and I
>> seem to remember discussing this with Peter Maydell re qemu -- but
>> it sounds like I should double-check.
>
> QEMU does something around enumerating all the system registers exposed
> by KVM and saving/restoring them as part of its startup, but I don't
> remember the exact sequence.

This all happens in kvm_arch_init_vcpu(), which does:
 * KVM_ARM_VCPU_INIT ioctl (with the appropriate kvm_init_features set)
 * read the guest MPIDR with GET_ONE_REG so we know what KVM
   is doing with MPIDR assignment across CPUs
 * check for interesting extensions like KVM_CAP_SET_GUEST_DEBUG
 * get and cache a list of what system registers the vcpu has,
   using KVM_GET_REG_LIST. This is where we do the "size must
   be U32 or U64" sanity check.

So if there's something we can't do by setting kvm_init_features
for KVM_ARM_VCPU_INIT but have to do immediately afterwards,
that is straightforward.

The major requirement for QEMU is that if we don't specifically
enable SVE in the VCPU then we must not see any registers
in the KVM_GET_REG_LIST that are not u32 or u64 -- otherwise
QEMU will refuse to start.

thanks
-- PMM


Re: [RFC PATCH v2 11/23] KVM: arm64: Support runtime sysreg filtering for KVM_GET_REG_LIST

2018-11-22 Thread Christoffer Dall
[Adding Peter and Alex for their view on the QEMU side]

On Thu, Nov 15, 2018 at 05:27:11PM +, Dave Martin wrote:
> On Fri, Nov 02, 2018 at 09:16:25AM +0100, Christoffer Dall wrote:
> > On Fri, Sep 28, 2018 at 02:39:15PM +0100, Dave Martin wrote:
> > > KVM_GET_REG_LIST should only enumerate registers that are actually
> > > accessible, so it is necessary to filter out any register that is
> > > not exposed to the guest.  For features that are configured at
> > > runtime, this will require a dynamic check.
> > > 
> > > For example, ZCR_EL1 and ID_AA64ZFR0_EL1 would need to be hidden
> > > if SVE is not enabled for the guest.
> > 
> > This implies that userspace can never access this interface for a vcpu
> > before having decided whether such features are enabled for the guest or
> > not, since otherwise userspace will see different states for a VCPU
> > depending on sequencing of the API, which sounds fragile to me.
> > 
> > That should probably be documented somewhere, and I hope the
> > enable/disable API for SVE in guests already takes that into account.
> > 
> > Not sure if there's an action to take here, but it was the best place I
> > could raise this concern.
> 
> Fair point.  I struggled to come up with something better that solves
> all problems.
> 
> My expectation is that KVM_ARM_SVE_CONFIG_SET is considered part of
> creating the vcpu, so that if issued at all for a vcpu, it is issued
> very soon after KVM_VCPU_INIT.
> 
> I think this worked OK with the current structure of kvmtool and I
> seem to remember discussing this with Peter Maydell re qemu -- but
> it sounds like I should double-check.

QEMU does something around enumerating all the system registers exposed
by KVM and saving/restoring them as part of its startup, but I don't
remember the exact sequence.

> 
> Either way, you're right, this needs to be clearly documented.
> 
> 
> If we want to be more robust, maybe we should add a capability too,
> so that userspace that enables this capability promises to call
> KVM_ARM_SVE_CONFIG_SET for each vcpu, and affected ioctls (KVM_RUN,
> KVM_GET_REG_LIST etc.) are forbidden until that is done?
> 
> That should help avoid accidents.
> 
> I could add a special meaning for an empty kvm_sve_vls, such that
> it doesn't enable SVE on the affected vcpu.  That retains the ability
> to create heterogeneous guests while still following the above flow.
> 
I think making sure that userspace can only ever see the same list of
available system registers is going to cause us less pain going forward.

If the separate ioctl and capability check is the easiest way of doing
that, then I think that sounds good.  (I had wished we could have just
added some data to KVM_CREATE_VCPU, but that doesn't seem to be the
case.)


Thanks,

Christoffer


Re: [PATCH] kvm: arm/arm64 : fix vm's hanging at startup time

2018-11-22 Thread Christoffer Dall
On Wed, Nov 21, 2018 at 03:53:03PM +, Julien Thierry wrote:
> 
> 
> On 21/11/18 15:24, Christoffer Dall wrote:
> >On Wed, Nov 21, 2018 at 12:17:45PM +, Julien Thierry wrote:
> >>
> >>
> >>On 21/11/18 11:06, Christoffer Dall wrote:
> >>>Hi,
> >>>
> >>>On Wed, Nov 21, 2018 at 04:56:54PM +0800, peng.h...@zte.com.cn wrote:
> >On 19/11/2018 09:10, Mark Rutland wrote:
> >>On Sat, Nov 17, 2018 at 10:58:37AM +0800, peng.h...@zte.com.cn wrote:
> On 16/11/18 00:23, peng.h...@zte.com.cn wrote:
> >>Hi,
> >>>When virtual machine starts, hang up.
> >>
> >>I take it you mean the *guest* hangs? Because it doesn't get a timer
> >>interrupt?
> >>
> >>>The kernel version of guest
> >>>is 4.16. Host support vgic_v3.
> >>
> >>Your host kernel is something recent, I guess?
> >>
> >>>It was mainly due to the incorrect vgic_irq's(intid=27) group value
> >>>during injection interruption. when kvm_vgic_vcpu_init is called,
> >>>dist is not initialized at this time. Unable to get vgic V3 or V2
> >>>correctly, so group is not set.
> >>
> >>Mmh, that shouldn't happen with (v)GICv3. Do you use QEMU (which
> >>version?) or some other userland tool?
> >>
> >
> >QEMU emulator version 3.0.50 .
> >
> >>>group is set to 1 when vgic_mmio_write_group is invoked at some
> >>>time.
> >>>when irq->group=0 (intid=27), No ICH_LR_GROUP flag was set and
> >>>interrupt injection failed.
> >>>
> >>>Signed-off-by: Peng Hao 
> >>>---
> >>>   virt/kvm/arm/vgic/vgic-v3.c | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>>diff --git a/virt/kvm/arm/vgic/vgic-v3.c 
> >>>b/virt/kvm/arm/vgic/vgic-v3.c
> >>>index 9c0dd23..d101000 100644
> >>>--- a/virt/kvm/arm/vgic/vgic-v3.c
> >>>+++ b/virt/kvm/arm/vgic/vgic-v3.c
> >>>@@ -198,7 +198,7 @@ void vgic_v3_populate_lr(struct kvm_vcpu *vcpu,
> >>>struct vgic_irq *irq, int lr) if (vgic_irq_is_mapped_level(irq) &&
> >>>(val & ICH_LR_PENDING_BIT)) irq->line_level = false;
> >>>
> >>>-if (irq->group)
> >>>+if (model == KVM_DEV_TYPE_ARM_VGIC_V3)
> >>
> >>This is not the right fix, not only because it basically reverts the
> >>GICv3 part of 87322099052 (KVM: arm/arm64: vgic: Signal IRQs using
> >>their configured group).
> >>
> >>Can you try to work out why kvm_vgic_vcpu_init() is apparently 
> >>called
> >>before dist->vgic_model is set, also what value it has?
> >>If I understand the code correctly, that shouldn't happen for a 
> >>GICv3.
> >>
> >Even if the value of  group is correctly assigned in 
> >kvm_vgic_vcpu_init, the group is then written 0 through 
> >vgic_mmio_write_group.
> >   If the interrupt comes at this time, the interrupt injection 
> > fails.
> 
> Does that mean that the guest is configuring its interrupts as Group0?
> That sounds wrong, Linux should configure all it's interrupts as
> non-secure group1.
> >>>
> >>>no, I think that UEFI does this, not Linux.
> >>>1. kvm_vgic_vcpu_init
> >>>2. vgic_create
> >>>3. kvm_vgic_dist_init
> >>>4.vgic_mmio_write_group: uefi as guest, write group=0
> >>>5.vgic_mmio_write_group: linux as guest, write group=1
> >>
> >>Is this the same issue fixed by EDK2 commit:
> >>
> >>66127011a544b90e ("ArmPkg/ArmGicDxe ARM: fix encoding for GICv3 
> >>interrupt acknowledge")
> >>
> >>... where EDK2 would try to use IAR0 rather than IAR1?
> >>
> >>The commit messages notes this lead to a boot-time hang.
> >
> >I managed to trigger an issue with a really old EFI implementation that
> >doesn't configure its interrupts as Group1, and yet tries to ACK its
> >interrupts using the Group1 accessor. Guess what? It is not going to 
> >work.
> >
> >Commit c7fefb690661f2e38afcb8200bd318ecf38ab961 in the edk2 tree seems
> >to be the fix (I only assume it does, I haven't actually checked). A
> >recent build, as found in Debian Buster, works perfectly (tested with
> >both QEMU v2.12 and tip of tree).
> >
> >Now, I really don't get what you're saying about Linux not getting
> >interrupts. How do you get to booting Linux if EFI is not making any
> >forward progress? Are you trying them independently?
> >
> I start Linux directly, bypassing UEFI; the printed info is the same.
> [507107.748908]  vgic_mmio_write_group:## intid/27 group=0
> [507107.752185]  vgic_mmio_write_group:## intid/27 group=0
> [507107.899566]  vgic_mmio_write_group:## intid/27 group=1
> [507107.907370]  vgic_mmio_write_group:## intid/27 

Re: [PATCH v2 3/4] arm64: arm_pmu: Add support for exclude_host/exclude_guest attributes

2018-11-22 Thread Andrew Murray
On Wed, Nov 21, 2018 at 05:01:47PM +, Suzuki K Poulose wrote:
> 
> 
> On 21/11/2018 13:23, Andrew Murray wrote:
> > On Wed, Nov 21, 2018 at 10:56:02AM +, Andrew Murray wrote:
> > > On Tue, Nov 20, 2018 at 02:49:05PM +, Suzuki K Poulose wrote:
> > > > On 11/20/2018 02:15 PM, Andrew Murray wrote:
> > > > > Add support for the :G and :H attributes in perf by handling the
> > > > > exclude_host/exclude_guest event attributes.
> > > > > 
> > > > > We notify KVM of counters that we wish to be enabled or disabled on
> > > > > guest entry/exit and thus defer from starting or stopping :G events
> > > > > as per the events exclude_host attribute.
> > > > > 
> > > > > When using VHE, EL2 is unused by the guest - therefore we can filter
> > > > > out these events with the PMU as per the 'exclude_host' attribute.
> > > > > 
> > > > > With both VHE and non-VHE we switch the counters between host/guest
> > > > > at EL2. With non-VHE when using 'exclude_host' we filter out EL2.
> > > > > 
> > > > > These changes eliminate counters counting host events on the
> > > > > boundaries of guest entry/exit when using :G. However when using :H
> > > > > unless exclude_hv is set on non-VHE then there is a small blackout
> > > > > window at the guest entry/exit where host events are not captured.
> > > > > 
> > > > > Signed-off-by: Andrew Murray 
> > > > > ---
> > > > >arch/arm64/kernel/perf_event.c | 41 
> > > > > +++--
> > > 
> > 
> > 
> > > > 
> > > > > + }
> > > > >}
> > > > >static inline int armv8pmu_disable_counter(int idx)
> > > > > @@ -664,11 +677,23 @@ static inline int armv8pmu_disable_counter(int 
> > > > > idx)
> > > > >static inline void armv8pmu_disable_event_counter(struct 
> > > > > perf_event *event)
> > > > >{
> > > > >   struct hw_perf_event *hwc = &event->hw;
> > > > > + struct perf_event_attr *attr = &event->attr;
> > > > >   int idx = hwc->idx;
> > > > > + u32 counter_bits = BIT(ARMV8_IDX_TO_COUNTER(idx));
> > > > >   if (armv8pmu_event_is_chained(event))
> > > > > - armv8pmu_disable_counter(idx - 1);
> > > > > - armv8pmu_disable_counter(idx);
> > > > > + counter_bits |= BIT(ARMV8_IDX_TO_COUNTER(idx - 1));
> > > > > +
> > > > > + if (attr->exclude_host)
> > > > > + kvm_set_clr_guest_pmu_events(counter_bits, 0);
> > > > > + if (attr->exclude_guest)
> > > > > + kvm_set_clr_host_pmu_events(counter_bits, 0);
> > > > > +
> > > > > + if (!attr->exclude_host) {
> > > > > + if (armv8pmu_event_is_chained(event))
> > > > > + armv8pmu_disable_counter(idx - 1);
> > > > > + armv8pmu_disable_counter(idx);
> > > > > + }
> > > > >}
> > > > >static inline int armv8pmu_enable_intens(int idx)
> > > > > @@ -945,12 +970,12 @@ static int armv8pmu_set_event_filter(struct 
> > > > > hw_perf_event *event,
> > > > >* with other architectures (x86 and Power).
> > > > >*/
> > > > >   if (is_kernel_in_hyp_mode()) {
> > > > > - if (!attr->exclude_kernel)
> > > > > + if (!attr->exclude_kernel && !attr->exclude_host)
> > > > >   config_base |= ARMV8_PMU_INCLUDE_EL2;
> > > > 
> > > > Shouldn't we handle "exclude_kernel" for a "Guest" event ?
> > > > i.e, what if we have exclude_kernel + exclude_host set ? Doesn't
> > > > the "exclude_kernel" apply to the event filtering after we enter
> > > > guest and thus, we need to set the EXCLUDE_EL1 ?
> > > 
> > > Yes you're right. This is a problem not just for a VHE host's guest but
> > > for the host as well - for example perf -e instructions:h and
> > > -e instructions:G will currently give the same count.
> > > 
> > > > 
> > > > Also I am wondering what is the situation with Nested virtualisation
> > > > coming in. i.e, if the host hyp wanted to profile the guest hyp, should
> > > > we set EL2 events ? I understand this is something which should be
> > > > solved with the nested virt changes. But it would be good to see
> > > > if we could filter "exclude_host" and "exclude_guest" at the hypervisor
> > > > level (i.e, in software, without PMU filtering) to allow the normal
> > > > controls to make use of the hardware filtering ?
> > > 
> > > It took me a while to think this through and I think you're right in
> > > that we can do more to future proof this for nested virt. In fact we
> > > don't depend on the hunks in this function (i.e. addition of extra
> > > '!attr->exclude_host' conditions) - we don't need them due to the
> > > enable/disable of counters at guest entry/exit.
> > > 
> > > With the assumption of the hypervisor switch enabling/disabling
> > > host/guest counters I think we can now do the following:
> > > 
> > >  /*
> > >   * If we're running in hyp mode, then we *are* the hypervisor.
> > >   * Therefore we ignore exclude_hv in this configuration, since
> > >   * there's no hypervisor 

Re: [PATCH v2 2/4] arm64: KVM: add accessors to track guest/host only counters

2018-11-22 Thread Andrew Murray
On Wed, Nov 21, 2018 at 04:04:03PM +, Marc Zyngier wrote:
> On 20/11/2018 14:15, Andrew Murray wrote:
> > In order to efficiently enable/disable guest/host only perf counters
> > at guest entry/exit we add bitfields to kvm_cpu_context for guest and
> > host only events as well as accessors for updating them.
> > 
> > Signed-off-by: Andrew Murray 
> > ---
> >  arch/arm64/include/asm/kvm_host.h | 20 
> >  1 file changed, 20 insertions(+)
> > 
> > diff --git a/arch/arm64/include/asm/kvm_host.h 
> > b/arch/arm64/include/asm/kvm_host.h
> > index 1550192..b6f998b 100644
> > --- a/arch/arm64/include/asm/kvm_host.h
> > +++ b/arch/arm64/include/asm/kvm_host.h
> > @@ -203,6 +203,8 @@ struct kvm_cpu_context {
> > };
> >  
> > struct kvm_vcpu *__hyp_running_vcpu;
> > +   u32 events_host_only;
> > +   u32 events_guest_only;
> >  };
> >  
> >  typedef struct kvm_cpu_context kvm_cpu_context_t;
> > @@ -472,6 +474,24 @@ static inline int kvm_arch_vcpu_run_pid_change(struct 
> > kvm_vcpu *vcpu)
> >  {
> > return kvm_arch_vcpu_run_map_fp(vcpu);
> >  }
> > +static inline void kvm_set_clr_host_pmu_events(u32 clr, u32 set)
> 
> nit: keep the arguments in an order that match the function name (saves
> a lot of brain power...)

Thanks, every bit helps.

Andrew Murray

> 
> Thanks,
> 
>   M.
> -- 
> Jazz is not dead. It just smells funny...


Re: [PATCH v2 1/4] arm64: arm_pmu: remove unnecessary isb instruction

2018-11-22 Thread Andrew Murray
On Wed, Nov 21, 2018 at 03:58:14PM +, Mark Rutland wrote:
> On Tue, Nov 20, 2018 at 02:15:19PM +, Andrew Murray wrote:
> > The armv8pmu_enable_event_counter function issues an isb instruction
> > after enabling a pair of counters - this doesn't provide any value
> > and is inconsistent with the armv8pmu_disable_event_counter.
> > 
> > In any case armv8pmu_enable_event_counter is always called with the
> > PMU stopped. Starting the PMU with armv8pmu_start results in an isb
> > instruction being issued.
> > 
> > Let's remove the unnecessary isb instruction.
> > 
> > Signed-off-by: Andrew Murray 
> 
> Acked-by: Mark Rutland 
> 
> ... though it would be nice to explicitly point out that the ISB in 
> armv8pmu_start() is before the write to PMCR_EL1, if you happen to
> respin this patch.

Thanks, I'm respining so I'll update the log and preserve your ack.

Andrew Murray

> 
> Mark.
> 
> > ---
> >  arch/arm64/kernel/perf_event.c | 1 -
> >  1 file changed, 1 deletion(-)
> > 
> > diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> > index 8e38d52..de564ae 100644
> > --- a/arch/arm64/kernel/perf_event.c
> > +++ b/arch/arm64/kernel/perf_event.c
> > @@ -652,7 +652,6 @@ static inline void armv8pmu_enable_event_counter(struct 
> > perf_event *event)
> > armv8pmu_enable_counter(idx);
> > if (armv8pmu_event_is_chained(event))
> > armv8pmu_enable_counter(idx - 1);
> > -   isb();
> >  }
> >  
> >  static inline int armv8pmu_disable_counter(int idx)
> > -- 
> > 2.7.4
> > 