Re: [PATCH 0/3] Add support for ACPI VIOT

2021-03-19 Thread Auger Eric
Hi Jean,

On 3/16/21 8:16 PM, Jean-Philippe Brucker wrote:
> Add a driver for the ACPI VIOT table, which enables virtio-iommu on
> non-devicetree platforms, including x86. This series depends on the
> ACPICA changes of patch 1, which will be included in next release [1]
> and pulled into Linux.
> 
> The Virtual I/O Translation table (VIOT) describes the topology of
> para-virtual I/O translation devices and the endpoints they manage.
> It was recently approved for inclusion into the ACPI standard [2].
> A provisional version of the specification can be found at [3].
> 
> After discussing non-devicetree support for virtio-iommu at length
> [4][5][6] we concluded that it should use this new ACPI table. And for
> platforms that don't implement either devicetree or ACPI, a structure
> that uses roughly the same format [6] can be built into the device.
> 
> [1] https://github.com/acpica/acpica/pull/666
> [2] https://lore.kernel.org/linux-iommu/20210218233943.gh702...@redhat.com/
> [3] https://jpbrucker.net/virtio-iommu/viot/viot-v9.pdf
> [4] 
> https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> [5] 
> https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> [6] 
> https://lore.kernel.org/linux-iommu/20200821131540.2801801-1-jean-phili...@linaro.org/

Do you have a qemu branch to share for us to start exercising different
kinds of topology?

Thanks

Eric
> 
> Jean-Philippe Brucker (3):
>   ACPICA: iASL: Add definitions for the VIOT table
>   ACPI: Add driver for the VIOT table
>   iommu/virtio: Enable x86 support
> 
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   4 +-
>  drivers/acpi/Makefile|   2 +
>  include/acpi/actbl3.h|  67 ++
>  include/linux/acpi_viot.h|  26 +++
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |   6 +
>  drivers/acpi/viot.c  | 406 +++
>  drivers/iommu/virtio-iommu.c |   3 +
>  MAINTAINERS  |   8 +
>  10 files changed, 526 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/3] ACPI: Add driver for the VIOT table

2021-03-19 Thread Auger Eric
Hi Jean,

On 3/16/21 8:16 PM, Jean-Philippe Brucker wrote:
> The ACPI Virtual I/O Translation Table describes topology of
> para-virtual platforms. For now it describes the relation between
> virtio-iommu and the endpoints it manages. Supporting that requires
> three steps:
> 
> (1) acpi_viot_init(): parse the VIOT table, build a list of endpoints
> and vIOMMUs.
> 
> (2) acpi_viot_set_iommu_ops(): when the vIOMMU driver is loaded and the
> device probed, register it to the VIOT driver. This step is required
> because unlike similar drivers, VIOT doesn't create the vIOMMU
> device.
> 
> (3) acpi_viot_dma_setup(): when the endpoint is initialized, find the
> vIOMMU and register the endpoint's DMA ops.
> 
> If step (3) happens before step (2), it is deferred until the IOMMU is
> initialized, then retried.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/acpi/Kconfig |   3 +
>  drivers/iommu/Kconfig|   1 +
>  drivers/acpi/Makefile|   2 +
>  include/linux/acpi_viot.h|  26 +++
>  drivers/acpi/bus.c   |   2 +
>  drivers/acpi/scan.c  |   6 +
>  drivers/acpi/viot.c  | 406 +++
>  drivers/iommu/virtio-iommu.c |   3 +
>  MAINTAINERS  |   8 +
>  9 files changed, 457 insertions(+)
>  create mode 100644 include/linux/acpi_viot.h
>  create mode 100644 drivers/acpi/viot.c
> 
> diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig
> index eedec61e3476..3758c6940ed7 100644
> --- a/drivers/acpi/Kconfig
> +++ b/drivers/acpi/Kconfig
> @@ -526,6 +526,9 @@ endif
>  
>  source "drivers/acpi/pmic/Kconfig"
>  
> +config ACPI_VIOT
> + bool
> +
>  endif# ACPI
>  
>  config X86_PM_TIMER
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 192ef8f61310..2819b5c8ec30 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -403,6 +403,7 @@ config VIRTIO_IOMMU
>   depends on ARM64
>   select IOMMU_API
>   select INTERVAL_TREE
> + select ACPI_VIOT if ACPI
>   help
> Para-virtualised IOMMU driver with virtio.
>  
> diff --git a/drivers/acpi/Makefile b/drivers/acpi/Makefile
> index 700b41adf2db..a6e644c48987 100644
> --- a/drivers/acpi/Makefile
> +++ b/drivers/acpi/Makefile
> @@ -118,3 +118,5 @@ video-objs+= acpi_video.o 
> video_detect.o
>  obj-y+= dptf/
>  
>  obj-$(CONFIG_ARM64)  += arm64/
> +
> +obj-$(CONFIG_ACPI_VIOT)  += viot.o
> diff --git a/include/linux/acpi_viot.h b/include/linux/acpi_viot.h
> new file mode 100644
> index ..1f5837595488
> --- /dev/null
> +++ b/include/linux/acpi_viot.h
> @@ -0,0 +1,26 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +
> +#ifndef __ACPI_VIOT_H__
> +#define __ACPI_VIOT_H__
> +
> +#include 
> +
> +#ifdef CONFIG_ACPI_VIOT
> +void __init acpi_viot_init(void);
> +int acpi_viot_dma_setup(struct device *dev, enum dev_dma_attr attr);
> +int acpi_viot_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
> +#else
> +static inline void acpi_viot_init(void) {}
> +static inline int acpi_viot_dma_setup(struct device *dev,
> +   enum dev_dma_attr attr)
> +{
> + return 0;
> +}
> +static inline int acpi_viot_set_iommu_ops(struct device *dev,
> +   struct iommu_ops *ops)
> +{
> + return -ENODEV;
> +}
> +#endif
> +
> +#endif /* __ACPI_VIOT_H__ */
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index be7da23fad76..f9a965c6617e 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -27,6 +27,7 @@
>  #include 
>  #endif
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1338,6 +1339,7 @@ static int __init acpi_init(void)
>  
>   pci_mmcfg_late_init();
>   acpi_iort_init();
> + acpi_viot_init();
>   acpi_scan_init();
>   acpi_ec_init();
>   acpi_debugfs_init();
> diff --git a/drivers/acpi/scan.c b/drivers/acpi/scan.c
> index a184529d8fa4..4af01fddd94c 100644
> --- a/drivers/acpi/scan.c
> +++ b/drivers/acpi/scan.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -1506,12 +1507,17 @@ int acpi_dma_configure_id(struct device *dev, enum 
> dev_dma_attr attr,
>  {
>   const struct iommu_ops *iommu;
>   u64 dma_addr = 0, size = 0;
> + int ret;
>  
>   if (attr == DEV_DMA_NOT_SUPPORTED) {
>   set_dma_ops(dev, &dma_dummy_ops);
>   return 0;
>   }
>  
> + ret = acpi_viot_dma_setup(dev, attr);
> + if (ret)
> + return ret > 0 ? 0 : ret;
> +
>   iort_dma_setup(dev, &dma_addr, &size);
>  
>   iommu = iort_iommu_configure_id(dev, input_id);
> diff --git a/drivers/acpi/viot.c b/drivers/acpi/viot.c
> new file mode 100644
> index ..57a092e8551b
> --- /dev/null
> +++ b/drivers/acpi/viot.c
> @@ -0,0 +1,406 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Vir

Re: [PATCH 1/3] ACPICA: iASL: Add definitions for the VIOT table

2021-03-18 Thread Auger Eric
Hi Jean,

On 3/16/21 8:16 PM, Jean-Philippe Brucker wrote:
> Just here for reference, don't merge!
> 
> The actual commits will be pulled from the next ACPICA release.
> I squashed the three relevant commits:
> 
> ACPICA commit fc4e33319c1ee08f20f5c44853dd8426643f6dfd
> ACPICA commit 2197e354fb5dcafaddd2016ffeb0620e5bc3d5e2
> ACPICA commit 856a96fdf4b51b2b8da17529df0255e6f51f1b5b
> 
> Link: https://github.com/acpica/acpica/commit/fc4e3331
> Link: https://github.com/acpica/acpica/commit/2197e354
> Link: https://github.com/acpica/acpica/commit/856a96fd
> Signed-off-by: Bob Moore 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  include/acpi/actbl3.h | 67 +++
>  1 file changed, 67 insertions(+)
> 
> diff --git a/include/acpi/actbl3.h b/include/acpi/actbl3.h
> index df5f4b27f3aa..09d15898e9a8 100644
> --- a/include/acpi/actbl3.h
> +++ b/include/acpi/actbl3.h
> @@ -33,6 +33,7 @@
>  #define ACPI_SIG_TCPA   "TCPA"   /* Trusted Computing Platform 
> Alliance table */
>  #define ACPI_SIG_TPM2   "TPM2"   /* Trusted Platform Module 2.0 
> H/W interface table */
>  #define ACPI_SIG_UEFI   "UEFI"   /* Uefi Boot Optimization Table 
> */
> +#define ACPI_SIG_VIOT   "VIOT"   /* Virtual I/O Translation 
> Table */
>  #define ACPI_SIG_WAET   "WAET"   /* Windows ACPI Emulated 
> devices Table */
>  #define ACPI_SIG_WDAT   "WDAT"   /* Watchdog Action Table */
>  #define ACPI_SIG_WDDT   "WDDT"   /* Watchdog Timer Description 
> Table */
> @@ -483,6 +484,72 @@ struct acpi_table_uefi {
>   u16 data_offset;/* Offset of remaining data in table */
>  };
>  
> +/***
> + *
> + * VIOT - Virtual I/O Translation Table
> + *Version 1
For other tables I see
Conforms to ../.. Shouldn't we have such section too
> + *
> + 
> **/
> +
> +struct acpi_table_viot {
> + struct acpi_table_header header;/* Common ACPI table header */
> + u16 node_count;
> + u16 node_offset;
> + u8 reserved[8];
> +};
> +
> +/* VIOT subtable header */
> +
> +struct acpi_viot_header {
> + u8 type;
> + u8 reserved;
> + u16 length;
> +};
> +
> +/* Values for Type field above */
> +
> +enum acpi_viot_node_type {
> + ACPI_VIOT_NODE_PCI_RANGE = 0x01,
> + ACPI_VIOT_NODE_MMIO = 0x02,
> + ACPI_VIOT_NODE_VIRTIO_IOMMU_PCI = 0x03,
> + ACPI_VIOT_NODE_VIRTIO_IOMMU_MMIO = 0x04,
> + ACPI_VIOT_RESERVED = 0x05
> +};
> +
> +/* VIOT subtables */
> +
> +struct acpi_viot_pci_range {
> + struct acpi_viot_header header;
> + u32 endpoint_start;
> + u16 segment_start;
> + u16 segment_end;
> + u16 bdf_start;
> + u16 bdf_end;
> + u16 output_node;
> + u8 reserved[6];
> +};
> +
> +struct acpi_viot_mmio {
> + struct acpi_viot_header header;
> + u32 endpoint;
> + u64 base_address;
> + u16 output_node;
> + u8 reserved[6];
> +};
> +
> +struct acpi_viot_virtio_iommu_pci {
> + struct acpi_viot_header header;
> + u16 segment;
> + u16 bdf;
> + u8 reserved[8];
> +};
> +
> +struct acpi_viot_virtio_iommu_mmio {
> + struct acpi_viot_header header;
> + u8 reserved[4];
> + u64 base_address;
> +};
> +
>  
> /***
>   *
>   * WAET - Windows ACPI Emulated devices Table
> 

Besides
Reviewed-by: Eric Auger 

Thanks

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-15 Thread Auger Eric
Hi Christoph,

On 3/14/21 4:58 PM, Christoph Hellwig wrote:
> On Sun, Mar 14, 2021 at 11:44:52AM +0100, Auger Eric wrote:
>> As mentionned by Robin, there are series planning to use
>> DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu (ARM
>> and Intel):
>>
>> [Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
>> patches 1, 2, 3
>>
>> Is the plan to introduce a new domain_get_nesting_info ops then?
> 
> The plan as usual would be to add it the series adding that support.
> Not sure what the merge plans are - if the series is ready to be
> merged I could rebase on top of it, otherwise that series will need
> to add the method.
OK I think your series may be upstreamed first.

Thanks

Eric
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 15/17] iommu: remove DOMAIN_ATTR_NESTING

2021-03-14 Thread Auger Eric
Hi Christoph,

On 3/1/21 9:42 AM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 40 ++---
>  drivers/iommu/arm/arm-smmu/arm-smmu.c   | 30 ++--
>  drivers/iommu/intel/iommu.c | 28 +--
>  drivers/iommu/iommu.c   |  8 +
>  drivers/vfio/vfio_iommu_type1.c |  5 +--
>  include/linux/iommu.h   |  4 ++-
>  6 files changed, 50 insertions(+), 65 deletions(-)

As mentionned by Robin, there are series planning to use
DOMAIN_ATTR_NESTING to get info about the nested caps of the iommu (ARM
and Intel):

[Patch v8 00/10] vfio: expose virtual Shared Virtual Addressing to VMs
patches 1, 2, 3

Is the plan to introduce a new domain_get_nesting_info ops then?

Thanks

Eric


> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index bf96172e8c1f71..8e6fee3ea454d3 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2466,41 +2466,21 @@ static void arm_smmu_dma_enable_flush_queue(struct 
> iommu_domain *domain)
>   to_smmu_domain(domain)->non_strict = true;
>  }
>  
> -static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
> - enum iommu_attr attr, void *data)
> +static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
>  {
> - int ret = 0;
>   struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + int ret = -EPERM;
>  
> - mutex_lock(&smmu_domain->init_mutex);
> + if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> + return -EINVAL;
>  
> - switch (domain->type) {
> - case IOMMU_DOMAIN_UNMANAGED:
> - switch (attr) {
> - case DOMAIN_ATTR_NESTING:
> - if (smmu_domain->smmu) {
> - ret = -EPERM;
> - goto out_unlock;
> - }
> -
> - if (*(int *)data)
> - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> - else
> - smmu_domain->stage = ARM_SMMU_DOMAIN_S1;
> - break;
> - default:
> - ret = -ENODEV;
> - }
> - break;
> - case IOMMU_DOMAIN_DMA:
> - ret = -ENODEV;
> - break;
> - default:
> - ret = -EINVAL;
> + mutex_lock(&smmu_domain->init_mutex);
> + if (!smmu_domain->smmu) {
> + smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> + ret = 0;
>   }
> -
> -out_unlock:
>   mutex_unlock(&smmu_domain->init_mutex);
> +
>   return ret;
>  }
>  
> @@ -2603,7 +2583,7 @@ static struct iommu_ops arm_smmu_ops = {
>   .device_group   = arm_smmu_device_group,
>   .dma_use_flush_queue= arm_smmu_dma_use_flush_queue,
>   .dma_enable_flush_queue = arm_smmu_dma_enable_flush_queue,
> - .domain_set_attr= arm_smmu_domain_set_attr,
> + .domain_enable_nesting  = arm_smmu_domain_enable_nesting,
>   .of_xlate   = arm_smmu_of_xlate,
>   .get_resv_regions   = arm_smmu_get_resv_regions,
>   .put_resv_regions   = generic_iommu_put_resv_regions,
> diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
> b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> index e7893e96f5177a..2e17d990d04481 100644
> --- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
> +++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
> @@ -1497,6 +1497,24 @@ static void arm_smmu_dma_enable_flush_queue(struct 
> iommu_domain *domain)
>   to_smmu_domain(domain)->pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
>  }
>  
> +static int arm_smmu_domain_enable_nesting(struct iommu_domain *domain)
> +{
> + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain);
> + int ret = -EPERM;
> + 
> + if (domain->type != IOMMU_DOMAIN_UNMANAGED)
> + return -EINVAL;
> +
> + mutex_lock(&smmu_domain->init_mutex);
> + if (!smmu_domain->smmu) {
> + smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED;
> + ret = 0;
> + }
> + mutex_unlock(&smmu_domain->init_mutex);
> +
> + return ret;
> +}
> +
>  static int arm_smmu_domain_set_attr(struct iommu_domain *domain,
>   enum iommu_attr attr, void *data)
>  {
> @@ -1508,17 +1526,6 @@ static int arm_smmu_domain_set_attr(struct 
> iommu_domain *domain,
>   switch(domain->type) {
>   case IOMMU_DOMAIN_UNMANAGED:
>   switch (attr) {
> - case DOMAIN_ATTR_NESTING:
> - if (smmu_domain->smmu) {
> - ret = -EPERM;
> - goto out_unlock;
> - }
> -
> - if (*(int *)data)
> - smmu_domain->stage = ARM_SMMU_

Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-02-12 Thread Auger Eric
Hi Vivek,

On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Update nested domain information required for stage1 page table.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c11dd3940583..728018921fae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   void *data)
>  {
>   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>   unsigned int size;
>  
>   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   return 0;
>   }
>  
> - /* report an empty iommu_nesting_info for now */
> - memset(info, 0x0, size);
> + /* Update the nesting info as required for stage1 page tables */
> + info->addr_width = smmu->ias;
> + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
> +  IOMMU_NESTING_FEAT_PAGE_RESP |
IOMMU_NESTING_FEAT_PAGE_RESP definition is missing too

Eric
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->pasid_bits = smmu->ssid_bits;
> + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> + memset(&info->padding, 0x0, 12);
> + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> +
>   info->argsz = size;
> +
>   return 0;
>  }
>  
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] iommu: arm-smmu-v3: Report domain nesting info reuqired for stage1

2021-02-12 Thread Auger Eric
Hi Vivek,

On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Update nested domain information required for stage1 page table.

s/reuqired/required in the commit title
> 
> Signed-off-by: Vivek Gautam 
> ---
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
> b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> index c11dd3940583..728018921fae 100644
> --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
> @@ -2555,6 +2555,7 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   void *data)
>  {
>   struct iommu_nesting_info *info = (struct iommu_nesting_info *)data;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
>   unsigned int size;
>  
>   if (!info || smmu_domain->stage != ARM_SMMU_DOMAIN_NESTED)
> @@ -2571,9 +2572,20 @@ static int arm_smmu_domain_nesting_info(struct 
> arm_smmu_domain *smmu_domain,
>   return 0;
>   }
>  
> - /* report an empty iommu_nesting_info for now */
> - memset(info, 0x0, size);
> + /* Update the nesting info as required for stage1 page tables */
> + info->addr_width = smmu->ias;
> + info->format = IOMMU_PASID_FORMAT_ARM_SMMU_V3;
> + info->features = IOMMU_NESTING_FEAT_BIND_PGTBL |
I understood IOMMU_NESTING_FEAT_BIND_PGTBL advertises the requirement to
bind tables per PASID, ie. passing iommu_gpasid_bind_data.
In ARM case I guess you plan to use attach/detach_pasid_table API with
iommu_pasid_table_config struct. So I understood we should add a new
feature here.
> +  IOMMU_NESTING_FEAT_PAGE_RESP |
> +  IOMMU_NESTING_FEAT_CACHE_INVLD;
> + info->pasid_bits = smmu->ssid_bits;
> + info->vendor.smmuv3.asid_bits = smmu->asid_bits;
> + info->vendor.smmuv3.pgtbl_fmt = ARM_64_LPAE_S1;
> + memset(&info->padding, 0x0, 12);
> + memset(&info->vendor.smmuv3.padding, 0x0, 9);
> +
>   info->argsz = size;
> +
spurious new line
>   return 0;
>  }
>  
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] iommu: Report domain nesting info for arm-smmu-v3

2021-02-12 Thread Auger Eric
Hi Vivek,
On 2/12/21 11:58 AM, Vivek Gautam wrote:
> Add a vendor specific structure for domain nesting info for
> arm smmu-v3, and necessary info fields required to populate
> stage1 page tables.
> 
> Signed-off-by: Vivek Gautam 
> ---
>  include/uapi/linux/iommu.h | 31 +--
>  1 file changed, 25 insertions(+), 6 deletions(-)
> 
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 4d3d988fa353..5f059bcf7720 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -323,7 +323,8 @@ struct iommu_gpasid_bind_data {
>  #define IOMMU_GPASID_BIND_VERSION_1  1
>   __u32 version;
>  #define IOMMU_PASID_FORMAT_INTEL_VTD 1
> -#define IOMMU_PASID_FORMAT_LAST  2
> +#define IOMMU_PASID_FORMAT_ARM_SMMU_V3   2
> +#define IOMMU_PASID_FORMAT_LAST  3
>   __u32 format;
>   __u32 addr_width;
>  #define IOMMU_SVA_GPASID_VAL (1 << 0) /* guest PASID valid */
> @@ -409,6 +410,21 @@ struct iommu_nesting_info_vtd {
>   __u64   ecap_reg;
>  };
>  
> +/*
> + * struct iommu_nesting_info_arm_smmuv3 - Arm SMMU-v3 nesting info.
> + */
> +struct iommu_nesting_info_arm_smmuv3 {
> + __u32   flags;
> + __u16   asid_bits;
> +
> + /* Arm LPAE page table format as per kernel */
> +#define ARM_PGTBL_32_LPAE_S1 (0x0)
> +#define ARM_PGTBL_64_LPAE_S1 (0x2)
Shouldn't it be a bitfield instead as both can be supported (the actual
driver only supports 64b table format though). Does it match matches
IDR0.TTF?
> + __u8pgtbl_fmt;
So I understand this API is supposed to allow VFIO to expose those info
early enough to the userspace to help configuring the viommu and avoid
errors later on. I wonder how far we want to go on this path. What about
those other caps that impact the STE/CD validity. There may be others...

SMMU_IDR0.CD2L (support of 2 stage CD)
SMMU_IDR0.TTENDIAN (endianness)
SMMU_IDR0.HTTU (if 0 forbids HA/HD setting in the CD)
SMMU_IDR3.STT (impacts T0SZ)

Thanks

Eric

> +
> + __u8padding[9];
> +};
> +
>  /*
>   * struct iommu_nesting_info - Information for nesting-capable IOMMU.
>   *  userspace should check it before using
> @@ -445,11 +461,13 @@ struct iommu_nesting_info_vtd {
>   * +---+--+
>   *
>   * data struct types defined for @format:
> - * ++=+
> - * | @format| data struct |
> - * ++=+
> - * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd   |
> - * ++-+
> + * ++==+
> + * | @format| data struct  |
> + * ++==+
> + * | IOMMU_PASID_FORMAT_INTEL_VTD   | struct iommu_nesting_info_vtd|
> + * +---+---+
> + * | IOMMU_PASID_FORMAT_ARM_SMMU_V3 | struct iommu_nesting_info_arm_smmuv3 |
> + * ++--+
>   *
>   */
>  struct iommu_nesting_info {
> @@ -466,6 +484,7 @@ struct iommu_nesting_info {
>   /* Vendor specific data */
>   union {
>   struct iommu_nesting_info_vtd vtd;
> + struct iommu_nesting_info_arm_smmuv3 smmuv3;
>   } vendor;
>  };
>  
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-25 Thread Auger Eric
Hi Vivek,

On 1/21/21 6:34 PM, Vivek Kumar Gautam wrote:
> Hi Eric,
> 
> 
> On 1/19/21 2:33 PM, Auger Eric wrote:
>> Hi Vivek,
>>
>> On 1/15/21 1:13 PM, Vivek Gautam wrote:
>>> This patch-series aims at enabling Nested stage translation in guests
>>> using virtio-iommu as the paravirtualized iommu. The backend is
>>> supported
>>> with Arm SMMU-v3 that provides nested stage-1 and stage-2 translation.
>>>
>>> This series derives its purpose from various efforts happening to add
>>> support for Shared Virtual Addressing (SVA) in host and guest. On Arm,
>>> most of the support for SVA has already landed. The support for nested
>>> stage translation and fault reporting to guest has been proposed [1].
>>> The related changes required in VFIO [2] framework have also been put
>>> forward.
>>>
>>> This series proposes changes in virtio-iommu to program PASID tables
>>> and related stage-1 page tables. A simple iommu-pasid-table library
>>> is added for this purpose that interacts with vendor drivers to
>>> allocate and populate PASID tables.
>>> In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
>>> code out of the arm-smmu-v3 driver and add that as a glue vendor layer
>>> to support allocating CD tables, and populating them with right values.
>>> These CD tables are essentially the PASID tables and contain stage-1
>>> page table configurations too.
>>> A request to setup these CD tables come from virtio-iommu driver using
>>> the iommu-pasid-table library when running on Arm. The virtio-iommu
>>> then pass these PASID tables to the host using the right virtio backend
>>> and support in VMM.
>>>
>>> For testing we have added necessary support in kvmtool. The changes in
>>> kvmtool are based on virtio-iommu development branch by Jean-Philippe
>>> Brucker [3].
>>>
>>> The tested kernel branch contains following in the order bottom to top
>>> on the git hash -
>>> a) v5.11-rc3
>>> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
>>>     table support for Arm.
>>> c) Smmu test engine patches from Jean-Philippe's branch [4]
>>> d) This series
>>> e) Domain nesting info patches [5][6][7].
>>> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
>>>     the list).
>>>
>>> This kernel is tested on Neoverse reference software stack with
>>> Fixed virtual platform. Public version of the software stack and
>>> FVP is available here[8][9].
>>>
>>> A big thanks to Jean-Philippe for his contributions towards this work
>>> and for his valuable guidance.
>>>
>>> [1]
>>> https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
>>>
>>> [2]
>>> https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
>>>
>>> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
>>> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
>>> [5]
>>> https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
>>>
>>> [6]
>>> https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
>>>
>>> [7]
>>> https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
>>>
>>> [8]
>>> https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
>>>
>>> [9]
>>> https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst
>>>
>>
>> Could you share a public branch where we could find all the kernel
>> pieces.
>>
>> Thank you in advance
> 
> Apologies for the delay. It took a bit of time to sort things out for a
> public branch.
> The branch is available in my github now. Please have a look.
> 
> https://github.com/vivek-arm/linux/tree/5.11-rc3-nested-pgtbl-arm-smmuv3-virtio-iommu

no problem. Thank you for the link.

Best Regards

Eric
> 
> 
> 
> Thanks and regards
> Vivek
> 
>>
>> Best Regards
>>
>> Eric
>>>
>>> Jean-Philippe Brucker (6):
>>>    iommu/virtio: Add headers for table format probing
>>>    iommu/virtio: Add table format probing
>>>    iommu/virtio: Add headers for binding pasid table in iommu
>>>    iommu/virtio: Add support for INVALIDATE request
>>>    iommu/virtio: At

Re: [PATCH RFC v1 00/15] iommu/virtio: Nested stage support with Arm

2021-01-19 Thread Auger Eric
Hi Vivek,

On 1/15/21 1:13 PM, Vivek Gautam wrote:
> This patch-series aims at enabling Nested stage translation in guests
> using virtio-iommu as the paravirtualized iommu. The backend is supported
> with Arm SMMU-v3 that provides nested stage-1 and stage-2 translation.
> 
> This series derives its purpose from various efforts happening to add
> support for Shared Virtual Addressing (SVA) in host and guest. On Arm,
> most of the support for SVA has already landed. The support for nested
> stage translation and fault reporting to guest has been proposed [1].
> The related changes required in VFIO [2] framework have also been put
> forward.
> 
> This series proposes changes in virtio-iommu to program PASID tables
> and related stage-1 page tables. A simple iommu-pasid-table library
> is added for this purpose that interacts with vendor drivers to
> allocate and populate PASID tables.
> In Arm SMMUv3 we propose to pull the Context Descriptor (CD) management
> code out of the arm-smmu-v3 driver and add that as a glue vendor layer
> to support allocating CD tables, and populating them with right values.
> These CD tables are essentially the PASID tables and contain stage-1
> page table configurations too.
> A request to setup these CD tables come from virtio-iommu driver using
> the iommu-pasid-table library when running on Arm. The virtio-iommu
> then pass these PASID tables to the host using the right virtio backend
> and support in VMM.
> 
> For testing we have added necessary support in kvmtool. The changes in
> kvmtool are based on virtio-iommu development branch by Jean-Philippe
> Brucker [3].
> 
> The tested kernel branch contains following in the order bottom to top
> on the git hash -
> a) v5.11-rc3
> b) arm-smmu-v3 [1] and vfio [2] changes from Eric to add nested page
>table support for Arm.
> c) Smmu test engine patches from Jean-Philippe's branch [4]
> d) This series
> e) Domain nesting info patches [5][6][7].
> f) Changes to add arm-smmu-v3 specific nesting info (to be sent to
>the list).
> 
> This kernel is tested on Neoverse reference software stack with
> Fixed virtual platform. Public version of the software stack and
> FVP is available here[8][9].
> 
> A big thanks to Jean-Philippe for his contributions towards this work
> and for his valuable guidance.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20201118112151.25412-1-eric.au...@redhat.com/T/
> [2] 
> https://lore.kernel.org/kvmarm/20201116110030.32335-12-eric.au...@redhat.com/T/
> [3] https://jpbrucker.net/git/kvmtool/log/?h=virtio-iommu/devel
> [4] https://jpbrucker.net/git/linux/log/?h=sva/smmute
> [5] 
> https://lore.kernel.org/kvm/1599734733-6431-2-git-send-email-yi.l@intel.com/
> [6] 
> https://lore.kernel.org/kvm/1599734733-6431-3-git-send-email-yi.l@intel.com/
> [7] 
> https://lore.kernel.org/kvm/1599734733-6431-4-git-send-email-yi.l@intel.com/
> [8] 
> https://developer.arm.com/tools-and-software/open-source-software/arm-platforms-software/arm-ecosystem-fvps
> [9] 
> https://git.linaro.org/landing-teams/working/arm/arm-reference-platforms.git/about/docs/rdn1edge/user-guide.rst

Could you share a public branch where we could find all the kernel pieces.

Thank you in advance

Best Regards

Eric
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Add headers for table format probing
>   iommu/virtio: Add table format probing
>   iommu/virtio: Add headers for binding pasid table in iommu
>   iommu/virtio: Add support for INVALIDATE request
>   iommu/virtio: Attach Arm PASID tables when available
>   iommu/virtio: Add support for Arm LPAE page table format
> 
> Vivek Gautam (9):
>   iommu/arm-smmu-v3: Create a Context Descriptor library
>   iommu: Add a simple PASID table library
>   iommu/arm-smmu-v3: Update drivers to work with iommu-pasid-table
>   iommu/arm-smmu-v3: Update CD base address info for user-space
>   iommu/arm-smmu-v3: Set sync op from consumer driver of cd-lib
>   iommu: Add asid_bits to arm smmu-v3 stage1 table info
>   iommu/virtio: Update table format probing header
>   iommu/virtio: Prepare to add attach pasid table infrastructure
>   iommu/virtio: Update fault type and reason info for viommu fault
> 
>  drivers/iommu/arm/arm-smmu-v3/Makefile|   2 +-
>  .../arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c  | 283 +++
>  .../iommu/arm/arm-smmu-v3/arm-smmu-v3-sva.c   |  16 +-
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c   | 268 +--
>  drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h   |   4 +-
>  drivers/iommu/iommu-pasid-table.h | 140 
>  drivers/iommu/virtio-iommu.c  | 692 +-
>  include/uapi/linux/iommu.h|   2 +-
>  include/uapi/linux/virtio_iommu.h | 158 +++-
>  9 files changed, 1303 insertions(+), 262 deletions(-)
>  create mode 100644 drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-cd-lib.c
>  create mode 100644 drivers/iommu/iommu-pasid-table.h
> 

___
Virtualization mail

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-10-06 Thread Auger Eric
Hello Al,

On 10/2/20 8:23 PM, Al Stone wrote:
> On 24 Sep 2020 11:54, Auger Eric wrote:
>> Hi,
>>
>> Adding Al in the loop
>>
>> On 9/24/20 11:38 AM, Michael S. Tsirkin wrote:
>>> On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
>>>> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
>>>>> OK so this looks good. Can you pls repost with the minor tweak
>>>>> suggested and all acks included, and I will queue this?
>>>>
>>>> My NACK still stands, as long as a few questions are open:
>>>>
>>>>1) The format used here will be the same as in the ACPI table? I
>>>>   think the answer to this questions must be Yes, so this leads
>>>>   to the real question:
>>>
>>> I am not sure it's a must.
>>> We can always tweak the parser if there are slight differences
>>> between ACPI and virtio formats.
>>>
>>> But we do want the virtio format used here to be approved by the virtio
>>> TC, so it won't change.
>>>
>>> Eric, Jean-Philippe, does one of you intend to create a github issue
>>> and request a ballot for the TC? It's been posted end of August with no
>>> changes ...
>> Jean-Philippe, would you?
>>>
>>>>2) Has the ACPI table format stabalized already? If and only if
>>>>   the answer is Yes I will Ack these patches. We don't need to
>>>>   wait until the ACPI table format is published in a
>>>>   specification update, but at least some certainty that it
>>>>   will not change in incompatible ways anymore is needed.
>>>>
>>
>> Al, do you have any news about the the VIOT definition submission to
>> the UEFI ASWG?
>>
>> Thank you in advance
>>
>> Best Regards
>>
>> Eric
> 
> A follow-up to my earlier post 
> 
> Hearing no objection, I've submitted the VIOT table description to
> the ASWG for consideration under what they call the "code first"
> process.  The "first reading" -- a brief discussion on what the
> table is and why we would like to add it -- was held yesterday.
> No concerns have been raised as yet.  Given the discussions that
> have already occurred, I don't expect any, either.  I have been
> wrong at least once before, however.
> 
> At this point, ASWG will revisit the request to add VIOT each
> week.  If there have been no comments in the prior week, and no
> further discussion during the meeting, then a vote will be taken.
> Otherwise, there will be discussion and we try again the next
> week.
> 
> The ASWG was also told that the likelihood of this definition of
> the table changing is pretty low, and that it has been thought out
> pretty well already.  ASWG's consideration will therefore start
> from the assumption that it would be best _not_ to make changes.
> 
> So, I'll let you know what happens next week.

Thank you very much for the updates and for your support backing the
proposal in the best delays.

Best Regards

Eric
> 
>>
>>>
>>> Not that I know, but I don't see why it's a must.
>>>
>>>> So what progress has been made with the ACPI table specification, is it
>>>> just a matter of time to get it approved or are there concerns?
>>>>
>>>> Regards,
>>>>
>>>>Joerg
>>>
>>
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-24 Thread Auger Eric
Hi,

Adding Al in the loop

On 9/24/20 11:38 AM, Michael S. Tsirkin wrote:
> On Thu, Sep 24, 2020 at 11:21:29AM +0200, Joerg Roedel wrote:
>> On Thu, Sep 24, 2020 at 05:00:35AM -0400, Michael S. Tsirkin wrote:
>>> OK so this looks good. Can you pls repost with the minor tweak
>>> suggested and all acks included, and I will queue this?
>>
>> My NACK still stands, as long as a few questions are open:
>>
>>  1) The format used here will be the same as in the ACPI table? I
>> think the answer to this questions must be Yes, so this leads
>> to the real question:
> 
> I am not sure it's a must.
> We can always tweak the parser if there are slight differences
> between ACPI and virtio formats.
> 
> But we do want the virtio format used here to be approved by the virtio
> TC, so it won't change.
> 
> Eric, Jean-Philippe, does one of you intend to create a github issue
> and request a ballot for the TC? It's been posted end of August with no
> changes ...
Jean-Philippe, would you?
> 
>>  2) Has the ACPI table format stabalized already? If and only if
>> the answer is Yes I will Ack these patches. We don't need to
>> wait until the ACPI table format is published in a
>> specification update, but at least some certainty that it
>> will not change in incompatible ways anymore is needed.
>>

Al, do you have any news about the the VIOT definition submission to
the UEFI ASWG?

Thank you in advance

Best Regards

Eric


> 
> Not that I know, but I don't see why it's a must.
> 
>> So what progress has been made with the ACPI table specification, is it
>> just a matter of time to get it approved or are there concerns?
>>
>> Regards,
>>
>>  Joerg
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-04 Thread Auger Eric
Hi,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> Add a topology description to the virtio-iommu driver and enable x86
> platforms.
> 
> Since [v2] we have made some progress on adding ACPI support for
> virtio-iommu, which is the preferred boot method on x86. It will be a
> new vendor-agnostic table describing para-virtual topologies in a
> minimal format. However some platforms don't use either ACPI or DT for
> booting (for example microvm), and will need the alternative topology
> description method proposed here. In addition, since the process to get
> a new ACPI table will take a long time, this provides a boot method even
> to ACPI-based platforms, if only temporarily for testing and
> development.
> 
> v3:
> * Add patch 1 that moves virtio-iommu to a subfolder.
> * Split the rest:
>   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> support.
>   * Patch 4 adds definitions.
>   * Patch 5 adds parser in topology.c.
> * Address other comments.
> 
> Linux and QEMU patches available at:
> https://jpbrucker.net/git/linux virtio-iommu/devel
> https://jpbrucker.net/git/qemu virtio-iommu/devel
I have tested that series with above QEMU branch on ARM with virtio-net
and virtio-blk translated devices in non DT mode.

It works for me:
Tested-by: Eric Auger 

Thanks

Eric

> 
> [spec] https://lists.oasis-open.org/archives/virtio-dev/202008/msg00067.html
> [v2] 
> https://lore.kernel.org/linux-iommu/20200228172537.377327-1-jean-phili...@linaro.org/
> [v1] 
> https://lore.kernel.org/linux-iommu/20200214160413.1475396-1-jean-phili...@linaro.org/
> [rfc] 
> https://lore.kernel.org/linux-iommu/20191122105000.800410-1-jean-phili...@linaro.org/
> 
> Jean-Philippe Brucker (6):
>   iommu/virtio: Move to drivers/iommu/virtio/
>   iommu/virtio: Add topology helpers
>   PCI: Add DMA configuration for virtual platforms
>   iommu/virtio: Add topology definitions
>   iommu/virtio: Support topology description in config space
>   iommu/virtio: Enable x86 support
> 
>  drivers/iommu/Kconfig |  18 +-
>  drivers/iommu/Makefile|   3 +-
>  drivers/iommu/virtio/Makefile |   4 +
>  drivers/iommu/virtio/topology-helpers.h   |  50 +
>  include/linux/virt_iommu.h|  15 ++
>  include/uapi/linux/virtio_iommu.h |  44 
>  drivers/iommu/virtio/topology-helpers.c   | 196 
>  drivers/iommu/virtio/topology.c   | 259 ++
>  drivers/iommu/{ => virtio}/virtio-iommu.c |   4 +
>  drivers/pci/pci-driver.c  |   5 +
>  MAINTAINERS   |   3 +-
>  11 files changed, 597 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/iommu/virtio/Makefile
>  create mode 100644 drivers/iommu/virtio/topology-helpers.h
>  create mode 100644 include/linux/virt_iommu.h
>  create mode 100644 drivers/iommu/virtio/topology-helpers.c
>  create mode 100644 drivers/iommu/virtio/topology.c
>  rename drivers/iommu/{ => virtio}/virtio-iommu.c (99%)
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 2/6] iommu/virtio: Add topology helpers

2020-09-04 Thread Auger Eric
Hi Jean,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> To support topology description from ACPI and from the builtin
> description, add helpers to keep track of I/O topology descriptors.
> 
> To ease re-use of the helpers by other drivers and future ACPI
> extensions, use the "virt_" prefix rather than "virtio_" when naming
> structs and functions.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Kconfig   |   3 +
>  drivers/iommu/virtio/Makefile   |   1 +
>  drivers/iommu/virtio/topology-helpers.h |  50 ++
>  include/linux/virt_iommu.h  |  15 ++
>  drivers/iommu/virtio/topology-helpers.c | 196 
>  drivers/iommu/virtio/virtio-iommu.c |   4 +
>  MAINTAINERS |   1 +
>  7 files changed, 270 insertions(+)
>  create mode 100644 drivers/iommu/virtio/topology-helpers.h
>  create mode 100644 include/linux/virt_iommu.h
>  create mode 100644 drivers/iommu/virtio/topology-helpers.c
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index bef5d75e306b..e29ae50f7100 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -391,4 +391,7 @@ config VIRTIO_IOMMU
>  
> Say Y here if you intend to run this kernel as a guest.
>  
> +config VIRTIO_IOMMU_TOPOLOGY_HELPERS
> + bool
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/virtio/Makefile b/drivers/iommu/virtio/Makefile
> index 279368fcc074..b42ad47eac7e 100644
> --- a/drivers/iommu/virtio/Makefile
> +++ b/drivers/iommu/virtio/Makefile
> @@ -1,2 +1,3 @@
>  # SPDX-License-Identifier: GPL-2.0
>  obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS) += topology-helpers.o
> diff --git a/drivers/iommu/virtio/topology-helpers.h 
> b/drivers/iommu/virtio/topology-helpers.h
> new file mode 100644
> index ..436ca6a900c5
> --- /dev/null
> +++ b/drivers/iommu/virtio/topology-helpers.h
> @@ -0,0 +1,50 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef TOPOLOGY_HELPERS_H_
> +#define TOPOLOGY_HELPERS_H_
> +
> +#ifdef CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS
> +
> +/* Identify a device node in the topology */
> +struct virt_topo_dev_id {
> + unsigned inttype;
> +#define VIRT_TOPO_DEV_TYPE_PCI   1
> +#define VIRT_TOPO_DEV_TYPE_MMIO  2
> + union {
> + /* PCI endpoint or range */
> + struct {
> + u16 segment;
> + u16 bdf_start;
> + u16 bdf_end;
> + };
> + /* MMIO region */
> + u64 base;
> + };
> +};
> +
> +/* Specification of an IOMMU */
> +struct virt_topo_iommu {
> + struct virt_topo_dev_id dev_id;
> + struct device   *dev; /* transport device */
> + struct fwnode_handle*fwnode;
> + struct iommu_ops*ops;
> + struct list_headlist;
> +};
> +
> +/* Specification of an endpoint */
> +struct virt_topo_endpoint {
> + struct virt_topo_dev_id dev_id;
> + u32 endpoint_id;
> + struct virt_topo_iommu  *viommu;
> + struct list_headlist;
> +};
> +
> +void virt_topo_add_endpoint(struct virt_topo_endpoint *ep);
> +void virt_topo_add_iommu(struct virt_topo_iommu *viommu);
> +
> +void virt_topo_set_iommu_ops(struct device *dev, struct iommu_ops *ops);
> +
> +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +static inline void virt_topo_set_iommu_ops(struct device *dev, struct 
> iommu_ops *ops)
> +{ }
> +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +#endif /* TOPOLOGY_HELPERS_H_ */
> diff --git a/include/linux/virt_iommu.h b/include/linux/virt_iommu.h
> new file mode 100644
> index ..17d2bd4732e0
> --- /dev/null
> +++ b/include/linux/virt_iommu.h
> @@ -0,0 +1,15 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +#ifndef VIRT_IOMMU_H_
> +#define VIRT_IOMMU_H_
> +
> +#ifdef CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS
> +int virt_dma_configure(struct device *dev);
> +
> +#else /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +static inline int virt_dma_configure(struct device *dev)
> +{
> + /* Don't disturb the normal DMA configuration methods */
> + return 0;
> +}
> +#endif /* !CONFIG_VIRTIO_IOMMU_TOPOLOGY_HELPERS */
> +#endif /* VIRT_IOMMU_H_ */
> diff --git a/drivers/iommu/virtio/topology-helpers.c 
> b/drivers/iommu/virtio/topology-helpers.c
> new file mode 100644
> index ..8815e3a5d431
> --- /dev/null
> +++ b/drivers/iommu/virtio/topology-helpers.c
> @@ -0,0 +1,196 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "topology-helpers.h"
> +
> +static LIST_HEAD(viommus);
> +static LIST_HEAD(pci_endpoints);
> +static LIST_HEAD(mmio_endpoints);
> +sta

Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-04 Thread Auger Eric
}
> +
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE, &common_cfg->device_status);
> + iowrite8(VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER,
> +  &common_cfg->device_status);
> +
> + /* Find out if the device supports topology description */
> + iowrite32(0, &common_cfg->device_feature_select);
> + features = ioread32(&common_cfg->device_feature);
> +
> + if (!(features & BIT(VIRTIO_IOMMU_F_TOPOLOGY))) {
> + pci_dbg(dev, "device doesn't have topology description");
> + goto out_reset;
> +     }
> +
> + ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_DEVICE_CFG, &cap);
> + if (!ret) {
> + pci_warn(dev, "device config capability not found\n");
> + goto out_reset;
> + }
> +
> + regs = pci_iomap(dev, cap.bar, 0);
> + if (!regs)
> + goto out_reset;
> +
> + pci_info(dev, "parsing virtio-iommu topology\n");
> + ret = viommu_parse_topology(&dev->dev, regs + cap.offset,
> + pci_resource_len(dev, 0) - cap.offset);
> + if (ret)
> + pci_warn(dev, "failed to parse topology: %d\n", ret);
> +
> + pci_iounmap(dev, regs);
> +out_reset:
> + ret = viommu_pci_reset(common_cfg);
> + if (ret)
> + pci_warn(dev, "unable to reset device\n");
> +out_unmap_common:
> + pci_iounmap(dev, common_regs);
> +}
> +
> +/*
> + * Catch a PCI virtio-iommu implementation early to get the topology 
> description
> + * before we start probing other endpoints.
> + */
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1040 + 
> VIRTIO_ID_IOMMU,
> + viommu_pci_parse_topology);
> 
Reviewed-by: Eric Auger 

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 4/6] iommu/virtio: Add topology definitions

2020-09-04 Thread Auger Eric
Hi Jean,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> Add struct definitions for describing endpoints managed by the
> virtio-iommu. When VIRTIO_IOMMU_F_TOPOLOGY is offered, an array of
> virtio_iommu_topo_* structures in config space describes the endpoints,
> identified either by their PCI BDF or their physical MMIO address.
> 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  include/uapi/linux/virtio_iommu.h | 44 +++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/include/uapi/linux/virtio_iommu.h 
> b/include/uapi/linux/virtio_iommu.h
> index 237e36a280cb..70cba30644d5 100644
> --- a/include/uapi/linux/virtio_iommu.h
> +++ b/include/uapi/linux/virtio_iommu.h
> @@ -16,6 +16,7 @@
>  #define VIRTIO_IOMMU_F_BYPASS3
>  #define VIRTIO_IOMMU_F_PROBE 4
>  #define VIRTIO_IOMMU_F_MMIO  5
> +#define VIRTIO_IOMMU_F_TOPOLOGY  6
>  
>  struct virtio_iommu_range_64 {
>   __le64  start;
> @@ -27,6 +28,17 @@ struct virtio_iommu_range_32 {
>   __le32  end;
>  };
>  
> +struct virtio_iommu_topo_config {
> + /* Number of topology description structures */
> + __le16  count;
> + /*
> +  * Offset to the first topology description structure
> +  * (virtio_iommu_topo_*) from the start of the virtio_iommu config
> +  * space. Aligned on 8 bytes.
> +  */
> + __le16  offset;
> +};
> +
>  struct virtio_iommu_config {
>   /* Supported page sizes */
>   __le64  page_size_mask;
> @@ -36,6 +48,38 @@ struct virtio_iommu_config {
>   struct virtio_iommu_range_32domain_range;
>   /* Probe buffer size */
>   __le32  probe_size;
> + struct virtio_iommu_topo_config topo_config;
> +};
> +
> +#define VIRTIO_IOMMU_TOPO_PCI_RANGE  0x1
> +#define VIRTIO_IOMMU_TOPO_MMIO   0x2
> +
> +struct virtio_iommu_topo_pci_range {
> + /* VIRTIO_IOMMU_TOPO_PCI_RANGE */
> + __u8type;
> + __u8reserved;
> + /* Length of this structure */
> + __le16  length;
> + /* First endpoint ID in the range */
> + __le32  endpoint_start;
> + /* PCI domain number */
> + __le16  segment;
> + /* PCI Bus:Device.Function range */
> + __le16  bdf_start;
> + __le16  bdf_end;
> + __le16  padding;
> +};
> +
> +struct virtio_iommu_topo_mmio {
> + /* VIRTIO_IOMMU_TOPO_MMIO */
> + __u8type;
> + __u8reserved;
> + /* Length of this structure */
> + __le16  length;
> + /* Endpoint ID */
> + __le32  endpoint;
> + /* Address of the first MMIO region */
> + __le64  address;
>  };
>  
>  /* Request types */
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 1/6] iommu/virtio: Move to drivers/iommu/virtio/

2020-09-04 Thread Auger Eric
Hi Jean,

On 8/21/20 3:15 PM, Jean-Philippe Brucker wrote:
> Before adding new files to the virtio-iommu driver, move it to its own
> subfolder, similarly to other IOMMU drivers.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Makefile| 3 +--
>  drivers/iommu/virtio/Makefile | 2 ++
>  drivers/iommu/{ => virtio}/virtio-iommu.c | 0
>  MAINTAINERS   | 2 +-
>  4 files changed, 4 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/iommu/virtio/Makefile
>  rename drivers/iommu/{ => virtio}/virtio-iommu.c (100%)
> 
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 11f1771104f3..fc7523042512 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: GPL-2.0
> -obj-y += amd/ intel/ arm/
> +obj-y += amd/ intel/ arm/ virtio/
>  obj-$(CONFIG_IOMMU_API) += iommu.o
>  obj-$(CONFIG_IOMMU_API) += iommu-traces.o
>  obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
> @@ -26,4 +26,3 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> -obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio/Makefile b/drivers/iommu/virtio/Makefile
> new file mode 100644
> index ..279368fcc074
> --- /dev/null
> +++ b/drivers/iommu/virtio/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c 
> b/drivers/iommu/virtio/virtio-iommu.c
> similarity index 100%
> rename from drivers/iommu/virtio-iommu.c
> rename to drivers/iommu/virtio/virtio-iommu.c
> diff --git a/MAINTAINERS b/MAINTAINERS
> index deaafb617361..3602b223c9b2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -18451,7 +18451,7 @@ VIRTIO IOMMU DRIVER
>  M:   Jean-Philippe Brucker 
>  L:   virtualization@lists.linux-foundation.org
>  S:   Maintained
> -F:   drivers/iommu/virtio-iommu.c
> +F:   drivers/iommu/virtio/
>  F:   include/uapi/linux/virtio_iommu.h
not related to this patch but you may add an entry for
Documentation/devicetree/bindings/virtio/iommu.txt

Reviewed-by: Eric Auger 

Thanks

Eric
>  
>  VIRTIO MEM DRIVER
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Auger Eric
Hi,

On 5/6/20 2:22 AM, Michael S. Tsirkin wrote:
> On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
>> Different endpoint can support different page size, probe
>> endpoint if it supports specific page size otherwise use
>> global page sizes.
>>
>> Signed-off-by: Bharat Bhushan 
>> ---
>> v4->v5:
>>  - Rebase to Linux v5.7-rc4
>>
>> v3->v4:
>>  - Fix whitespace error
>>
>> v2->v3:
>>  - Fixed error return for incompatible endpoint
>>  - __u64 changed to __le64 in header file
>>
>>  drivers/iommu/virtio-iommu.c  | 48 ---
>>  include/uapi/linux/virtio_iommu.h |  7 +
>>  2 files changed, 51 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>> index d5cac4f46ca5..9513d2ab819e 100644
>> --- a/drivers/iommu/virtio-iommu.c
>> +++ b/drivers/iommu/virtio-iommu.c
>> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>>  struct viommu_dev   *viommu;
>>  struct viommu_domain*vdomain;
>>  struct list_headresv_regions;
>> +u64 pgsize_bitmap;
>>  };
>>  
>>  struct viommu_request {
>> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct viommu_domain 
>> *vdomain)
>>  return ret;
>>  }
>>  
>> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
>> +struct virtio_iommu_probe_pgsize_mask *mask,
>> +size_t len)
>> +{
>> +u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
>> +
>> +if (len < sizeof(*mask))
> 
> This is too late to validate length, you have dereferenced it already.
> do it before the read pls.
> 
>> +return -EINVAL;
> 
> OK but note that guest will then just proceed to ignore the
> property. Is that really OK? Wouldn't host want to know?
> 
> 
>> +
>> +vdev->pgsize_bitmap = pgsize_bitmap;
> 
> what if bitmap is 0? Is that a valid size? I see a bunch of
> BUG_ON with that value ...
Indeed [PATCH v2] virtio-iommu: Add PAGE_SIZE_MASK property states
"The device MUST set at least one bit in page_size_mask, describing
the page granularity".

atm if the device returns a null mask the guest hits such BUG_ON()

[1.841434] kernel BUG at drivers/iommu/iommu.c:653!
[1.842868] Internal error: Oops - BUG: 0 [#1] SMP
[1.844261] Modules linked in:
[1.845161] CPU: 6 PID: 325 Comm: kworker/6:1 Not tainted
5.7.0-rc4-aarch64-guest-bharat-v5+ #196
[1.847474] Hardware name: linux,dummy-virt (DT)
[1.848329] Workqueue: events deferred_probe_work_func
[1.849229] pstate: 6045 (nZCv daif +PAN -UAO)
[1.850116] pc : iommu_group_create_direct_mappings.isra.24+0x210/0x228
[1.851351] lr : iommu_group_add_device+0x108/0x2d0
[1.852270] sp : 800015bef880
[1.852901] x29: 800015bef880 x28: 0003cc3dab98
[1.853897] x27:  x26: 0003cc3dab80
[1.854894] x25: 800011033480 x24: 0003cc080948
[1.855891] x23: 8000113f49c8 x22: 
[1.856887] x21: 0003cc3dac00 x20: 0003cc080a00
[1.858440] x19:  x18: 0010
[1.860029] x17: 9a468e4c x16: 00300604
[1.861616] x15:  x14: 0720072007200720
[1.863200] x13: 0720072007200720 x12: 0020
[1.864787] x11: 0101010101010101 x10: 7f7f7f7f7f7f7f7f
[1.866373] x9 : 8000107de0d8 x8 : 7f7f7f7f7f7f7f7f
[1.868048] x7 : 39322f392f2f2f2f x6 : 80808080
[1.869634] x5 : 0003cc171000 x4 : 0003cc171000
[1.871218] x3 : 8000114aff28 x2 : 
[1.872802] x1 : 0003cb3c60b0 x0 : 0003
[1.874383] Call trace:
[1.875133]  iommu_group_create_direct_mappings.isra.24+0x210/0x228
[1.876996]  iommu_group_add_device+0x108/0x2d0
[1.878359]  iommu_group_get_for_dev+0xa0/0x210
[1.879714]  viommu_add_device+0x128/0x480
[1.880942]  iommu_probe_device+0x5c/0xd8
[1.882144]  of_iommu_configure+0x160/0x208
[1.883535]  of_dma_configure+0xec/0x2b8
[1.884732]  pci_dma_configure+0x48/0xd0
[1.885911]  really_probe+0xa0/0x448
[1.886985]  driver_probe_device+0xe8/0x140
[1.888253]  __device_attach_driver+0x94/0x120
[1.889581]  bus_for_each_drv+0x84/0xd8
[1.890730]  __device_attach+0xe4/0x168
[1.891879]  device_initial_probe+0x1c/0x28
[1.893164]  bus_probe_device+0xa4/0xb0
[1.894311]  deferred_probe_work_func+0xa0/0xf0
[1.895663]  process_one_work+0x1bc/0x458
[1.896864]  worker_thread+0x150/0x4f8
[1.898003]  kthread+0x114/0x118
[1.898977]  ret_from_fork+0x10/0x18
[1.900056] Code: d63f0020 b9406be2 17e4 a90573fb (d421)
[1.901872] ---[ end trace 47e5fb5111a3e69b ]---
[1.903251] Kernel panic - not syncing: Fatal exception
[1.904809] SMP: stopping secondary CPUs
[1.906024] Kernel Offset: disabled
[1.907079] CPU features: 0x084002,22000438
[1.908332] Memory Limit: none
[  

Re: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Auger Eric
Hi,

On 5/7/20 1:32 PM, Michael S. Tsirkin wrote:
> On Thu, May 07, 2020 at 11:24:29AM +, Bharat Bhushan wrote:
>>
>>
>>> -Original Message-
>>> From: Michael S. Tsirkin 
>>> Sent: Wednesday, May 6, 2020 5:53 AM
>>> To: Bharat Bhushan 
>>> Cc: jean-phili...@linaro.org; j...@8bytes.org; jasow...@redhat.com;
>>> virtualization@lists.linux-foundation.org; io...@lists.linux-foundation.org;
>>> linux-ker...@vger.kernel.org; eric.auger@gmail.com; 
>>> eric.au...@redhat.com
>>> Subject: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported 
>>> by
>>> endpoint
>>>
>>> External Email
>>>
>>> --
>>> On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
 Different endpoint can support different page size, probe endpoint if
 it supports specific page size otherwise use global page sizes.

 Signed-off-by: Bharat Bhushan 
 ---
 v4->v5:
  - Rebase to Linux v5.7-rc4

 v3->v4:
  - Fix whitespace error

 v2->v3:
  - Fixed error return for incompatible endpoint
  - __u64 changed to __le64 in header file

  drivers/iommu/virtio-iommu.c  | 48 ---
  include/uapi/linux/virtio_iommu.h |  7 +
  2 files changed, 51 insertions(+), 4 deletions(-)

 diff --git a/drivers/iommu/virtio-iommu.c
 b/drivers/iommu/virtio-iommu.c index d5cac4f46ca5..9513d2ab819e 100644
 --- a/drivers/iommu/virtio-iommu.c
 +++ b/drivers/iommu/virtio-iommu.c
 @@ -78,6 +78,7 @@ struct viommu_endpoint {
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
struct list_headresv_regions;
 +  u64 pgsize_bitmap;
  };

  struct viommu_request {
 @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct
>>> viommu_domain *vdomain)
return ret;
  }

 +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
 +  struct virtio_iommu_probe_pgsize_mask *mask,
 +  size_t len)
 +{
 +  u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
 +
 +  if (len < sizeof(*mask))
>>>
>>> This is too late to validate length, you have dereferenced it already.
>>> do it before the read pls.
>>
>> Yes, Will change here and other places as well
>>
>>>
 +  return -EINVAL;
>>>
>>> OK but note that guest will then just proceed to ignore the property. Is 
>>> that really
>>> OK? Wouldn't host want to know?
>>
>>
>> Guest need to be in sync with device, so yes seems like guest need to tell 
>> device which page-size-mask it is using.
>>
>> Corresponding spec change patch 
>> (https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06214.html)
>>
>> Would like Jean/Eric to comment here as well.
>>
>>>
>>>
 +
 +  vdev->pgsize_bitmap = pgsize_bitmap;
>>>
>>> what if bitmap is 0? Is that a valid size? I see a bunch of BUG_ON with 
>>> that value ...
>>
>> As per spec proposed device is supposed to set at-least one bit.
>> Will add a bug_on her.
> 
> Or better fail probe ...
Yes I agree I would rather fail the probe.
> 
>> Should we add bug_on or switch to global config page-size mask if this is 
>> zero (notify device which page-size-mask it is using).
> 
> It's a spec violation, I wouldn't try to use the device.
> 
>>>
>>> I also see a bunch of code like e.g. this:
>>>
>>> pg_size = 1UL << __ffs(pgsize_bitmap);
>>>
>>> which probably won't DTRT on a 32 bit guest if the bitmap has bits set in 
>>> the high
>>> word.
>>>
>>
>> My thought is that in that case viommu_domain_finalise() will fail, do not 
>> proceed.
> 
> That's undefined behaviour in C. You need to make sure this condition
> is never reached. And spec does not make this illegal at all
> so it looks like we actually need to handle this gracefully.
> 
> 
>>>
>>>
 +  return 0;
 +}
 +
  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
   struct virtio_iommu_probe_resv_mem *mem,
   size_t len)
 @@ -499,6 +513,9 @@ static int viommu_probe_endpoint(struct viommu_dev
>>> *viommu, struct device *dev)
case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
ret = viommu_add_resv_mem(vdev, (void *)prop, len);
break;
 +  case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
 +  ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
 +  break;
default:
dev_err(dev, "unknown viommu prop 0x%x\n", type);
}
 @@ -630,7 +647,7 @@ static int viommu_domain_finalise(struct
 viommu_endpoint *vdev,

vdomain->id = (unsigned int)ret;

 -  domain->pgsize_bitmap   = viommu->pgsize_bitmap;
 +  domain->pg

Re: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by endpoint

2020-05-07 Thread Auger Eric
Hi Bharat,

On 5/7/20 1:24 PM, Bharat Bhushan wrote:
> 
> 
>> -Original Message-
>> From: Michael S. Tsirkin 
>> Sent: Wednesday, May 6, 2020 5:53 AM
>> To: Bharat Bhushan 
>> Cc: jean-phili...@linaro.org; j...@8bytes.org; jasow...@redhat.com;
>> virtualization@lists.linux-foundation.org; io...@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org; eric.auger@gmail.com; eric.au...@redhat.com
>> Subject: [EXT] Re: [PATCH v5] iommu/virtio: Use page size bitmap supported by
>> endpoint
>>
>> External Email
>>
>> --
>> On Tue, May 05, 2020 at 03:00:04PM +0530, Bharat Bhushan wrote:
>>> Different endpoint can support different page size, probe endpoint if
>>> it supports specific page size otherwise use global page sizes.
>>>
>>> Signed-off-by: Bharat Bhushan 
>>> ---
>>> v4->v5:
>>>  - Rebase to Linux v5.7-rc4
>>>
>>> v3->v4:
>>>  - Fix whitespace error
>>>
>>> v2->v3:
>>>  - Fixed error return for incompatible endpoint
>>>  - __u64 changed to __le64 in header file
>>>
>>>  drivers/iommu/virtio-iommu.c  | 48 ---
>>>  include/uapi/linux/virtio_iommu.h |  7 +
>>>  2 files changed, 51 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/drivers/iommu/virtio-iommu.c
>>> b/drivers/iommu/virtio-iommu.c index d5cac4f46ca5..9513d2ab819e 100644
>>> --- a/drivers/iommu/virtio-iommu.c
>>> +++ b/drivers/iommu/virtio-iommu.c
>>> @@ -78,6 +78,7 @@ struct viommu_endpoint {
>>> struct viommu_dev   *viommu;
>>> struct viommu_domain*vdomain;
>>> struct list_headresv_regions;
>>> +   u64 pgsize_bitmap;
>>>  };
>>>
>>>  struct viommu_request {
>>> @@ -415,6 +416,19 @@ static int viommu_replay_mappings(struct
>> viommu_domain *vdomain)
>>> return ret;
>>>  }
>>>
>>> +static int viommu_set_pgsize_bitmap(struct viommu_endpoint *vdev,
>>> +   struct virtio_iommu_probe_pgsize_mask *mask,
>>> +   size_t len)
>>> +{
>>> +   u64 pgsize_bitmap = le64_to_cpu(mask->pgsize_bitmap);
>>> +
>>> +   if (len < sizeof(*mask))
>>
>> This is too late to validate length, you have dereferenced it already.
>> do it before the read pls.
> 
> Yes, Will change here and other places as well
> 
>>
>>> +   return -EINVAL;
>>
>> OK but note that guest will then just proceed to ignore the property. Is 
>> that really
>> OK? Wouldn't host want to know?
> 
> 
> Guest need to be in sync with device, so yes seems like guest need to tell 
> device which page-size-mask it is using.
> 
> Corresponding spec change patch 
> (https://www.mail-archive.com/virtio-dev@lists.oasis-open.org/msg06214.html)
> 
> Would like Jean/Eric to comment here as well.
why can't we fail the probe request in that case? This is a misbehaving
device that reports malformed property, right?

Thanks

Eric


> 
>>
>>
>>> +
>>> +   vdev->pgsize_bitmap = pgsize_bitmap;
>>
>> what if bitmap is 0? Is that a valid size? I see a bunch of BUG_ON with that 
>> value ...
> 
> As per spec proposed device is supposed to set at-least one bit.
> Will add a bug_on her.
> Should we add bug_on or switch to global config page-size mask if this is 
> zero (notify device which page-size-mask it is using).
> 
>>
>> I also see a bunch of code like e.g. this:
>>
>> pg_size = 1UL << __ffs(pgsize_bitmap);
>>
>> which probably won't DTRT on a 32 bit guest if the bitmap has bits set in 
>> the high
>> word.
>>
> 
> My thought is that in that case viommu_domain_finalise() will fail, do not 
> proceed.
> 
>>
>>
>>> +   return 0;
>>> +}
>>> +
>>>  static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
>>>struct virtio_iommu_probe_resv_mem *mem,
>>>size_t len)
>>> @@ -499,6 +513,9 @@ static int viommu_probe_endpoint(struct viommu_dev
>> *viommu, struct device *dev)
>>> case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
>>> ret = viommu_add_resv_mem(vdev, (void *)prop, len);
>>> break;
>>> +   case VIRTIO_IOMMU_PROBE_T_PAGE_SIZE_MASK:
>>> +   ret = viommu_set_pgsize_bitmap(vdev, (void *)prop, len);
>>> +   break;
>>> default:
>>> dev_err(dev, "unknown viommu prop 0x%x\n", type);
>>> }
>>> @@ -630,7 +647,7 @@ static int viommu_domain_finalise(struct
>>> viommu_endpoint *vdev,
>>>
>>> vdomain->id = (unsigned int)ret;
>>>
>>> -   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
>>> +   domain->pgsize_bitmap   = vdev->pgsize_bitmap;
>>> domain->geometry= viommu->geometry;
>>>
>>> vdomain->map_flags  = viommu->map_flags;
>>> @@ -654,6 +671,29 @@ static void viommu_domain_free(struct iommu_domain
>> *domain)
>>> kfree(vdomain);
>>>  }
>>>
>>> +/*
>>> + * Check whether the endpoint's capabilities are compatible with
>>> +other
>>> + * endpoint

Re: [PATCH v2 3/3] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-26 Thread Auger Eric
Hi Jean,

On 3/26/20 10:35 AM, Jean-Philippe Brucker wrote:
> We don't currently support IOMMUs with a page granule larger than the
> system page size. The IOVA allocator has a BUG_ON() in this case, and
> VFIO has a WARN_ON().
> 
> Removing these obstacles ranges doesn't seem possible without major
> changes to the DMA API and VFIO. Some callers of iommu_map(), for
> example, want to map multiple page-aligned regions adjacent to each
> others for scatter-gather purposes. Even in simple DMA API uses, a call
> to dma_map_page() would let the endpoint access neighbouring memory. And
> VFIO users cannot ensure that their virtual address buffer is physically
> contiguous at the IOMMU granule.
> 
> Rather than triggering the IOVA BUG_ON() on mismatched page sizes, abort
> the vdomain finalise() with an error message. We could simply abort the
> viommu probe(), but an upcoming extension to virtio-iommu will allow
> setting different page masks for each endpoint.
> 
> Reported-by: Bharat Bhushan 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
> v1->v2: Move to vdomain_finalise(), improve commit message
> ---
>  drivers/iommu/virtio-iommu.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 5eed75cd121f..750f69c49b95 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -607,12 +607,22 @@ static struct iommu_domain 
> *viommu_domain_alloc(unsigned type)
>   return &vdomain->domain;
>  }
>  
> -static int viommu_domain_finalise(struct viommu_dev *viommu,
> +static int viommu_domain_finalise(struct viommu_endpoint *vdev,
> struct iommu_domain *domain)
>  {
>   int ret;
> + unsigned long viommu_page_size;
> + struct viommu_dev *viommu = vdev->viommu;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> + viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
> + if (viommu_page_size > PAGE_SIZE) {
> + dev_err(vdev->dev,
> + "granule 0x%lx larger than system page size 0x%lx\n",
> + viommu_page_size, PAGE_SIZE);
> + return -EINVAL;
> + }
> +
>   ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> viommu->last_domain, GFP_KERNEL);
>   if (ret < 0)
> @@ -659,7 +669,7 @@ static int viommu_attach_dev(struct iommu_domain *domain, 
> struct device *dev)
>* Properly initialize the domain now that we know which viommu
>* owns it.
>*/
> - ret = viommu_domain_finalise(vdev->viommu, domain);
> + ret = viommu_domain_finalise(vdev, domain);
>   } else if (vdomain->viommu != vdev->viommu) {
>   dev_err(dev, "cannot attach to foreign vIOMMU\n");
>   ret = -EXDEV;
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-25 Thread Auger Eric
Hi Jean,

On 3/18/20 12:40 PM, Jean-Philippe Brucker wrote:
> We don't currently support IOMMUs with a page granule larger than the
> system page size. The IOVA allocator has a BUG_ON() in this case, and
> VFIO has a WARN_ON().
> 
> It might be possible to remove these obstacles if necessary. If the host
> uses 64kB pages and the guest uses 4kB, then a device driver calling
> alloc_page() followed by dma_map_page() will create a 64kB mapping for a
> 4kB physical page, allowing the endpoint to access the neighbouring 60kB
> of memory. This problem could be worked around with bounce buffers.
> 
> For the moment, rather than triggering the IOVA BUG_ON() on mismatched
> page sizes, abort the virtio-iommu probe with an error message.
> 
> Reported-by: Bharat Bhushan 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 6d4e3c2a2ddb..80d5d8f621ab 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
>   struct device *parent_dev = vdev->dev.parent;
>   struct viommu_dev *viommu = NULL;
>   struct device *dev = &vdev->dev;
> + unsigned long viommu_page_size;
>   u64 input_start = 0;
>   u64 input_end = -1UL;
>   int ret;
> @@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device *vdev)
>   goto err_free_vqs;
>   }
>  
> + viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
Given the fact we now head towards probing the device for its
page_size_mask in viommu_add_device() the check may need to happen
later, in viommu_domain_finalise() for instance?

Thanks

Eric
> + if (viommu_page_size > PAGE_SIZE) {
> + dev_err(dev, "granule 0x%lx larger than system page size 
> 0x%lx\n",
> + viommu_page_size, PAGE_SIZE);
> + ret = -EINVAL;
> + goto err_free_vqs;
> + }
> +
>   viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
>   viommu->last_domain = ~0U;
>  
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] iommu/virtio: Reject IOMMU page granule larger than PAGE_SIZE

2020-03-18 Thread Auger Eric
Hi,

On 3/18/20 1:00 PM, Robin Murphy wrote:
> On 2020-03-18 11:40 am, Jean-Philippe Brucker wrote:
>> We don't currently support IOMMUs with a page granule larger than the
>> system page size. The IOVA allocator has a BUG_ON() in this case, and
>> VFIO has a WARN_ON().

Adding Alex in CC in case he has time to jump in. At the moment I don't
get why this WARN_ON() is here.

This was introduced in
c8dbca165bb090f926996a572ea2b5b577b34b70 vfio/iommu_type1: Avoid overflow

>>
>> It might be possible to remove these obstacles if necessary. If the host
>> uses 64kB pages and the guest uses 4kB, then a device driver calling
>> alloc_page() followed by dma_map_page() will create a 64kB mapping for a
>> 4kB physical page, allowing the endpoint to access the neighbouring 60kB
>> of memory. This problem could be worked around with bounce buffers.
> 
> FWIW the fundamental issue is that callers of iommu_map() may expect to
> be able to map two or more page-aligned regions directly adjacent to
> each other for scatter-gather purposes (or ring buffer tricks), and
> that's just not possible if the IOMMU granule is too big. Bounce
> buffering would be a viable workaround for the streaming DMA API and
> certain similar use-cases, but not in general (e.g. coherent DMA, VFIO,
> GPUs, etc.)
> 
> Robin.
> 
>> For the moment, rather than triggering the IOVA BUG_ON() on mismatched
>> page sizes, abort the virtio-iommu probe with an error message.

I understand this is a introduced as a temporary solution but this
sounds as an important limitation to me. For instance this will prevent
from running a fedora guest exposed with a virtio-iommu with a RHEL host.

Thanks

Eric
>>
>> Reported-by: Bharat Bhushan 
>> Signed-off-by: Jean-Philippe Brucker 
>> ---
>>   drivers/iommu/virtio-iommu.c | 9 +
>>   1 file changed, 9 insertions(+)
>>
>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>> index 6d4e3c2a2ddb..80d5d8f621ab 100644
>> --- a/drivers/iommu/virtio-iommu.c
>> +++ b/drivers/iommu/virtio-iommu.c
>> @@ -998,6 +998,7 @@ static int viommu_probe(struct virtio_device *vdev)
>>   struct device *parent_dev = vdev->dev.parent;
>>   struct viommu_dev *viommu = NULL;
>>   struct device *dev = &vdev->dev;
>> +    unsigned long viommu_page_size;
>>   u64 input_start = 0;
>>   u64 input_end = -1UL;
>>   int ret;
>> @@ -1028,6 +1029,14 @@ static int viommu_probe(struct virtio_device
>> *vdev)
>>   goto err_free_vqs;
>>   }
>>   +    viommu_page_size = 1UL << __ffs(viommu->pgsize_bitmap);
>> +    if (viommu_page_size > PAGE_SIZE) {
>> +    dev_err(dev, "granule 0x%lx larger than system page size
>> 0x%lx\n",
>> +    viommu_page_size, PAGE_SIZE);
>> +    ret = -EINVAL;
>> +    goto err_free_vqs;
>> +    }
>> +
>>   viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ |
>> VIRTIO_IOMMU_MAP_F_WRITE;
>>   viommu->last_domain = ~0U;
>>  
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Auger Eric
Hi Michael, Joerg,

On 3/3/20 5:09 PM, Michael S. Tsirkin wrote:
> On Tue, Mar 03, 2020 at 04:53:19PM +0100, Joerg Roedel wrote:
>> On Tue, Mar 03, 2020 at 09:00:05AM -0500, Michael S. Tsirkin wrote:
>>> Not necessarily. E.g. some power systems have neither.
>>> There are also systems looking to bypass ACPI e.g. for boot speed.
>>
>> If there is no firmware layer between the hardware and the OS the
>> necessary information the OS needs to run on the hardware is probably
>> hard-coded into the kernel?
> 
> No. It's coded into the hardware. Which might even be practical
> for bare-metal (e.g. on-board flash), but is very practical
> when the device is part of a hypervisor.
> 
>> In that case the same can be done with
>> virtio-iommu tolopology.
>>
>>> That sentence doesn't really answer the question, does it?
>>
>> To be more elaborate, putting this information into config space is a
>> layering violation. Hardware is never completly self-descriptive
> 
> 
> This "hardware" is actually part of hypervisor so there's no
> reason it can't be completely self-descriptive. It's specified
> by the same entity as the "firmware".

Yes in the QEMU case the machine code would fill this information as it
knows all the devices connected to the iommu. In the same way it would
generate the DT description or the ACPI tables
> 
>> and
>> that is why there is the firmware which provides the information about
>> the hardware to the OS in a generic way.
>>
>>> Frankly with platform specific interfaces like ACPI, virtio-iommu is
>>> much less compelling.  Describing topology as part of the device in a
>>> way that is first, portable, and second, is a good fit for hypervisors,
>>> is to me one of the main reasons virtio-iommu makes sense at all.
>>
>> Virtio-IOMMU makes sense in the first place because it is much faster
>> than emulating one of the hardware IOMMUs.
> 
> I don't see why it would be much faster. The interface isn't that
> different from command queues of VTD.

> 
>> And an ACPI table is also
>> portable to all ACPI platforms, same with device-tree.
>>
>> Regards,
>>
>>  Joerg
> 
> Making ACPI meet the goals of embedded projects such as kata containers
> would be a gigantic task with huge stability implications.  By
> comparison this 400-line parser is well contained and does the job.  I
> didn't yet see compelling reasons not to merge this, but I'll be
> interested to see some more specific concerns.
> 
> 
Thanks

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2 1/3] iommu/virtio: Add topology description to virtio-iommu config space

2020-03-03 Thread Auger Eric
Hi Joerg,

On 3/2/20 5:16 PM, Joerg Roedel wrote:
> On Fri, Feb 28, 2020 at 06:25:36PM +0100, Jean-Philippe Brucker wrote:
>> This solution isn't elegant nor foolproof, but is the best we can do at
>> the moment and works with existing virtio-iommu implementations. It also
>> enables an IOMMU for lightweight hypervisors that do not rely on
>> firmware methods for booting.
> 
> I appreciate the enablement on x86, but putting the conmfiguration into
> mmio-space isn't really something I want to see upstream. What is the
> problem with defining an ACPI table instead? This would also make things
> work on AARCH64 UEFI machines.
Michael has pushed this solution (putting the "configuration in the PCI
config space"), I think for those main reasons:
- ACPI may not be supported on some archs/hyps
- the virtio-iommu is a PCIe device so binding should not need ACPI
description

Another issue with ACPI integration is we have different flavours of
tables that exist: IORT, DMAR, IVRS

x86 ACPI integration was suggested with IORT. But it seems ARM is
reluctant to extend IORT to support para-virtualized IOMMU. So the VIOT
was proposed as a different atternative in "[RFC 00/13] virtio-iommu on
non-devicetree platforms"
(https://patchwork.kernel.org/cover/11257727/). Proposing a table that
may be used by different vendors seems to be a challenging issue here.

So even if the above solution does not look perfect, it looked a
sensible compromise given the above arguments. Please could you explain
what are the most compelling arguments against it?

Thanks

Eric
> 
> Regards,
> 
>   Joerg
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 0/5] iommu: Implement iommu_put_resv_regions_simple()

2019-09-16 Thread Auger Eric
Hi Thierry,

On 8/29/19 1:17 PM, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Most IOMMU drivers only need to free the memory allocated for each
> reserved region. Instead of open-coding the loop to do this in each
> driver, extract the code into a common function that can be used by
> all these drivers.

If I am not wrong, all the drivers now use the
iommu_put_resv_regions_simple helper. So we can wonder if the callback
is still relevant?

Thanks

Eric
> 
> Thierry
> 
> Thierry Reding (5):
>   iommu: Implement iommu_put_resv_regions_simple()
>   iommu: arm: Use iommu_put_resv_regions_simple()
>   iommu: amd: Use iommu_put_resv_regions_simple()
>   iommu: intel: Use iommu_put_resv_regions_simple()
>   iommu: virt: Use iommu_put_resv_regions_simple()
> 
>  drivers/iommu/amd_iommu.c| 11 +--
>  drivers/iommu/arm-smmu-v3.c  | 11 +--
>  drivers/iommu/arm-smmu.c | 11 +--
>  drivers/iommu/intel-iommu.c  | 11 +--
>  drivers/iommu/iommu.c| 19 +++
>  drivers/iommu/virtio-iommu.c | 14 +++---
>  include/linux/iommu.h|  2 ++
>  7 files changed, 28 insertions(+), 51 deletions(-)
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-25 Thread Auger Eric
Hi Christoph, Michael,

On 7/25/19 3:04 PM, Christoph Hellwig wrote:
> On Thu, Jul 25, 2019 at 01:53:49PM +0200, Auger Eric wrote:
>> I am confused: if vring_use_dma_api() returns false if the dma_mask is
>> unset (ie. vring_use_dma_api() returns false), the virtio-blk-pci device
>> will not be able to get translated addresses and won't work properly.
>>
>> The patch above allows the dma api to be used and only influences the
>> max_segment_size and it works properly.
>>
>> So is it normal the dma_mask is unset in my case?
> 
> Its not normal.  I assume you use virtio-nmio?  Due to the mess with
> the dma_mask being a pointer a lot of subsystems forgot to set a dma
> mask up, and oddly enough seem to mostly get away with it.
> 

No the issue is encountered with virtio-blk-pci

I think the problem is virtio_max_dma_size() is called from
virtblk_probe (virtio_blk.c) on the virtio device and not the actual
virtio_pci_device which has a dma_mask set. I don't think the virtio
device ever has a dma_mask set.

We do not hit the guest crash on the virtio-net-pci device as the
virtio-net driver does not call virtio_max_dma_size() on the virtio
device.

Does fd1068e1860e ("virtio-blk: Consider virtio_max_dma_size() for
maximum segment size") call virtio_max_dma_size() on the right device?

Thanks

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 2/2] virtio/virtio_ring: Fix the dma_max_mapping_size call

2019-07-25 Thread Auger Eric
Hi,

On 7/23/19 5:38 PM, Christoph Hellwig wrote:
> On Mon, Jul 22, 2019 at 04:36:09PM +0100, Robin Murphy wrote:
>>> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
>>> index c8be1c4f5b55..37c143971211 100644
>>> --- a/drivers/virtio/virtio_ring.c
>>> +++ b/drivers/virtio/virtio_ring.c
>>> @@ -262,7 +262,7 @@ size_t virtio_max_dma_size(struct virtio_device *vdev)
>>>   {
>>> size_t max_segment_size = SIZE_MAX;
>>>   - if (vring_use_dma_api(vdev))
>>> +   if (vring_use_dma_api(vdev) && vdev->dev.dma_mask)
>>
>> Hmm, might it make sense to roll that check up into vring_use_dma_api() 
>> itself? After all, if the device has no mask then it's likely that other 
>> DMA API ops wouldn't really work as expected either.
> 
> Makes sense to me.
> 

I am confused: if vring_use_dma_api() returns false if the dma_mask is
unset (ie. vring_use_dma_api() returns false), the virtio-blk-pci device
will not be able to get translated addresses and won't work properly.

The patch above allows the dma api to be used and only influences the
max_segment_size and it works properly.

So is it normal the dma_mask is unset in my case?

Thanks

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v2] dma-mapping: Use dma_get_mask in dma_addressing_limited

2019-07-22 Thread Auger Eric
Hi Christoph,

On 7/22/19 6:51 PM, Eric Auger wrote:
> We currently have cases where the dma_addressing_limited() gets
> called with dma_mask unset. This causes a NULL pointer dereference.
> 
> Use dma_get_mask() accessor to prevent the crash.
> 
> Fixes: b866455423e0 ("dma-mapping: add a dma_addressing_limited helper")
> Signed-off-by: Eric Auger 

As a follow-up of my last email, here is a patch featuring
dma_get_mask(). But you don't have the WARN_ON_ONCE anymore, pointing
out suspect users.

Feel free to pick up your preferred approach

Thanks

Eric
> 
> ---
> 
> v1 -> v2:
> - was [PATCH 1/2] dma-mapping: Protect dma_addressing_limited
>   against NULL dma_mask
> - Use dma_get_mask
> ---
>  include/linux/dma-mapping.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
> index e11b115dd0e4..f7d1eea32c78 100644
> --- a/include/linux/dma-mapping.h
> +++ b/include/linux/dma-mapping.h
> @@ -689,8 +689,8 @@ static inline int dma_coerce_mask_and_coherent(struct 
> device *dev, u64 mask)
>   */
>  static inline bool dma_addressing_limited(struct device *dev)
>  {
> - return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
> - dma_get_required_mask(dev);
> + return min_not_zero(dma_get_mask(dev), dev->bus_dma_mask) <
> + dma_get_required_mask(dev);
>  }
>  
>  #ifdef CONFIG_ARCH_HAS_SETUP_DMA_OPS
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 1/2] dma-mapping: Protect dma_addressing_limited against NULL dma_mask

2019-07-22 Thread Auger Eric
Hi Christoph,

On 7/22/19 5:26 PM, Christoph Hellwig wrote:
>>  static inline bool dma_addressing_limited(struct device *dev)
>>  {
>> -return min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
>> -dma_get_required_mask(dev);
>> +return WARN_ON_ONCE(!dev->dma_mask) ? false :
>> +min_not_zero(*dev->dma_mask, dev->bus_dma_mask) <
>> +dma_get_required_mask(dev);
> 
> This should really use a separate if statement, but I can fix that
> up when applying it.
> 
Just wondering why we don't use the dma_get_mask() accessor which
returns DMA_BIT_MASK(32) in case the dma_mask is not set.

Do you foresee any issue and would it still mandate to add dma_mask
checks on each call sites?

Thanks

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] iommu/virtio: Update to most recent specification

2019-07-22 Thread Auger Eric
Hi Jean,

On 7/22/19 4:40 PM, Jean-Philippe Brucker wrote:
> Following specification review a few things were changed in v8 of the
> virtio-iommu series [1], but have been omitted when merging the base
> driver. Add them now:
> 
> * Remove the EXEC flag.
> * Add feature bit for the MMIO flag.
> * Change domain_bits to domain_range.
> * Add NOMEM status flag.
> 
> [1] 
> https://lore.kernel.org/linux-iommu/20190530170929.19366-1-jean-philippe.bruc...@arm.com/

> 
> Fixes: edcd69ab9a32 ("iommu: Add virtio-iommu driver")
> Reported-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 
Tested-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/iommu/virtio-iommu.c  | 40 ++-
>  include/uapi/linux/virtio_iommu.h | 32 ++---
>  2 files changed, 47 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 433f4d2ee956..80a740df0737 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -2,7 +2,7 @@
>  /*
>   * Virtio driver for the paravirtualized IOMMU
>   *
> - * Copyright (C) 2018 Arm Limited
> + * Copyright (C) 2019 Arm Limited
>   */
>  
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> @@ -47,7 +47,10 @@ struct viommu_dev {
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
> - u8  domain_bits;
> + u32 first_domain;
> + u32 last_domain;
> + /* Supported MAP flags */
> + u32 map_flags;
>   u32 probe_size;
>  };
>  
> @@ -62,6 +65,7 @@ struct viommu_domain {
>   struct viommu_dev   *viommu;
>   struct mutexmutex; /* protects viommu pointer */
>   unsigned intid;
> + u32 map_flags;
>  
>   spinlock_t  mappings_lock;
>   struct rb_root_cached   mappings;
> @@ -113,6 +117,8 @@ static int viommu_get_req_errno(void *buf, size_t len)
>   return -ENOENT;
>   case VIRTIO_IOMMU_S_FAULT:
>   return -EFAULT;
> + case VIRTIO_IOMMU_S_NOMEM:
> + return -ENOMEM;
>   case VIRTIO_IOMMU_S_IOERR:
>   case VIRTIO_IOMMU_S_DEVERR:
>   default:
> @@ -607,15 +613,15 @@ static int viommu_domain_finalise(struct viommu_dev 
> *viommu,
>  {
>   int ret;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
> - unsigned int max_domain = viommu->domain_bits > 31 ? ~0 :
> -   (1U << viommu->domain_bits) - 1;
>  
>   vdomain->viommu = viommu;
> + vdomain->map_flags  = viommu->map_flags;
>  
>   domain->pgsize_bitmap   = viommu->pgsize_bitmap;
>   domain->geometry= viommu->geometry;
>  
> - ret = ida_alloc_max(&viommu->domain_ids, max_domain, GFP_KERNEL);
> + ret = ida_alloc_range(&viommu->domain_ids, viommu->first_domain,
> +   viommu->last_domain, GFP_KERNEL);
>   if (ret >= 0)
>   vdomain->id = (unsigned int)ret;
>  
> @@ -710,7 +716,7 @@ static int viommu_map(struct iommu_domain *domain, 
> unsigned long iova,
> phys_addr_t paddr, size_t size, int prot)
>  {
>   int ret;
> - int flags;
> + u32 flags;
>   struct virtio_iommu_req_map map;
>   struct viommu_domain *vdomain = to_viommu_domain(domain);
>  
> @@ -718,6 +724,9 @@ static int viommu_map(struct iommu_domain *domain, 
> unsigned long iova,
>   (prot & IOMMU_WRITE ? VIRTIO_IOMMU_MAP_F_WRITE : 0) |
>   (prot & IOMMU_MMIO ? VIRTIO_IOMMU_MAP_F_MMIO : 0);
>  
> + if (flags & ~vdomain->map_flags)
> + return -EINVAL;
> +
>   ret = viommu_add_mapping(vdomain, iova, paddr, size, flags);
>   if (ret)
>   return ret;
> @@ -1027,7 +1036,8 @@ static int viommu_probe(struct virtio_device *vdev)
>   goto err_free_vqs;
>   }
>  
> - viommu->domain_bits = 32;
> + viommu->map_flags = VIRTIO_IOMMU_MAP_F_READ | VIRTIO_IOMMU_MAP_F_WRITE;
> + viommu->last_domain = ~0U;
>  
>   /* Optional features */
>   virtio_cread_feature(vdev, VIRTIO_IOMMU_F_INPUT_RANGE,
> @@ -1038,9 +1048,13 @@ static int viommu_probe(struct virtio_device *vdev)
>struct virtio_iommu_config, input_range.end,
>&input_end);
>  
> - virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_BITS,
> -  struct virtio_iommu_config, domain_bits,
> -  &viommu->domain_bits);
> + virtio_cread_feature(vdev, VIRTIO_IOMMU_F_DOMAIN_RANGE,
> +  struct virtio_iommu_config, domain_range.start,
> +  &viommu->first_domain);
> +
> + virtio_cread_fe

Re: [PATCH v8 5/7] iommu: Add virtio-iommu driver

2019-06-14 Thread Auger Eric
Hi jean,

On 5/30/19 7:09 PM, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio transport without emulating page
> tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
> requests.
> 
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables. A little more work is required for modular and x86
> support, so for the moment the driver depends on CONFIG_VIRTIO=y and
> CONFIG_ARM64.
> 
> Acked-by: Joerg Roedel 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric

> ---
>  MAINTAINERS   |   7 +
>  drivers/iommu/Kconfig |  11 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu.c  | 934 ++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_iommu.h | 110 
>  6 files changed, 1064 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 429c6c624861..62bd1834d95a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16807,6 +16807,13 @@ S:   Maintained
>  F:   drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
> +VIRTIO IOMMU DRIVER
> +M:   Jean-Philippe Brucker 
> +L:   virtualization@lists.linux-foundation.org
> +S:   Maintained
> +F:   drivers/iommu/virtio-iommu.c
> +F:   include/uapi/linux/virtio_iommu.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 83664db5221d..e15cdcd8cb3c 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -473,4 +473,15 @@ config HYPERV_IOMMU
> Stub IOMMU driver to handle IRQs as to allow Hyper-V Linux
> guests to run with x2APIC mode enabled.
>  
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO=y
> + depends on ARM64
> + select IOMMU_API
> + select INTERVAL_TREE
> + help
> +   Para-virtualised IOMMU driver with virtio.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15e986b..f13f36ae1af6 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -33,3 +33,4 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>  obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..b2719a87c3c5
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,934 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2019 Arm Limited
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
> +#define VIOMMU_REQUEST_VQ0
> +#define VIOMMU_NR_VQS1
> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device   *dev;
> + struct virtio_device*vdev;
> +
> + struct ida  domain_ids;
> +
> + struct virtqueue*vqs[VIOMMU_NR_VQS];
> + spinlock_t  request_lock;
> + struct list_headrequests;
> +
> + /* Device configuration */
> + struct iommu_domain_geometrygeometry;
> + u64 pgsize_bitmap;
> + u32 first_domain;
> + u32 last_domain;
> + /* Supported MAP flags */
> + u32 map_flags;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node   iova;
> + u32 flags;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev   *viommu;
> + struct mutexmutex; /* protects viommu pointer */
> + unsigned intid;
> + u32 map_flags;
> +
> + spinlock_t  mappings_lock;
> + struct rb_root_cached   mappings;
> +
> + unsigned long   nr_endpoints;
> +};
>

Re: [PATCH v8 0/7] Add virtio-iommu driver

2019-06-13 Thread Auger Eric
Hi,

On 5/30/19 7:09 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.12 [1].
> Since last version [2] we've worked on improving the specification,
> which resulted in the following changes to the interface:
> * Remove the EXEC flag.
> * Add feature bit for the MMIO flag.
> * Change domain_bits to domain_range.
> 
> Given that there were small changes to patch 5/7, I removed the review
> and test tags. Please find the code at [3].
> 
> [1] Virtio-iommu specification v0.12, sources and pdf
> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.12
> http://jpbrucker.net/virtio-iommu/spec/v0.12/virtio-iommu-v0.12.pdf
> 
> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-dev-diff-v0.11-v0.12.pdf
> 
> [2] [PATCH v7 0/7] Add virtio-iommu driver
> 
> https://lore.kernel.org/linux-pci/0ba215f5-e856-bf31-8dd9-a85710714...@arm.com/T/
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.12
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.12

SERIES
Tested-by: Eric Auger 

with QEMU branch:
https://github.com/eauger/qemu/tree/v4.0-virtio-iommu-v0.12

and kernel branch:
https://github.com/eauger/linux/tree/virtio-iommu-v0.12
(where I just added the rebased iort patches on top of Jean-Philippe's
virtio-iommu/v0.12 branch). So I also tested the ACPI boot.

I will formally submit the QEMU series ASAP.

Thanks

Eric

> 
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   of: Allow the iommu-map property to omit untranslated devices
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>  MAINTAINERS   |7 +
>  drivers/iommu/Kconfig |   11 +
>  drivers/iommu/Makefile|1 +
>  drivers/iommu/virtio-iommu.c  | 1176 +
>  drivers/of/base.c |   10 +-
>  drivers/pci/of.c  |6 +
>  include/uapi/linux/virtio_ids.h   |1 +
>  include/uapi/linux/virtio_iommu.h |  165 +++
>  10 files changed, 1470 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [virtio-dev] Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-15 Thread Auger Eric
Hi Robin

On 12/13/18 1:37 PM, Robin Murphy wrote:
> On 2018-12-12 3:27 pm, Auger Eric wrote:
>> Hi,
>>
>> On 12/12/18 3:56 PM, Michael S. Tsirkin wrote:
>>> On Fri, Dec 07, 2018 at 06:52:31PM +, Jean-Philippe Brucker wrote:
>>>> Sorry for the delay, I wanted to do a little more performance analysis
>>>> before continuing.
>>>>
>>>> On 27/11/2018 18:10, Michael S. Tsirkin wrote:
>>>>> On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote:
>>>>>>>> +    if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) ||
>>>>>>>> +    !virtio_has_feature(vdev, VIRTIO_IOMMU_F_MAP_UNMAP))
>>>>>>>
>>>>>>> Why bother with a feature bit for this then btw?
>>>>>>
>>>>>> We'll need a new feature bit for sharing page tables with the
>>>>>> hardware,
>>>>>> because they require different requests (attach_table/invalidate
>>>>>> instead
>>>>>> of map/unmap.) A future device supporting page table sharing won't
>>>>>> necessarily need to support map/unmap.
>>>>>>
>>>>> I don't see virtio iommu being extended to support ARM specific
>>>>> requests. This just won't scale, too many different
>>>>> descriptor formats out there.
>>>>
>>>> They aren't really ARM specific requests. The two new requests are
>>>> ATTACH_TABLE and INVALIDATE, which would be used by x86 IOMMUs as well.
>>>>
>>>> Sharing CPU address space with the HW IOMMU (SVM) has been in the scope
>>>> of virtio-iommu since the first RFC, and I've been working with that
>>>> extension in mind since the beginning. As an example you can have a
>>>> look
>>>> at my current draft for this [1], which is inspired from the VFIO work
>>>> we've been doing with Intel.
>>>>
>>>> The negotiation phase inevitably requires vendor-specific fields in the
>>>> descriptors - host tells which formats are supported, guest chooses a
>>>> format and attaches page tables. But invalidation and fault reporting
>>>> descriptors are fairly generic.
>>>
>>> We need to tread carefully here.  People expect it that if user does
>>> lspci and sees a virtio device then it's reasonably portable.
>>>
>>>>> If you want to go that way down the road, you should avoid
>>>>> virtio iommu, instead emulate and share code with the ARM SMMU
>>>>> (probably
>>>>> with a different vendor id so you can implement the
>>>>> report on map for devices without PRI).
>>>>
>>>> vSMMU has to stay in userspace though. The main reason we're proposing
>>>> virtio-iommu is that emulating every possible vIOMMU model in the
>>>> kernel
>>>> would be unmaintainable. With virtio-iommu we can process the fast path
>>>> in the host kernel, through vhost-iommu, and do the heavy lifting in
>>>> userspace.
>>>
>>> Interesting.
>>>
>>>> As said above, I'm trying to keep the fast path for
>>>> virtio-iommu generic.
>>>>
>>>> More notes on what I consider to be the fast path, and comparison with
>>>> vSMMU:
>>>>
>>>> (1) The primary use-case we have in mind for vIOMMU is something like
>>>> DPDK in the guest, assigning a hardware device to guest userspace. DPDK
>>>> maps a large amount of memory statically, to be used by a pass-through
>>>> device. For this case I don't think we care about vIOMMU performance.
>>>> Setup and teardown need to be reasonably fast, sure, but the MAP/UNMAP
>>>> requests don't have to be optimal.
>>>>
>>>>
>>>> (2) If the assigned device is owned by the guest kernel, then mappings
>>>> are dynamic and require dma_map/unmap() to be fast, but there generally
>>>> is no need for a vIOMMU, since device and drivers are trusted by the
>>>> guest kernel. Even when the user does enable a vIOMMU for this case
>>>> (allowing to over-commit guest memory, which needs to be pinned
>>>> otherwise),
>>>
>>> BTW that's in theory in practice it doesn't really work.
>>>
>>>> we generally play tricks like lazy TLBI (non-strict mode) to
>>>> make it faster.
>>>
>>> Simple lazy TLB

Re: [virtio-dev] Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-12-14 Thread Auger Eric
Hi,

On 12/12/18 3:56 PM, Michael S. Tsirkin wrote:
> On Fri, Dec 07, 2018 at 06:52:31PM +, Jean-Philippe Brucker wrote:
>> Sorry for the delay, I wanted to do a little more performance analysis
>> before continuing.
>>
>> On 27/11/2018 18:10, Michael S. Tsirkin wrote:
>>> On Tue, Nov 27, 2018 at 05:55:20PM +, Jean-Philippe Brucker wrote:
>> +if (!virtio_has_feature(vdev, VIRTIO_F_VERSION_1) ||
>> +!virtio_has_feature(vdev, VIRTIO_IOMMU_F_MAP_UNMAP))
>
> Why bother with a feature bit for this then btw?

 We'll need a new feature bit for sharing page tables with the hardware,
 because they require different requests (attach_table/invalidate instead
 of map/unmap.) A future device supporting page table sharing won't
 necessarily need to support map/unmap.

>>> I don't see virtio iommu being extended to support ARM specific
>>> requests. This just won't scale, too many different
>>> descriptor formats out there.
>>
>> They aren't really ARM specific requests. The two new requests are
>> ATTACH_TABLE and INVALIDATE, which would be used by x86 IOMMUs as well.
>>
>> Sharing CPU address space with the HW IOMMU (SVM) has been in the scope
>> of virtio-iommu since the first RFC, and I've been working with that
>> extension in mind since the beginning. As an example you can have a look
>> at my current draft for this [1], which is inspired from the VFIO work
>> we've been doing with Intel.
>>
>> The negotiation phase inevitably requires vendor-specific fields in the
>> descriptors - host tells which formats are supported, guest chooses a
>> format and attaches page tables. But invalidation and fault reporting
>> descriptors are fairly generic.
> 
> We need to tread carefully here.  People expect it that if user does
> lspci and sees a virtio device then it's reasonably portable.
> 
>>> If you want to go that way down the road, you should avoid
>>> virtio iommu, instead emulate and share code with the ARM SMMU (probably
>>> with a different vendor id so you can implement the
>>> report on map for devices without PRI).
>>
>> vSMMU has to stay in userspace though. The main reason we're proposing
>> virtio-iommu is that emulating every possible vIOMMU model in the kernel
>> would be unmaintainable. With virtio-iommu we can process the fast path
>> in the host kernel, through vhost-iommu, and do the heavy lifting in
>> userspace.
> 
> Interesting.
> 
>> As said above, I'm trying to keep the fast path for
>> virtio-iommu generic.
>>
>> More notes on what I consider to be the fast path, and comparison with
>> vSMMU:
>>
>> (1) The primary use-case we have in mind for vIOMMU is something like
>> DPDK in the guest, assigning a hardware device to guest userspace. DPDK
>> maps a large amount of memory statically, to be used by a pass-through
>> device. For this case I don't think we care about vIOMMU performance.
>> Setup and teardown need to be reasonably fast, sure, but the MAP/UNMAP
>> requests don't have to be optimal.
>>
>>
>> (2) If the assigned device is owned by the guest kernel, then mappings
>> are dynamic and require dma_map/unmap() to be fast, but there generally
>> is no need for a vIOMMU, since device and drivers are trusted by the
>> guest kernel. Even when the user does enable a vIOMMU for this case
>> (allowing to over-commit guest memory, which needs to be pinned
>> otherwise),
> 
> BTW that's in theory in practice it doesn't really work.
> 
>> we generally play tricks like lazy TLBI (non-strict mode) to
>> make it faster.
> 
> Simple lazy TLB for guest/userspace drivers would be a big no no.
> You need something smarter.
> 
>> Here device and drivers are trusted, therefore the
>> vulnerability window of lazy mode isn't a concern.
>>
>> If the reason to enable the vIOMMU is over-comitting guest memory
>> however, you can't use nested translation because it requires pinning
>> the second-level tables. For this case performance matters a bit,
>> because your invalidate-on-map needs to be fast, even if you enable lazy
>> mode and only receive inval-on-unmap every 10ms. It won't ever be as
>> fast as nested translation, though. For this case I think vSMMU+Caching
>> Mode and userspace virtio-iommu with MAP/UNMAP would perform similarly
>> (given page-sized payloads), because the pagetable walk doesn't add a
>> lot of overhead compared to the context switch. But given the results
>> below, vhost-iommu would be faster than vSMMU+CM.
>>
>>
>> (3) Then there is SVM. For SVM, any destructive change to the process
>> address space requires a synchronous invalidation command to the
>> hardware (at least when using PCI ATS). Given that SVM is based on page
>> faults, fault reporting from host to guest also needs to be fast, as
>> well as fault response from guest to host.
>>
>> I think this is where performance matters the most. To get a feel of the
>> advantage we get with virtio-iommu, I compared the vSMMU page-table
>> sharing implementation [2] and vh

Re: [PATCH v5 0/7] Add virtio-iommu driver

2018-12-03 Thread Auger Eric
Hi Michael,

On 11/27/18 6:16 PM, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 06:09:25PM +0100, Auger Eric wrote:
>> Hi Michael,
>>
>> On 11/27/18 5:53 PM, Michael S. Tsirkin wrote:
>>> On Thu, Nov 22, 2018 at 07:37:54PM +, Jean-Philippe Brucker wrote:
>>>> Implement the virtio-iommu driver, following specification v0.9 [1].
>>>>
>>>> Since v4 [2] I fixed the issues reported by Eric, and added Reviewed-by
>>>> from Eric and Rob. Thanks!
>>>>
>>>> I changed the specification to fix one inconsistency discussed in v4.
>>>> That the device fills the probe buffer with zeroes is now a "SHOULD"
>>>> instead of a "MAY", since it's the only way for the driver to know if
>>>> the device wrote the status. Existing devices already do this. In
>>>> addition the device now needs to fill the three padding bytes at the
>>>> tail with zeroes.
>>>>
>>>> You can find Linux driver and kvmtool device on branches
>>>> virtio-iommu/v0.9 [3]. I also lightly tested with Eric's latest QEMU
>>>> device [4].
>>>
>>> I tried to get this to work on my x86 box but without
>>> success. Any hints? Does this have to do with the IORT table?
>>> I think we really should just reserve our own table ID
>>> and avoid the pain of trying to add things to the IORT spec.
>>> I'm reluctant to merge lots of code that I can't easily test.
>>> Again, if we found a way to push more configuration into
>>> virtio config space the problem space would be smaller.
>>
>> You can at least test it with QEMU ARM virt in TCG mode.
> 
> It's slow enough that I generally just focus on KVM.
> 
>> Then I have
>> worked on the IORT integration in PC/Q35 but this is not yet functional.
>> I need to debug what's wrong on guest ACPI probing. I plan to work on
>> this this week.
>>
>> Thanks
>>
>> Eric
> 
> Sounds good. Did you need to make changes to IORT?  I don't remember. If
> yes it would be very easy to just have a virtio specific ACPI table -
> I am assuming ARM guys will be just as hostile to virt changes
> to IORT as they were to minor changes to ARM vIOMMU.

I eventually succeeded to prototype the IORT integration on x86. Please
find below the branches to use for testing:

- QEMU: https://github.com/eauger/qemu.git v3.1.0-rc3-virtio-iommu-v0.9-x86
  On top of "[RFC v9 00/17] VIRTIO-IOMMU device", I added the IORT build
for pc machine
- LINUX GUEST: https://github.com/eauger/linux.git
virtio-iommu-v0.9-iort-x86
  Jean's virtio-iommu/devel branch + 2 hacks (to be discussed separately)
  Make sure the CONFIG_VIRTIO_IOMMU config is set

Used qemu cmd line featuring -device virtio-iommu-pci,addr=0xa:

./x86_64-softmmu/qemu-system-x86_64 -M
q35,accel=kvm,usb=off,sata=off,dump-guest-core=off -cpu
Haswell,-hle,-rtm -smp 4,sockets=4,cores=1,threads=1 -m 4G -display none
--enable-kvm -serial tcp:localhost:,server -device
virtio-iommu-pci,addr=0xa -trace
events=/home/augere/UPSTREAM/qemu/hw-virtio-iommu -qmp
unix:/home/augere/TEST/QEMU/qmp-sock,server,nowait -rtc
base=utc,driftfix=slew -global kvm-pit.lost_tick_policy=delay -realtime
mlock=off -no-hpet -no-shutdown -device
virtio-blk-pci,scsi=off,drive=drv0,id=virtio-disk0,bootindex=1,iommu_platform,disable-modern=off,disable-legacy=on
-drive
file=/home/augere/VM/IMAGES/vm0.qcow2,format=qcow2,if=none,id=drv0
-device
virtio-net-pci,bus=pcie.0,netdev=nic0,mac=6a:f5:10:b1:3d:d2,iommu_platform,disable-modern=off,disable-legacy=on
-netdev
tap,id=nic0,script=/home/augere/TEST/SCRIPTS/qemu-ifup,downscript=/home/augere/TEST/SCRIPTS/qemu-ifdown
-kernel
/home/augere/VM/BOOT/vmlinuz-4.20.0-rc3-guest-defconfig-virtio-iommu-0.9-x86+
-initrd
/home/augere/VM/BOOT/initrd.img-4.20.0-rc3-guest-defconfig-virtio-iommu-0.9-x86+
-append 'root=/dev/mapper/rhel_dhcp156--238-root ro crashkernel=auto
rd.lvm.lv=rhel_dhcp156-238/root rd.lvm.lv=rhel_dhcp156-238/swap rw
ignore_loglevel console=tty0 console=ttyS0,115200n8 serial acpi=force'
-net none -d guest_errors

I put sata=off otherwise I have early untranslated transactions that
prevent the guest from booting.

Please let me know if you encounter any issue reproducing the
environment on your side.

Thanks

Eric


> 
> 
>>>
>>>> [1] Virtio-iommu specification v0.9, sources, pdf and diff from v0.8
>>>> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
>>>> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
>>>> 
>>>> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.8-v0.9.pdf
>>>>
>>>> [2] [PAT

Re: [PATCH v5 0/7] Add virtio-iommu driver

2018-11-27 Thread Auger Eric
Hi Michael,

On 11/27/18 6:16 PM, Michael S. Tsirkin wrote:
> On Tue, Nov 27, 2018 at 06:09:25PM +0100, Auger Eric wrote:
>> Hi Michael,
>>
>> On 11/27/18 5:53 PM, Michael S. Tsirkin wrote:
>>> On Thu, Nov 22, 2018 at 07:37:54PM +, Jean-Philippe Brucker wrote:
>>>> Implement the virtio-iommu driver, following specification v0.9 [1].
>>>>
>>>> Since v4 [2] I fixed the issues reported by Eric, and added Reviewed-by
>>>> from Eric and Rob. Thanks!
>>>>
>>>> I changed the specification to fix one inconsistency discussed in v4.
>>>> That the device fills the probe buffer with zeroes is now a "SHOULD"
>>>> instead of a "MAY", since it's the only way for the driver to know if
>>>> the device wrote the status. Existing devices already do this. In
>>>> addition the device now needs to fill the three padding bytes at the
>>>> tail with zeroes.
>>>>
>>>> You can find Linux driver and kvmtool device on branches
>>>> virtio-iommu/v0.9 [3]. I also lightly tested with Eric's latest QEMU
>>>> device [4].
>>>
>>> I tried to get this to work on my x86 box but without
>>> success. Any hints? Does this have to do with the IORT table?
>>> I think we really should just reserve our own table ID
>>> and avoid the pain of trying to add things to the IORT spec.
>>> I'm reluctant to merge lots of code that I can't easily test.
>>> Again, if we found a way to push more configuration into
>>> virtio config space the problem space would be smaller.
>>
>> You can at least test it with QEMU ARM virt in TCG mode.
> 
> It's slow enough that I generally just focus on KVM.
fair enough
> 
>> Then I have
>> worked on the IORT integration in PC/Q35 but this is not yet functional.
>> I need to debug what's wrong on guest ACPI probing. I plan to work on
>> this this week.
>>
>> Thanks
>>
>> Eric
> 
> Sounds good. Did you need to make changes to IORT?  I don't remember. If
> yes it would be very easy to just have a virtio specific ACPI table -
> I am assuming ARM guys will be just as hostile to virt changes
> to IORT as they were to minor changes to ARM vIOMMU.

Well the only difference is that on ARM we have 3 nodes in the IORT: the
root complex node -> the VIRTIO-IOMMU node -> the ITS node. The ITS is
the ARM MSI controller which does MSI translation. So the ARM IORT
describes the chained ID mappings from the end point RID through the
StreamID SMMU input to the device ID MSI controller input. On Intel we
don't have the last node.

But again my integration is not yet functional and I don't know yet if I
have captured the whole picture.

Thanks

Eric
> 
> 
>>>
>>>> [1] Virtio-iommu specification v0.9, sources, pdf and diff from v0.8
>>>> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
>>>> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
>>>> 
>>>> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.8-v0.9.pdf
>>>>
>>>> [2] [PATCH v4 0/7] Add virtio-iommu driver
>>>> 
>>>> https://lists.linuxfoundation.org/pipermail/iommu/2018-November/031074.html
>>>>
>>>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9
>>>> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
>>>>
>>>> [4] [RFC v9 00/17] VIRTIO-IOMMU device
>>>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
>>>>
>>>> Jean-Philippe Brucker (7):
>>>>   dt-bindings: virtio-mmio: Add IOMMU description
>>>>   dt-bindings: virtio: Add virtio-pci-iommu node
>>>>   of: Allow the iommu-map property to omit untranslated devices
>>>>   PCI: OF: Initialize dev->fwnode appropriately
>>>>   iommu: Add virtio-iommu driver
>>>>   iommu/virtio: Add probe request
>>>>   iommu/virtio: Add event queue
>>>>
>>>>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>>>>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>>>>  MAINTAINERS   |7 +
>>>>  drivers/iommu/Kconfig |   11 +
>>>>  drivers/iommu/Makefile|1 +
>>>>  drivers/iommu/virtio-iommu.c  | 1157 +
>>>>  drivers/of/base.c |   10 +-
>>>>  drivers/pci/of.c  |7 +
>>>>  include/uapi/linux/virtio_ids.h   |1 +
>>>>  include/uapi/linux/virtio_iommu.h |  161 +++
>>>>  10 files changed, 1448 insertions(+), 3 deletions(-)
>>>>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>>>>  create mode 100644 drivers/iommu/virtio-iommu.c
>>>>  create mode 100644 include/uapi/linux/virtio_iommu.h
>>>>
>>>> -- 
>>>> 2.19.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 0/7] Add virtio-iommu driver

2018-11-27 Thread Auger Eric
Hi Michael,

On 11/27/18 5:53 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 22, 2018 at 07:37:54PM +, Jean-Philippe Brucker wrote:
>> Implement the virtio-iommu driver, following specification v0.9 [1].
>>
>> Since v4 [2] I fixed the issues reported by Eric, and added Reviewed-by
>> from Eric and Rob. Thanks!
>>
>> I changed the specification to fix one inconsistency discussed in v4.
>> That the device fills the probe buffer with zeroes is now a "SHOULD"
>> instead of a "MAY", since it's the only way for the driver to know if
>> the device wrote the status. Existing devices already do this. In
>> addition the device now needs to fill the three padding bytes at the
>> tail with zeroes.
>>
>> You can find Linux driver and kvmtool device on branches
>> virtio-iommu/v0.9 [3]. I also lightly tested with Eric's latest QEMU
>> device [4].
> 
> I tried to get this to work on my x86 box but without
> success. Any hints? Does this have to do with the IORT table?
> I think we really should just reserve our own table ID
> and avoid the pain of trying to add things to the IORT spec.
> I'm reluctant to merge lots of code that I can't easily test.
> Again, if we found a way to push more configuration into
> virtio config space the problem space would be smaller.

You can at least test it with QEMU ARM virt in TCG mode. Then I have
worked on the IORT integration in PC/Q35 but this is not yet functional.
I need to debug what's wrong on guest ACPI probing. I plan to work on
this this week.

Thanks

Eric
> 
>> [1] Virtio-iommu specification v0.9, sources, pdf and diff from v0.8
>> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
>> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
>> 
>> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.8-v0.9.pdf
>>
>> [2] [PATCH v4 0/7] Add virtio-iommu driver
>> 
>> https://lists.linuxfoundation.org/pipermail/iommu/2018-November/031074.html
>>
>> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9
>> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
>>
>> [4] [RFC v9 00/17] VIRTIO-IOMMU device
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
>>
>> Jean-Philippe Brucker (7):
>>   dt-bindings: virtio-mmio: Add IOMMU description
>>   dt-bindings: virtio: Add virtio-pci-iommu node
>>   of: Allow the iommu-map property to omit untranslated devices
>>   PCI: OF: Initialize dev->fwnode appropriately
>>   iommu: Add virtio-iommu driver
>>   iommu/virtio: Add probe request
>>   iommu/virtio: Add event queue
>>
>>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>>  MAINTAINERS   |7 +
>>  drivers/iommu/Kconfig |   11 +
>>  drivers/iommu/Makefile|1 +
>>  drivers/iommu/virtio-iommu.c  | 1157 +
>>  drivers/of/base.c |   10 +-
>>  drivers/pci/of.c  |7 +
>>  include/uapi/linux/virtio_ids.h   |1 +
>>  include/uapi/linux/virtio_iommu.h |  161 +++
>>  10 files changed, 1448 insertions(+), 3 deletions(-)
>>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>>  create mode 100644 drivers/iommu/virtio-iommu.c
>>  create mode 100644 include/uapi/linux/virtio_iommu.h
>>
>> -- 
>> 2.19.1
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 0/7] Add virtio-iommu driver

2018-11-23 Thread Auger Eric
Hi Jean,

On 11/22/18 8:37 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.9 [1].
> 
> Since v4 [2] I fixed the issues reported by Eric, and added Reviewed-by
> from Eric and Rob. Thanks!
> 
> I changed the specification to fix one inconsistency discussed in v4.
> That the device fills the probe buffer with zeroes is now a "SHOULD"
> instead of a "MAY", since it's the only way for the driver to know if
> the device wrote the status. Existing devices already do this. In
> addition the device now needs to fill the three padding bytes at the
> tail with zeroes.
> 
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.9 [3]. I also lightly tested with Eric's latest QEMU
> device [4].
> 
> [1] Virtio-iommu specification v0.9, sources, pdf and diff from v0.8
> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.9
> http://jpbrucker.net/virtio-iommu/spec/v0.9/virtio-iommu-v0.9.pdf
> 
> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.8-v0.9.pdf
> 
> [2] [PATCH v4 0/7] Add virtio-iommu driver
> 
> https://lists.linuxfoundation.org/pipermail/iommu/2018-November/031074.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.9
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.9
> 
> [4] [RFC v9 00/17] VIRTIO-IOMMU device
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg575578.html
> 
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   of: Allow the iommu-map property to omit untranslated devices
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>  MAINTAINERS   |7 +
>  drivers/iommu/Kconfig |   11 +
>  drivers/iommu/Makefile|1 +
>  drivers/iommu/virtio-iommu.c  | 1157 +
>  drivers/of/base.c |   10 +-
>  drivers/pci/of.c  |7 +
>  include/uapi/linux/virtio_ids.h   |1 +
>  include/uapi/linux/virtio_iommu.h |  161 +++
>  10 files changed, 1448 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
for the whole series
Tested-by: Eric Auger 

Thanks

Eric

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v5 5/7] iommu: Add virtio-iommu driver

2018-11-23 Thread Auger Eric
Hi Jean,

On 11/22/18 8:37 PM, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio transport without emulating page
> tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
> requests.
> 
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.
> 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  MAINTAINERS   |   7 +
>  drivers/iommu/Kconfig |  11 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu.c  | 916 ++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_iommu.h | 104 
>  6 files changed, 1040 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1689dcfec800..3d8550c76f4a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15946,6 +15946,13 @@ S:   Maintained
>  F:   drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
> +VIRTIO IOMMU DRIVER
> +M:   Jean-Philippe Brucker 
> +L:   virtualization@lists.linux-foundation.org
> +S:   Maintained
> +F:   drivers/iommu/virtio-iommu.c
> +F:   include/uapi/linux/virtio_iommu.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index bf2bbfa2a399..db5f2b8c23f5 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -464,4 +464,15 @@ config QCOM_IOMMU
>   help
> Support for IOMMU on certain Qualcomm SoCs.
>  
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO=y
> + select IOMMU_API
> + select INTERVAL_TREE
> + select ARM_DMA_USE_IOMMU if ARM
> + help
> +   Para-virtualised IOMMU driver with virtio.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 5481e5fe1f95..bd7e55751d09 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -36,3 +36,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..7540dab9c8dc
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,916 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 Arm Limited
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
> +#define VIOMMU_REQUEST_VQ0
> +#define VIOMMU_NR_VQS1
> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device   *dev;
> + struct virtio_device*vdev;
> +
> + struct ida  domain_ids;
> +
> + struct virtqueue*vqs[VIOMMU_NR_VQS];
> + spinlock_t  request_lock;
> + struct list_headrequests;
> +
> + /* Device configuration */
> + struct iommu_domain_geometrygeometry;
> + u64 pgsize_bitmap;
> + u8  domain_bits;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node   iova;
> + u32 flags;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev   *viommu;
> + struct mutexmutex; /* protects viommu pointer */
> + unsigned intid;
> +
> + spinlock_t  mappings_lock;
> + struct rb_root_cached   mappings;
> +
> + unsigned long   nr_endpoints;
> +};
> +
> +struct viommu_endpoint {
> + struct viommu_dev   *viommu;
> + struct viommu_domain*vdomain;
> +};
> +
> +struct viommu_request {
> + struct list_headlist;
> + void*writeback;
> + unsigned intwrite_offset;
> + unsigned intlen;
> + char 

Re: [virtio-dev] Re: [PATCH v4 5/7] iommu: Add virtio-iommu driver

2018-11-21 Thread Auger Eric
Hi jean,

On 11/20/18 6:30 PM, Jean-Philippe Brucker wrote:
> On 16/11/2018 18:46, Jean-Philippe Brucker wrote:
 +/*
 + * __viommu_sync_req - Complete all in-flight requests
 + *
 + * Wait for all added requests to complete. When this function returns, 
 all
 + * requests that were in-flight at the time of the call have completed.
 + */
 +static int __viommu_sync_req(struct viommu_dev *viommu)
 +{
 +  int ret = 0;
 +  unsigned int len;
 +  size_t write_len;
 +  struct viommu_request *req;
 +  struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
 +
 +  assert_spin_locked(&viommu->request_lock);
 +
 +  virtqueue_kick(vq);
 +
 +  while (!list_empty(&viommu->requests)) {
 +  len = 0;
 +  req = virtqueue_get_buf(vq, &len);
 +  if (!req)
 +  continue;
 +
 +  if (!len)
 +  viommu_set_req_status(req->buf, req->len,
 +VIRTIO_IOMMU_S_IOERR);
 +
 +  write_len = req->len - req->write_offset;
 +  if (req->writeback && len >= write_len)
>>> I don't get "len >= write_len". Is it valid for the device to write more
>>> than write_len? If it writes less than write_len, the status is not set,
>>> is it?
> 
> Actually, len could well be three bytes smaller than write_len. The spec
> doesn't require the device to write the three padding bytes in
> virtio_iommu_req_tail, after the status byte.
> 
> Here the driver just assumes that the device wrote the reserved field.
> The QEMU device seems to write uninitialized data in there...

Indeed that's incorrect and I should fix it. tail struct should be
properly initialized to 0. Only probe request implementation is correct.
> 
> Any objection to changing the spec to require the device to initialize
> those bytes to zero? I think it makes things nicer overall and shouldn't
> have any performance impact.
No objection from me.
> 
> [...]
 +static struct iommu_domain *viommu_domain_alloc(unsigned type)
 +{
 +  struct viommu_domain *vdomain;
 +
 +  if (type != IOMMU_DOMAIN_UNMANAGED && type != IOMMU_DOMAIN_DMA)
>>> smmu drivers also support the IDENTITY type. Don't we want to support it
>>> as well?
>>
>> Eventually yes. The IDENTITY domain is useful when an IOMMU has been
>> forced upon you and gets in the way of performance, in which case you
>> get rid of it with CONFIG_IOMMU_DEFAULT_PASSTHROUGH=y or
>> iommu.passthrough=y. For virtio-iommu though, you could simply not
>> instantiate the device.
>>
>> I don't think it's difficult to add: if the device supports
>> VIRTIO_IOMMU_F_BYPASS, then we simply don't send an ATTACH request.
>> Otherwise after ATTACH we send a MAP for the whole input range. If the
>> change isn't bigger than this, I'll add it to v5.
> 
> Not as straightforward as I hoped, when the device doesn't support
> VIRTIO_IOMMU_F_BYPASS:
> 
> * We can't simply create an ID map for the whole input range, we need to
> carve out the resv_mem regions.
> 
> * For a VFIO device, the host has no way of forwarding the identity
> mapping. For example the guest will attempt to map [0; 0x]
> -> [0; 0x], but VFIO only allows to map RAM and cannot take
> such request. One solution is to only create ID mapping for RAM, and
> register a notifier for memory hotplug, like intel-iommu does for IOMMUs
> that don't support HW pass-through.
> 
> Since it's not completely trivial and - I think - not urgent, I'll leave
> this for later.
OK makes sense to me too. It was just a head up.

Thanks

Eric
> 
> Thanks,
> Jean
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 0/7] Add virtio-iommu driver

2018-11-16 Thread Auger Eric
Hi Jean,
On 11/15/18 5:52 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> 
> Changes since v3 [2]:
> * Rebase onto v4.20-rc2. Patch 3 now touches drivers/of/base.c instead
>   of drivers/pci/of.c, since the map_rid() function has moved.
> * Removed the request timeout, that depended on DEBUG.
> * Other small fixes addressing comments on v3.
> 
> You can find Linux driver and kvmtool device on my virtio-iommu/v0.8.1
> branches [3]. You can also test it with the latest version of Eric's
> QEMU device [4].

Tested-by: Eric Auger 

this was tested with qemu
(https://github.com/eauger/qemu/tree/v3.1.0-rc1-virtio-iommu-v0.8.1)

Thanks

Eric
> 
> [1] Virtio-iommu specification v0.8, sources and pdf
> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
> http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
> 
> [2] [PATCH v3 0/7] Add virtio-iommu driver
> https://www.spinics.net/lists/linux-pci/msg77110.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8.1
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8.1
> 
> [4] [RFC v8 00/18] VIRTIO-IOMMU device
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg572637.html
> 
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   of: Allow the iommu-map property to omit untranslated devices
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>  MAINTAINERS   |7 +
>  drivers/iommu/Kconfig |   11 +
>  drivers/iommu/Makefile|1 +
>  drivers/iommu/virtio-iommu.c  | 1160 +
>  drivers/of/base.c |   10 +-
>  drivers/pci/of.c  |7 +
>  include/uapi/linux/virtio_ids.h   |1 +
>  include/uapi/linux/virtio_iommu.h |  161 +++
>  10 files changed, 1451 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v4 6/7] iommu/virtio: Add probe request

2018-11-16 Thread Auger Eric
Hi Jean,

On 11/15/18 5:52 PM, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
Reviewed-by: Eric Auger 

Thanks

Eric
> ---
>  drivers/iommu/virtio-iommu.c  | 156 --
>  include/uapi/linux/virtio_iommu.h |  38 
>  2 files changed, 188 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 2a9cb6285a1e..c547ebd79c43 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -46,6 +46,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -67,8 +68,10 @@ struct viommu_domain {
>  };
>  
>  struct viommu_endpoint {
> + struct device   *dev;
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -119,6 +122,9 @@ static off_t viommu_get_req_offset(struct viommu_dev 
> *viommu,
>  {
>   size_t tail_size = sizeof(struct virtio_iommu_req_tail);
>  
> + if (req->type == VIRTIO_IOMMU_T_PROBE)
> + return len - viommu->probe_size - tail_size;
> +
>   return len - tail_size;
>  }
>  
> @@ -394,6 +400,110 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + size_t size;
> + u64 start64, end64;
> + phys_addr_t start, end;
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + start = start64 = le64_to_cpu(mem->start);
> + end = end64 = le64_to_cpu(mem->end);
> + size = end64 - start64 + 1;
> +
> + /* Catch any overflow, including the unlikely end64 - start64 + 1 = 0 */
> + if (start != start64 || end != end64 || size < end64 - start64)
> + return -EOVERFLOW;
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + default:
> + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
> +  mem->subtype);
> + /* Fall-through */
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + region = iommu_alloc_resv_region(start, size, 0,
> +  IOMMU_RESV_RESERVED);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(start, size, prot,
> +  IOMMU_RESV_MSI);
> + break;
> + }
> + if (!region)
> + return -ENOMEM;
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device 
> *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + size_t probe_len;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + return -EINVAL;
> +
> + probe_len = sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail);
> + probe = kzalloc(probe_len, GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe, probe_len);
> + if (ret)
> + goto out_free;
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> +cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length) + sizeof(*prop);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop, len);
> + break;
> + default:
> +   

Re: [PATCH v4 7/7] iommu/virtio: Add event queue

2018-11-16 Thread Auger Eric
Hi Jean,

On 11/15/18 5:52 PM, Jean-Philippe Brucker wrote:
> The event queue offers a way for the device to report access faults from
> endpoints. It is implemented on virtqueue #1. Whenever the host needs to
> signal a fault, it fills one of the buffers offered by the guest and
> interrupts it.
> 
> Signed-off-by: Jean-Philippe Brucker 

> ---
>  drivers/iommu/virtio-iommu.c  | 116 +++---
>  include/uapi/linux/virtio_iommu.h |  19 +
>  2 files changed, 126 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index c547ebd79c43..81c6b72e9c43 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -29,7 +29,8 @@
>  #define MSI_IOVA_LENGTH  0x10
>  
>  #define VIOMMU_REQUEST_VQ0
> -#define VIOMMU_NR_VQS1
> +#define VIOMMU_EVENT_VQ  1
> +#define VIOMMU_NR_VQS2
>  
>  struct viommu_dev {
>   struct iommu_device iommu;
> @@ -41,6 +42,7 @@ struct viommu_dev {
>   struct virtqueue*vqs[VIOMMU_NR_VQS];
>   spinlock_t  request_lock;
>   struct list_headrequests;
> + void*evts;
>  
>   /* Device configuration */
>   struct iommu_domain_geometrygeometry;
> @@ -82,6 +84,15 @@ struct viommu_request {
>   charbuf[];
>  };
>  
> +#define VIOMMU_FAULT_RESV_MASK   0xff00
> +
> +struct viommu_event {
> + union {
> + u32 head;
> + struct virtio_iommu_fault fault;
> + };
> +};
> +
>  #define to_viommu_domain(domain) \
>   container_of(domain, struct viommu_domain, domain)
>  
> @@ -504,6 +515,69 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   return ret;
>  }
>  
> +static int viommu_fault_handler(struct viommu_dev *viommu,
> + struct virtio_iommu_fault *fault)
> +{
> + char *reason_str;
> +
> + u8 reason   = fault->reason;
> + u32 flags   = le32_to_cpu(fault->flags);
> + u32 endpoint= le32_to_cpu(fault->endpoint);
> + u64 address = le64_to_cpu(fault->address);
> +
> + switch (reason) {
> + case VIRTIO_IOMMU_FAULT_R_DOMAIN:
> + reason_str = "domain";
> + break;
> + case VIRTIO_IOMMU_FAULT_R_MAPPING:
> + reason_str = "page";
> + break;
> + case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
> + default:
> + reason_str = "unknown";
> + break;
> + }
> +
> + /* TODO: find EP by ID and report_iommu_fault */
> + if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
> + dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx 
> [%s%s%s]\n",
> + reason_str, endpoint, address,
> + flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : 
> "",
> + flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : 
> "",
> + flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : 
> "");
> + else
> + dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
> + reason_str, endpoint);
> + return 0;
> +}
> +
> +static void viommu_event_handler(struct virtqueue *vq)
> +{
> + int ret;
> + unsigned int len;
> + struct scatterlist sg[1];
> + struct viommu_event *evt;
> + struct viommu_dev *viommu = vq->vdev->priv;
> +
> + while ((evt = virtqueue_get_buf(vq, &len)) != NULL) {
> + if (len > sizeof(*evt)) {
> + dev_err(viommu->dev,
> + "invalid event buffer (len %u != %zu)\n",
> + len, sizeof(*evt));
> + } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) {
> + viommu_fault_handler(viommu, &evt->fault);
> + }
> +
> + sg_init_one(sg, evt, sizeof(*evt));
> + ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC);
> + if (ret)
> + dev_err(viommu->dev, "could not add event buffer\n");
> + }
> +
> + if (!virtqueue_kick(vq))
> + dev_err(viommu->dev, "kick failed\n");
There are other occurences of virtqueue_kick where you don't check the
returned value
> +}
> +
>  /* IOMMU API */
>  
>  static struct iommu_domain *viommu_domain_alloc(unsigned type)
> @@ -887,16 +961,35 @@ static struct iommu_ops viommu_ops = {
>  static int viommu_init_vqs(struct viommu_dev *viommu)
>  {
>   struct virtio_device *vdev = dev_to_virtio(viommu->dev);
> - const char *name = "request";
> - void *ret;
> + const char *names[] = { "request", "event" };
> + vq_callback_t *callbacks[] = {
> + NULL, /* No async requests */
> + viommu_event_

Re: [PATCH v4 5/7] iommu: Add virtio-iommu driver

2018-11-15 Thread Auger Eric
Hi Jean,

On 11/15/18 5:52 PM, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio transport without emulating page
> tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
> requests.
> 
> The bulk of the code transforms calls coming from the IOMMU API into
> corresponding virtio requests. Mappings are kept in an interval tree
> instead of page tables.

Some few comments/questions below.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  MAINTAINERS   |   7 +
>  drivers/iommu/Kconfig |  11 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu.c  | 918 ++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_iommu.h | 104 
>  6 files changed, 1042 insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 0abecc528dac..0c7bdce57719 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -15948,6 +15948,13 @@ S:   Maintained
>  F:   drivers/virtio/virtio_input.c
>  F:   include/uapi/linux/virtio_input.h
>  
> +VIRTIO IOMMU DRIVER
> +M:   Jean-Philippe Brucker 
> +L:   virtualization@lists.linux-foundation.org
> +S:   Maintained
> +F:   drivers/iommu/virtio-iommu.c
> +F:   include/uapi/linux/virtio_iommu.h
> +
>  VIRTUAL BOX GUEST DEVICE DRIVER
>  M:   Hans de Goede 
>  M:   Arnd Bergmann 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index d9a25715650e..efdeaaeee0e0 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -435,4 +435,15 @@ config QCOM_IOMMU
>   help
> Support for IOMMU on certain Qualcomm SoCs.
>  
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO=y
> + select IOMMU_API
> + select INTERVAL_TREE
> + select ARM_DMA_USE_IOMMU if ARM
> + help
> +   Para-virtualised IOMMU driver with virtio.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index a158a68c8ea8..48d831a39281 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -32,3 +32,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..2a9cb6285a1e
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,918 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2018 Arm Limited
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
> +#define VIOMMU_REQUEST_VQ0
> +#define VIOMMU_NR_VQS1
> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device   *dev;
> + struct virtio_device*vdev;
> +
> + struct ida  domain_ids;
> +
> + struct virtqueue*vqs[VIOMMU_NR_VQS];
> + spinlock_t  request_lock;
> + struct list_headrequests;
> +
> + /* Device configuration */
> + struct iommu_domain_geometrygeometry;
> + u64 pgsize_bitmap;
> + u8  domain_bits;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node   iova;
> + u32 flags;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev   *viommu;
> + struct mutexmutex;
same naming/comment as in smmu driver may help here
struct mutex init_mutex; /* Protects viommu pointer */
> + unsigned intid;
> +
> + spinlock_t  mappings_lock;
> + struct rb_root_cached   mappings;
> +
> + unsigned long   nr_endpoints;
> +};
> +
> +struct viommu_endpoint {
> + struct viommu_dev   *viommu;
> + struct viommu_domain*vdomain;
> +};
> +
> +struct viommu_request {
> + struct list_headlist;
> + void*writeback;
> + unsigned int  

Re: [PATCH v3 6/7] iommu/virtio: Add probe request

2018-11-15 Thread Auger Eric
Hi Jean,
On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 147 --
>  include/uapi/linux/virtio_iommu.h |  39 
>  2 files changed, 180 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 9fb38cd3b727..8eaf66770469 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -56,6 +56,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -77,8 +78,10 @@ struct viommu_domain {
>  };
>  
>  struct viommu_endpoint {
> + struct device   *dev;
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -129,6 +132,9 @@ static off_t viommu_get_req_offset(struct viommu_dev 
> *viommu,
>  {
>   size_t tail_size = sizeof(struct virtio_iommu_req_tail);
>  
> + if (req->type == VIRTIO_IOMMU_T_PROBE)
> + return len - viommu->probe_size - tail_size;
> +
>   return len - tail_size;
>  }
>  
> @@ -414,6 +420,101 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
nit: extra void line
> + u64 start = le64_to_cpu(mem->start);
> + u64 end = le64_to_cpu(mem->end);
> + size_t size = end - start + 1;
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + default:
> + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
> +  mem->subtype);
> + /* Fall-through */
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + region = iommu_alloc_resv_region(start, size, 0,
> +  IOMMU_RESV_RESERVED);
need to test region
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(start, size, prot,
> +  IOMMU_RESV_MSI);
same
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device 
> *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + size_t probe_len;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + return -EINVAL;
> +
> + probe_len = sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail);
> + probe = kzalloc(probe_len, GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe, probe_len);
> + if (ret)
> + goto out_free;
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> +cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length) + sizeof(*prop);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop, len);
> + break;
> + default:
> + dev_err(dev, "unknown viommu prop 0x%x\n", type);
> + }
> +
> + if (ret)
> + dev_err(dev, "failed to parse viommu prop 0x%x\n", 
> type);
> +
> + cur += len;
> + if (cur >= viommu->probe_size)
> + break;
> +
> +  

Re: [PATCH v3 2/7] dt-bindings: virtio: Add virtio-pci-iommu node

2018-11-15 Thread Auger Eric
Hi Jean,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> Some systems implement virtio-iommu as a PCI endpoint. The operating
> systems needs to discover the relationship between IOMMU and masters long
s/systems/system
> before the PCI endpoint gets probed. Add a PCI child node to describe the
> virtio-iommu device.
> 
> The virtio-pci-iommu is conceptually split between a PCI programming
> interface and a translation component on the parent bus. The latter
> doesn't have a node in the device tree. The virtio-pci-iommu node
> describes both, by linking the PCI endpoint to "iommus" property of DMA
> master nodes and to "iommu-map" properties of bus nodes.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  .../devicetree/bindings/virtio/iommu.txt  | 66 +++
>  1 file changed, 66 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
> 
> diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
> b/Documentation/devicetree/bindings/virtio/iommu.txt
> new file mode 100644
> index ..2407fea0651c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/virtio/iommu.txt
> @@ -0,0 +1,66 @@
> +* virtio IOMMU PCI device
> +
> +When virtio-iommu uses the PCI transport, its programming interface is
> +discovered dynamically by the PCI probing infrastructure. However the
> +device tree statically describes the relation between IOMMU and DMA
> +masters. Therefore, the PCI root complex that hosts the virtio-iommu
> +contains a child node representing the IOMMU device explicitly.
> +
> +Required properties:
> +
> +- compatible:Should be "virtio,pci-iommu"
> +- reg:   PCI address of the IOMMU. As defined in the PCI Bus
> + Binding reference [1], the reg property is a five-cell
> + address encoded as (phys.hi phys.mid phys.lo size.hi
> + size.lo). phys.hi should contain the device's BDF as
> + 0b  dfff . The other cells
> + should be zero.
> +- #iommu-cells:  Each platform DMA master managed by the IOMMU is 
> assigned
> + an endpoint ID, described by the "iommus" property [2].
> + For virtio-iommu, #iommu-cells must be 1.
> +
> +Notes:
> +
> +- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
> +  virtio-iommu node doesn't have an "iommus" property, and is omitted from
> +  the iommu-map property of the root complex.
> +
> +Example:
> +
> +pcie@1000 {
> + compatible = "pci-host-ecam-generic";
> + ...
> +
> + /* The IOMMU programming interface uses slot 00:01.0 */
> + iommu0: iommu@0008 {
> + compatible = "virtio,pci-iommu";
> + reg = <0x0800 0 0 0 0>;
> + #iommu-cells = <1>;
> + };
> +
> + /*
> +  * The IOMMU manages all functions in this PCI domain except
> +  * itself. Omit BDF 00:01.0.
> +  */
> + iommu-map = <0x0 &iommu0 0x0 0x8>
> + <0x9 &iommu0 0x9 0xfff7>;
> +};
> +
> +pcie@2000 {
> + compatible = "pci-host-ecam-generic";
> + ...
> + /*
> +  * The IOMMU also manages all functions from this domain,
> +  * with endpoint IDs 0x1 - 0x1
> +  */
> + iommu-map = <0x0 &iommu0 0x1 0x1>;
> +};
> +
> +ethernet@fe001000 {
> + ...
> + /* The IOMMU manages this platform device with endpoint ID 0x2 */
> + iommus = <&iommu0 0x2>;
> +};
> +
> +[1] Documentation/devicetree/bindings/pci/pci.txt
> +[2] Documentation/devicetree/bindings/iommu/iommu.txt
Reviewed-by: Eric Auger 

Thanks

Eric

> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 1/7] dt-bindings: virtio-mmio: Add IOMMU description

2018-11-15 Thread Auger Eric
Hi Jean,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> The nature of a virtio-mmio node is discovered by the virtio driver at
> probe time. However the DMA relation between devices must be described
> statically. When a virtio-mmio node is a virtio-iommu device, it needs an
> "#iommu-cells" property as specified by bindings/iommu/iommu.txt.
> 
> Otherwise, the virtio-mmio device may perform DMA through an IOMMU, which
> requires an "iommus" property. Describe these requirements in the
> device-tree bindings documentation.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  .../devicetree/bindings/virtio/mmio.txt   | 30 +++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt 
> b/Documentation/devicetree/bindings/virtio/mmio.txt
> index 5069c1b8e193..748595473b36 100644
> --- a/Documentation/devicetree/bindings/virtio/mmio.txt
> +++ b/Documentation/devicetree/bindings/virtio/mmio.txt
> @@ -8,10 +8,40 @@ Required properties:
>  - reg:   control registers base address and size including 
> configuration space
>  - interrupts:interrupt generated by the device
>  
> +Required properties for virtio-iommu:
> +
> +- #iommu-cells:  When the node corresponds to a virtio-iommu device, it 
> is
> + linked to DMA masters using the "iommus" or "iommu-map"
> + properties [1][2]. #iommu-cells specifies the size of the
> + "iommus" property. For virtio-iommu #iommu-cells must be
> + 1, each cell describing a single endpoint ID.
> +
> +Optional properties:
> +
> +- iommus:If the device accesses memory through an IOMMU, it should
> + have an "iommus" property [1]. Since virtio-iommu itself
> + does not access memory through an IOMMU, the "virtio,mmio"
> + node cannot have both an "#iommu-cells" and an "iommus"
> + property.
> +
>  Example:
>  
>   virtio_block@3000 {
>   compatible = "virtio,mmio";
>   reg = <0x3000 0x100>;
>   interrupts = <41>;
> +
> + /* Device has endpoint ID 23 */
> + iommus = <&viommu 23>
>   }
> +
> + viommu: virtio_iommu@3100 {
> + compatible = "virtio,mmio";
> + reg = <0x3100 0x100>;
> + interrupts = <42>;
> +
> + #iommu-cells = <1>
> + }
> +
> +[1] Documentation/devicetree/bindings/iommu/iommu.txt
> +[2] Documentation/devicetree/bindings/pci/pci-iommu.txt
> 
Reviewed-by: Eric Auger 

Thanks

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v3 5/7] iommu: Add virtio-iommu driver

2018-11-08 Thread Auger Eric
Hi Jean-Philippe,

On 10/12/18 6:35 PM, Michael S. Tsirkin wrote:
> On Fri, Oct 12, 2018 at 03:59:15PM +0100, Jean-Philippe Brucker wrote:
>> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
>> requests such as map/unmap over virtio transport without emulating page
>> tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
>> requests.
>>
>> The bulk of the code transforms calls coming from the IOMMU API into
>> corresponding virtio requests. Mappings are kept in an interval tree
>> instead of page tables.
>>
>> Signed-off-by: Jean-Philippe Brucker 
>> ---
>>  MAINTAINERS   |   7 +
>>  drivers/iommu/Kconfig |  11 +
>>  drivers/iommu/Makefile|   1 +
>>  drivers/iommu/virtio-iommu.c  | 938 ++
>>  include/uapi/linux/virtio_ids.h   |   1 +
>>  include/uapi/linux/virtio_iommu.h | 101 
>>  6 files changed, 1059 insertions(+)
>>  create mode 100644 drivers/iommu/virtio-iommu.c
>>  create mode 100644 include/uapi/linux/virtio_iommu.h
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 48a65c3a4189..f02fa65f47e2 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -15599,6 +15599,13 @@ S:  Maintained
>>  F:  drivers/virtio/virtio_input.c
>>  F:  include/uapi/linux/virtio_input.h
>>  
>> +VIRTIO IOMMU DRIVER
>> +M:  Jean-Philippe Brucker 
>> +L:  virtualization@lists.linux-foundation.org
>> +S:  Maintained
>> +F:  drivers/iommu/virtio-iommu.c
>> +F:  include/uapi/linux/virtio_iommu.h
>> +
>>  VIRTUAL BOX GUEST DEVICE DRIVER
>>  M:  Hans de Goede 
>>  M:  Arnd Bergmann 
>> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> index c60395b7470f..2dc016dc2b92 100644
>> --- a/drivers/iommu/Kconfig
>> +++ b/drivers/iommu/Kconfig
>> @@ -414,4 +414,15 @@ config QCOM_IOMMU
>>  help
>>Support for IOMMU on certain Qualcomm SoCs.
>>  
>> +config VIRTIO_IOMMU
>> +bool "Virtio IOMMU driver"
>> +depends on VIRTIO=y
>> +select IOMMU_API
>> +select INTERVAL_TREE
>> +select ARM_DMA_USE_IOMMU if ARM
>> +help
>> +  Para-virtualised IOMMU driver with virtio.
>> +
>> +  Say Y here if you intend to run this kernel as a guest.
>> +
>>  endif # IOMMU_SUPPORT
>> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
>> index ab5eba6edf82..4cd643408e49 100644
>> --- a/drivers/iommu/Makefile
>> +++ b/drivers/iommu/Makefile
>> @@ -31,3 +31,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
>> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
>> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
>> new file mode 100644
>> index ..9fb38cd3b727
>> --- /dev/null
>> +++ b/drivers/iommu/virtio-iommu.c
>> @@ -0,0 +1,938 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Virtio driver for the paravirtualized IOMMU
>> + *
>> + * Copyright (C) 2018 Arm Limited
>> + */
>> +
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include 
>> +
>> +#define MSI_IOVA_BASE   0x800
>> +#define MSI_IOVA_LENGTH 0x10
>> +
>> +#define VIOMMU_REQUEST_VQ   0
>> +#define VIOMMU_NR_VQS   1
>> +
>> +/*
>> + * During development, it is convenient to time out rather than wait
>> + * indefinitely in atomic context when a device misbehaves and a request 
>> doesn't
>> + * return. In production however, some requests shouldn't return until they 
>> are
>> + * successful.
>> + */
>> +#ifdef DEBUG
>> +#define VIOMMU_REQUEST_TIMEOUT  1 /* 10s */
>> +#endif
>> +
>> +struct viommu_dev {
>> +struct iommu_device iommu;
>> +struct device   *dev;
>> +struct virtio_device*vdev;
>> +
>> +struct ida  domain_ids;
>> +
>> +struct virtqueue*vqs[VIOMMU_NR_VQS];
>> +spinlock_t  request_lock;
>> +struct list_headrequests;
>> +
>> +/* Device configuration */
>> +struct iommu_domain_geometrygeometry;
>> +u64 pgsize_bitmap;
>> +u8  domain_bits;
>> +};
>> +
>> +struct viommu_mapping {
>> +phys_addr_t paddr;
>> +struct interval_tree_node   iova;
>> +u32 flags;
>> +};
>> +
>> +struct viommu_domain {
>> +struct iommu_domain domain;
>> +struct viommu_dev   *viommu;
>> +struct mutexmutex;
>> +unsigned intid;
>> +
>> +spinlock_t  mappings_lock;
>> +struct rb_root_cac

Re: [PATCH v3 6/7] iommu/virtio: Add probe request

2018-11-08 Thread Auger Eric
Hi Jean-Philippe,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 147 --
>  include/uapi/linux/virtio_iommu.h |  39 
>  2 files changed, 180 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 9fb38cd3b727..8eaf66770469 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -56,6 +56,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -77,8 +78,10 @@ struct viommu_domain {
>  };
>  
>  struct viommu_endpoint {
> + struct device   *dev;
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -129,6 +132,9 @@ static off_t viommu_get_req_offset(struct viommu_dev 
> *viommu,
>  {
>   size_t tail_size = sizeof(struct virtio_iommu_req_tail);
>  
> + if (req->type == VIRTIO_IOMMU_T_PROBE)
> + return len - viommu->probe_size - tail_size;
> +
>   return len - tail_size;
>  }
>  
> @@ -414,6 +420,101 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 start = le64_to_cpu(mem->start);
> + u64 end = le64_to_cpu(mem->end);
> + size_t size = end - start + 1;
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + default:
> + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
> +  mem->subtype);
> + /* Fall-through */
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + region = iommu_alloc_resv_region(start, size, 0,
> +  IOMMU_RESV_RESERVED);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(start, size, prot,
> +  IOMMU_RESV_MSI);
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device 
> *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + size_t probe_len;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + return -EINVAL;
> +
> + probe_len = sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail);
> + probe = kzalloc(probe_len, GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe, probe_len);
> + if (ret)
> + goto out_free;
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> +cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length) + sizeof(*prop);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop, len);
> + break;
> + default:
> + dev_err(dev, "unknown viommu prop 0x%x\n", type);
> + }
> +
> + if (ret)
> + dev_err(dev, "failed to parse viommu prop 0x%x\n", 
> type);
> +
> + cur += len;
> + if (cur >= viommu->probe_size)
> + break;
> +
> + prop = (void *)probe->properties 

Re: [PATCH v3 0/7] Add virtio-iommu driver

2018-10-16 Thread Auger Eric
Hi Jean,

On 10/16/18 8:44 PM, Jean-Philippe Brucker wrote:
> On 16/10/2018 10:25, Auger Eric wrote:
>> Hi Jean,
>>
>> On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
>>> Implement the virtio-iommu driver, following specification v0.8 [1].
>>> Changes since v2 [2]:
>>>
>>> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>>>    would like to phase out the MMIO transport. This produces a complex
>>>    topology where the programming interface of the IOMMU could appear
>>>    lower than the endpoints that it translates. It's not unheard of (e.g.
>>>    AMD IOMMU), and the guest easily copes with this.
>>>    
>>>    The "Firmware description" section of the specification has been
>>>    updated with all combinations of PCI, MMIO and DT, ACPI.
>>
>> I have a question wrt the FW specification. The IOMMU consumes 1 slot in
>> the PCI domain and one needs to leave a RID hole in the iommu-map.  It
>> is not obvious to me that this RID always is predictable given the pcie
>> enumeration mechanism. Generally we have a coarse grain mapping of RID
>> onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
>> to precisely identify the RID granted to the iommu. On QEMU this may
>> depend on the instantiation order of the virtio-pci device right?
> 
> Yes, although it should all happen before you boot the guest, since
> there is no hotplugging an IOMMU. Could you reserve a PCI slot upfront
> and use it for virtio-iommu later? Or generate the iommu-map at the same
> time as generating the child node of the PCI RC?

Even when cold-plugging the PCIe devices through qemu CLI, this depends
on the order of the pcie devices in the list I guess. I need to further
experiment.
> 
>> So
>> this does not look trivial to build this info. Isn't it possible to do
>> this exclusion at kernel level instead?
> 
> So in theory VIRTIO_F_IOMMU_PLATFORM already does that:
> 
> VIRTIO_F_IOMMU_PLATFORM(33)
> This feature indicates that the device is behind an IOMMU that
> translates bus addresses from the device into physical addresses in
> memory. If this feature bit is set to 0, then the device emits
> physical addresses which are not translated further, even though an
> IOMMU may be present.

This tells the driver to use the dma api, right? Effectively this
explicitly says whether the device is supposed to be upfront an IOMMU.
> 
> For better or for worse, the guest has to implement it. If this feature
> bit is unset for virtio-iommu, it does DMA on the physical address
> space, regardless of what the static topology description says.
> 
> In practice it doesn't quite work. If your iommu-map describes the IOMMU
> as translating itself, Linux' OF code will wait for the IOMMU to be
> probed before probing the IOMMU. Working around this with hacks is
> possible, but I don't want to introduce more questionable code to OF and
> device tree bindings if there is any other way.
Hum ok. I cannot really comment on this.

I just wanted to raise this concern about RID identfication.

Thanks

Eric
> 
> Thanks,
> Jean
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH v3 0/7] Add virtio-iommu driver

2018-10-16 Thread Auger Eric
Hi Jean,

On 10/12/18 4:59 PM, Jean-Philippe Brucker wrote:
> Implement the virtio-iommu driver, following specification v0.8 [1].
> Changes since v2 [2]:
> 
> * Patches 2-4 allow virtio-iommu to use the PCI transport, since QEMU
>   would like to phase out the MMIO transport. This produces a complex
>   topology where the programming interface of the IOMMU could appear
>   lower than the endpoints that it translates. It's not unheard of (e.g.
>   AMD IOMMU), and the guest easily copes with this.
>   
>   The "Firmware description" section of the specification has been
>   updated with all combinations of PCI, MMIO and DT, ACPI.

I have a question wrt the FW specification. The IOMMU consumes 1 slot in
the PCI domain and one needs to leave a RID hole in the iommu-map.  It
is not obvious to me that this RID always is predictable given the pcie
enumeration mechanism. Generally we have a coarse grain mapping of RID
onto iommu phandles/STREAMIDs. Here, if I understand correctly we need
to precisely identify the RID granted to the iommu. On QEMU this may
depend on the instantiation order of the virtio-pci device right? So
this does not look trivial to build this info. Isn't it possible to do
this exclusion at kernel level instead?

Thanks

Eric
> 
> * Fix structures layout, they don't need the "packed" attribute anymore.
> 
> * While we're at it, add domain parameter to DETACH request, and leave
>   some padding. This way the next version, that adds PASID support,
>   won't have to introduce a "DETACH2" request to stay backward
>   compatible.
> 
> * Require virtio device 1.0+. Remove legacy transport notes from the
>   specification.
> 
> * Request timeout is now only enabled with DEBUG.
> 
> * The patch for VFIO Kconfig (previously patch 5/5) is in next.
> 
> You can find Linux driver and kvmtool device on branches
> virtio-iommu/v0.8 [3] (currently based on 4.19-rc7 but rebasing onto
> next only produced a trivial conflict). Branch virtio-iommu/devel
> contains a few patches that I'd like to send once the base is upstream:
> 
> * virtio-iommu as a module. It got *much* nicer after Rob's probe
>   deferral rework, but I still have a bug to fix when re-loading the
>   virtio-iommu module.
> 
> * ACPI support requires a minor IORT spec update (reservation of node
>   ID). I think it should be easier to obtain once the device and drivers
>   are upstream.
> 
> [1] Virtio-iommu specification v0.8, diff from v0.7, and sources
> git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.8
> http://jpbrucker.net/virtio-iommu/spec/v0.8/virtio-iommu-v0.8.pdf
> 
> http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-pdf-diff-v0.7-v0.8.pdf
> 
> [2] [PATCH v2 0/5] Add virtio-iommu driver
> https://www.spinics.net/lists/kvm/msg170655.html
> 
> [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.8
> git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.8
> 
> Jean-Philippe Brucker (7):
>   dt-bindings: virtio-mmio: Add IOMMU description
>   dt-bindings: virtio: Add virtio-pci-iommu node
>   PCI: OF: allow endpoints to bypass the iommu
>   PCI: OF: Initialize dev->fwnode appropriately
>   iommu: Add virtio-iommu driver
>   iommu/virtio: Add probe request
>   iommu/virtio: Add event queue
> 
>  .../devicetree/bindings/virtio/iommu.txt  |   66 +
>  .../devicetree/bindings/virtio/mmio.txt   |   30 +
>  MAINTAINERS   |7 +
>  drivers/iommu/Kconfig |   11 +
>  drivers/iommu/Makefile|1 +
>  drivers/iommu/virtio-iommu.c  | 1171 +
>  drivers/pci/of.c  |   14 +-
>  include/uapi/linux/virtio_ids.h   |1 +
>  include/uapi/linux/virtio_iommu.h |  159 +++
>  9 files changed, 1457 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH] vhost: Fix vhost_copy_to_user()

2018-04-11 Thread Auger Eric
Hi Jason,

On 11/04/18 15:44, Jason Wang wrote:
> 
> 
> On 2018年04月11日 21:30, Eric Auger wrote:
>> vhost_copy_to_user is used to copy vring used elements to userspace.
>> We should use VHOST_ADDR_USED instead of VHOST_ADDR_DESC.
>>
>> Fixes: f88949138058 ("vhost: introduce O(1) vq metadata cache")
>> Signed-off-by: Eric Auger 
>>
>> ---
>>
>> This fixes a stall observed when running an aarch64 guest with
>> virtual smmu
>> ---
>>   drivers/vhost/vhost.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
>> index bec722e..f44aead 100644
>> --- a/drivers/vhost/vhost.c
>> +++ b/drivers/vhost/vhost.c
>> @@ -744,7 +744,7 @@ static int vhost_copy_to_user(struct
>> vhost_virtqueue *vq, void __user *to,
>>   struct iov_iter t;
>>   void __user *uaddr = vhost_vq_meta_fetch(vq,
>>(u64)(uintptr_t)to, size,
>> - VHOST_ADDR_DESC);
>> + VHOST_ADDR_USED);
>> if (uaddr)
>>   return __copy_to_user(uaddr, from, size);
> 
> Acked-by: Jason Wang 
> 
> Thanks!
> 
> Stable material I think.

yes I think so.

Thanks

Eric
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH v2 2/5] iommu/virtio-iommu: Add probe request

2018-01-19 Thread Auger Eric
Hi Jean-Philippe,

On 19/01/18 17:21, Jean-Philippe Brucker wrote:
> On 16/01/18 23:26, Auger Eric wrote:
> [...]
>>> +   switch (mem->subtype) {
>>> +   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
>>> +   region = iommu_alloc_resv_region(addr, size, prot,
>>> +IOMMU_RESV_MSI);
>> if (!region)
>>  return -ENOMEM;
>>> +   break;
>>> +   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
>>> +   default:
>>> +   region = iommu_alloc_resv_region(addr, size, 0,
>>> +IOMMU_RESV_RESERVED);
>> same.
> 
> I'll add them, thanks
> 
>> There is another issue related to the exclusion of iovas belonging to
>> reserved regions. Typically on x86, when attempting to run
>> virtio-blk-pci with iommu I eventually saw the driver using iova
>> belonging to the IOAPIC regions to map phys addr and this stalled qemu
>> with a drown trace:
>>
>> "virtio: bogus descriptor or out of resources"
>>
>> those regions need to be excluded from the iova allocator. This was
>> resolved by adding
>> if (iommu_dma_init_domain(domain,
>>vdev->viommu->geometry.aperture_start,
>>vdev->viommu->geometry.aperture_end,
>>dev))
>> in viommu_attach_dev()
> 
> The most recent hack for x86 [1] does call iommu_dma_init_domain() in
> attach_dev(). Is it buggy?

Oh OK. Actually I have never used that branch and in my version the last
arg of iommu_dma_init_domain was NULL hence the issue. But I was
thinking more generally that care must be taken to exclude all those
regions.

Thanks

Eric
> 
> We probably shouldn't call iommu_dma_init_domain() unconditionally
> (outside of CONFIG_X86 that is), since it's normally done by the arch
> (arch/arm64/mm/dma-mapping.c)
> 
> Thanks,
> Jean
> 
> [1]
> http://www.linux-arm.org/git?p=linux-jpb.git;a=commitdiff;h=e910e224b58712151dda06df595a53ff07edef63
> on branch virtio-iommu/v0.5-x86
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC PATCH v2 2/5] iommu/virtio-iommu: Add probe request

2018-01-16 Thread Auger Eric
Hi Jean-Philippe,

On 17/11/17 19:52, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 165 
> --
>  include/uapi/linux/virtio_iommu.h |  37 +
>  2 files changed, 195 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index feb8c8925c3a..79e0add94e05 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -45,6 +45,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -72,6 +73,7 @@ struct viommu_domain {
>  struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -139,6 +141,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu,
>   case VIRTIO_IOMMU_T_UNMAP:
>   size = sizeof(r->unmap);
>   break;
> + case VIRTIO_IOMMU_T_PROBE:
> + *bottom += viommu->probe_size;
> + size = sizeof(r->probe) + *bottom;
> + break;
>   default:
>   return -EINVAL;
>   }
> @@ -448,6 +454,106 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 addr = le64_to_cpu(mem->addr);
> + u64 size = le64_to_cpu(mem->size);
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(addr, size, prot,
> +  IOMMU_RESV_MSI);
if (!region)
return -ENOMEM;
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + default:
> + region = iommu_alloc_resv_region(addr, size, 0,
> +  IOMMU_RESV_RESERVED);
same.

There is another issue related to the exclusion of iovas belonging to
reserved regions. Typically on x86, when attempting to run
virtio-blk-pci with iommu I eventually saw the driver using iova
belonging to the IOAPIC regions to map phys addr and this stalled qemu
with a drown trace:

"virtio: bogus descriptor or out of resources"

those regions need to be excluded from the iova allocator. This was
resolved by adding
if (iommu_dma_init_domain(domain,
  vdev->viommu->geometry.aperture_start,
  vdev->viommu->geometry.aperture_end,
  dev))
in viommu_attach_dev()

Thanks

Eric
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> +
> + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
> + /* Please update your driver. */
> + pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device 
> *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + /* Trouble ahead. */
> + return -EINVAL;
> +
> + probe = kzalloc(sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail), GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe);
> + if (ret) {
> + kfree(probe);
> + return ret;
> + }
> +
> + prop

Re: [RFC PATCH v2 3/5] iommu/virtio-iommu: Add event queue

2018-01-16 Thread Auger Eric
Hi,

On 17/11/17 19:52, Jean-Philippe Brucker wrote:
> The event queue offers a way for the device to report access faults from
> devices.
end points?
 It is implemented on virtqueue #1, whenever the host needs to
> signal a fault it fills one of the buffers offered by the guest and
> interrupts it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 138 
> ++
>  include/uapi/linux/virtio_iommu.h |  18 +
>  2 files changed, 142 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index 79e0add94e05..fe0d449bf489 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -30,6 +30,12 @@
>  #define MSI_IOVA_BASE0x800
>  #define MSI_IOVA_LENGTH  0x10
>  
> +enum viommu_vq_idx {
> + VIOMMU_REQUEST_VQ   = 0,
> + VIOMMU_EVENT_VQ = 1,
> + VIOMMU_NUM_VQS  = 2,
> +};
> +
>  struct viommu_dev {
>   struct iommu_device iommu;
>   struct device   *dev;
> @@ -37,7 +43,7 @@ struct viommu_dev {
>  
>   struct ida  domain_ids;
>  
> - struct virtqueue*vq;
> + struct virtqueue*vqs[VIOMMU_NUM_VQS];
>   /* Serialize anything touching the request queue */
>   spinlock_t  request_lock;
>  
> @@ -84,6 +90,15 @@ struct viommu_request {
>   struct list_headlist;
>  };
>  
> +#define VIOMMU_FAULT_RESV_MASK   0xff00
> +
> +struct viommu_event {
> + union {
> + u32 head;
> + struct virtio_iommu_fault fault;
> + };
> +};
> +
>  #define to_viommu_domain(domain) container_of(domain, struct viommu_domain, 
> domain)
>  
>  /* Virtio transport */
> @@ -160,12 +175,13 @@ static int viommu_receive_resp(struct viommu_dev 
> *viommu, int nr_sent,
>   unsigned int len;
>   int nr_received = 0;
>   struct viommu_request *req, *pending;
> + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
>  
>   pending = list_first_entry_or_null(sent, struct viommu_request, list);
>   if (WARN_ON(!pending))
>   return 0;
>  
> - while ((req = virtqueue_get_buf(viommu->vq, &len)) != NULL) {
> + while ((req = virtqueue_get_buf(vq, &len)) != NULL) {
>   if (req != pending) {
>   dev_warn(viommu->dev, "discarding stale request\n");
>   continue;
> @@ -202,6 +218,7 @@ static int _viommu_send_reqs_sync(struct viommu_dev 
> *viommu,
>* dies.
>*/
>   unsigned long timeout_ms = 1000;
> + struct virtqueue *vq = viommu->vqs[VIOMMU_REQUEST_VQ];
>  
>   *nr_sent = 0;
>  
> @@ -211,15 +228,14 @@ static int _viommu_send_reqs_sync(struct viommu_dev 
> *viommu,
>   sg[0] = &req->top;
>   sg[1] = &req->bottom;
>  
> - ret = virtqueue_add_sgs(viommu->vq, sg, 1, 1, req,
> - GFP_ATOMIC);
> + ret = virtqueue_add_sgs(vq, sg, 1, 1, req, GFP_ATOMIC);
>   if (ret)
>   break;
>  
>   list_add_tail(&req->list, &pending);
>   }
>  
> - if (i && !virtqueue_kick(viommu->vq))
> + if (i && !virtqueue_kick(vq))
>   return -EPIPE;
>  
>   timeout = ktime_add_ms(ktime_get(), timeout_ms * i);
> @@ -554,6 +570,70 @@ static int viommu_probe_endpoint(struct viommu_dev 
> *viommu, struct device *dev)
>   return 0;
>  }
>  
> +static int viommu_fault_handler(struct viommu_dev *viommu,
> + struct virtio_iommu_fault *fault)
> +{
> + char *reason_str;
> +
> + u8 reason   = fault->reason;
> + u32 flags   = le32_to_cpu(fault->flags);
> + u32 endpoint= le32_to_cpu(fault->endpoint);
> + u64 address = le64_to_cpu(fault->address);
> +
> + switch (reason) {
> + case VIRTIO_IOMMU_FAULT_R_DOMAIN:
> + reason_str = "domain";
> + break;
> + case VIRTIO_IOMMU_FAULT_R_MAPPING:
> + reason_str = "page";
> + break;
> + case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
> + default:
> + reason_str = "unknown";
> + break;
> + }
> +
> + /* TODO: find EP by ID and report_iommu_fault */
> + if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
> + dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx 
> [%s%s%s]\n",
> + reason_str, endpoint, address,
> + flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : 
> "",
> + flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : 
> "",
> + flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : 
> "");
> + else
> + dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
> +

Re: [RFC PATCH v2 2/5] iommu/virtio-iommu: Add probe request

2018-01-16 Thread Auger Eric
Hi Jean-Philippe,

On 17/11/17 19:52, Jean-Philippe Brucker wrote:
> When the device offers the probe feature, send a probe request for each
> device managed by the IOMMU. Extract RESV_MEM information. When we
> encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
> This will tell other subsystems that there is no need to map the MSI
> doorbell in the virtio-iommu, because MSIs bypass it.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/virtio-iommu.c  | 165 
> --
>  include/uapi/linux/virtio_iommu.h |  37 +
>  2 files changed, 195 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> index feb8c8925c3a..79e0add94e05 100644
> --- a/drivers/iommu/virtio-iommu.c
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -45,6 +45,7 @@ struct viommu_dev {
>   struct iommu_domain_geometrygeometry;
>   u64 pgsize_bitmap;
>   u8  domain_bits;
> + u32 probe_size;
>  };
>  
>  struct viommu_mapping {
> @@ -72,6 +73,7 @@ struct viommu_domain {
>  struct viommu_endpoint {
>   struct viommu_dev   *viommu;
>   struct viommu_domain*vdomain;
> + struct list_headresv_regions;
>  };
>  
>  struct viommu_request {
> @@ -139,6 +141,10 @@ static int viommu_get_req_size(struct viommu_dev *viommu,
>   case VIRTIO_IOMMU_T_UNMAP:
>   size = sizeof(r->unmap);
>   break;
> + case VIRTIO_IOMMU_T_PROBE:
> + *bottom += viommu->probe_size;
> + size = sizeof(r->probe) + *bottom;
> + break;
>   default:
>   return -EINVAL;
>   }
> @@ -448,6 +454,106 @@ static int viommu_replay_mappings(struct viommu_domain 
> *vdomain)
>   return ret;
>  }
>  
> +static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
> +struct virtio_iommu_probe_resv_mem *mem,
> +size_t len)
> +{
> + struct iommu_resv_region *region = NULL;
> + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
> +
> + u64 addr = le64_to_cpu(mem->addr);
> + u64 size = le64_to_cpu(mem->size);
> +
> + if (len < sizeof(*mem))
> + return -EINVAL;
> +
> + switch (mem->subtype) {
> + case VIRTIO_IOMMU_RESV_MEM_T_MSI:
> + region = iommu_alloc_resv_region(addr, size, prot,
> +  IOMMU_RESV_MSI);
> + break;
> + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
> + default:
> + region = iommu_alloc_resv_region(addr, size, 0,
> +  IOMMU_RESV_RESERVED);
> + break;
> + }
> +
> + list_add(&vdev->resv_regions, ®ion->list);
> +
> + if (mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_RESERVED &&
> + mem->subtype != VIRTIO_IOMMU_RESV_MEM_T_MSI) {
> + /* Please update your driver. */
> + pr_warn("unknown resv mem subtype 0x%x\n", mem->subtype);
> + return -EINVAL;
> + }
why not adding this in the switch default case and do not call list_add
in case the subtype region is not recognized?
> +
> + return 0;
> +}
> +
> +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device 
> *dev)
> +{
> + int ret;
> + u16 type, len;
> + size_t cur = 0;
> + struct virtio_iommu_req_probe *probe;
> + struct virtio_iommu_probe_property *prop;
> + struct iommu_fwspec *fwspec = dev->iommu_fwspec;
> + struct viommu_endpoint *vdev = fwspec->iommu_priv;
> +
> + if (!fwspec->num_ids)
> + /* Trouble ahead. */
> + return -EINVAL;
> +
> + probe = kzalloc(sizeof(*probe) + viommu->probe_size +
> + sizeof(struct virtio_iommu_req_tail), GFP_KERNEL);
> + if (!probe)
> + return -ENOMEM;
> +
> + probe->head.type = VIRTIO_IOMMU_T_PROBE;
> + /*
> +  * For now, assume that properties of an endpoint that outputs multiple
> +  * IDs are consistent. Only probe the first one.
> +  */
> + probe->endpoint = cpu_to_le32(fwspec->ids[0]);
> +
> + ret = viommu_send_req_sync(viommu, probe);
> + if (ret) {
goto out?
> + kfree(probe);
> + return ret;
> + }
> +
> + prop = (void *)probe->properties;
> + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
> +
> + while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
> +cur < viommu->probe_size) {
> + len = le16_to_cpu(prop->length);
> +
> + switch (type) {
> + case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
> + ret = viommu_add_resv_mem(vdev, (void *)prop->value, 
> len);
> + break;
> + default:
> + dev_dbg(dev, "unknown viommu prop 0x%x\n", type);
> + }

Re: [RFC PATCH v2 1/5] iommu: Add virtio-iommu driver

2018-01-15 Thread Auger Eric
Hi Jean-Philippe,

please find some comments below.

On 17/11/17 19:52, Jean-Philippe Brucker wrote:
> The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
> requests such as map/unmap over virtio-mmio transport without emulating
> page tables. This implementation handle ATTACH, DETACH, MAP and UNMAP
handles
> requests.
> 
> The bulk of the code is to create requests and send them through virtio.
> Implementing the IOMMU API is fairly straightforward since the
> virtio-iommu MAP/UNMAP interface is almost identical.
> 
> Signed-off-by: Jean-Philippe Brucker 
> ---
>  drivers/iommu/Kconfig |  11 +
>  drivers/iommu/Makefile|   1 +
>  drivers/iommu/virtio-iommu.c  | 958 
> ++
>  include/uapi/linux/virtio_ids.h   |   1 +
>  include/uapi/linux/virtio_iommu.h | 140 ++
>  5 files changed,  insertions(+)
>  create mode 100644 drivers/iommu/virtio-iommu.c
>  create mode 100644 include/uapi/linux/virtio_iommu.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 17b212f56e6a..7271e59e8b23 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -403,4 +403,15 @@ config QCOM_IOMMU
>   help
> Support for IOMMU on certain Qualcomm SoCs.
>  
> +config VIRTIO_IOMMU
> + bool "Virtio IOMMU driver"
> + depends on VIRTIO_MMIO
> + select IOMMU_API
> + select INTERVAL_TREE
> + select ARM_DMA_USE_IOMMU if ARM
> + help
> +   Para-virtualised IOMMU driver with virtio.
> +
> +   Say Y here if you intend to run this kernel as a guest.
> +
>  endif # IOMMU_SUPPORT
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index dca71fe1c885..432242f3a328 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -31,3 +31,4 @@ obj-$(CONFIG_EXYNOS_IOMMU) += exynos-iommu.o
>  obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
>  obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
>  obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
> +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
> diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
> new file mode 100644
> index ..feb8c8925c3a
> --- /dev/null
> +++ b/drivers/iommu/virtio-iommu.c
> @@ -0,0 +1,958 @@
> +/*
> + * Virtio driver for the paravirtualized IOMMU
> + *
> + * Copyright (C) 2017 ARM Limited
> + * Author: Jean-Philippe Brucker 
> + *
> + * SPDX-License-Identifier: GPL-2.0
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include 
> +
> +#define MSI_IOVA_BASE0x800
> +#define MSI_IOVA_LENGTH  0x10
> +
> +struct viommu_dev {
> + struct iommu_device iommu;
> + struct device   *dev;
> + struct virtio_device*vdev;
> +
> + struct ida  domain_ids;
> +
> + struct virtqueue*vq;
> + /* Serialize anything touching the request queue */
> + spinlock_t  request_lock;
> +
> + /* Device configuration */
> + struct iommu_domain_geometrygeometry;
> + u64 pgsize_bitmap;
> + u8  domain_bits;
> +};
> +
> +struct viommu_mapping {
> + phys_addr_t paddr;
> + struct interval_tree_node   iova;
> + union {
> + struct virtio_iommu_req_map map;
> + struct virtio_iommu_req_unmap unmap;
> + } req;
> +};
> +
> +struct viommu_domain {
> + struct iommu_domain domain;
> + struct viommu_dev   *viommu;
> + struct mutexmutex;
> + unsigned intid;
> +
> + spinlock_t  mappings_lock;
> + struct rb_root_cached   mappings;
> +
> + /* Number of endpoints attached to this domain */
> + refcount_t  endpoints;
> +};
> +
> +struct viommu_endpoint {
> + struct viommu_dev   *viommu;
> + struct viommu_domain*vdomain;
> +};
> +
> +struct viommu_request {
> + struct scatterlist  top;
> + struct scatterlist  bottom;
> +
> + int written;
> + struct list_headlist;
> +};
> +
> +#define to_viommu_domain(domain) container_of(domain, struct viommu_domain, 
> domain)
> +
> +/* Virtio transport */
> +
> +static int viommu_status_to_errno(u8 status)
> +{
> + switch (status) {
> + case VIRTIO_IOMMU_S_OK:
> + return 0;
> + case VIRTIO_IOMMU_S_UNSUPP:
> + return -ENOSYS;
> + case VIRTIO_IOMMU_S_INVAL:
> + return -EINVAL;
> + case VIRTIO_IOMMU_S_RANGE:
> + return -ERANGE;
> + case VIRTIO_IOMMU_S_NOENT

Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-10-26 Thread Auger Eric
Hi Jean,

On 04/08/2017 20:19, Jean-Philippe Brucker wrote:
> This is the continuation of my proposal for virtio-iommu, the para-
> virtualized IOMMU. Here is a summary of the changes since last time [1]:
> 
> * The virtio-iommu document now resembles an actual specification. It is
>   split into a formal description of the virtio device, and implementation
>   notes. Please find sources and binaries at [2].
> 
> * Added a probe request to describe to the guest different properties that
>   do not fit in firmware or in the virtio config space. This is a
>   necessary stepping stone for extending the virtio-iommu.
> 
> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>   Bhushan.
> 
> You can find the Linux driver and kvmtool device at [4] and [5]. I
> plan to rework driver and kvmtool device slightly before sending the
> patches.
> 
> To understand the virtio-iommu, I advise to first read introduction and
> motivation, then skim through implementation notes and finally look at the
> device specification.
> 
> I wasn't sure how to organize the review. For those who prefer to comment
> inline, I attached v0.4 of device-operations.tex and topology.tex+MSI.tex
> to this thread. They are the biggest chunks of the document. But LaTeX
> isn't very pleasant to read, so you can simply send a list of comments in
> relation to section numbers and a few words of context, we'll manage.
> 
> ---
> Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to compare
> more easily differences since the RFC (see [6]), but haven't been made
> public so far. This is the first public posting since initial proposal
> [1], and the following describes all changes.
> 
> ## v0.1 ##
> 
> Content is the same as the RFC, but formatted to LaTeX. 'make' generates
> one PDF and one HTML document.
> 
> ## v0.2 ##
> 
> Add introductions, improve topology example and firmware description based
> on feedback and a number of useful discussions.
> 
> ## v0.3 ##
> 
> Add normative sections (MUST, SHOULD, etc). Clarify some things, tighten
> the device and driver behaviour. Unmap semantics are consolidated; they
> are now closer to VFIO Type1 v2 semantics.
> 
> ## v0.4 ##
> 
> Introduce PROBE requests. They provide per-endpoint information to the
> driver that couldn't be described otherwise.
> 
> For the moment, they allow to handle MSIs on x86 virtual platforms (see
> 3.2). To do that we communicate reserved IOVA regions, that will also be
> useful for describing regions that cannot be mapped for a given endpoint,
> for instance addresses that correspond to a PCI bridge window.
> 
> Introducing such a large framework for this tiny feature may seem
> overkill, but it is needed for future extensions of the virtio-iommu and I
> believe it really is worth the effort.
> 
> ## Future ##
> 
> Other extensions are in preparation. I won't detail them here because v0.4
> already is a lot to digest, but in short, building on top of PROBE:
> 
> * First, since the IOMMU is paravirtualized, the device can expose some
>   properties of the physical topology to the guest, and let it allocate
>   resources more efficiently. For example, when the virtio-iommu manages
>   both physical and emulated endpoints, with different underlying IOMMUs,
>   we now have a way to describe multiple page and block granularities,
>   instead of forcing the guest to use the most restricted one for all
>   endpoints. This will most likely be in v0.5.
> 
> * Then on top of that, a major improvement will describe hardware
>   acceleration features available to the guest. There is what I call "Page
>   Table Handover" (or simply, from the host POV, "Nested"), the ability
>   for the guest to manipulate its own page tables instead of sending
>   MAP/UNMAP requests to the host. This, along with IO Page Fault
>   reporting, will also permit SVM virtualization on different platforms.
> 
> Thanks,
> Jean
> 
> [1] http://www.spinics.net/lists/kvm/msg147990.html
> [2] git://linux-arm.org/virtio-iommu.git branch viommu/v0.4
> 
> http://www.linux-arm.org/git?p=virtio-iommu.git;a=blob;f=dist/v0.4/virtio-iommu-v0.4.pdf
> I reiterate the disclaimers: don't use this document as a reference,
> it's a draft. It's also not an OASIS document yet. It may be riddled
> with mistakes. As this is a working draft, it is unstable and I do not
> guarantee backward compatibility of future versions.
> [3] https://lists.gnu.org/archive/html/qemu-arm/2017-08/msg4.html
> [4] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.4
> Warning: UAPI headers have changed! They didn't follow the spec,
> please update. (Use branch v0.1, that has the old headers, for the
> Qemu prototype [3])
When rebasing the v0.4 driver on master I observe a regression: commands
are not received properly by QEMU (typically an attach command is
received with a type of 0). After a bisection of the guest kernel the
first commit the problem appears is:

commit e3067861ba6650a566a

Re: [RFC] virtio-iommu version 0.4

2017-10-26 Thread Auger Eric
Hi Jean,
On 19/09/2017 12:47, Jean-Philippe Brucker wrote:
> Hi Eric,
> 
> On 12/09/17 18:13, Auger Eric wrote:
>> 2.6.7
>> - As I am currently integrating v0.4 in QEMU here are some other comments:
>> At the moment struct virtio_iommu_req_probe flags is missing in your
>> header. As such I understood the ACK protocol was not implemented by the
>> driver in your branch.
> 
> Uh indeed. And yet I could swear I've written that code... somewhere. I
> will add it to the batch of v0.5 changes, it shouldn't be too invasive.
> 
>> - VIRTIO_IOMMU_PROBE_PROPERTY_TYPE_MASK is VIRTIO_IOMMU_T_MASK in your
>> header too.
> 
> Yes, keeping VIRTIO_IOMMU_PROBE_PROPERTY_TYPE_MASK is probably best
> (though it is a mouthful).
> 
>> 2.6.8.2:
>> - I am really confused about what the device should report as resv
>> regions depending on the PE nature (VFIO or not VFIO)
>>
>> In other iommu drivers, the resv regions are populated by the iommu
>> driver through its get_resv_regions callback. They are usually composed
>> of an iommu specific MSI region (mapped or bypassed) and non IOMMU
>> specific (device specific) reserved regions:
>> iommu_dma_get_resv_regions(). In the case of virtio-iommu driver, those
>> are the guest reserved regions.
>>
>> First in the current virtio-iommu driver I don't see the
>> iommu_dma_get_resv_regions call. Do you agree that the virtio-iommu
>> driver should compute the non IOMMU specific MSI regions ie. this is
>> not the responsability of the virtio-iommu device.
> 
> For SW_MSI, certainly. The driver allocates a fixed IOVA region for
> mapping the MSI doorbell. But the driver has to know whether the doorbell
> region is translated or bypassed.
Sorry I was talking about *non* IOMMU specific MSI regions, typically
the regions corresponding to guest PCI host bridge windows. This is
normally computed in the iommu driver and I didn't that that in the
existing virtio-iommu driver.
> 
>> Then why is it more the job of the device to return the guest iommu
>> specific region rather than the driver itself?
> 
> The MSI region is architectural on x86 IOMMUs, but
> implementation-defined on virtio-iommu. It depends which platform the host
> is emulating. In Linux, x86 IOMMU drivers register the bypass region
> because there always is an IOAPIC on the other end, with a fixed MSI
> address. But virtio-iommu may be either behind a GIC, an APIC or some
> other IRQ chip.
> 
> The driver *could* go over all the irqchips/platforms it knows and try to
> guess if there is a fixed doorbell or if it needs to reserve an IOVA for
> them, but it would look horrible. I much prefer having a well-defined way
> of doing this, so a description from the device.

This means I must have target specific code in the virtio-iommu device
which is unusual, right? I was initially thinking you could handle that
on the driver side using a config set for ARM|ARM64. But on the other
hand you should communicate the info to the device ...

> 
>> Then I understand this is the responsability of the virtio-iommu device
>> to gather information about the host resv regions in case of VFIO EP.
>> Typically the host PCIe host bridge windows cannot be used for IOVA.
>> Also the host MSI reserved IOVA window cannot be used. Do you agree.
> 
> Yes, all regions reported in sysfs reserved_regions in the host would be
> reported as RESV_T_RESERVED by virtio-iommu.
So to summarize if the probe request is sent to an emulated device, we
should return the target specific MSI window. We can't and don't return
the non IOMMU specific guest reserved windows.

For a VFIO device, we would return all reserved regions of the group the
device belongs to. Is that correct?
> 
>> I really think the spec should clarify what exact resv regions the
>> device should return in case of VFIO device and non VFIO device.
> 
> Agreed. I will add something about RESV_T_RESERVED with the PCI bridge
> example in Implementation Notes. Do you think the MSI examples at the end
> need improvement as well? I can try to explain that RESV_MSI regions in
> virtio-iommu are only those of the emulated platform, not the HW or SW MSI
> regions from the host.

I think I would expect an explanation detailing returned reserved
regions for pure emulated devices and HW/VFIO devices.

Another unrelated remark:
- you should add a permission violation error.
- wrt the probe request ACK protocol, this looks pretty heavy as both
the driver and the device need to parse the req_probe buffer. The device
need to fill in the output buffer and then read the same info on the
input buffer. Couldn't we imagine something simpler?

Thanks

Eric
> 
> Thanks,
> Jean
> 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [RFC] virtio-iommu version 0.4

2017-10-26 Thread Auger Eric
Hi jean,

On 04/08/2017 20:19, Jean-Philippe Brucker wrote:
> This is the continuation of my proposal for virtio-iommu, the para-
> virtualized IOMMU. Here is a summary of the changes since last time [1]:
> 
> * The virtio-iommu document now resembles an actual specification. It is
>   split into a formal description of the virtio device, and implementation
>   notes. Please find sources and binaries at [2].
> 
> * Added a probe request to describe to the guest different properties that
>   do not fit in firmware or in the virtio config space. This is a
>   necessary stepping stone for extending the virtio-iommu.
> 
> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>   Bhushan.
> 
> You can find the Linux driver and kvmtool device at [4] and [5]. I
> plan to rework driver and kvmtool device slightly before sending the
> patches.
> 
> To understand the virtio-iommu, I advise to first read introduction and
> motivation, then skim through implementation notes and finally look at the
> device specification.
> 
> I wasn't sure how to organize the review. For those who prefer to comment
> inline, I attached v0.4 of device-operations.tex and topology.tex+MSI.tex
> to this thread. They are the biggest chunks of the document. But LaTeX
> isn't very pleasant to read, so you can simply send a list of comments in
> relation to section numbers and a few words of context, we'll manage.
> 
> ---
> Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to compare
> more easily differences since the RFC (see [6]), but haven't been made
> public so far. This is the first public posting since initial proposal
> [1], and the following describes all changes.
> 
> ## v0.1 ##
> 
> Content is the same as the RFC, but formatted to LaTeX. 'make' generates
> one PDF and one HTML document.
> 
> ## v0.2 ##
> 
> Add introductions, improve topology example and firmware description based
> on feedback and a number of useful discussions.
> 
> ## v0.3 ##
> 
> Add normative sections (MUST, SHOULD, etc). Clarify some things, tighten
> the device and driver behaviour. Unmap semantics are consolidated; they
> are now closer to VFIO Type1 v2 semantics.
> 
> ## v0.4 ##
> 
> Introduce PROBE requests. They provide per-endpoint information to the
> driver that couldn't be described otherwise.
> 
> For the moment, they allow to handle MSIs on x86 virtual platforms (see
> 3.2). To do that we communicate reserved IOVA regions, that will also be
> useful for describing regions that cannot be mapped for a given endpoint,
> for instance addresses that correspond to a PCI bridge window.
> 
> Introducing such a large framework for this tiny feature may seem
> overkill, but it is needed for future extensions of the virtio-iommu and I
> believe it really is worth the effort.

2.6.7
- As I am currently integrating v0.4 in QEMU here are some other comments:
At the moment struct virtio_iommu_req_probe flags is missing in your
header. As such I understood the ACK protocol was not implemented by the
driver in your branch.
- VIRTIO_IOMMU_PROBE_PROPERTY_TYPE_MASK is VIRTIO_IOMMU_T_MASK in your
header too.
2.6.8.2:
- I am really confused about what the device should report as resv
regions depending on the PE nature (VFIO or not VFIO)

In other iommu drivers, the resv regions are populated by the iommu
driver through its get_resv_regions callback. They are usually composed
of an iommu specific MSI region (mapped or bypassed) and non IOMMU
specific (device specific) reserved regions:
iommu_dma_get_resv_regions(). In the case of virtio-iommu driver, those
are the guest reserved regions.

First in the current virtio-iommu driver I don't see the
iommu_dma_get_resv_regions call. Do you agree that the virtio-iommu
driver should compute the non IOMMU specific MSI regions. ie. this is
not the responsability of the virtio-iommu device.

Then why is it more the job of the device to return the guest iommu
specific region rather than the driver itself?

Then I understand this is the responsability of the virtio-iommu device
to gather information about the host resv regions in case of VFIO EP.
Typically the host PCIe host bridge windows cannot be used for IOVA.
Also the host MSI reserved IOVA window cannot be used. Do you agree.

I really think the spec should clarify what exact resv regions the
device should return in case of VFIO device and non VFIO device.

Thanks

Eric

> 
> ## Future ##
> 
> Other extensions are in preparation. I won't detail them here because v0.4
> already is a lot to digest, but in short, building on top of PROBE:
> 
> * First, since the IOMMU is paravirtualized, the device can expose some
>   properties of the physical topology to the guest, and let it allocate
>   resources more efficiently. For example, when the virtio-iommu manages
>   both physical and emulated endpoints, with different underlying IOMMUs,
>   we now have a way to describe multiple page and block granularities,
>   instead of forcing the guest to use the

Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-10-26 Thread Auger Eric
Hi Jean-Philippe,

On 04/08/2017 20:19, Jean-Philippe Brucker wrote:
> This is the continuation of my proposal for virtio-iommu, the para-
> virtualized IOMMU. Here is a summary of the changes since last time [1]:
> 
> * The virtio-iommu document now resembles an actual specification. It is
>   split into a formal description of the virtio device, and implementation
>   notes. Please find sources and binaries at [2].
> 
> * Added a probe request to describe to the guest different properties that
>   do not fit in firmware or in the virtio config space. This is a
>   necessary stepping stone for extending the virtio-iommu.
> 
> * There is a working Qemu prototype [3], thanks to Eric Auger and Bharat
>   Bhushan.
> 
> You can find the Linux driver and kvmtool device at [4] and [5]. I
> plan to rework driver and kvmtool device slightly before sending the
> patches.
> 
> To understand the virtio-iommu, I advise to first read introduction and
> motivation, then skim through implementation notes and finally look at the
> device specification.
> 
> I wasn't sure how to organize the review. For those who prefer to comment
> inline, I attached v0.4 of device-operations.tex and topology.tex+MSI.tex
> to this thread. They are the biggest chunks of the document. But LaTeX
> isn't very pleasant to read, so you can simply send a list of comments in
> relation to section numbers and a few words of context, we'll manage.

Please find some comments/questions below:

2.6.7:1
I do not understand the footnode #6 sentence: 'Without a specific
definition of the ACK requirements for a given property type, it simply
means that the driver read all fields of that property."
2.6.7:2
what if the driver has not provided a big enough buffer and the device
cannot report all supported properties?
2.6.8.2:
Couldn't we use the same terminology as iommu_resv_type in iommu.h?
the negative form is not the easiest to understand for me.
Why F_MSI?
2.6.8.2.1
Multiple overlapping RESV_MEM properties are merged together. Device
requirement? if same types I assume?

what if the device is a physical assigned device. does the device report
the host doorbell or the guest doorbell, may be worth to clarify.

3.1.1 last paragraph
you talk about device ID but this is a terminology only known by ARM I
think. Actually it is introduced later in 3.1.2.

3.1.2
About ACPI integration. Isn't IORT ARM specific? Will other
architectures use that table? In IORT we also have Named Component node
which look pretty the same. Could you elaborate of the difference?
Device Object name: isn't it the path to the LNR0005 in the namespace?

Thanks

Eric
> 
> ---
> Version numbers 0.1-0.4 are arbitrary. I'm hoping they allow to compare
> more easily differences since the RFC (see [6]), but haven't been made
> public so far. This is the first public posting since initial proposal
> [1], and the following describes all changes.
> 
> ## v0.1 ##
> 
> Content is the same as the RFC, but formatted to LaTeX. 'make' generates
> one PDF and one HTML document.
> 
> ## v0.2 ##
> 
> Add introductions, improve topology example and firmware description based
> on feedback and a number of useful discussions.
> 
> ## v0.3 ##
> 
> Add normative sections (MUST, SHOULD, etc). Clarify some things, tighten
> the device and driver behaviour. Unmap semantics are consolidated; they
> are now closer to VFIO Type1 v2 semantics.
> 
> ## v0.4 ##
> 
> Introduce PROBE requests. They provide per-endpoint information to the
> driver that couldn't be described otherwise.
> 
> For the moment, they allow to handle MSIs on x86 virtual platforms (see
> 3.2). To do that we communicate reserved IOVA regions, that will also be
> useful for describing regions that cannot be mapped for a given endpoint,
> for instance addresses that correspond to a PCI bridge window.
> 
> Introducing such a large framework for this tiny feature may seem
> overkill, but it is needed for future extensions of the virtio-iommu and I
> believe it really is worth the effort.
> 
> ## Future ##
> 
> Other extensions are in preparation. I won't detail them here because v0.4
> already is a lot to digest, but in short, building on top of PROBE:
> 
> * First, since the IOMMU is paravirtualized, the device can expose some
>   properties of the physical topology to the guest, and let it allocate
>   resources more efficiently. For example, when the virtio-iommu manages
>   both physical and emulated endpoints, with different underlying IOMMUs,
>   we now have a way to describe multiple page and block granularities,
>   instead of forcing the guest to use the most restricted one for all
>   endpoints. This will most likely be in v0.5.
> 
> * Then on top of that, a major improvement will describe hardware
>   acceleration features available to the guest. There is what I call "Page
>   Table Handover" (or simply, from the host POV, "Nested"), the ability
>   for the guest to manipulate its own page tables instead of sending
>   MAP/UNMAP requests to