Re: [PATCH 22/22] docs: fix broken documentation links

2019-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2019 at 10:17:32PM +0200, Federico Vaga wrote:
> On Thursday, May 30, 2019 1:23:53 AM CEST Mauro Carvalho Chehab wrote:
> > Mostly due to x86 and acpi conversion, several documentation
> > links are still pointing to the old file. Fix them.
> 
> For the Italian documentation I just send I patch to fix them in a dedicated 
> patch


Acked-by: Michael S. Tsirkin 

for the vhost change.

> > 
> > Signed-off-by: Mauro Carvalho Chehab 
> > ---
> >  Documentation/acpi/dsd/leds.txt  |  2 +-
> >  Documentation/admin-guide/kernel-parameters.rst  |  6 +++---
> >  Documentation/admin-guide/kernel-parameters.txt  | 16 
> >  Documentation/admin-guide/ras.rst|  2 +-
> >  .../devicetree/bindings/net/fsl-enetc.txt|  7 +++
> >  .../bindings/pci/amlogic,meson-pcie.txt  |  2 +-
> >  .../bindings/regulator/qcom,rpmh-regulator.txt   |  2 +-
> >  Documentation/devicetree/booting-without-of.txt  |  2 +-
> >  Documentation/driver-api/gpio/board.rst  |  2 +-
> >  Documentation/driver-api/gpio/consumer.rst   |  2 +-
> >  .../firmware-guide/acpi/enumeration.rst  |  2 +-
> >  .../firmware-guide/acpi/method-tracing.rst   |  2 +-
> >  Documentation/i2c/instantiating-devices  |  2 +-
> >  Documentation/sysctl/kernel.txt  |  4 ++--
> >  .../translations/it_IT/process/howto.rst |  2 +-
> >  .../it_IT/process/stable-kernel-rules.rst|  4 ++--
> >  .../translations/zh_CN/process/4.Coding.rst  |  2 +-
> >  Documentation/x86/x86_64/5level-paging.rst   |  2 +-
> >  Documentation/x86/x86_64/boot-options.rst|  4 ++--
> >  .../x86/x86_64/fake-numa-for-cpusets.rst |  2 +-
> >  MAINTAINERS  |  6 +++---
> >  arch/arm/Kconfig |  2 +-
> >  arch/arm64/kernel/kexec_image.c  |  2 +-
> >  arch/powerpc/Kconfig |  2 +-
> >  arch/x86/Kconfig | 16 
> >  arch/x86/Kconfig.debug   |  2 +-
> >  arch/x86/boot/header.S   |  2 +-
> >  arch/x86/entry/entry_64.S|  2 +-
> >  arch/x86/include/asm/bootparam_utils.h   |  2 +-
> >  arch/x86/include/asm/page_64_types.h |  2 +-
> >  arch/x86/include/asm/pgtable_64_types.h  |  2 +-
> >  arch/x86/kernel/cpu/microcode/amd.c  |  2 +-
> >  arch/x86/kernel/kexec-bzimage64.c|  2 +-
> >  arch/x86/kernel/pci-dma.c|  2 +-
> >  arch/x86/mm/tlb.c|  2 +-
> >  arch/x86/platform/pvh/enlighten.c|  2 +-
> >  drivers/acpi/Kconfig | 10 +-
> >  drivers/net/ethernet/faraday/ftgmac100.c |  2 +-
> >  .../fieldbus/Documentation/fieldbus_dev.txt  |  4 ++--
> >  drivers/vhost/vhost.c|  2 +-
> >  include/acpi/acpi_drivers.h  |  2 +-
> >  include/linux/fs_context.h   |  2 +-
> >  include/linux/lsm_hooks.h|  2 +-
> >  mm/Kconfig   |  2 +-
> >  security/Kconfig |  2 +-
> >  tools/include/linux/err.h|  2 +-
> >  tools/objtool/Documentation/stack-validation.txt |  4 ++--
> >  tools/testing/selftests/x86/protection_keys.c|  2 +-
> >  48 files changed, 77 insertions(+), 78 deletions(-)
> > 
> > diff --git a/Documentation/acpi/dsd/leds.txt
> > b/Documentation/acpi/dsd/leds.txt index 81a63af42ed2..cc58b1a574c5 100644
> > --- a/Documentation/acpi/dsd/leds.txt
> > +++ b/Documentation/acpi/dsd/leds.txt
> > @@ -96,4 +96,4 @@ where
> > 
> > http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-da
> > ta-extension-UUID-v1.1.pdf>, referenced 2019-02-21.
> > 
> > -[7] Documentation/acpi/dsd/data-node-reference.txt
> > +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst
> > diff --git a/Documentation/admin-guide/kernel-parameters.rst
> > b/Documentation/admin-guide/kernel-parameters.rst index
> > 0124980dca2d..8d3273e32eb1 100644
> > --- a/Documentation/admin-guide/kernel-parameters.rst
> > +++ b/Documentation/admin-guide/kernel-parameters.rst
> > @@ -167,7 +167,7 @@ parameter is applicable::
> > X86-32  X86-32, aka i386 architecture is enabled.
> > X86-64  X86-64 architecture is enabled.
> > More X86-64 boot options can be found in
> > -   Documentation/x86/x86_64/boot-options.txt 
> .
> > +   Documentation/x86/x86_64/boot-options.rst.
> > X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64)
> > X86_UV  SGI UV support is enabled.
> > XEN Xen support is enabled
> > @@ -181,10 +181,10 @@ In addition, the following text indicates that the
> > option:: Parameters denoted with BOOT are actually 

Re: [PATCH net-next 0/6] vhost: accelerate metadata access

2019-05-30 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 30 May 2019 14:13:28 -0400

> On Thu, May 30, 2019 at 11:07:30AM -0700, David Miller wrote:
>> From: Jason Wang 
>> Date: Fri, 24 May 2019 04:12:12 -0400
>> 
>> > This series tries to access virtqueue metadata through kernel virtual
>> > address instead of copy_user() friends since they had too much
>> > overheads like checks, spec barriers or even hardware feature
>> > toggling like SMAP. This is done through setup kernel address through
>> > direct mapping and co-opreate VM management with MMU notifiers.
>> > 
>> > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see
>> > obvious improvement.
>> 
>> I'm still waiting for some review from mst.
>> 
>> If I don't see any review soon I will just wipe these changes from
>> patchwork as it serves no purpose to just let them rot there.
>> 
>> Thank you.
> 
> I thought we agreed I'm merging this through my tree, not net-next.
> So you can safely wipe it.

Aha, I didn't catch that, thanks!
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 0/6] vhost: accelerate metadata access

2019-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2019 at 11:07:30AM -0700, David Miller wrote:
> From: Jason Wang 
> Date: Fri, 24 May 2019 04:12:12 -0400
> 
> > This series tries to access virtqueue metadata through kernel virtual
> > address instead of copy_user() friends since they had too much
> > overheads like checks, spec barriers or even hardware feature
> > toggling like SMAP. This is done through setup kernel address through
> > direct mapping and co-opreate VM management with MMU notifiers.
> > 
> > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see
> > obvious improvement.
> 
> I'm still waiting for some review from mst.
> 
> If I don't see any review soon I will just wipe these changes from
> patchwork as it serves no purpose to just let them rot there.
> 
> Thank you.

I thought we agreed I'm merging this through my tree, not net-next.
So you can safely wipe it.

Thanks!

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net-next 0/6] vhost: accelerate metadata access

2019-05-30 Thread David Miller
From: Jason Wang 
Date: Fri, 24 May 2019 04:12:12 -0400

> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they had too much
> overheads like checks, spec barriers or even hardware feature
> toggling like SMAP. This is done through setup kernel address through
> direct mapping and co-opreate VM management with MMU notifiers.
> 
> Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see
> obvious improvement.

I'm still waiting for some review from mst.

If I don't see any review soon I will just wipe these changes from
patchwork as it serves no purpose to just let them rot there.

Thank you.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH v8 2/7] dt-bindings: virtio: Add virtio-pci-iommu node

2019-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2019 at 06:09:24PM +0100, Jean-Philippe Brucker wrote:
> Some systems implement virtio-iommu as a PCI endpoint. The operating
> system needs to discover the relationship between IOMMU and masters long
> before the PCI endpoint gets probed. Add a PCI child node to describe the
> virtio-iommu device.
> 
> The virtio-pci-iommu is conceptually split between a PCI programming
> interface and a translation component on the parent bus. The latter
> doesn't have a node in the device tree. The virtio-pci-iommu node
> describes both, by linking the PCI endpoint to "iommus" property of DMA
> master nodes and to "iommu-map" properties of bus nodes.
> 
> Reviewed-by: Rob Herring 
> Reviewed-by: Eric Auger 
> Signed-off-by: Jean-Philippe Brucker 

So this is just an example right?
We are not defining any new properties or anything like that.

I think down the road for non dt platforms we want to put this
info in the config space of the device. I do not think ACPI
is the best option for this since not all systems have it.
But that can wait.

> ---
>  .../devicetree/bindings/virtio/iommu.txt  | 66 +++
>  1 file changed, 66 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
> 
> diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
> b/Documentation/devicetree/bindings/virtio/iommu.txt
> new file mode 100644
> index ..2407fea0651c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/virtio/iommu.txt
> @@ -0,0 +1,66 @@
> +* virtio IOMMU PCI device
> +
> +When virtio-iommu uses the PCI transport, its programming interface is
> +discovered dynamically by the PCI probing infrastructure. However the
> +device tree statically describes the relation between IOMMU and DMA
> +masters. Therefore, the PCI root complex that hosts the virtio-iommu
> +contains a child node representing the IOMMU device explicitly.
> +
> +Required properties:
> +
> +- compatible:Should be "virtio,pci-iommu"
> +- reg:   PCI address of the IOMMU. As defined in the PCI Bus
> + Binding reference [1], the reg property is a five-cell
> + address encoded as (phys.hi phys.mid phys.lo size.hi
> + size.lo). phys.hi should contain the device's BDF as
> + 0b  dfff . The other cells
> + should be zero.
> +- #iommu-cells:  Each platform DMA master managed by the IOMMU is 
> assigned
> + an endpoint ID, described by the "iommus" property [2].
> + For virtio-iommu, #iommu-cells must be 1.
> +
> +Notes:
> +
> +- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
> +  virtio-iommu node doesn't have an "iommus" property, and is omitted from
> +  the iommu-map property of the root complex.
> +
> +Example:
> +
> +pcie@1000 {
> + compatible = "pci-host-ecam-generic";
> + ...
> +
> + /* The IOMMU programming interface uses slot 00:01.0 */
> + iommu0: iommu@0008 {
> + compatible = "virtio,pci-iommu";
> + reg = <0x0800 0 0 0 0>;
> + #iommu-cells = <1>;
> + };
> +
> + /*
> +  * The IOMMU manages all functions in this PCI domain except
> +  * itself. Omit BDF 00:01.0.
> +  */
> + iommu-map = <0x0  0x0 0x8>
> + <0x9  0x9 0xfff7>;
> +};
> +
> +pcie@2000 {
> + compatible = "pci-host-ecam-generic";
> + ...
> + /*
> +  * The IOMMU also manages all functions from this domain,
> +  * with endpoint IDs 0x1 - 0x1
> +  */
> + iommu-map = <0x0  0x1 0x1>;
> +};
> +
> +ethernet@fe001000 {
> + ...
> + /* The IOMMU manages this platform device with endpoint ID 0x2 */
> + iommus = < 0x2>;
> +};
> +
> +[1] Documentation/devicetree/bindings/pci/pci.txt
> +[2] Documentation/devicetree/bindings/iommu/iommu.txt
> -- 
> 2.21.0
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v8 7/7] iommu/virtio: Add event queue

2019-05-30 Thread Jean-Philippe Brucker
The event queue offers a way for the device to report access faults from
endpoints. It is implemented on virtqueue #1. Whenever the host needs to
signal a fault, it fills one of the buffers offered by the guest and
interrupts it.

Acked-by: Joerg Roedel 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 115 +++---
 include/uapi/linux/virtio_iommu.h |  19 +
 2 files changed, 125 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index 5d4947c47420..2688cdcac6e5 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -29,7 +29,8 @@
 #define MSI_IOVA_LENGTH0x10
 
 #define VIOMMU_REQUEST_VQ  0
-#define VIOMMU_NR_VQS  1
+#define VIOMMU_EVENT_VQ1
+#define VIOMMU_NR_VQS  2
 
 struct viommu_dev {
struct iommu_device iommu;
@@ -41,6 +42,7 @@ struct viommu_dev {
struct virtqueue*vqs[VIOMMU_NR_VQS];
spinlock_t  request_lock;
struct list_headrequests;
+   void*evts;
 
/* Device configuration */
struct iommu_domain_geometrygeometry;
@@ -86,6 +88,15 @@ struct viommu_request {
charbuf[];
 };
 
+#define VIOMMU_FAULT_RESV_MASK 0xff00
+
+struct viommu_event {
+   union {
+   u32 head;
+   struct virtio_iommu_fault fault;
+   };
+};
+
 #define to_viommu_domain(domain)   \
container_of(domain, struct viommu_domain, domain)
 
@@ -509,6 +520,68 @@ static int viommu_probe_endpoint(struct viommu_dev 
*viommu, struct device *dev)
return ret;
 }
 
+static int viommu_fault_handler(struct viommu_dev *viommu,
+   struct virtio_iommu_fault *fault)
+{
+   char *reason_str;
+
+   u8 reason   = fault->reason;
+   u32 flags   = le32_to_cpu(fault->flags);
+   u32 endpoint= le32_to_cpu(fault->endpoint);
+   u64 address = le64_to_cpu(fault->address);
+
+   switch (reason) {
+   case VIRTIO_IOMMU_FAULT_R_DOMAIN:
+   reason_str = "domain";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_MAPPING:
+   reason_str = "page";
+   break;
+   case VIRTIO_IOMMU_FAULT_R_UNKNOWN:
+   default:
+   reason_str = "unknown";
+   break;
+   }
+
+   /* TODO: find EP by ID and report_iommu_fault */
+   if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS)
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx 
[%s%s%s]\n",
+   reason_str, endpoint, address,
+   flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : 
"",
+   flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : 
"",
+   flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : 
"");
+   else
+   dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n",
+   reason_str, endpoint);
+   return 0;
+}
+
+static void viommu_event_handler(struct virtqueue *vq)
+{
+   int ret;
+   unsigned int len;
+   struct scatterlist sg[1];
+   struct viommu_event *evt;
+   struct viommu_dev *viommu = vq->vdev->priv;
+
+   while ((evt = virtqueue_get_buf(vq, )) != NULL) {
+   if (len > sizeof(*evt)) {
+   dev_err(viommu->dev,
+   "invalid event buffer (len %u != %zu)\n",
+   len, sizeof(*evt));
+   } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) {
+   viommu_fault_handler(viommu, >fault);
+   }
+
+   sg_init_one(sg, evt, sizeof(*evt));
+   ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC);
+   if (ret)
+   dev_err(viommu->dev, "could not add event buffer\n");
+   }
+
+   virtqueue_kick(vq);
+}
+
 /* IOMMU API */
 
 static struct iommu_domain *viommu_domain_alloc(unsigned type)
@@ -895,16 +968,35 @@ static struct iommu_ops viommu_ops = {
 static int viommu_init_vqs(struct viommu_dev *viommu)
 {
struct virtio_device *vdev = dev_to_virtio(viommu->dev);
-   const char *name = "request";
-   void *ret;
+   const char *names[] = { "request", "event" };
+   vq_callback_t *callbacks[] = {
+   NULL, /* No async requests */
+   viommu_event_handler,
+   };
 
-   ret = virtio_find_single_vq(vdev, NULL, name);
-   if (IS_ERR(ret)) {
-   dev_err(viommu->dev, "cannot find VQ\n");
-   return PTR_ERR(ret);
-   }
+   return virtio_find_vqs(vdev, VIOMMU_NR_VQS, viommu->vqs, callbacks,
+   

[PATCH v8 6/7] iommu/virtio: Add probe request

2019-05-30 Thread Jean-Philippe Brucker
When the device offers the probe feature, send a probe request for each
device managed by the IOMMU. Extract RESV_MEM information. When we
encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region.
This will tell other subsystems that there is no need to map the MSI
doorbell in the virtio-iommu, because MSIs bypass it.

Acked-by: Joerg Roedel 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/iommu/virtio-iommu.c  | 157 --
 include/uapi/linux/virtio_iommu.h |  36 +++
 2 files changed, 187 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
index b2719a87c3c5..5d4947c47420 100644
--- a/drivers/iommu/virtio-iommu.c
+++ b/drivers/iommu/virtio-iommu.c
@@ -49,6 +49,7 @@ struct viommu_dev {
u32 last_domain;
/* Supported MAP flags */
u32 map_flags;
+   u32 probe_size;
 };
 
 struct viommu_mapping {
@@ -71,8 +72,10 @@ struct viommu_domain {
 };
 
 struct viommu_endpoint {
+   struct device   *dev;
struct viommu_dev   *viommu;
struct viommu_domain*vdomain;
+   struct list_headresv_regions;
 };
 
 struct viommu_request {
@@ -125,6 +128,9 @@ static off_t viommu_get_write_desc_offset(struct viommu_dev 
*viommu,
 {
size_t tail_size = sizeof(struct virtio_iommu_req_tail);
 
+   if (req->type == VIRTIO_IOMMU_T_PROBE)
+   return len - viommu->probe_size - tail_size;
+
return len - tail_size;
 }
 
@@ -399,6 +405,110 @@ static int viommu_replay_mappings(struct viommu_domain 
*vdomain)
return ret;
 }
 
+static int viommu_add_resv_mem(struct viommu_endpoint *vdev,
+  struct virtio_iommu_probe_resv_mem *mem,
+  size_t len)
+{
+   size_t size;
+   u64 start64, end64;
+   phys_addr_t start, end;
+   struct iommu_resv_region *region = NULL;
+   unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO;
+
+   start = start64 = le64_to_cpu(mem->start);
+   end = end64 = le64_to_cpu(mem->end);
+   size = end64 - start64 + 1;
+
+   /* Catch any overflow, including the unlikely end64 - start64 + 1 = 0 */
+   if (start != start64 || end != end64 || size < end64 - start64)
+   return -EOVERFLOW;
+
+   if (len < sizeof(*mem))
+   return -EINVAL;
+
+   switch (mem->subtype) {
+   default:
+   dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n",
+mem->subtype);
+   /* Fall-through */
+   case VIRTIO_IOMMU_RESV_MEM_T_RESERVED:
+   region = iommu_alloc_resv_region(start, size, 0,
+IOMMU_RESV_RESERVED);
+   break;
+   case VIRTIO_IOMMU_RESV_MEM_T_MSI:
+   region = iommu_alloc_resv_region(start, size, prot,
+IOMMU_RESV_MSI);
+   break;
+   }
+   if (!region)
+   return -ENOMEM;
+
+   list_add(>resv_regions, >list);
+   return 0;
+}
+
+static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev)
+{
+   int ret;
+   u16 type, len;
+   size_t cur = 0;
+   size_t probe_len;
+   struct virtio_iommu_req_probe *probe;
+   struct virtio_iommu_probe_property *prop;
+   struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);
+   struct viommu_endpoint *vdev = fwspec->iommu_priv;
+
+   if (!fwspec->num_ids)
+   return -EINVAL;
+
+   probe_len = sizeof(*probe) + viommu->probe_size +
+   sizeof(struct virtio_iommu_req_tail);
+   probe = kzalloc(probe_len, GFP_KERNEL);
+   if (!probe)
+   return -ENOMEM;
+
+   probe->head.type = VIRTIO_IOMMU_T_PROBE;
+   /*
+* For now, assume that properties of an endpoint that outputs multiple
+* IDs are consistent. Only probe the first one.
+*/
+   probe->endpoint = cpu_to_le32(fwspec->ids[0]);
+
+   ret = viommu_send_req_sync(viommu, probe, probe_len);
+   if (ret)
+   goto out_free;
+
+   prop = (void *)probe->properties;
+   type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK;
+
+   while (type != VIRTIO_IOMMU_PROBE_T_NONE &&
+  cur < viommu->probe_size) {
+   len = le16_to_cpu(prop->length) + sizeof(*prop);
+
+   switch (type) {
+   case VIRTIO_IOMMU_PROBE_T_RESV_MEM:
+   ret = viommu_add_resv_mem(vdev, (void *)prop, len);
+   break;
+   default:
+   dev_err(dev, "unknown viommu prop 0x%x\n", type);
+   }
+
+   if (ret)
+   dev_err(dev, "failed to parse viommu prop 

[PATCH v8 5/7] iommu: Add virtio-iommu driver

2019-05-30 Thread Jean-Philippe Brucker
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU
requests such as map/unmap over virtio transport without emulating page
tables. This implementation handles ATTACH, DETACH, MAP and UNMAP
requests.

The bulk of the code transforms calls coming from the IOMMU API into
corresponding virtio requests. Mappings are kept in an interval tree
instead of page tables. A little more work is required for modular and x86
support, so for the moment the driver depends on CONFIG_VIRTIO=y and
CONFIG_ARM64.

Acked-by: Joerg Roedel 
Signed-off-by: Jean-Philippe Brucker 
---
 MAINTAINERS   |   7 +
 drivers/iommu/Kconfig |  11 +
 drivers/iommu/Makefile|   1 +
 drivers/iommu/virtio-iommu.c  | 934 ++
 include/uapi/linux/virtio_ids.h   |   1 +
 include/uapi/linux/virtio_iommu.h | 110 
 6 files changed, 1064 insertions(+)
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 429c6c624861..62bd1834d95a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16807,6 +16807,13 @@ S: Maintained
 F: drivers/virtio/virtio_input.c
 F: include/uapi/linux/virtio_input.h
 
+VIRTIO IOMMU DRIVER
+M: Jean-Philippe Brucker 
+L: virtualization@lists.linux-foundation.org
+S: Maintained
+F: drivers/iommu/virtio-iommu.c
+F: include/uapi/linux/virtio_iommu.h
+
 VIRTUAL BOX GUEST DEVICE DRIVER
 M: Hans de Goede 
 M: Arnd Bergmann 
diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db5221d..e15cdcd8cb3c 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -473,4 +473,15 @@ config HYPERV_IOMMU
  Stub IOMMU driver to handle IRQs as to allow Hyper-V Linux
  guests to run with x2APIC mode enabled.
 
+config VIRTIO_IOMMU
+   bool "Virtio IOMMU driver"
+   depends on VIRTIO=y
+   depends on ARM64
+   select IOMMU_API
+   select INTERVAL_TREE
+   help
+ Para-virtualised IOMMU driver with virtio.
+
+ Say Y here if you intend to run this kernel as a guest.
+
 endif # IOMMU_SUPPORT
diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
index 8c71a15e986b..f13f36ae1af6 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -33,3 +33,4 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o
 obj-$(CONFIG_S390_IOMMU) += s390-iommu.o
 obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o
 obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o
+obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o
diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c
new file mode 100644
index ..b2719a87c3c5
--- /dev/null
+++ b/drivers/iommu/virtio-iommu.c
@@ -0,0 +1,934 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Virtio driver for the paravirtualized IOMMU
+ *
+ * Copyright (C) 2019 Arm Limited
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define MSI_IOVA_BASE  0x800
+#define MSI_IOVA_LENGTH0x10
+
+#define VIOMMU_REQUEST_VQ  0
+#define VIOMMU_NR_VQS  1
+
+struct viommu_dev {
+   struct iommu_device iommu;
+   struct device   *dev;
+   struct virtio_device*vdev;
+
+   struct ida  domain_ids;
+
+   struct virtqueue*vqs[VIOMMU_NR_VQS];
+   spinlock_t  request_lock;
+   struct list_headrequests;
+
+   /* Device configuration */
+   struct iommu_domain_geometrygeometry;
+   u64 pgsize_bitmap;
+   u32 first_domain;
+   u32 last_domain;
+   /* Supported MAP flags */
+   u32 map_flags;
+};
+
+struct viommu_mapping {
+   phys_addr_t paddr;
+   struct interval_tree_node   iova;
+   u32 flags;
+};
+
+struct viommu_domain {
+   struct iommu_domain domain;
+   struct viommu_dev   *viommu;
+   struct mutexmutex; /* protects viommu pointer */
+   unsigned intid;
+   u32 map_flags;
+
+   spinlock_t  mappings_lock;
+   struct rb_root_cached   mappings;
+
+   unsigned long   nr_endpoints;
+};
+
+struct viommu_endpoint {
+   struct viommu_dev   *viommu;
+   struct viommu_domain*vdomain;
+};
+
+struct viommu_request {
+   struct list_headlist;
+   void*writeback;
+   unsigned intwrite_offset;
+   

[PATCH v8 4/7] PCI: OF: Initialize dev->fwnode appropriately

2019-05-30 Thread Jean-Philippe Brucker
For PCI devices that have an OF node, set the fwnode as well. This way
drivers that rely on fwnode don't need the special case described by
commit f94277af03ea ("of/platform: Initialise dev->fwnode appropriately").

Acked-by: Bjorn Helgaas 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/pci/of.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 73d5adec0a28..c4f1b5507b40 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -22,12 +22,15 @@ void pci_set_of_node(struct pci_dev *dev)
return;
dev->dev.of_node = of_pci_find_child_device(dev->bus->dev.of_node,
dev->devfn);
+   if (dev->dev.of_node)
+   dev->dev.fwnode = >dev.of_node->fwnode;
 }
 
 void pci_release_of_node(struct pci_dev *dev)
 {
of_node_put(dev->dev.of_node);
dev->dev.of_node = NULL;
+   dev->dev.fwnode = NULL;
 }
 
 void pci_set_bus_of_node(struct pci_bus *bus)
@@ -42,12 +45,15 @@ void pci_set_bus_of_node(struct pci_bus *bus)
bus->self->untrusted = true;
}
bus->dev.of_node = node;
+   if (node)
+   bus->dev.fwnode = >fwnode;
 }
 
 void pci_release_bus_of_node(struct pci_bus *bus)
 {
of_node_put(bus->dev.of_node);
bus->dev.of_node = NULL;
+   bus->dev.fwnode = NULL;
 }
 
 struct device_node * __weak pcibios_get_phb_of_node(struct pci_bus *bus)
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v8 3/7] of: Allow the iommu-map property to omit untranslated devices

2019-05-30 Thread Jean-Philippe Brucker
In PCI root complex nodes, the iommu-map property describes the IOMMU that
translates each endpoint. On some platforms, the IOMMU itself is presented
as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). This isn't supported
by the current OF driver, which expects all endpoints to have an IOMMU.
Allow the iommu-map property to have gaps.

Relaxing of_map_rid() also allows the msi-map property to have gaps, which
is invalid since MSIs always reach an MSI controller. In that case
pci_msi_setup_msi_irqs() will return an error when attempting to find the
device's MSI domain.

Reviewed-by: Rob Herring 
Signed-off-by: Jean-Philippe Brucker 
---
 drivers/of/base.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 20e0e7ee4edf..55e7f5bb0549 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -2294,8 +2294,12 @@ int of_map_rid(struct device_node *np, u32 rid,
return 0;
}
 
-   pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n",
-   np, map_name, rid, target && *target ? *target : NULL);
-   return -EFAULT;
+   pr_info("%pOF: no %s translation for rid 0x%x on %pOF\n", np, map_name,
+   rid, target && *target ? *target : NULL);
+
+   /* Bypasses translation */
+   if (id_out)
+   *id_out = rid;
+   return 0;
 }
 EXPORT_SYMBOL_GPL(of_map_rid);
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v8 2/7] dt-bindings: virtio: Add virtio-pci-iommu node

2019-05-30 Thread Jean-Philippe Brucker
Some systems implement virtio-iommu as a PCI endpoint. The operating
system needs to discover the relationship between IOMMU and masters long
before the PCI endpoint gets probed. Add a PCI child node to describe the
virtio-iommu device.

The virtio-pci-iommu is conceptually split between a PCI programming
interface and a translation component on the parent bus. The latter
doesn't have a node in the device tree. The virtio-pci-iommu node
describes both, by linking the PCI endpoint to "iommus" property of DMA
master nodes and to "iommu-map" properties of bus nodes.

Reviewed-by: Rob Herring 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/virtio/iommu.txt  | 66 +++
 1 file changed, 66 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt

diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt 
b/Documentation/devicetree/bindings/virtio/iommu.txt
new file mode 100644
index ..2407fea0651c
--- /dev/null
+++ b/Documentation/devicetree/bindings/virtio/iommu.txt
@@ -0,0 +1,66 @@
+* virtio IOMMU PCI device
+
+When virtio-iommu uses the PCI transport, its programming interface is
+discovered dynamically by the PCI probing infrastructure. However the
+device tree statically describes the relation between IOMMU and DMA
+masters. Therefore, the PCI root complex that hosts the virtio-iommu
+contains a child node representing the IOMMU device explicitly.
+
+Required properties:
+
+- compatible:  Should be "virtio,pci-iommu"
+- reg: PCI address of the IOMMU. As defined in the PCI Bus
+   Binding reference [1], the reg property is a five-cell
+   address encoded as (phys.hi phys.mid phys.lo size.hi
+   size.lo). phys.hi should contain the device's BDF as
+   0b  dfff . The other cells
+   should be zero.
+- #iommu-cells:Each platform DMA master managed by the IOMMU is 
assigned
+   an endpoint ID, described by the "iommus" property [2].
+   For virtio-iommu, #iommu-cells must be 1.
+
+Notes:
+
+- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the
+  virtio-iommu node doesn't have an "iommus" property, and is omitted from
+  the iommu-map property of the root complex.
+
+Example:
+
+pcie@1000 {
+   compatible = "pci-host-ecam-generic";
+   ...
+
+   /* The IOMMU programming interface uses slot 00:01.0 */
+   iommu0: iommu@0008 {
+   compatible = "virtio,pci-iommu";
+   reg = <0x0800 0 0 0 0>;
+   #iommu-cells = <1>;
+   };
+
+   /*
+* The IOMMU manages all functions in this PCI domain except
+* itself. Omit BDF 00:01.0.
+*/
+   iommu-map = <0x0  0x0 0x8>
+   <0x9  0x9 0xfff7>;
+};
+
+pcie@2000 {
+   compatible = "pci-host-ecam-generic";
+   ...
+   /*
+* The IOMMU also manages all functions from this domain,
+* with endpoint IDs 0x1 - 0x1
+*/
+   iommu-map = <0x0  0x1 0x1>;
+};
+
+ethernet@fe001000 {
+   ...
+   /* The IOMMU manages this platform device with endpoint ID 0x2 */
+   iommus = < 0x2>;
+};
+
+[1] Documentation/devicetree/bindings/pci/pci.txt
+[2] Documentation/devicetree/bindings/iommu/iommu.txt
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v8 1/7] dt-bindings: virtio-mmio: Add IOMMU description

2019-05-30 Thread Jean-Philippe Brucker
The nature of a virtio-mmio node is discovered by the virtio driver at
probe time. However the DMA relation between devices must be described
statically. When a virtio-mmio node is a virtio-iommu device, it needs an
"#iommu-cells" property as specified by bindings/iommu/iommu.txt.

Otherwise, the virtio-mmio device may perform DMA through an IOMMU, which
requires an "iommus" property. Describe these requirements in the
device-tree bindings documentation.

Reviewed-by: Rob Herring 
Reviewed-by: Eric Auger 
Signed-off-by: Jean-Philippe Brucker 
---
 .../devicetree/bindings/virtio/mmio.txt   | 30 +++
 1 file changed, 30 insertions(+)

diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt 
b/Documentation/devicetree/bindings/virtio/mmio.txt
index 5069c1b8e193..21af30fbb81f 100644
--- a/Documentation/devicetree/bindings/virtio/mmio.txt
+++ b/Documentation/devicetree/bindings/virtio/mmio.txt
@@ -8,10 +8,40 @@ Required properties:
 - reg: control registers base address and size including configuration 
space
 - interrupts:  interrupt generated by the device
 
+Required properties for virtio-iommu:
+
+- #iommu-cells:When the node corresponds to a virtio-iommu device, it 
is
+   linked to DMA masters using the "iommus" or "iommu-map"
+   properties [1][2]. #iommu-cells specifies the size of the
+   "iommus" property. For virtio-iommu #iommu-cells must be
+   1, each cell describing a single endpoint ID.
+
+Optional properties:
+
+- iommus:  If the device accesses memory through an IOMMU, it should
+   have an "iommus" property [1]. Since virtio-iommu itself
+   does not access memory through an IOMMU, the "virtio,mmio"
+   node cannot have both an "#iommu-cells" and an "iommus"
+   property.
+
 Example:
 
virtio_block@3000 {
compatible = "virtio,mmio";
reg = <0x3000 0x100>;
interrupts = <41>;
+
+   /* Device has endpoint ID 23 */
+   iommus = < 23>
}
+
+   viommu: iommu@3100 {
+   compatible = "virtio,mmio";
+   reg = <0x3100 0x100>;
+   interrupts = <42>;
+
+   #iommu-cells = <1>
+   }
+
+[1] Documentation/devicetree/bindings/iommu/iommu.txt
+[2] Documentation/devicetree/bindings/pci/pci-iommu.txt
-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[PATCH v8 0/7] Add virtio-iommu driver

2019-05-30 Thread Jean-Philippe Brucker
Implement the virtio-iommu driver, following specification v0.12 [1].
Since last version [2] we've worked on improving the specification,
which resulted in the following changes to the interface:
* Remove the EXEC flag.
* Add feature bit for the MMIO flag.
* Change domain_bits to domain_range.

Given that there were small changes to patch 5/7, I removed the review
and test tags. Please find the code at [3].

[1] Virtio-iommu specification v0.12, sources and pdf
git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.12
http://jpbrucker.net/virtio-iommu/spec/v0.12/virtio-iommu-v0.12.pdf

http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-dev-diff-v0.11-v0.12.pdf

[2] [PATCH v7 0/7] Add virtio-iommu driver

https://lore.kernel.org/linux-pci/0ba215f5-e856-bf31-8dd9-a85710714...@arm.com/T/

[3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.12
git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.12

Jean-Philippe Brucker (7):
  dt-bindings: virtio-mmio: Add IOMMU description
  dt-bindings: virtio: Add virtio-pci-iommu node
  of: Allow the iommu-map property to omit untranslated devices
  PCI: OF: Initialize dev->fwnode appropriately
  iommu: Add virtio-iommu driver
  iommu/virtio: Add probe request
  iommu/virtio: Add event queue

 .../devicetree/bindings/virtio/iommu.txt  |   66 +
 .../devicetree/bindings/virtio/mmio.txt   |   30 +
 MAINTAINERS   |7 +
 drivers/iommu/Kconfig |   11 +
 drivers/iommu/Makefile|1 +
 drivers/iommu/virtio-iommu.c  | 1176 +
 drivers/of/base.c |   10 +-
 drivers/pci/of.c  |6 +
 include/uapi/linux/virtio_ids.h   |1 +
 include/uapi/linux/virtio_iommu.h |  165 +++
 10 files changed, 1470 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt
 create mode 100644 drivers/iommu/virtio-iommu.c
 create mode 100644 include/uapi/linux/virtio_iommu.h

-- 
2.21.0

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()

2019-05-30 Thread Jason Wang


On 2019/5/30 下午6:10, Stefano Garzarella wrote:

On Thu, May 30, 2019 at 05:46:18PM +0800, Jason Wang wrote:

On 2019/5/29 下午6:58, Stefano Garzarella wrote:

On Wed, May 29, 2019 at 11:22:40AM +0800, Jason Wang wrote:

On 2019/5/28 下午6:56, Stefano Garzarella wrote:

We flush all pending works before to call vdev->config->reset(vdev),
but other works can be queued before the vdev->config->del_vqs(vdev),
so we add another flush after it, to avoid use after free.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
net/vmw_vsock/virtio_transport.c | 23 +--
1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e694df10ab61..ad093ce96693 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
return ret;
}
+static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
+{
+   flush_work(>loopback_work);
+   flush_work(>rx_work);
+   flush_work(>tx_work);
+   flush_work(>event_work);
+   flush_work(>send_pkt_work);
+}
+
static void virtio_vsock_remove(struct virtio_device *vdev)
{
struct virtio_vsock *vsock = vdev->priv;
@@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
mutex_lock(_virtio_vsock_mutex);
the_virtio_vsock = NULL;
-   flush_work(>loopback_work);
-   flush_work(>rx_work);
-   flush_work(>tx_work);
-   flush_work(>event_work);
-   flush_work(>send_pkt_work);
-
/* Reset all connected sockets when the device disappear */
vsock_for_each_connected_socket(virtio_vsock_reset_sock);
@@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
vsock->event_run = false;
mutex_unlock(>event_lock);
+   /* Flush all pending works */
+   virtio_vsock_flush_works(vsock);
+
/* Flush all device writes and interrupts, device will not use any
 * more buffers.
 */
@@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
/* Delete virtqueues and flush outstanding callbacks if any */
vdev->config->del_vqs(vdev);
+   /* Other works can be queued before 'config->del_vqs()', so we flush
+* all works before to free the vsock object to avoid use after free.
+*/
+   virtio_vsock_flush_works(vsock);

Some questions after a quick glance:

1) It looks to me that the work could be queued from the path of
vsock_transport_cancel_pkt() . Is that synchronized here?


Both virtio_transport_send_pkt() and vsock_transport_cancel_pkt() can
queue work from the upper layer (socket).

Setting the_virtio_vsock to NULL, should synchronize, but after a careful look
a rare issue could happen:
we are setting the_virtio_vsock to NULL at the start of .remove() and we
are freeing the object pointed by it at the end of .remove(), so
virtio_transport_send_pkt() or vsock_transport_cancel_pkt() may still be
running, accessing the object that we are freed.


Yes, that's my point.



Should I use something like RCU to prevent this issue?

  virtio_transport_send_pkt() and vsock_transport_cancel_pkt()
  {
  rcu_read_lock();
  vsock = rcu_dereference(the_virtio_vsock_mutex);


RCU is probably a way to go. (Like what vhost_transport_send_pkt() did).


Okay, I'm going this way.


  ...
  rcu_read_unlock();
  }

  virtio_vsock_remove()
  {
  rcu_assign_pointer(the_virtio_vsock_mutex, NULL);
  synchronize_rcu();

  ...

  free(vsock);
  }

Could there be a better approach?



2) If we decide to flush after dev_vqs(), is tx_run/rx_run/event_run still
needed? It looks to me we've already done except that we need flush rx_work
in the end since send_pkt_work can requeue rx_work.

The main reason of tx_run/rx_run/event_run is to prevent that a worker
function is running while we are calling config->reset().

E.g. if an interrupt comes between virtio_vsock_flush_works() and
config->reset(), it can queue new works that can access the device while
we are in config->reset().

IMHO they are still needed.

What do you think?


I mean could we simply do flush after reset once and without tx_rx/rx_run
tricks?

rest();

virtio_vsock_flush_work();

virtio_vsock_free_buf();

My only doubt is:
is it safe to call config->reset() while a worker function could access
the device?

I had this doubt reading the Michael's advice[1] and looking at
virtnet_remove() where there are these lines before the config->reset():

/* Make sure no work handler is accessing the device. */
flush_work(>config_work);

Thanks,
Stefano

[1] https://lore.kernel.org/netdev/20190521055650-mutt-send-email-...@kernel.org



Good point. Then I agree with you. But if we can use the RCU to detect 
the detach of 

Re: [PATCH 1/4] vsock/virtio: fix locking around 'the_virtio_vsock'

2019-05-30 Thread Stefano Garzarella
On Wed, May 29, 2019 at 09:28:52PM -0700, David Miller wrote:
> From: Stefano Garzarella 
> Date: Tue, 28 May 2019 12:56:20 +0200
> 
> > @@ -68,7 +68,13 @@ struct virtio_vsock {
> >  
> >  static struct virtio_vsock *virtio_vsock_get(void)
> >  {
> > -   return the_virtio_vsock;
> > +   struct virtio_vsock *vsock;
> > +
> > +   mutex_lock(_virtio_vsock_mutex);
> > +   vsock = the_virtio_vsock;
> > +   mutex_unlock(_virtio_vsock_mutex);
> > +
> > +   return vsock;
> 
> This doesn't do anything as far as I can tell.
> 
> No matter what, you will either get the value before it's changed or
> after it's changed.
> 
> Since you should never publish the pointer by assigning it until the
> object is fully initialized, this can never be a problem even without
> the mutex being there.
> 
> Even if you sampled the the_virtio_sock value right before it's being
> set to NULL by the remove function, that still can happen with the
> mutex held too.

Yes, I found out when I was answering Jason's question [1]. :(

I proposed to use RCU to solve this issue, do you agree?
Let me know if there is a better way.

> 
> This function is also terribly named btw, it implies that a reference
> count is being taken.  But that's not what this function does, it
> just returns the pointer value as-is.

What do you think if I remove the function, using directly the_virtio_vsock?
(should be easier to use with RCU API)

Thanks,
Stefano

[1] https://lore.kernel.org/netdev/20190529105832.oz3sagbne5teq3nt@steredhat
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()

2019-05-30 Thread Stefano Garzarella
On Thu, May 30, 2019 at 05:46:18PM +0800, Jason Wang wrote:
> 
> On 2019/5/29 下午6:58, Stefano Garzarella wrote:
> > On Wed, May 29, 2019 at 11:22:40AM +0800, Jason Wang wrote:
> > > On 2019/5/28 下午6:56, Stefano Garzarella wrote:
> > > > We flush all pending works before to call vdev->config->reset(vdev),
> > > > but other works can be queued before the vdev->config->del_vqs(vdev),
> > > > so we add another flush after it, to avoid use after free.
> > > > 
> > > > Suggested-by: Michael S. Tsirkin 
> > > > Signed-off-by: Stefano Garzarella 
> > > > ---
> > > >net/vmw_vsock/virtio_transport.c | 23 +--
> > > >1 file changed, 17 insertions(+), 6 deletions(-)
> > > > 
> > > > diff --git a/net/vmw_vsock/virtio_transport.c 
> > > > b/net/vmw_vsock/virtio_transport.c
> > > > index e694df10ab61..ad093ce96693 100644
> > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > @@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device 
> > > > *vdev)
> > > > return ret;
> > > >}
> > > > +static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
> > > > +{
> > > > +   flush_work(>loopback_work);
> > > > +   flush_work(>rx_work);
> > > > +   flush_work(>tx_work);
> > > > +   flush_work(>event_work);
> > > > +   flush_work(>send_pkt_work);
> > > > +}
> > > > +
> > > >static void virtio_vsock_remove(struct virtio_device *vdev)
> > > >{
> > > > struct virtio_vsock *vsock = vdev->priv;
> > > > @@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct 
> > > > virtio_device *vdev)
> > > > mutex_lock(_virtio_vsock_mutex);
> > > > the_virtio_vsock = NULL;
> > > > -   flush_work(>loopback_work);
> > > > -   flush_work(>rx_work);
> > > > -   flush_work(>tx_work);
> > > > -   flush_work(>event_work);
> > > > -   flush_work(>send_pkt_work);
> > > > -
> > > > /* Reset all connected sockets when the device disappear */
> > > > vsock_for_each_connected_socket(virtio_vsock_reset_sock);
> > > > @@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct 
> > > > virtio_device *vdev)
> > > > vsock->event_run = false;
> > > > mutex_unlock(>event_lock);
> > > > +   /* Flush all pending works */
> > > > +   virtio_vsock_flush_works(vsock);
> > > > +
> > > > /* Flush all device writes and interrupts, device will not use 
> > > > any
> > > >  * more buffers.
> > > >  */
> > > > @@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct 
> > > > virtio_device *vdev)
> > > > /* Delete virtqueues and flush outstanding callbacks if any */
> > > > vdev->config->del_vqs(vdev);
> > > > +   /* Other works can be queued before 'config->del_vqs()', so we 
> > > > flush
> > > > +* all works before to free the vsock object to avoid use after 
> > > > free.
> > > > +*/
> > > > +   virtio_vsock_flush_works(vsock);
> > > 
> > > Some questions after a quick glance:
> > > 
> > > 1) It looks to me that the work could be queued from the path of
> > > vsock_transport_cancel_pkt() . Is that synchronized here?
> > > 
> > Both virtio_transport_send_pkt() and vsock_transport_cancel_pkt() can
> > queue work from the upper layer (socket).
> > 
> > Setting the_virtio_vsock to NULL, should synchronize, but after a careful 
> > look
> > a rare issue could happen:
> > we are setting the_virtio_vsock to NULL at the start of .remove() and we
> > are freeing the object pointed by it at the end of .remove(), so
> > virtio_transport_send_pkt() or vsock_transport_cancel_pkt() may still be
> > running, accessing the object that we are freed.
> 
> 
> Yes, that's my point.
> 
> 
> > 
> > Should I use something like RCU to prevent this issue?
> > 
> >  virtio_transport_send_pkt() and vsock_transport_cancel_pkt()
> >  {
> >  rcu_read_lock();
> >  vsock = rcu_dereference(the_virtio_vsock_mutex);
> 
> 
> RCU is probably a way to go. (Like what vhost_transport_send_pkt() did).
> 

Okay, I'm going this way.

> 
> >  ...
> >  rcu_read_unlock();
> >  }
> > 
> >  virtio_vsock_remove()
> >  {
> >  rcu_assign_pointer(the_virtio_vsock_mutex, NULL);
> >  synchronize_rcu();
> > 
> >  ...
> > 
> >  free(vsock);
> >  }
> > 
> > Could there be a better approach?
> > 
> > 
> > > 2) If we decide to flush after dev_vqs(), is tx_run/rx_run/event_run still
> > > needed? It looks to me we've already done except that we need flush 
> > > rx_work
> > > in the end since send_pkt_work can requeue rx_work.
> > The main reason of tx_run/rx_run/event_run is to prevent that a worker
> > function is running while we are calling config->reset().
> > 
> > E.g. if an interrupt comes between virtio_vsock_flush_works() and
> > config->reset(), it can queue new works that can access the device while
> > we are in config->reset().
> > 
> > IMHO they are 

Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()

2019-05-30 Thread Jason Wang


On 2019/5/29 下午6:58, Stefano Garzarella wrote:

On Wed, May 29, 2019 at 11:22:40AM +0800, Jason Wang wrote:

On 2019/5/28 下午6:56, Stefano Garzarella wrote:

We flush all pending works before to call vdev->config->reset(vdev),
but other works can be queued before the vdev->config->del_vqs(vdev),
so we add another flush after it, to avoid use after free.

Suggested-by: Michael S. Tsirkin 
Signed-off-by: Stefano Garzarella 
---
   net/vmw_vsock/virtio_transport.c | 23 +--
   1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e694df10ab61..ad093ce96693 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
return ret;
   }
+static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
+{
+   flush_work(>loopback_work);
+   flush_work(>rx_work);
+   flush_work(>tx_work);
+   flush_work(>event_work);
+   flush_work(>send_pkt_work);
+}
+
   static void virtio_vsock_remove(struct virtio_device *vdev)
   {
struct virtio_vsock *vsock = vdev->priv;
@@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
mutex_lock(_virtio_vsock_mutex);
the_virtio_vsock = NULL;
-   flush_work(>loopback_work);
-   flush_work(>rx_work);
-   flush_work(>tx_work);
-   flush_work(>event_work);
-   flush_work(>send_pkt_work);
-
/* Reset all connected sockets when the device disappear */
vsock_for_each_connected_socket(virtio_vsock_reset_sock);
@@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
vsock->event_run = false;
mutex_unlock(>event_lock);
+   /* Flush all pending works */
+   virtio_vsock_flush_works(vsock);
+
/* Flush all device writes and interrupts, device will not use any
 * more buffers.
 */
@@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
/* Delete virtqueues and flush outstanding callbacks if any */
vdev->config->del_vqs(vdev);
+   /* Other works can be queued before 'config->del_vqs()', so we flush
+* all works before to free the vsock object to avoid use after free.
+*/
+   virtio_vsock_flush_works(vsock);


Some questions after a quick glance:

1) It looks to me that the work could be queued from the path of
vsock_transport_cancel_pkt() . Is that synchronized here?


Both virtio_transport_send_pkt() and vsock_transport_cancel_pkt() can
queue work from the upper layer (socket).

Setting the_virtio_vsock to NULL, should synchronize, but after a careful look
a rare issue could happen:
we are setting the_virtio_vsock to NULL at the start of .remove() and we
are freeing the object pointed by it at the end of .remove(), so
virtio_transport_send_pkt() or vsock_transport_cancel_pkt() may still be
running, accessing the object that we are freed.



Yes, that's my point.




Should I use something like RCU to prevent this issue?

 virtio_transport_send_pkt() and vsock_transport_cancel_pkt()
 {
 rcu_read_lock();
 vsock = rcu_dereference(the_virtio_vsock_mutex);



RCU is probably a way to go. (Like what vhost_transport_send_pkt() did).



 ...
 rcu_read_unlock();
 }

 virtio_vsock_remove()
 {
 rcu_assign_pointer(the_virtio_vsock_mutex, NULL);
 synchronize_rcu();

 ...

 free(vsock);
 }

Could there be a better approach?



2) If we decide to flush after dev_vqs(), is tx_run/rx_run/event_run still
needed? It looks to me we've already done except that we need flush rx_work
in the end since send_pkt_work can requeue rx_work.

The main reason of tx_run/rx_run/event_run is to prevent that a worker
function is running while we are calling config->reset().

E.g. if an interrupt comes between virtio_vsock_flush_works() and
config->reset(), it can queue new works that can access the device while
we are in config->reset().

IMHO they are still needed.

What do you think?



I mean could we simply do flush after reset once and without 
tx_rx/rx_run tricks?


rest();

virtio_vsock_flush_work();

virtio_vsock_free_buf();


Thanks





Thanks for your questions,
Stefano

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [PATCH 22/22] docs: fix broken documentation links

2019-05-30 Thread Wolfram Sang
On Wed, May 29, 2019 at 08:23:53PM -0300, Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
> 
> Signed-off-by: Mauro Carvalho Chehab 

Didn't I ack this already?

For the I2C part:

Reviewed-by: Wolfram Sang 



signature.asc
Description: PGP signature
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization