Re: [PATCH 22/22] docs: fix broken documentation links
On Thu, May 30, 2019 at 10:17:32PM +0200, Federico Vaga wrote: > On Thursday, May 30, 2019 1:23:53 AM CEST Mauro Carvalho Chehab wrote: > > Mostly due to x86 and acpi conversion, several documentation > > links are still pointing to the old file. Fix them. > > For the Italian documentation I just send I patch to fix them in a dedicated > patch Acked-by: Michael S. Tsirkin for the vhost change. > > > > Signed-off-by: Mauro Carvalho Chehab > > --- > > Documentation/acpi/dsd/leds.txt | 2 +- > > Documentation/admin-guide/kernel-parameters.rst | 6 +++--- > > Documentation/admin-guide/kernel-parameters.txt | 16 > > Documentation/admin-guide/ras.rst| 2 +- > > .../devicetree/bindings/net/fsl-enetc.txt| 7 +++ > > .../bindings/pci/amlogic,meson-pcie.txt | 2 +- > > .../bindings/regulator/qcom,rpmh-regulator.txt | 2 +- > > Documentation/devicetree/booting-without-of.txt | 2 +- > > Documentation/driver-api/gpio/board.rst | 2 +- > > Documentation/driver-api/gpio/consumer.rst | 2 +- > > .../firmware-guide/acpi/enumeration.rst | 2 +- > > .../firmware-guide/acpi/method-tracing.rst | 2 +- > > Documentation/i2c/instantiating-devices | 2 +- > > Documentation/sysctl/kernel.txt | 4 ++-- > > .../translations/it_IT/process/howto.rst | 2 +- > > .../it_IT/process/stable-kernel-rules.rst| 4 ++-- > > .../translations/zh_CN/process/4.Coding.rst | 2 +- > > Documentation/x86/x86_64/5level-paging.rst | 2 +- > > Documentation/x86/x86_64/boot-options.rst| 4 ++-- > > .../x86/x86_64/fake-numa-for-cpusets.rst | 2 +- > > MAINTAINERS | 6 +++--- > > arch/arm/Kconfig | 2 +- > > arch/arm64/kernel/kexec_image.c | 2 +- > > arch/powerpc/Kconfig | 2 +- > > arch/x86/Kconfig | 16 > > arch/x86/Kconfig.debug | 2 +- > > arch/x86/boot/header.S | 2 +- > > arch/x86/entry/entry_64.S| 2 +- > > arch/x86/include/asm/bootparam_utils.h | 2 +- > > arch/x86/include/asm/page_64_types.h | 2 +- > > arch/x86/include/asm/pgtable_64_types.h | 2 +- > > arch/x86/kernel/cpu/microcode/amd.c | 2 +- > > arch/x86/kernel/kexec-bzimage64.c| 
2 +- > > arch/x86/kernel/pci-dma.c| 2 +- > > arch/x86/mm/tlb.c| 2 +- > > arch/x86/platform/pvh/enlighten.c| 2 +- > > drivers/acpi/Kconfig | 10 +- > > drivers/net/ethernet/faraday/ftgmac100.c | 2 +- > > .../fieldbus/Documentation/fieldbus_dev.txt | 4 ++-- > > drivers/vhost/vhost.c| 2 +- > > include/acpi/acpi_drivers.h | 2 +- > > include/linux/fs_context.h | 2 +- > > include/linux/lsm_hooks.h| 2 +- > > mm/Kconfig | 2 +- > > security/Kconfig | 2 +- > > tools/include/linux/err.h| 2 +- > > tools/objtool/Documentation/stack-validation.txt | 4 ++-- > > tools/testing/selftests/x86/protection_keys.c| 2 +- > > 48 files changed, 77 insertions(+), 78 deletions(-) > > > > diff --git a/Documentation/acpi/dsd/leds.txt > > b/Documentation/acpi/dsd/leds.txt index 81a63af42ed2..cc58b1a574c5 100644 > > --- a/Documentation/acpi/dsd/leds.txt > > +++ b/Documentation/acpi/dsd/leds.txt > > @@ -96,4 +96,4 @@ where > > > > http://www.uefi.org/sites/default/files/resources/_DSD-hierarchical-da > > ta-extension-UUID-v1.1.pdf>, referenced 2019-02-21. > > > > -[7] Documentation/acpi/dsd/data-node-reference.txt > > +[7] Documentation/firmware-guide/acpi/dsd/data-node-references.rst > > diff --git a/Documentation/admin-guide/kernel-parameters.rst > > b/Documentation/admin-guide/kernel-parameters.rst index > > 0124980dca2d..8d3273e32eb1 100644 > > --- a/Documentation/admin-guide/kernel-parameters.rst > > +++ b/Documentation/admin-guide/kernel-parameters.rst > > @@ -167,7 +167,7 @@ parameter is applicable:: > > X86-32 X86-32, aka i386 architecture is enabled. > > X86-64 X86-64 architecture is enabled. > > More X86-64 boot options can be found in > > - Documentation/x86/x86_64/boot-options.txt > . > > + Documentation/x86/x86_64/boot-options.rst. > > X86 Either 32-bit or 64-bit x86 (same as X86-32+X86-64) > > X86_UV SGI UV support is enabled. 
> > XEN Xen support is enabled > > @@ -181,10 +181,10 @@ In addition, the following text indicates that the > > option:: Parameters denoted with BOOT are actually
Re: [PATCH net-next 0/6] vhost: accelerate metadata access
From: "Michael S. Tsirkin" Date: Thu, 30 May 2019 14:13:28 -0400 > On Thu, May 30, 2019 at 11:07:30AM -0700, David Miller wrote: >> From: Jason Wang >> Date: Fri, 24 May 2019 04:12:12 -0400 >> >> > This series tries to access virtqueue metadata through kernel virtual >> > address instead of copy_user() friends since they had too much >> > overheads like checks, spec barriers or even hardware feature >> > toggling like SMAP. This is done through setup kernel address through >> > direct mapping and co-operate VM management with MMU notifiers. >> > >> > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see >> > obvious improvement. >> >> I'm still waiting for some review from mst. >> >> If I don't see any review soon I will just wipe these changes from >> patchwork as it serves no purpose to just let them rot there. >> >> Thank you. > > I thought we agreed I'm merging this through my tree, not net-next. > So you can safely wipe it. Aha, I didn't catch that, thanks! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net-next 0/6] vhost: accelerate metadata access
On Thu, May 30, 2019 at 11:07:30AM -0700, David Miller wrote: > From: Jason Wang > Date: Fri, 24 May 2019 04:12:12 -0400 > > > This series tries to access virtqueue metadata through kernel virtual > > address instead of copy_user() friends since they had too much > > overheads like checks, spec barriers or even hardware feature > > toggling like SMAP. This is done through setup kernel address through > > direct mapping and co-operate VM management with MMU notifiers. > > > > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see > > obvious improvement. > > I'm still waiting for some review from mst. > > If I don't see any review soon I will just wipe these changes from > patchwork as it serves no purpose to just let them rot there. > > Thank you. I thought we agreed I'm merging this through my tree, not net-next. So you can safely wipe it. Thanks! -- MST
Re: [PATCH net-next 0/6] vhost: accelerate metadata access
From: Jason Wang Date: Fri, 24 May 2019 04:12:12 -0400 > This series tries to access virtqueue metadata through kernel virtual > address instead of copy_user() friends since they had too much > overheads like checks, spec barriers or even hardware feature > toggling like SMAP. This is done through setup kernel address through > direct mapping and co-operate VM management with MMU notifiers. > > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see > obvious improvement. I'm still waiting for some review from mst. If I don't see any review soon I will just wipe these changes from patchwork as it serves no purpose to just let them rot there. Thank you.
Re: [PATCH v8 2/7] dt-bindings: virtio: Add virtio-pci-iommu node
On Thu, May 30, 2019 at 06:09:24PM +0100, Jean-Philippe Brucker wrote: > Some systems implement virtio-iommu as a PCI endpoint. The operating > system needs to discover the relationship between IOMMU and masters long > before the PCI endpoint gets probed. Add a PCI child node to describe the > virtio-iommu device. > > The virtio-pci-iommu is conceptually split between a PCI programming > interface and a translation component on the parent bus. The latter > doesn't have a node in the device tree. The virtio-pci-iommu node > describes both, by linking the PCI endpoint to "iommus" property of DMA > master nodes and to "iommu-map" properties of bus nodes. > > Reviewed-by: Rob Herring > Reviewed-by: Eric Auger > Signed-off-by: Jean-Philippe Brucker So this is just an example right? We are not defining any new properties or anything like that. I think down the road for non dt platforms we want to put this info in the config space of the device. I do not think ACPI is the best option for this since not all systems have it. But that can wait. > --- > .../devicetree/bindings/virtio/iommu.txt | 66 +++ > 1 file changed, 66 insertions(+) > create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt > > diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt > b/Documentation/devicetree/bindings/virtio/iommu.txt > new file mode 100644 > index ..2407fea0651c > --- /dev/null > +++ b/Documentation/devicetree/bindings/virtio/iommu.txt > @@ -0,0 +1,66 @@ > +* virtio IOMMU PCI device > + > +When virtio-iommu uses the PCI transport, its programming interface is > +discovered dynamically by the PCI probing infrastructure. However the > +device tree statically describes the relation between IOMMU and DMA > +masters. Therefore, the PCI root complex that hosts the virtio-iommu > +contains a child node representing the IOMMU device explicitly. > + > +Required properties: > + > +- compatible:Should be "virtio,pci-iommu" > +- reg: PCI address of the IOMMU. 
As defined in the PCI Bus > + Binding reference [1], the reg property is a five-cell > + address encoded as (phys.hi phys.mid phys.lo size.hi > + size.lo). phys.hi should contain the device's BDF as > + 0b dfff . The other cells > + should be zero. > +- #iommu-cells: Each platform DMA master managed by the IOMMU is > assigned > + an endpoint ID, described by the "iommus" property [2]. > + For virtio-iommu, #iommu-cells must be 1. > + > +Notes: > + > +- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the > + virtio-iommu node doesn't have an "iommus" property, and is omitted from > + the iommu-map property of the root complex. > + > +Example: > + > +pcie@1000 { > + compatible = "pci-host-ecam-generic"; > + ... > + > + /* The IOMMU programming interface uses slot 00:01.0 */ > + iommu0: iommu@0008 { > + compatible = "virtio,pci-iommu"; > + reg = <0x0800 0 0 0 0>; > + #iommu-cells = <1>; > + }; > + > + /* > + * The IOMMU manages all functions in this PCI domain except > + * itself. Omit BDF 00:01.0. > + */ > + iommu-map = <0x0 0x0 0x8> > + <0x9 0x9 0xfff7>; > +}; > + > +pcie@2000 { > + compatible = "pci-host-ecam-generic"; > + ... > + /* > + * The IOMMU also manages all functions from this domain, > + * with endpoint IDs 0x1 - 0x1 > + */ > + iommu-map = <0x0 0x1 0x1>; > +}; > + > +ethernet@fe001000 { > + ... > + /* The IOMMU manages this platform device with endpoint ID 0x2 */ > + iommus = < 0x2>; > +}; > + > +[1] Documentation/devicetree/bindings/pci/pci.txt > +[2] Documentation/devicetree/bindings/iommu/iommu.txt > -- > 2.21.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v8 7/7] iommu/virtio: Add event queue
The event queue offers a way for the device to report access faults from endpoints. It is implemented on virtqueue #1. Whenever the host needs to signal a fault, it fills one of the buffers offered by the guest and interrupts it. Acked-by: Joerg Roedel Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/virtio-iommu.c | 115 +++--- include/uapi/linux/virtio_iommu.h | 19 + 2 files changed, 125 insertions(+), 9 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index 5d4947c47420..2688cdcac6e5 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -29,7 +29,8 @@ #define MSI_IOVA_LENGTH0x10 #define VIOMMU_REQUEST_VQ 0 -#define VIOMMU_NR_VQS 1 +#define VIOMMU_EVENT_VQ1 +#define VIOMMU_NR_VQS 2 struct viommu_dev { struct iommu_device iommu; @@ -41,6 +42,7 @@ struct viommu_dev { struct virtqueue*vqs[VIOMMU_NR_VQS]; spinlock_t request_lock; struct list_headrequests; + void*evts; /* Device configuration */ struct iommu_domain_geometrygeometry; @@ -86,6 +88,15 @@ struct viommu_request { charbuf[]; }; +#define VIOMMU_FAULT_RESV_MASK 0xff00 + +struct viommu_event { + union { + u32 head; + struct virtio_iommu_fault fault; + }; +}; + #define to_viommu_domain(domain) \ container_of(domain, struct viommu_domain, domain) @@ -509,6 +520,68 @@ static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) return ret; } +static int viommu_fault_handler(struct viommu_dev *viommu, + struct virtio_iommu_fault *fault) +{ + char *reason_str; + + u8 reason = fault->reason; + u32 flags = le32_to_cpu(fault->flags); + u32 endpoint= le32_to_cpu(fault->endpoint); + u64 address = le64_to_cpu(fault->address); + + switch (reason) { + case VIRTIO_IOMMU_FAULT_R_DOMAIN: + reason_str = "domain"; + break; + case VIRTIO_IOMMU_FAULT_R_MAPPING: + reason_str = "page"; + break; + case VIRTIO_IOMMU_FAULT_R_UNKNOWN: + default: + reason_str = "unknown"; + break; + } + + /* TODO: find EP by ID and 
report_iommu_fault */ + if (flags & VIRTIO_IOMMU_FAULT_F_ADDRESS) + dev_err_ratelimited(viommu->dev, "%s fault from EP %u at %#llx [%s%s%s]\n", + reason_str, endpoint, address, + flags & VIRTIO_IOMMU_FAULT_F_READ ? "R" : "", + flags & VIRTIO_IOMMU_FAULT_F_WRITE ? "W" : "", + flags & VIRTIO_IOMMU_FAULT_F_EXEC ? "X" : ""); + else + dev_err_ratelimited(viommu->dev, "%s fault from EP %u\n", + reason_str, endpoint); + return 0; +} + +static void viommu_event_handler(struct virtqueue *vq) +{ + int ret; + unsigned int len; + struct scatterlist sg[1]; + struct viommu_event *evt; + struct viommu_dev *viommu = vq->vdev->priv; + + while ((evt = virtqueue_get_buf(vq, )) != NULL) { + if (len > sizeof(*evt)) { + dev_err(viommu->dev, + "invalid event buffer (len %u != %zu)\n", + len, sizeof(*evt)); + } else if (!(evt->head & VIOMMU_FAULT_RESV_MASK)) { + viommu_fault_handler(viommu, >fault); + } + + sg_init_one(sg, evt, sizeof(*evt)); + ret = virtqueue_add_inbuf(vq, sg, 1, evt, GFP_ATOMIC); + if (ret) + dev_err(viommu->dev, "could not add event buffer\n"); + } + + virtqueue_kick(vq); +} + /* IOMMU API */ static struct iommu_domain *viommu_domain_alloc(unsigned type) @@ -895,16 +968,35 @@ static struct iommu_ops viommu_ops = { static int viommu_init_vqs(struct viommu_dev *viommu) { struct virtio_device *vdev = dev_to_virtio(viommu->dev); - const char *name = "request"; - void *ret; + const char *names[] = { "request", "event" }; + vq_callback_t *callbacks[] = { + NULL, /* No async requests */ + viommu_event_handler, + }; - ret = virtio_find_single_vq(vdev, NULL, name); - if (IS_ERR(ret)) { - dev_err(viommu->dev, "cannot find VQ\n"); - return PTR_ERR(ret); - } + return virtio_find_vqs(vdev, VIOMMU_NR_VQS, viommu->vqs, callbacks, +
[PATCH v8 6/7] iommu/virtio: Add probe request
When the device offers the probe feature, send a probe request for each device managed by the IOMMU. Extract RESV_MEM information. When we encounter a MSI doorbell region, set it up as a IOMMU_RESV_MSI region. This will tell other subsystems that there is no need to map the MSI doorbell in the virtio-iommu, because MSIs bypass it. Acked-by: Joerg Roedel Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- drivers/iommu/virtio-iommu.c | 157 -- include/uapi/linux/virtio_iommu.h | 36 +++ 2 files changed, 187 insertions(+), 6 deletions(-) diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c index b2719a87c3c5..5d4947c47420 100644 --- a/drivers/iommu/virtio-iommu.c +++ b/drivers/iommu/virtio-iommu.c @@ -49,6 +49,7 @@ struct viommu_dev { u32 last_domain; /* Supported MAP flags */ u32 map_flags; + u32 probe_size; }; struct viommu_mapping { @@ -71,8 +72,10 @@ struct viommu_domain { }; struct viommu_endpoint { + struct device *dev; struct viommu_dev *viommu; struct viommu_domain*vdomain; + struct list_headresv_regions; }; struct viommu_request { @@ -125,6 +128,9 @@ static off_t viommu_get_write_desc_offset(struct viommu_dev *viommu, { size_t tail_size = sizeof(struct virtio_iommu_req_tail); + if (req->type == VIRTIO_IOMMU_T_PROBE) + return len - viommu->probe_size - tail_size; + return len - tail_size; } @@ -399,6 +405,110 @@ static int viommu_replay_mappings(struct viommu_domain *vdomain) return ret; } +static int viommu_add_resv_mem(struct viommu_endpoint *vdev, + struct virtio_iommu_probe_resv_mem *mem, + size_t len) +{ + size_t size; + u64 start64, end64; + phys_addr_t start, end; + struct iommu_resv_region *region = NULL; + unsigned long prot = IOMMU_WRITE | IOMMU_NOEXEC | IOMMU_MMIO; + + start = start64 = le64_to_cpu(mem->start); + end = end64 = le64_to_cpu(mem->end); + size = end64 - start64 + 1; + + /* Catch any overflow, including the unlikely end64 - start64 + 1 = 0 */ + if (start != start64 || end != end64 || size < end64 - 
start64) + return -EOVERFLOW; + + if (len < sizeof(*mem)) + return -EINVAL; + + switch (mem->subtype) { + default: + dev_warn(vdev->dev, "unknown resv mem subtype 0x%x\n", +mem->subtype); + /* Fall-through */ + case VIRTIO_IOMMU_RESV_MEM_T_RESERVED: + region = iommu_alloc_resv_region(start, size, 0, +IOMMU_RESV_RESERVED); + break; + case VIRTIO_IOMMU_RESV_MEM_T_MSI: + region = iommu_alloc_resv_region(start, size, prot, +IOMMU_RESV_MSI); + break; + } + if (!region) + return -ENOMEM; + + list_add(>resv_regions, >list); + return 0; +} + +static int viommu_probe_endpoint(struct viommu_dev *viommu, struct device *dev) +{ + int ret; + u16 type, len; + size_t cur = 0; + size_t probe_len; + struct virtio_iommu_req_probe *probe; + struct virtio_iommu_probe_property *prop; + struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev); + struct viommu_endpoint *vdev = fwspec->iommu_priv; + + if (!fwspec->num_ids) + return -EINVAL; + + probe_len = sizeof(*probe) + viommu->probe_size + + sizeof(struct virtio_iommu_req_tail); + probe = kzalloc(probe_len, GFP_KERNEL); + if (!probe) + return -ENOMEM; + + probe->head.type = VIRTIO_IOMMU_T_PROBE; + /* +* For now, assume that properties of an endpoint that outputs multiple +* IDs are consistent. Only probe the first one. +*/ + probe->endpoint = cpu_to_le32(fwspec->ids[0]); + + ret = viommu_send_req_sync(viommu, probe, probe_len); + if (ret) + goto out_free; + + prop = (void *)probe->properties; + type = le16_to_cpu(prop->type) & VIRTIO_IOMMU_PROBE_T_MASK; + + while (type != VIRTIO_IOMMU_PROBE_T_NONE && + cur < viommu->probe_size) { + len = le16_to_cpu(prop->length) + sizeof(*prop); + + switch (type) { + case VIRTIO_IOMMU_PROBE_T_RESV_MEM: + ret = viommu_add_resv_mem(vdev, (void *)prop, len); + break; + default: + dev_err(dev, "unknown viommu prop 0x%x\n", type); + } + + if (ret) + dev_err(dev, "failed to parse viommu prop
[PATCH v8 5/7] iommu: Add virtio-iommu driver
The virtio IOMMU is a para-virtualized device, allowing to send IOMMU requests such as map/unmap over virtio transport without emulating page tables. This implementation handles ATTACH, DETACH, MAP and UNMAP requests. The bulk of the code transforms calls coming from the IOMMU API into corresponding virtio requests. Mappings are kept in an interval tree instead of page tables. A little more work is required for modular and x86 support, so for the moment the driver depends on CONFIG_VIRTIO=y and CONFIG_ARM64. Acked-by: Joerg Roedel Signed-off-by: Jean-Philippe Brucker --- MAINTAINERS | 7 + drivers/iommu/Kconfig | 11 + drivers/iommu/Makefile| 1 + drivers/iommu/virtio-iommu.c | 934 ++ include/uapi/linux/virtio_ids.h | 1 + include/uapi/linux/virtio_iommu.h | 110 6 files changed, 1064 insertions(+) create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h diff --git a/MAINTAINERS b/MAINTAINERS index 429c6c624861..62bd1834d95a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16807,6 +16807,13 @@ S: Maintained F: drivers/virtio/virtio_input.c F: include/uapi/linux/virtio_input.h +VIRTIO IOMMU DRIVER +M: Jean-Philippe Brucker +L: virtualization@lists.linux-foundation.org +S: Maintained +F: drivers/iommu/virtio-iommu.c +F: include/uapi/linux/virtio_iommu.h + VIRTUAL BOX GUEST DEVICE DRIVER M: Hans de Goede M: Arnd Bergmann diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig index 83664db5221d..e15cdcd8cb3c 100644 --- a/drivers/iommu/Kconfig +++ b/drivers/iommu/Kconfig @@ -473,4 +473,15 @@ config HYPERV_IOMMU Stub IOMMU driver to handle IRQs as to allow Hyper-V Linux guests to run with x2APIC mode enabled. +config VIRTIO_IOMMU + bool "Virtio IOMMU driver" + depends on VIRTIO=y + depends on ARM64 + select IOMMU_API + select INTERVAL_TREE + help + Para-virtualised IOMMU driver with virtio. + + Say Y here if you intend to run this kernel as a guest. 
+ endif # IOMMU_SUPPORT diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 8c71a15e986b..f13f36ae1af6 100644 --- a/drivers/iommu/Makefile +++ b/drivers/iommu/Makefile @@ -33,3 +33,4 @@ obj-$(CONFIG_FSL_PAMU) += fsl_pamu.o fsl_pamu_domain.o obj-$(CONFIG_S390_IOMMU) += s390-iommu.o obj-$(CONFIG_QCOM_IOMMU) += qcom_iommu.o obj-$(CONFIG_HYPERV_IOMMU) += hyperv-iommu.o +obj-$(CONFIG_VIRTIO_IOMMU) += virtio-iommu.o diff --git a/drivers/iommu/virtio-iommu.c b/drivers/iommu/virtio-iommu.c new file mode 100644 index ..b2719a87c3c5 --- /dev/null +++ b/drivers/iommu/virtio-iommu.c @@ -0,0 +1,934 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Virtio driver for the paravirtualized IOMMU + * + * Copyright (C) 2019 Arm Limited + */ + +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define MSI_IOVA_BASE 0x800 +#define MSI_IOVA_LENGTH0x10 + +#define VIOMMU_REQUEST_VQ 0 +#define VIOMMU_NR_VQS 1 + +struct viommu_dev { + struct iommu_device iommu; + struct device *dev; + struct virtio_device*vdev; + + struct ida domain_ids; + + struct virtqueue*vqs[VIOMMU_NR_VQS]; + spinlock_t request_lock; + struct list_headrequests; + + /* Device configuration */ + struct iommu_domain_geometrygeometry; + u64 pgsize_bitmap; + u32 first_domain; + u32 last_domain; + /* Supported MAP flags */ + u32 map_flags; +}; + +struct viommu_mapping { + phys_addr_t paddr; + struct interval_tree_node iova; + u32 flags; +}; + +struct viommu_domain { + struct iommu_domain domain; + struct viommu_dev *viommu; + struct mutexmutex; /* protects viommu pointer */ + unsigned intid; + u32 map_flags; + + spinlock_t mappings_lock; + struct rb_root_cached mappings; + + unsigned long nr_endpoints; +}; + +struct viommu_endpoint { + struct viommu_dev *viommu; + struct viommu_domain*vdomain; +}; + +struct viommu_request { + struct list_headlist; + 
void*writeback; + unsigned intwrite_offset; +
[PATCH v8 4/7] PCI: OF: Initialize dev->fwnode appropriately
For PCI devices that have an OF node, set the fwnode as well. This way drivers that rely on fwnode don't need the special case described by commit f94277af03ea ("of/platform: Initialise dev->fwnode appropriately"). Acked-by: Bjorn Helgaas Signed-off-by: Jean-Philippe Brucker --- drivers/pci/of.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/pci/of.c b/drivers/pci/of.c index 73d5adec0a28..c4f1b5507b40 100644 --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -22,12 +22,15 @@ void pci_set_of_node(struct pci_dev *dev) return; dev->dev.of_node = of_pci_find_child_device(dev->bus->dev.of_node, dev->devfn); + if (dev->dev.of_node) + dev->dev.fwnode = &dev->dev.of_node->fwnode; } void pci_release_of_node(struct pci_dev *dev) { of_node_put(dev->dev.of_node); dev->dev.of_node = NULL; + dev->dev.fwnode = NULL; } void pci_set_bus_of_node(struct pci_bus *bus) @@ -42,12 +45,15 @@ void pci_set_bus_of_node(struct pci_bus *bus) bus->self->untrusted = true; } bus->dev.of_node = node; + if (node) + bus->dev.fwnode = &node->fwnode; } void pci_release_bus_of_node(struct pci_bus *bus) { of_node_put(bus->dev.of_node); bus->dev.of_node = NULL; + bus->dev.fwnode = NULL; } struct device_node * __weak pcibios_get_phb_of_node(struct pci_bus *bus) -- 2.21.0
[PATCH v8 3/7] of: Allow the iommu-map property to omit untranslated devices
In PCI root complex nodes, the iommu-map property describes the IOMMU that translates each endpoint. On some platforms, the IOMMU itself is presented as a PCI endpoint (e.g. AMD IOMMU and virtio-iommu). This isn't supported by the current OF driver, which expects all endpoints to have an IOMMU. Allow the iommu-map property to have gaps. Relaxing of_map_rid() also allows the msi-map property to have gaps, which is invalid since MSIs always reach an MSI controller. In that case pci_msi_setup_msi_irqs() will return an error when attempting to find the device's MSI domain. Reviewed-by: Rob Herring Signed-off-by: Jean-Philippe Brucker --- drivers/of/base.c | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/of/base.c b/drivers/of/base.c index 20e0e7ee4edf..55e7f5bb0549 100644 --- a/drivers/of/base.c +++ b/drivers/of/base.c @@ -2294,8 +2294,12 @@ int of_map_rid(struct device_node *np, u32 rid, return 0; } - pr_err("%pOF: Invalid %s translation - no match for rid 0x%x on %pOF\n", - np, map_name, rid, target && *target ? *target : NULL); - return -EFAULT; + pr_info("%pOF: no %s translation for rid 0x%x on %pOF\n", np, map_name, + rid, target && *target ? *target : NULL); + + /* Bypasses translation */ + if (id_out) + *id_out = rid; + return 0; } EXPORT_SYMBOL_GPL(of_map_rid); -- 2.21.0
[PATCH v8 2/7] dt-bindings: virtio: Add virtio-pci-iommu node
Some systems implement virtio-iommu as a PCI endpoint. The operating system needs to discover the relationship between IOMMU and masters long before the PCI endpoint gets probed. Add a PCI child node to describe the virtio-iommu device. The virtio-pci-iommu is conceptually split between a PCI programming interface and a translation component on the parent bus. The latter doesn't have a node in the device tree. The virtio-pci-iommu node describes both, by linking the PCI endpoint to "iommus" property of DMA master nodes and to "iommu-map" properties of bus nodes. Reviewed-by: Rob Herring Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- .../devicetree/bindings/virtio/iommu.txt | 66 +++ 1 file changed, 66 insertions(+) create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt diff --git a/Documentation/devicetree/bindings/virtio/iommu.txt b/Documentation/devicetree/bindings/virtio/iommu.txt new file mode 100644 index ..2407fea0651c --- /dev/null +++ b/Documentation/devicetree/bindings/virtio/iommu.txt @@ -0,0 +1,66 @@ +* virtio IOMMU PCI device + +When virtio-iommu uses the PCI transport, its programming interface is +discovered dynamically by the PCI probing infrastructure. However the +device tree statically describes the relation between IOMMU and DMA +masters. Therefore, the PCI root complex that hosts the virtio-iommu +contains a child node representing the IOMMU device explicitly. + +Required properties: + +- compatible: Should be "virtio,pci-iommu" +- reg: PCI address of the IOMMU. As defined in the PCI Bus + Binding reference [1], the reg property is a five-cell + address encoded as (phys.hi phys.mid phys.lo size.hi + size.lo). phys.hi should contain the device's BDF as + 0b dfff . The other cells + should be zero. +- #iommu-cells:Each platform DMA master managed by the IOMMU is assigned + an endpoint ID, described by the "iommus" property [2]. + For virtio-iommu, #iommu-cells must be 1. 
+ +Notes: + +- DMA from the IOMMU device isn't managed by another IOMMU. Therefore the + virtio-iommu node doesn't have an "iommus" property, and is omitted from + the iommu-map property of the root complex. + +Example: + +pcie@1000 { + compatible = "pci-host-ecam-generic"; + ... + + /* The IOMMU programming interface uses slot 00:01.0 */ + iommu0: iommu@0008 { + compatible = "virtio,pci-iommu"; + reg = <0x0800 0 0 0 0>; + #iommu-cells = <1>; + }; + + /* +* The IOMMU manages all functions in this PCI domain except +* itself. Omit BDF 00:01.0. +*/ + iommu-map = <0x0 0x0 0x8> + <0x9 0x9 0xfff7>; +}; + +pcie@2000 { + compatible = "pci-host-ecam-generic"; + ... + /* +* The IOMMU also manages all functions from this domain, +* with endpoint IDs 0x1 - 0x1 +*/ + iommu-map = <0x0 0x1 0x1>; +}; + +ethernet@fe001000 { + ... + /* The IOMMU manages this platform device with endpoint ID 0x2 */ + iommus = < 0x2>; +}; + +[1] Documentation/devicetree/bindings/pci/pci.txt +[2] Documentation/devicetree/bindings/iommu/iommu.txt -- 2.21.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH v8 1/7] dt-bindings: virtio-mmio: Add IOMMU description
The nature of a virtio-mmio node is discovered by the virtio driver at probe time. However the DMA relation between devices must be described statically. When a virtio-mmio node is a virtio-iommu device, it needs an "#iommu-cells" property as specified by bindings/iommu/iommu.txt. Otherwise, the virtio-mmio device may perform DMA through an IOMMU, which requires an "iommus" property. Describe these requirements in the device-tree bindings documentation. Reviewed-by: Rob Herring Reviewed-by: Eric Auger Signed-off-by: Jean-Philippe Brucker --- .../devicetree/bindings/virtio/mmio.txt | 30 +++ 1 file changed, 30 insertions(+) diff --git a/Documentation/devicetree/bindings/virtio/mmio.txt b/Documentation/devicetree/bindings/virtio/mmio.txt index 5069c1b8e193..21af30fbb81f 100644 --- a/Documentation/devicetree/bindings/virtio/mmio.txt +++ b/Documentation/devicetree/bindings/virtio/mmio.txt @@ -8,10 +8,40 @@ Required properties: - reg: control registers base address and size including configuration space - interrupts: interrupt generated by the device +Required properties for virtio-iommu: + +- #iommu-cells:When the node corresponds to a virtio-iommu device, it is + linked to DMA masters using the "iommus" or "iommu-map" + properties [1][2]. #iommu-cells specifies the size of the + "iommus" property. For virtio-iommu #iommu-cells must be + 1, each cell describing a single endpoint ID. + +Optional properties: + +- iommus: If the device accesses memory through an IOMMU, it should + have an "iommus" property [1]. Since virtio-iommu itself + does not access memory through an IOMMU, the "virtio,mmio" + node cannot have both an "#iommu-cells" and an "iommus" + property. 
+ Example: virtio_block@3000 { compatible = "virtio,mmio"; reg = <0x3000 0x100>; interrupts = <41>; + + /* Device has endpoint ID 23 */ + iommus = <&viommu 23> } + + viommu: iommu@3100 { + compatible = "virtio,mmio"; + reg = <0x3100 0x100>; + interrupts = <42>; + + #iommu-cells = <1> + } + +[1] Documentation/devicetree/bindings/iommu/iommu.txt +[2] Documentation/devicetree/bindings/pci/pci-iommu.txt -- 2.21.0
[PATCH v8 0/7] Add virtio-iommu driver
Implement the virtio-iommu driver, following specification v0.12 [1]. Since last version [2] we've worked on improving the specification, which resulted in the following changes to the interface: * Remove the EXEC flag. * Add feature bit for the MMIO flag. * Change domain_bits to domain_range. Given that there were small changes to patch 5/7, I removed the review and test tags. Please find the code at [3]. [1] Virtio-iommu specification v0.12, sources and pdf git://linux-arm.org/virtio-iommu.git virtio-iommu/v0.12 http://jpbrucker.net/virtio-iommu/spec/v0.12/virtio-iommu-v0.12.pdf http://jpbrucker.net/virtio-iommu/spec/diffs/virtio-iommu-dev-diff-v0.11-v0.12.pdf [2] [PATCH v7 0/7] Add virtio-iommu driver https://lore.kernel.org/linux-pci/0ba215f5-e856-bf31-8dd9-a85710714...@arm.com/T/ [3] git://linux-arm.org/linux-jpb.git virtio-iommu/v0.12 git://linux-arm.org/kvmtool-jpb.git virtio-iommu/v0.12 Jean-Philippe Brucker (7): dt-bindings: virtio-mmio: Add IOMMU description dt-bindings: virtio: Add virtio-pci-iommu node of: Allow the iommu-map property to omit untranslated devices PCI: OF: Initialize dev->fwnode appropriately iommu: Add virtio-iommu driver iommu/virtio: Add probe request iommu/virtio: Add event queue .../devicetree/bindings/virtio/iommu.txt | 66 + .../devicetree/bindings/virtio/mmio.txt | 30 + MAINTAINERS |7 + drivers/iommu/Kconfig | 11 + drivers/iommu/Makefile|1 + drivers/iommu/virtio-iommu.c | 1176 + drivers/of/base.c | 10 +- drivers/pci/of.c |6 + include/uapi/linux/virtio_ids.h |1 + include/uapi/linux/virtio_iommu.h | 165 +++ 10 files changed, 1470 insertions(+), 3 deletions(-) create mode 100644 Documentation/devicetree/bindings/virtio/iommu.txt create mode 100644 drivers/iommu/virtio-iommu.c create mode 100644 include/uapi/linux/virtio_iommu.h -- 2.21.0 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()
On 2019/5/30 6:10 PM, Stefano Garzarella wrote:

On Thu, May 30, 2019 at 05:46:18PM +0800, Jason Wang wrote:

On 2019/5/29 6:58 PM, Stefano Garzarella wrote:

On Wed, May 29, 2019 at 11:22:40AM +0800, Jason Wang wrote:

On 2019/5/28 6:56 PM, Stefano Garzarella wrote:

We flush all pending works before calling vdev->config->reset(vdev), but other works can be queued before vdev->config->del_vqs(vdev), so we add another flush after it, to avoid a use after free.

Suggested-by: Michael S. Tsirkin
Signed-off-by: Stefano Garzarella
---
 net/vmw_vsock/virtio_transport.c | 23 +-
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e694df10ab61..ad093ce96693 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	return ret;
 }

+static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
+{
+	flush_work(&vsock->loopback_work);
+	flush_work(&vsock->rx_work);
+	flush_work(&vsock->tx_work);
+	flush_work(&vsock->event_work);
+	flush_work(&vsock->send_pkt_work);
+}
+
 static void virtio_vsock_remove(struct virtio_device *vdev)
 {
 	struct virtio_vsock *vsock = vdev->priv;
@@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	mutex_lock(&the_virtio_vsock_mutex);
 	the_virtio_vsock = NULL;

-	flush_work(&vsock->loopback_work);
-	flush_work(&vsock->rx_work);
-	flush_work(&vsock->tx_work);
-	flush_work(&vsock->event_work);
-	flush_work(&vsock->send_pkt_work);
-
 	/* Reset all connected sockets when the device disappear */
 	vsock_for_each_connected_socket(virtio_vsock_reset_sock);

@@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	vsock->event_run = false;
 	mutex_unlock(&vsock->event_lock);

+	/* Flush all pending works */
+	virtio_vsock_flush_works(vsock);
+
 	/* Flush all device writes and interrupts, device will not use any
 	 * more buffers.
 	 */
@@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	/* Delete virtqueues and flush outstanding callbacks if any */
 	vdev->config->del_vqs(vdev);

+	/* Other works can be queued before 'config->del_vqs()', so we flush
+	 * all works before freeing the vsock object to avoid a use after free.
+	 */
+	virtio_vsock_flush_works(vsock);

Some questions after a quick glance:

1) It looks to me that the work could be queued from the path of vsock_transport_cancel_pkt(). Is that synchronized here?

Both virtio_transport_send_pkt() and vsock_transport_cancel_pkt() can queue work from the upper layer (socket).

Setting the_virtio_vsock to NULL should synchronize, but after a careful look a rare issue could happen: we set the_virtio_vsock to NULL at the start of .remove() and we free the object pointed to by it at the end of .remove(), so virtio_transport_send_pkt() or vsock_transport_cancel_pkt() may still be running, accessing the object that we have freed.

Yes, that's my point.

Should I use something like RCU to prevent this issue?

virtio_transport_send_pkt() and vsock_transport_cancel_pkt()
{
	rcu_read_lock();
	vsock = rcu_dereference(the_virtio_vsock);

RCU is probably a way to go. (Like what vhost_transport_send_pkt() did).

Okay, I'm going this way.

	...
	rcu_read_unlock();
}

virtio_vsock_remove()
{
	rcu_assign_pointer(the_virtio_vsock, NULL);
	synchronize_rcu();

	...

	free(vsock);
}

Could there be a better approach?

2) If we decide to flush after del_vqs(), is tx_run/rx_run/event_run still needed? It looks to me we've already done that, except that we need to flush rx_work in the end since send_pkt_work can requeue rx_work.

The main reason for tx_run/rx_run/event_run is to prevent a worker function from running while we are calling config->reset().

E.g. if an interrupt comes between virtio_vsock_flush_works() and config->reset(), it can queue new works that can access the device while we are in config->reset().

IMHO they are still needed.

What do you think?

I mean could we simply do flush after reset once and without the tx_run/rx_run tricks?

reset();
virtio_vsock_flush_works();
virtio_vsock_free_buf();

My only doubt is: is it safe to call config->reset() while a worker function could access the device?

I had this doubt reading Michael's advice [1] and looking at virtnet_remove(), where these lines come before the config->reset():

/* Make sure no work handler is accessing the device. */
flush_work(&vi->config_work);

Thanks,
Stefano

[1] https://lore.kernel.org/netdev/20190521055650-mutt-send-email-...@kernel.org

Good point. Then I agree with you. But if we can use the RCU to detect the detach of
Re: [PATCH 1/4] vsock/virtio: fix locking around 'the_virtio_vsock'
On Wed, May 29, 2019 at 09:28:52PM -0700, David Miller wrote:
> From: Stefano Garzarella
> Date: Tue, 28 May 2019 12:56:20 +0200
>
> > @@ -68,7 +68,13 @@ struct virtio_vsock {
> >
> >  static struct virtio_vsock *virtio_vsock_get(void)
> >  {
> > -	return the_virtio_vsock;
> > +	struct virtio_vsock *vsock;
> > +
> > +	mutex_lock(&the_virtio_vsock_mutex);
> > +	vsock = the_virtio_vsock;
> > +	mutex_unlock(&the_virtio_vsock_mutex);
> > +
> > +	return vsock;
>
> This doesn't do anything as far as I can tell.
>
> No matter what, you will either get the value before it's changed or
> after it's changed.
>
> Since you should never publish the pointer by assigning it until the
> object is fully initialized, this can never be a problem even without
> the mutex being there.
>
> Even if you sampled the the_virtio_vsock value right before it's being
> set to NULL by the remove function, that still can happen with the
> mutex held too.

Yes, I found that out when I was answering Jason's question [1]. :(

I proposed using RCU to solve this issue, do you agree? Let me know if there is a better way.

> This function is also terribly named btw, it implies that a reference
> count is being taken. But that's not what this function does, it
> just returns the pointer value as-is.

What do you think if I remove the function and use the_virtio_vsock directly? (It should be easier to use with the RCU API.)

Thanks,
Stefano

[1] https://lore.kernel.org/netdev/20190529105832.oz3sagbne5teq3nt@steredhat
Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()
On Thu, May 30, 2019 at 05:46:18PM +0800, Jason Wang wrote:
>
> On 2019/5/29 6:58 PM, Stefano Garzarella wrote:
> > On Wed, May 29, 2019 at 11:22:40AM +0800, Jason Wang wrote:
> > > On 2019/5/28 6:56 PM, Stefano Garzarella wrote:
> > > > We flush all pending works before calling vdev->config->reset(vdev),
> > > > but other works can be queued before vdev->config->del_vqs(vdev),
> > > > so we add another flush after it, to avoid a use after free.
> > > >
> > > > Suggested-by: Michael S. Tsirkin
> > > > Signed-off-by: Stefano Garzarella
> > > > ---
> > > >  net/vmw_vsock/virtio_transport.c | 23 +-
> > > >  1 file changed, 17 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
> > > > index e694df10ab61..ad093ce96693 100644
> > > > --- a/net/vmw_vsock/virtio_transport.c
> > > > +++ b/net/vmw_vsock/virtio_transport.c
> > > > @@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
> > > >  	return ret;
> > > >  }
> > > > +static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
> > > > +{
> > > > +	flush_work(&vsock->loopback_work);
> > > > +	flush_work(&vsock->rx_work);
> > > > +	flush_work(&vsock->tx_work);
> > > > +	flush_work(&vsock->event_work);
> > > > +	flush_work(&vsock->send_pkt_work);
> > > > +}
> > > > +
> > > >  static void virtio_vsock_remove(struct virtio_device *vdev)
> > > >  {
> > > >  	struct virtio_vsock *vsock = vdev->priv;
> > > > @@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> > > >  	mutex_lock(&the_virtio_vsock_mutex);
> > > >  	the_virtio_vsock = NULL;
> > > > -	flush_work(&vsock->loopback_work);
> > > > -	flush_work(&vsock->rx_work);
> > > > -	flush_work(&vsock->tx_work);
> > > > -	flush_work(&vsock->event_work);
> > > > -	flush_work(&vsock->send_pkt_work);
> > > > -
> > > >  	/* Reset all connected sockets when the device disappear */
> > > >  	vsock_for_each_connected_socket(virtio_vsock_reset_sock);
> > > > @@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> > > >  	vsock->event_run = false;
> > > >  	mutex_unlock(&vsock->event_lock);
> > > > +	/* Flush all pending works */
> > > > +	virtio_vsock_flush_works(vsock);
> > > > +
> > > >  	/* Flush all device writes and interrupts, device will not use any
> > > >  	 * more buffers.
> > > >  	 */
> > > > @@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
> > > >  	/* Delete virtqueues and flush outstanding callbacks if any */
> > > >  	vdev->config->del_vqs(vdev);
> > > > +	/* Other works can be queued before 'config->del_vqs()', so we flush
> > > > +	 * all works before freeing the vsock object to avoid a use after free.
> > > > +	 */
> > > > +	virtio_vsock_flush_works(vsock);
> > >
> > > Some questions after a quick glance:
> > >
> > > 1) It looks to me that the work could be queued from the path of
> > > vsock_transport_cancel_pkt(). Is that synchronized here?
> >
> > Both virtio_transport_send_pkt() and vsock_transport_cancel_pkt() can
> > queue work from the upper layer (socket).
> >
> > Setting the_virtio_vsock to NULL should synchronize, but after a careful
> > look a rare issue could happen: we set the_virtio_vsock to NULL at the
> > start of .remove() and we free the object pointed to by it at the end of
> > .remove(), so virtio_transport_send_pkt() or vsock_transport_cancel_pkt()
> > may still be running, accessing the object that we have freed.
>
> Yes, that's my point.
>
> > Should I use something like RCU to prevent this issue?
> >
> > virtio_transport_send_pkt() and vsock_transport_cancel_pkt()
> > {
> > 	rcu_read_lock();
> > 	vsock = rcu_dereference(the_virtio_vsock);
>
> RCU is probably a way to go. (Like what vhost_transport_send_pkt() did).

Okay, I'm going this way.

> > 	...
> > 	rcu_read_unlock();
> > }
> >
> > virtio_vsock_remove()
> > {
> > 	rcu_assign_pointer(the_virtio_vsock, NULL);
> > 	synchronize_rcu();
> >
> > 	...
> >
> > 	free(vsock);
> > }
> >
> > Could there be a better approach?
> >
> > > 2) If we decide to flush after del_vqs(), is tx_run/rx_run/event_run
> > > still needed? It looks to me we've already done that, except that we
> > > need to flush rx_work in the end since send_pkt_work can requeue rx_work.
> >
> > The main reason for tx_run/rx_run/event_run is to prevent a worker
> > function from running while we are calling config->reset().
> >
> > E.g. if an interrupt comes between virtio_vsock_flush_works() and
> > config->reset(), it can queue new works that can access the device while
> > we are in config->reset().
> >
> > IMHO they are
Re: [PATCH 3/4] vsock/virtio: fix flush of works during the .remove()
On 2019/5/29 6:58 PM, Stefano Garzarella wrote:

On Wed, May 29, 2019 at 11:22:40AM +0800, Jason Wang wrote:

On 2019/5/28 6:56 PM, Stefano Garzarella wrote:

We flush all pending works before calling vdev->config->reset(vdev), but other works can be queued before vdev->config->del_vqs(vdev), so we add another flush after it, to avoid a use after free.

Suggested-by: Michael S. Tsirkin
Signed-off-by: Stefano Garzarella
---
 net/vmw_vsock/virtio_transport.c | 23 +-
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index e694df10ab61..ad093ce96693 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -660,6 +660,15 @@ static int virtio_vsock_probe(struct virtio_device *vdev)
 	return ret;
 }

+static void virtio_vsock_flush_works(struct virtio_vsock *vsock)
+{
+	flush_work(&vsock->loopback_work);
+	flush_work(&vsock->rx_work);
+	flush_work(&vsock->tx_work);
+	flush_work(&vsock->event_work);
+	flush_work(&vsock->send_pkt_work);
+}
+
 static void virtio_vsock_remove(struct virtio_device *vdev)
 {
 	struct virtio_vsock *vsock = vdev->priv;
@@ -668,12 +677,6 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	mutex_lock(&the_virtio_vsock_mutex);
 	the_virtio_vsock = NULL;

-	flush_work(&vsock->loopback_work);
-	flush_work(&vsock->rx_work);
-	flush_work(&vsock->tx_work);
-	flush_work(&vsock->event_work);
-	flush_work(&vsock->send_pkt_work);
-
 	/* Reset all connected sockets when the device disappear */
 	vsock_for_each_connected_socket(virtio_vsock_reset_sock);

@@ -690,6 +693,9 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	vsock->event_run = false;
 	mutex_unlock(&vsock->event_lock);

+	/* Flush all pending works */
+	virtio_vsock_flush_works(vsock);
+
 	/* Flush all device writes and interrupts, device will not use any
 	 * more buffers.
 	 */
@@ -726,6 +732,11 @@ static void virtio_vsock_remove(struct virtio_device *vdev)
 	/* Delete virtqueues and flush outstanding callbacks if any */
 	vdev->config->del_vqs(vdev);

+	/* Other works can be queued before 'config->del_vqs()', so we flush
+	 * all works before freeing the vsock object to avoid a use after free.
+	 */
+	virtio_vsock_flush_works(vsock);

Some questions after a quick glance:

1) It looks to me that the work could be queued from the path of vsock_transport_cancel_pkt(). Is that synchronized here?

Both virtio_transport_send_pkt() and vsock_transport_cancel_pkt() can queue work from the upper layer (socket).

Setting the_virtio_vsock to NULL should synchronize, but after a careful look a rare issue could happen: we set the_virtio_vsock to NULL at the start of .remove() and we free the object pointed to by it at the end of .remove(), so virtio_transport_send_pkt() or vsock_transport_cancel_pkt() may still be running, accessing the object that we have freed.

Yes, that's my point.

Should I use something like RCU to prevent this issue?

virtio_transport_send_pkt() and vsock_transport_cancel_pkt()
{
	rcu_read_lock();
	vsock = rcu_dereference(the_virtio_vsock);

RCU is probably a way to go. (Like what vhost_transport_send_pkt() did).

	...
	rcu_read_unlock();
}

virtio_vsock_remove()
{
	rcu_assign_pointer(the_virtio_vsock, NULL);
	synchronize_rcu();

	...

	free(vsock);
}

Could there be a better approach?

2) If we decide to flush after del_vqs(), is tx_run/rx_run/event_run still needed? It looks to me we've already done that, except that we need to flush rx_work in the end since send_pkt_work can requeue rx_work.

The main reason for tx_run/rx_run/event_run is to prevent a worker function from running while we are calling config->reset().

E.g. if an interrupt comes between virtio_vsock_flush_works() and config->reset(), it can queue new works that can access the device while we are in config->reset().

IMHO they are still needed.

What do you think?

I mean could we simply do flush after reset once and without the tx_run/rx_run tricks?

reset();
virtio_vsock_flush_works();
virtio_vsock_free_buf();

Thanks

Thanks for your questions,
Stefano
Re: [PATCH 22/22] docs: fix broken documentation links
On Wed, May 29, 2019 at 08:23:53PM -0300, Mauro Carvalho Chehab wrote:
> Mostly due to x86 and acpi conversion, several documentation
> links are still pointing to the old file. Fix them.
>
> Signed-off-by: Mauro Carvalho Chehab

Didn't I ack this already? For the I2C part:

Reviewed-by: Wolfram Sang