Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
在 2021/6/21 下午6:41, Yongji Xie 写道: On Mon, Jun 21, 2021 at 5:14 PM Jason Wang wrote: 在 2021/6/15 下午10:13, Xie Yongji 写道: This VDUSE driver enables implementing vDPA devices in userspace. The vDPA device's control path is handled in kernel and the data path is handled in userspace. A message mechnism is used by VDUSE driver to forward some control messages such as starting/stopping datapath to userspace. Userspace can use read()/write() to receive/reply those control messages. And some ioctls are introduced to help userspace to implement the data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file descriptors referring to vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD ioctls can be used to inject interrupt and setup the kickfd for virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the configuration space and inject a config interrupt. Signed-off-by: Xie Yongji --- Documentation/userspace-api/ioctl/ioctl-number.rst |1 + drivers/vdpa/Kconfig | 10 + drivers/vdpa/Makefile |1 + drivers/vdpa/vdpa_user/Makefile|5 + drivers/vdpa/vdpa_user/vduse_dev.c | 1453 include/uapi/linux/vduse.h | 143 ++ 6 files changed, 1613 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/Makefile create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c create mode 100644 include/uapi/linux/vduse.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 9bfc2b510c64..acd95e9dcfe7 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -300,6 +300,7 @@ Code Seq#Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.h conflict! '|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index a503c1b2bfd9..6e23bce6433a 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK vDPA block device simulator which terminates IO request in a memory buffer. +config VDPA_USER + tristate "VDUSE (vDPA Device in Userspace) support" + depends on EVENTFD && MMU && HAS_DMA + select DMA_OPS + select VHOST_IOTLB + select IOMMU_IOVA + help + With VDUSE it is possible to emulate a vDPA Device + in a userspace program. + config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 67fe7f3d6943..f02ebed33f19 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ +obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF)+= ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA)+= virtio_pci/ diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile new file mode 100644 index ..260e0b26af99 --- /dev/null +++ b/drivers/vdpa/vdpa_user/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +vduse-y := vduse_dev.o iova_domain.o + +obj-$(CONFIG_VDPA_USER) += vduse.o diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c new file mode 100644 index ..5271cbd15e28 --- /dev/null +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -0,0 +1,1453 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VDUSE: vDPA Device in Userspace + * + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +#define VDUSE_DEV_MAX (1U << MINORBITS) +#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024) +#define VDUSE_IOVA_SIZE (128 * 1024 * 1024) +#define VDUSE_REQUEST_TIMEOUT 30 + +struct vduse_virtqueue { + u16 index; + u32 num; + u32 avail_idx; + u64 desc_addr; + u64 driver_addr; + u64 device_addr; + bool ready; + bool kicked; + spinlock_t
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson wrote: > > In the patch ("drivers: base: Add bits to struct device to control > iommu strictness") we add the ability for devices to tell us about > their IOMMU strictness requirements. Let's now take that into account > in the IOMMU layer. > > A few notes here: > * Presumably this is always how iommu_get_dma_strict() was intended to > behave. Had this not been the intention then it never would have > taken a domain as a parameter. > * The iommu_set_dma_strict() feels awfully non-symmetric now. That > function sets the _default_ strictness globally in the system > whereas iommu_get_dma_strict() returns the value for a given domain > (falling back to the default). Presumably, at least, the fact that > iommu_set_dma_strict() doesn't take a domain makes this obvious. > > The function iommu_get_dma_strict() should now make it super obvious > where strictness comes from and who overides who. Though the function > changed a bunch to make the logic clearer, the only two new rules > should be: > * Devices can force strictness for themselves, overriding the cmdline > "iommu.strict=0" or a call to iommu_set_dma_strict(false)). > * Devices can request non-strictness for themselves, assuming there > was no cmdline "iommu.strict=1" or a call to > iommu_set_dma_strict(true). > > Signed-off-by: Douglas Anderson > --- > > drivers/iommu/iommu.c | 56 +-- > include/linux/iommu.h | 2 ++ > 2 files changed, 45 insertions(+), 13 deletions(-) > > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > index 808ab70d5df5..0c84a4c06110 100644 > --- a/drivers/iommu/iommu.c > +++ b/drivers/iommu/iommu.c > @@ -28,8 +28,19 @@ > static struct kset *iommu_group_kset; > static DEFINE_IDA(iommu_group_ida); > > +enum iommu_strictness { > + IOMMU_DEFAULT_STRICTNESS = -1, > + IOMMU_NOT_STRICT = 0, > + IOMMU_STRICT = 1, > +}; > +static inline enum iommu_strictness bool_to_strictness(bool strictness) > +{ > + return (enum iommu_strictness)strictness; > +} > + > static unsigned int iommu_def_domain_type __read_mostly; > -static bool iommu_dma_strict __read_mostly = true; > +static enum iommu_strictness cmdline_dma_strict __read_mostly = > IOMMU_DEFAULT_STRICTNESS; > +static enum iommu_strictness driver_dma_strict __read_mostly = > IOMMU_DEFAULT_STRICTNESS; > static u32 iommu_cmd_line __read_mostly; > > struct iommu_group { > @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = { > }; > > #define IOMMU_CMD_LINE_DMA_API BIT(0) > -#define IOMMU_CMD_LINE_STRICT BIT(1) > > static int iommu_alloc_default_domain(struct iommu_group *group, > struct device *dev); > @@ -336,25 +346,38 @@ early_param("iommu.passthrough", > iommu_set_def_domain_type); > > static int __init iommu_dma_setup(char *str) > { > - int ret = kstrtobool(str, _dma_strict); > + bool strict; > + int ret = kstrtobool(str, ); > > if (!ret) > - iommu_cmd_line |= IOMMU_CMD_LINE_STRICT; > + cmdline_dma_strict = bool_to_strictness(strict); > return ret; > } > early_param("iommu.strict", iommu_dma_setup); > > void iommu_set_dma_strict(bool strict) > { > - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) > - iommu_dma_strict = strict; > + /* A driver can request strictness but not the other way around */ > + if (driver_dma_strict != IOMMU_STRICT) > + driver_dma_strict = bool_to_strictness(strict); > } > > bool iommu_get_dma_strict(struct iommu_domain *domain) > { > - /* only allow lazy flushing for DMA domains */ > - if (domain->type == IOMMU_DOMAIN_DMA) > - return iommu_dma_strict; > + /* Non-DMA domains or anyone forcing it to strict makes it strict */ > + if (domain->type != IOMMU_DOMAIN_DMA || > + cmdline_dma_strict == IOMMU_STRICT || > + driver_dma_strict == IOMMU_STRICT || > + domain->force_strict) > + return true; > + > + /* Anyone requesting non-strict (if no forces) makes it non-strict */ > + if (cmdline_dma_strict == IOMMU_NOT_STRICT || > + driver_dma_strict == IOMMU_NOT_STRICT || > + domain->request_non_strict) > + return false; > + > + /* Nobody said anything, so it's strict by default */ If iommu.strict is not set in the command line, upstream treats it as iommu.strict=1. Meaning, no drivers can override it. If I understand it correctly, with your series, if iommu.strict=1 is not set, drivers can override the "default strict mode" and ask for non-strict mode for their domain. So if this series gets in and future driver changes start asking for non-strict mode, systems that are expected to operate in fully strict mode will now have devices operating in non-strict mode. That's breaking backward
Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default
On 6/22/21 7:52 AM, Douglas Anderson wrote: @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev) static int iommu_group_alloc_default_domain(struct bus_type *bus, struct iommu_group *group, - unsigned int type) + unsigned int type, + struct device *dev) { struct iommu_domain *dom; @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus, if (!dom) return -ENOMEM; + /* Save the strictness requests from the device */ + if (dev && type == IOMMU_DOMAIN_DMA) { + dom->request_non_strict = dev->request_non_strict_iommu; + dom->force_strict = dev->force_strict_iommu; + } + An iommu default domain might be used by multiple devices which might have different "strict" attributions. Then who could override who? Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode
IOMMUs can be run in "strict" mode or in "non-strict" mode. The quick-summary difference between the two is that in "strict" mode we wait until everything is flushed out when we unmap DMA memory. In "non-strict" we don't. Using the IOMMU in "strict" mode is more secure/safer but slower because we have to sit and wait for flushes while we're unmapping. To explain a bit why "non-strict" mode is unsafe, let's imagine two examples. An example of "non-strict" being insecure when reading from a device: a) Linux driver maps memory for DMA. b) Linux driver starts DMA on the device. c) Device write to RAM subject to bounds checking done by IOMMU. d) Device finishes writing to RAM and signals transfer is finished. e) Linux driver starts unmapping DMA memory but doesn't flush. f) Linux driver validates that the data in memory looks sane and that accessing it won't cause the driver to, for instance, overflow a buffer. g) Device takes advantage of knowledge of how the Linux driver works and sneaks in a modification to the data after the validation but before the IOMMU unmap flush finishes. h) Device has now caused the Linux driver to access memory it shouldn't. An example of "non-strict" being insecure when writing to a device: a) Linux driver writes data intended for the device to RAM. b) Linux driver maps memory for DMA. c) Linux driver starts DMA on the device. d) Device reads from RAM subject to bounds checking done by IOMMU. e) Device finishes reading from RAM and signals transfer is finished. f) Linux driver starts unmapping DMA memory but doesn't flush. g) Linux driver frees memory and returns it to the pool. h) Memory is allocated for another purpose. i) Device takes advantage of the period of time before IOMMU flush to read memory that it shouldn't have had access to. As you can see from the above examples, using the iommu in "non-strict" mode might not sound _too_ scary (the window of badness is small and the exposed memory is small) but there is certainly risk. Let's evaluate the risk by breaking it down into two problems that IOMMUs are supposed to be protecting us against: Case 1: IOMMUs prevent malicious code running on the peripheral (maybe a malicious peripheral or maybe someone exploited a benign peripheral) from turning into an exploit of the Linux kernel. This is particularly important if the peripheral has loadable / updatable firmware or if the peripheral has some type of general purpose processor and is processing untrusted inputs. It's also important if the device is something that can be easily plugged into the host and the device has direct DMA access itself, like a PCIe device. Case 2: IOMMUs limit the severity of a class of software bugs in the kernel. If we misconfigure a peripheral by accident then instead of the peripheral clobbering random memory due to a bug we might get an IOMMU error. Now that we understand the issue and the risks, let's evaluate whether we really need "strict" mode for the Qualcomm SDHCI controllers. I will make the argument that we don't _need_ strict mode for them. Why? * The SDHCI controller on Qualcomm SoCs doesn't appear to have loadable / updatable firmware and, assuming it's got some firmware baked into it, I see no evidence that the firmware could be compromised. * Even though, for external SD cards in particular, the controller is dealing with "untrusted" inputs, it's dealing with them in a very controlled way. It seems unlikely that a rogue SD card would be able to present something to the SDHCI controller that would cause it to DMA to/from an address other than one the kernel told it about. * Although it would be nice to catch more software bugs, once the Linux driver has been debugged and stressed the value is not very high. If the IOMMU caught something like this the system would be in a pretty bad shape anyway (we don't really recover from IOMMU errors) and the only benefit would be a better spotlight on what went wrong. Now we have a good understanding of the benefits of "strict" mode for our SDHCI controllers, let's look at some performance numbers. I used "dd" to measure read speeds from eMMC on a sc7180-trogdor-lazor board. Basic test command (while booted from USB): echo 3 > /proc/sys/vm/drop_caches dd if=/dev/mmcblk1 of=/dev/null bs=4M count=512 I attempted to run my tests for enough iterations that results stabilized and weren't too noisy. Tests were run with patches picked to the chromeos-5.4 tree (sanity checked against v5.13-rc7). I also attempted to compare to other attempts to address IOMMU problems and/or attempts to bump the cpufreq up to solve this problem: - eMMC datasheet spec: 300 MB/s "Typical Sequential Performance" NOTE: we're driving the bus at 192 MHz instead of 200 Mhz so we might not be able to achieve the full 300 MB/s. - Baseline: 210.9 MB/s - Baseline + peg cpufreq to max: 284.3 MB/s - This patch: 279.6 MB/s - This patch + peg cpufreq to max: 288.1 MB/s - Joel's IO
[PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
We now have a way for PCIe devices to force iommu.strict through the "struct device" and that's now hooked up. Let's remove the special case for PCIe devices. NOTE: there are still other places in this file that make decisions based on the PCIe "untrusted" status. This patch only handles removing the one related to iommu.strict. Removing the other cases is left as an exercise to the reader. Signed-off-by: Douglas Anderson --- drivers/iommu/dma-iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 7bcdd1205535..e50c06ce1a6b 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -368,7 +368,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base, init_iova_domain(iovad, 1UL << order, base_pfn); - if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) && + if (!cookie->fq_domain && domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) { if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all, iommu_dma_entry_dtor)) -- 2.32.0.288.g62a8d224e6-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 4/6] iommu: Combine device strictness requests with the global default
In the patch ("drivers: base: Add bits to struct device to control iommu strictness") we add the ability for devices to tell us about their IOMMU strictness requirements. Let's now take that into account in the IOMMU layer. A few notes here: * Presumably this is always how iommu_get_dma_strict() was intended to behave. Had this not been the intention then it never would have taken a domain as a parameter. * The iommu_set_dma_strict() feels awfully non-symmetric now. That function sets the _default_ strictness globally in the system whereas iommu_get_dma_strict() returns the value for a given domain (falling back to the default). Presumably, at least, the fact that iommu_set_dma_strict() doesn't take a domain makes this obvious. The function iommu_get_dma_strict() should now make it super obvious where strictness comes from and who overides who. Though the function changed a bunch to make the logic clearer, the only two new rules should be: * Devices can force strictness for themselves, overriding the cmdline "iommu.strict=0" or a call to iommu_set_dma_strict(false)). * Devices can request non-strictness for themselves, assuming there was no cmdline "iommu.strict=1" or a call to iommu_set_dma_strict(true). Signed-off-by: Douglas Anderson --- drivers/iommu/iommu.c | 56 +-- include/linux/iommu.h | 2 ++ 2 files changed, 45 insertions(+), 13 deletions(-) diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 808ab70d5df5..0c84a4c06110 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -28,8 +28,19 @@ static struct kset *iommu_group_kset; static DEFINE_IDA(iommu_group_ida); +enum iommu_strictness { + IOMMU_DEFAULT_STRICTNESS = -1, + IOMMU_NOT_STRICT = 0, + IOMMU_STRICT = 1, +}; +static inline enum iommu_strictness bool_to_strictness(bool strictness) +{ + return (enum iommu_strictness)strictness; +} + static unsigned int iommu_def_domain_type __read_mostly; -static bool iommu_dma_strict __read_mostly = true; +static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS; +static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS; static u32 iommu_cmd_line __read_mostly; struct iommu_group { @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = { }; #define IOMMU_CMD_LINE_DMA_API BIT(0) -#define IOMMU_CMD_LINE_STRICT BIT(1) static int iommu_alloc_default_domain(struct iommu_group *group, struct device *dev); @@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type); static int __init iommu_dma_setup(char *str) { - int ret = kstrtobool(str, _dma_strict); + bool strict; + int ret = kstrtobool(str, ); if (!ret) - iommu_cmd_line |= IOMMU_CMD_LINE_STRICT; + cmdline_dma_strict = bool_to_strictness(strict); return ret; } early_param("iommu.strict", iommu_dma_setup); void iommu_set_dma_strict(bool strict) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + /* A driver can request strictness but not the other way around */ + if (driver_dma_strict != IOMMU_STRICT) + driver_dma_strict = bool_to_strictness(strict); } bool iommu_get_dma_strict(struct iommu_domain *domain) { - /* only allow lazy flushing for DMA domains */ - if (domain->type == IOMMU_DOMAIN_DMA) - return iommu_dma_strict; + /* Non-DMA domains or anyone forcing it to strict makes it strict */ + if (domain->type != IOMMU_DOMAIN_DMA || + cmdline_dma_strict == IOMMU_STRICT || + driver_dma_strict == IOMMU_STRICT || + domain->force_strict) + return true; + + /* Anyone requesting non-strict (if no forces) makes it non-strict */ + if (cmdline_dma_strict == IOMMU_NOT_STRICT || + driver_dma_strict == IOMMU_NOT_STRICT || + domain->request_non_strict) + return false; + + /* Nobody said anything, so it's strict by default */ return true; } EXPORT_SYMBOL_GPL(iommu_get_dma_strict); @@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev) static int iommu_group_alloc_default_domain(struct bus_type *bus, struct iommu_group *group, - unsigned int type) + unsigned int type, + struct device *dev) { struct iommu_domain *dom; @@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus, if (!dom) return -ENOMEM; + /* Save the strictness requests from the device */ + if (dev && type == IOMMU_DOMAIN_DMA) { +
[PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices
At the moment the generic IOMMU framework reaches into the PCIe device to check the "untrusted" state and uses this information to figure out if it should be running the IOMMU in strict or non-strict mode. Let's instead set the new boolean in "struct device" to indicate when we want forced strictness. NOTE: we still continue to set the "untrusted" bit in PCIe since that apparently is used for more than just IOMMU strictness. It probably makes sense for a later patchset to clarify all of the other needs we have for "untrusted" PCIe devices (perhaps add more booleans into the "struct device") so we can fully eliminate the need for the IOMMU framework to reach into a PCIe device. Signed-off-by: Douglas Anderson --- drivers/pci/probe.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index 275204646c68..8d81f0fb3e50 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1572,8 +1572,10 @@ static void set_pcie_untrusted(struct pci_dev *dev) * untrusted as well. */ parent = pci_upstream_bridge(dev); - if (parent && (parent->untrusted || parent->external_facing)) + if (parent && (parent->untrusted || parent->external_facing)) { dev->untrusted = true; + dev->dev.force_strict_iommu = true; + } } /** -- 2.32.0.288.g62a8d224e6-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness
How to control the "strictness" of an IOMMU is a bit of a mess right now. As far as I can tell, right now: * You can set the default to "non-strict" and some devices (right now, only PCI devices) can request to run in "strict" mode. * You can set the default to "strict" and no devices in the system are allowed to run as "non-strict". I believe this needs to be improved a bit. Specifically: * We should be able to default to "strict" mode but let devices that claim to be fairly low risk request that they be run in "non-strict" mode. * We should allow devices outside of PCIe to request "strict" mode if the system default is "non-strict". I believe the correct way to do this is two bits in "struct device". One allows a device to force things to "strict" mode and the other allows a device to _request_ "non-strict" mode. The asymmetry here is on purpose. Generally if anything in the system makes a request for strictness of something then we want it strict. Thus drivers can only request (but not force) non-strictness. It's expected that the strictness fields can be filled in by the bus code like in the patch ("PCI: Indicate that we want to force strict DMA for untrusted devices") or by using the new pre_probe concept introduced in the patch ("drivers: base: Add the concept of "pre_probe" to drivers"). Signed-off-by: Douglas Anderson --- include/linux/device.h | 11 +++ 1 file changed, 11 insertions(+) diff --git a/include/linux/device.h b/include/linux/device.h index f1a00040fa53..c1b985e10c47 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -449,6 +449,15 @@ struct dev_links_info { * and optionall (if the coherent mask is large enough) also * for dma allocations. This flag is managed by the dma ops * instance from ->dma_supported. + * @force_strict_iommu: If set to %true then we should force this device to + * iommu.strict regardless of the other defaults in the + * system. Only has an effect if an IOMMU is in place. + * @request_non_strict_iommu: If set to %true and there are no other known + * reasons to make the iommu.strict for this device, + * then default to non-strict mode. This implies + * some belief that the DMA master for this device + * won't abuse the DMA path to compromise the kernel. + * Only has an effect if an IOMMU is in place. * * At the lowest level, every device in a Linux system is represented by an * instance of struct device. The device structure contains the information @@ -557,6 +566,8 @@ struct device { #ifdef CONFIG_DMA_OPS_BYPASS booldma_ops_bypass : 1; #endif + boolforce_strict_iommu:1; + boolrequest_non_strict_iommu:1; }; /** -- 2.32.0.288.g62a8d224e6-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers
Right now things are a bit awkward if a driver would like a chance to run before some of the more "automatic" things (pinctrl, DMA, IOMMUs, ...) happen to a device. This patch aims to fix that problem by introducing the concept of a "pre_probe" function that drivers can implement to run before the "automatic" stuff. Why would you want to run before the "automatic" stuff? The incentive in my case is that I want to be able to fill in some boolean flags in the "struct device" before the IOMMU init runs. It appears that the strictness vs. non-strictness of a device's iommu config is determined once at init time and can't be changed afterwards. However, I would like to avoid hardcoding the rules for strictness in the IOMMU driver. Instead I'd like to let individual drivers be able to make informed decisions about the appropriateness of strictness vs. non-strictness. The desire for running code pre_probe is likely not limited to my use case. I believe that the list "qcom_smmu_client_of_match" is hacked into the iommu driver specifically because there was no real good framework for this. For the existing list it wasn't _quite_ as ugly as my needs since the decision could be made solely on compatible string, but it still feels like it would have been better for individual drivers to run code and setup some state rather than coding up a big list in the IOMMU driver. Even without this patch, I believe it is possible for a driver to run before the "automatic" things by registering for "BUS_NOTIFY_BIND_DRIVER" in its init call, though I haven't personally tested this. Using the notifier is a bit awkward, though, and I'd rather avoid it. Also, using "BUS_NOTIFY_BIND_DRIVER" would require drivers to stop using the convenience module_platform_driver() helper and roll a bunch of boilerplate code. NOTE: the pre_probe here is listed in the driver structure. As a side effect of this it will be passed a "struct device *" rather than the more specific device type (like the "struct platform_device *" that most platform devices get passed to their probe). Presumably this won't cause trouble and it's a lot less code to write but if we need to make it more symmetric that's also possible by touching more files. Signed-off-by: Douglas Anderson --- drivers/base/dd.c | 10 -- include/linux/device/driver.h | 9 + 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/base/dd.c b/drivers/base/dd.c index ecd7cf848daf..9a13bff8dafa 100644 --- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -549,10 +549,16 @@ static int really_probe(struct device *dev, struct device_driver *drv) re_probe: dev->driver = drv; + if (drv->pre_probe) { + ret = drv->pre_probe(dev); + if (ret) + goto probe_failed_pre_dma; + } + /* If using pinctrl, bind pins now before probing */ ret = pinctrl_bind_pins(dev); if (ret) - goto pinctrl_bind_failed; + goto probe_failed_pre_dma; if (dev->bus->dma_configure) { ret = dev->bus->dma_configure(dev); @@ -639,7 +645,7 @@ static int really_probe(struct device *dev, struct device_driver *drv) if (dev->bus) blocking_notifier_call_chain(>bus->p->bus_notifier, BUS_NOTIFY_DRIVER_NOT_BOUND, dev); -pinctrl_bind_failed: +probe_failed_pre_dma: device_links_no_driver(dev); devres_release_all(dev); arch_teardown_dma_ops(dev); diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h index a498ebcf4993..f7305dd6ceb1 100644 --- a/include/linux/device/driver.h +++ b/include/linux/device/driver.h @@ -57,6 +57,14 @@ enum probe_type { * @probe_type:Type of the probe (synchronous or asynchronous) to use. * @of_match_table: The open firmware table. * @acpi_match_table: The ACPI match table. + * @pre_probe: Called after a device has been bound to a driver but before + * anything "automatic" (pinctrl, DMA, IOMMUs, ...) has been + * setup. This is mostly a chance for the driver to do things + * that might need to be run before any of those automatic + * processes. The vast majority of devices don't need to + * implement this. Note that there is no "post_remove" at the + * moment. If you need to undo something that you did in + * pre_probe() you can use devres. * @probe: Called to query the existence of a specific device, * whether this driver can work with it, and bind the driver * to a specific device. @@ -105,6 +113,7 @@ struct device_driver { const struct of_device_id *of_match_table; const struct acpi_device_id *acpi_match_table; + int (*pre_probe) (struct device *dev); int (*probe) (struct device *dev); void (*sync_state)(struct device *dev);
[PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
This patch attempts to put forward a proposal for enabling non-strict DMA on a device-by-device basis. The patch series requests non-strict DMA for the Qualcomm SDHCI controller as a first device to enable, getting a nice bump in performance with what's believed to be a very small drop in security / safety (see the patch for the full argument). As part of this patch series I am end up slightly cleaning up some of the interactions between the PCI subsystem and the IOMMU subsystem but I don't go all the way to fully remove all the tentacles. Specifically this patch series only concerns itself with a single aspect: strict vs. non-strict mode for the IOMMU. I'm hoping that this will be easier to talk about / reason about for more subsystems compared to overall deciding what it means for a device to be "external" or "untrusted". If something like this patch series ends up being landable, it will undoubtedly need coordination between many maintainers to land. I believe it's fully bisectable but later patches in the series definitely depend on earlier ones. Sorry for the long CC list. :( Douglas Anderson (6): drivers: base: Add the concept of "pre_probe" to drivers drivers: base: Add bits to struct device to control iommu strictness PCI: Indicate that we want to force strict DMA for untrusted devices iommu: Combine device strictness requests with the global default iommu: Stop reaching into PCIe devices to decide strict vs. non-strict mmc: sdhci-msm: Request non-strict IOMMU mode drivers/base/dd.c | 10 +-- drivers/iommu/dma-iommu.c | 2 +- drivers/iommu/iommu.c | 56 +++ drivers/mmc/host/sdhci-msm.c | 8 + drivers/pci/probe.c | 4 ++- include/linux/device.h| 11 +++ include/linux/device/driver.h | 9 ++ include/linux/iommu.h | 2 ++ 8 files changed, 85 insertions(+), 17 deletions(-) -- 2.32.0.288.g62a8d224e6-goog ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
On Mon, Jun 21, 2021 at 10:51 AM Ross Philipson wrote: > > On 6/18/21 2:32 PM, Robin Murphy wrote: > > On 2021-06-18 17:12, Ross Philipson wrote: > >> The IOMMU should always be set to default translated type after > >> the PMRs are disabled to protect the MLE from DMA. > >> > >> Signed-off-by: Ross Philipson > >> --- > >> drivers/iommu/intel/iommu.c | 5 + > >> drivers/iommu/iommu.c | 6 +- > >> 2 files changed, 10 insertions(+), 1 deletion(-) > >> > >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c > >> index be35284..4f0256d 100644 > >> --- a/drivers/iommu/intel/iommu.c > >> +++ b/drivers/iommu/intel/iommu.c > >> @@ -41,6 +41,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> #include > >> #include > >> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device > >> *dev) > >>*/ > >> static int device_def_domain_type(struct device *dev) > >> { > >> +/* Do not allow identity domain when Secure Launch is configured */ > >> +if (slaunch_get_flags() & SL_FLAG_ACTIVE) > >> +return IOMMU_DOMAIN_DMA; > > > > Is this specific to Intel? It seems like it could easily be done > > commonly like the check for untrusted external devices. > > It is currently Intel only but that will change. I will look into what > you suggest. > > > > >> + > >> if (dev_is_pci(dev)) { > >> struct pci_dev *pdev = to_pci_dev(dev); > >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c > >> index 808ab70d..d49b7dd 100644 > >> --- a/drivers/iommu/iommu.c > >> +++ b/drivers/iommu/iommu.c > >> @@ -23,6 +23,7 @@ > >> #include > >> #include > >> #include > >> +#include > >> #include > >> static struct kset *iommu_group_kset; > >> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line) > >> { > >> if (cmd_line) > >> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; > >> -iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; > >> + > >> +/* Do not allow identity domain when Secure Launch is configured */ > >> +if (!(slaunch_get_flags() & SL_FLAG_ACTIVE)) > >> +iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; > > > > Quietly ignoring the setting and possibly leaving iommu_def_domain_type > > uninitialised (note that 0 is not actually a usable type) doesn't seem > > great. AFAICS this probably warrants similar treatment to the > > Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event > though passthrough was requested. Or perhaps something more is needed here? > > > mem_encrypt_active() case - there doesn't seem a great deal of value in > > trying to save users from themselves if they care about measured boot > > yet explicitly pass options which may compromise measured boot. If you > > really want to go down that route there's at least the sysfs interface > > you'd need to nobble as well, not to mention the various ways of > > completely disabling IOMMUs... > > Doing a secure launch with the kernel is not a general purpose user use > case. A lot of work is done to secure the environment. Allowing > passthrough mode would leave the secure launch kernel exposed to DMA. I > think what we are trying to do here is what we intend though there may > be a better way or perhaps it is incomplete as you suggest. > I don't really like all these special cases. Generically, what you're trying to do is (AFAICT) to get the kernel to run in a mode in which it does its best not to trust attached devices. Nothing about this is specific to Secure Launch. There are plenty of scenarios in which this the case: - Virtual devices in a VM host outside the TCB, e.g. VDUSE, Xen device domains (did I get the name right), whatever tricks QEMU has, etc. - SRTM / DRTM technologies (including but not limited to Secure Launch -- plain old Secure Boot can work like this too). - Secure guest technologies, including but not limited to TDX and SEV. - Any computer with a USB-C port or other external DMA-capable port. - Regular computers in which the admin wants to enable this mode for whatever reason. Can you folks all please agree on a coordinated way for a Linux kernel to configure itself appropriately? Or to be configured via initramfs, boot option, or some other trusted source of configuration supplied at boot time? We don't need a whole bunch of if (TDX), if (SEV), if (secure launch), if (I have a USB-C port with PCIe exposed), if (running on Xen), and similar checks all over the place. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
On Mon, Jun 21, 2021 at 10:59:20AM -0700, Stefano Stabellini wrote: > Just as a clarification: I was referring to the zeroing of "mem" in > swiotlb_late_init_with_tbl and swiotlb_init_with_tbl. While it looks > like Tom and Christoph are talking about the zeroing of "tlb". Indeed. > > The zeroing of "mem" is required as some fields of struct io_tlb_mem > need to be initialized to zero. While the zeroing of "tlb" is not > required except from the point of view of not leaking sensitive data as > Christoph mentioned. > > Either way, Claire's v14 patch 01/12 looks fine to me. Agreed. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions
On Fri, 18 Jun 2021, Christoph Hellwig wrote: > On Fri, Jun 18, 2021 at 09:09:17AM -0500, Tom Lendacky wrote: > > > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem > > > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so > > > swiotlb_init_with_tbl is also good. > > > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think > > > it's clearer and safer. > > > > On x86, if the memset is done before set_memory_decrypted() and memory > > encryption is active, then the memory will look like ciphertext afterwards > > and not be zeroes. If zeroed memory is required, then a memset must be > > done after the set_memory_decrypted() calls. > > Which should be fine - we don't care that the memory is cleared to 0, > just that it doesn't leak other data. Maybe a comment would be useful, > though, Just as a clarification: I was referring to the zeroing of "mem" in swiotlb_late_init_with_tbl and swiotlb_init_with_tbl. While it looks like Tom and Christoph are talking about the zeroing of "tlb". The zeroing of "mem" is required as some fields of struct io_tlb_mem need to be initialized to zero. While the zeroing of "tlb" is not required except from the point of view of not leaking sensitive data as Christoph mentioned. Either way, Claire's v14 patch 01/12 looks fine to me. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch
On 6/18/21 2:32 PM, Robin Murphy wrote: > On 2021-06-18 17:12, Ross Philipson wrote: >> The IOMMU should always be set to default translated type after >> the PMRs are disabled to protect the MLE from DMA. >> >> Signed-off-by: Ross Philipson >> --- >> drivers/iommu/intel/iommu.c | 5 + >> drivers/iommu/iommu.c | 6 +- >> 2 files changed, 10 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c >> index be35284..4f0256d 100644 >> --- a/drivers/iommu/intel/iommu.c >> +++ b/drivers/iommu/intel/iommu.c >> @@ -41,6 +41,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device >> *dev) >> */ >> static int device_def_domain_type(struct device *dev) >> { >> + /* Do not allow identity domain when Secure Launch is configured */ >> + if (slaunch_get_flags() & SL_FLAG_ACTIVE) >> + return IOMMU_DOMAIN_DMA; > > Is this specific to Intel? It seems like it could easily be done > commonly like the check for untrusted external devices. It is currently Intel only but that will change. I will look into what you suggest. > >> + >> if (dev_is_pci(dev)) { >> struct pci_dev *pdev = to_pci_dev(dev); >> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c >> index 808ab70d..d49b7dd 100644 >> --- a/drivers/iommu/iommu.c >> +++ b/drivers/iommu/iommu.c >> @@ -23,6 +23,7 @@ >> #include >> #include >> #include >> +#include >> #include >> static struct kset *iommu_group_kset; >> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line) >> { >> if (cmd_line) >> iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API; >> - iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; >> + >> + /* Do not allow identity domain when Secure Launch is configured */ >> + if (!(slaunch_get_flags() & SL_FLAG_ACTIVE)) >> + iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY; > > Quietly ignoring the setting and possibly leaving iommu_def_domain_type > uninitialised (note that 0 is not actually a usable type) doesn't seem > great. AFAICS this probably warrants similar treatment to the Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event though passthrough was requested. Or perhaps something more is needed here? > mem_encrypt_active() case - there doesn't seem a great deal of value in > trying to save users from themselves if they care about measured boot > yet explicitly pass options which may compromise measured boot. If you > really want to go down that route there's at least the sysfs interface > you'd need to nobble as well, not to mention the various ways of > completely disabling IOMMUs... Doing a secure launch with the kernel is not a general purpose user use case. A lot of work is done to secure the environment. Allowing passthrough mode would leave the secure launch kernel exposed to DMA. I think what we are trying to do here is what we intend though there may be a better way or perhaps it is incomplete as you suggest. > > It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on > !SECURE_LAUNCH for clarity though. This came from a specific request to not make disabling IOMMU modes build time dependent. This is because a secure launch enabled kernel can also be booted as a general purpose kernel in cases where this is desired. Thank you, Ross > > Robin. > >> } >> void iommu_set_default_translated(bool cmd_line) >> ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()
Members of struct "llq" will be zero-inited, apart from member max_n_shift. But we write llq.val straight after the init, so it was pointless to zero init those other members. As such, separately init member max_n_shift only. In addition, struct "head" is initialised to "llq" only so that member max_n_shift is set. But that member is never referenced for "head", so remove any init there. Removing these initializations is seen as a small performance optimisation, as this code is (very) hot path. Signed-off-by: John Garry diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c index 54b2f27b81d4..8a8ad49bb7fd 100644 --- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c +++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c @@ -727,11 +727,11 @@ static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, unsigned long flags; bool owner; struct arm_smmu_cmdq *cmdq = >cmdq; - struct arm_smmu_ll_queue llq = { - .max_n_shift = cmdq->q.llq.max_n_shift, - }, head = llq; + struct arm_smmu_ll_queue llq, head; int ret = 0; + llq.max_n_shift = cmdq->q.llq.max_n_shift; + /* 1. Allocate some space in the queue */ local_irq_save(flags); llq.val = READ_ONCE(cmdq->q.llq.val); -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2 2/3] iommu/io-pgtable: Optimize partial walk flush for large scatter-gather list
On 2021-06-21 06:47, Sai Prakash Ranjan wrote: Hi, On 2021-06-19 03:39, Doug Anderson wrote: Hi, On Thu, Jun 17, 2021 at 7:51 PM Sai Prakash Ranjan wrote: Currently for iommu_unmap() of large scatter-gather list with page size elements, the majority of time is spent in flushing of partial walks in __arm_lpae_unmap() which is a VA based TLB invalidation invalidating page-by-page on iommus like arm-smmu-v2 (TLBIVA) which do not support range based invalidations like on arm-smmu-v3.2. For example: to unmap a 32MB scatter-gather list with page size elements (8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB for 4K granule) and each of 2MB will further result in 512 TLBIVAs (2MB/4K) resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge overhead. So instead use tlb_flush_all() callback (TLBIALL/TLBIASID) to invalidate the entire context for partial walk flush on select few platforms where cost of over-invalidation is less than unmap latency It would probably be worth punching this description up a little bit. Elsewhere you said in more detail why this over-invalidation is less of a big deal for the Qualcomm SMMU. It's probably worth saying something like that here, too. Like this bit paraphrased from your other email: On qcom impl, we have several performance improvements for TLB cache invalidations in HW like wait-for-safe (for realtime clients such as camera and display) and few others to allow for cache lookups/updates when TLBI is in progress for the same context bank. Sure will add this info as well in the next version. using the newly introduced quirk IO_PGTABLE_QUIRK_TLB_INV_ALL. We also do this for non-strict mode given its all about over-invalidation saving time on individual unmaps and non-deterministic generally. As per usual I'm mostly clueless, but I don't quite understand why you want this new behavior for non-strict mode. To me it almost seems like the opposite? Specifically, non-strict mode is already outside the critical path today and so there's no need to optimize it. I'm probably not explaining myself clearly, but I guess i'm thinking: a) today for strict, unmap is in the critical path and it's important to get it out of there. Getting it out of the critical path is so important that we're willing to over-invalidate to speed up the critical path. b) today for non-strict, unmap is not in the critical path. So I would almost expect your patch to _disable_ your new feature for non-strict mappings, not auto-enable your new feature for non-strict mappings. If I'm babbling, feel free to ignore. ;-) Looking back, I guess Robin was the one that suggested the behavior you're implementing, so it's more likely he's right than I am. ;-) Thanks for taking a look. Non-strict mode is only for leaf entries and dma domains and this optimization is for non-leaf entries and is applicable for both, see __arm_lpae_unmap(). In other words, if you have iommu.strict=0 (non-strict mode) and try unmapping a large sg buffer as the problem described in the commit text, you would still go via this path in unmap and see the delay without this patch. So what Robin suggested is that, let's do this unconditionally for all users with non-strict mode as opposed to only restricting it to implementation specific in case of strict mode. Right, unmapping tables works out as a bit of a compromise for non-strict mode - we don't use a freelist to defer the freeing of pagetable pages, so we rely on making non-leaf invalidations synchronously to knock out walk caches which may be pointing to the page beforte we free it. We might actually be able to get away without that for non-strict unmaps, since partial walks pointing at freed memory probably aren't too much more hazardous than the equivalent leaf TLB entries while the IOVA region is held in the flush queue, but it certainly does matter for maps when we're knocking out a (presumably empty) table entry to put down a new block whose IOVA will be immediately live. Robin. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string
On Mon, Jun 21, 2021 at 04:54:18PM +0100, Will Deacon wrote: > On Mon, Jun 21, 2021 at 04:11:55PM +0200, Thierry Reding wrote: > > On Mon, Jun 21, 2021 at 08:46:54AM +0200, Krzysztof Kozlowski wrote: > > > On 18/06/2021 21:47, Rob Herring wrote: > > > > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding > > > > wrote: > > > >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > > >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > > >> index 9d27aa5111d4..1181b590db71 100644 > > > >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > > >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > > >> @@ -54,8 +54,14 @@ properties: > > > >>- const: arm,mmu-500 > > > >>- description: NVIDIA SoCs that program two ARM MMU-500s > > > >> identically > > > >> items: > > > >> + - description: NVIDIA SoCs that require memory controller > > > >> interaction > > > > > > > > This is not valid jsonschema: > > > > > > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one > > > > must be fixed: > > > > None is not of type 'object', 'boolean' > > > > None is not of type 'array' > > > > from schema $id: http://json-schema.org/draft-07/schema# > > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > > > > must be fixed: > > > > None is not of type 'object' > > > > None is not of type 'array' > > > > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml# > > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > > > > must be fixed: > > > > None is not of type 'object' > > > > None is not of type 'array' > > > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one > > > > must be fixed: > > > > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const': > > > > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of > > > > type 'object' > > > > {'const': 'nvidia,tegra194-smmu'} is not of type 'string' > > > > {'const': 'nvidia,tegra186-smmu'} is not of type 'string' > > > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > > > > > > > > > > > This was not reviewed nor tested since the DT list was not Cc'ed. > > > > > > Ugh, I see now weird empty item on a list... and not only DT list was > > > skipped - Thierry did not Cc you either. > > > > This seemed like a too trivial addition to waste Rob's time on, so I > > didn't add him (or the DT list for that matter) on Cc. The ARM SMMU > > maintainers had reviewed this, which seemed like it was enough for what > > the DT bindings change was doing. > > Hmm, I didn't review it. I find the yaml stuff unreadable so I usually > wait for the DT folks to ack bindings changes before I queue them in the > SMMU tree. Alright... in the future I'll make sure to always Cc DT folks, even for trivial stuff like this. Thierry signature.asc Description: PGP signature ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string
On Mon, Jun 21, 2021 at 04:11:55PM +0200, Thierry Reding wrote: > On Mon, Jun 21, 2021 at 08:46:54AM +0200, Krzysztof Kozlowski wrote: > > On 18/06/2021 21:47, Rob Herring wrote: > > > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding > > > wrote: > > >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > >> index 9d27aa5111d4..1181b590db71 100644 > > >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > > >> @@ -54,8 +54,14 @@ properties: > > >>- const: arm,mmu-500 > > >>- description: NVIDIA SoCs that program two ARM MMU-500s > > >> identically > > >> items: > > >> + - description: NVIDIA SoCs that require memory controller > > >> interaction > > > > > > This is not valid jsonschema: > > > > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one > > > must be fixed: > > > None is not of type 'object', 'boolean' > > > None is not of type 'array' > > > from schema $id: http://json-schema.org/draft-07/schema# > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > > > must be fixed: > > > None is not of type 'object' > > > None is not of type 'array' > > > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml# > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > > > must be fixed: > > > None is not of type 'object' > > > None is not of type 'array' > > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one > > > must be fixed: > > > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const': > > > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of > > > type 'object' > > > {'const': 'nvidia,tegra194-smmu'} is not of type 'string' > > > {'const': 'nvidia,tegra186-smmu'} is not of type 'string' > > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > > > > > > > > This was not reviewed nor tested since the DT list was not Cc'ed. > > > > Ugh, I see now weird empty item on a list... and not only DT list was > > skipped - Thierry did not Cc you either. > > This seemed like a too trivial addition to waste Rob's time on, so I > didn't add him (or the DT list for that matter) on Cc. The ARM SMMU > maintainers had reviewed this, which seemed like it was enough for what > the DT bindings change was doing. Hmm, I didn't review it. I find the yaml stuff unreadable so I usually wait for the DT folks to ack bindings changes before I queue them in the SMMU tree. Will ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush
On 2021-06-18 03:51, Sai Prakash Ranjan wrote: Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context with tlb_flush_all() callback in partial walk flush to improve unmap performance on select few platforms where the cost of over-invalidation is less than the unmap latency. I still think this doesn't belong anywhere near io-pgtable at all. It's a driver-internal decision how exactly it implements a non-leaf invalidation, and that may be more complex than a predetermined boolean decision. For example, I've just realised for SMMUv3 we can't invalidate multiple levels of table at once with a range command, since if we assume the whole thing is mapped at worst-case page granularity we may fail to invalidate any parts which are mapped as intermediate-level blocks. If invalidating a 1GB region (with 4KB granule) means having to fall back to 256K non-range commands, we may not want to invalidate by VA then, even though doing so for a 2MB region is still optimal. It's also quite feasible that drivers might want to do this for leaf invalidations too - if you don't like issuing 512 commands to invalidate 2MB, do you like issuing 511 commands to invalidate 2044KB? - and at that point the logic really has to be in the driver anyway. Robin. Signed-off-by: Sai Prakash Ranjan --- drivers/iommu/io-pgtable-arm.c | 3 ++- include/linux/io-pgtable.h | 5 + 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c index 87def58e79b5..5d362f2214bd 100644 --- a/drivers/iommu/io-pgtable-arm.c +++ b/drivers/iommu/io-pgtable-arm.c @@ -768,7 +768,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie) if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS | IO_PGTABLE_QUIRK_NON_STRICT | IO_PGTABLE_QUIRK_ARM_TTBR1 | - IO_PGTABLE_QUIRK_ARM_OUTER_WBWA)) + IO_PGTABLE_QUIRK_ARM_OUTER_WBWA | + IO_PGTABLE_QUIRK_TLB_INV_ALL)) return NULL; data = arm_lpae_alloc_pgtable(cfg); diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h index 4d40dfa75b55..45441592a0e6 100644 --- a/include/linux/io-pgtable.h +++ b/include/linux/io-pgtable.h @@ -82,6 +82,10 @@ struct io_pgtable_cfg { * * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability * attributes set in the TCR for a non-coherent page-table walker. +* +* IO_PGTABLE_QUIRK_TLB_INV_ALL: Use TLBIALL/TLBIASID to invalidate +* entire context for partial walk flush to increase unmap +* performance on select few platforms. */ #define IO_PGTABLE_QUIRK_ARM_NS BIT(0) #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1) @@ -89,6 +93,7 @@ struct io_pgtable_cfg { #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4) #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5) #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6) + #define IO_PGTABLE_QUIRK_TLB_INV_ALLBIT(7) unsigned long quirks; unsigned long pgsize_bitmap; unsigned intias; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
Hi Robin, On 2021/6/21 19:59, Robin Murphy wrote: On 2021-06-21 11:34, John Garry wrote: On 21/06/2021 11:00, Lu Baolu wrote: void iommu_set_dma_strict(bool force) { if (force == true) iommu_dma_strict = true; else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } So we would use iommu_set_dma_strict(true) for a) and b), but iommu_set_dma_strict(false) for c). Yes. We need to distinguish the "must" and "nice-to-have" cases of setting strict mode. Then I am not sure what you want to do with the accompanying print for c). It was: "IOMMU batching is disabled due to virtualization" And now is from this series: "IOMMU batching disallowed due to virtualization" Using iommu_get_dma_strict(domain) is not appropriate here to know the current mode (so we know whether to print). Note that this change would mean that the current series would require non-trivial rework, which would be unfortunate so late in the cycle. This patch series looks good to me and I have added by reviewed-by. Probably we could make another patch series to improve it so that the kernel optimization should not override the user setting. On a personal level I would be happy with that approach, but I think it's better to not start changing things right away in a follow-up series. So how about we add this patch (which replaces 6/6 "iommu: Remove mode argument from iommu_set_dma_strict()")? Robin, any opinion? For me it boils down to whether there are any realistic workloads where non-strict mode *would* still perform better under virtualisation. The At present, we see that strict mode has better performance in the virtualization environment because it will make the shadow page table management more efficient. When the hardware supports nested translation, we may have to re-evaluate this since there's no need for a shadowing page table anymore. only reason for the user to explicitly pass "iommu.strict=0" is because they expect it to increase unmap performance; if it's only ever going to lead to an unexpected performance loss, I don't see any value in overriding the kernel's decision purely for the sake of subservience. If there *are* certain valid cases for allowing it for people who really know what they're doing, then we should arguably also log a counterpart message to say "we're honouring your override but beware it may have the opposite effect to what you expect" for the benefit of other users who assume it's a generic go-faster knob. At that point it starts getting non-trivial enough that I'd want to know for sure it's worthwhile. The other reason this might be better to revisit later is that an AMD equivalent is still in flight[1], and there might be more that can eventually be factored out. I think both series are pretty much good to merge for 5.14, but time's already tight to sort out the conflicts which exist as-is, without making them any worse. Agreed. We could revisit it later. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string
On Mon, Jun 21, 2021 at 08:46:54AM +0200, Krzysztof Kozlowski wrote: > On 18/06/2021 21:47, Rob Herring wrote: > > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding > > wrote: > >> > >> From: Thierry Reding > >> > >> The ARM SMMU instantiations found on Tegra186 and later need inter- > >> operation with the memory controller in order to correctly program > >> stream ID overrides. > >> > >> Furthermore, on Tegra194 multiple instances of the SMMU can gang up > >> to achieve higher throughput. In order to do this, they have to be > >> programmed identically so that the memory controller can interleave > >> memory accesses between them. > >> > >> Add the Tegra186 compatible string to make sure the interoperation > >> with the memory controller can be enabled on that SoC generation. > >> > >> Signed-off-by: Thierry Reding > >> --- > >> Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 11 +-- > >> 1 file changed, 9 insertions(+), 2 deletions(-) > >> > >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > >> index 9d27aa5111d4..1181b590db71 100644 > >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml > >> @@ -54,8 +54,14 @@ properties: > >>- const: arm,mmu-500 > >>- description: NVIDIA SoCs that program two ARM MMU-500s identically > >> items: > >> + - description: NVIDIA SoCs that require memory controller > >> interaction > > > > This is not valid jsonschema: > > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one > > must be fixed: > > None is not of type 'object', 'boolean' > > None is not of type 'array' > > from schema $id: http://json-schema.org/draft-07/schema# > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > > must be fixed: > > None is not of type 'object' > > None is not of type 'array' > > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml# > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > > must be fixed: > > None is not of type 'object' > > None is not of type 'array' > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one > > must be fixed: > > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const': > > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of > > type 'object' > > {'const': 'nvidia,tegra194-smmu'} is not of type 'string' > > {'const': 'nvidia,tegra186-smmu'} is not of type 'string' > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > > > > > This was not reviewed nor tested since the DT list was not Cc'ed. > > Ugh, I see now weird empty item on a list... and not only DT list was > skipped - Thierry did not Cc you either. This seemed like a too trivial addition to waste Rob's time on, so I didn't add him (or the DT list for that matter) on Cc. The ARM SMMU maintainers had reviewed this, which seemed like it was enough for what the DT bindings change was doing. In any case, I clearly should've checked the DT binding check output more carefully. It's rather messy for Tegra because there's quite a few that we haven't converted yet. I'll have to resume my effort to convert the remaining ones and fixup the device trees so that we can actually run the DT binding and DTB validation checks more usefully. > My bad, I did not check that patch thoroughly before applying. > > Thierry, please Cc folks mentioned by get_maintainer.pl. Either sent a > fix or a revert, if fix needs more time. I've sent out a follow-up fix that removes the two bogus lines. It looks like that was the result of a bad conflict resolution on my part. > Additionally, why the patch changes reg to "minItems: 1" for > nvidia,tegra194-smmu? This is because originally the Tegra194 SMMU nodes were supposed to only represent a "dual" instance. However, on Tegra194 there are three SMMU instances in total, with the third instance (dedicated for isochronous traffic) being completely separate and having only a single range of registers. That third instance was previously supposed to be covered by the normal "arm,mmu-500" compatible string, but given that we really need that interoperation between SMMU and memory controller for SID override programming, we need the Tegra-specific compatible for the ISO instance of the SMMU as well. And since that uses only one set of registers, minItems had to become 1. Thierry signature.asc Description: PGP signature
[PATCH] dt-bindings: arm-smmu: Fix json-schema syntax
From: Thierry Reding Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible string") introduced a jsonschema syntax error as a result of a rebase gone wrong. Fix it. Fixes: 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible string") Reported-by: Rob Herring Signed-off-by: Thierry Reding --- Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml index 1181b590db71..03f2b2d4db30 100644 --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml @@ -52,16 +52,14 @@ properties: items: - const: marvell,ap806-smmu-500 - const: arm,mmu-500 - - description: NVIDIA SoCs that program two ARM MMU-500s identically -items: - description: NVIDIA SoCs that require memory controller interaction and may program multiple ARM MMU-500s identically with the memory controller interleaving translations between multiple instances for improved performance. items: - enum: - - const: nvidia,tegra194-smmu - - const: nvidia,tegra186-smmu + - nvidia,tegra194-smmu + - nvidia,tegra186-smmu - const: nvidia,smmu-500 - items: - const: arm,mmu-500 -- 2.32.0 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)
On Mon, Jun 21, 2021 at 01:14:48PM +0900, 'Dominique MARTINET' wrote: > Chanho Park wrote on Mon, Jun 21, 2021 at 11:55:22AM +0900: > > Sure. No problem. But, the patch was already stacked on Konrad's tree > > and linux-next as well. > > > > https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14=33d1641f38f0c327bc3e5c21de585c77a6512bc6 > > > > That patch is slightly different, it's a rewrite Konrad did that mixes > in Linus' suggestion[1], which breaks things for the NVMe usecase > Jianxiong Gao has. > > [1] offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1) > > > Konrad is aware so I think it shouldn't be submitted :) The beaty of 'devel' and 'linux-next' is that they can be reshuffled and mangled. I pushed them original patch from Bumyong there and will let it sit for a day and then create a stable branch and give it to Linus. Then I need to expand the test-regression bucket so that this does not happen again. Dominique, how easy would it be to purchase one of those devices? I was originally thinking to create a crypto device in QEMU to simulate this but that may take longer to write than just getting the real thing. Or I could create some fake devices with weird offsets and write a driver for it to exercise this.. like this one I had done some time ago that needs some brushing off. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v4 2/7] iommu/amd: Do not use flush-queue when NpCache is on
On 16/06/2021 11:04, Nadav Amit wrote: - if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) + if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) { + if (!amd_iommu_unmap_flush) + pr_warn("IOMMU batching is disabled due to virtualization"); This is missing the '\n', like the Intel driver. And, JFYI, we are also downgrading that same print to info level (in the Intel driver). Thanks, John + amd_iommu_np_cache = true; + amd_iommu_unmap_flush = true; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 21/06/2021 12:59, Robin Murphy wrote: + Nadav On a personal level I would be happy with that approach, but I think it's better to not start changing things right away in a follow-up series. So how about we add this patch (which replaces 6/6 "iommu: Remove mode argument from iommu_set_dma_strict()")? Robin, any opinion? For me it boils down to whether there are any realistic workloads where non-strict mode*would* still perform better under virtualisation. The only reason for the user to explicitly pass "iommu.strict=0" is because they expect it to increase unmap performance; if it's only ever going to lead to an unexpected performance loss, I don't see any value in overriding the kernel's decision purely for the sake of subservience. If there*are* certain valid cases for allowing it for people who really know what they're doing, then we should arguably also log a counterpart message to say "we're honouring your override but beware it may have the opposite effect to what you expect" for the benefit of other users who assume it's a generic go-faster knob. At that point it starts getting non-trivial enough that I'd want to know for sure it's worthwhile. The other reason this might be better to revisit later is that an AMD equivalent is still in flight[1], and there might be more that can eventually be factored out. I think both series are pretty much good to merge for 5.14, but time's already tight to sort out the conflicts which exist as-is, without making them any worse. ok, fine. Can revisit. As for getting these merged, I'll dry-run merging both of those series to see the conflicts. It doesn't look too problematic from a glance. Cheers, John Robin. [1] https://lore.kernel.org/linux-iommu/20210616100500.174507-3-na...@vmware.com/ --->8- [PATCH] iommu/vt-d: Make "iommu.strict" override batching due to virtualization As a change in policy, make iommu.strict cmdline argument override whether we disable batching due to virtualization. The API of iommu_set_dma_strict() is changed to accept a "force" argument, which means that we always set iommu_dma_strict true, regardless of whether we already set via cmdline. Also return a boolean, to tell whether iommu_dma_strict was set or not. Note that in all pre-existing callsites of iommu_set_dma_strict(), argument strict was true, so this argument is dropped. Signed-off-by: John Garry diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 0f9d8116..e8d65239b359 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4380,10 +4380,8 @@ int __init intel_iommu_init(void) * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ - if (cap_caching_mode(iommu->cap)) { + if (cap_caching_mode(iommu->cap) && iommu_set_dma_strict(false)) pr_info_once("IOMMU batching disallowed due to virtualization\n"); - iommu_set_dma_strict(true); - } iommu_device_sysfs_add(>iommu, NULL, intel_iommu_groups, "%s", iommu->name); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..1434bee64af3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,14 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +/* Return true if we set iommu_dma_strict */ +bool iommu_set_dma_strict(bool force) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + if (force || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) { + iommu_dma_strict = true; + return true; + } + return false; } bool iommu_get_dma_strict(struct iommu_domain *domain) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..f17b20234296 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks); -void iommu_set_dma_strict(bool val); +bool iommu_set_dma_strict(bool force); bool iommu_get_dma_strict(struct iommu_domain *domain); extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 2021-06-21 11:34, John Garry wrote: On 21/06/2021 11:00, Lu Baolu wrote: void iommu_set_dma_strict(bool force) { if (force == true) iommu_dma_strict = true; else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } So we would use iommu_set_dma_strict(true) for a) and b), but iommu_set_dma_strict(false) for c). Yes. We need to distinguish the "must" and "nice-to-have" cases of setting strict mode. Then I am not sure what you want to do with the accompanying print for c). It was: "IOMMU batching is disabled due to virtualization" And now is from this series: "IOMMU batching disallowed due to virtualization" Using iommu_get_dma_strict(domain) is not appropriate here to know the current mode (so we know whether to print). Note that this change would mean that the current series would require non-trivial rework, which would be unfortunate so late in the cycle. This patch series looks good to me and I have added by reviewed-by. Probably we could make another patch series to improve it so that the kernel optimization should not override the user setting. On a personal level I would be happy with that approach, but I think it's better to not start changing things right away in a follow-up series. So how about we add this patch (which replaces 6/6 "iommu: Remove mode argument from iommu_set_dma_strict()")? Robin, any opinion? For me it boils down to whether there are any realistic workloads where non-strict mode *would* still perform better under virtualisation. The only reason for the user to explicitly pass "iommu.strict=0" is because they expect it to increase unmap performance; if it's only ever going to lead to an unexpected performance loss, I don't see any value in overriding the kernel's decision purely for the sake of subservience. If there *are* certain valid cases for allowing it for people who really know what they're doing, then we should arguably also log a counterpart message to say "we're honouring your override but beware it may have the opposite effect to what you expect" for the benefit of other users who assume it's a generic go-faster knob. At that point it starts getting non-trivial enough that I'd want to know for sure it's worthwhile. The other reason this might be better to revisit later is that an AMD equivalent is still in flight[1], and there might be more that can eventually be factored out. I think both series are pretty much good to merge for 5.14, but time's already tight to sort out the conflicts which exist as-is, without making them any worse. Robin. [1] https://lore.kernel.org/linux-iommu/20210616100500.174507-3-na...@vmware.com/ --->8- [PATCH] iommu/vt-d: Make "iommu.strict" override batching due to virtualization As a change in policy, make iommu.strict cmdline argument override whether we disable batching due to virtualization. The API of iommu_set_dma_strict() is changed to accept a "force" argument, which means that we always set iommu_dma_strict true, regardless of whether we already set via cmdline. Also return a boolean, to tell whether iommu_dma_strict was set or not. Note that in all pre-existing callsites of iommu_set_dma_strict(), argument strict was true, so this argument is dropped. Signed-off-by: John Garry diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 0f9d8116..e8d65239b359 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4380,10 +4380,8 @@ int __init intel_iommu_init(void) * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ - if (cap_caching_mode(iommu->cap)) { + if (cap_caching_mode(iommu->cap) && iommu_set_dma_strict(false)) pr_info_once("IOMMU batching disallowed due to virtualization\n"); - iommu_set_dma_strict(true); - } iommu_device_sysfs_add(>iommu, NULL, intel_iommu_groups, "%s", iommu->name); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..1434bee64af3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,14 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +/* Return true if we set iommu_dma_strict */ +bool iommu_set_dma_strict(bool force) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + if (force || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) { + iommu_dma_strict = true; + return true; + } + return false; } bool iommu_get_dma_strict(struct iommu_domain *domain) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..f17b20234296 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -476,7
Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
On Mon, Jun 21, 2021 at 5:14 PM Jason Wang wrote: > > > 在 2021/6/15 下午10:13, Xie Yongji 写道: > > This VDUSE driver enables implementing vDPA devices in userspace. > > The vDPA device's control path is handled in kernel and the data > > path is handled in userspace. > > > > A message mechnism is used by VDUSE driver to forward some control > > messages such as starting/stopping datapath to userspace. Userspace > > can use read()/write() to receive/reply those control messages. > > > > And some ioctls are introduced to help userspace to implement the > > data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file > > descriptors referring to vDPA device's iova regions. Then userspace > > can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES > > and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features > > and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD > > ioctls can be used to inject interrupt and setup the kickfd for > > virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the > > configuration space and inject a config interrupt. > > > > Signed-off-by: Xie Yongji > > --- > > Documentation/userspace-api/ioctl/ioctl-number.rst |1 + > > drivers/vdpa/Kconfig | 10 + > > drivers/vdpa/Makefile |1 + > > drivers/vdpa/vdpa_user/Makefile|5 + > > drivers/vdpa/vdpa_user/vduse_dev.c | 1453 > > > > include/uapi/linux/vduse.h | 143 ++ > > 6 files changed, 1613 insertions(+) > > create mode 100644 drivers/vdpa/vdpa_user/Makefile > > create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c > > create mode 100644 include/uapi/linux/vduse.h > > > > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst > > b/Documentation/userspace-api/ioctl/ioctl-number.rst > > index 9bfc2b510c64..acd95e9dcfe7 100644 > > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst > > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst > > @@ -300,6 +300,7 @@ Code Seq#Include File > > Comments > > 'z' 10-4F drivers/s390/crypto/zcrypt_api.h > > conflict! > > '|' 00-7F linux/media.h > > 0x80 00-1F linux/fb.h > > +0x81 00-1F linux/vduse.h > > 0x89 00-06 arch/x86/include/asm/sockios.h > > 0x89 0B-DF linux/sockios.h > > 0x89 E0-EF linux/sockios.h > > SIOCPROTOPRIVATE range > > diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig > > index a503c1b2bfd9..6e23bce6433a 100644 > > --- a/drivers/vdpa/Kconfig > > +++ b/drivers/vdpa/Kconfig > > @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK > > vDPA block device simulator which terminates IO request in a > > memory buffer. > > > > +config VDPA_USER > > + tristate "VDUSE (vDPA Device in Userspace) support" > > + depends on EVENTFD && MMU && HAS_DMA > > + select DMA_OPS > > + select VHOST_IOTLB > > + select IOMMU_IOVA > > + help > > + With VDUSE it is possible to emulate a vDPA Device > > + in a userspace program. > > + > > config IFCVF > > tristate "Intel IFC VF vDPA driver" > > depends on PCI_MSI > > diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile > > index 67fe7f3d6943..f02ebed33f19 100644 > > --- a/drivers/vdpa/Makefile > > +++ b/drivers/vdpa/Makefile > > @@ -1,6 +1,7 @@ > > # SPDX-License-Identifier: GPL-2.0 > > obj-$(CONFIG_VDPA) += vdpa.o > > obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ > > +obj-$(CONFIG_VDPA_USER) += vdpa_user/ > > obj-$(CONFIG_IFCVF)+= ifcvf/ > > obj-$(CONFIG_MLX5_VDPA) += mlx5/ > > obj-$(CONFIG_VP_VDPA)+= virtio_pci/ > > diff --git a/drivers/vdpa/vdpa_user/Makefile > > b/drivers/vdpa/vdpa_user/Makefile > > new file mode 100644 > > index ..260e0b26af99 > > --- /dev/null > > +++ b/drivers/vdpa/vdpa_user/Makefile > > @@ -0,0 +1,5 @@ > > +# SPDX-License-Identifier: GPL-2.0 > > + > > +vduse-y := vduse_dev.o iova_domain.o > > + > > +obj-$(CONFIG_VDPA_USER) += vduse.o > > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c > > b/drivers/vdpa/vdpa_user/vduse_dev.c > > new file mode 100644 > > index ..5271cbd15e28 > > --- /dev/null > > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c > > @@ -0,0 +1,1453 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * VDUSE: vDPA Device in Userspace > > + * > > + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All > > rights reserved. > > + * > > + * Author: Xie Yongji > > + * > > + */ > > + > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > +#include > > + > > +#include "iova_domain.h" > > + > > +#define DRV_AUTHOR
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 21/06/2021 11:00, Lu Baolu wrote: void iommu_set_dma_strict(bool force) { if (force == true) iommu_dma_strict = true; else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } So we would use iommu_set_dma_strict(true) for a) and b), but iommu_set_dma_strict(false) for c). Yes. We need to distinguish the "must" and "nice-to-have" cases of setting strict mode. Then I am not sure what you want to do with the accompanying print for c). It was: "IOMMU batching is disabled due to virtualization" And now is from this series: "IOMMU batching disallowed due to virtualization" Using iommu_get_dma_strict(domain) is not appropriate here to know the current mode (so we know whether to print). Note that this change would mean that the current series would require non-trivial rework, which would be unfortunate so late in the cycle. This patch series looks good to me and I have added by reviewed-by. Probably we could make another patch series to improve it so that the kernel optimization should not override the user setting. On a personal level I would be happy with that approach, but I think it's better to not start changing things right away in a follow-up series. So how about we add this patch (which replaces 6/6 "iommu: Remove mode argument from iommu_set_dma_strict()")? Robin, any opinion? --->8- [PATCH] iommu/vt-d: Make "iommu.strict" override batching due to virtualization As a change in policy, make iommu.strict cmdline argument override whether we disable batching due to virtualization. The API of iommu_set_dma_strict() is changed to accept a "force" argument, which means that we always set iommu_dma_strict true, regardless of whether we already set via cmdline. Also return a boolean, to tell whether iommu_dma_strict was set or not. Note that in all pre-existing callsites of iommu_set_dma_strict(), argument strict was true, so this argument is dropped. Signed-off-by: John Garry diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index 0f9d8116..e8d65239b359 100644 --- a/drivers/iommu/intel/iommu.c +++ b/drivers/iommu/intel/iommu.c @@ -4380,10 +4380,8 @@ int __init intel_iommu_init(void) * is likely to be much lower than the overhead of synchronizing * the virtual and physical IOMMU page-tables. */ - if (cap_caching_mode(iommu->cap)) { + if (cap_caching_mode(iommu->cap) && iommu_set_dma_strict(false)) pr_info_once("IOMMU batching disallowed due to virtualization\n"); - iommu_set_dma_strict(true); - } iommu_device_sysfs_add(>iommu, NULL, intel_iommu_groups, "%s", iommu->name); diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..1434bee64af3 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,14 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +/* Return true if we set iommu_dma_strict */ +bool iommu_set_dma_strict(bool force) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + if (force || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) { + iommu_dma_strict = true; + return true; + } + return false; } bool iommu_get_dma_strict(struct iommu_domain *domain) diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 32d448050bf7..f17b20234296 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain); int iommu_set_pgtable_quirks(struct iommu_domain *domain, unsigned long quirks); -void iommu_set_dma_strict(bool val); +bool iommu_set_dma_strict(bool force); bool iommu_get_dma_strict(struct iommu_domain *domain); extern int report_iommu_fault(struct iommu_domain *domain, struct device *dev, -- 2.26.2 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 2021/6/21 16:12, John Garry wrote: On 21/06/2021 06:17, Lu Baolu wrote: On 2021/6/18 19:34, John Garry wrote: diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..ff221d3ddcbc 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +void iommu_set_dma_strict(void) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + iommu_dma_strict = true; } Hi baolu, Hi John, Sorry for this late comment. > Normally the cache invalidation policy should come from the user. We have pre-build kernel option and also a kernel boot command iommu.strict to override it. These seem reasonable. We also have a helper (iommu_set_dma_strict()) so that the vendor iommu driver could squeeze in and change the previous settings mostly due to: a) vendor iommu driver specific kernel boot command. (We are about to deprecate those.) b) quirky hardware. c) kernel optimization (e.x. strict mode in VM environment). a) and b) are mandatory, while c) is optional. In any instance should c) override the flush mode specified by the user. Hence, probably we should also have another helper like: void iommu_set_dma_strict_optional() { if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } Any thoughts? What you are suggesting is a change in policy from mainline code. Currently for c) we always set strict enabled, regardless of any user cmdline input. But now you are saying that you want iommu.strict to override in particular scenario, right? In that case I would think it's better to rework the current API, like adding an option to "force" strict mode: void iommu_set_dma_strict(bool force) { if (force == true) iommu_dma_strict = true; else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } So we would use iommu_set_dma_strict(true) for a) and b), but iommu_set_dma_strict(false) for c). Yes. We need to distinguish the "must" and "nice-to-have" cases of setting strict mode. Then I am not sure what you want to do with the accompanying print for c). It was: "IOMMU batching is disabled due to virtualization" And now is from this series: "IOMMU batching disallowed due to virtualization" Using iommu_get_dma_strict(domain) is not appropriate here to know the current mode (so we know whether to print). Note that this change would mean that the current series would require non-trivial rework, which would be unfortunate so late in the cycle. This patch series looks good to me and I have added by reviewed-by. Probably we could make another patch series to improve it so that the kernel optimization should not override the user setting. Best regards, baolu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace
在 2021/6/15 下午10:13, Xie Yongji 写道: This VDUSE driver enables implementing vDPA devices in userspace. The vDPA device's control path is handled in kernel and the data path is handled in userspace. A message mechnism is used by VDUSE driver to forward some control messages such as starting/stopping datapath to userspace. Userspace can use read()/write() to receive/reply those control messages. And some ioctls are introduced to help userspace to implement the data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file descriptors referring to vDPA device's iova regions. Then userspace can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD ioctls can be used to inject interrupt and setup the kickfd for virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the configuration space and inject a config interrupt. Signed-off-by: Xie Yongji --- Documentation/userspace-api/ioctl/ioctl-number.rst |1 + drivers/vdpa/Kconfig | 10 + drivers/vdpa/Makefile |1 + drivers/vdpa/vdpa_user/Makefile|5 + drivers/vdpa/vdpa_user/vduse_dev.c | 1453 include/uapi/linux/vduse.h | 143 ++ 6 files changed, 1613 insertions(+) create mode 100644 drivers/vdpa/vdpa_user/Makefile create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c create mode 100644 include/uapi/linux/vduse.h diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 9bfc2b510c64..acd95e9dcfe7 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -300,6 +300,7 @@ Code Seq#Include File Comments 'z' 10-4F drivers/s390/crypto/zcrypt_api.hconflict! '|' 00-7F linux/media.h 0x80 00-1F linux/fb.h +0x81 00-1F linux/vduse.h 0x89 00-06 arch/x86/include/asm/sockios.h 0x89 0B-DF linux/sockios.h 0x89 E0-EF linux/sockios.h SIOCPROTOPRIVATE range diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index a503c1b2bfd9..6e23bce6433a 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK vDPA block device simulator which terminates IO request in a memory buffer. +config VDPA_USER + tristate "VDUSE (vDPA Device in Userspace) support" + depends on EVENTFD && MMU && HAS_DMA + select DMA_OPS + select VHOST_IOTLB + select IOMMU_IOVA + help + With VDUSE it is possible to emulate a vDPA Device + in a userspace program. + config IFCVF tristate "Intel IFC VF vDPA driver" depends on PCI_MSI diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile index 67fe7f3d6943..f02ebed33f19 100644 --- a/drivers/vdpa/Makefile +++ b/drivers/vdpa/Makefile @@ -1,6 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_VDPA) += vdpa.o obj-$(CONFIG_VDPA_SIM) += vdpa_sim/ +obj-$(CONFIG_VDPA_USER) += vdpa_user/ obj-$(CONFIG_IFCVF)+= ifcvf/ obj-$(CONFIG_MLX5_VDPA) += mlx5/ obj-$(CONFIG_VP_VDPA)+= virtio_pci/ diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile new file mode 100644 index ..260e0b26af99 --- /dev/null +++ b/drivers/vdpa/vdpa_user/Makefile @@ -0,0 +1,5 @@ +# SPDX-License-Identifier: GPL-2.0 + +vduse-y := vduse_dev.o iova_domain.o + +obj-$(CONFIG_VDPA_USER) += vduse.o diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c b/drivers/vdpa/vdpa_user/vduse_dev.c new file mode 100644 index ..5271cbd15e28 --- /dev/null +++ b/drivers/vdpa/vdpa_user/vduse_dev.c @@ -0,0 +1,1453 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * VDUSE: vDPA Device in Userspace + * + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. + * + * Author: Xie Yongji + * + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "iova_domain.h" + +#define DRV_AUTHOR "Yongji Xie " +#define DRV_DESC "vDPA Device in Userspace" +#define DRV_LICENSE "GPL v2" + +#define VDUSE_DEV_MAX (1U << MINORBITS) +#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024) +#define VDUSE_IOVA_SIZE (128 * 1024 * 1024) +#define VDUSE_REQUEST_TIMEOUT 30 + +struct vduse_virtqueue { + u16 index; + u32 num; + u32 avail_idx; + u64 desc_addr; + u64 driver_addr; + u64 device_addr; + bool ready; + bool kicked; + spinlock_t kick_lock; + spinlock_t irq_lock; + struct eventfd_ctx *kickfd; +
Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()
On 21/06/2021 06:17, Lu Baolu wrote: On 2021/6/18 19:34, John Garry wrote: diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c index 60b1ec42e73b..ff221d3ddcbc 100644 --- a/drivers/iommu/iommu.c +++ b/drivers/iommu/iommu.c @@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str) } early_param("iommu.strict", iommu_dma_setup); -void iommu_set_dma_strict(bool strict) +void iommu_set_dma_strict(void) { - if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) - iommu_dma_strict = strict; + iommu_dma_strict = true; } Hi baolu, Sorry for this late comment. > Normally the cache invalidation policy should come from the user. We have pre-build kernel option and also a kernel boot command iommu.strict to override it. These seem reasonable. We also have a helper (iommu_set_dma_strict()) so that the vendor iommu driver could squeeze in and change the previous settings mostly due to: a) vendor iommu driver specific kernel boot command. (We are about to deprecate those.) b) quirky hardware. c) kernel optimization (e.x. strict mode in VM environment). a) and b) are mandatory, while c) is optional. In any instance should c) override the flush mode specified by the user. Hence, probably we should also have another helper like: void iommu_set_dma_strict_optional() { if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } Any thoughts? What you are suggesting is a change in policy from mainline code. Currently for c) we always set strict enabled, regardless of any user cmdline input. But now you are saying that you want iommu.strict to override in particular scenario, right? In that case I would think it's better to rework the current API, like adding an option to "force" strict mode: void iommu_set_dma_strict(bool force) { if (force == true) iommu_dma_strict = true; else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) iommu_dma_strict = true; } So we would use iommu_set_dma_strict(true) for a) and b), but iommu_set_dma_strict(false) for c). Then I am not sure what you want to do with the accompanying print for c). It was: "IOMMU batching is disabled due to virtualization" And now is from this series: "IOMMU batching disallowed due to virtualization" Using iommu_get_dma_strict(domain) is not appropriate here to know the current mode (so we know whether to print). Note that this change would mean that the current series would require non-trivial rework, which would be unfortunate so late in the cycle. Thanks, John ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string
On 18/06/2021 21:47, Rob Herring wrote: > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding > wrote: >> >> From: Thierry Reding >> >> The ARM SMMU instantiations found on Tegra186 and later need inter- >> operation with the memory controller in order to correctly program >> stream ID overrides. >> >> Furthermore, on Tegra194 multiple instances of the SMMU can gang up >> to achieve higher throughput. In order to do this, they have to be >> programmed identically so that the memory controller can interleave >> memory accesses between them. >> >> Add the Tegra186 compatible string to make sure the interoperation >> with the memory controller can be enabled on that SoC generation. >> >> Signed-off-by: Thierry Reding >> --- >> Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 11 +-- >> 1 file changed, 9 insertions(+), 2 deletions(-) >> >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml >> index 9d27aa5111d4..1181b590db71 100644 >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml >> @@ -54,8 +54,14 @@ properties: >>- const: arm,mmu-500 >>- description: NVIDIA SoCs that program two ARM MMU-500s identically >> items: >> + - description: NVIDIA SoCs that require memory controller interaction > > This is not valid jsonschema: > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one > must be fixed: > None is not of type 'object', 'boolean' > None is not of type 'array' > from schema $id: http://json-schema.org/draft-07/schema# > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > must be fixed: > None is not of type 'object' > None is not of type 'array' > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml# > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one > must be fixed: > None is not of type 'object' > None is not of type 'array' > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml: > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one > must be fixed: > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const': > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of > type 'object' > {'const': 'nvidia,tegra194-smmu'} is not of type 'string' > {'const': 'nvidia,tegra186-smmu'} is not of type 'string' > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml# > > > This was not reviewed nor tested since the DT list was not Cc'ed. Ugh, I see now weird empty item on a list... and not only DT list was skipped - Thierry did not Cc you either. My bad, I did not check that patch thoroughly before applying. Thierry, please Cc folks mentioned by get_maintainer.pl. Either sent a fix or a revert, if fix needs more time. Additionally, why the patch changes reg to "minItems: 1" for nvidia,tegra194-smmu? Best regards, Krzysztof ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu