Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-21 Thread Jason Wang


在 2021/6/21 下午6:41, Yongji Xie 写道:

On Mon, Jun 21, 2021 at 5:14 PM Jason Wang  wrote:


在 2021/6/15 下午10:13, Xie Yongji 写道:

This VDUSE driver enables implementing vDPA devices in userspace.
The vDPA device's control path is handled in kernel and the data
path is handled in userspace.

A message mechnism is used by VDUSE driver to forward some control
messages such as starting/stopping datapath to userspace. Userspace
can use read()/write() to receive/reply those control messages.

And some ioctls are introduced to help userspace to implement the
data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
descriptors referring to vDPA device's iova regions. Then userspace
can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
ioctls can be used to inject interrupt and setup the kickfd for
virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
configuration space and inject a config interrupt.

Signed-off-by: Xie Yongji 
---
   Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
   drivers/vdpa/Kconfig   |   10 +
   drivers/vdpa/Makefile  |1 +
   drivers/vdpa/vdpa_user/Makefile|5 +
   drivers/vdpa/vdpa_user/vduse_dev.c | 1453 

   include/uapi/linux/vduse.h |  143 ++
   6 files changed, 1613 insertions(+)
   create mode 100644 drivers/vdpa/vdpa_user/Makefile
   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
   create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9bfc2b510c64..acd95e9dcfe7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
conflict!
   '|'   00-7F  linux/media.h
   0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
   0x89  00-06  arch/x86/include/asm/sockios.h
   0x89  0B-DF  linux/sockios.h
   0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
 vDPA block device simulator which terminates IO request in a
 memory buffer.

+config VDPA_USER
+ tristate "VDUSE (vDPA Device in Userspace) support"
+ depends on EVENTFD && MMU && HAS_DMA
+ select DMA_OPS
+ select VHOST_IOTLB
+ select IOMMU_IOVA
+ help
+   With VDUSE it is possible to emulate a vDPA Device
+   in a userspace program.
+
   config IFCVF
   tristate "Intel IFC VF vDPA driver"
   depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
   # SPDX-License-Identifier: GPL-2.0
   obj-$(CONFIG_VDPA) += vdpa.o
   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
   obj-$(CONFIG_IFCVF)+= ifcvf/
   obj-$(CONFIG_MLX5_VDPA) += mlx5/
   obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..5271cbd15e28
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1453 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+ u16 index;
+ u32 num;
+ u32 avail_idx;
+ u64 desc_addr;
+ u64 driver_addr;
+ u64 device_addr;
+ bool ready;
+ bool kicked;
+ spinlock_t 

Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-21 Thread Saravana Kannan via iommu
On Mon, Jun 21, 2021 at 4:53 PM Douglas Anderson  wrote:
>
> In the patch ("drivers: base: Add bits to struct device to control
> iommu strictness") we add the ability for devices to tell us about
> their IOMMU strictness requirements. Let's now take that into account
> in the IOMMU layer.
>
> A few notes here:
> * Presumably this is always how iommu_get_dma_strict() was intended to
>   behave. Had this not been the intention then it never would have
>   taken a domain as a parameter.
> * The iommu_set_dma_strict() feels awfully non-symmetric now. That
>   function sets the _default_ strictness globally in the system
>   whereas iommu_get_dma_strict() returns the value for a given domain
>   (falling back to the default). Presumably, at least, the fact that
>   iommu_set_dma_strict() doesn't take a domain makes this obvious.
>
> The function iommu_get_dma_strict() should now make it super obvious
> where strictness comes from and who overides who. Though the function
> changed a bunch to make the logic clearer, the only two new rules
> should be:
> * Devices can force strictness for themselves, overriding the cmdline
>   "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
> * Devices can request non-strictness for themselves, assuming there
>   was no cmdline "iommu.strict=1" or a call to
>   iommu_set_dma_strict(true).
>
> Signed-off-by: Douglas Anderson 
> ---
>
>  drivers/iommu/iommu.c | 56 +--
>  include/linux/iommu.h |  2 ++
>  2 files changed, 45 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 808ab70d5df5..0c84a4c06110 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -28,8 +28,19 @@
>  static struct kset *iommu_group_kset;
>  static DEFINE_IDA(iommu_group_ida);
>
> +enum iommu_strictness {
> +   IOMMU_DEFAULT_STRICTNESS = -1,
> +   IOMMU_NOT_STRICT = 0,
> +   IOMMU_STRICT = 1,
> +};
> +static inline enum iommu_strictness bool_to_strictness(bool strictness)
> +{
> +   return (enum iommu_strictness)strictness;
> +}
> +
>  static unsigned int iommu_def_domain_type __read_mostly;
> -static bool iommu_dma_strict __read_mostly = true;
> +static enum iommu_strictness cmdline_dma_strict __read_mostly = 
> IOMMU_DEFAULT_STRICTNESS;
> +static enum iommu_strictness driver_dma_strict __read_mostly = 
> IOMMU_DEFAULT_STRICTNESS;
>  static u32 iommu_cmd_line __read_mostly;
>
>  struct iommu_group {
> @@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
>  };
>
>  #define IOMMU_CMD_LINE_DMA_API BIT(0)
> -#define IOMMU_CMD_LINE_STRICT  BIT(1)
>
>  static int iommu_alloc_default_domain(struct iommu_group *group,
>   struct device *dev);
> @@ -336,25 +346,38 @@ early_param("iommu.passthrough", 
> iommu_set_def_domain_type);
>
>  static int __init iommu_dma_setup(char *str)
>  {
> -   int ret = kstrtobool(str, _dma_strict);
> +   bool strict;
> +   int ret = kstrtobool(str, );
>
> if (!ret)
> -   iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
> +   cmdline_dma_strict = bool_to_strictness(strict);
> return ret;
>  }
>  early_param("iommu.strict", iommu_dma_setup);
>
>  void iommu_set_dma_strict(bool strict)
>  {
> -   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
> -   iommu_dma_strict = strict;
> +   /* A driver can request strictness but not the other way around */
> +   if (driver_dma_strict != IOMMU_STRICT)
> +   driver_dma_strict = bool_to_strictness(strict);
>  }
>
>  bool iommu_get_dma_strict(struct iommu_domain *domain)
>  {
> -   /* only allow lazy flushing for DMA domains */
> -   if (domain->type == IOMMU_DOMAIN_DMA)
> -   return iommu_dma_strict;
> +   /* Non-DMA domains or anyone forcing it to strict makes it strict */
> +   if (domain->type != IOMMU_DOMAIN_DMA ||
> +   cmdline_dma_strict == IOMMU_STRICT ||
> +   driver_dma_strict == IOMMU_STRICT ||
> +   domain->force_strict)
> +   return true;
> +
> +   /* Anyone requesting non-strict (if no forces) makes it non-strict */
> +   if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
> +   driver_dma_strict == IOMMU_NOT_STRICT ||
> +   domain->request_non_strict)
> +   return false;
> +
> +   /* Nobody said anything, so it's strict by default */

If iommu.strict is not set in the command line, upstream treats it as
iommu.strict=1. Meaning, no drivers can override it.

If I understand it correctly, with your series, if iommu.strict=1 is
not set, drivers can override the "default strict mode" and ask for
non-strict mode for their domain. So if this series gets in and future
driver changes start asking for non-strict mode, systems that are
expected to operate in fully strict mode will now have devices
operating in non-strict mode.

That's breaking backward 

Re: [PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-21 Thread Lu Baolu

On 6/22/21 7:52 AM, Douglas Anderson wrote:

@@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
  
  static int iommu_group_alloc_default_domain(struct bus_type *bus,

struct iommu_group *group,
-   unsigned int type)
+   unsigned int type,
+   struct device *dev)
  {
struct iommu_domain *dom;
  
@@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,

if (!dom)
return -ENOMEM;
  
+	/* Save the strictness requests from the device */

+   if (dev && type == IOMMU_DOMAIN_DMA) {
+   dom->request_non_strict = dev->request_non_strict_iommu;
+   dom->force_strict = dev->force_strict_iommu;
+   }
+


An iommu default domain might be used by multiple devices which might
have different "strict" attributions. Then who could override who?

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode

2021-06-21 Thread Douglas Anderson
IOMMUs can be run in "strict" mode or in "non-strict" mode. The
quick-summary difference between the two is that in "strict" mode we
wait until everything is flushed out when we unmap DMA memory. In
"non-strict" we don't.

Using the IOMMU in "strict" mode is more secure/safer but slower
because we have to sit and wait for flushes while we're unmapping. To
explain a bit why "non-strict" mode is unsafe, let's imagine two
examples.

An example of "non-strict" being insecure when reading from a device:
a) Linux driver maps memory for DMA.
b) Linux driver starts DMA on the device.
c) Device write to RAM subject to bounds checking done by IOMMU.
d) Device finishes writing to RAM and signals transfer is finished.
e) Linux driver starts unmapping DMA memory but doesn't flush.
f) Linux driver validates that the data in memory looks sane and that
   accessing it won't cause the driver to, for instance, overflow a
   buffer.
g) Device takes advantage of knowledge of how the Linux driver works
   and sneaks in a modification to the data after the validation but
   before the IOMMU unmap flush finishes.
h) Device has now caused the Linux driver to access memory it
   shouldn't.

An example of "non-strict" being insecure when writing to a device:
a) Linux driver writes data intended for the device to RAM.
b) Linux driver maps memory for DMA.
c) Linux driver starts DMA on the device.
d) Device reads from RAM subject to bounds checking done by IOMMU.
e) Device finishes reading from RAM and signals transfer is finished.
f) Linux driver starts unmapping DMA memory but doesn't flush.
g) Linux driver frees memory and returns it to the pool.
h) Memory is allocated for another purpose.
i) Device takes advantage of the period of time before IOMMU flush to
   read memory that it shouldn't have had access to.

As you can see from the above examples, using the iommu in
"non-strict" mode might not sound _too_ scary (the window of badness
is small and the exposed memory is small) but there is certainly
risk. Let's evaluate the risk by breaking it down into two problems
that IOMMUs are supposed to be protecting us against:

Case 1: IOMMUs prevent malicious code running on the peripheral (maybe
a malicious peripheral or maybe someone exploited a benign peripheral)
from turning into an exploit of the Linux kernel. This is particularly
important if the peripheral has loadable / updatable firmware or if
the peripheral has some type of general purpose processor and is
processing untrusted inputs. It's also important if the device is
something that can be easily plugged into the host and the device has
direct DMA access itself, like a PCIe device.

Case 2: IOMMUs limit the severity of a class of software bugs in the
kernel. If we misconfigure a peripheral by accident then instead of
the peripheral clobbering random memory due to a bug we might get an
IOMMU error.

Now that we understand the issue and the risks, let's evaluate whether
we really need "strict" mode for the Qualcomm SDHCI controllers. I
will make the argument that we don't _need_ strict mode for them. Why?
* The SDHCI controller on Qualcomm SoCs doesn't appear to have
  loadable / updatable firmware and, assuming it's got some firmware
  baked into it, I see no evidence that the firmware could be
  compromised.
* Even though, for external SD cards in particular, the controller is
  dealing with "untrusted" inputs, it's dealing with them in a very
  controlled way.  It seems unlikely that a rogue SD card would be
  able to present something to the SDHCI controller that would cause
  it to DMA to/from an address other than one the kernel told it
  about.
* Although it would be nice to catch more software bugs, once the
  Linux driver has been debugged and stressed the value is not very
  high. If the IOMMU caught something like this the system would be in
  a pretty bad shape anyway (we don't really recover from IOMMU
  errors) and the only benefit would be a better spotlight on what
  went wrong.

Now we have a good understanding of the benefits of "strict" mode for
our SDHCI controllers, let's look at some performance numbers. I used
"dd" to measure read speeds from eMMC on a sc7180-trogdor-lazor
board. Basic test command (while booted from USB):
  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/mmcblk1 of=/dev/null bs=4M count=512

I attempted to run my tests for enough iterations that results
stabilized and weren't too noisy. Tests were run with patches picked
to the chromeos-5.4 tree (sanity checked against v5.13-rc7). I also
attempted to compare to other attempts to address IOMMU problems
and/or attempts to bump the cpufreq up to solve this problem:
- eMMC datasheet spec: 300 MB/s "Typical Sequential Performance"
  NOTE: we're driving the bus at 192 MHz instead of 200 Mhz so we might
  not be able to achieve the full 300 MB/s.
- Baseline: 210.9 MB/s
- Baseline + peg cpufreq to max: 284.3 MB/s
- This patch: 279.6 MB/s
- This patch + peg cpufreq to max: 288.1 MB/s
- Joel's IO 

[PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict

2021-06-21 Thread Douglas Anderson
We now have a way for PCIe devices to force iommu.strict through the
"struct device" and that's now hooked up. Let's remove the special
case for PCIe devices.

NOTE: there are still other places in this file that make decisions
based on the PCIe "untrusted" status. This patch only handles removing
the one related to iommu.strict. Removing the other cases is left as
an exercise to the reader.

Signed-off-by: Douglas Anderson 
---

 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..e50c06ce1a6b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -368,7 +368,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
 
init_iova_domain(iovad, 1UL << order, base_pfn);
 
-   if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
+   if (!cookie->fq_domain &&
domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
  iommu_dma_entry_dtor))
-- 
2.32.0.288.g62a8d224e6-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 4/6] iommu: Combine device strictness requests with the global default

2021-06-21 Thread Douglas Anderson
In the patch ("drivers: base: Add bits to struct device to control
iommu strictness") we add the ability for devices to tell us about
their IOMMU strictness requirements. Let's now take that into account
in the IOMMU layer.

A few notes here:
* Presumably this is always how iommu_get_dma_strict() was intended to
  behave. Had this not been the intention then it never would have
  taken a domain as a parameter.
* The iommu_set_dma_strict() feels awfully non-symmetric now. That
  function sets the _default_ strictness globally in the system
  whereas iommu_get_dma_strict() returns the value for a given domain
  (falling back to the default). Presumably, at least, the fact that
  iommu_set_dma_strict() doesn't take a domain makes this obvious.

The function iommu_get_dma_strict() should now make it super obvious
where strictness comes from and who overides who. Though the function
changed a bunch to make the logic clearer, the only two new rules
should be:
* Devices can force strictness for themselves, overriding the cmdline
  "iommu.strict=0" or a call to iommu_set_dma_strict(false)).
* Devices can request non-strictness for themselves, assuming there
  was no cmdline "iommu.strict=1" or a call to
  iommu_set_dma_strict(true).

Signed-off-by: Douglas Anderson 
---

 drivers/iommu/iommu.c | 56 +--
 include/linux/iommu.h |  2 ++
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0c84a4c06110 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -28,8 +28,19 @@
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);
 
+enum iommu_strictness {
+   IOMMU_DEFAULT_STRICTNESS = -1,
+   IOMMU_NOT_STRICT = 0,
+   IOMMU_STRICT = 1,
+};
+static inline enum iommu_strictness bool_to_strictness(bool strictness)
+{
+   return (enum iommu_strictness)strictness;
+}
+
 static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static enum iommu_strictness cmdline_dma_strict __read_mostly = 
IOMMU_DEFAULT_STRICTNESS;
+static enum iommu_strictness driver_dma_strict __read_mostly = 
IOMMU_DEFAULT_STRICTNESS;
 static u32 iommu_cmd_line __read_mostly;
 
 struct iommu_group {
@@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
 };
 
 #define IOMMU_CMD_LINE_DMA_API BIT(0)
-#define IOMMU_CMD_LINE_STRICT  BIT(1)
 
 static int iommu_alloc_default_domain(struct iommu_group *group,
  struct device *dev);
@@ -336,25 +346,38 @@ early_param("iommu.passthrough", 
iommu_set_def_domain_type);
 
 static int __init iommu_dma_setup(char *str)
 {
-   int ret = kstrtobool(str, _dma_strict);
+   bool strict;
+   int ret = kstrtobool(str, );
 
if (!ret)
-   iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
+   cmdline_dma_strict = bool_to_strictness(strict);
return ret;
 }
 early_param("iommu.strict", iommu_dma_setup);
 
 void iommu_set_dma_strict(bool strict)
 {
-   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-   iommu_dma_strict = strict;
+   /* A driver can request strictness but not the other way around */
+   if (driver_dma_strict != IOMMU_STRICT)
+   driver_dma_strict = bool_to_strictness(strict);
 }
 
 bool iommu_get_dma_strict(struct iommu_domain *domain)
 {
-   /* only allow lazy flushing for DMA domains */
-   if (domain->type == IOMMU_DOMAIN_DMA)
-   return iommu_dma_strict;
+   /* Non-DMA domains or anyone forcing it to strict makes it strict */
+   if (domain->type != IOMMU_DOMAIN_DMA ||
+   cmdline_dma_strict == IOMMU_STRICT ||
+   driver_dma_strict == IOMMU_STRICT ||
+   domain->force_strict)
+   return true;
+
+   /* Anyone requesting non-strict (if no forces) makes it non-strict */
+   if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
+   driver_dma_strict == IOMMU_NOT_STRICT ||
+   domain->request_non_strict)
+   return false;
+
+   /* Nobody said anything, so it's strict by default */
return true;
 }
 EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
@@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)
 
 static int iommu_group_alloc_default_domain(struct bus_type *bus,
struct iommu_group *group,
-   unsigned int type)
+   unsigned int type,
+   struct device *dev)
 {
struct iommu_domain *dom;
 
@@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct 
bus_type *bus,
if (!dom)
return -ENOMEM;
 
+   /* Save the strictness requests from the device */
+   if (dev && type == IOMMU_DOMAIN_DMA) {
+   

[PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices

2021-06-21 Thread Douglas Anderson
At the moment the generic IOMMU framework reaches into the PCIe device
to check the "untrusted" state and uses this information to figure out
if it should be running the IOMMU in strict or non-strict mode. Let's
instead set the new boolean in "struct device" to indicate when we
want forced strictness.

NOTE: we still continue to set the "untrusted" bit in PCIe since that
apparently is used for more than just IOMMU strictness. It probably
makes sense for a later patchset to clarify all of the other needs we
have for "untrusted" PCIe devices (perhaps add more booleans into the
"struct device") so we can fully eliminate the need for the IOMMU
framework to reach into a PCIe device.

Signed-off-by: Douglas Anderson 
---

 drivers/pci/probe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 275204646c68..8d81f0fb3e50 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1572,8 +1572,10 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 * untrusted as well.
 */
parent = pci_upstream_bridge(dev);
-   if (parent && (parent->untrusted || parent->external_facing))
+   if (parent && (parent->untrusted || parent->external_facing)) {
dev->untrusted = true;
+   dev->dev.force_strict_iommu = true;
+   }
 }
 
 /**
-- 
2.32.0.288.g62a8d224e6-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness

2021-06-21 Thread Douglas Anderson
How to control the "strictness" of an IOMMU is a bit of a mess right
now. As far as I can tell, right now:
* You can set the default to "non-strict" and some devices (right now,
  only PCI devices) can request to run in "strict" mode.
* You can set the default to "strict" and no devices in the system are
  allowed to run as "non-strict".

I believe this needs to be improved a bit. Specifically:
* We should be able to default to "strict" mode but let devices that
  claim to be fairly low risk request that they be run in "non-strict"
  mode.
* We should allow devices outside of PCIe to request "strict" mode if
  the system default is "non-strict".

I believe the correct way to do this is two bits in "struct
device". One allows a device to force things to "strict" mode and the
other allows a device to _request_ "non-strict" mode. The asymmetry
here is on purpose. Generally if anything in the system makes a
request for strictness of something then we want it strict. Thus
drivers can only request (but not force) non-strictness.

It's expected that the strictness fields can be filled in by the bus
code like in the patch ("PCI: Indicate that we want to force strict
DMA for untrusted devices") or by using the new pre_probe concept
introduced in the patch ("drivers: base: Add the concept of
"pre_probe" to drivers").

Signed-off-by: Douglas Anderson 
---

 include/linux/device.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index f1a00040fa53..c1b985e10c47 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -449,6 +449,15 @@ struct dev_links_info {
  * and optionall (if the coherent mask is large enough) also
  * for dma allocations.  This flag is managed by the dma ops
  * instance from ->dma_supported.
+ * @force_strict_iommu: If set to %true then we should force this device to
+ * iommu.strict regardless of the other defaults in the
+ * system. Only has an effect if an IOMMU is in place.
+ * @request_non_strict_iommu: If set to %true and there are no other known
+ *   reasons to make the iommu.strict for this device,
+ *   then default to non-strict mode. This implies
+ *   some belief that the DMA master for this device
+ *   won't abuse the DMA path to compromise the kernel.
+ *   Only has an effect if an IOMMU is in place.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -557,6 +566,8 @@ struct device {
 #ifdef CONFIG_DMA_OPS_BYPASS
booldma_ops_bypass : 1;
 #endif
+   boolforce_strict_iommu:1;
+   boolrequest_non_strict_iommu:1;
 };
 
 /**
-- 
2.32.0.288.g62a8d224e6-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers

2021-06-21 Thread Douglas Anderson
Right now things are a bit awkward if a driver would like a chance to
run before some of the more "automatic" things (pinctrl, DMA, IOMMUs,
...) happen to a device. This patch aims to fix that problem by
introducing the concept of a "pre_probe" function that drivers can
implement to run before the "automatic" stuff.

Why would you want to run before the "automatic" stuff? The incentive
in my case is that I want to be able to fill in some boolean flags in
the "struct device" before the IOMMU init runs. It appears that the
strictness vs. non-strictness of a device's iommu config is determined
once at init time and can't be changed afterwards. However, I would
like to avoid hardcoding the rules for strictness in the IOMMU
driver. Instead I'd like to let individual drivers be able to make
informed decisions about the appropriateness of strictness
vs. non-strictness.

The desire for running code pre_probe is likely not limited to my use
case. I believe that the list "qcom_smmu_client_of_match" is hacked
into the iommu driver specifically because there was no real good
framework for this. For the existing list it wasn't _quite_ as ugly as
my needs since the decision could be made solely on compatible string,
but it still feels like it would have been better for individual
drivers to run code and setup some state rather than coding up a big
list in the IOMMU driver.

Even without this patch, I believe it is possible for a driver to run
before the "automatic" things by registering for
"BUS_NOTIFY_BIND_DRIVER" in its init call, though I haven't personally
tested this. Using the notifier is a bit awkward, though, and I'd
rather avoid it. Also, using "BUS_NOTIFY_BIND_DRIVER" would require
drivers to stop using the convenience module_platform_driver() helper
and roll a bunch of boilerplate code.

NOTE: the pre_probe here is listed in the driver structure. As a side
effect of this it will be passed a "struct device *" rather than the
more specific device type (like the "struct platform_device *" that
most platform devices get passed to their probe). Presumably this
won't cause trouble and it's a lot less code to write but if we need
to make it more symmetric that's also possible by touching more files.

Signed-off-by: Douglas Anderson 
---

 drivers/base/dd.c | 10 --
 include/linux/device/driver.h |  9 +
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ecd7cf848daf..9a13bff8dafa 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -549,10 +549,16 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
 re_probe:
dev->driver = drv;
 
+   if (drv->pre_probe) {
+   ret = drv->pre_probe(dev);
+   if (ret)
+   goto probe_failed_pre_dma;
+   }
+
/* If using pinctrl, bind pins now before probing */
ret = pinctrl_bind_pins(dev);
if (ret)
-   goto pinctrl_bind_failed;
+   goto probe_failed_pre_dma;
 
if (dev->bus->dma_configure) {
ret = dev->bus->dma_configure(dev);
@@ -639,7 +645,7 @@ static int really_probe(struct device *dev, struct 
device_driver *drv)
if (dev->bus)
blocking_notifier_call_chain(>bus->p->bus_notifier,
 BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
-pinctrl_bind_failed:
+probe_failed_pre_dma:
device_links_no_driver(dev);
devres_release_all(dev);
arch_teardown_dma_ops(dev);
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..f7305dd6ceb1 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -57,6 +57,14 @@ enum probe_type {
  * @probe_type:Type of the probe (synchronous or asynchronous) to use.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
+ * @pre_probe: Called after a device has been bound to a driver but before
+ * anything "automatic" (pinctrl, DMA, IOMMUs, ...) has been
+ * setup. This is mostly a chance for the driver to do things
+ * that might need to be run before any of those automatic
+ * processes. The vast majority of devices don't need to
+ * implement this. Note that there is no "post_remove" at the
+ * moment. If you need to undo something that you did in
+ * pre_probe() you can use devres.
  * @probe: Called to query the existence of a specific device,
  * whether this driver can work with it, and bind the driver
  * to a specific device.
@@ -105,6 +113,7 @@ struct device_driver {
const struct of_device_id   *of_match_table;
const struct acpi_device_id *acpi_match_table;
 
+   int (*pre_probe) (struct device *dev);
int (*probe) (struct device *dev);
void (*sync_state)(struct device *dev);

[PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC

2021-06-21 Thread Douglas Anderson


This patch attempts to put forward a proposal for enabling non-strict
DMA on a device-by-device basis. The patch series requests non-strict
DMA for the Qualcomm SDHCI controller as a first device to enable,
getting a nice bump in performance with what's believed to be a very
small drop in security / safety (see the patch for the full argument).

As part of this patch series I am end up slightly cleaning up some of
the interactions between the PCI subsystem and the IOMMU subsystem but
I don't go all the way to fully remove all the tentacles. Specifically
this patch series only concerns itself with a single aspect: strict
vs. non-strict mode for the IOMMU. I'm hoping that this will be easier
to talk about / reason about for more subsystems compared to overall
deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will
undoubtedly need coordination between many maintainers to land. I
believe it's fully bisectable but later patches in the series
definitely depend on earlier ones. Sorry for the long CC list. :(


Douglas Anderson (6):
  drivers: base: Add the concept of "pre_probe" to drivers
  drivers: base: Add bits to struct device to control iommu strictness
  PCI: Indicate that we want to force strict DMA for untrusted devices
  iommu: Combine device strictness requests with the global default
  iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
  mmc: sdhci-msm: Request non-strict IOMMU mode

 drivers/base/dd.c | 10 +--
 drivers/iommu/dma-iommu.c |  2 +-
 drivers/iommu/iommu.c | 56 +++
 drivers/mmc/host/sdhci-msm.c  |  8 +
 drivers/pci/probe.c   |  4 ++-
 include/linux/device.h| 11 +++
 include/linux/device/driver.h |  9 ++
 include/linux/iommu.h |  2 ++
 8 files changed, 85 insertions(+), 17 deletions(-)

-- 
2.32.0.288.g62a8d224e6-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch

2021-06-21 Thread Andy Lutomirski
On Mon, Jun 21, 2021 at 10:51 AM Ross Philipson
 wrote:
>
> On 6/18/21 2:32 PM, Robin Murphy wrote:
> > On 2021-06-18 17:12, Ross Philipson wrote:
> >> The IOMMU should always be set to default translated type after
> >> the PMRs are disabled to protect the MLE from DMA.
> >>
> >> Signed-off-by: Ross Philipson 
> >> ---
> >>   drivers/iommu/intel/iommu.c | 5 +
> >>   drivers/iommu/iommu.c   | 6 +-
> >>   2 files changed, 10 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
> >> index be35284..4f0256d 100644
> >> --- a/drivers/iommu/intel/iommu.c
> >> +++ b/drivers/iommu/intel/iommu.c
> >> @@ -41,6 +41,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >>   #include 
> >>   #include 
> >> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
> >> *dev)
> >>*/
> >>   static int device_def_domain_type(struct device *dev)
> >>   {
> >> +/* Do not allow identity domain when Secure Launch is configured */
> >> +if (slaunch_get_flags() & SL_FLAG_ACTIVE)
> >> +return IOMMU_DOMAIN_DMA;
> >
> > Is this specific to Intel? It seems like it could easily be done
> > commonly like the check for untrusted external devices.
>
> It is currently Intel only but that will change. I will look into what
> you suggest.
>
> >
> >> +
> >>   if (dev_is_pci(dev)) {
> >>   struct pci_dev *pdev = to_pci_dev(dev);
> >>   diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> >> index 808ab70d..d49b7dd 100644
> >> --- a/drivers/iommu/iommu.c
> >> +++ b/drivers/iommu/iommu.c
> >> @@ -23,6 +23,7 @@
> >>   #include 
> >>   #include 
> >>   #include 
> >> +#include 
> >>   #include 
> >> static struct kset *iommu_group_kset;
> >> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
> >>   {
> >>   if (cmd_line)
> >>   iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
> >> -iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> >> +
> >> +/* Do not allow identity domain when Secure Launch is configured */
> >> +if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
> >> +iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> >
> > Quietly ignoring the setting and possibly leaving iommu_def_domain_type
> > uninitialised (note that 0 is not actually a usable type) doesn't seem
> > great. AFAICS this probably warrants similar treatment to the
>
> Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event
> though passthrough was requested. Or perhaps something more is needed here?
>
> > mem_encrypt_active() case - there doesn't seem a great deal of value in
> > trying to save users from themselves if they care about measured boot
> > yet explicitly pass options which may compromise measured boot. If you
> > really want to go down that route there's at least the sysfs interface
> > you'd need to nobble as well, not to mention the various ways of
> > completely disabling IOMMUs...
>
> Doing a secure launch with the kernel is not a general purpose user use
> case. A lot of work is done to secure the environment. Allowing
> passthrough mode would leave the secure launch kernel exposed to DMA. I
> think what we are trying to do here is what we intend though there may
> be a better way or perhaps it is incomplete as you suggest.
>

I don't really like all these special cases.  Generically, what you're
trying to do is (AFAICT) to get the kernel to run in a mode in which
it does its best not to trust attached devices.  Nothing about this is
specific to Secure Launch.  There are plenty of scenarios in which
this the case:

 - Virtual devices in a VM host outside the TCB, e.g. VDUSE, Xen
device domains (did I get the name right), whatever tricks QEMU has,
etc.
 - SRTM / DRTM technologies (including but not limited to Secure
Launch -- plain old Secure Boot can work like this too).
 - Secure guest technologies, including but not limited to TDX and SEV.
 - Any computer with a USB-C port or other external DMA-capable port.
 - Regular computers in which the admin wants to enable this mode for
whatever reason.

Can you folks all please agree on a coordinated way for a Linux kernel
to configure itself appropriately?  Or to be configured via initramfs,
boot option, or some other trusted source of configuration supplied at
boot time?  We don't need a whole bunch of if (TDX), if (SEV), if
(secure launch), if (I have a USB-C port with PCIe exposed), if
(running on Xen), and similar checks all over the place.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-21 Thread Christoph Hellwig
On Mon, Jun 21, 2021 at 10:59:20AM -0700, Stefano Stabellini wrote:
> Just as a clarification: I was referring to the zeroing of "mem" in
> swiotlb_late_init_with_tbl and swiotlb_init_with_tbl. While it looks
> like Tom and Christoph are talking about the zeroing of "tlb".

Indeed. 

> 
> The zeroing of "mem" is required as some fields of struct io_tlb_mem
> need to be initialized to zero. While the zeroing of "tlb" is not
> required except from the point of view of not leaking sensitive data as
> Christoph mentioned.
> 
> Either way, Claire's v14 patch 01/12 looks fine to me.

Agreed.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v13 01/12] swiotlb: Refactor swiotlb init functions

2021-06-21 Thread Stefano Stabellini
On Fri, 18 Jun 2021, Christoph Hellwig wrote:
> On Fri, Jun 18, 2021 at 09:09:17AM -0500, Tom Lendacky wrote:
> > > swiotlb_init_with_tbl uses memblock_alloc to allocate the io_tlb_mem
> > > and memblock_alloc[1] will do memset in memblock_alloc_try_nid[2], so
> > > swiotlb_init_with_tbl is also good.
> > > I'm happy to add the memset in swiotlb_init_io_tlb_mem if you think
> > > it's clearer and safer.
> > 
> > On x86, if the memset is done before set_memory_decrypted() and memory
> > encryption is active, then the memory will look like ciphertext afterwards
> > and not be zeroes. If zeroed memory is required, then a memset must be
> > done after the set_memory_decrypted() calls.
> 
> Which should be fine - we don't care that the memory is cleared to 0,
> just that it doesn't leak other data.  Maybe a comment would be useful,
> though,

Just as a clarification: I was referring to the zeroing of "mem" in
swiotlb_late_init_with_tbl and swiotlb_init_with_tbl. While it looks
like Tom and Christoph are talking about the zeroing of "tlb".

The zeroing of "mem" is required as some fields of struct io_tlb_mem
need to be initialized to zero. While the zeroing of "tlb" is not
required except from the point of view of not leaking sensitive data as
Christoph mentioned.

Either way, Claire's v14 patch 01/12 looks fine to me.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v2 12/12] iommu: Do not allow IOMMU passthrough with Secure Launch

2021-06-21 Thread Ross Philipson
On 6/18/21 2:32 PM, Robin Murphy wrote:
> On 2021-06-18 17:12, Ross Philipson wrote:
>> The IOMMU should always be set to default translated type after
>> the PMRs are disabled to protect the MLE from DMA.
>>
>> Signed-off-by: Ross Philipson 
>> ---
>>   drivers/iommu/intel/iommu.c | 5 +
>>   drivers/iommu/iommu.c   | 6 +-
>>   2 files changed, 10 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index be35284..4f0256d 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -41,6 +41,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>   #include 
>>   #include 
>> @@ -2877,6 +2878,10 @@ static bool device_is_rmrr_locked(struct device
>> *dev)
>>    */
>>   static int device_def_domain_type(struct device *dev)
>>   {
>> +    /* Do not allow identity domain when Secure Launch is configured */
>> +    if (slaunch_get_flags() & SL_FLAG_ACTIVE)
>> +    return IOMMU_DOMAIN_DMA;
> 
> Is this specific to Intel? It seems like it could easily be done
> commonly like the check for untrusted external devices.

It is currently Intel only but that will change. I will look into what
you suggest.

> 
>> +
>>   if (dev_is_pci(dev)) {
>>   struct pci_dev *pdev = to_pci_dev(dev);
>>   diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
>> index 808ab70d..d49b7dd 100644
>> --- a/drivers/iommu/iommu.c
>> +++ b/drivers/iommu/iommu.c
>> @@ -23,6 +23,7 @@
>>   #include 
>>   #include 
>>   #include 
>> +#include 
>>   #include 
>>     static struct kset *iommu_group_kset;
>> @@ -2761,7 +2762,10 @@ void iommu_set_default_passthrough(bool cmd_line)
>>   {
>>   if (cmd_line)
>>   iommu_cmd_line |= IOMMU_CMD_LINE_DMA_API;
>> -    iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
>> +
>> +    /* Do not allow identity domain when Secure Launch is configured */
>> +    if (!(slaunch_get_flags() & SL_FLAG_ACTIVE))
>> +    iommu_def_domain_type = IOMMU_DOMAIN_IDENTITY;
> 
> Quietly ignoring the setting and possibly leaving iommu_def_domain_type
> uninitialised (note that 0 is not actually a usable type) doesn't seem
> great. AFAICS this probably warrants similar treatment to the

Ok so I guess it would be better to set it to IOMMU_DOMAIN_DMA event
though passthrough was requested. Or perhaps something more is needed here?

> mem_encrypt_active() case - there doesn't seem a great deal of value in
> trying to save users from themselves if they care about measured boot
> yet explicitly pass options which may compromise measured boot. If you
> really want to go down that route there's at least the sysfs interface
> you'd need to nobble as well, not to mention the various ways of
> completely disabling IOMMUs...

Doing a secure launch with the kernel is not a general purpose user use
case. A lot of work is done to secure the environment. Allowing
passthrough mode would leave the secure launch kernel exposed to DMA. I
think what we are trying to do here is what we intend though there may
be a better way or perhaps it is incomplete as you suggest.

> 
> It might be reasonable to make IOMMU_DEFAULT_PASSTHROUGH depend on
> !SECURE_LAUNCH for clarity though.

This came from a specific request to not make disabling IOMMU modes
build time dependent. This is because a secure launch enabled kernel can
also be booted as a general purpose kernel in cases where this is desired.

Thank you,
Ross

> 
> Robin.
> 
>>   }
>>     void iommu_set_default_translated(bool cmd_line)
>>

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

[PATCH] iommu/arm-smmu-v3: Remove some unneeded init in arm_smmu_cmdq_issue_cmdlist()

2021-06-21 Thread John Garry
Members of struct "llq" will be zero-inited, apart from member max_n_shift.
But we write llq.val straight after the init, so it was pointless to zero
init those other members. As such, separately init member max_n_shift
only.

In addition, struct "head" is initialised to "llq" only so that member
max_n_shift is set. But that member is never referenced for "head", so
remove any init there.

Removing these initializations is seen as a small performance optimisation,
as this code is (very) hot path.

Signed-off-by: John Garry 

diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c 
b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index 54b2f27b81d4..8a8ad49bb7fd 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -727,11 +727,11 @@ static int arm_smmu_cmdq_issue_cmdlist(struct 
arm_smmu_device *smmu,
unsigned long flags;
bool owner;
struct arm_smmu_cmdq *cmdq = >cmdq;
-   struct arm_smmu_ll_queue llq = {
-   .max_n_shift = cmdq->q.llq.max_n_shift,
-   }, head = llq;
+   struct arm_smmu_ll_queue llq, head;
int ret = 0;
 
+   llq.max_n_shift = cmdq->q.llq.max_n_shift;
+
/* 1. Allocate some space in the queue */
local_irq_save(flags);
llq.val = READ_ONCE(cmdq->q.llq.val);
-- 
2.26.2

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv2 2/3] iommu/io-pgtable: Optimize partial walk flush for large scatter-gather list

2021-06-21 Thread Robin Murphy

On 2021-06-21 06:47, Sai Prakash Ranjan wrote:

Hi,

On 2021-06-19 03:39, Doug Anderson wrote:

Hi,

On Thu, Jun 17, 2021 at 7:51 PM Sai Prakash Ranjan
 wrote:


Currently for iommu_unmap() of large scatter-gather list with page size
elements, the majority of time is spent in flushing of partial walks in
__arm_lpae_unmap() which is a VA based TLB invalidation invalidating
page-by-page on iommus like arm-smmu-v2 (TLBIVA) which do not support
range based invalidations like on arm-smmu-v3.2.

For example: to unmap a 32MB scatter-gather list with page size elements
(8192 entries), there are 16->2MB buffer unmaps based on the pgsize (2MB
for 4K granule) and each of 2MB will further result in 512 TLBIVAs 
(2MB/4K)

resulting in a total of 8192 TLBIVAs (512*16) for 16->2MB causing a huge
overhead.

So instead use tlb_flush_all() callback (TLBIALL/TLBIASID) to invalidate
the entire context for partial walk flush on select few platforms where
cost of over-invalidation is less than unmap latency


It would probably be worth punching this description up a little bit.
Elsewhere you said in more detail why this over-invalidation is less
of a big deal for the Qualcomm SMMU. It's probably worth saying
something like that here, too. Like this bit paraphrased from your
other email:

On qcom impl, we have several performance improvements for TLB cache
invalidations in HW like wait-for-safe (for realtime clients such as
camera and display) and few others to allow for cache lookups/updates
when TLBI is in progress for the same context bank.



Sure will add this info as well in the next version.




using the newly
introduced quirk IO_PGTABLE_QUIRK_TLB_INV_ALL. We also do this for
non-strict mode given its all about over-invalidation saving time on
individual unmaps and non-deterministic generally.


As per usual I'm mostly clueless, but I don't quite understand why you
want this new behavior for non-strict mode. To me it almost seems like
the opposite? Specifically, non-strict mode is already outside the
critical path today and so there's no need to optimize it. I'm
probably not explaining myself clearly, but I guess i'm thinking:

a) today for strict, unmap is in the critical path and it's important
to get it out of there. Getting it out of the critical path is so
important that we're willing to over-invalidate to speed up the
critical path.

b) today for non-strict, unmap is not in the critical path.

So I would almost expect your patch to _disable_ your new feature for
non-strict mappings, not auto-enable your new feature for non-strict
mappings.

If I'm babbling, feel free to ignore. ;-) Looking back, I guess Robin
was the one that suggested the behavior you're implementing, so it's
more likely he's right than I am. ;-)



Thanks for taking a look. Non-strict mode is only for leaf entries and
dma domains and this optimization is for non-leaf entries and is applicable
for both, see __arm_lpae_unmap(). In other words, if you have 
iommu.strict=0
(non-strict mode) and try unmapping a large sg buffer as the problem 
described

in the commit text, you would still go via this path in unmap and see the
delay without this patch. So what Robin suggested is that, let's do this
unconditionally for all users with non-strict mode as opposed to only
restricting it to implementation specific in case of strict mode.


Right, unmapping tables works out as a bit of a compromise for 
non-strict mode - we don't use a freelist to defer the freeing of 
pagetable pages, so we rely on making non-leaf invalidations 
synchronously to knock out walk caches which may be pointing to the page 
beforte we free it. We might actually be able to get away without that 
for non-strict unmaps, since partial walks pointing at freed memory 
probably aren't too much more hazardous than the equivalent leaf TLB 
entries while the IOVA region is held in the flush queue, but it 
certainly does matter for maps when we're knocking out a (presumably 
empty) table entry to put down a new block whose IOVA will be 
immediately live.


Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string

2021-06-21 Thread Thierry Reding
On Mon, Jun 21, 2021 at 04:54:18PM +0100, Will Deacon wrote:
> On Mon, Jun 21, 2021 at 04:11:55PM +0200, Thierry Reding wrote:
> > On Mon, Jun 21, 2021 at 08:46:54AM +0200, Krzysztof Kozlowski wrote:
> > > On 18/06/2021 21:47, Rob Herring wrote:
> > > > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding 
> > > >  wrote:
> > > >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
> > > >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> > > >> index 9d27aa5111d4..1181b590db71 100644
> > > >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> > > >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> > > >> @@ -54,8 +54,14 @@ properties:
> > > >>- const: arm,mmu-500
> > > >>- description: NVIDIA SoCs that program two ARM MMU-500s 
> > > >> identically
> > > >>  items:
> > > >> +  - description: NVIDIA SoCs that require memory controller 
> > > >> interaction
> > > > 
> > > > This is not valid jsonschema:
> > > > 
> > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one
> > > > must be fixed:
> > > > None is not of type 'object', 'boolean'
> > > > None is not of type 'array'
> > > > from schema $id: http://json-schema.org/draft-07/schema#
> > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> > > > must be fixed:
> > > > None is not of type 'object'
> > > > None is not of type 'array'
> > > > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml#
> > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> > > > must be fixed:
> > > > None is not of type 'object'
> > > > None is not of type 'array'
> > > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> > > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one
> > > > must be fixed:
> > > > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const':
> > > > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of
> > > > type 'object'
> > > > {'const': 'nvidia,tegra194-smmu'} is not of type 'string'
> > > > {'const': 'nvidia,tegra186-smmu'} is not of type 'string'
> > > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> > > > 
> > > > 
> > > > This was not reviewed nor tested since the DT list was not Cc'ed.
> > > 
> > > Ugh, I see now weird empty item on a list... and not only DT list was
> > > skipped - Thierry did not Cc you either.
> > 
> > This seemed like a too trivial addition to waste Rob's time on, so I
> > didn't add him (or the DT list for that matter) on Cc. The ARM SMMU
> > maintainers had reviewed this, which seemed like it was enough for what
> > the DT bindings change was doing.
> 
> Hmm, I didn't review it. I find the yaml stuff unreadable so I usually
> wait for the DT folks to ack bindings changes before I queue them in the
> SMMU tree.

Alright... in the future I'll make sure to always Cc DT folks, even for
trivial stuff like this.

Thierry


signature.asc
Description: PGP signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string

2021-06-21 Thread Will Deacon
On Mon, Jun 21, 2021 at 04:11:55PM +0200, Thierry Reding wrote:
> On Mon, Jun 21, 2021 at 08:46:54AM +0200, Krzysztof Kozlowski wrote:
> > On 18/06/2021 21:47, Rob Herring wrote:
> > > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding  
> > > wrote:
> > >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
> > >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> > >> index 9d27aa5111d4..1181b590db71 100644
> > >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> > >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> > >> @@ -54,8 +54,14 @@ properties:
> > >>- const: arm,mmu-500
> > >>- description: NVIDIA SoCs that program two ARM MMU-500s 
> > >> identically
> > >>  items:
> > >> +  - description: NVIDIA SoCs that require memory controller 
> > >> interaction
> > > 
> > > This is not valid jsonschema:
> > > 
> > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one
> > > must be fixed:
> > > None is not of type 'object', 'boolean'
> > > None is not of type 'array'
> > > from schema $id: http://json-schema.org/draft-07/schema#
> > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> > > must be fixed:
> > > None is not of type 'object'
> > > None is not of type 'array'
> > > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml#
> > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> > > must be fixed:
> > > None is not of type 'object'
> > > None is not of type 'array'
> > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> > > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one
> > > must be fixed:
> > > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const':
> > > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of
> > > type 'object'
> > > {'const': 'nvidia,tegra194-smmu'} is not of type 'string'
> > > {'const': 'nvidia,tegra186-smmu'} is not of type 'string'
> > > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> > > 
> > > 
> > > This was not reviewed nor tested since the DT list was not Cc'ed.
> > 
> > Ugh, I see now weird empty item on a list... and not only DT list was
> > skipped - Thierry did not Cc you either.
> 
> This seemed like a too trivial addition to waste Rob's time on, so I
> didn't add him (or the DT list for that matter) on Cc. The ARM SMMU
> maintainers had reviewed this, which seemed like it was enough for what
> the DT bindings change was doing.

Hmm, I didn't review it. I find the yaml stuff unreadable so I usually
wait for the DT folks to ack bindings changes before I queue them in the
SMMU tree.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush

2021-06-21 Thread Robin Murphy

On 2021-06-18 03:51, Sai Prakash Ranjan wrote:

Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context
with tlb_flush_all() callback in partial walk flush to improve unmap
performance on select few platforms where the cost of over-invalidation
is less than the unmap latency.


I still think this doesn't belong anywhere near io-pgtable at all. It's 
a driver-internal decision how exactly it implements a non-leaf 
invalidation, and that may be more complex than a predetermined boolean 
decision. For example, I've just realised for SMMUv3 we can't invalidate 
multiple levels of table at once with a range command, since if we 
assume the whole thing is mapped at worst-case page granularity we may 
fail to invalidate any parts which are mapped as intermediate-level 
blocks. If invalidating a 1GB region (with 4KB granule) means having to 
fall back to 256K non-range commands, we may not want to invalidate by 
VA then, even though doing so for a 2MB region is still optimal.


It's also quite feasible that drivers might want to do this for leaf 
invalidations too - if you don't like issuing 512 commands to invalidate 
2MB, do you like issuing 511 commands to invalidate 2044KB? - and at 
that point the logic really has to be in the driver anyway.


Robin.


Signed-off-by: Sai Prakash Ranjan 
---
  drivers/iommu/io-pgtable-arm.c | 3 ++-
  include/linux/io-pgtable.h | 5 +
  2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
index 87def58e79b5..5d362f2214bd 100644
--- a/drivers/iommu/io-pgtable-arm.c
+++ b/drivers/iommu/io-pgtable-arm.c
@@ -768,7 +768,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, 
void *cookie)
if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
IO_PGTABLE_QUIRK_NON_STRICT |
IO_PGTABLE_QUIRK_ARM_TTBR1 |
-   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
+   IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
+   IO_PGTABLE_QUIRK_TLB_INV_ALL))
return NULL;
  
  	data = arm_lpae_alloc_pgtable(cfg);

diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
index 4d40dfa75b55..45441592a0e6 100644
--- a/include/linux/io-pgtable.h
+++ b/include/linux/io-pgtable.h
@@ -82,6 +82,10 @@ struct io_pgtable_cfg {
 *
 * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
 *  attributes set in the TCR for a non-coherent page-table walker.
+*
+* IO_PGTABLE_QUIRK_TLB_INV_ALL: Use TLBIALL/TLBIASID to invalidate
+*  entire context for partial walk flush to increase unmap
+*  performance on select few platforms.
 */
#define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
#define IO_PGTABLE_QUIRK_NO_PERMS   BIT(1)
@@ -89,6 +93,7 @@ struct io_pgtable_cfg {
#define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
#define IO_PGTABLE_QUIRK_ARM_TTBR1  BIT(5)
#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
+   #define IO_PGTABLE_QUIRK_TLB_INV_ALLBIT(7)
unsigned long   quirks;
unsigned long   pgsize_bitmap;
unsigned intias;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-21 Thread Lu Baolu

Hi Robin,

On 2021/6/21 19:59, Robin Murphy wrote:

On 2021-06-21 11:34, John Garry wrote:

On 21/06/2021 11:00, Lu Baolu wrote:

void iommu_set_dma_strict(bool force)
{
  if (force == true)
 iommu_dma_strict = true;
 else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
 iommu_dma_strict = true;
}

So we would use iommu_set_dma_strict(true) for a) and b), but 
iommu_set_dma_strict(false) for c).


Yes. We need to distinguish the "must" and "nice-to-have" cases of
setting strict mode.



Then I am not sure what you want to do with the accompanying print 
for c). It was:

"IOMMU batching is disabled due to virtualization"

And now is from this series:
"IOMMU batching disallowed due to virtualization"

Using iommu_get_dma_strict(domain) is not appropriate here to know 
the current mode (so we know whether to print).


Note that this change would mean that the current series would 
require non-trivial rework, which would be unfortunate so late in 
the cycle.


This patch series looks good to me and I have added by reviewed-by.
Probably we could make another patch series to improve it so that the
kernel optimization should not override the user setting.


On a personal level I would be happy with that approach, but I think 
it's better to not start changing things right away in a follow-up 
series.


So how about we add this patch (which replaces 6/6 "iommu: Remove mode 
argument from iommu_set_dma_strict()")?


Robin, any opinion?


For me it boils down to whether there are any realistic workloads where 
non-strict mode *would* still perform better under virtualisation. The 


At present, we see that strict mode has better performance in the
virtualization environment because it will make the shadow page table
management more efficient. When the hardware supports nested
translation, we may have to re-evaluate this since there's no need for
a shadowing page table anymore.

only reason for the user to explicitly pass "iommu.strict=0" is because 
they expect it to increase unmap performance; if it's only ever going to 
lead to an unexpected performance loss, I don't see any value in 
overriding the kernel's decision purely for the sake of subservience.


If there *are* certain valid cases for allowing it for people who really 
know what they're doing, then we should arguably also log a counterpart 
message to say "we're honouring your override but beware it may have the 
opposite effect to what you expect" for the benefit of other users who 
assume it's a generic go-faster knob. At that point it starts getting 
non-trivial enough that I'd want to know for sure it's worthwhile.


The other reason this might be better to revisit later is that an AMD 
equivalent is still in flight[1], and there might be more that can 
eventually be factored out. I think both series are pretty much good to 
merge for 5.14, but time's already tight to sort out the conflicts which 
exist as-is, without making them any worse.


Agreed. We could revisit it later.

Best regards,
baolu


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string

2021-06-21 Thread Thierry Reding
On Mon, Jun 21, 2021 at 08:46:54AM +0200, Krzysztof Kozlowski wrote:
> On 18/06/2021 21:47, Rob Herring wrote:
> > On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding  
> > wrote:
> >>
> >> From: Thierry Reding 
> >>
> >> The ARM SMMU instantiations found on Tegra186 and later need inter-
> >> operation with the memory controller in order to correctly program
> >> stream ID overrides.
> >>
> >> Furthermore, on Tegra194 multiple instances of the SMMU can gang up
> >> to achieve higher throughput. In order to do this, they have to be
> >> programmed identically so that the memory controller can interleave
> >> memory accesses between them.
> >>
> >> Add the Tegra186 compatible string to make sure the interoperation
> >> with the memory controller can be enabled on that SoC generation.
> >>
> >> Signed-off-by: Thierry Reding 
> >> ---
> >>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 11 +--
> >>  1 file changed, 9 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
> >> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> >> index 9d27aa5111d4..1181b590db71 100644
> >> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> >> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
> >> @@ -54,8 +54,14 @@ properties:
> >>- const: arm,mmu-500
> >>- description: NVIDIA SoCs that program two ARM MMU-500s identically
> >>  items:
> >> +  - description: NVIDIA SoCs that require memory controller 
> >> interaction
> > 
> > This is not valid jsonschema:
> > 
> > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one
> > must be fixed:
> > None is not of type 'object', 'boolean'
> > None is not of type 'array'
> > from schema $id: http://json-schema.org/draft-07/schema#
> > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> > must be fixed:
> > None is not of type 'object'
> > None is not of type 'array'
> > from schema $id: http://devicetree.org/meta-schemas/keywords.yaml#
> > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> > must be fixed:
> > None is not of type 'object'
> > None is not of type 'array'
> > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> > /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> > properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one
> > must be fixed:
> > [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const':
> > 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of
> > type 'object'
> > {'const': 'nvidia,tegra194-smmu'} is not of type 'string'
> > {'const': 'nvidia,tegra186-smmu'} is not of type 'string'
> > from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> > 
> > 
> > This was not reviewed nor tested since the DT list was not Cc'ed.
> 
> Ugh, I see now weird empty item on a list... and not only DT list was
> skipped - Thierry did not Cc you either.

This seemed like a too trivial addition to waste Rob's time on, so I
didn't add him (or the DT list for that matter) on Cc. The ARM SMMU
maintainers had reviewed this, which seemed like it was enough for what
the DT bindings change was doing.

In any case, I clearly should've checked the DT binding check output
more carefully. It's rather messy for Tegra because there's quite a few
that we haven't converted yet. I'll have to resume my effort to convert
the remaining ones and fixup the device trees so that we can actually
run the DT binding and DTB validation checks more usefully.

> My bad, I did not check that patch thoroughly before applying.
> 
> Thierry, please Cc folks mentioned by get_maintainer.pl. Either sent a
> fix or a revert, if fix needs more time.

I've sent out a follow-up fix that removes the two bogus lines. It looks
like that was the result of a bad conflict resolution on my part.

> Additionally, why the patch changes reg to "minItems: 1" for
> nvidia,tegra194-smmu?

This is because originally the Tegra194 SMMU nodes were supposed to only
represent a "dual" instance. However, on Tegra194 there are three SMMU
instances in total, with the third instance (dedicated for isochronous
traffic) being completely separate and having only a single range of
registers.

That third instance was previously supposed to be covered by the normal
"arm,mmu-500" compatible string, but given that we really need that
interoperation between SMMU and memory controller for SID override
programming, we need the Tegra-specific compatible for the ISO instance
of the SMMU as well. And since that uses only one set of registers,
minItems had to become 1.

Thierry


signature.asc
Description: PGP signature

[PATCH] dt-bindings: arm-smmu: Fix json-schema syntax

2021-06-21 Thread Thierry Reding
From: Thierry Reding 

Commit 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible
string") introduced a jsonschema syntax error as a result of a rebase
gone wrong. Fix it.

Fixes: 4287861dca9d ("dt-bindings: arm-smmu: Add Tegra186 compatible string")
Reported-by: Rob Herring 
Signed-off-by: Thierry Reding 
---
 Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
index 1181b590db71..03f2b2d4db30 100644
--- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
+++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
@@ -52,16 +52,14 @@ properties:
 items:
   - const: marvell,ap806-smmu-500
   - const: arm,mmu-500
-  - description: NVIDIA SoCs that program two ARM MMU-500s identically
-items:
   - description: NVIDIA SoCs that require memory controller interaction
   and may program multiple ARM MMU-500s identically with the memory
   controller interleaving translations between multiple instances
   for improved performance.
 items:
   - enum:
-  - const: nvidia,tegra194-smmu
-  - const: nvidia,tegra186-smmu
+  - nvidia,tegra194-smmu
+  - nvidia,tegra186-smmu
   - const: nvidia,smmu-500
   - items:
   - const: arm,mmu-500
-- 
2.32.0

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: swiotlb/caamjr regression (Was: [GIT PULL] (swiotlb) stable/for-linus-5.12)

2021-06-21 Thread Konrad Rzeszutek Wilk
On Mon, Jun 21, 2021 at 01:14:48PM +0900, 'Dominique MARTINET' wrote:
> Chanho Park wrote on Mon, Jun 21, 2021 at 11:55:22AM +0900:
> > Sure. No problem. But, the patch was already stacked on Konrad's tree
> > and linux-next as well.
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/konrad/swiotlb.git/commit/?h=devel/for-linus-5.14=33d1641f38f0c327bc3e5c21de585c77a6512bc6
> >  
> 
> That patch is slightly different, it's a rewrite Konrad did that mixes
> in Linus' suggestion[1], which breaks things for the NVMe usecase
> Jianxiong Gao has.
> 
> [1] offset = (tlb_addr - mem->start) & (IO_TLB_SIZE - 1)
> 
> 
> Konrad is aware so I think it shouldn't be submitted :)

The beaty of 'devel' and 'linux-next' is that they can be reshuffled and
mangled. I pushed them original patch from Bumyong there and will let
it sit for a day and then create a stable branch and give it to Linus.

Then I need to expand the test-regression bucket so that this does not
happen again. Dominique, how easy would it be to purchase one of those
devices?

I was originally thinking to create a crypto device in QEMU to simulate
this but that may take longer to write than just getting the real thing.

Or I could create some fake devices with weird offsets and write a driver
for it to exercise this.. like this one I had done some time ago that
needs some brushing off.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/7] iommu/amd: Do not use flush-queue when NpCache is on

2021-06-21 Thread John Garry

On 16/06/2021 11:04, Nadav Amit wrote:

-   if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
+   if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE)) {
+   if (!amd_iommu_unmap_flush)
+   pr_warn("IOMMU batching is disabled due to 
virtualization");


This is missing the '\n', like the Intel driver.

And, JFYI, we are also downgrading that same print to info level (in the 
Intel driver).


Thanks,
John


+
amd_iommu_np_cache = true;
+   amd_iommu_unmap_flush = true;


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-21 Thread John Garry

On 21/06/2021 12:59, Robin Murphy wrote:

+ Nadav


On a personal level I would be happy with that approach, but I think
it's better to not start changing things right away in a follow-up series.

So how about we add this patch (which replaces 6/6 "iommu: Remove mode
argument from iommu_set_dma_strict()")?

Robin, any opinion?

For me it boils down to whether there are any realistic workloads where
non-strict mode*would*  still perform better under virtualisation. The
only reason for the user to explicitly pass "iommu.strict=0" is because
they expect it to increase unmap performance; if it's only ever going to
lead to an unexpected performance loss, I don't see any value in
overriding the kernel's decision purely for the sake of subservience.

If there*are*  certain valid cases for allowing it for people who really
know what they're doing, then we should arguably also log a counterpart
message to say "we're honouring your override but beware it may have the
opposite effect to what you expect" for the benefit of other users who
assume it's a generic go-faster knob. At that point it starts getting
non-trivial enough that I'd want to know for sure it's worthwhile.

The other reason this might be better to revisit later is that an AMD
equivalent is still in flight[1], and there might be more that can
eventually be factored out. I think both series are pretty much good to
merge for 5.14, but time's already tight to sort out the conflicts which
exist as-is, without making them any worse.


ok, fine. Can revisit.

As for getting these merged, I'll dry-run merging both of those series 
to see the conflicts. It doesn't look too problematic from a glance.


Cheers,
John



Robin.

[1]
https://lore.kernel.org/linux-iommu/20210616100500.174507-3-na...@vmware.com/


--->8-

[PATCH] iommu/vt-d: Make "iommu.strict" override batching due to
   virtualization

As a change in policy, make iommu.strict cmdline argument override
whether we disable batching due to virtualization.

The API of iommu_set_dma_strict() is changed to accept a "force"
argument, which means that we always set iommu_dma_strict true,
regardless of whether we already set via cmdline. Also return a boolean,
to tell whether iommu_dma_strict was set or not.

Note that in all pre-existing callsites of iommu_set_dma_strict(),
argument strict was true, so this argument is dropped.

Signed-off-by: John Garry

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0f9d8116..e8d65239b359 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4380,10 +4380,8 @@ int __init intel_iommu_init(void)
    * is likely to be much lower than the overhead of synchronizing
    * the virtual and physical IOMMU page-tables.
    */
-    if (cap_caching_mode(iommu->cap)) {
+    if (cap_caching_mode(iommu->cap) && iommu_set_dma_strict(false))
   pr_info_once("IOMMU batching disallowed due to
virtualization\n");
-    iommu_set_dma_strict(true);
-    }
   iommu_device_sysfs_add(>iommu, NULL,
      intel_iommu_groups,
      "%s", iommu->name);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..1434bee64af3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,14 @@ static int __init iommu_dma_setup(char *str)
   }
   early_param("iommu.strict", iommu_dma_setup);

-void iommu_set_dma_strict(bool strict)
+/* Return true if we set iommu_dma_strict */
+bool iommu_set_dma_strict(bool force)
   {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    if (force || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) {
+    iommu_dma_strict = true;
+    return true;
+    }
+    return false;
   }

   bool iommu_get_dma_strict(struct iommu_domain *domain)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..f17b20234296 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain);
   int iommu_set_pgtable_quirks(struct iommu_domain *domain,
   unsigned long quirks);

-void iommu_set_dma_strict(bool val);
+bool iommu_set_dma_strict(bool force);
   bool iommu_get_dma_strict(struct iommu_domain *domain);

   extern int report_iommu_fault(struct iommu_domain *domain, struct
device *dev,


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-21 Thread Robin Murphy

On 2021-06-21 11:34, John Garry wrote:

On 21/06/2021 11:00, Lu Baolu wrote:

void iommu_set_dma_strict(bool force)
{
  if (force == true)
 iommu_dma_strict = true;
 else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
 iommu_dma_strict = true;
}

So we would use iommu_set_dma_strict(true) for a) and b), but 
iommu_set_dma_strict(false) for c).


Yes. We need to distinguish the "must" and "nice-to-have" cases of
setting strict mode.



Then I am not sure what you want to do with the accompanying print 
for c). It was:

"IOMMU batching is disabled due to virtualization"

And now is from this series:
"IOMMU batching disallowed due to virtualization"

Using iommu_get_dma_strict(domain) is not appropriate here to know 
the current mode (so we know whether to print).


Note that this change would mean that the current series would 
require non-trivial rework, which would be unfortunate so late in the 
cycle.


This patch series looks good to me and I have added by reviewed-by.
Probably we could make another patch series to improve it so that the
kernel optimization should not override the user setting.


On a personal level I would be happy with that approach, but I think 
it's better to not start changing things right away in a follow-up series.


So how about we add this patch (which replaces 6/6 "iommu: Remove mode 
argument from iommu_set_dma_strict()")?


Robin, any opinion?


For me it boils down to whether there are any realistic workloads where 
non-strict mode *would* still perform better under virtualisation. The 
only reason for the user to explicitly pass "iommu.strict=0" is because 
they expect it to increase unmap performance; if it's only ever going to 
lead to an unexpected performance loss, I don't see any value in 
overriding the kernel's decision purely for the sake of subservience.


If there *are* certain valid cases for allowing it for people who really 
know what they're doing, then we should arguably also log a counterpart 
message to say "we're honouring your override but beware it may have the 
opposite effect to what you expect" for the benefit of other users who 
assume it's a generic go-faster knob. At that point it starts getting 
non-trivial enough that I'd want to know for sure it's worthwhile.


The other reason this might be better to revisit later is that an AMD 
equivalent is still in flight[1], and there might be more that can 
eventually be factored out. I think both series are pretty much good to 
merge for 5.14, but time's already tight to sort out the conflicts which 
exist as-is, without making them any worse.


Robin.

[1] 
https://lore.kernel.org/linux-iommu/20210616100500.174507-3-na...@vmware.com/




--->8-

[PATCH] iommu/vt-d: Make "iommu.strict" override batching due to
  virtualization

As a change in policy, make iommu.strict cmdline argument override 
whether we disable batching due to virtualization.


The API of iommu_set_dma_strict() is changed to accept a "force" 
argument, which means that we always set iommu_dma_strict true, 
regardless of whether we already set via cmdline. Also return a boolean, 
to tell whether iommu_dma_strict was set or not.


Note that in all pre-existing callsites of iommu_set_dma_strict(), 
argument strict was true, so this argument is dropped.


Signed-off-by: John Garry 

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0f9d8116..e8d65239b359 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4380,10 +4380,8 @@ int __init intel_iommu_init(void)
   * is likely to be much lower than the overhead of synchronizing
   * the virtual and physical IOMMU page-tables.
   */
-    if (cap_caching_mode(iommu->cap)) {
+    if (cap_caching_mode(iommu->cap) && iommu_set_dma_strict(false))
  pr_info_once("IOMMU batching disallowed due to 
virtualization\n");

-    iommu_set_dma_strict(true);
-    }
  iommu_device_sysfs_add(>iommu, NULL,
     intel_iommu_groups,
     "%s", iommu->name);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..1434bee64af3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,14 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);

-void iommu_set_dma_strict(bool strict)
+/* Return true if we set iommu_dma_strict */
+bool iommu_set_dma_strict(bool force)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    if (force || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) {
+    iommu_dma_strict = true;
+    return true;
+    }
+    return false;
  }

  bool iommu_get_dma_strict(struct iommu_domain *domain)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..f17b20234296 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 

Re: Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-21 Thread Yongji Xie
On Mon, Jun 21, 2021 at 5:14 PM Jason Wang  wrote:
>
>
> 在 2021/6/15 下午10:13, Xie Yongji 写道:
> > This VDUSE driver enables implementing vDPA devices in userspace.
> > The vDPA device's control path is handled in kernel and the data
> > path is handled in userspace.
> >
> > A message mechnism is used by VDUSE driver to forward some control
> > messages such as starting/stopping datapath to userspace. Userspace
> > can use read()/write() to receive/reply those control messages.
> >
> > And some ioctls are introduced to help userspace to implement the
> > data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
> > descriptors referring to vDPA device's iova regions. Then userspace
> > can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
> > and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
> > and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
> > ioctls can be used to inject interrupt and setup the kickfd for
> > virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
> > configuration space and inject a config interrupt.
> >
> > Signed-off-by: Xie Yongji 
> > ---
> >   Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
> >   drivers/vdpa/Kconfig   |   10 +
> >   drivers/vdpa/Makefile  |1 +
> >   drivers/vdpa/vdpa_user/Makefile|5 +
> >   drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
> > 
> >   include/uapi/linux/vduse.h |  143 ++
> >   6 files changed, 1613 insertions(+)
> >   create mode 100644 drivers/vdpa/vdpa_user/Makefile
> >   create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
> >   create mode 100644 include/uapi/linux/vduse.h
> >
> > diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
> > b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > index 9bfc2b510c64..acd95e9dcfe7 100644
> > --- a/Documentation/userspace-api/ioctl/ioctl-number.rst
> > +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
> > @@ -300,6 +300,7 @@ Code  Seq#Include File  
> >  Comments
> >   'z'   10-4F  drivers/s390/crypto/zcrypt_api.h
> > conflict!
> >   '|'   00-7F  linux/media.h
> >   0x80  00-1F  linux/fb.h
> > +0x81  00-1F  linux/vduse.h
> >   0x89  00-06  arch/x86/include/asm/sockios.h
> >   0x89  0B-DF  linux/sockios.h
> >   0x89  E0-EF  linux/sockios.h 
> > SIOCPROTOPRIVATE range
> > diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
> > index a503c1b2bfd9..6e23bce6433a 100644
> > --- a/drivers/vdpa/Kconfig
> > +++ b/drivers/vdpa/Kconfig
> > @@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
> > vDPA block device simulator which terminates IO request in a
> > memory buffer.
> >
> > +config VDPA_USER
> > + tristate "VDUSE (vDPA Device in Userspace) support"
> > + depends on EVENTFD && MMU && HAS_DMA
> > + select DMA_OPS
> > + select VHOST_IOTLB
> > + select IOMMU_IOVA
> > + help
> > +   With VDUSE it is possible to emulate a vDPA Device
> > +   in a userspace program.
> > +
> >   config IFCVF
> >   tristate "Intel IFC VF vDPA driver"
> >   depends on PCI_MSI
> > diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
> > index 67fe7f3d6943..f02ebed33f19 100644
> > --- a/drivers/vdpa/Makefile
> > +++ b/drivers/vdpa/Makefile
> > @@ -1,6 +1,7 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >   obj-$(CONFIG_VDPA) += vdpa.o
> >   obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
> > +obj-$(CONFIG_VDPA_USER) += vdpa_user/
> >   obj-$(CONFIG_IFCVF)+= ifcvf/
> >   obj-$(CONFIG_MLX5_VDPA) += mlx5/
> >   obj-$(CONFIG_VP_VDPA)+= virtio_pci/
> > diff --git a/drivers/vdpa/vdpa_user/Makefile 
> > b/drivers/vdpa/vdpa_user/Makefile
> > new file mode 100644
> > index ..260e0b26af99
> > --- /dev/null
> > +++ b/drivers/vdpa/vdpa_user/Makefile
> > @@ -0,0 +1,5 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +vduse-y := vduse_dev.o iova_domain.o
> > +
> > +obj-$(CONFIG_VDPA_USER) += vduse.o
> > diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
> > b/drivers/vdpa/vdpa_user/vduse_dev.c
> > new file mode 100644
> > index ..5271cbd15e28
> > --- /dev/null
> > +++ b/drivers/vdpa/vdpa_user/vduse_dev.c
> > @@ -0,0 +1,1453 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * VDUSE: vDPA Device in Userspace
> > + *
> > + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All 
> > rights reserved.
> > + *
> > + * Author: Xie Yongji 
> > + *
> > + */
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include "iova_domain.h"
> > +
> > +#define DRV_AUTHOR   

Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-21 Thread John Garry

On 21/06/2021 11:00, Lu Baolu wrote:

void iommu_set_dma_strict(bool force)
{
  if (force == true)
 iommu_dma_strict = true;
 else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
 iommu_dma_strict = true;
}

So we would use iommu_set_dma_strict(true) for a) and b), but 
iommu_set_dma_strict(false) for c).


Yes. We need to distinguish the "must" and "nice-to-have" cases of
setting strict mode.



Then I am not sure what you want to do with the accompanying print for 
c). It was:

"IOMMU batching is disabled due to virtualization"

And now is from this series:
"IOMMU batching disallowed due to virtualization"

Using iommu_get_dma_strict(domain) is not appropriate here to know the 
current mode (so we know whether to print).


Note that this change would mean that the current series would require 
non-trivial rework, which would be unfortunate so late in the cycle.


This patch series looks good to me and I have added by reviewed-by.
Probably we could make another patch series to improve it so that the
kernel optimization should not override the user setting.


On a personal level I would be happy with that approach, but I think 
it's better to not start changing things right away in a follow-up series.


So how about we add this patch (which replaces 6/6 "iommu: Remove mode 
argument from iommu_set_dma_strict()")?


Robin, any opinion?

--->8-

[PATCH] iommu/vt-d: Make "iommu.strict" override batching due to
 virtualization

As a change in policy, make iommu.strict cmdline argument override 
whether we disable batching due to virtualization.


The API of iommu_set_dma_strict() is changed to accept a "force" 
argument, which means that we always set iommu_dma_strict true, 
regardless of whether we already set via cmdline. Also return a boolean, 
to tell whether iommu_dma_strict was set or not.


Note that in all pre-existing callsites of iommu_set_dma_strict(), 
argument strict was true, so this argument is dropped.


Signed-off-by: John Garry 

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0f9d8116..e8d65239b359 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4380,10 +4380,8 @@ int __init intel_iommu_init(void)
 * is likely to be much lower than the overhead of synchronizing
 * the virtual and physical IOMMU page-tables.
 */
-   if (cap_caching_mode(iommu->cap)) {
+   if (cap_caching_mode(iommu->cap) && iommu_set_dma_strict(false))
pr_info_once("IOMMU batching disallowed due to 
virtualization\n");
-   iommu_set_dma_strict(true);
-   }
iommu_device_sysfs_add(>iommu, NULL,
   intel_iommu_groups,
   "%s", iommu->name);
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..1434bee64af3 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,14 @@ static int __init iommu_dma_setup(char *str)
 }
 early_param("iommu.strict", iommu_dma_setup);

-void iommu_set_dma_strict(bool strict)
+/* Return true if we set iommu_dma_strict */
+bool iommu_set_dma_strict(bool force)
 {
-   if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-   iommu_dma_strict = strict;
+   if (force || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT)) {
+   iommu_dma_strict = true;
+   return true;
+   }
+   return false;
 }

 bool iommu_get_dma_strict(struct iommu_domain *domain)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index 32d448050bf7..f17b20234296 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -476,7 +476,7 @@ int iommu_enable_nesting(struct iommu_domain *domain);
 int iommu_set_pgtable_quirks(struct iommu_domain *domain,
unsigned long quirks);

-void iommu_set_dma_strict(bool val);
+bool iommu_set_dma_strict(bool force);
 bool iommu_get_dma_strict(struct iommu_domain *domain);

 extern int report_iommu_fault(struct iommu_domain *domain, struct 
device *dev,

--
2.26.2
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-21 Thread Lu Baolu

On 2021/6/21 16:12, John Garry wrote:

On 21/06/2021 06:17, Lu Baolu wrote:

On 2021/6/18 19:34, John Garry wrote:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    iommu_dma_strict = true;
  }




Hi baolu,


Hi John,




Sorry for this late comment.
 > Normally the cache invalidation policy should come from the user. We
have pre-build kernel option and also a kernel boot command iommu.strict
to override it. These seem reasonable.

We also have a helper (iommu_set_dma_strict()) so that the vendor iommu
driver could squeeze in and change the previous settings mostly due to:

a) vendor iommu driver specific kernel boot command. (We are about to
    deprecate those.)

b) quirky hardware.

c) kernel optimization (e.x. strict mode in VM environment).

a) and b) are mandatory, while c) is optional. In any instance should c)
override the flush mode specified by the user. Hence, probably we should
also have another helper like:

void iommu_set_dma_strict_optional()
{
 if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
 iommu_dma_strict = true;
}

Any thoughts?


What you are suggesting is a change in policy from mainline code. 
Currently for c) we always set strict enabled, regardless of any user 
cmdline input. But now you are saying that you want iommu.strict to 
override in particular scenario, right?


In that case I would think it's better to rework the current API, like 
adding an option to "force" strict mode:


void iommu_set_dma_strict(bool force)
{
  if (force == true)
     iommu_dma_strict = true;
 else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
     iommu_dma_strict = true;
}

So we would use iommu_set_dma_strict(true) for a) and b), but 
iommu_set_dma_strict(false) for c).


Yes. We need to distinguish the "must" and "nice-to-have" cases of
setting strict mode.



Then I am not sure what you want to do with the accompanying print for 
c). It was:

"IOMMU batching is disabled due to virtualization"

And now is from this series:
"IOMMU batching disallowed due to virtualization"

Using iommu_get_dma_strict(domain) is not appropriate here to know the 
current mode (so we know whether to print).


Note that this change would mean that the current series would require 
non-trivial rework, which would be unfortunate so late in the cycle.


This patch series looks good to me and I have added by reviewed-by.
Probably we could make another patch series to improve it so that the
kernel optimization should not override the user setting.

Best regards,
baolu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v8 09/10] vduse: Introduce VDUSE - vDPA Device in Userspace

2021-06-21 Thread Jason Wang


在 2021/6/15 下午10:13, Xie Yongji 写道:

This VDUSE driver enables implementing vDPA devices in userspace.
The vDPA device's control path is handled in kernel and the data
path is handled in userspace.

A message mechnism is used by VDUSE driver to forward some control
messages such as starting/stopping datapath to userspace. Userspace
can use read()/write() to receive/reply those control messages.

And some ioctls are introduced to help userspace to implement the
data path. VDUSE_IOTLB_GET_FD ioctl can be used to get the file
descriptors referring to vDPA device's iova regions. Then userspace
can use mmap() to access those iova regions. VDUSE_DEV_GET_FEATURES
and VDUSE_VQ_GET_INFO ioctls are used to get the negotiated features
and metadata of virtqueues. VDUSE_INJECT_VQ_IRQ and VDUSE_VQ_SETUP_KICKFD
ioctls can be used to inject interrupt and setup the kickfd for
virtqueues. VDUSE_DEV_UPDATE_CONFIG ioctl is used to update the
configuration space and inject a config interrupt.

Signed-off-by: Xie Yongji 
---
  Documentation/userspace-api/ioctl/ioctl-number.rst |1 +
  drivers/vdpa/Kconfig   |   10 +
  drivers/vdpa/Makefile  |1 +
  drivers/vdpa/vdpa_user/Makefile|5 +
  drivers/vdpa/vdpa_user/vduse_dev.c | 1453 
  include/uapi/linux/vduse.h |  143 ++
  6 files changed, 1613 insertions(+)
  create mode 100644 drivers/vdpa/vdpa_user/Makefile
  create mode 100644 drivers/vdpa/vdpa_user/vduse_dev.c
  create mode 100644 include/uapi/linux/vduse.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst 
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 9bfc2b510c64..acd95e9dcfe7 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -300,6 +300,7 @@ Code  Seq#Include File  
 Comments
  'z'   10-4F  drivers/s390/crypto/zcrypt_api.hconflict!
  '|'   00-7F  linux/media.h
  0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
  0x89  00-06  arch/x86/include/asm/sockios.h
  0x89  0B-DF  linux/sockios.h
  0x89  E0-EF  linux/sockios.h 
SIOCPROTOPRIVATE range
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig
index a503c1b2bfd9..6e23bce6433a 100644
--- a/drivers/vdpa/Kconfig
+++ b/drivers/vdpa/Kconfig
@@ -33,6 +33,16 @@ config VDPA_SIM_BLOCK
  vDPA block device simulator which terminates IO request in a
  memory buffer.
  
+config VDPA_USER

+   tristate "VDUSE (vDPA Device in Userspace) support"
+   depends on EVENTFD && MMU && HAS_DMA
+   select DMA_OPS
+   select VHOST_IOTLB
+   select IOMMU_IOVA
+   help
+ With VDUSE it is possible to emulate a vDPA Device
+ in a userspace program.
+
  config IFCVF
tristate "Intel IFC VF vDPA driver"
depends on PCI_MSI
diff --git a/drivers/vdpa/Makefile b/drivers/vdpa/Makefile
index 67fe7f3d6943..f02ebed33f19 100644
--- a/drivers/vdpa/Makefile
+++ b/drivers/vdpa/Makefile
@@ -1,6 +1,7 @@
  # SPDX-License-Identifier: GPL-2.0
  obj-$(CONFIG_VDPA) += vdpa.o
  obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
  obj-$(CONFIG_IFCVF)+= ifcvf/
  obj-$(CONFIG_MLX5_VDPA) += mlx5/
  obj-$(CONFIG_VP_VDPA)+= virtio_pci/
diff --git a/drivers/vdpa/vdpa_user/Makefile b/drivers/vdpa/vdpa_user/Makefile
new file mode 100644
index ..260e0b26af99
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/Makefile
@@ -0,0 +1,5 @@
+# SPDX-License-Identifier: GPL-2.0
+
+vduse-y := vduse_dev.o iova_domain.o
+
+obj-$(CONFIG_VDPA_USER) += vduse.o
diff --git a/drivers/vdpa/vdpa_user/vduse_dev.c 
b/drivers/vdpa/vdpa_user/vduse_dev.c
new file mode 100644
index ..5271cbd15e28
--- /dev/null
+++ b/drivers/vdpa/vdpa_user/vduse_dev.c
@@ -0,0 +1,1453 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * VDUSE: vDPA Device in Userspace
+ *
+ * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights 
reserved.
+ *
+ * Author: Xie Yongji 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "iova_domain.h"
+
+#define DRV_AUTHOR   "Yongji Xie "
+#define DRV_DESC "vDPA Device in Userspace"
+#define DRV_LICENSE  "GPL v2"
+
+#define VDUSE_DEV_MAX (1U << MINORBITS)
+#define VDUSE_MAX_BOUNCE_SIZE (64 * 1024 * 1024)
+#define VDUSE_IOVA_SIZE (128 * 1024 * 1024)
+#define VDUSE_REQUEST_TIMEOUT 30
+
+struct vduse_virtqueue {
+   u16 index;
+   u32 num;
+   u32 avail_idx;
+   u64 desc_addr;
+   u64 driver_addr;
+   u64 device_addr;
+   bool ready;
+   bool kicked;
+   spinlock_t kick_lock;
+   spinlock_t irq_lock;
+   struct eventfd_ctx *kickfd;
+  

Re: [PATCH v14 6/6] iommu: Remove mode argument from iommu_set_dma_strict()

2021-06-21 Thread John Garry

On 21/06/2021 06:17, Lu Baolu wrote:

On 2021/6/18 19:34, John Garry wrote:

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 60b1ec42e73b..ff221d3ddcbc 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -349,10 +349,9 @@ static int __init iommu_dma_setup(char *str)
  }
  early_param("iommu.strict", iommu_dma_setup);
-void iommu_set_dma_strict(bool strict)
+void iommu_set_dma_strict(void)
  {
-    if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-    iommu_dma_strict = strict;
+    iommu_dma_strict = true;
  }




Hi baolu,


Sorry for this late comment.
 > Normally the cache invalidation policy should come from the user. We
have pre-build kernel option and also a kernel boot command iommu.strict
to override it. These seem reasonable.

We also have a helper (iommu_set_dma_strict()) so that the vendor iommu
driver could squeeze in and change the previous settings mostly due to:

a) vendor iommu driver specific kernel boot command. (We are about to
    deprecate those.)

b) quirky hardware.

c) kernel optimization (e.x. strict mode in VM environment).

a) and b) are mandatory, while c) is optional. In any instance should c)
override the flush mode specified by the user. Hence, probably we should
also have another helper like:

void iommu_set_dma_strict_optional()
{
 if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
     iommu_dma_strict = true;
}

Any thoughts?


What you are suggesting is a change in policy from mainline code. 
Currently for c) we always set strict enabled, regardless of any user 
cmdline input. But now you are saying that you want iommu.strict to 
override in particular scenario, right?


In that case I would think it's better to rework the current API, like 
adding an option to "force" strict mode:


void iommu_set_dma_strict(bool force)
{
if (force == true)
iommu_dma_strict = true;
else if (!(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
iommu_dma_strict = true;
}

So we would use iommu_set_dma_strict(true) for a) and b), but 
iommu_set_dma_strict(false) for c).


Then I am not sure what you want to do with the accompanying print for 
c). It was:

"IOMMU batching is disabled due to virtualization"

And now is from this series:
"IOMMU batching disallowed due to virtualization"

Using iommu_get_dma_strict(domain) is not appropriate here to know the 
current mode (so we know whether to print).


Note that this change would mean that the current series would require 
non-trivial rework, which would be unfortunate so late in the cycle.


Thanks,
John
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: [PATCH v3 2/9] dt-bindings: arm-smmu: Add Tegra186 compatible string

2021-06-21 Thread Krzysztof Kozlowski
On 18/06/2021 21:47, Rob Herring wrote:
> On Thu, Jun 3, 2021 at 10:49 AM Thierry Reding  
> wrote:
>>
>> From: Thierry Reding 
>>
>> The ARM SMMU instantiations found on Tegra186 and later need inter-
>> operation with the memory controller in order to correctly program
>> stream ID overrides.
>>
>> Furthermore, on Tegra194 multiple instances of the SMMU can gang up
>> to achieve higher throughput. In order to do this, they have to be
>> programmed identically so that the memory controller can interleave
>> memory accesses between them.
>>
>> Add the Tegra186 compatible string to make sure the interoperation
>> with the memory controller can be enabled on that SoC generation.
>>
>> Signed-off-by: Thierry Reding 
>> ---
>>  Documentation/devicetree/bindings/iommu/arm,smmu.yaml | 11 +--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml 
>> b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> index 9d27aa5111d4..1181b590db71 100644
>> --- a/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> +++ b/Documentation/devicetree/bindings/iommu/arm,smmu.yaml
>> @@ -54,8 +54,14 @@ properties:
>>- const: arm,mmu-500
>>- description: NVIDIA SoCs that program two ARM MMU-500s identically
>>  items:
>> +  - description: NVIDIA SoCs that require memory controller interaction
> 
> This is not valid jsonschema:
> 
> /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> properties:compatible:oneOf:4:items: 'anyOf' conditional failed, one
> must be fixed:
> None is not of type 'object', 'boolean'
> None is not of type 'array'
> from schema $id: http://json-schema.org/draft-07/schema#
> /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> must be fixed:
> None is not of type 'object'
> None is not of type 'array'
> from schema $id: http://devicetree.org/meta-schemas/keywords.yaml#
> /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> properties:compatible:oneOf:4:items: 'oneOf' conditional failed, one
> must be fixed:
> None is not of type 'object'
> None is not of type 'array'
> from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> /builds/robherring/linux-dt/Documentation/devicetree/bindings/iommu/arm,smmu.yaml:
> properties:compatible:oneOf:5:items: 'oneOf' conditional failed, one
> must be fixed:
> [{'enum': [{'const': 'nvidia,tegra194-smmu'}, {'const':
> 'nvidia,tegra186-smmu'}]}, {'const': 'nvidia,smmu-500'}] is not of
> type 'object'
> {'const': 'nvidia,tegra194-smmu'} is not of type 'string'
> {'const': 'nvidia,tegra186-smmu'} is not of type 'string'
> from schema $id: http://devicetree.org/meta-schemas/string-array.yaml#
> 
> 
> This was not reviewed nor tested since the DT list was not Cc'ed.

Ugh, I see now weird empty item on a list... and not only DT list was
skipped - Thierry did not Cc you either.

My bad, I did not check that patch thoroughly before applying.

Thierry, please Cc folks mentioned by get_maintainer.pl. Either sent a
fix or a revert, if fix needs more time.

Additionally, why the patch changes reg to "minItems: 1" for
nvidia,tegra194-smmu?

Best regards,
Krzysztof
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu