[RFC PATCH v1 0/9] deferred_probe_timeout logic clean up

2022-05-26 Thread Saravana Kannan via iommu
This series is based on linux-next + these 2 small patches applied on top:
https://lore.kernel.org/lkml/20220526034609.480766-1-sarava...@google.com/

A lot of the deferred_probe_timeout logic is redundant with
fw_devlink=on.  Also, enabling deferred_probe_timeout by default breaks
a few cases.

This series tries to delete the redundant logic, simplify the frameworks
that use driver_deferred_probe_check_state(), enable
deferred_probe_timeout=10 by default, and fix the NFS root failure
case.
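
For reference, every behavior this series changes can still be overridden
on the kernel command line with the existing parameters, e.g.:

  deferred_probe_timeout=0 fw_devlink=permissive fw_devlink.strict=0

(shown purely as an illustration of the knobs involved).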

Patches 1 to 3 are fairly straightforward and can probably be applied
right away.

Patches 4 to 9 are related and are the complicated bits of this series.

Patch 8 is where someone with more knowledge of the IP auto config code
could help rewrite the patch to limit the scope of the workaround, by
running the workaround only if IP auto config fails the first time
around (one possible shape of this is sketched below). But it's also
something that can be optimized in the future, because it's already
limited to the case where IP auto config is enabled on the kernel
command line.
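
A rough sketch of that idea (illustrative only, not part of this series)
would be to delay the unblocking in wait_for_devices() until a first scan
of the netdev list has come up empty:

  static int __init wait_for_devices(void)
  {
	int i;

	for (i = 0; i < DEVICE_WAIT_MAX; i++) {
		/*
		 * Only reach for the sledgehammer once the first pass
		 * has failed to find a usable device.
		 */
		if (i == 1)
			fw_devlink_unblock_may_probe();

		if (ic_usable_dev_found())	/* hypothetical helper */
			return 0;
		msleep(100);
	}
	return -ENODEV;
  }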

Yoshihiro/Geert,

If you can test this patch series and confirm that the NFS root case
works, I'd really appreciate that.

Cc: Mark Brown 
Cc: Rob Herring 
Cc: Geert Uytterhoeven 
Cc: Yoshihiro Shimoda 
Cc: John Stultz 
Cc: Nathan Chancellor 
Cc: Sebastian Andrzej Siewior 

Saravana Kannan (9):
  PM: domains: Delete usage of driver_deferred_probe_check_state()
  pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()
  net: mdio: Delete usage of driver_deferred_probe_check_state()
  Revert "driver core: Set default deferred_probe_timeout back to 0."
  driver core: Set fw_devlink.strict=1 by default
  iommu/of: Delete usage of driver_deferred_probe_check_state()
  driver core: Add fw_devlink_unblock_may_probe() helper function
  net: ipconfig: Force fw_devlink to unblock any devices that might probe
  driver core: Delete driver_deferred_probe_check_state()

 drivers/base/base.h            |  1 +
 drivers/base/core.c            | 60 +-
 drivers/base/dd.c              | 37 -
 drivers/base/power/domain.c    |  2 +-
 drivers/iommu/of_iommu.c       |  2 +-
 drivers/net/mdio/fwnode_mdio.c |  4 +--
 drivers/pinctrl/devicetree.c   |  2 +-
 include/linux/device/driver.h  |  1 -
 include/linux/fwnode.h         |  2 ++
 net/ipv4/ipconfig.c            |  2 ++
 10 files changed, 74 insertions(+), 39 deletions(-)

-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 2/9] pinctrl: devicetree: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports
"pinctrl-[0-8]" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/pinctrl/devicetree.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pinctrl/devicetree.c b/drivers/pinctrl/devicetree.c
index 3fb238714718..ef898ee8ca6b 100644
--- a/drivers/pinctrl/devicetree.c
+++ b/drivers/pinctrl/devicetree.c
@@ -129,7 +129,7 @@ static int dt_to_map_one_config(struct pinctrl *p,
np_pctldev = of_get_next_parent(np_pctldev);
if (!np_pctldev || of_node_is_root(np_pctldev)) {
of_node_put(np_pctldev);
-   ret = driver_deferred_probe_check_state(p->dev);
+   ret = -ENODEV;
/* keep deferring if modules are enabled */
		if (IS_ENABLED(CONFIG_MODULES) && !allow_default && ret < 0)
			ret = -EPROBE_DEFER;
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 4/9] Revert "driver core: Set default deferred_probe_timeout back to 0."

2022-05-26 Thread Saravana Kannan via iommu
This reverts commit 11f7e7ef553b6b93ac1aa74a3c2011b9cc8aeb61.

Let's take another shot at getting deferred_probe_timeout=10 to work.

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 11b0fb6414d3..f963d9010d7f 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -256,7 +256,12 @@ static int deferred_devs_show(struct seq_file *s, void *data)
 }
 DEFINE_SHOW_ATTRIBUTE(deferred_devs);
 
+#ifdef CONFIG_MODULES
+int driver_deferred_probe_timeout = 10;
+#else
 int driver_deferred_probe_timeout;
+#endif
+
 EXPORT_SYMBOL_GPL(driver_deferred_probe_timeout);
 
 static int __init deferred_probe_timeout_setup(char *str)
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 1/9] PM: domains: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports
"power-domains" property, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/base/power/domain.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/power/domain.c b/drivers/base/power/domain.c
index 739e52cd4aba..3e86772d5fac 100644
--- a/drivers/base/power/domain.c
+++ b/drivers/base/power/domain.c
@@ -2730,7 +2730,7 @@ static int __genpd_dev_pm_attach(struct device *dev, struct device *base_dev,
mutex_unlock(&gpd_list_lock);
dev_dbg(dev, "%s() failed to find PM domain: %ld\n",
__func__, PTR_ERR(pd));
-   return driver_deferred_probe_check_state(base_dev);
+   return -ENODEV;
}
 
dev_dbg(dev, "adding to PM domain %s\n", pd->name);
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 3/9] net: mdio: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on by default and fw_devlink supports interrupt
properties, the execution will never get to the point where
driver_deferred_probe_check_state() is called before the supplier has
probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/net/mdio/fwnode_mdio.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/mdio/fwnode_mdio.c b/drivers/net/mdio/fwnode_mdio.c
index 1c1584fca632..3e79c2c51929 100644
--- a/drivers/net/mdio/fwnode_mdio.c
+++ b/drivers/net/mdio/fwnode_mdio.c
@@ -47,9 +47,7 @@ int fwnode_mdiobus_phy_device_register(struct mii_bus *mdio,
 * just fall back to poll mode
 */
if (rc == -EPROBE_DEFER)
-   rc = driver_deferred_probe_check_state(&phy->mdio.dev);
-   if (rc == -EPROBE_DEFER)
-   return rc;
+   rc = -ENODEV;
 
if (rc > 0) {
phy->irq = rc;
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 5/9] driver core: Set fw_devlink.strict=1 by default

2022-05-26 Thread Saravana Kannan via iommu
Now that deferred_probe_timeout is non-zero by default, fw_devlink will
never permanently block the probing of devices. It'll try its best to
probe the devices in the right order and then finally let devices probe
even if their suppliers don't have any drivers.

Signed-off-by: Saravana Kannan 
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7cd789c4985d..7672f23231c1 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1581,7 +1581,7 @@ static int __init fw_devlink_setup(char *arg)
 }
 early_param("fw_devlink", fw_devlink_setup);
 
-static bool fw_devlink_strict;
+static bool fw_devlink_strict = true;
 static int __init fw_devlink_strict_setup(char *arg)
 {
return strtobool(arg, &fw_devlink_strict);
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 6/9] iommu/of: Delete usage of driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
Now that fw_devlink=on and fw_devlink.strict=1 by default and fw_devlink
supports iommu DT properties, the execution will never get to the point
where driver_deferred_probe_check_state() is called before the supplier
has probed successfully or before deferred probe timeout has expired.

So, delete the call and replace it with -ENODEV.

Signed-off-by: Saravana Kannan 
---
 drivers/iommu/of_iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
index 5696314ae69e..41f4eb005219 100644
--- a/drivers/iommu/of_iommu.c
+++ b/drivers/iommu/of_iommu.c
@@ -40,7 +40,7 @@ static int of_iommu_xlate(struct device *dev,
 * a proper probe-ordering dependency mechanism in future.
 */
if (!ops)
-   return driver_deferred_probe_check_state(dev);
+   return -ENODEV;
 
if (!try_module_get(ops->owner))
return -ENODEV;
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 7/9] driver core: Add fw_devlink_unblock_may_probe() helper function

2022-05-26 Thread Saravana Kannan via iommu
This function can be used during the kernel boot sequence to forcefully
override fw_devlink=on and unblock the probing of all devices that have
a driver.

It's mainly meant to be called from late_initcall() or
late_initcall_sync() where a device needs to probe before the kernel can
mount rootfs.

Signed-off-by: Saravana Kannan 
---
 drivers/base/base.h|  1 +
 drivers/base/core.c| 58 ++
 drivers/base/dd.c  |  2 +-
 include/linux/fwnode.h |  2 ++
 4 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/base/base.h b/drivers/base/base.h
index ab71403d102f..b3a43a164dcd 100644
--- a/drivers/base/base.h
+++ b/drivers/base/base.h
@@ -160,6 +160,7 @@ extern int devres_release_all(struct device *dev);
 extern void device_block_probing(void);
 extern void device_unblock_probing(void);
 extern void deferred_probe_extend_timeout(void);
+extern void driver_deferred_probe_trigger(void);
 
 /* /sys/devices directory */
 extern struct kset *devices_kset;
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 7672f23231c1..7ff7fbb00643 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1655,6 +1655,64 @@ void fw_devlink_drivers_done(void)
device_links_write_unlock();
 }
 
+static int fw_devlink_may_probe(struct device *dev, void *data)
+{
+   struct device_link *link = to_devlink(dev);
+
+   if (!link->supplier->can_match && link->consumer->can_match)
+   fw_devlink_relax_link(link);
+
+   return 0;
+}
+
+/**
+ * fw_devlink_unblock_may_probe - Force unblock any device that has a driver
+ *
+ * This function is more of a sledge hammer than a scalpel. Use this very
+ * sparingly.
+ *
+ * Some devices might need to be probed and bound successfully before the kernel
+ * boot sequence can finish and move on to init/userspace. For example, a
+ * network interface might need to be bound to be able to mount a NFS rootfs.
+ *
+ * With fw_devlink=on by default, some of these devices might be blocked from
+ * probing because they are waiting on an optional supplier that doesn't have a
+ * driver. While fw_devlink will eventually identify such devices and unblock
+ * the probing automatically, it might be too late by the time it unblocks the
+ * probing of devices. For example, the IP4 autoconfig might time out before
+ * fw_devlink unblocks probing of the network interface. This function is
+ * available to unblock the probing of such devices.
+ *
+ * Since there's no easy way to know which unprobed device needs to probe for
+ * boot to succeed, this function makes sure fw_devlink doesn't block any device
+ * that has a driver at the point in time this function is called.
+ *
+ * It does this by relaxing (fw_devlink=permissive behavior) all the device
+ * links created by fw_devlink where the consumer has a driver and the supplier
+ * doesn't have a driver.
+ *
+ * It's extremely unlikely that a proper use of this function will be outside of
+ * an initcall. So, until a case is made for that, this function is
+ * intentionally marked with __init.
+ */
+void __init fw_devlink_unblock_may_probe(void)
+{
+   struct device_link *link, *ln;
+
+   if (!fw_devlink_flags || fw_devlink_is_permissive())
+   return;
+
+   /* Wait for current probes to finish to limit impact. */
+   wait_for_device_probe();
+
+   device_links_write_lock();
+   class_for_each_device(&devlink_class, NULL, NULL,
+ fw_devlink_may_probe);
+   device_links_write_unlock();
+
+   driver_deferred_probe_trigger();
+}
+
 static void fw_devlink_unblock_consumers(struct device *dev)
 {
struct device_link *link;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index f963d9010d7f..af8138d44e6c 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -172,7 +172,7 @@ static bool driver_deferred_probe_enable;
 * changes in the midst of a probe, then deferred processing should be triggered
  * again.
  */
-static void driver_deferred_probe_trigger(void)
+void driver_deferred_probe_trigger(void)
 {
if (!driver_deferred_probe_enable)
return;
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 9a81c4410b9f..0770edda7068 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct fwnode_operations;
 struct device;
@@ -199,5 +200,6 @@ extern bool fw_devlink_is_strict(void);
 int fwnode_link_add(struct fwnode_handle *con, struct fwnode_handle *sup);
 void fwnode_links_purge(struct fwnode_handle *fwnode);
 void fw_devlink_purge_absent_suppliers(struct fwnode_handle *fwnode);
+void __init fw_devlink_unblock_may_probe(void);
 
 #endif
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 8/9] net: ipconfig: Force fw_devlink to unblock any devices that might probe

2022-05-26 Thread Saravana Kannan via iommu
If there are network devices that could probe without some of their
suppliers probing and those network devices are needed for IP auto
config to work, then fw_devlink=on might break that usecase by blocking
the network devices from probing by the time IP auto config starts.

So, when IP auto config is enabled, make sure fw_devlink doesn't block
the probing of any device that has a driver by the time we get to IP
auto config.

Signed-off-by: Saravana Kannan 
---
 net/ipv4/ipconfig.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv4/ipconfig.c b/net/ipv4/ipconfig.c
index 9d41d5d5cd1e..aa7b8ba68ca6 100644
--- a/net/ipv4/ipconfig.c
+++ b/net/ipv4/ipconfig.c
@@ -1435,6 +1435,8 @@ static int __init wait_for_devices(void)
 {
int i;
 
+   fw_devlink_unblock_may_probe();
+
for (i = 0; i < DEVICE_WAIT_MAX; i++) {
struct net_device *dev;
int found = 0;
-- 
2.36.1.124.g0e6072fb45-goog



[RFC PATCH v1 9/9] driver core: Delete driver_deferred_probe_check_state()

2022-05-26 Thread Saravana Kannan via iommu
The function is no longer used. So delete it.

Signed-off-by: Saravana Kannan 
---
 drivers/base/dd.c | 30 --
 include/linux/device/driver.h |  1 -
 2 files changed, 31 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index af8138d44e6c..789b0871dc45 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -274,42 +274,12 @@ static int __init deferred_probe_timeout_setup(char *str)
 }
 __setup("deferred_probe_timeout=", deferred_probe_timeout_setup);
 
-/**
- * driver_deferred_probe_check_state() - Check deferred probe state
- * @dev: device to check
- *
- * Return:
- * * -ENODEV if initcalls have completed and modules are disabled.
- * * -ETIMEDOUT if the deferred probe timeout was set and has expired
- *   and modules are enabled.
- * * -EPROBE_DEFER in other cases.
- *
- * Drivers or subsystems can opt-in to calling this function instead of directly
- * returning -EPROBE_DEFER.
- */
-int driver_deferred_probe_check_state(struct device *dev)
-{
-   if (!IS_ENABLED(CONFIG_MODULES) && initcalls_done) {
-		dev_warn(dev, "ignoring dependency for device, assuming no driver\n");
-   return -ENODEV;
-   }
-
-   if (!driver_deferred_probe_timeout && initcalls_done) {
-   dev_warn(dev, "deferred probe timeout, ignoring dependency\n");
-   return -ETIMEDOUT;
-   }
-
-   return -EPROBE_DEFER;
-}
-EXPORT_SYMBOL_GPL(driver_deferred_probe_check_state);
-
 static void deferred_probe_timeout_work_func(struct work_struct *work)
 {
struct device_private *p;
 
fw_devlink_drivers_done();
 
-   driver_deferred_probe_timeout = 0;
driver_deferred_probe_trigger();
flush_work(&deferred_probe_work);
 
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index 700453017e1c..7c245d269feb 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -241,7 +241,6 @@ driver_find_device_by_acpi_dev(struct device_driver *drv, const void *adev)
 
 extern int driver_deferred_probe_timeout;
 void driver_deferred_probe_add(struct device *dev);
-int driver_deferred_probe_check_state(struct device *dev);
 void driver_init(void);
 
 /**
-- 
2.36.1.124.g0e6072fb45-goog



Re: [PATCH v2 2/2] iommu: mtk_iommu: Add support for MT6795 Helio X10 M4Us

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:18 +0200, AngeloGioacchino Del Regno wrote:
> Add support for the M4Us found in the MT6795 Helio X10 SoC.
> 
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 17 -
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 090cf6e15f85..97ff30ed2d0f 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -159,6 +159,7 @@
>  enum mtk_iommu_plat {
>   M4U_MT2712,
>   M4U_MT6779,
> + M4U_MT6795,
>   M4U_MT8167,
>   M4U_MT8173,
>   M4U_MT8183,
> @@ -954,7 +955,8 @@ static int mtk_iommu_hw_init(const struct mtk_iommu_data *data, unsigned int ban
>	 * Global control settings are in bank0. May re-init these global registers
>	 * since no sure if there is bank0 consumers.
>	 */
> - if (data->plat_data->m4u_plat == M4U_MT8173) {
> + if (data->plat_data->m4u_plat == M4U_MT6795 ||
> + data->plat_data->m4u_plat == M4U_MT8173) {

Add a new flag for this. The setting difference is that the offset for
TF_PROT_TO_PROGRAM_ADDR is 5 in mt8173 while the others' offset is 4.
Thus, we could name the flag something like TF_PORT_TO_ADDR_MT8173 or
TF_PORT_TO_ADDR_OFFSET_IS_5; a rough sketch follows.
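
Something along these lines (bit position and exact name hypothetical),
reusing the driver's existing MTK_IOMMU_HAS_FLAG() helper:

  /* Offset for TF_PROT_TO_PROGRAM_ADDR is 5 rather than 4. */
  #define TF_PORT_TO_ADDR_MT8173	BIT(18)	/* hypothetical bit */

	if (MTK_IOMMU_HAS_FLAG(data->plat_data, TF_PORT_TO_ADDR_MT8173))
		regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
			 F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;

with mt6795_data and mt8173_data then setting the flag in their .flags.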

>   regval = F_MMU_PREFETCH_RT_REPLACE_MOD |
>F_MMU_TF_PROT_TO_PROGRAM_ADDR_MT8173;
>   } else {
> @@ -1422,6 +1424,18 @@ static const struct mtk_iommu_plat_data mt6779_data = {
>   .larbid_remap  = {{0}, {1}, {2}, {3}, {5}, {7, 8}, {10}, {9}},
>  };
>  
> +static const struct mtk_iommu_plat_data mt6795_data = {
> + .m4u_plat = M4U_MT6795,
> + .flags= HAS_4GB_MODE | HAS_BCLK | RESET_AXI |
> + HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
> + .inv_sel_reg  = REG_MMU_INV_SEL_GEN1,
> + .banks_num= 1,
> + .banks_enable = {true},
> + .iova_region  = single_domain,
> + .iova_region_nr = ARRAY_SIZE(single_domain),
> +	.larbid_remap = {{0}, {1}, {2}, {3}, {4}}, /* Linear mapping. */
> +};
> +
>  static const struct mtk_iommu_plat_data mt8167_data = {
>   .m4u_plat = M4U_MT8167,
> +	.flags    = RESET_AXI | HAS_LEGACY_IVRP_PADDR | MTK_IOMMU_TYPE_MM,
> @@ -1533,6 +1547,7 @@ static const struct mtk_iommu_plat_data mt8195_data_vpp = {
>  static const struct of_device_id mtk_iommu_of_ids[] = {
>   { .compatible = "mediatek,mt2712-m4u", .data = &mt2712_data},
>   { .compatible = "mediatek,mt6779-m4u", .data = &mt6779_data},
> + { .compatible = "mediatek,mt6795-m4u", .data = &mt6795_data},
>   { .compatible = "mediatek,mt8167-m4u", .data = &mt8167_data},
>   { .compatible = "mediatek,mt8173-m4u", .data = &mt8173_data},
>   { .compatible = "mediatek,mt8183-m4u", .data = &mt8183_data},



Re: [PATCH v2 1/7] dt-bindings: iommu: mediatek: Add phandles for mediatek infra/pericfg

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:04 +0200, AngeloGioacchino Del Regno wrote:
> Add properties "mediatek,infracfg" and "mediatek,pericfg" to let the
> mtk_iommu driver retrieve phandles to the infracfg and pericfg syscon(s)
> instead of performing a per-soc compatible lookup.
> 
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delre...@collabora.com>
> ---
>  .../devicetree/bindings/iommu/mediatek,iommu.yaml | 8 ++++++++
>  1 file changed, 8 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> index 2ae3bbad7f1a..c4af41947593 100644
> --- a/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> +++ b/Documentation/devicetree/bindings/iommu/mediatek,iommu.yaml
> @@ -101,6 +101,10 @@ properties:
>      items:
>        - const: bclk
>  
> +  mediatek,infracfg:
> +    $ref: /schemas/types.yaml#/definitions/phandle
> +    description: The phandle to the mediatek infracfg syscon
> +

Just curious, why not put this "mediatek,infracfg" and its required
segment[6/7] into one patch?

>    mediatek,larbs:
>      $ref: /schemas/types.yaml#/definitions/phandle-array
>      minItems: 1
> @@ -112,6 +116,10 @@ properties:
>        Refer to bindings/memory-controllers/mediatek,smi-larb.yaml. It must sort
>        according to the local arbiter index, like larb0, larb1, larb2...
>  
> +  mediatek,pericfg:
> +    $ref: /schemas/types.yaml#/definitions/phandle
> +    description: The phandle to the mediatek pericfg syscon
> +
>    '#iommu-cells':
>      const: 1
>      description: |



Re: [PATCH v2 2/7] iommu: mtk_iommu: Lookup phandle to retrieve syscon to infracfg

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:04 +0200, AngeloGioacchino Del Regno wrote:
> This driver will get support for more SoCs and the list of infracfg
> compatibles is expected to grow: in order to prevent getting this
> situation out of control and see a long list of compatible strings,
> add support to retrieve a handle to infracfg's regmap through a
> new "mediatek,infracfg" phandle.
> 
> In order to keep retrocompatibility with older devicetrees, the old
> way is kept in place, but also a dev_warn() was added to advertise
> this change in hope that the user will see it and eventually update
> the devicetree if this is possible.
> 
> Signed-off-by: AngeloGioacchino Del Regno <
> angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 40 +--
> 
>  1 file changed, 26 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index 71b2ace74cd6..d16b95e71ded 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -1134,22 +1134,34 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   data->protect_base = ALIGN(virt_to_phys(protect),
> MTK_PROTECT_PA_ALIGN);
>  
>   if (MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_4GB_MODE)) {
> - switch (data->plat_data->m4u_plat) {
> - case M4U_MT2712:
> - p = "mediatek,mt2712-infracfg";
> - break;
> - case M4U_MT8173:
> - p = "mediatek,mt8173-infracfg";
> - break;
> - default:
> - p = NULL;
> +		infracfg = syscon_regmap_lookup_by_phandle(dev->of_node, "mediatek,infracfg");
> +		if (IS_ERR(infracfg)) {
> +			dev_info(dev, "Cannot find phandle to mediatek,infracfg:"
> +				      " Please update your devicetree.\n");

Remove the log from Robin?

> +			/*
> +			 * Legacy devicetrees will not specify a phandle to
> +			 * mediatek,infracfg: in that case, we use the older
> +			 * way to retrieve a syscon to infra.
> +			 *
> +			 * This is for retrocompatibility purposes only, hence
> +			 * no more compatibles shall be added to this.
> +			 */
> + switch (data->plat_data->m4u_plat) {
> + case M4U_MT2712:
> + p = "mediatek,mt2712-infracfg";
> + break;
> + case M4U_MT8173:
> + p = "mediatek,mt8173-infracfg";
> + break;
> + default:
> + p = NULL;
> + }

We already use the "mediatek,infracfg" property for commonizing. For the
previous SoCs, I also prefer to put the string into the platform data
(a rough sketch follows).

After this, "->m4u_plat" could be removed. Of course, this is not the
main purpose of this patchset; it is also OK currently.
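
A sketch of that alternative (field name hypothetical, mirroring the
pericfg_comp_str member this series touches elsewhere):

  struct mtk_iommu_plat_data {
	/* ... other members as before ... */
	const char	*infracfg_comp_str;	/* legacy lookup, NULL if unused */
  };

  /* In mtk_iommu_probe(), replacing the switch on m4u_plat: */
	infracfg = syscon_regmap_lookup_by_phandle(dev->of_node, "mediatek,infracfg");
	if (IS_ERR(infracfg) && data->plat_data->infracfg_comp_str)
		infracfg = syscon_regmap_lookup_by_compatible(
				data->plat_data->infracfg_comp_str);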

> +
> +			infracfg = syscon_regmap_lookup_by_compatible(p);
> + if (IS_ERR(infracfg))
> + return PTR_ERR(infracfg);
>   }
>  
> - infracfg = syscon_regmap_lookup_by_compatible(p);
> -
> - if (IS_ERR(infracfg))
> - return PTR_ERR(infracfg);
> -
>   ret = regmap_read(infracfg, REG_INFRA_MISC, &val);
>   if (ret)
>   return ret;



Re: [PATCH v2 3/7] iommu: mtk_iommu: Lookup phandle to retrieve syscon to pericfg

2022-05-26 Thread Yong Wu via iommu
On Wed, 2022-05-18 at 12:04 +0200, AngeloGioacchino Del Regno wrote:
> On some SoCs (of which only MT8195 is supported at the time of writing),
> the "R" and "W" (I/O) enable bits for the IOMMUs are in the pericfg_ao
> register space and not in the IOMMU space: as it happened already with
> infracfg, it is expected that this list will grow.
> 
> Instead of specifying pericfg compatibles on a per-SoC basis, following
> what was done with infracfg, let's lookup the syscon by phandle instead.
> Also following the previous infracfg change, add a warning for outdated
> devicetrees, in hope that the user will take action.
> 
> Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delre...@collabora.com>
> ---
>  drivers/iommu/mtk_iommu.c | 26 --
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index d16b95e71ded..090cf6e15f85 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -138,6 +138,8 @@
>  /* PM and clock always on. e.g. infra iommu */
>  #define PM_CLK_AOBIT(15)
>  #define IFA_IOMMU_PCIE_SUPPORT   BIT(16)
> +/* IOMMU I/O (r/w) is enabled using PERICFG_IOMMU_1 register */
> +#define HAS_PERI_IOMMU1_REG  BIT(17)
>  
>  #define MTK_IOMMU_HAS_FLAG_MASK(pdata, _x, mask) \
>	((((pdata)->flags) & (mask)) == (_x))
> @@ -187,7 +189,6 @@ struct mtk_iommu_plat_data {
>   u32 flags;
>   u32 inv_sel_reg;
>  
> - char*pericfg_comp_str;
>   struct list_head*hw_list;
>   unsigned intiova_region_nr;
>   const struct mtk_iommu_iova_region  *iova_region;
> @@ -1214,14 +1215,19 @@ static int mtk_iommu_probe(struct platform_device *pdev)
>   goto out_runtime_disable;
>   }
> 	} else if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_INFRA) &&
> -		   data->plat_data->pericfg_comp_str) {
> -		infracfg = syscon_regmap_lookup_by_compatible(data->plat_data->pericfg_comp_str);
> -		if (IS_ERR(infracfg)) {
> -			ret = PTR_ERR(infracfg);
> -			goto out_runtime_disable;
> -		}
> +		   MTK_IOMMU_HAS_FLAG(data->plat_data, HAS_PERI_IOMMU1_REG)) {
> +		data->pericfg = syscon_regmap_lookup_by_phandle(dev->of_node, "mediatek,pericfg");

I'm not keen to add this property. Currently only mt8195 uses this
setting, and in the latest SoC we moved this setting to ATF. Thus I
think we could keep the current way; there is no need to add a new DT
property only for mt8195.

> +		if (IS_ERR(data->pericfg)) {
> +			dev_info(dev, "Cannot find phandle to mediatek,pericfg:"
> +				      " Please update your devicetree.\n");
>  
> - data->pericfg = infracfg;
> + p = "mediatek,mt8195-pericfg_ao";
> +			data->pericfg = syscon_regmap_lookup_by_compatible(p);
> + if (IS_ERR(data->pericfg)) {
> + ret = PTR_ERR(data->pericfg);
> + goto out_runtime_disable;
> + }
> + }
>   }
>  
>   platform_set_drvdata(pdev, data);
> @@ -1480,8 +1486,8 @@ static const struct mtk_iommu_plat_data mt8192_data = {
>  static const struct mtk_iommu_plat_data mt8195_data_infra = {
>   .m4u_plat = M4U_MT8195,
> 	.flags        = WR_THROT_EN | DCM_DISABLE | STD_AXI_MODE | PM_CLK_AO |
> -			MTK_IOMMU_TYPE_INFRA | IFA_IOMMU_PCIE_SUPPORT,
> -	.pericfg_comp_str = "mediatek,mt8195-pericfg_ao",
> +			HAS_PERI_IOMMU1_REG | MTK_IOMMU_TYPE_INFRA |
> +			IFA_IOMMU_PCIE_SUPPORT,
>   .inv_sel_reg  = REG_MMU_INV_SEL_GEN2,
>   .banks_num= 5,
>   .banks_enable = {true, false, false, false, true},



[PATCH v2 0/4] DMA mapping changes for SCSI core

2022-05-26 Thread John Garry via iommu
As reported in [0], DMA mappings whose size exceeds the IOMMU IOVA caching
limit may see a big performance hit.

This series introduces a new DMA mapping API, dma_opt_mapping_size(), so
that drivers may know this limit when performance is a factor in the
mapping.

Robin didn't like using dma_max_mapping_size() for this [1].

The SCSI core code is modified to use this limit.

I also added a patch for libata-scsi as it does not currently honour the
shost max_sectors limit.

[0] 
https://lore.kernel.org/linux-iommu/20210129092120.1482-1-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/f5b78c9c-312e-70ab-ecbb-f14623a4b...@arm.com/

Changes since v1:
- Relocate scsi_add_host_with_dma() dma_dev check (Reported by Dan)
- Add tags from Damien and Martin (thanks)
  - note: I only added Martin's tag to the SCSI patch

John Garry (4):
  dma-mapping: Add dma_opt_mapping_size()
  dma-iommu: Add iommu_dma_opt_mapping_size()
  scsi: core: Cap shost max_sectors according to DMA optimum mapping
limits
  libata-scsi: Cap ata_device->max_sectors according to
shost->max_sectors

 Documentation/core-api/dma-api.rst |  9 +
 drivers/ata/libata-scsi.c  |  1 +
 drivers/iommu/dma-iommu.c  |  6 ++
 drivers/iommu/iova.c   |  5 +
 drivers/scsi/hosts.c   |  5 +
 drivers/scsi/scsi_lib.c|  4 
 include/linux/dma-map-ops.h|  1 +
 include/linux/dma-mapping.h|  5 +
 include/linux/iova.h   |  2 ++
 kernel/dma/mapping.c   | 12 
 10 files changed, 46 insertions(+), 4 deletions(-)

-- 
2.26.2



[PATCH v2 1/4] dma-mapping: Add dma_opt_mapping_size()

2022-05-26 Thread John Garry via iommu
Streaming DMA mapping involving an IOMMU may be much slower for larger
total mapping size. This is because every IOMMU DMA mapping requires an
IOVA to be allocated and freed. IOVA sizes above a certain limit are not
cached, which can have a big impact on DMA mapping performance.

Provide an API for device drivers to know this "optimal" limit, such that
they may try to produce mappings which don't exceed it.
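
As a quick illustration of the intended use (not part of this patch; the
driver structure and field names below are invented), a driver would query
the limit once and clamp its own per-request size to it:

  #include <linux/dma-mapping.h>
  #include <linux/minmax.h>

  struct foo_dev {
	struct device *dma_dev;		/* device doing the DMA */
	size_t max_xfer_bytes;		/* driver's per-request limit */
  };

  /* Clamp the per-request limit to the optimal mapping size. */
  static void foo_apply_dma_limits(struct foo_dev *fdev)
  {
	size_t opt = dma_opt_mapping_size(fdev->dma_dev);

	fdev->max_xfer_bytes = min(fdev->max_xfer_bytes, opt);
  }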

Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
---
 Documentation/core-api/dma-api.rst |  9 +
 include/linux/dma-map-ops.h|  1 +
 include/linux/dma-mapping.h|  5 +
 kernel/dma/mapping.c   | 12 
 4 files changed, 27 insertions(+)

diff --git a/Documentation/core-api/dma-api.rst b/Documentation/core-api/dma-api.rst
index 6d6d0edd2d27..b3cd9763d28b 100644
--- a/Documentation/core-api/dma-api.rst
+++ b/Documentation/core-api/dma-api.rst
@@ -204,6 +204,15 @@ Returns the maximum size of a mapping for the device. The size parameter
 of the mapping functions like dma_map_single(), dma_map_page() and
 others should not be larger than the returned value.
 
+::
+
+   size_t
+   dma_opt_mapping_size(struct device *dev);
+
+Returns the maximum optimal size of a mapping for the device. Mapping large
+buffers may take longer so device drivers are advised to limit total DMA
+streaming mappings length to the returned value.
+
 ::
 
bool
diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d5b06b3a4a6..98ceba6fa848 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -69,6 +69,7 @@ struct dma_map_ops {
int (*dma_supported)(struct device *dev, u64 mask);
u64 (*get_required_mask)(struct device *dev);
size_t (*max_mapping_size)(struct device *dev);
+   size_t (*opt_mapping_size)(void);
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index dca2b1355bb1..fe3849434b2a 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -144,6 +144,7 @@ int dma_set_mask(struct device *dev, u64 mask);
 int dma_set_coherent_mask(struct device *dev, u64 mask);
 u64 dma_get_required_mask(struct device *dev);
 size_t dma_max_mapping_size(struct device *dev);
+size_t dma_opt_mapping_size(struct device *dev);
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr);
 unsigned long dma_get_merge_boundary(struct device *dev);
 struct sg_table *dma_alloc_noncontiguous(struct device *dev, size_t size,
@@ -266,6 +267,10 @@ static inline size_t dma_max_mapping_size(struct device *dev)
 {
return 0;
 }
+static inline size_t dma_opt_mapping_size(struct device *dev)
+{
+   return 0;
+}
 static inline bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
return false;
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index db7244291b74..1bfe11b1edb6 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -773,6 +773,18 @@ size_t dma_max_mapping_size(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dma_max_mapping_size);
 
+size_t dma_opt_mapping_size(struct device *dev)
+{
+   const struct dma_map_ops *ops = get_dma_ops(dev);
+   size_t size = SIZE_MAX;
+
+   if (ops && ops->opt_mapping_size)
+   size = ops->opt_mapping_size();
+
+   return min(dma_max_mapping_size(dev), size);
+}
+EXPORT_SYMBOL_GPL(dma_opt_mapping_size);
+
 bool dma_need_sync(struct device *dev, dma_addr_t dma_addr)
 {
const struct dma_map_ops *ops = get_dma_ops(dev);
-- 
2.26.2



[PATCH v2 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-26 Thread John Garry via iommu
Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
allows the drivers to know the optimal mapping limit and thus limit the
requested IOVA lengths.

This value is based on the IOVA rcache range limit, as IOVAs allocated
above this limit must always be newly allocated, which may be quite slow.
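
As a worked example (assuming the current build-time constants): with
IOVA_RANGE_CACHE_MAX_SIZE == 6 and a 4 KiB PAGE_SIZE, the helper added
below returns PAGE_SIZE << 5 = 128 KiB, which becomes the optimal mapping
size reported on such systems.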

Signed-off-by: John Garry 
---
 drivers/iommu/dma-iommu.c | 6 ++
 drivers/iommu/iova.c  | 5 +
 include/linux/iova.h  | 2 ++
 3 files changed, 13 insertions(+)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 09f6e1c0f9c0..f619e41b9172 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -1442,6 +1442,11 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
 }
 
+static size_t iommu_dma_opt_mapping_size(void)
+{
+   return iova_rcache_range();
+}
+
 static const struct dma_map_ops iommu_dma_ops = {
.alloc  = iommu_dma_alloc,
.free   = iommu_dma_free,
@@ -1462,6 +1467,7 @@ static const struct dma_map_ops iommu_dma_ops = {
.map_resource   = iommu_dma_map_resource,
.unmap_resource = iommu_dma_unmap_resource,
.get_merge_boundary = iommu_dma_get_merge_boundary,
+   .opt_mapping_size   = iommu_dma_opt_mapping_size,
 };
 
 /*
diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index db77aa675145..9f00b58d546e 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
 static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
 static void free_iova_rcaches(struct iova_domain *iovad);
 
+unsigned long iova_rcache_range(void)
+{
+   return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
+}
+
 static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
 {
struct iova_domain *iovad;
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 320a70e40233..c6ba6d95d79c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova)
 int iova_cache_get(void);
 void iova_cache_put(void);
 
+unsigned long iova_rcache_range(void);
+
 void free_iova(struct iova_domain *iovad, unsigned long pfn);
 void __free_iova(struct iova_domain *iovad, struct iova *iova);
 struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,
-- 
2.26.2



[PATCH v2 3/4] scsi: core: Cap shost max_sectors according to DMA optimum mapping limits

2022-05-26 Thread John Garry via iommu
Streaming DMA mappings may be considerably slower when mappings go through
an IOMMU and the total mapping length is somewhat long. This is because the
IOMMU IOVA code allocates and frees an IOVA for each mapping, which may
affect performance.

For performance reasons set the request_queue max_sectors from
dma_opt_mapping_size(), which knows this mapping limit.

In addition, the shost->max_sectors is repeatedly set for each sdev in
__scsi_init_queue(). This is unnecessary, so set once when adding the
host.
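
Continuing the worked example from the previous patch (assuming the IOMMU
DMA ops and a 4 KiB PAGE_SIZE): dma_opt_mapping_size() returns 128 KiB, so
shost->max_sectors is capped at 128 KiB >> SECTOR_SHIFT (9), i.e. 256
sectors.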

Signed-off-by: John Garry 
Reviewed-by: Damien Le Moal 
Reviewed-by: Martin K. Petersen 
---
 drivers/scsi/hosts.c| 5 +
 drivers/scsi/scsi_lib.c | 4 
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/scsi/hosts.c b/drivers/scsi/hosts.c
index f69b77cbf538..9563c0ac567a 100644
--- a/drivers/scsi/hosts.c
+++ b/drivers/scsi/hosts.c
@@ -240,6 +240,11 @@ int scsi_add_host_with_dma(struct Scsi_Host *shost, struct device *dev,
 
shost->dma_dev = dma_dev;
 
+   if (dma_dev->dma_mask) {
+   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
+   dma_opt_mapping_size(dma_dev) >> SECTOR_SHIFT);
+   }
+
/*
 * Increase usage count temporarily here so that calling
 * scsi_autopm_put_host() will trigger runtime idle if there is
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 8d18cc7e510e..2d43bb8799bd 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1884,10 +1884,6 @@ void __scsi_init_queue(struct Scsi_Host *shost, struct request_queue *q)
blk_queue_max_integrity_segments(q, shost->sg_prot_tablesize);
}
 
-   if (dev->dma_mask) {
-   shost->max_sectors = min_t(unsigned int, shost->max_sectors,
-   dma_max_mapping_size(dev) >> SECTOR_SHIFT);
-   }
blk_queue_max_hw_sectors(q, shost->max_sectors);
blk_queue_segment_boundary(q, shost->dma_boundary);
dma_set_seg_boundary(dev, shost->dma_boundary);
-- 
2.26.2



[PATCH v2 4/4] libata-scsi: Cap ata_device->max_sectors according to shost->max_sectors

2022-05-26 Thread John Garry via iommu
ATA devices (struct ata_device) have a max_sectors field which is
configured internally in libata. This is then used to (re)configure the
associated sdev request queue max_sectors value from how it is earlier set
in __scsi_init_queue(). In __scsi_init_queue() the max_sectors value is set
according to shost limits, which includes host DMA mapping limits.

Cap the ata_device max_sectors according to shost->max_sectors to respect
this shost limit.

Signed-off-by: John Garry 
Acked-by: Damien Le Moal 
---
 drivers/ata/libata-scsi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 06c9d90238d9..25fe89791641 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1036,6 +1036,7 @@ int ata_scsi_dev_config(struct scsi_device *sdev, struct ata_device *dev)
dev->flags |= ATA_DFLAG_NO_UNLOAD;
 
/* configure max sectors */
+   dev->max_sectors = min(dev->max_sectors, sdev->host->max_sectors);
blk_queue_max_hw_sectors(q, dev->max_sectors);
 
if (dev->class == ATA_DEV_ATAPI) {
-- 
2.26.2



[RFC PATCH V3 0/2] swiotlb: Add child io tlb mem support

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch series adds child IO TLB mem support to reduce spinlock
overhead among a device's queues. Each device may allocate an IO TLB mem
and set up child IO TLB mems according to its queue number. The number of
child IO TLB mems may be set equal to the device queue number, which helps
to resolve the swiotlb spinlock overhead among devices and queues.

Patch 1 introduces the IO TLB block concept and the swiotlb_device_allocate()
API to allocate a per-device swiotlb bounce buffer. The new API accepts a
queue number as the number of child IO TLB mems to set up in the device's
IO TLB mem.

Patch 2 calls the new allocation function in the netvsc driver to resolve
the global spin lock issue.

Tianyu Lan (2):
  swiotlb: Add Child IO TLB mem support
  net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

 drivers/net/hyperv/netvsc.c |  10 ++
 include/linux/swiotlb.h |  38 +
 kernel/dma/swiotlb.c| 299 ++--
 3 files changed, 334 insertions(+), 13 deletions(-)

-- 
2.25.1



[RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

The netvsc driver allocates a device IO TLB mem by calling
swiotlb_device_allocate() and sets the child IO TLB mem number according
to the device queue number. Child IO TLB mems may reduce the overhead of
the single spin lock in the device IO TLB mem among multiple device
queues.

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/netvsc.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 9442f751ad3a..26a8f8f84fc4 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 
 #include "hyperv_net.h"
 #include "netvsc_trace.h"
@@ -98,6 +99,7 @@ static void netvsc_subchan_work(struct work_struct *w)
struct netvsc_device *nvdev =
container_of(w, struct netvsc_device, subchan_work);
struct rndis_device *rdev;
+   struct hv_device *hdev;
int i, ret;
 
/* Avoid deadlock with device removal already under RTNL */
@@ -108,6 +110,9 @@ static void netvsc_subchan_work(struct work_struct *w)
 
rdev = nvdev->extension;
if (rdev) {
+   hdev = ((struct net_device_context *)
+   netdev_priv(rdev->ndev))->device_ctx;
+
ret = rndis_set_subchannel(rdev->ndev, nvdev, NULL);
if (ret == 0) {
netif_device_attach(rdev->ndev);
@@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct *w)
nvdev->max_chn = 1;
nvdev->num_chn = 1;
}
+
+	/* Allocate bounce buffer. */
+   swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
+   10 * IO_TLB_BLOCK_UNIT);
}
 
rtnl_unlock();
@@ -769,6 +778,7 @@ void netvsc_device_remove(struct hv_device *device)
 
/* Release all resources */
free_netvsc_device_rcu(net_device);
+   swiotlb_device_free(&device->device);
 }
 
 #define RING_AVAIL_PERCENT_HIWATER 20
-- 
2.25.1



[RFC PATCH V3 1/2] swiotlb: Add Child IO TLB mem support

2022-05-26 Thread Tianyu Lan
From: Tianyu Lan 

Traditionally swiotlb was not performance critical because it was only
used for slow devices. But in some setups, like TDX/SEV confidential
guests, all IO has to go through swiotlb. Currently swiotlb only has a
single lock. Under high IO load with multiple CPUs this can lead to
significant lock contention on the swiotlb lock.

This patch adds child IO TLB mem support to reduce spinlock overhead
among a device's queues. Each device may allocate an IO TLB mem and set up
child IO TLB mems according to its queue number. The swiotlb code then
allocates bounce buffers among the child IO TLB mems iteratively.

Introduce an IO TLB block unit (2MB) concept for allocating big bounce
buffers from the default pool for devices, as an IO TLB segment (256k) is
too small for a device bounce buffer.
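
Worked out with the kernel's existing constants (IO_TLB_SEGSIZE = 128
slots, IO_TLB_SHIFT = 11, i.e. 2 KiB slots), the new defines below give:

  IO_TLB_BLOCKSIZE  = 8 * 128    = 1024 slots
  IO_TLB_BLOCK_UNIT = 1024 << 11 = 2 MiB

so the netvsc patch's request of 10 * IO_TLB_BLOCK_UNIT amounts to a
20 MiB per-device pool.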

Signed-off-by: Tianyu Lan 
---
 include/linux/swiotlb.h |  38 +
 kernel/dma/swiotlb.c| 304 ++--
 2 files changed, 329 insertions(+), 13 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7ed35dd3de6e..a48a9d64e3c3 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -31,6 +31,14 @@ struct scatterlist;
 #define IO_TLB_SHIFT 11
 #define IO_TLB_SIZE (1 << IO_TLB_SHIFT)
 
+/*
+ * IO TLB BLOCK UNIT as device bounce buffer allocation unit.
+ * This allows a device to allocate bounce buffers from the default
+ * io tlb pool.
+ */
+#define IO_TLB_BLOCKSIZE   (8 * IO_TLB_SEGSIZE)
+#define IO_TLB_BLOCK_UNIT  (IO_TLB_BLOCKSIZE << IO_TLB_SHIFT)
+
 /* default to 64MB */
 #define IO_TLB_DEFAULT_SIZE (64UL<<20)
 
@@ -89,6 +97,11 @@ extern enum swiotlb_force swiotlb_force;
  * @late_alloc:%true if allocated using the page allocator
  * @force_bounce: %true if swiotlb bouncing is forced
  * @for_alloc:  %true if the pool is used for memory allocation
+ * @num_child:  The number of child IO TLB mems in the pool.
+ * @child_nslot:The number of IO TLB slots in each child IO TLB mem.
+ * @child_nblock:The number of IO TLB blocks in each child IO TLB mem.
+ * @child_start:The child index to start searching in the next round.
+ * @block_index:The block index to start searching in the next round.
  */
 struct io_tlb_mem {
phys_addr_t start;
@@ -102,6 +115,16 @@ struct io_tlb_mem {
bool late_alloc;
bool force_bounce;
bool for_alloc;
+   unsigned int num_child;
+   unsigned int child_nslot;
+   unsigned int child_nblock;
+   unsigned int child_start;
+   unsigned int block_index;
+   struct io_tlb_mem *child;
+   struct io_tlb_mem *parent;
+   struct io_tlb_block {
+   unsigned int list;
+   } *block;
struct io_tlb_slot {
phys_addr_t orig_addr;
size_t alloc_size;
@@ -130,6 +153,10 @@ unsigned int swiotlb_max_segment(void);
 size_t swiotlb_max_mapping_size(struct device *dev);
 bool is_swiotlb_active(struct device *dev);
 void __init swiotlb_adjust_size(unsigned long size);
+int swiotlb_device_allocate(struct device *dev,
+   unsigned int area_num,
+   unsigned long size);
+void swiotlb_device_free(struct device *dev);
 #else
 static inline void swiotlb_init(bool addressing_limited, unsigned int flags)
 {
@@ -162,6 +189,17 @@ static inline bool is_swiotlb_active(struct device *dev)
 static inline void swiotlb_adjust_size(unsigned long size)
 {
 }
+
+static inline void swiotlb_device_free(struct device *dev)
+{
+}
+
+static inline int swiotlb_device_allocate(struct device *dev,
+					  unsigned int area_num,
+					  unsigned long size)
+{
+	return -ENOMEM;
+}
 #endif /* CONFIG_SWIOTLB */
 
 extern void swiotlb_print_info(void);
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index e2ef0864eb1e..7ca22a5a1886 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -195,7 +195,8 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
-   unsigned long bytes = nslabs << IO_TLB_SHIFT, i;
+   unsigned long bytes = nslabs << IO_TLB_SHIFT, i, j;
+   unsigned int block_num = nslabs / IO_TLB_BLOCKSIZE;
 
mem->nslabs = nslabs;
mem->start = start;
@@ -207,7 +208,36 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
mem->force_bounce = true;
 
spin_lock_init(&mem->lock);
-   for (i = 0; i < mem->nslabs; i++) {
+
+   if (mem->num_child) {
+   mem->child_nslot = nslabs / mem->num_child;
+   mem->child_nblock = block_num / mem->num_child;
+   mem->child_start = 0;
+
+   /*
+* Initialize child IO TLB mem, divide IO TLB pool
+* into child number. Reuse parent mem->slot in the
+* child mem->slot.
+*/
+   for (i = 0; i < mem->num_child; i++) {
+  

RE: [RFC PATCH V3 2/2] net: netvsc: Allocate per-device swiotlb bounce buffer for netvsc

2022-05-26 Thread Dexuan Cui via iommu
> From: Tianyu Lan 
> Sent: Thursday, May 26, 2022 5:01 AM
> ...
> @@ -119,6 +124,10 @@ static void netvsc_subchan_work(struct work_struct *w)
>   nvdev->max_chn = 1;
>   nvdev->num_chn = 1;
>   }
> +
> +	/* Allocate bounce buffer. */
> + swiotlb_device_allocate(&hdev->device, nvdev->num_chn,
> + 10 * IO_TLB_BLOCK_UNIT);
>   }

Looks like swiotlb_device_allocate() is not called if the netvsc device
has only 1 primary channel and no sub-channel, e.g. in the case of a
single-vCPU VM?


Re: [PATCH v2 2/4] dma-iommu: Add iommu_dma_opt_mapping_size()

2022-05-26 Thread Damien Le Moal via iommu
On 2022/05/26 19:28, John Garry wrote:
> Add the IOMMU callback for DMA mapping API dma_opt_mapping_size(), which
> allows the drivers to know the optimal mapping limit and thus limit the
> requested IOVA lengths.
> 
> This value is based on the IOVA rcache range limit, as IOVAs allocated
> above this limit must always be newly allocated, which may be quite slow.
> 
> Signed-off-by: John Garry 
> ---
>  drivers/iommu/dma-iommu.c | 6 ++
>  drivers/iommu/iova.c  | 5 +
>  include/linux/iova.h  | 2 ++
>  3 files changed, 13 insertions(+)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 09f6e1c0f9c0..f619e41b9172 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -1442,6 +1442,11 @@ static unsigned long iommu_dma_get_merge_boundary(struct device *dev)
>   return (1UL << __ffs(domain->pgsize_bitmap)) - 1;
>  }
>  
> +static size_t iommu_dma_opt_mapping_size(void)
> +{
> + return iova_rcache_range();
> +}
> +
>  static const struct dma_map_ops iommu_dma_ops = {
>   .alloc  = iommu_dma_alloc,
>   .free   = iommu_dma_free,
> @@ -1462,6 +1467,7 @@ static const struct dma_map_ops iommu_dma_ops = {
>   .map_resource   = iommu_dma_map_resource,
>   .unmap_resource = iommu_dma_unmap_resource,
>   .get_merge_boundary = iommu_dma_get_merge_boundary,
> + .opt_mapping_size   = iommu_dma_opt_mapping_size,
>  };
>  
>  /*
> diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
> index db77aa675145..9f00b58d546e 100644
> --- a/drivers/iommu/iova.c
> +++ b/drivers/iommu/iova.c
> @@ -26,6 +26,11 @@ static unsigned long iova_rcache_get(struct iova_domain *iovad,
>  static void free_cpu_cached_iovas(unsigned int cpu, struct iova_domain *iovad);
>  static void free_iova_rcaches(struct iova_domain *iovad);
>  
> +unsigned long iova_rcache_range(void)
> +{
> + return PAGE_SIZE << (IOVA_RANGE_CACHE_MAX_SIZE - 1);
> +}
> +
>  static int iova_cpuhp_dead(unsigned int cpu, struct hlist_node *node)
>  {
>   struct iova_domain *iovad;
> diff --git a/include/linux/iova.h b/include/linux/iova.h
> index 320a70e40233..c6ba6d95d79c 100644
> --- a/include/linux/iova.h
> +++ b/include/linux/iova.h
> @@ -79,6 +79,8 @@ static inline unsigned long iova_pfn(struct iova_domain *iovad, dma_addr_t iova)
>  int iova_cache_get(void);
>  void iova_cache_put(void);
>  
> +unsigned long iova_rcache_range(void);
> +
>  void free_iova(struct iova_domain *iovad, unsigned long pfn);
>  void __free_iova(struct iova_domain *iovad, struct iova *iova);
>  struct iova *alloc_iova(struct iova_domain *iovad, unsigned long size,

Reviewed-by: Damien Le Moal 

-- 
Damien Le Moal
Western Digital Research


Re: [PATCH v2 0/9] Add dynamic iommu backed bounce buffers

2022-05-26 Thread David Stevens
On Tue, May 24, 2022 at 9:27 PM Niklas Schnelle  wrote:
>
> On Fri, 2021-08-06 at 19:34 +0900, David Stevens wrote:
> > From: David Stevens 
> >
> > This patch series adds support for per-domain dynamic pools of iommu
> > bounce buffers to the dma-iommu API. This allows iommu mappings to be
> > reused while still maintaining strict iommu protection.
> >
> > This bounce buffer support is used to add a new config option that, when
> > enabled, causes all non-direct streaming mappings below a configurable
> > size to go through the bounce buffers. This serves as an optimization on
> > systems where manipulating iommu mappings is very expensive. For
> > example, virtio-iommu operations in a guest on a linux host require a
> > vmexit, involvement the VMM, and a VFIO syscall. For relatively small
> > DMA operations, memcpy can be significantly faster.
> >
> > As a performance comparison, on a device with an i5-10210U, I ran fio
> > with a VFIO passthrough NVMe drive and virtio-iommu with '--direct=1
> > --rw=read --ioengine=libaio --iodepth=64' and block sizes 4k, 16k, 64k,
> > and 128k. Test throughput increased by 2.8x, 4.7x, 3.6x, and 3.6x. Time
> > spent in iommu_dma_unmap_(page|sg) per GB processed decreased by 97%,
> > 94%, 90%, and 87%. Time spent in iommu_dma_map_(page|sg) decreased
> > by >99%, as bounce buffers don't require syncing here in the read case.
> > Running with multiple jobs doesn't serve as a useful performance
> > comparison because virtio-iommu and vfio_iommu_type1 both have big
> > locks that significantly limit mulithreaded DMA performance.
> >
> > These pooled bounce buffers are also used for subgranule mappings with
> > untrusted devices, replacing the single use bounce buffers used
> > currently. The biggest difference here is that the new implementation
> > maps a whole sglist using a single bounce buffer. The new implementation
> > does not support using bounce buffers for only some segments of the
> > sglist, so it may require more copying. However, the current
> > implementation requires per-segment iommu map/unmap operations for all
> > untrusted sglist mappings (fully aligned sglists included). On a
> > i5-10210U laptop with the internal NVMe drive made to appear untrusted,
> > fio --direct=1 --rw=read --ioengine=libaio --iodepth=64 --bs=64k showed
> > a statistically significant decrease in CPU load from 2.28% -> 2.17%
> > with the new iommu bounce buffer optimization enabled.
> >
> > Each domain's buffer pool is split into multiple power-of-2 size
> > classes. Each class allocates a fixed number of buffer slot metadata. A
> > large iova range is allocated, and each slot is assigned an iova from
> > the range. This allows the iova to be easily mapped back to the slot,
> > and allows the critical section of most pool operations to be constant
> > time. The one exception is finding a cached buffer to reuse. These are
> > only separated according to R/W permissions - the use of other
> > permissions such as IOMMU_PRIV may require a linear search through the
> > cache. However, these other permissions are rare and likely exhibit high
> > locality, so they should not be a bottleneck in practice.
> >
> > Since untrusted devices may require bounce buffers, each domain has a
> > fallback rbtree to manage single use buffers. This may be necessary if a
> > very large number of DMA operations are simultaneously in-flight, or for
> > very large individual DMA operations.
> >
> > This patch set does not use swiotlb. There are two primary ways in which
> > swiotlb isn't compatible with per-domain buffer pools. First, swiotlb
> > allocates buffers to be compatible with a single device, whereas
> > per-domain buffer pools don't handle that during buffer allocation as a
> > single buffer may end up being used by multiple devices. Second, swiotlb
> > allocation establishes the original to bounce buffer mapping, which
> > again doesn't work if buffers can be reused. Effectively the only code
> > that can be shared between the two use cases is allocating slots from
> > the swiotlb's memory. However, given that we're going to be allocating
> > memory for use with an iommu, allocating memory from a block of memory
> > explicitly set aside to deal with a lack of iommu seems kind of
> > contradictory. At best there might be a small performance improvement if
> > wiotlb allocation is faster than regular page allocation, but buffer
> > allocation isn't on the hot path anyway.
> >
> > Not using the swiotlb has the benefit that memory doesn't have to be
> > preallocated. Instead, bounce buffers consume memory only for in-flight
> > DMA transactions (ignoring temporarily cached buffers), which is the
> > smallest amount possible. This makes it easier to use bounce buffers as
> > an optimization on systems with large numbers of devices or in
> > situations where devices are unknown, since it is not necessary to try
> > to tune how much memory needs to be set aside to achieve good
> > performance without costing too much memory.

[PATCH 1/1] iommu/vt-d: Remove unused iovad from dmar_domain

2022-05-26 Thread Lu Baolu
Not used anywhere. Clean it up to avoid dead code.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 0f9df5a19ef7..a22adfbdf870 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -543,7 +543,6 @@ struct dmar_domain {
u8 set_pte_snp:1;
 
struct list_head devices;   /* all devices' list */
-   struct iova_domain iovad;   /* iova's that belong to this domain */
 
struct dma_pte  *pgd;   /* virtual address */
int gaw;/* max guest address width */
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 00/12] iommu/vt-d: Optimize the use of locks

2022-05-26 Thread Lu Baolu
Hi folks,

This series tries to optimize the use of two locks in the Intel IOMMU
driver:

- The intel_iommu::lock is used to protect the IOMMU resources shared by
  devices. They include the IOMMU root and context tables, the pasid
  tables and the domain IDs.
- The global device_domain_lock is used to protect the global and the
  per-domain device tracking lists.

The optimization includes:

- Remove the unnecessary global device tracking list;
- Remove unnecessary locking;
- Reduce the scope of each lock as much as possible, that is, take the
  lock only where necessary (see the sketch after this list);
- Transform the global lock into a per-domain lock to improve
  efficiency;
- Convert the spinlock into a mutex so that functions that cannot run
  under a spinlock can be called while the lock is held.
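
To illustrate the lock-scope item above, here is a minimal sketch of the pattern (the helper below is hypothetical; patch 06 applies the same idea to domain_attach_iommu() and domain_detach_iommu()):

/*
 * Before: every caller has to wrap the helper in iommu->lock, so the
 * critical section is as large as the caller makes it:
 *
 *	spin_lock(&iommu->lock);
 *	ret = domain_attach_iommu(domain, iommu);
 *	spin_unlock(&iommu->lock);
 *
 * After: the helper acquires the lock itself, only around the shared
 * resource (the domain ID bitmap in this sketch).
 */
static int domain_id_alloc(struct intel_iommu *iommu)
{
	unsigned long ndomains = cap_ndoms(iommu->cap);
	unsigned long num;

	spin_lock(&iommu->lock);
	num = find_first_zero_bit(iommu->domain_ids, ndomains);
	if (num < ndomains)
		set_bit(num, iommu->domain_ids);
	spin_unlock(&iommu->lock);

	return num < ndomains ? (int)num : -ENOSPC;
}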

This series is also available on github:
https://github.com/LuBaolu/intel-iommu/commits/intel-iommu-lock-optimization-v1

Your comments and suggestions are much appreciated.

Best regards,
baolu

Lu Baolu (12):
  iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs
  iommu/vt-d: Remove for_each_device_domain()
  iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()
  iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()
  iommu/vt-d: Unnecessary spinlock for root table alloc and free
  iommu/vt-d: Acquiring lock in domain ID allocation helpers
  iommu/vt-d: Acquiring lock in pasid manipulation helpers
  iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()
  iommu/vt-d: Check device list of domain in domain free path
  iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller
  iommu/vt-d: Use device_domain_lock accurately
  iommu/vt-d: Convert device_domain_lock into per-domain mutex

 drivers/iommu/intel/iommu.h   |   5 +-
 drivers/iommu/intel/debugfs.c |  26 ++--
 drivers/iommu/intel/iommu.c   | 271 +-
 drivers/iommu/intel/pasid.c   | 143 +-
 drivers/iommu/intel/svm.c |   5 +-
 5 files changed, 147 insertions(+), 303 deletions(-)

-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 01/12] iommu/vt-d: Use iommu_get_domain_for_dev() in debugfs

2022-05-26 Thread Lu Baolu
Retrieve the attached domain for a device through the generic interface
exposed by the iommu core. This also makes device_domain_lock static.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h   |  1 -
 drivers/iommu/intel/debugfs.c | 20 
 drivers/iommu/intel/iommu.c   |  2 +-
 3 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index a22adfbdf870..8a6d64d726c0 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -480,7 +480,6 @@ enum {
 #define VTD_FLAG_SVM_CAPABLE   (1 << 2)
 
 extern int intel_iommu_sm;
-extern spinlock_t device_domain_lock;
 
 #define sm_supported(iommu)(intel_iommu_sm && ecap_smts((iommu)->ecap))
 #define pasid_supported(iommu) (sm_supported(iommu) && \
diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index d927ef10641b..eea8727aa7bc 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -344,19 +344,21 @@ static void pgtable_walk_level(struct seq_file *m, struct 
dma_pte *pde,
 
 static int show_device_domain_translation(struct device *dev, void *data)
 {
-   struct device_domain_info *info = dev_iommu_priv_get(dev);
-   struct dmar_domain *domain = info->domain;
+   struct dmar_domain *dmar_domain;
+   struct iommu_domain *domain;
struct seq_file *m = data;
u64 path[6] = { 0 };
 
+   domain = iommu_get_domain_for_dev(dev);
if (!domain)
return 0;
 
+   dmar_domain = to_dmar_domain(domain);
seq_printf(m, "Device %s @0x%llx\n", dev_name(dev),
-  (u64)virt_to_phys(domain->pgd));
+  (u64)virt_to_phys(dmar_domain->pgd));
seq_puts(m, 
"IOVA_PFN\t\tPML5E\t\t\tPML4E\t\t\tPDPE\t\t\tPDE\t\t\tPTE\n");
 
-   pgtable_walk_level(m, domain->pgd, domain->agaw + 2, 0, path);
+   pgtable_walk_level(m, dmar_domain->pgd, dmar_domain->agaw + 2, 0, path);
seq_putc(m, '\n');
 
return 0;
@@ -364,15 +366,9 @@ static int show_device_domain_translation(struct device 
*dev, void *data)
 
 static int domain_translation_struct_show(struct seq_file *m, void *unused)
 {
-   unsigned long flags;
-   int ret;
 
-   spin_lock_irqsave(&device_domain_lock, flags);
-   ret = bus_for_each_dev(&pci_bus_type, NULL, m,
-  show_device_domain_translation);
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
-   return ret;
+   return bus_for_each_dev(&pci_bus_type, NULL, m,
+   show_device_domain_translation);
 }
 DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
 
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1af4b6562266..cacae8bdaa65 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -314,7 +314,7 @@ static int iommu_skip_te_disable;
 #define IDENTMAP_GFX   2
 #define IDENTMAP_AZALIA4
 
-DEFINE_SPINLOCK(device_domain_lock);
+static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
 /*
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 02/12] iommu/vt-d: Remove for_each_device_domain()

2022-05-26 Thread Lu Baolu
The per-device device_domain_info data could be retrieved from the
device itself. There's no need to search a global list.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h |  2 --
 drivers/iommu/intel/iommu.c | 25 -
 drivers/iommu/intel/pasid.c | 37 +++--
 3 files changed, 11 insertions(+), 53 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 8a6d64d726c0..2f4a5b9509c0 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -727,8 +727,6 @@ extern int dmar_ir_support(void);
 void *alloc_pgtable_page(int node);
 void free_pgtable_page(void *vaddr);
 struct intel_iommu *domain_get_iommu(struct dmar_domain *domain);
-int for_each_device_domain(int (*fn)(struct device_domain_info *info,
-void *data), void *data);
 void iommu_flush_write_buffer(struct intel_iommu *iommu);
 int intel_iommu_enable_pasid(struct intel_iommu *iommu, struct device *dev);
 struct intel_iommu *device_to_iommu(struct device *dev, u8 *bus, u8 *devfn);
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cacae8bdaa65..6549b09d7f32 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -316,31 +316,6 @@ static int iommu_skip_te_disable;
 
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
-
-/*
- * Iterate over elements in device_domain_list and call the specified
- * callback @fn against each element.
- */
-int for_each_device_domain(int (*fn)(struct device_domain_info *info,
-void *data), void *data)
-{
-   int ret = 0;
-   unsigned long flags;
-   struct device_domain_info *info;
-
-   spin_lock_irqsave(&device_domain_lock, flags);
-   list_for_each_entry(info, &device_domain_list, global) {
-   ret = fn(info, data);
-   if (ret) {
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-   return ret;
-   }
-   }
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
-   return 0;
-}
-
 const struct iommu_ops intel_iommu_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index b2ac5869286f..0627d6465f25 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -103,36 +103,20 @@ device_detach_pasid_table(struct device_domain_info *info,
 }
 
 struct pasid_table_opaque {
-   struct pasid_table  **pasid_table;
-   int segment;
-   int bus;
-   int devfn;
+   struct pasid_table  *pasid_table;
 };
 
-static int search_pasid_table(struct device_domain_info *info, void *opaque)
-{
-   struct pasid_table_opaque *data = opaque;
-
-   if (info->iommu->segment == data->segment &&
-   info->bus == data->bus &&
-   info->devfn == data->devfn &&
-   info->pasid_table) {
-   *data->pasid_table = info->pasid_table;
-   return 1;
-   }
-
-   return 0;
-}
-
 static int get_alias_pasid_table(struct pci_dev *pdev, u16 alias, void *opaque)
 {
struct pasid_table_opaque *data = opaque;
+   struct device_domain_info *info;
 
-   data->segment = pci_domain_nr(pdev->bus);
-   data->bus = PCI_BUS_NUM(alias);
-   data->devfn = alias & 0xff;
+   info = dev_iommu_priv_get(&pdev->dev);
+   if (!info || !info->pasid_table)
+   return 0;
 
-   return for_each_device_domain(&search_pasid_table, data);
+   data->pasid_table = info->pasid_table;
+   return 1;
 }
 
 /*
@@ -141,9 +125,9 @@ static int get_alias_pasid_table(struct pci_dev *pdev, u16 
alias, void *opaque)
  */
 int intel_pasid_alloc_table(struct device *dev)
 {
+   struct pasid_table_opaque data = { NULL };
struct device_domain_info *info;
struct pasid_table *pasid_table;
-   struct pasid_table_opaque data;
struct page *pages;
u32 max_pasid = 0;
int ret, order;
@@ -155,11 +139,12 @@ int intel_pasid_alloc_table(struct device *dev)
return -EINVAL;
 
/* DMA alias device already has a pasid table, use it: */
-   data.pasid_table = &pasid_table;
ret = pci_for_each_dma_alias(to_pci_dev(dev),
 &get_alias_pasid_table, &data);
-   if (ret)
+   if (ret) {
+   pasid_table = data.pasid_table;
goto attach_out;
+   }
 
pasid_table = kzalloc(sizeof(*pasid_table), GFP_KERNEL);
if (!pasid_table)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 03/12] iommu/vt-d: Remove clearing translation data in disable_dmar_iommu()

2022-05-26 Thread Lu Baolu
disable_dmar_iommu() is called when IOMMU initialization fails or
the IOMMU is hot-removed from the system. In both cases, there is no
need to clear the IOMMU translation data structures for devices.

On the initialization path, the device probing only happens after the
IOMMU is initialized successfully, hence there are no translation data
structures.

On the hot-remove path, there is no real use case where the IOMMU is
hot-removed while the devices it manages are still alive in the
system. The translation data structures were torn down during device
release, hence there's no need to repeat that in the IOMMU hot-remove
path either.

So, let's remove this unnecessary code.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 15 ---
 1 file changed, 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6549b09d7f32..25d4c5200526 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1715,24 +1715,9 @@ static int iommu_init_domains(struct intel_iommu *iommu)
 
 static void disable_dmar_iommu(struct intel_iommu *iommu)
 {
-   struct device_domain_info *info, *tmp;
-   unsigned long flags;
-
if (!iommu->domain_ids)
return;
 
-   spin_lock_irqsave(&device_domain_lock, flags);
-   list_for_each_entry_safe(info, tmp, &device_domain_list, global) {
-   if (info->iommu != iommu)
-   continue;
-
-   if (!info->dev || !info->domain)
-   continue;
-
-   __dmar_remove_one_dev_info(info);
-   }
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
if (iommu->gcmd & DMA_GCMD_TE)
iommu_disable_translation(iommu);
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 05/12] iommu/vt-d: Unnecessary spinlock for root table alloc and free

2022-05-26 Thread Lu Baolu
The IOMMU root table is allocated and freed in the IOMMU initialization
code in static boot or hot-plug paths. There's no need for a spinlock.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 18 +-
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index bbdd3417a1b1..2d5f02b85de8 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -809,14 +809,12 @@ static int device_context_mapped(struct intel_iommu 
*iommu, u8 bus, u8 devfn)
 
 static void free_context_table(struct intel_iommu *iommu)
 {
-   int i;
-   unsigned long flags;
struct context_entry *context;
+   int i;
+
+   if (!iommu->root_entry)
+   return;
 
-   spin_lock_irqsave(&iommu->lock, flags);
-   if (!iommu->root_entry) {
-   goto out;
-   }
for (i = 0; i < ROOT_ENTRY_NR; i++) {
context = iommu_context_addr(iommu, i, 0, 0);
if (context)
@@ -828,12 +826,10 @@ static void free_context_table(struct intel_iommu *iommu)
context = iommu_context_addr(iommu, i, 0x80, 0);
if (context)
free_pgtable_page(context);
-
}
+
free_pgtable_page(iommu->root_entry);
iommu->root_entry = NULL;
-out:
-   spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
 #ifdef CONFIG_DMAR_DEBUG
@@ -1232,7 +1228,6 @@ static void domain_unmap(struct dmar_domain *domain, 
unsigned long start_pfn,
 static int iommu_alloc_root_entry(struct intel_iommu *iommu)
 {
struct root_entry *root;
-   unsigned long flags;
 
root = (struct root_entry *)alloc_pgtable_page(iommu->node);
if (!root) {
@@ -1242,10 +1237,7 @@ static int iommu_alloc_root_entry(struct intel_iommu 
*iommu)
}
 
__iommu_flush_cache(iommu, root, ROOT_SIZE);
-
-   spin_lock_irqsave(&iommu->lock, flags);
iommu->root_entry = root;
-   spin_unlock_irqrestore(&iommu->lock, flags);
 
return 0;
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 04/12] iommu/vt-d: Use pci_get_domain_bus_and_slot() in pgtable_walk()

2022-05-26 Thread Lu Baolu
Use pci_get_domain_bus_and_slot() instead of searching the global list
to retrieve the PCI device pointer. This removes the global
device_domain_list as there are no consumers anymore.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h |  1 -
 drivers/iommu/intel/iommu.c | 33 ++---
 2 files changed, 6 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 2f4a5b9509c0..6724703d573b 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -609,7 +609,6 @@ struct intel_iommu {
 /* PCI domain-device relationship */
 struct device_domain_info {
struct list_head link;  /* link to domain siblings */
-   struct list_head global; /* link to global list */
struct list_head table; /* link to pasid table */
u32 segment;/* PCI segment number */
u8 bus; /* PCI bus number */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 25d4c5200526..bbdd3417a1b1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -131,8 +131,6 @@ static struct intel_iommu **g_iommus;
 
 static void __init check_tylersburg_isoch(void);
 static int rwbf_quirk;
-static inline struct device_domain_info *
-dmar_search_domain_by_dev_info(int segment, int bus, int devfn);
 
 /*
  * set to 1 to panic kernel if can't successfully enable VT-d
@@ -315,7 +313,6 @@ static int iommu_skip_te_disable;
 #define IDENTMAP_AZALIA4
 
 static DEFINE_SPINLOCK(device_domain_lock);
-static LIST_HEAD(device_domain_list);
 const struct iommu_ops intel_iommu_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
@@ -845,9 +842,14 @@ static void pgtable_walk(struct intel_iommu *iommu, 
unsigned long pfn, u8 bus, u
struct device_domain_info *info;
struct dma_pte *parent, *pte;
struct dmar_domain *domain;
+   struct pci_dev *pdev;
int offset, level;
 
-   info = dmar_search_domain_by_dev_info(iommu->segment, bus, devfn);
+   pdev = pci_get_domain_bus_and_slot(iommu->segment, bus, devfn);
+   if (!pdev)
+   return;
+
+   info = dev_iommu_priv_get(&pdev->dev);
if (!info || !info->domain) {
pr_info("device [%02x:%02x.%d] not probed\n",
bus, PCI_SLOT(devfn), PCI_FUNC(devfn));
@@ -2348,19 +2350,6 @@ static void domain_remove_dev_info(struct dmar_domain 
*domain)
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
-static inline struct device_domain_info *
-dmar_search_domain_by_dev_info(int segment, int bus, int devfn)
-{
-   struct device_domain_info *info;
-
-   list_for_each_entry(info, &device_domain_list, global)
-   if (info->segment == segment && info->bus == bus &&
-   info->devfn == devfn)
-   return info;
-
-   return NULL;
-}
-
 static int domain_setup_first_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev,
@@ -4564,7 +4553,6 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
struct pci_dev *pdev = dev_is_pci(dev) ? to_pci_dev(dev) : NULL;
struct device_domain_info *info;
struct intel_iommu *iommu;
-   unsigned long flags;
u8 bus, devfn;
 
iommu = device_to_iommu(dev, &bus, &devfn);
@@ -4607,10 +4595,7 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
}
}
 
-   spin_lock_irqsave(&device_domain_lock, flags);
-   list_add(&info->global, &device_domain_list);
dev_iommu_priv_set(dev, info);
-   spin_unlock_irqrestore(&device_domain_lock, flags);
 
return &iommu->iommu;
 }
@@ -4618,15 +4603,9 @@ static struct iommu_device 
*intel_iommu_probe_device(struct device *dev)
 static void intel_iommu_release_device(struct device *dev)
 {
struct device_domain_info *info = dev_iommu_priv_get(dev);
-   unsigned long flags;
 
dmar_remove_one_dev_info(dev);
-
-   spin_lock_irqsave(&device_domain_lock, flags);
dev_iommu_priv_set(dev, NULL);
-   list_del(&info->global);
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-
kfree(info);
set_dma_ops(dev, NULL);
 }
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 06/12] iommu/vt-d: Acquiring lock in domain ID allocation helpers

2022-05-26 Thread Lu Baolu
The iommu->lock is used to protect the per-IOMMU domain ID resource.
Move the spinlock acquisition/release into the helpers where domain
IDs are allocated and freed. The device_domain_lock is irrelevant to
domain ID resources; remove its assertion as well.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 25 +
 1 file changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2d5f02b85de8..0da937ce0534 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -1774,16 +1774,13 @@ static struct dmar_domain *alloc_domain(unsigned int 
type)
return domain;
 }
 
-/* Must be called with iommu->lock */
 static int domain_attach_iommu(struct dmar_domain *domain,
   struct intel_iommu *iommu)
 {
unsigned long ndomains;
-   int num;
-
-   assert_spin_locked(&device_domain_lock);
-   assert_spin_locked(&iommu->lock);
+   int num, ret = 0;
 
+   spin_lock(&iommu->lock);
domain->iommu_refcnt[iommu->seq_id] += 1;
if (domain->iommu_refcnt[iommu->seq_id] == 1) {
ndomains = cap_ndoms(iommu->cap);
@@ -1792,7 +1789,8 @@ static int domain_attach_iommu(struct dmar_domain *domain,
if (num >= ndomains) {
pr_err("%s: No free domain ids\n", iommu->name);
domain->iommu_refcnt[iommu->seq_id] -= 1;
-   return -ENOSPC;
+   ret = -ENOSPC;
+   goto out_unlock;
}
 
set_bit(num, iommu->domain_ids);
@@ -1801,7 +1799,9 @@ static int domain_attach_iommu(struct dmar_domain *domain,
domain_update_iommu_cap(domain);
}
 
-   return 0;
+out_unlock:
+   spin_unlock(&iommu->lock);
+   return ret;
 }
 
 static void domain_detach_iommu(struct dmar_domain *domain,
@@ -1809,9 +1809,7 @@ static void domain_detach_iommu(struct dmar_domain 
*domain,
 {
int num;
 
-   assert_spin_locked(&device_domain_lock);
-   assert_spin_locked(&iommu->lock);
-
+   spin_lock(&iommu->lock);
domain->iommu_refcnt[iommu->seq_id] -= 1;
if (domain->iommu_refcnt[iommu->seq_id] == 0) {
num = domain->iommu_did[iommu->seq_id];
@@ -1819,6 +1817,7 @@ static void domain_detach_iommu(struct dmar_domain 
*domain,
domain_update_iommu_cap(domain);
domain->iommu_did[iommu->seq_id] = 0;
}
+   spin_unlock(&iommu->lock);
 }
 
 static inline int guestwidth_to_adjustwidth(int gaw)
@@ -2471,9 +2470,7 @@ static int domain_add_dev_info(struct dmar_domain 
*domain, struct device *dev)
 
spin_lock_irqsave(&device_domain_lock, flags);
info->domain = domain;
-   spin_lock(&iommu->lock);
ret = domain_attach_iommu(domain, iommu);
-   spin_unlock(&iommu->lock);
if (ret) {
spin_unlock_irqrestore(&device_domain_lock, flags);
return ret;
@@ -4158,7 +4155,6 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
 {
struct dmar_domain *domain;
struct intel_iommu *iommu;
-   unsigned long flags;
 
assert_spin_locked(&device_domain_lock);
 
@@ -4179,10 +4175,7 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
}
 
list_del(&info->link);
-
-   spin_lock_irqsave(&iommu->lock, flags);
domain_detach_iommu(domain, iommu);
-   spin_unlock_irqrestore(&iommu->lock, flags);
 }
 
 static void dmar_remove_one_dev_info(struct device *dev)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 07/12] iommu/vt-d: Acquiring lock in pasid manipulation helpers

2022-05-26 Thread Lu Baolu
The iommu->lock is used to protect the per-IOMMU pasid directory table
and pasid table. Move the spinlock acquisition/release into the helpers
to make the code self-contained.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c |   2 -
 drivers/iommu/intel/pasid.c | 106 +++-
 drivers/iommu/intel/svm.c   |   5 +-
 3 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 0da937ce0534..ccf3c7fa26f1 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2488,7 +2488,6 @@ static int domain_add_dev_info(struct dmar_domain 
*domain, struct device *dev)
}
 
/* Setup the PASID entry for requests without PASID: */
-   spin_lock_irqsave(&iommu->lock, flags);
if (hw_pass_through && domain_type_is_si(domain))
ret = intel_pasid_setup_pass_through(iommu, domain,
dev, PASID_RID2PASID);
@@ -2498,7 +2497,6 @@ static int domain_add_dev_info(struct dmar_domain 
*domain, struct device *dev)
else
ret = intel_pasid_setup_second_level(iommu, domain,
dev, PASID_RID2PASID);
-   spin_unlock_irqrestore(&iommu->lock, flags);
if (ret) {
dev_err(dev, "Setup RID2PASID failed\n");
dmar_remove_one_dev_info(dev);
diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0627d6465f25..bab5c385fa1e 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -498,17 +498,17 @@ void intel_pasid_tear_down_entry(struct intel_iommu 
*iommu, struct device *dev,
struct pasid_entry *pte;
u16 did, pgtt;
 
+   spin_lock(&iommu->lock);
pte = intel_pasid_get_entry(dev, pasid);
-   if (WARN_ON(!pte))
-   return;
-
-   if (!pasid_pte_is_present(pte))
+   if (WARN_ON(!pte) || !pasid_pte_is_present(pte)) {
+   spin_unlock(&iommu->lock);
return;
+   }
 
did = pasid_get_domain_id(pte);
pgtt = pasid_pte_get_pgtt(pte);
-
intel_pasid_clear_entry(dev, pasid, fault_ignore);
+   spin_unlock(&iommu->lock);
 
if (!ecap_coherent(iommu->ecap))
clflush_cache_range(pte, sizeof(*pte));
@@ -544,21 +544,17 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
}
 }
 
-static inline int pasid_enable_wpe(struct pasid_entry *pte)
+static struct pasid_entry *get_non_present_pasid_entry(struct device *dev,
+  u32 pasid)
 {
-#ifdef CONFIG_X86
-   unsigned long cr0 = read_cr0();
+   struct pasid_entry *pte;
 
-   /* CR0.WP is normally set but just to be sure */
-   if (unlikely(!(cr0 & X86_CR0_WP))) {
-   pr_err_ratelimited("No CPU write protect!\n");
-   return -EINVAL;
-   }
-#endif
-   pasid_set_wpe(pte);
+   pte = intel_pasid_get_entry(dev, pasid);
+   if (!pte || pasid_pte_is_present(pte))
+   return NULL;
 
-   return 0;
-};
+   return pte;
+}
 
 /*
  * Set up the scalable mode pasid table entry for first only
@@ -576,39 +572,47 @@ int intel_pasid_setup_first_level(struct intel_iommu 
*iommu,
return -EINVAL;
}
 
-   pte = intel_pasid_get_entry(dev, pasid);
-   if (WARN_ON(!pte))
+   if ((flags & PASID_FLAG_SUPERVISOR_MODE)) {
+#ifdef CONFIG_X86
+   unsigned long cr0 = read_cr0();
+
+   /* CR0.WP is normally set but just to be sure */
+   if (unlikely(!(cr0 & X86_CR0_WP))) {
+   pr_err("No CPU write protect!\n");
+   return -EINVAL;
+   }
+#endif
+   if (!ecap_srs(iommu->ecap)) {
+   pr_err("No supervisor request support on %s\n",
+  iommu->name);
+   return -EINVAL;
+   }
+   }
+
+   if ((flags & PASID_FLAG_FL5LP) && !cap_5lp_support(iommu->cap)) {
+   pr_err("No 5-level paging support for first-level on %s\n",
+  iommu->name);
return -EINVAL;
+   }
 
-   /* Caller must ensure PASID entry is not in use. */
-   if (pasid_pte_is_present(pte))
-   return -EBUSY;
+   spin_lock(&iommu->lock);
+   pte = get_non_present_pasid_entry(dev, pasid);
+   if (!pte) {
+   spin_unlock(&iommu->lock);
+   return -ENODEV;
+   }
 
pasid_clear_entry(pte);
 
/* Setup the first level page table pointer: */
pasid_set_flptr(pte, (u64)__pa(pgd));
if (flags & PASID_FLAG_SUPERVISOR_MODE) {
-   if (!ecap_srs(iommu->ecap)) {
-   pr_err("No supervisor request support on %s\n",
-

[PATCH 08/12] iommu/vt-d: Replace spin_lock_irqsave() with spin_lock()

2022-05-26 Thread Lu Baolu
The iommu->lock is used to protect changes in root/context/pasid tables
and domain ID allocation. There's no use case to change these resources
in any interrupt context. Hence there's no need to disable interrupts
when holding the spinlock.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/debugfs.c |  6 ++
 drivers/iommu/intel/iommu.c   | 17 +++--
 2 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/debugfs.c b/drivers/iommu/intel/debugfs.c
index eea8727aa7bc..ca1adceeba19 100644
--- a/drivers/iommu/intel/debugfs.c
+++ b/drivers/iommu/intel/debugfs.c
@@ -263,10 +263,9 @@ static void ctx_tbl_walk(struct seq_file *m, struct 
intel_iommu *iommu, u16 bus)
 
 static void root_tbl_walk(struct seq_file *m, struct intel_iommu *iommu)
 {
-   unsigned long flags;
u16 bus;
 
-   spin_lock_irqsave(&iommu->lock, flags);
+   spin_lock(&iommu->lock);
seq_printf(m, "IOMMU %s: Root Table Address: 0x%llx\n", iommu->name,
   (u64)virt_to_phys(iommu->root_entry));
seq_puts(m, 
"B.D.F\tRoot_entry\t\t\t\tContext_entry\t\t\t\tPASID\tPASID_table_entry\n");
@@ -278,8 +277,7 @@ static void root_tbl_walk(struct seq_file *m, struct 
intel_iommu *iommu)
 */
for (bus = 0; bus < 256; bus++)
ctx_tbl_walk(m, iommu, bus);
-
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   spin_unlock(&iommu->lock);
 }
 
 static int dmar_translation_struct_show(struct seq_file *m, void *unused)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ccf3c7fa26f1..2e195a639502 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -797,13 +797,12 @@ static int device_context_mapped(struct intel_iommu 
*iommu, u8 bus, u8 devfn)
 {
struct context_entry *context;
int ret = 0;
-   unsigned long flags;
 
-   spin_lock_irqsave(&iommu->lock, flags);
+   spin_lock(&iommu->lock);
context = iommu_context_addr(iommu, bus, devfn, 0);
if (context)
ret = context_present(context);
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   spin_unlock(&iommu->lock);
return ret;
 }
 
@@ -2287,16 +2286,15 @@ static void domain_context_clear_one(struct 
device_domain_info *info, u8 bus, u8
 {
struct intel_iommu *iommu = info->iommu;
struct context_entry *context;
-   unsigned long flags;
u16 did_old;
 
if (!iommu)
return;
 
-   spin_lock_irqsave(&iommu->lock, flags);
+   spin_lock(&iommu->lock);
context = iommu_context_addr(iommu, bus, devfn, 0);
if (!context) {
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   spin_unlock(&iommu->lock);
return;
}
 
@@ -2311,7 +2309,7 @@ static void domain_context_clear_one(struct 
device_domain_info *info, u8 bus, u8
 
context_clear_entry(context);
__iommu_flush_cache(iommu, context, sizeof(*context));
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   spin_unlock(&iommu->lock);
iommu->flush.flush_context(iommu,
   did_old,
   (((u16)bus) << 8) | devfn,
@@ -2764,7 +2762,6 @@ static int copy_translation_tables(struct intel_iommu 
*iommu)
struct root_entry *old_rt;
phys_addr_t old_rt_phys;
int ctxt_table_entries;
-   unsigned long flags;
u64 rtaddr_reg;
int bus, ret;
bool new_ext, ext;
@@ -2807,7 +2804,7 @@ static int copy_translation_tables(struct intel_iommu 
*iommu)
}
}
 
-   spin_lock_irqsave(&iommu->lock, flags);
+   spin_lock(&iommu->lock);
 
/* Context tables are copied, now write them to the root_entry table */
for (bus = 0; bus < 256; bus++) {
@@ -2826,7 +2823,7 @@ static int copy_translation_tables(struct intel_iommu 
*iommu)
iommu->root_entry[bus].hi = val;
}
 
-   spin_unlock_irqrestore(&iommu->lock, flags);
+   spin_unlock(&iommu->lock);
 
kfree(ctxt_tbls);
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 09/12] iommu/vt-d: Check device list of domain in domain free path

2022-05-26 Thread Lu Baolu
When the IOMMU domain is about to be freed, it should not be set on any
device. Instead of silently dealing with such bug cases, it's better to
trigger a warning so that any potential bugs can be reported and fixed as
early as possible.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 17 ++---
 1 file changed, 2 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 2e195a639502..6f3119c68cd2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -294,7 +294,6 @@ static LIST_HEAD(dmar_satc_units);
 /* bitmap for indexing intel_iommus */
 static int g_num_of_iommus;
 
-static void domain_remove_dev_info(struct dmar_domain *domain);
 static void dmar_remove_one_dev_info(struct device *dev);
 static void __dmar_remove_one_dev_info(struct device_domain_info *info);
 
@@ -1835,9 +1834,8 @@ static inline int guestwidth_to_adjustwidth(int gaw)
 
 static void domain_exit(struct dmar_domain *domain)
 {
-
-   /* Remove associated devices and clear attached or cached domains */
-   domain_remove_dev_info(domain);
+   if (WARN_ON(!list_empty(&domain->devices)))
+   return;
 
if (domain->pgd) {
LIST_HEAD(freelist);
@@ -2328,17 +2326,6 @@ static void domain_context_clear_one(struct 
device_domain_info *info, u8 bus, u8
__iommu_flush_dev_iotlb(info, 0, MAX_AGAW_PFN_WIDTH);
 }
 
-static void domain_remove_dev_info(struct dmar_domain *domain)
-{
-   struct device_domain_info *info, *tmp;
-   unsigned long flags;
-
-   spin_lock_irqsave(&device_domain_lock, flags);
-   list_for_each_entry_safe(info, tmp, &domain->devices, link)
-   __dmar_remove_one_dev_info(info);
-   spin_unlock_irqrestore(&device_domain_lock, flags);
-}
-
 static int domain_setup_first_level(struct intel_iommu *iommu,
struct dmar_domain *domain,
struct device *dev,
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 10/12] iommu/vt-d: Fold __dmar_remove_one_dev_info() into its caller

2022-05-26 Thread Lu Baolu
Fold __dmar_remove_one_dev_info() into dmar_remove_one_dev_info(), which
is its only caller. Make the spinlock critical section cover only the
device list changes, and remove some unnecessary checks.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 34 +-
 1 file changed, 9 insertions(+), 25 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 6f3119c68cd2..d02ddd338afd 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -295,7 +295,6 @@ static LIST_HEAD(dmar_satc_units);
 static int g_num_of_iommus;
 
 static void dmar_remove_one_dev_info(struct device *dev);
-static void __dmar_remove_one_dev_info(struct device_domain_info *info);
 
 int dmar_disabled = !IS_ENABLED(CONFIG_INTEL_IOMMU_DEFAULT_ON);
 int intel_iommu_sm = IS_ENABLED(CONFIG_INTEL_IOMMU_SCALABLE_MODE_DEFAULT_ON);
@@ -4133,20 +4132,14 @@ static void domain_context_clear(struct 
device_domain_info *info)
   &domain_context_clear_one_cb, info);
 }
 
-static void __dmar_remove_one_dev_info(struct device_domain_info *info)
+static void dmar_remove_one_dev_info(struct device *dev)
 {
-   struct dmar_domain *domain;
-   struct intel_iommu *iommu;
-
-   assert_spin_locked(&device_domain_lock);
-
-   if (WARN_ON(!info))
-   return;
-
-   iommu = info->iommu;
-   domain = info->domain;
+   struct device_domain_info *info = dev_iommu_priv_get(dev);
+   struct dmar_domain *domain = info->domain;
+   struct intel_iommu *iommu = info->iommu;
+   unsigned long flags;
 
-   if (info->dev && !dev_is_real_dma_subdevice(info->dev)) {
+   if (!dev_is_real_dma_subdevice(info->dev)) {
if (dev_is_pci(info->dev) && sm_supported(iommu))
intel_pasid_tear_down_entry(iommu, info->dev,
PASID_RID2PASID, false);
@@ -4156,20 +4149,11 @@ static void __dmar_remove_one_dev_info(struct 
device_domain_info *info)
intel_pasid_free_table(info->dev);
}
 
-   list_del(&info->link);
-   domain_detach_iommu(domain, iommu);
-}
-
-static void dmar_remove_one_dev_info(struct device *dev)
-{
-   struct device_domain_info *info;
-   unsigned long flags;
-
spin_lock_irqsave(&device_domain_lock, flags);
-   info = dev_iommu_priv_get(dev);
-   if (info)
-   __dmar_remove_one_dev_info(info);
+   list_del(&info->link);
spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   domain_detach_iommu(domain, iommu);
 }
 
 static int md_domain_init(struct dmar_domain *domain, int guest_width)
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 11/12] iommu/vt-d: Use device_domain_lock accurately

2022-05-26 Thread Lu Baolu
The device_domain_lock is used to protect the device tracking list of
a domain. Remove unnecessary spin_lock/unlock() calls and narrow the
necessary ones to the list accesses.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.c | 68 +++--
 1 file changed, 27 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d02ddd338afd..f8aa8649dc6f 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -534,16 +534,10 @@ static int domain_update_device_node(struct dmar_domain 
*domain)
 {
struct device_domain_info *info;
int nid = NUMA_NO_NODE;
+   unsigned long flags;
 
-   assert_spin_locked(&device_domain_lock);
-
-   if (list_empty(&domain->devices))
-   return NUMA_NO_NODE;
-
+   spin_lock_irqsave(&device_domain_lock, flags);
list_for_each_entry(info, &domain->devices, link) {
-   if (!info->dev)
-   continue;
-
/*
 * There could possibly be multiple device numa nodes as devices
 * within the same domain may sit behind different IOMMUs. There
@@ -554,6 +548,7 @@ static int domain_update_device_node(struct dmar_domain 
*domain)
if (nid != NUMA_NO_NODE)
break;
}
+   spin_unlock_irqrestore(&device_domain_lock, flags);
 
return nid;
 }
@@ -1376,49 +1371,50 @@ static void __iommu_flush_iotlb(struct intel_iommu 
*iommu, u16 did,
 }
 
 static struct device_domain_info *
-iommu_support_dev_iotlb (struct dmar_domain *domain, struct intel_iommu *iommu,
-u8 bus, u8 devfn)
+iommu_support_dev_iotlb(struct dmar_domain *domain, struct intel_iommu *iommu,
+   u8 bus, u8 devfn)
 {
-   struct device_domain_info *info;
-
-   assert_spin_locked(&device_domain_lock);
+   struct device_domain_info *info = NULL, *tmp;
+   unsigned long flags;
 
if (!iommu->qi)
return NULL;
 
-   list_for_each_entry(info, &domain->devices, link)
-   if (info->iommu == iommu && info->bus == bus &&
-   info->devfn == devfn) {
-   if (info->ats_supported && info->dev)
-   return info;
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(tmp, &domain->devices, link) {
+   if (tmp->iommu == iommu && tmp->bus == bus &&
+   tmp->devfn == devfn) {
+   if (tmp->ats_supported)
+   info = tmp;
break;
}
+   }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
 
-   return NULL;
+   return info;
 }
 
 static void domain_update_iotlb(struct dmar_domain *domain)
 {
struct device_domain_info *info;
bool has_iotlb_device = false;
+   unsigned long flags;
 
-   assert_spin_locked(&device_domain_lock);
-
-   list_for_each_entry(info, &domain->devices, link)
+   spin_lock_irqsave(&device_domain_lock, flags);
+   list_for_each_entry(info, &domain->devices, link) {
if (info->ats_enabled) {
has_iotlb_device = true;
break;
}
-
+   }
domain->has_iotlb_device = has_iotlb_device;
+   spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
 static void iommu_enable_dev_iotlb(struct device_domain_info *info)
 {
struct pci_dev *pdev;
 
-   assert_spin_locked(&device_domain_lock);
-
if (!info || !dev_is_pci(info->dev))
return;
 
@@ -1464,8 +1460,6 @@ static void iommu_disable_dev_iotlb(struct 
device_domain_info *info)
 {
struct pci_dev *pdev;
 
-   assert_spin_locked(&device_domain_lock);
-
if (!dev_is_pci(info->dev))
return;
 
@@ -1900,11 +1894,11 @@ static int domain_context_mapping_one(struct 
dmar_domain *domain,
  struct pasid_table *table,
  u8 bus, u8 devfn)
 {
+   struct device_domain_info *info =
+   iommu_support_dev_iotlb(domain, iommu, bus, devfn);
u16 did = domain->iommu_did[iommu->seq_id];
int translation = CONTEXT_TT_MULTI_LEVEL;
-   struct device_domain_info *info = NULL;
struct context_entry *context;
-   unsigned long flags;
int ret;
 
WARN_ON(did == 0);
@@ -1917,7 +1911,6 @@ static int domain_context_mapping_one(struct dmar_domain 
*domain,
 
BUG_ON(!domain->pgd);
 
-   spin_lock_irqsave(&device_domain_lock, flags);
spin_lock(&iommu->lock);
 
ret = -ENOMEM;
@@ -1970,7 +1963,6 @@ static int domain_context_mapping_one(struct dmar_domain 
*domain,
 * Setup the Device-TLB enable bit and Page request
 * Enable bit:
 */
-   

[PATCH 12/12] iommu/vt-d: Convert device_domain_lock into per-domain mutex

2022-05-26 Thread Lu Baolu
Using a global device_domain_lock spinlock to protect per-domain device
tracking lists is inefficient, especially considering that this lock is
also needed in the hot paths.

On the other hand, in the iommu_unmap() path, the driver needs to iterate
over the device tracking list and flush the caches on the devices through
qi_submit_sync(), which unfortunately busy-waits with cpu_relax(). In
order to avoid holding a spinlock while cpu_relax() is called, this also
converts the spinlock into a mutex. This works because the device
tracking lists are never touched in any interrupt context.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel/iommu.h |  1 +
 drivers/iommu/intel/iommu.c | 45 +++--
 2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/intel/iommu.h b/drivers/iommu/intel/iommu.h
index 6724703d573b..9e572ddffc08 100644
--- a/drivers/iommu/intel/iommu.h
+++ b/drivers/iommu/intel/iommu.h
@@ -541,6 +541,7 @@ struct dmar_domain {
u8 force_snooping : 1;  /* Create IOPTEs with snoop control */
u8 set_pte_snp:1;
 
+   struct mutex mutex; /* Protect device tracking lists */
struct list_head devices;   /* all devices' list */
 
struct dma_pte  *pgd;   /* virtual address */
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index f8aa8649dc6f..1815a9d73426 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -310,7 +310,6 @@ static int iommu_skip_te_disable;
 #define IDENTMAP_GFX   2
 #define IDENTMAP_AZALIA4
 
-static DEFINE_SPINLOCK(device_domain_lock);
 const struct iommu_ops intel_iommu_ops;
 
 static bool translation_pre_enabled(struct intel_iommu *iommu)
@@ -534,9 +533,8 @@ static int domain_update_device_node(struct dmar_domain 
*domain)
 {
struct device_domain_info *info;
int nid = NUMA_NO_NODE;
-   unsigned long flags;
 
-   spin_lock_irqsave(&device_domain_lock, flags);
+   mutex_lock(&domain->mutex);
list_for_each_entry(info, &domain->devices, link) {
/*
 * There could possibly be multiple device numa nodes as devices
@@ -548,7 +546,7 @@ static int domain_update_device_node(struct dmar_domain 
*domain)
if (nid != NUMA_NO_NODE)
break;
}
-   spin_unlock_irqrestore(&device_domain_lock, flags);
+   mutex_unlock(&domain->mutex);
 
return nid;
 }
@@ -1375,12 +1373,11 @@ iommu_support_dev_iotlb(struct dmar_domain *domain, 
struct intel_iommu *iommu,
u8 bus, u8 devfn)
 {
struct device_domain_info *info = NULL, *tmp;
-   unsigned long flags;
 
if (!iommu->qi)
return NULL;
 
-   spin_lock_irqsave(&device_domain_lock, flags);
+   mutex_lock(&domain->mutex);
list_for_each_entry(tmp, &domain->devices, link) {
if (tmp->iommu == iommu && tmp->bus == bus &&
tmp->devfn == devfn) {
@@ -1389,7 +1386,7 @@ iommu_support_dev_iotlb(struct dmar_domain *domain, 
struct intel_iommu *iommu,
break;
}
}
-   spin_unlock_irqrestore(&device_domain_lock, flags);
+   mutex_unlock(&domain->mutex);
 
return info;
 }
@@ -1398,9 +1395,8 @@ static void domain_update_iotlb(struct dmar_domain 
*domain)
 {
struct device_domain_info *info;
bool has_iotlb_device = false;
-   unsigned long flags;
 
-   spin_lock_irqsave(&device_domain_lock, flags);
+   mutex_lock(&domain->mutex);
list_for_each_entry(info, &domain->devices, link) {
if (info->ats_enabled) {
has_iotlb_device = true;
@@ -1408,7 +1404,7 @@ static void domain_update_iotlb(struct dmar_domain 
*domain)
}
}
domain->has_iotlb_device = has_iotlb_device;
-   spin_unlock_irqrestore(&device_domain_lock, flags);
+   mutex_unlock(&domain->mutex);
 }
 
 static void iommu_enable_dev_iotlb(struct device_domain_info *info)
@@ -1499,17 +1495,15 @@ static void __iommu_flush_dev_iotlb(struct 
device_domain_info *info,
 static void iommu_flush_dev_iotlb(struct dmar_domain *domain,
  u64 addr, unsigned mask)
 {
-   unsigned long flags;
struct device_domain_info *info;
 
if (!domain->has_iotlb_device)
return;
 
-   spin_lock_irqsave(&device_domain_lock, flags);
+   mutex_lock(&domain->mutex);
list_for_each_entry(info, &domain->devices, link)
__iommu_flush_dev_iotlb(info, addr, mask);
-
-   spin_unlock_irqrestore(&device_domain_lock, flags);
+   mutex_unlock(&domain->mutex);
 }
 
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
@@ -1761,6 +1755,7 @@ static struct dmar_domain *alloc_domain(unsigned int type)
domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->h