[PATCH v2 3/3] mmc: sdhci-msm: Request non-strict IOMMU mode
ile booted from USB):

  echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/mmcblk1 of=/dev/null bs=4M count=512

I attempted to run my tests for enough iterations that results stabilized and weren't too noisy. Tests were run with patches picked to the chromeos-5.4 tree (sanity checked against v5.13-rc7). I also attempted to compare to other attempts to address IOMMU problems and/or attempts to bump the cpufreq up to solve this problem:

- eMMC datasheet spec: 300 MB/s "Typical Sequential Performance"
  NOTE: we're driving the bus at 192 MHz instead of 200 MHz so we might not be able to achieve the full 300 MB/s.
- Baseline: 210.9 MB/s
- Baseline + peg cpufreq to max: 284.3 MB/s
- This patch: 279.6 MB/s
- This patch + peg cpufreq to max: 288.1 MB/s
- Joel's IO Wait fix [1]: 258.4 MB/s
- Joel's IO Wait fix [1] + peg cpufreq to max: 287.8 MB/s
- TLBIVA patches [2] + [3]: 214.7 MB/s
- TLBIVA patches [2] + [3] + peg cpufreq to max: 285.7 MB/s
- This patch plus Joel's [1]: 280.2 MB/s
- This patch plus Joel's [1] + peg...: 279.0 MB/s
  NOTE: I suspect something in the system was thermal throttling since there's a heat wave right now.

I also spent a little bit of time trying to see if I could get the IOMMU flush for MMC out of the critical path but was unable to figure out how to do this and get good performance.

Overall I'd say that the performance results above show:
* It's really not straightforward to point at "one thing" that is making our eMMC performance bad.
* It's certainly possible to get pretty good eMMC performance even without this patch.
* This patch makes it much easier to get good eMMC performance.
* No other solutions that I found resulted in quite as good eMMC performance as having this patch.

Given all the above (security safety concerns are minimal and it's a nice performance win), I'm proposing that running SDHCI on Qualcomm SoCs in non-strict mode is the right thing to do until such point in time as someone can come up with a better solution to get good SD/eMMC performance without it.

Now that we've decided we want the SD/MMC controller in non-strict mode, we need to figure out how to make it happen. We will take advantage of the fact that on Qualcomm IOMMUs we know that SD/MMC controllers are in a domain by themselves and hook in when initting the domain context. In response to a previous version of this series there had been discussion [4] of having this driven from a device tree property and this solution doesn't preclude that but is a good jumping off point.

NOTES:
* It's likely that arguments similar to the above can be made for other SDHCI controllers. However, given that this is something that can have an impact on security it feels like we want each SDHCI controller to opt-in. I believe it is conceivable, for instance, that some SDHCI controllers might have loadable or updatable firmware.
* It's also likely other peripherals will want this to get the quick performance win. That also should be fine, though anyone landing a similar patch should be very careful that it is low risk for all users of a given peripheral.
* Conceivably if even this patch is considered too "high risk", we could limit this to just non-removable cards (like eMMC) by just checking the device tree. This is one nice advantage of using the pre_probe() to set this.

[1] https://lore.kernel.org/r/20210618040639.3113489-1-j...@joelfernandes.org
[2] https://lore.kernel.org/r/1623850736-389584-1-git-send-email-quic_c_gdj...@quicinc.com/
[3] https://lore.kernel.org/r/cover.1623981933.git.saiprakash.ran...@codeaurora.org/
[4] https://lore.kernel.org/r/20210621235248.2521620-1-diand...@chromium.org

Signed-off-by: Douglas Anderson
---

Changes in v2:
- Now accomplish the goal by putting rules in the IOMMU driver.
- Reworded commit message to clarify things pointed out by Greg.

 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
index 98b3a1c2a181..bd66376d21ce 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c
@@ -172,6 +172,24 @@ static const struct of_device_id qcom_smmu_client_of_match[] __maybe_unused = {
 	{ }
 };

+static const struct of_device_id qcom_smmu_nonstrict_of_match[] __maybe_unused = {
+	{ .compatible = "qcom,sdhci-msm-v4" },
+	{ .compatible = "qcom,sdhci-msm-v5" },
+	{ }
+};
+
+static int qcom_smmu_init_context(struct arm_smmu_domain *smmu_domain,
+		struct io_pgtable_cfg *pgtbl_cfg, struct device *dev)
+{
+	const struct of_device_id *match =
+		of_match_device(qcom_smmu_nonstrict_of_match, dev);
+
+	if (match)
+		smmu_domain->domain.strictness = IOMMU_NOT_STRICT;
+
+	return 0;
+}
+
 static int qcom_smmu_cfg_probe(struct a
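
The opt-in above amounts to a lookup of the device's compatible string in a small allow-list. A minimal userspace sketch of that idea follows, assuming a single compatible string per device (the real of_match_device() walks the device tree node and handles more cases). The names compat_entry, nonstrict_compat and wants_non_strict are hypothetical and are not part of the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct of_device_id. */
struct compat_entry {
	const char *compatible;
};

/* Same two strings as the patch; a NULL entry ends the table. */
static const struct compat_entry nonstrict_compat[] = {
	{ .compatible = "qcom,sdhci-msm-v4" },
	{ .compatible = "qcom,sdhci-msm-v5" },
	{ .compatible = NULL },
};

/* Returns true if the compatible string is on the opt-in list. */
bool wants_non_strict(const char *dev_compatible)
{
	const struct compat_entry *e;

	if (!dev_compatible)
		return false;

	for (e = nonstrict_compat; e->compatible; e++) {
		if (strcmp(e->compatible, dev_compatible) == 0)
			return true;
	}

	return false;
}
```

Any device whose string is not listed keeps whatever strictness the rest of the system decides, which is the "each controller opts in" property argued for in the commit message.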
[PATCH v2 2/3] iommu/arm-smmu: Check for strictness after calling impl->init_context()
Implementations should be able to affect the strictness so reorder a little bit so we call them before we look at the strictness.

Signed-off-by: Douglas Anderson
---

Changes in v2:
- Patch moving check for strictness in arm-smmu new for v2.

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 6f72c4d208ca..659d3fddffa5 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -761,15 +761,15 @@ static int arm_smmu_init_domain_context(struct iommu_domain *domain,
 		.iommu_dev = smmu->dev,
 	};

-	if (!iommu_get_dma_strict(domain))
-		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
-
 	if (smmu->impl && smmu->impl->init_context) {
 		ret = smmu->impl->init_context(smmu_domain, &pgtbl_cfg, dev);
 		if (ret)
 			goto out_clear_smmu;
 	}

+	if (!iommu_get_dma_strict(domain))
+		pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_NON_STRICT;
+
 	if (smmu_domain->pgtbl_quirks)
 		pgtbl_cfg.quirks |= smmu_domain->pgtbl_quirks;

-- 
2.32.0.93.g670b81a890-goog

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
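
Why the order matters can be shown with a small userspace model. This is only a sketch: every model_* name below is hypothetical, and the hook simply stands in for an impl->init_context() that asks for non-strict mode.

```c
#include <stddef.h>

/* Hypothetical model types; none of these names exist in the kernel. */
enum model_strictness { MODEL_DEFAULT, MODEL_NOT_STRICT };

#define MODEL_QUIRK_NON_STRICT (1u << 0)

struct model_domain {
	enum model_strictness strictness;
};

typedef void (*model_hook)(struct model_domain *d);

/* Stands in for an impl->init_context() that opts in to non-strict. */
void model_qcom_hook(struct model_domain *d)
{
	d->strictness = MODEL_NOT_STRICT;
}

static int model_is_strict(const struct model_domain *d)
{
	return d->strictness != MODEL_NOT_STRICT;
}

/* Old order: strictness is sampled before the hook runs. */
unsigned int model_init_old(model_hook hook)
{
	struct model_domain d = { MODEL_DEFAULT };
	unsigned int quirks = 0;

	if (!model_is_strict(&d))
		quirks |= MODEL_QUIRK_NON_STRICT;
	if (hook)
		hook(&d);

	return quirks;
}

/* New order: the hook runs first, so its choice is seen. */
unsigned int model_init_new(model_hook hook)
{
	struct model_domain d = { MODEL_DEFAULT };
	unsigned int quirks = 0;

	if (hook)
		hook(&d);
	if (!model_is_strict(&d))
		quirks |= MODEL_QUIRK_NON_STRICT;

	return quirks;
}
```

With the old order the hook's request is silently lost (no quirk bit is set); with the new order the quirk bit reflects what the implementation asked for.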
[PATCH v2 1/3] iommu: Add per-domain strictness and combine with the global default
Strictness has the semantic of being a per-domain property. This is why iommu_get_dma_strict() takes a "struct iommu_domain" as a parameter. Let's add knowledge to the "struct iommu_domain" so we can know whether we'd like each domain to be strict.

In this patch nothing sets the per-domain strictness, it just paves the way for future patches to do so.

Prior to this patch we could only affect strictness at a global level. We'll still honor the global strictness level if it has been explicitly set and it's stricter than the one requested per-domain.

NOTE: it's even more obvious that iommu_set_dma_strict() and iommu_get_dma_strict() are non-symmetric after this change. However, they have always been asymmetric by design [0].

The function iommu_get_dma_strict() should now make it super obvious where strictness comes from and who overrides who. Though the function changed a bunch to make the logic clearer, the only two new rules should be:
* Devices can force strictness for themselves, overriding the cmdline "iommu.strict=0" or a call to iommu_set_dma_strict(false).
* Devices can request non-strictness for themselves, assuming there was no cmdline "iommu.strict=1" or a call to iommu_set_dma_strict(true).

[0] https://lore.kernel.org/r/a023af85-5060-0a3c-4648-b00f8b8c0...@arm.com/

Signed-off-by: Douglas Anderson
---
This patch clearly will cause conflicts if John Garry's patches [1] land before it. It shouldn't be too hard to rebase, though. Essentially with John's patches it'll be impossible for what's called `cmdline_dma_strict` in my patch to be "default". It'll probably make sense to rearrange the logic/names a bit though just to make things clearer.

[1] https://lore.kernel.org/r/1624016058-189713-1-git-send-email-john.ga...@huawei.com/

Changes in v2:
- No longer based on changes adding strictness to "struct device"
- Updated kernel-parameters docs.

 .../admin-guide/kernel-parameters.txt | 5 ++-
 drivers/iommu/iommu.c | 43 +++
 include/linux/iommu.h | 7 +++
 3 files changed, 45 insertions(+), 10 deletions(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index cb89dbdedc46..7675fd79f9a9 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1995,9 +1995,12 @@
 			  throughput at the cost of reduced device isolation.
 			  Will fall back to strict mode if not supported by
 			  the relevant IOMMU driver.
-			1 - Strict mode (default).
+			1 - Strict mode.
 			  DMA unmap operations invalidate IOMMU hardware TLBs
 			  synchronously.
+			NOTE: if "iommu.strict" is not specified in the command
+			line then it's up to the system to try to determine the
+			proper strictness.

 	iommu.passthrough=
 			[ARM64, X86] Configure DMA to bypass the IOMMU by default.
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..7943d2105b2f 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -29,7 +29,8 @@ static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);

 static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static enum iommu_strictness cmdline_dma_strict __read_mostly;
+static enum iommu_strictness driver_dma_strict __read_mostly;
 static u32 iommu_cmd_line __read_mostly;

 struct iommu_group {
@@ -69,7 +70,6 @@ static const char * const iommu_group_resv_type_string[] = {
 };

 #define IOMMU_CMD_LINE_DMA_API		BIT(0)
-#define IOMMU_CMD_LINE_STRICT		BIT(1)

 static int iommu_alloc_default_domain(struct iommu_group *group,
 				      struct device *dev);
@@ -334,27 +334,52 @@ static int __init iommu_set_def_domain_type(char *str)
 }
 early_param("iommu.passthrough", iommu_set_def_domain_type);

+static inline enum iommu_strictness bool_to_strictness(bool strict)
+{
+	return strict ? IOMMU_STRICT : IOMMU_NOT_STRICT;
+}
+
 static int __init iommu_dma_setup(char *str)
 {
-	int ret = kstrtobool(str, &iommu_dma_strict);
+	bool strict;
+	int ret = kstrtobool(str, &strict);

 	if (!ret)
-		iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
+		cmdline_dma_strict = bool_to_strictness(strict);
 	return ret;
 }
 early_param("iommu.strict", iommu_dma_setup);

 void iommu_set_dma_strict(bool strict)
 {
-	if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-		iommu_dma_strict = strict;
+	/*
+	 * Valid transitions:
+	 * - DEFAULT -> NON_STRICT
+	 * - DEF
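
The precedence rules stated in the commit message can be modelled in a few lines of userspace C. This is one plausible reading of those rules, not the code from the patch (the patched iommu_get_dma_strict() is cut off above). The model_* names are hypothetical, and the sketch assumes that "default" means nobody expressed a preference and that non-DMA domains are always strict.

```c
#include <stdbool.h>

/* Hypothetical tri-state; "default" means no preference was expressed. */
enum model_strictness {
	MODEL_DEFAULT = 0,
	MODEL_NOT_STRICT,
	MODEL_STRICT,
};

/* Combines the cmdline, driver-wide and per-domain preferences. */
bool model_get_dma_strict(bool is_dma_domain,
			  enum model_strictness cmdline,
			  enum model_strictness driver,
			  enum model_strictness domain)
{
	/* Non-DMA domains, or any explicit request for strict, win. */
	if (!is_dma_domain || cmdline == MODEL_STRICT ||
	    driver == MODEL_STRICT || domain == MODEL_STRICT)
		return true;

	/* With nobody forcing strict, any non-strict request is honored. */
	if (cmdline == MODEL_NOT_STRICT || driver == MODEL_NOT_STRICT ||
	    domain == MODEL_NOT_STRICT)
		return false;

	/* Nobody said anything: strict by default. */
	return true;
}
```

The asymmetry is deliberate in this reading: a single request for strict from any source is enough, while non-strict is only granted when no source objects.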
[PATCH v2 0/3] iommu: Enable non-strict DMA on QCom SD/MMC
The goal of this patch series is to get better SD/MMC performance on Qualcomm eMMC controllers and in general nudge us forward on the path of allowing some devices to be in strict mode and others to be in non-strict mode.

This patch series doesn't save the world but hopefully at least moves us in the right direction while accomplishing something useful. Specifically:
- No attempt is made to touch the PCI subsystem or cleanup the way that it requests strict vs. non-strict.
- No attempt is made to come up with a fully generic mechanism that makes it super easy for everyone to be in non-strict mode.

This patch series conflicts with a few other patch series that are in flight. I've tried to call them out "after the cut" in patches. I assume other in flight patches will land before this one, so I'd expect to send a rebased version when that happens, assuming that this series isn't NAKed into the ground.

Changes in v2:
- No longer based on changes adding strictness to "struct device"
- Updated kernel-parameters docs.
- Patch moving check for strictness in arm-smmu new for v2.
- Now accomplish the goal by putting rules in the IOMMU driver.
- Reworded commit message to clarify things pointed out by Greg.

Douglas Anderson (3):
  iommu: Add per-domain strictness and combine with the global default
  iommu/arm-smmu: Check for strictness after calling impl->init_context()
  mmc: sdhci-msm: Request non-strict IOMMU mode

 .../admin-guide/kernel-parameters.txt | 5 ++-
 drivers/iommu/arm/arm-smmu/arm-smmu-qcom.c | 19
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +--
 drivers/iommu/iommu.c | 43 +++
 include/linux/iommu.h | 7 +++
 5 files changed, 67 insertions(+), 13 deletions(-)

-- 
2.32.0.93.g670b81a890-goog
[PATCH 6/6] mmc: sdhci-msm: Request non-strict IOMMU mode
line: 210.9 MB/s
- Baseline + peg cpufreq to max: 284.3 MB/s
- This patch: 279.6 MB/s
- This patch + peg cpufreq to max: 288.1 MB/s
- Joel's IO Wait fix [1]: 258.4 MB/s
- Joel's IO Wait fix [1] + peg cpufreq to max: 287.8 MB/s
- TLBIVA patches [2] + [3]: 214.7 MB/s
- TLBIVA patches [2] + [3] + peg cpufreq to max: 285.7 MB/s
- This patch plus Joel's [1]: 280.2 MB/s
- This patch plus Joel's [1] + peg...: 279.0 MB/s
  NOTE: I suspect something in the system was thermal throttling since there's a heat wave right now.

I also spent a little bit of time trying to see if I could get the IOMMU flush for MMC out of the critical path but was unable to figure out how to do this and get good performance.

Overall I'd say that the performance results above show:
* It's really not straightforward to point at "one thing" that is making our eMMC performance bad.
* It's certainly possible to get pretty good eMMC performance even without this patch.
* This patch makes it much easier to get good eMMC performance.
* No other solutions that I found resulted in quite as good eMMC performance as having this patch.

Given all the above (security safety concerns are minimal and it's a nice performance win), I'm proposing that running SDHCI on Qualcomm SoCs in non-strict mode is the right thing to do until such point in time as someone can come up with a better solution to get good SD/eMMC performance without it.

NOTES:
* It's likely that arguments similar to the above can be made for other SDHCI controllers. However, given that this is something that can have an impact on security it feels like we want each SDHCI controller to opt-in. I believe it is conceivable, for instance, that some SDHCI controllers might have loadable or updatable firmware.
* It's also likely other peripherals will want this to get the quick performance win. That also should be fine, though anyone landing a similar patch should be very careful that it is low risk for all users of a given peripheral.
* Conceivably if even this patch is considered too "high risk", we could limit this to just non-removable cards (like eMMC) by just checking the device tree. This is one nice advantage of using the pre_probe() to set this.

[1] https://lore.kernel.org/r/20210618040639.3113489-1-j...@joelfernandes.org
[2] https://lore.kernel.org/r/1623850736-389584-1-git-send-email-quic_c_gdj...@quicinc.com/
[3] https://lore.kernel.org/r/cover.1623981933.git.saiprakash.ran...@codeaurora.org/

Signed-off-by: Douglas Anderson
---

 drivers/mmc/host/sdhci-msm.c | 8
 1 file changed, 8 insertions(+)

diff --git a/drivers/mmc/host/sdhci-msm.c b/drivers/mmc/host/sdhci-msm.c
index e44b7a66b73c..33ef5e6941d7 100644
--- a/drivers/mmc/host/sdhci-msm.c
+++ b/drivers/mmc/host/sdhci-msm.c
@@ -2465,6 +2465,13 @@ static inline void sdhci_msm_get_of_property(struct platform_device *pdev,
 }


+static int sdhci_msm_pre_probe(struct device *dev)
+{
+	dev->request_non_strict_iommu = true;
+
+	return 0;
+}
+
 static int sdhci_msm_probe(struct platform_device *pdev)
 {
 	struct sdhci_host *host;
@@ -2811,6 +2818,7 @@ static struct platform_driver sdhci_msm_driver = {
 		   .of_match_table = sdhci_msm_dt_match,
 		   .pm = &sdhci_msm_pm_ops,
 		   .probe_type = PROBE_PREFER_ASYNCHRONOUS,
+		   .pre_probe = sdhci_msm_pre_probe,
 	},
 };

-- 
2.32.0.288.g62a8d224e6-goog
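
To put the throughput figures quoted in this commit message on a common scale, the gain over the 210.9 MB/s baseline can be computed directly. The helper below is only an illustrative sketch (percent_gain is a hypothetical name). It gives roughly 32.6% for this patch alone (279.6 MB/s), about 22.5% for Joel's IO Wait fix (258.4 MB/s), about 1.8% for the TLBIVA patches (214.7 MB/s) and about 34.8% for pegging cpufreq to max on the baseline (284.3 MB/s).

```c
/* Percentage change of `measured` relative to `baseline`. */
double percent_gain(double baseline, double measured)
{
	return (measured - baseline) / baseline * 100.0;
}
```

For example, percent_gain(210.9, 279.6) evaluates (279.6 - 210.9) / 210.9 * 100, which is where the 32.6% figure comes from.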
[PATCH 5/6] iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
We now have a way for PCIe devices to force iommu.strict through the "struct device" and that's now hooked up. Let's remove the special case for PCIe devices.

NOTE: there are still other places in this file that make decisions based on the PCIe "untrusted" status. This patch only handles removing the one related to iommu.strict. Removing the other cases is left as an exercise to the reader.

Signed-off-by: Douglas Anderson
---

 drivers/iommu/dma-iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 7bcdd1205535..e50c06ce1a6b 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -368,7 +368,7 @@ static int iommu_dma_init_domain(struct iommu_domain *domain, dma_addr_t base,

 	init_iova_domain(iovad, 1UL << order, base_pfn);

-	if (!cookie->fq_domain && (!dev || !dev_is_untrusted(dev)) &&
+	if (!cookie->fq_domain &&
 	    domain->ops->flush_iotlb_all && !iommu_get_dma_strict(domain)) {
 		if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
 					  iommu_dma_entry_dtor))
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH 4/6] iommu: Combine device strictness requests with the global default
In the patch ("drivers: base: Add bits to struct device to control iommu strictness") we add the ability for devices to tell us about their IOMMU strictness requirements. Let's now take that into account in the IOMMU layer.

A few notes here:
* Presumably this is always how iommu_get_dma_strict() was intended to behave. Had this not been the intention then it never would have taken a domain as a parameter.
* The iommu_set_dma_strict() feels awfully non-symmetric now. That function sets the _default_ strictness globally in the system whereas iommu_get_dma_strict() returns the value for a given domain (falling back to the default). Presumably, at least, the fact that iommu_set_dma_strict() doesn't take a domain makes this obvious.

The function iommu_get_dma_strict() should now make it super obvious where strictness comes from and who overrides who. Though the function changed a bunch to make the logic clearer, the only two new rules should be:
* Devices can force strictness for themselves, overriding the cmdline "iommu.strict=0" or a call to iommu_set_dma_strict(false).
* Devices can request non-strictness for themselves, assuming there was no cmdline "iommu.strict=1" or a call to iommu_set_dma_strict(true).

Signed-off-by: Douglas Anderson
---

 drivers/iommu/iommu.c | 56 +--
 include/linux/iommu.h | 2 ++
 2 files changed, 45 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 808ab70d5df5..0c84a4c06110 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -28,8 +28,19 @@
 static struct kset *iommu_group_kset;
 static DEFINE_IDA(iommu_group_ida);

+enum iommu_strictness {
+	IOMMU_DEFAULT_STRICTNESS = -1,
+	IOMMU_NOT_STRICT = 0,
+	IOMMU_STRICT = 1,
+};
+static inline enum iommu_strictness bool_to_strictness(bool strictness)
+{
+	return (enum iommu_strictness)strictness;
+}
+
 static unsigned int iommu_def_domain_type __read_mostly;
-static bool iommu_dma_strict __read_mostly = true;
+static enum iommu_strictness cmdline_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
+static enum iommu_strictness driver_dma_strict __read_mostly = IOMMU_DEFAULT_STRICTNESS;
 static u32 iommu_cmd_line __read_mostly;

 struct iommu_group {
@@ -69,7 +80,6 @@ static const char * const iommu_group_resv_type_string[] = {
 };

 #define IOMMU_CMD_LINE_DMA_API		BIT(0)
-#define IOMMU_CMD_LINE_STRICT		BIT(1)

 static int iommu_alloc_default_domain(struct iommu_group *group,
 				      struct device *dev);
@@ -336,25 +346,38 @@ early_param("iommu.passthrough", iommu_set_def_domain_type);

 static int __init iommu_dma_setup(char *str)
 {
-	int ret = kstrtobool(str, &iommu_dma_strict);
+	bool strict;
+	int ret = kstrtobool(str, &strict);

 	if (!ret)
-		iommu_cmd_line |= IOMMU_CMD_LINE_STRICT;
+		cmdline_dma_strict = bool_to_strictness(strict);
 	return ret;
 }
 early_param("iommu.strict", iommu_dma_setup);

 void iommu_set_dma_strict(bool strict)
 {
-	if (strict || !(iommu_cmd_line & IOMMU_CMD_LINE_STRICT))
-		iommu_dma_strict = strict;
+	/* A driver can request strictness but not the other way around */
+	if (driver_dma_strict != IOMMU_STRICT)
+		driver_dma_strict = bool_to_strictness(strict);
 }

 bool iommu_get_dma_strict(struct iommu_domain *domain)
 {
-	/* only allow lazy flushing for DMA domains */
-	if (domain->type == IOMMU_DOMAIN_DMA)
-		return iommu_dma_strict;
+	/* Non-DMA domains or anyone forcing it to strict makes it strict */
+	if (domain->type != IOMMU_DOMAIN_DMA ||
+	    cmdline_dma_strict == IOMMU_STRICT ||
+	    driver_dma_strict == IOMMU_STRICT ||
+	    domain->force_strict)
+		return true;
+
+	/* Anyone requesting non-strict (if no forces) makes it non-strict */
+	if (cmdline_dma_strict == IOMMU_NOT_STRICT ||
+	    driver_dma_strict == IOMMU_NOT_STRICT ||
+	    domain->request_non_strict)
+		return false;
+
+	/* Nobody said anything, so it's strict by default */
 	return true;
 }
 EXPORT_SYMBOL_GPL(iommu_get_dma_strict);
@@ -1519,7 +1542,8 @@ static int iommu_get_def_domain_type(struct device *dev)

 static int iommu_group_alloc_default_domain(struct bus_type *bus,
 					    struct iommu_group *group,
-					    unsigned int type)
+					    unsigned int type,
+					    struct device *dev)
 {
 	struct iommu_domain *dom;

@@ -1534,6 +1558,12 @@ static int iommu_group_alloc_default_domain(struct bus_type *bus,
 	if (!dom)
 		return -ENOMEM;

+	/* Save the strictness reque
[PATCH 3/6] PCI: Indicate that we want to force strict DMA for untrusted devices
At the moment the generic IOMMU framework reaches into the PCIe device to check the "untrusted" state and uses this information to figure out if it should be running the IOMMU in strict or non-strict mode. Let's instead set the new boolean in "struct device" to indicate when we want forced strictness.

NOTE: we still continue to set the "untrusted" bit in PCIe since that apparently is used for more than just IOMMU strictness. It probably makes sense for a later patchset to clarify all of the other needs we have for "untrusted" PCIe devices (perhaps add more booleans into the "struct device") so we can fully eliminate the need for the IOMMU framework to reach into a PCIe device.

Signed-off-by: Douglas Anderson
---

 drivers/pci/probe.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 275204646c68..8d81f0fb3e50 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1572,8 +1572,10 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 	 * untrusted as well.
 	 */
 	parent = pci_upstream_bridge(dev);
-	if (parent && (parent->untrusted || parent->external_facing))
+	if (parent && (parent->untrusted || parent->external_facing)) {
 		dev->untrusted = true;
+		dev->dev.force_strict_iommu = true;
+	}
 }

 /**
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH 2/6] drivers: base: Add bits to struct device to control iommu strictness
How to control the "strictness" of an IOMMU is a bit of a mess right now. As far as I can tell, right now:
* You can set the default to "non-strict" and some devices (right now, only PCI devices) can request to run in "strict" mode.
* You can set the default to "strict" and no devices in the system are allowed to run as "non-strict".

I believe this needs to be improved a bit. Specifically:
* We should be able to default to "strict" mode but let devices that claim to be fairly low risk request that they be run in "non-strict" mode.
* We should allow devices outside of PCIe to request "strict" mode if the system default is "non-strict".

I believe the correct way to do this is two bits in "struct device". One allows a device to force things to "strict" mode and the other allows a device to _request_ "non-strict" mode. The asymmetry here is on purpose. Generally if anything in the system makes a request for strictness of something then we want it strict. Thus drivers can only request (but not force) non-strictness.

It's expected that the strictness fields can be filled in by the bus code like in the patch ("PCI: Indicate that we want to force strict DMA for untrusted devices") or by using the new pre_probe concept introduced in the patch ("drivers: base: Add the concept of "pre_probe" to drivers").

Signed-off-by: Douglas Anderson
---

 include/linux/device.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/device.h b/include/linux/device.h
index f1a00040fa53..c1b985e10c47 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -449,6 +449,15 @@ struct dev_links_info {
  *		and optionall (if the coherent mask is large enough) also
  *		for dma allocations. This flag is managed by the dma ops
  *		instance from ->dma_supported.
+ * @force_strict_iommu: If set to %true then we should force this device to
+ *			iommu.strict regardless of the other defaults in the
+ *			system. Only has an effect if an IOMMU is in place.
+ * @request_non_strict_iommu: If set to %true and there are no other known
+ *			      reasons to make the iommu.strict for this device,
+ *			      then default to non-strict mode. This implies
+ *			      some belief that the DMA master for this device
+ *			      won't abuse the DMA path to compromise the kernel.
+ *			      Only has an effect if an IOMMU is in place.
  *
  * At the lowest level, every device in a Linux system is represented by an
  * instance of struct device. The device structure contains the information
@@ -557,6 +566,8 @@ struct device {
 #ifdef CONFIG_DMA_OPS_BYPASS
 	bool			dma_ops_bypass : 1;
 #endif
+	bool			force_strict_iommu:1;
+	bool			request_non_strict_iommu:1;
 };

 /**
-- 
2.32.0.288.g62a8d224e6-goog
[PATCH 1/6] drivers: base: Add the concept of "pre_probe" to drivers
Right now things are a bit awkward if a driver would like a chance to run before some of the more "automatic" things (pinctrl, DMA, IOMMUs, ...) happen to a device. This patch aims to fix that problem by introducing the concept of a "pre_probe" function that drivers can implement to run before the "automatic" stuff.

Why would you want to run before the "automatic" stuff? The incentive in my case is that I want to be able to fill in some boolean flags in the "struct device" before the IOMMU init runs. It appears that the strictness vs. non-strictness of a device's iommu config is determined once at init time and can't be changed afterwards. However, I would like to avoid hardcoding the rules for strictness in the IOMMU driver. Instead I'd like to let individual drivers be able to make informed decisions about the appropriateness of strictness vs. non-strictness.

The desire for running code pre_probe is likely not limited to my use case. I believe that the list "qcom_smmu_client_of_match" is hacked into the iommu driver specifically because there was no real good framework for this. For the existing list it wasn't _quite_ as ugly as my needs since the decision could be made solely on compatible string, but it still feels like it would have been better for individual drivers to run code and setup some state rather than coding up a big list in the IOMMU driver.

Even without this patch, I believe it is possible for a driver to run before the "automatic" things by registering for "BUS_NOTIFY_BIND_DRIVER" in its init call, though I haven't personally tested this. Using the notifier is a bit awkward, though, and I'd rather avoid it. Also, using "BUS_NOTIFY_BIND_DRIVER" would require drivers to stop using the convenience module_platform_driver() helper and roll a bunch of boilerplate code.

NOTE: the pre_probe here is listed in the driver structure. As a side effect of this it will be passed a "struct device *" rather than the more specific device type (like the "struct platform_device *" that most platform devices get passed to their probe). Presumably this won't cause trouble and it's a lot less code to write but if we need to make it more symmetric that's also possible by touching more files.

Signed-off-by: Douglas Anderson
---

 drivers/base/dd.c | 10 --
 include/linux/device/driver.h | 9 +
 2 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index ecd7cf848daf..9a13bff8dafa 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -549,10 +549,16 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 re_probe:
 	dev->driver = drv;

+	if (drv->pre_probe) {
+		ret = drv->pre_probe(dev);
+		if (ret)
+			goto probe_failed_pre_dma;
+	}
+
 	/* If using pinctrl, bind pins now before probing */
 	ret = pinctrl_bind_pins(dev);
 	if (ret)
-		goto pinctrl_bind_failed;
+		goto probe_failed_pre_dma;

 	if (dev->bus->dma_configure) {
 		ret = dev->bus->dma_configure(dev);
@@ -639,7 +645,7 @@ static int really_probe(struct device *dev, struct device_driver *drv)
 	if (dev->bus)
 		blocking_notifier_call_chain(&dev->bus->p->bus_notifier,
 					     BUS_NOTIFY_DRIVER_NOT_BOUND, dev);
-pinctrl_bind_failed:
+probe_failed_pre_dma:
 	device_links_no_driver(dev);
 	devres_release_all(dev);
 	arch_teardown_dma_ops(dev);
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf4993..f7305dd6ceb1 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -57,6 +57,14 @@ enum probe_type {
  * @probe_type:	Type of the probe (synchronous or asynchronous) to use.
  * @of_match_table: The open firmware table.
  * @acpi_match_table: The ACPI match table.
+ * @pre_probe:	Called after a device has been bound to a driver but before
+ *		anything "automatic" (pinctrl, DMA, IOMMUs, ...) has been
+ *		setup. This is mostly a chance for the driver to do things
+ *		that might need to be run before any of those automatic
+ *		processes. The vast majority of devices don't need to
+ *		implement this. Note that there is no "post_remove" at the
+ *		moment. If you need to undo something that you did in
+ *		pre_probe() you can use devres.
  * @probe:	Called to query the existence of a specific device,
  *		whether this driver can work with it, and bind the driver
  *		to a specific device.
@@ -105,6 +113,7 @@ struct device_driver {
 	const struct of_device_id	*of_match_table;
 	const struct acpi_device_
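
The ordering that pre_probe is meant to provide can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the model_* and demo_* names are hypothetical, and a single assignment stands in for all of the "automatic" pinctrl/DMA/IOMMU setup.

```c
#include <stddef.h>

/* Hypothetical model types, loosely mirroring struct device / device_driver. */
struct model_device {
	int request_non_strict;   /* set by the driver's pre_probe */
	int iommu_saw_non_strict; /* what the "automatic" IOMMU setup observed */
};

struct model_driver {
	int (*pre_probe)(struct model_device *dev);
	int (*probe)(struct model_device *dev);
};

static int demo_pre_probe(struct model_device *dev)
{
	dev->request_non_strict = 1;
	return 0;
}

static int demo_probe(struct model_device *dev)
{
	(void)dev;
	return 0;
}

/* Mirrors the order in the patch: pre_probe, automatic setup, then probe. */
int model_really_probe(struct model_device *dev, const struct model_driver *drv)
{
	int ret;

	if (drv->pre_probe) {
		ret = drv->pre_probe(dev);
		if (ret)
			return ret;
	}

	/* Stand-in for pinctrl/DMA/IOMMU configuration. */
	dev->iommu_saw_non_strict = dev->request_non_strict;

	return drv->probe ? drv->probe(dev) : 0;
}

/* Binds a demo driver and reports what the IOMMU setup step observed. */
int demo_bind(int with_pre_probe)
{
	struct model_device dev = { 0, 0 };
	struct model_driver drv = {
		.pre_probe = with_pre_probe ? demo_pre_probe : NULL,
		.probe = demo_probe,
	};

	if (model_really_probe(&dev, &drv))
		return -1;

	return dev.iommu_saw_non_strict;
}
```

In this model a flag set from probe() would arrive too late to influence the setup step, which is the motivation given above for adding a hook that runs earlier.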
[PATCH 0/6] iommu: Enable devices to request non-strict DMA, starting with QCom SD/MMC
This patch series attempts to put forward a proposal for enabling non-strict DMA on a device-by-device basis. The patch series requests non-strict DMA for the Qualcomm SDHCI controller as a first device to enable, getting a nice bump in performance with what's believed to be a very small drop in security / safety (see the patch for the full argument).

As part of this patch series I end up slightly cleaning up some of the interactions between the PCI subsystem and the IOMMU subsystem but I don't go all the way to fully remove all the tentacles. Specifically this patch series only concerns itself with a single aspect: strict vs. non-strict mode for the IOMMU. I'm hoping that this will be easier to talk about / reason about for more subsystems compared to overall deciding what it means for a device to be "external" or "untrusted".

If something like this patch series ends up being landable, it will undoubtedly need coordination between many maintainers to land. I believe it's fully bisectable but later patches in the series definitely depend on earlier ones. Sorry for the long CC list. :(

Douglas Anderson (6):
  drivers: base: Add the concept of "pre_probe" to drivers
  drivers: base: Add bits to struct device to control iommu strictness
  PCI: Indicate that we want to force strict DMA for untrusted devices
  iommu: Combine device strictness requests with the global default
  iommu: Stop reaching into PCIe devices to decide strict vs. non-strict
  mmc: sdhci-msm: Request non-strict IOMMU mode

 drivers/base/dd.c | 10 +--
 drivers/iommu/dma-iommu.c | 2 +-
 drivers/iommu/iommu.c | 56 +++
 drivers/mmc/host/sdhci-msm.c | 8 +
 drivers/pci/probe.c | 4 ++-
 include/linux/device.h | 11 +++
 include/linux/device/driver.h | 9 ++
 include/linux/iommu.h | 2 ++
 8 files changed, 85 insertions(+), 17 deletions(-)

-- 
2.32.0.288.g62a8d224e6-goog
[PATCH] iommu: Properly pass gfp_t in _iommu_map() to avoid atomic sleeping
Sleeping while atomic = bad. Let's fix an obvious typo to try to avoid
it.

The warning that was seen (on a downstream kernel with the problematic
patch backported):

 BUG: sleeping function called from invalid context at mm/page_alloc.c:4726
 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 9, name: ksoftirqd/0
 CPU: 0 PID: 9 Comm: ksoftirqd/0 Not tainted 5.4.93-12508-gc10c93e28e39 #1
 Call trace:
  dump_backtrace+0x0/0x154
  show_stack+0x20/0x2c
  dump_stack+0xa0/0xfc
  ___might_sleep+0x11c/0x12c
  __might_sleep+0x50/0x84
  __alloc_pages_nodemask+0xf8/0x2bc
  __arm_lpae_alloc_pages+0x48/0x1b4
  __arm_lpae_map+0x124/0x274
  __arm_lpae_map+0x1cc/0x274
  arm_lpae_map+0x140/0x170
  arm_smmu_map+0x78/0xbc
  __iommu_map+0xd4/0x210
  _iommu_map+0x4c/0x84
  iommu_map_atomic+0x44/0x58
  __iommu_dma_map+0x8c/0xc4
  iommu_dma_map_page+0xac/0xf0

Fixes: d8c1df02ac7f ("iommu: Move iotlb_sync_map out from __iommu_map")
Signed-off-by: Douglas Anderson
---
I haven't done any serious testing on this. I saw a report of the
warning and the fix seemed obvious so I'm shooting it out.

 drivers/iommu/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 3d099a31ddca..2b06b01850d5 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2441,7 +2441,7 @@ static int _iommu_map(struct iommu_domain *domain, unsigned long iova,
 	const struct iommu_ops *ops = domain->ops;
 	int ret;
 
-	ret = __iommu_map(domain, iova, paddr, size, prot, GFP_KERNEL);
+	ret = __iommu_map(domain, iova, paddr, size, prot, gfp);
 	if (ret == 0 && ops->iotlb_sync_map)
 		ops->iotlb_sync_map(domain, iova, size);
 
--
2.30.0.365.g02bc693789-goog
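To make the bug class concrete, here is a small userspace model of a wrapper that ignores its caller's allocation flags versus one that forwards them. Everything in it (the enum, the function names, the -1 return) is a hypothetical stand-in and not kernel code; it only mirrors the shape of the one-line fix above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for gfp_t values; not the real kernel flags. */
enum model_gfp {
	MODEL_GFP_ATOMIC,	/* allocation must not sleep */
	MODEL_GFP_KERNEL,	/* allocation may sleep */
};

/*
 * Models the inner mapping call: returns -1 when a sleeping allocation
 * would be attempted from atomic context, 0 otherwise.
 */
static int model_inner_map(enum model_gfp gfp, bool in_atomic)
{
	assert(gfp == MODEL_GFP_ATOMIC || gfp == MODEL_GFP_KERNEL);

	if (in_atomic && gfp == MODEL_GFP_KERNEL)
		return -1;
	return 0;
}

/* Shape of the bug: the caller's gfp is dropped and a sleeping flag is hardcoded. */
static int model_map_buggy(enum model_gfp gfp, bool in_atomic)
{
	(void)gfp;
	return model_inner_map(MODEL_GFP_KERNEL, in_atomic);
}

/* Shape of the fix: the caller's gfp is passed straight through. */
static int model_map_fixed(enum model_gfp gfp, bool in_atomic)
{
	return model_inner_map(gfp, in_atomic);
}
```

In this model an atomic caller passing MODEL_GFP_ATOMIC gets -1 from model_map_buggy() and 0 from model_map_fixed(), which is the difference between hitting and avoiding the warning in the trace.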
[PATCH v2] iommu/arm-smmu: Break insecure users by disabling bypass by default
If you're bisecting why your peripherals stopped working, it's
probably this CL. Specifically if you see this in your dmesg:
  Unexpected global fault, this could be serious
...then it's almost certainly this CL.

Running your IOMMU-enabled peripherals with the IOMMU in bypass mode
is insecure and effectively disables the protection they provide.
There are few reasons to allow unmatched stream bypass, and even fewer
good ones.

This patch starts the transition over to make it much harder to run
your system insecurely. Expected steps:

1. By default disable bypass (so anyone insecure will notice) but make
   it easy for someone to re-enable bypass with just a KConfig change.
   That's this patch.

2. After people have had a little time to come to grips with the fact
   that they need to set their IOMMUs properly and have had time to
   dig into how to do this, the KConfig will be eliminated and bypass
   will simply be disabled. Folks who are truly upset and still
   haven't fixed their system can either figure out how to add
   'arm-smmu.disable_bypass=n' to their command line or revert the
   patch in their own private kernel. Of course these folks will be
   less secure.

Suggested-by: Robin Murphy
Signed-off-by: Douglas Anderson
---

Changes in v2:
- Flipped default to 'yes' and changed comments a lot.

 drivers/iommu/Kconfig    | 25 +
 drivers/iommu/arm-smmu.c |  3 ++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 1ca1fa107b21..a4210672804a 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -359,6 +359,31 @@ config ARM_SMMU
 	  Say Y here if your SoC includes an IOMMU device implementing
 	  the ARM SMMU architecture.
 
+config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
+	bool "Default to disabling bypass on ARM SMMU v1 and v2"
+	depends on ARM_SMMU
+	default y
+	help
+	  Say Y here to (by default) disable bypass streams such that
+	  incoming transactions from devices that are not attached to
+	  an iommu domain will report an abort back to the device and
+	  will not be allowed to pass through the SMMU.
+
+	  Any old kernels that existed before this KConfig was
+	  introduced would default to _allowing_ bypass (AKA the
+	  equivalent of NO for this config). However the default for
+	  this option is YES because the old behavior is insecure.
+
+	  There are few reasons to allow unmatched stream bypass, and
+	  even fewer good ones. If saying YES here breaks your board
+	  you should work on fixing your board. This KConfig option
+	  is expected to be removed in the future and we'll simply
+	  hardcode the bypass disable in the code.
+
+	  NOTE: the kernel command line parameter
+	  'arm-smmu.disable_bypass' will continue to override this
+	  config.
+
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..930c07635956 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -110,7 +110,8 @@ static int force_stage;
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
 	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
-static bool disable_bypass;
+static bool disable_bypass =
+	IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT);
 module_param(disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
--
2.21.0.352.gf09ad66450-goog
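The Kconfig help text says the build-time option only seeds the default and that 'arm-smmu.disable_bypass' on the command line still overrides it. A minimal userspace sketch of that precedence follows. The function name is hypothetical, and the parsing is deliberately simplified (substring match, only y/Y/1 and n/N/0 recognised), so it should not be read as how the kernel parses parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_PARAM "arm-smmu.disable_bypass="

/*
 * Hypothetical helper: start from the build-time default (what
 * IS_ENABLED() would provide in the driver) and let an explicit
 * command-line setting override it. Unrecognised values leave the
 * default untouched.
 */
static bool model_resolve_disable_bypass(bool config_default,
					 const char *cmdline)
{
	const char *p;

	assert(cmdline != NULL);

	p = strstr(cmdline, MODEL_PARAM);
	if (!p)
		return config_default;

	switch (p[strlen(MODEL_PARAM)]) {
	case 'y':
	case 'Y':
	case '1':
		return true;
	case 'n':
	case 'N':
	case '0':
		return false;
	default:
		return config_default;
	}
}
```

With config_default set to true (the v2 default), a command line containing 'arm-smmu.disable_bypass=n' resolves to false, which is the escape hatch the commit message describes for boards that still depend on bypass.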
[PATCH] iommu/arm-smmu: Allow disabling bypass via kernel config
Right now the only way to disable the iommu bypass for the ARM SMMU is
with the kernel command line parameter 'arm-smmu.disable_bypass'.

In general kernel command line parameters make sense for things that
someone would like to tweak without rebuilding the kernel or for very
basic communication between the bootloader and the kernel, but are
awkward for other things. Specifically:
* Human parsing of the kernel command line can be difficult since it's
  just a big run-on space separated line of text.
* If every bit of the system was configured via the kernel command
  line the kernel command line would get very large and even more
  unwieldy.
* Typically there are not easy ways in build systems to adjust the
  kernel command line for config-like options.

Let's introduce a new config option that allows us to disable the
iommu bypass without affecting the existing default nor the existing
ability to adjust the configuration via kernel command line.

Signed-off-by: Douglas Anderson
---

 drivers/iommu/Kconfig    | 22 ++
 drivers/iommu/arm-smmu.c |  3 ++-
 2 files changed, 24 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 46fcd75d4364..c614beab08f8 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -359,6 +359,28 @@ config ARM_SMMU
 	  Say Y here if your SoC includes an IOMMU device implementing
 	  the ARM SMMU architecture.
 
+config ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT
+	bool "Default to disabling bypass on ARM SMMU v1 and v2"
+	depends on ARM_SMMU
+	default n
+	help
+	  Say Y here to (by default) disable bypass streams such that
+	  incoming transactions from devices that are not attached to
+	  an iommu domain will report an abort back to the device and
+	  will not be allowed to pass through the SMMU.
+
+	  Historically the ARM SMMU v1 and v2 driver has defaulted
+	  to allow bypass by default but it could be disabled with
+	  the parameter 'arm-smmu.disable_bypass'. The parameter is
+	  still present and can be used to override this config
+	  option, but this config option allows you to disable bypass
+	  without bloating the kernel command line.
+
+	  Disabling bypass is more secure but presumably will break
+	  old systems.
+
+	  Say N if unsure.
+
 config ARM_SMMU_V3
 	bool "ARM Ltd. System MMU Version 3 (SMMUv3) Support"
 	depends on ARM64
diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 045d93884164..930c07635956 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -110,7 +110,8 @@ static int force_stage;
 module_param(force_stage, int, S_IRUGO);
 MODULE_PARM_DESC(force_stage,
 	"Force SMMU mappings to be installed at a particular stage of translation. A value of '1' or '2' forces the corresponding stage. All other values are ignored (i.e. no stage is forced). Note that selecting a specific stage will disable support for nested translation.");
-static bool disable_bypass;
+static bool disable_bypass =
+	IS_ENABLED(CONFIG_ARM_SMMU_DISABLE_BYPASS_BY_DEFAULT);
 module_param(disable_bypass, bool, S_IRUGO);
 MODULE_PARM_DESC(disable_bypass,
 	"Disable bypass streams such that incoming transactions from devices that are not attached to an iommu domain will report an abort back to the device and will not be allowed to pass through the SMMU.");
--
2.21.0.rc0.258.g878e2cd30e-goog