Reply: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
> -----Original Message-----
> From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
> Sent: December 20, 2019 17:23
> To: Jim,Yan
> Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
>
> On Fri Dec 20 19, jimyan wrote:
> >On a system with an Intel PCIe port configured as a nvme host device,
> >iommu initialization fails with
> >
> >DMAR: Device scope type does not match for :80:00.0
> >
> >This is because the DMAR table reports this device as having scope 2
> >(ACPI_DMAR_SCOPE_TYPE_BRIDGE):
> >
>
> Isn't that a problem to be fixed in the DMAR table then?
>
> >but the device has a type 0 PCI header:
> >80:00.0 Class 0600: Device 8086:2020 (rev 06)
> >00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
> >10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
> >30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
> >
> >VT-d works perfectly on this system, so there's no reason to bail out
> >on initialization due to this apparent scope mismatch. Add the class
> >0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing DMAR
> >initialization for non-bridge PCI devices listed with scope bridge.
> >
> >Signed-off-by: jimyan
> >---
> > drivers/iommu/dmar.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> >diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
> >eecd6a421667..9faf2f0e0237 100644
> >--- a/drivers/iommu/dmar.c
> >+++ b/drivers/iommu/dmar.c
> >@@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct
> dmar_pci_notify_info *info,
> > 		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
> > 		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
> > 		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> >+		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
> > 		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
> > 			pr_warn("Device scope type does not match for %s\n",
> > 				pci_name(info->dev));
> >--
> >2.11.0
> >

Actually, this patch is similar to commit ffb2d1eb88c3 ("iommu/vt-d: Don't
reject NTB devices due to scope mismatch").
Besides, modifying the DMAR table requires the OEM to update the BIOS,
which is hard to arrange.

Jim
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
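The condition touched by the hunk above can be sketched as a stand-alone
predicate in user-space C. This is only an illustration under stated
assumptions, not the kernel function: struct fake_dev, scope_mismatch() and
its allow_host_bridge flag are hypothetical names, and the constants are
redefined locally with the values the kernel headers are assumed to use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values assumed to mirror the kernel headers; defined locally on purpose. */
#define PCI_HEADER_TYPE_NORMAL        0
#define PCI_CLASS_BRIDGE_HOST         0x0600
#define PCI_CLASS_BRIDGE_OTHER        0x0680
#define ACPI_DMAR_SCOPE_TYPE_ENDPOINT 1
#define ACPI_DMAR_SCOPE_TYPE_BRIDGE   2

/* Hypothetical stand-in for the two struct pci_dev fields the check reads. */
struct fake_dev {
	uint8_t  hdr_type;   /* PCI header type; 0 is a normal (type 0) header */
	uint32_t class_code; /* 24 bits: base class, sub-class, prog-if */
};

/*
 * Returns true when the DMAR scope entry type disagrees with the device.
 * allow_host_bridge == false models the check before the patch,
 * allow_host_bridge == true models it with the extra class test.
 */
static bool scope_mismatch(uint8_t entry_type, const struct fake_dev *dev,
			   bool allow_host_bridge)
{
	uint32_t cls = dev->class_code >> 8; /* drop prog-if, keep class/sub-class */

	if (entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT)
		return dev->hdr_type != PCI_HEADER_TYPE_NORMAL;

	if (entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
	    dev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
		if (allow_host_bridge && cls == PCI_CLASS_BRIDGE_HOST)
			return false;
		return cls != PCI_CLASS_BRIDGE_OTHER;
	}

	return false;
}
```

With the values from the config-space dump (header type 0, class code
0x060000), the entry is reported as a mismatch without the extra class test
and accepted with it.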
Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
On Sat, 21 Dec 2019, Tom Murphy wrote:
> This patchset converts the intel iommu driver to the dma-iommu api.
>
> While converting the driver I exposed a bug in the intel i915 driver
> which causes a huge amount of artifacts on the screen of my
> laptop. You can see a picture of it here:
> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>
> This issue is most likely in the i915 driver and is most likely caused
> by the driver not respecting the return value of the
> dma_map_ops::map_sg function. You can see the driver ignoring the
> return value here:
> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>
> Previously this didn’t cause issues because the intel map_sg always
> returned the same number of elements as the input scatter gather list
> but with the change to this dma-iommu api this is no longer the
> case. I wasn’t able to track the bug down to a specific line of code
> unfortunately.
>
> Could someone from the intel team look at this?

Let me get this straight. There is current API that on success always
returns the same number of elements as the input scatter gather
list. You propose to change the API so that this is no longer the case?

A quick check of various dma_map_sg() calls in the kernel seems to
indicate checking for 0 for errors and then ignoring the non-zero return
is a common pattern. Are you sure it's okay to make the change you're
proposing?

Anyway, due to the time of year and all, I'd like to ask you to file a
bug against i915 at [1] so this is not forgotten, and please let's not
merge the changes before this is resolved.


Thanks,
Jani.


[1] https://gitlab.freedesktop.org/drm/intel/issues/new


--
Jani Nikula, Intel Open Source Graphics Center
Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
On 2019-12-23 10:37 am, Jani Nikula wrote:
> On Sat, 21 Dec 2019, Tom Murphy wrote:
>> This patchset converts the intel iommu driver to the dma-iommu api.
>>
>> While converting the driver I exposed a bug in the intel i915 driver
>> which causes a huge amount of artifacts on the screen of my
>> laptop. You can see a picture of it here:
>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>>
>> This issue is most likely in the i915 driver and is most likely caused
>> by the driver not respecting the return value of the
>> dma_map_ops::map_sg function. You can see the driver ignoring the
>> return value here:
>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>>
>> Previously this didn’t cause issues because the intel map_sg always
>> returned the same number of elements as the input scatter gather list
>> but with the change to this dma-iommu api this is no longer the
>> case. I wasn’t able to track the bug down to a specific line of code
>> unfortunately.
>>
>> Could someone from the intel team look at this?
>
> Let me get this straight. There is current API that on success always
> returns the same number of elements as the input scatter gather
> list. You propose to change the API so that this is no longer the case?

No, the API for dma_map_sg() has always been that it may return fewer
DMA segments than nents - see Documentation/DMA-API.txt (and otherwise,
the return value would surely be a simple success/fail condition).
Relying on a particular implementation behaviour has never been strictly
correct, even if it does happen to be a very common behaviour.

> A quick check of various dma_map_sg() calls in the kernel seems to
> indicate checking for 0 for errors and then ignoring the non-zero return
> is a common pattern. Are you sure it's okay to make the change you're
> proposing?

Various code uses tricks like just iterating the mapped list until the
first segment with zero sg_dma_len(). Others may well simply have bugs.

Robin.

> Anyway, due to the time of year and all, I'd like to ask you to file a
> bug against i915 at [1] so this is not forgotten, and please let's not
> merge the changes before this is resolved.
>
>
> Thanks,
> Jani.
>
>
> [1] https://gitlab.freedesktop.org/drm/intel/issues/new
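The point about the return value can be illustrated with a small user-space
model. This is a sketch under stated assumptions, not kernel code: struct
toy_sg, toy_map_sg() and total_mapped() are hypothetical names, and the
merging rule (join chunks that are contiguous in address space) is just one
plausible reason why a mapper may return fewer segments than it was given.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified scatterlist entry: input and output views. */
struct toy_sg {
	uint64_t addr;     /* input: start address of the buffer chunk */
	uint32_t len;      /* input: length of the chunk */
	uint64_t dma_addr; /* output: start of the mapped DMA segment */
	uint32_t dma_len;  /* output: length of the mapped DMA segment */
};

/*
 * Toy mapper: merges chunks that are contiguous in address space.
 * Returns the number of DMA segments written to the front of the list
 * (0 means failure); the result can be smaller than nents.
 */
static int toy_map_sg(struct toy_sg *sg, int nents)
{
	int i, out = 0;

	if (nents <= 0)
		return 0;

	for (i = 0; i < nents; i++) {
		if (out > 0 &&
		    sg[out - 1].dma_addr + sg[out - 1].dma_len == sg[i].addr) {
			sg[out - 1].dma_len += sg[i].len; /* extend previous */
			continue;
		}
		sg[out].dma_addr = sg[i].addr;
		sg[out].dma_len = sg[i].len;
		out++;
	}

	/* Unused tail entries get a zero DMA length. */
	for (i = out; i < nents; i++)
		sg[i].dma_len = 0;

	return out;
}

/* A consumer that walks the returned count rather than the input nents. */
static uint64_t total_mapped(const struct toy_sg *sg, int mapped)
{
	uint64_t sum = 0;

	for (int i = 0; i < mapped; i++)
		sum += sg[i].dma_len;
	return sum;
}
```

A caller that only tests the result for zero and then programs the device
with the original nents would, in this model, hand the device stale tail
entries.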
Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api
On Mon, 23 Dec 2019, Robin Murphy wrote: > On 2019-12-23 10:37 am, Jani Nikula wrote: >> On Sat, 21 Dec 2019, Tom Murphy wrote: >>> This patchset converts the intel iommu driver to the dma-iommu api. >>> >>> While converting the driver I exposed a bug in the intel i915 driver >>> which causes a huge amount of artifacts on the screen of my >>> laptop. You can see a picture of it here: >>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg >>> >>> This issue is most likely in the i915 driver and is most likely caused >>> by the driver not respecting the return value of the >>> dma_map_ops::map_sg function. You can see the driver ignoring the >>> return value here: >>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51 >>> >>> Previously this didn’t cause issues because the intel map_sg always >>> returned the same number of elements as the input scatter gather list >>> but with the change to this dma-iommu api this is no longer the >>> case. I wasn’t able to track the bug down to a specific line of code >>> unfortunately. >>> >>> Could someone from the intel team look at this? >> >> Let me get this straight. There is current API that on success always >> returns the same number of elements as the input scatter gather >> list. You propose to change the API so that this is no longer the case? > > No, the API for dma_map_sg() has always been that it may return fewer > DMA segments than nents - see Documentation/DMA-API.txt (and otherwise, > the return value would surely be a simple success/fail condition). > Relying on a particular implementation behaviour has never been strictly > correct, even if it does happen to be a very common behaviour. > >> A quick check of various dma_map_sg() calls in the kernel seems to >> indicate checking for 0 for errors and then ignoring the non-zero return >> is a common pattern. 
Are you sure it's okay to make the change you're >> proposing? > > Various code uses tricks like just iterating the mapped list until the > first segment with zero sg_dma_len(). Others may well simply have bugs. Thanks for the clarification. BR, Jani. > > Robin. > >> Anyway, due to the time of year and all, I'd like to ask you to file a >> bug against i915 at [1] so this is not forgotten, and please let's not >> merge the changes before this is resolved. >> >> >> Thanks, >> Jani. >> >> >> [1] https://gitlab.freedesktop.org/drm/intel/issues/new >> >> -- Jani Nikula, Intel Open Source Graphics Center ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: Reply: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
Hi,

On 2019/12/23 15:59, Jim,Yan wrote:
>> -----Original Message-----
>> From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
>> Sent: December 20, 2019 17:23
>> To: Jim,Yan
>> Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
>> linux-ker...@vger.kernel.org
>> Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
>>
>> On Fri Dec 20 19, jimyan wrote:
>>> On a system with an Intel PCIe port configured as a nvme host device,
>>> iommu initialization fails with
>>>
>>> DMAR: Device scope type does not match for :80:00.0
>>>
>>> This is because the DMAR table reports this device as having scope 2
>>> (ACPI_DMAR_SCOPE_TYPE_BRIDGE):
>>>
>>
>> Isn't that a problem to be fixed in the DMAR table then?
>>
>>> but the device has a type 0 PCI header:
>>> 80:00.0 Class 0600: Device 8086:2020 (rev 06)
>>> 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
>>> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
>>> 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
>>>
>>> VT-d works perfectly on this system, so there's no reason to bail out
>>> on initialization due to this apparent scope mismatch. Add the class
>>> 0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing DMAR
>>> initialization for non-bridge PCI devices listed with scope bridge.
>>>
>>> Signed-off-by: jimyan
>>> ---
>>>  drivers/iommu/dmar.c | 1 +
>>>  1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
>>> index eecd6a421667..9faf2f0e0237 100644
>>> --- a/drivers/iommu/dmar.c
>>> +++ b/drivers/iommu/dmar.c
>>> @@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
>>>  		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
>>>  		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
>>>  		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
>>> +		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
>>>  		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
>>>  			pr_warn("Device scope type does not match for %s\n",
>>>  				pci_name(info->dev));
>>> --
>>> 2.11.0
>>>
>
> Actually, this patch is similar to commit ffb2d1eb88c3 ("iommu/vt-d: Don't
> reject NTB devices due to scope mismatch").
> Besides, modifying the DMAR table requires the OEM to update the BIOS,
> which is hard to arrange.

For both cases, a quirk flag seems to be more reasonable, so that
unrelated devices will not be impacted.

Best regards,
baolu
[PATCH] iommu/amd: Remove unused variable
From: Joerg Roedel

The iommu variable in set_device_exclusion_range() is unused now and
causes a compiler warning. Remove it.

Fixes: 387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
Signed-off-by: Joerg Roedel
---
 drivers/iommu/amd_iommu_init.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 36649592ddf3..ba7ee4aa04f9 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1118,8 +1118,6 @@ static int __init add_early_maps(void)
  */
 static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
 {
-	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
 	if (!(m->flags & IVMD_FLAG_EXCL_RANGE))
 		return;
 
--
2.16.4
Re: [PATCH v10 0/4] Add uacce module for Accelerator
Hi, Greg

On 2019/12/16 11:08 AM, Zhangfei Gao wrote:
> Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
> provide Shared Virtual Addressing (SVA) between accelerators and processes,
> so that an accelerator can access any data structure of the main cpu.
> This differs from the data sharing between cpu and io device, which share
> data content rather than address.
> Because of the unified address, hardware and the user space of a process
> can share the same virtual address in the communication.
>
> Uacce is intended to be used with Jean Philippe Brucker's SVA
> patchset[1], which enables IO side page fault and PASID support.
> We have kept verifying with Jean's sva patchset [2]
> We also keep verifying with Eric's SMMUv3 Nested Stage patches [3]
>
> This series and related zip & qm driver
> https://github.com/Linaro/linux-kernel-warpdrive/tree/v5.5-rc1-uacce-v10
>
> The library and user application:
> https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v10
>
> References:
> [1] http://jpbrucker.net/sva/
> [2] http://jpbrucker.net/git/linux/log/?h=sva/zip-devel
> [3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9
>
> Change History:
> v10:
> Modify the include header to fix kbuild test error in other arch.
>
> Kenneth Lee (2):
>   uacce: Add documents for uacce
>   uacce: add uacce driver
>
> Zhangfei Gao (2):
>   crypto: hisilicon - Remove module_param uacce_mode
>   crypto: hisilicon - register zip engine to uacce

Would you mind taking a look at the patch set?
The patches are also used for verifying the sva feature.
https://lore.kernel.org/linux-iommu/20191219163033.2608177-1-jean-phili...@linaro.org/

Thanks
Patch "iommu: set group default domain before creating direct mappings" has been added to the 5.4-stable tree
This is a note to let you know that I've just added the patch titled

    iommu: set group default domain before creating direct mappings

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     iommu-set-group-default-domain-before-creating-direct-mappings.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


From d360211524bece6db9920f32c91808235290b51c Mon Sep 17 00:00:00 2001
From: Jerry Snitselaar
Date: Tue, 10 Dec 2019 11:56:06 -0700
Subject: iommu: set group default domain before creating direct mappings

From: Jerry Snitselaar

commit d360211524bece6db9920f32c91808235290b51c upstream.

iommu_group_create_direct_mappings uses group->default_domain, but
right after it is called, request_default_domain_for_dev calls
iommu_domain_free for the default domain, and sets the group default
domain to a different domain. Move the
iommu_group_create_direct_mappings call to after the group default
domain is set, so the direct mappings get associated with that domain.

Cc: Joerg Roedel
Cc: Lu Baolu
Cc: iommu@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Fixes: 7423e01741dd ("iommu: Add API to request DMA domain for device")
Signed-off-by: Jerry Snitselaar
Reviewed-by: Lu Baolu
Signed-off-by: Joerg Roedel
Signed-off-by: Greg Kroah-Hartman
---
 drivers/iommu/iommu.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2221,13 +2221,13 @@ request_default_domain_for_dev(struct de
 		goto out;
 	}
 
-	iommu_group_create_direct_mappings(group, dev);
-
 	/* Make the domain the default for this group */
 	if (group->default_domain)
 		iommu_domain_free(group->default_domain);
 	group->default_domain = domain;
 
+	iommu_group_create_direct_mappings(group, dev);
+
 	dev_info(dev, "Using iommu %s mapping\n",
 		 type == IOMMU_DOMAIN_DMA ? "dma" : "direct");


Patches currently in stable-queue which might be from jsnit...@redhat.com are

queue-5.4/iommu-fix-kasan-use-after-free-in-iommu_insert_resv_region.patch
queue-5.4/iommu-vt-d-fix-dmar-pte-read-access-not-set-error.patch
queue-5.4/iommu-set-group-default-domain-before-creating-direct-mappings.patch
queue-5.4/tpm_tis-reserve-chip-for-duration-of-tpm_tis_core_init.patch
queue-5.4/iommu-vt-d-allocate-reserved-region-for-isa-with-correct-permission.patch
queue-5.4/iommu-vt-d-set-isa-bridge-reserved-region-as-relaxable.patch
Patch "iommu/vt-d: Allocate reserved region for ISA with correct permission" has been added to the 5.4-stable tree
This is a note to let you know that I've just added the patch titled

    iommu/vt-d: Allocate reserved region for ISA with correct permission

to the 5.4-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     iommu-vt-d-allocate-reserved-region-for-isa-with-correct-permission.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


From cde9319e884eb6267a0df446f3c131fe1108defb Mon Sep 17 00:00:00 2001
From: Jerry Snitselaar
Date: Thu, 12 Dec 2019 22:36:42 -0700
Subject: iommu/vt-d: Allocate reserved region for ISA with correct permission

From: Jerry Snitselaar

commit cde9319e884eb6267a0df446f3c131fe1108defb upstream.

Currently the reserved region for ISA is allocated with no
permissions. If a dma domain is being used, mapping this region will
fail. Set the permissions to DMA_PTE_READ|DMA_PTE_WRITE.

Cc: Joerg Roedel
Cc: Lu Baolu
Cc: iommu@lists.linux-foundation.org
Cc: sta...@vger.kernel.org # v5.3+
Fixes: d850c2ee5fe2 ("iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions")
Signed-off-by: Jerry Snitselaar
Acked-by: Lu Baolu
Signed-off-by: Joerg Roedel
Signed-off-by: Greg Kroah-Hartman
---
 drivers/iommu/intel-iommu.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5697,7 +5697,7 @@ static void intel_iommu_get_resv_regions
 		struct pci_dev *pdev = to_pci_dev(device);
 
 		if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA) {
-			reg = iommu_alloc_resv_region(0, 1UL << 24, 0,
+			reg = iommu_alloc_resv_region(0, 1UL << 24, prot,
 						      IOMMU_RESV_DIRECT_RELAXABLE);
 			if (reg)
 				list_add_tail(&reg->list, head);


Patches currently in stable-queue which might be from jsnit...@redhat.com are

queue-5.4/iommu-fix-kasan-use-after-free-in-iommu_insert_resv_region.patch
queue-5.4/iommu-vt-d-fix-dmar-pte-read-access-not-set-error.patch
queue-5.4/iommu-set-group-default-domain-before-creating-direct-mappings.patch
queue-5.4/tpm_tis-reserve-chip-for-duration-of-tpm_tis_core_init.patch
queue-5.4/iommu-vt-d-allocate-reserved-region-for-isa-with-correct-permission.patch
queue-5.4/iommu-vt-d-set-isa-bridge-reserved-region-as-relaxable.patch
[PATCH] virtio-mmio: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify the code; it combines
platform_get_resource(), devm_request_mem_region() and devm_ioremap().

Signed-off-by: Yangtao Li
---
 drivers/virtio/virtio_mmio.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index e09edb5c5e06..97d5725fd9a2 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -531,18 +531,9 @@ static void virtio_mmio_release_dev(struct device *_d)
 static int virtio_mmio_probe(struct platform_device *pdev)
 {
 	struct virtio_mmio_device *vm_dev;
-	struct resource *mem;
 	unsigned long magic;
 	int rc;
 
-	mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	if (!mem)
-		return -EINVAL;
-
-	if (!devm_request_mem_region(&pdev->dev, mem->start,
-			resource_size(mem), pdev->name))
-		return -EBUSY;
-
 	vm_dev = devm_kzalloc(&pdev->dev, sizeof(*vm_dev), GFP_KERNEL);
 	if (!vm_dev)
 		return -ENOMEM;
@@ -554,9 +545,9 @@ static int virtio_mmio_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(&vm_dev->virtqueues);
 	spin_lock_init(&vm_dev->lock);
 
-	vm_dev->base = devm_ioremap(&pdev->dev, mem->start, resource_size(mem));
-	if (vm_dev->base == NULL)
-		return -EFAULT;
+	vm_dev->base = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(vm_dev->base))
+		return PTR_ERR(vm_dev->base);
 
 	/* Check magic value */
 	magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
--
2.17.1
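The switch from a NULL test to IS_ERR()/PTR_ERR() in this patch follows the
kernel's error-pointer convention, where a small negative errno is encoded in
the pointer value itself. A rough user-space model of that convention is
sketched below. The toy_* names are hypothetical, the 4095 limit is an
assumption mirroring the kernel's MAX_ERRNO, and toy_ioremap_resource() is
only a stand-in that returns either its argument or an encoded error.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095 /* assumed to match the kernel's MAX_ERRNO */

/* Encode a negative errno value in a pointer. */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True when the pointer lies in the top TOY_MAX_ERRNO addresses. */
static inline bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-TOY_MAX_ERRNO);
}

/* Hypothetical mapper: returns the region, or an errno-encoded pointer. */
static void *toy_ioremap_resource(void *region, bool busy)
{
	if (!region)
		return toy_err_ptr(-EINVAL);
	if (busy)
		return toy_err_ptr(-EBUSY);
	return region;
}
```

In this model a failed mapping is never NULL, which is why a caller that
keeps the old "== NULL" test would treat the error pointer as a valid base
address.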
[PATCH 1/6] iommu/omap: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li
---
 drivers/iommu/omap-iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index be551cc34be4..297c1be7ecb0 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1175,7 +1175,6 @@ static int omap_iommu_probe(struct platform_device *pdev)
 	int err = -ENODEV;
 	int irq;
 	struct omap_iommu *obj;
-	struct resource *res;
 	struct device_node *of = pdev->dev.of_node;
 	struct orphan_dev *orphan_dev, *tmp;
 
@@ -1218,8 +1217,7 @@ static int omap_iommu_probe(struct platform_device *pdev)
 	spin_lock_init(&obj->iommu_lock);
 	spin_lock_init(&obj->page_table_lock);
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	obj->regbase = devm_ioremap_resource(obj->dev, res);
+	obj->regbase = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(obj->regbase))
 		return PTR_ERR(obj->regbase);
 
--
2.17.1
[PATCH 2/6] iommu/exynos: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li
---
 drivers/iommu/exynos-iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 186ff5cc975c..42d8407267ef 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -571,14 +571,12 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
 	int irq, ret;
 	struct device *dev = &pdev->dev;
 	struct sysmmu_drvdata *data;
-	struct resource *res;
 
 	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
 	if (!data)
 		return -ENOMEM;
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	data->sfrbase = devm_ioremap_resource(dev, res);
+	data->sfrbase = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(data->sfrbase))
 		return PTR_ERR(data->sfrbase);
 
--
2.17.1
[PATCH 4/6] iommu/ipmmu-vmsa: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li
---
 drivers/iommu/ipmmu-vmsa.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index d02edd2751f3..3124e28fee85 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1015,7 +1015,6 @@ static const struct of_device_id ipmmu_of_ids[] = {
 static int ipmmu_probe(struct platform_device *pdev)
 {
 	struct ipmmu_vmsa_device *mmu;
-	struct resource *res;
 	int irq;
 	int ret;
 
@@ -1033,8 +1032,7 @@ static int ipmmu_probe(struct platform_device *pdev)
 	dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
 
 	/* Map I/O memory and request IRQ. */
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	mmu->base = devm_ioremap_resource(&pdev->dev, res);
+	mmu->base = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(mmu->base))
 		return PTR_ERR(mmu->base);
 
--
2.17.1
[PATCH 5/6] iommu/mediatek: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li
---
 drivers/iommu/mtk_iommu_v1.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index e93b94ecac45..3d6bb08b2a54 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -553,7 +553,6 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 {
 	struct mtk_iommu_data *data;
 	struct device *dev = &pdev->dev;
-	struct resource *res;
 	struct component_match *match = NULL;
 	struct of_phandle_args larb_spec;
 	struct of_phandle_iterator it;
@@ -573,8 +572,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 		return -ENOMEM;
 	data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN);
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	data->base = devm_ioremap_resource(dev, res);
+	data->base = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(data->base))
 		return PTR_ERR(data->base);
 
--
2.17.1
[PATCH 6/6] iommu/rockchip: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li
---
 drivers/iommu/rockchip-iommu.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index b33cdd5aad81..c6d50396f4c2 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1138,7 +1138,6 @@ static int rk_iommu_probe(struct platform_device *pdev)
 {
 	struct device *dev = &pdev->dev;
 	struct rk_iommu *iommu;
-	struct resource *res;
 	int num_res = pdev->num_resources;
 	int err, i;
 
@@ -1156,10 +1155,7 @@ static int rk_iommu_probe(struct platform_device *pdev)
 		return -ENOMEM;
 
 	for (i = 0; i < num_res; i++) {
-		res = platform_get_resource(pdev, IORESOURCE_MEM, i);
-		if (!res)
-			continue;
-		iommu->bases[i] = devm_ioremap_resource(&pdev->dev, res);
+		iommu->bases[i] = devm_platform_ioremap_resource(pdev, i);
 		if (IS_ERR(iommu->bases[i]))
 			continue;
 		iommu->num_mmu++;
--
2.17.1
[PATCH 3/6] iommu/qcom: convert to devm_platform_ioremap_resource
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li
---
 drivers/iommu/qcom_iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 52f38292df5b..bf94d4d67da4 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -709,7 +709,6 @@ static int qcom_iommu_ctx_probe(struct platform_device *pdev)
 	struct qcom_iommu_ctx *ctx;
 	struct device *dev = &pdev->dev;
 	struct qcom_iommu_dev *qcom_iommu = dev_get_drvdata(dev->parent);
-	struct resource *res;
 	int ret, irq;
 
 	ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
@@ -719,8 +718,7 @@ static int qcom_iommu_ctx_probe(struct platform_device *pdev)
 	ctx->dev = dev;
 	platform_set_drvdata(pdev, ctx);
 
-	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-	ctx->base = devm_ioremap_resource(dev, res);
+	ctx->base = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(ctx->base))
 		return PTR_ERR(ctx->base);
 
--
2.17.1
Re: [PATCH] virtio-mmio: convert to devm_platform_ioremap_resource
Please ignore this patch. Thx!
Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check
On 12/17/19 2:19 PM, Chen, Yian wrote:
>> Regardless, I have two other patches in this series that could resolve
>> the problem for me and probably other people. I'd just like at least
>> one of the three patches to get merged so that my machine boots when
>> the original commit f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region
>> in BIOS is reported as reserved") gets released.
>
> When a firmware bug appears, the potential problem may be beyond the
> scope of its visible impact, so introducing a workaround in the official
> implementation should be considered very carefully.

Agreed. I think that in the RMRR case, it wouldn't surprise me if these
problems are already occurring, and we just didn't know about it, so I'd
like to think about sane workarounds. I only noticed it on a kexec. Not
sure how many people with similarly-broken firmware are kexecing kernels
on linus/master kernels yet.

Specifically, my firmware reports an RMRR with start == 0 and end == 0
(end should be page-aligned-minus-one). The only reason commit
f036c7fa0ab6 didn't catch it on a full reboot is that trim_bios_range()
reserved the first page, assuming that the BIOS meant to reserve it but
just didn't tell us in the e820 map. My firmware didn't mark that first
page E820_RESERVED. On a kexec, the range that got trimmed was
0x100-0xfff instead of 0x000-0xfff. In both cases, the kernel won't use
the region the broken RMRR points to, but in the kexec case, it wasn't
E820_RESERVED, so the new commit aborted the DMAR setup.

> If the workaround is really needed at this point, I would recommend
> adding a WARN_TAINT with TAINT_FIRMWARE_WORKAROUND, to indicate that the
> workaround is in place.

Sounds good. I can rework the patchset so that whenever I skip an RMRR
entry or whatnot, I'll put in a WARN_TAINT. I see a few other examples
in dmar.c to work from.

If any of the three changes are too aggressive, I'm OK with you all
taking just one of them. I'd like to be able to kexec with the new
kernel. I'm likely not the only one with bad firmware, and any bug that
only shows up on a kexec is often a pain to detect.

Thanks,

Barret
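The kind of sanity check discussed in this thread can be sketched in
user-space C. This is an illustration under stated assumptions, not the code
from the patch series or from the kernel: toy_rmrr_is_sane() is a
hypothetical helper, a 4 KiB page size is assumed, and the three conditions
are simply derived from the description above (page-aligned start, end at
page-aligned-minus-one, non-empty range).

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096ULL /* assumed page size for this sketch */

/*
 * Hypothetical sanity check for a firmware-reported reserved region.
 * base is the first byte of the region, end is the last byte (inclusive).
 */
static bool toy_rmrr_is_sane(uint64_t base, uint64_t end)
{
	if (base % TOY_PAGE_SIZE)      /* start must be page aligned */
		return false;
	if ((end + 1) % TOY_PAGE_SIZE) /* end must be the last byte of a page */
		return false;
	return end > base;             /* region must cover at least one page */
}
```

With these rules the entry described above (start == 0, end == 0) is
rejected, while a region covering exactly the first page (0x000-0xfff) is
accepted. A caller that skips a rejected entry could additionally emit a
firmware-workaround warning, as suggested in the reply.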
Re: [PATCH v3 5/5] drm/msm/a6xx: Support split pagetables
On 2019-12-16 22:07, Jordan Crouse wrote:
> Attempt to enable split pagetables if the arm-smmu driver supports it.
> This will move the default address space from the default region to
> the address range assigned to TTBR1. The behavior should be transparent
> to the driver for now but it gets the default buffers out of the way
> when we want to start swapping TTBR0 for context-specific pagetables.
>
> Signed-off-by: Jordan Crouse
> ---
>  drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 ++-
>  1 file changed, 51 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> index 5dc0b2c..1c6da93 100644
> --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
> @@ -811,6 +811,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu *gpu)
>  	return (unsigned long)busy_time;
>  }
>
> +static struct msm_gem_address_space *
> +a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device *pdev)
> +{
> +	struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
> +	struct msm_gem_address_space *aspace;
> +	struct msm_mmu *mmu;
> +	u64 start, size;
> +	u32 val = 1;
> +	int ret;
> +
> +	if (!iommu)
> +		return ERR_PTR(-ENOMEM);
> +
> +	/*
> +	 * Try to request split pagetables - the request has to be made before
> +	 * the domain is attached
> +	 */
> +	iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
> +
> +	mmu = msm_iommu_new(&pdev->dev, iommu);
> +	if (IS_ERR(mmu)) {
> +		iommu_domain_free(iommu);
> +		return ERR_CAST(mmu);
> +	}
> +
> +	/*
> +	 * After the domain is attached, see if the split tables were actually
> +	 * successful.
> +	 */
> +	ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
> +	if (!ret && val) {
> +		/*
> +		 * The aperture start will be at the beginning of the TTBR1
> +		 * space so use that as a base
> +		 */
> +		start = iommu->geometry.aperture_start;
> +		size = 0x;

This should be the va_end and not the size.

> +	} else {
> +		/* Otherwise use the legacy 32 bit region */
> +		start = SZ_16M;
> +		size = 0x - SZ_16M;

Same as above.

> +	}
> +
> +	aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
> +	if (IS_ERR(aspace))
> +		iommu_domain_free(iommu);
> +
> +	return aspace;
> +}
> +
>  static const struct adreno_gpu_funcs funcs = {
>  	.base = {
>  		.get_param = adreno_get_param,
> @@ -832,7 +882,7 @@ static const struct adreno_gpu_funcs funcs = {
>  #if defined(CONFIG_DRM_MSM_GPU_STATE)
>  		.gpu_state_get = a6xx_gpu_state_get,
>  		.gpu_state_put = a6xx_gpu_state_put,
> -		.create_address_space = adreno_iommu_create_address_space,
> +		.create_address_space = a6xx_create_address_space,
>  #endif
>  	},
>  	.get_timestamp = a6xx_get_timestamp,
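The review comment is about whether the last argument is read as a size or as an end address. A small sketch of why that matters follows. The 4 GiB limit used in the usage note is purely an assumption for illustration (the constants in the quoted patch are not legible here), and struct va_range plus both helpers are hypothetical names, not part of the msm driver.

```c
#include <stdint.h>

#define SZ_16M (16ULL << 20)

struct va_range {
	uint64_t start;	/* first valid address */
	uint64_t end;	/* one past the last valid address */
};

/* Hypothetical helper: the caller hands over a start address and a size. */
static struct va_range va_range_from_size(uint64_t start, uint64_t size)
{
	struct va_range r = { start, start + size };
	return r;
}

/* Hypothetical helper: the caller hands over a start and an end address. */
static struct va_range va_range_from_end(uint64_t start, uint64_t end)
{
	struct va_range r = { start, end };
	return r;
}
```

If a caller computes "limit - SZ_16M" as a size but the callee treats it as an end address, the resulting range stops SZ_16M short of the intended limit. For an assumed limit of 4 GiB, va_range_from_size(SZ_16M, limit - SZ_16M) ends at 4 GiB, whereas va_range_from_end(SZ_16M, limit - SZ_16M) ends 16 MiB earlier.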
RE: RE: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
> -----Original Message-----
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: December 23, 2019 21:05
> To: Jim,Yan; Jerry Snitselaar
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: RE: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
> mismatch
>
> Hi,
>
> On 2019/12/23 15:59, Jim,Yan wrote:
> >> -----Original Message-----
> >> From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
> >> Sent: December 20, 2019 17:23
> >> To: Jim,Yan
> >> Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
> >> linux-ker...@vger.kernel.org
> >> Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
> >> mismatch
> >>
> >> On Fri Dec 20 19, jimyan wrote:
> >>> On a system with an Intel PCIe port configured as a nvme host
> >>> device, iommu initialization fails with
> >>>
> >>> DMAR: Device scope type does not match for :80:00.0
> >>>
> >>> This is because the DMAR table reports this device as having scope 2
> >>> (ACPI_DMAR_SCOPE_TYPE_BRIDGE):
> >>>
> >>
> >> Isn't that a problem to be fixed in the DMAR table then?
> >>
> >>> but the device has a type 0 PCI header:
> >>> 80:00.0 Class 0600: Device 8086:2020 (rev 06)
> >>> 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
> >>> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
> >>> 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
> >>>
> >>> VT-d works perfectly on this system, so there's no reason to bail
> >>> out on initialization due to this apparent scope mismatch. Add the
> >>> class 0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing
> >>> DMAR initialization for non-bridge PCI devices listed with scope
> >>> bridge.
> >>>
> >>> Signed-off-by: jimyan
> >>> ---
> >>>  drivers/iommu/dmar.c | 1 +
> >>>  1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> >>> index eecd6a421667..9faf2f0e0237 100644
> >>> --- a/drivers/iommu/dmar.c
> >>> +++ b/drivers/iommu/dmar.c
> >>> @@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
> >>>  		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
> >>>  		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
> >>>  		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> >>> +		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
> >>>  		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
> >>>  			pr_warn("Device scope type does not match for %s\n",
> >>>  				pci_name(info->dev));
> >>> --
> >>> 2.11.0
> >>>
> > Actually this patch is similar to the commit: ffb2d1eb88c3("iommu/vt-d:
> > Don't reject NTB devices due to scope mismatch"). Besides, modifying DMAR
> > table need OEM update BIOS. It is hard to implement.
> >
>
> For both cases, a quirk flag seems to be more reasonable, so that unrelated
> devices will not be impacted.
>
> Best regards,
> baolu

Hi Baolu,

Thanks for your advice. I have modified the patch as follows.

On a system with an Intel PCIe port configured as a nvme host device,
iommu initialization fails with

DMAR: Device scope type does not match for :80:00.0

This is because the DMAR table reports this device as having scope 2
(ACPI_DMAR_SCOPE_TYPE_BRIDGE):

but the device has a type 0 PCI header:
80:00.0 Class 0600: Device 8086:2020 (rev 06)
00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00

VT-d works perfectly on this system, so there's no reason to bail out
on initialization due to this apparent scope mismatch. Add the class
0x06 ("PCI_BASE_CLASS_BRIDGE") as a heuristic for allowing DMAR
initialization for non-bridge PCI devices listed with scope bridge.

Signed-off-by: jimyan

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index eecd6a421667..50c92eb23ee4 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -244,7 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
 		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
 		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
 		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
-		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
+		      info->dev->class >> 16 != PCI_BASE_CLASS_BRIDGE))) {
 			pr_warn("Device scope type does not match for %s\n",
 				pci_name(info->dev));
 			return -EINVAL;

Jim
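To make the difference between the two versions of the check concrete: the class code in the config dump above is 0x060000 (base class 06, sub-class 00, prog-if 00). The sketch below contrasts matching on `class >> 8` against two specific class values with matching on `class >> 16` against the base class only. The three macro values are the ones in include/linux/pci_ids.h; the helper names are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/linux/pci_ids.h */
#define PCI_BASE_CLASS_BRIDGE	0x06
#define PCI_CLASS_BRIDGE_HOST	0x0600
#define PCI_CLASS_BRIDGE_OTHER	0x0680

/* 24-bit class layout: base class [23:16], sub-class [15:8], prog-if [7:0] */
static uint8_t pci_base_class(uint32_t class24)
{
	return (class24 >> 16) & 0xff;
}

static uint16_t pci_class_and_subclass(uint32_t class24)
{
	return (class24 >> 8) & 0xffff;
}

/* First version of the patch: tolerate host bridges and "other" bridges. */
static bool bridge_scope_tolerated_v1(uint32_t class24)
{
	uint16_t c = pci_class_and_subclass(class24);

	return c == PCI_CLASS_BRIDGE_HOST || c == PCI_CLASS_BRIDGE_OTHER;
}

/* Revised version: tolerate anything whose base class is "bridge". */
static bool bridge_scope_tolerated_v2(uint32_t class24)
{
	return pci_base_class(class24) == PCI_BASE_CLASS_BRIDGE;
}
```

Both versions accept the device from the dump (0x060000) and an NTB-style device (0x068000). The revised check is broader: it would also accept, for example, a class 0x0601 device with a type 0 header, which the first version would still reject.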
[PATCH 1/1] iommu/vt-d: Add a quirk flag for scope mismatched devices
We expect devices with endpoint scope to have normal PCI headers, and
devices with bridge scope to have bridge PCI headers. However, some PCI
devices may be listed in the DMAR table with bridge scope, even though
they have a normal PCI header. Add a quirk flag for those special devices.

Cc: Roland Dreier
Cc: Jim Yan
Signed-off-by: Lu Baolu
---
 drivers/iommu/dmar.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index fb30d5053664..fc24abc70a05 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -65,6 +65,26 @@ static void free_iommu(struct intel_iommu *iommu);

 extern const struct iommu_ops intel_iommu_ops;

+static int scope_mismatch_quirk;
+static void quirk_dmar_scope_mismatch(struct pci_dev *dev)
+{
+	pci_info(dev, "scope mismatch ignored\n");
+	scope_mismatch_quirk = 1;
+}
+
+/*
+ * We expect devices with endpoint scope to have normal PCI
+ * headers, and devices with bridge scope to have bridge PCI
+ * headers. However some PCI devices may be listed in the
+ * DMAR table with bridge scope, even though they have a
+ * normal PCI header. We don't declare a scope mismatch for
+ * below special cases.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2f0d,	/* NTB devices */
+			 quirk_dmar_scope_mismatch);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2020,	/* NVME host */
+			 quirk_dmar_scope_mismatch);
+
 static void dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
 {
 	/*
@@ -231,20 +251,9 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
 		if (!dmar_match_pci_path(info, scope->bus, path, level))
 			continue;

-		/*
-		 * We expect devices with endpoint scope to have normal PCI
-		 * headers, and devices with bridge scope to have bridge PCI
-		 * headers. However PCI NTB devices may be listed in the
-		 * DMAR table with bridge scope, even though they have a
-		 * normal PCI header. NTB devices are identified by class
-		 * "BRIDGE_OTHER" (0680h) - we don't declare a socpe mismatch
-		 * for this special case.
-		 */
-		if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT &&
-		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
-		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
-		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
-		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
+		if (!scope_mismatch_quirk &&
+		    ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT) ^
+		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL))) {
 			pr_warn("Device scope type does not match for %s\n",
 				pci_name(info->dev));
 			return -EINVAL;
--
2.17.1
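The XOR in the new condition can be read as "endpoint scope and normal header must agree". A standalone sketch of that truth table follows, assuming only endpoint and bridge scope entries ever reach the check. The enum and function names are hypothetical, and the quirk is passed in as a parameter even though the patch keeps it in a single static int that applies to every device once any quirked device has been seen.

```c
#include <stdbool.h>

enum scope_type { SCOPE_ENDPOINT = 1, SCOPE_BRIDGE = 2 };	/* ACPI DMAR scope types */
enum hdr_type { HDR_NORMAL = 0, HDR_BRIDGE = 1 };		/* PCI header types */

/*
 * Hypothetical stand-in for the condition in dmar_insert_dev_scope():
 * a mismatch is reported when exactly one of "scope is endpoint" and
 * "header is normal" holds, unless the quirk flag suppresses the check.
 */
static bool scope_mismatch(bool quirk, enum scope_type scope, enum hdr_type hdr)
{
	bool endpoint_scope = (scope == SCOPE_ENDPOINT);
	bool normal_header = (hdr == HDR_NORMAL);

	return !quirk && (endpoint_scope != normal_header);
}
```

Without the quirk, the bridge-scope/normal-header combination reported in this thread is a mismatch; with the quirk set, no combination is.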
Re: RE: RE: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
Hi Jim,

On 2019/12/24 11:24, Jim,Yan wrote:
> > -----Original Message-----
> > From: Lu Baolu [mailto:baolu...@linux.intel.com]
> > Sent: December 23, 2019 21:05
> > To: Jim,Yan; Jerry Snitselaar
> > Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> > Subject: Re: RE: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
> > mismatch
> >
> > Hi,
> >
> > On 2019/12/23 15:59, Jim,Yan wrote:
> > >> -----Original Message-----
> > >> From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
> > >> Sent: December 20, 2019 17:23
> > >> To: Jim,Yan
> > >> Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
> > >> linux-ker...@vger.kernel.org
> > >> Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
> > >> mismatch
> > >>
> > >> On Fri Dec 20 19, jimyan wrote:
> > >>> On a system with an Intel PCIe port configured as a nvme host
> > >>> device, iommu initialization fails with
> > >>>
> > >>> DMAR: Device scope type does not match for :80:00.0
> > >>>
> > >>> This is because the DMAR table reports this device as having scope 2
> > >>> (ACPI_DMAR_SCOPE_TYPE_BRIDGE):
> > >>>
> > >>
> > >> Isn't that a problem to be fixed in the DMAR table then?
> > >>
> > >>> but the device has a type 0 PCI header:
> > >>> 80:00.0 Class 0600: Device 8086:2020 (rev 06)
> > >>> 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
> > >>> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> > >>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
> > >>> 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
> > >>>
> > >>> VT-d works perfectly on this system, so there's no reason to bail
> > >>> out on initialization due to this apparent scope mismatch. Add the
> > >>> class 0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing
> > >>> DMAR initialization for non-bridge PCI devices listed with scope
> > >>> bridge.
> > >>>
> > >>> Signed-off-by: jimyan
> > >>> ---
> > >>>  drivers/iommu/dmar.c | 1 +
> > >>>  1 file changed, 1 insertion(+)
> > >>>
> > >>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> > >>> index eecd6a421667..9faf2f0e0237 100644
> > >>> --- a/drivers/iommu/dmar.c
> > >>> +++ b/drivers/iommu/dmar.c
> > >>> @@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
> > >>>  		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
> > >>>  		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
> > >>>  		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> > >>> +		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
> > >>>  		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
> > >>>  			pr_warn("Device scope type does not match for %s\n",
> > >>>  				pci_name(info->dev));
> > >>> --
> > >>> 2.11.0
> > >>>
> > > Actually this patch is similar to the commit: ffb2d1eb88c3("iommu/vt-d:
> > > Don't reject NTB devices due to scope mismatch"). Besides, modifying DMAR
> > > table need OEM update BIOS. It is hard to implement.
> > >
> >
> > For both cases, a quirk flag seems to be more reasonable, so that unrelated
> > devices will not be impacted.
> >
> > Best regards,
> > baolu
>
> Hi Baolu,
>
> Thanks for your advice. I have modified the patch as follows.

I just posted a patch for both NTB and NVME cases. Can you please take a
look? Does it work for you?

Best regards,
baolu

> On a system with an Intel PCIe port configured as a nvme host device,
> iommu initialization fails with
>
> DMAR: Device scope type does not match for :80:00.0
>
> This is because the DMAR table reports this device as having scope 2
> (ACPI_DMAR_SCOPE_TYPE_BRIDGE):
>
> but the device has a type 0 PCI header:
> 80:00.0 Class 0600: Device 8086:2020 (rev 06)
> 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
> 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
>
> VT-d works perfectly on this system, so there's no reason to bail out
> on initialization due to this apparent scope mismatch. Add the class
> 0x06 ("PCI_BASE_CLASS_BRIDGE") as a heuristic for allowing DMAR
> initialization for non-bridge PCI devices listed with scope bridge.
>
> Signed-off-by: jimyan
>
> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
> index eecd6a421667..50c92eb23ee4 100644
> --- a/drivers/iommu/dmar.c
> +++ b/drivers/iommu/dmar.c
> @@ -244,7 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
>  		     info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
>  		    (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
>  		     (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> -		      info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
> +		      info->dev->class >> 16 != PCI_BASE_CLASS_BRIDGE))) {
>  			pr_warn("Device scope type does not match for %s\n",
>  				pci_name(info->dev));
>  			return -EINVAL;
>
> Jim
[PATCH v5 1/9] iommu/vt-d: Identify domains using first level page table
This checks whether a domain should use the first level page table
for map/unmap and marks it in the domain structure.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 34723f6be672..71ad5e5feae2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -307,6 +307,14 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN		BIT(1)

+/*
+ * When VT-d works in the scalable mode, it allows DMA translation to
+ * happen through either first level or second level page table. This
+ * bit marks that the DMA translation for the domain goes through the
+ * first level page table, otherwise, it goes through the second level.
+ */
+#define DOMAIN_FLAG_USE_FIRST_LEVEL		BIT(2)
+
 #define for_each_domain_iommu(idx, domain)			\
 	for (idx = 0; idx < g_num_of_iommus; idx++)		\
 		if (domain->iommu_refcnt[idx])
@@ -1714,6 +1722,35 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 #endif
 }

+/*
+ * Check and return whether first level is used by default for
+ * DMA translation. Currently, we make it off by setting
+ * first_level_support = 0, and will change it to -1 after all
+ * map/unmap paths support first level page table.
+ */
+static bool first_level_by_default(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	static int first_level_support = 0;
+
+	if (likely(first_level_support != -1))
+		return first_level_support;
+
+	first_level_support = 1;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		if (!sm_supported(iommu) || !ecap_flts(iommu->ecap)) {
+			first_level_support = 0;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return first_level_support;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
 	struct dmar_domain *domain;
@@ -1725,6 +1762,8 @@ static struct dmar_domain *alloc_domain(int flags)
 	memset(domain, 0, sizeof(*domain));
 	domain->nid = NUMA_NO_NODE;
 	domain->flags = flags;
+	if (first_level_by_default())
+		domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
 	domain->has_iotlb_device = false;
 	INIT_LIST_HEAD(&domain->devices);

--
2.17.1
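The tri-state caching in first_level_by_default() (-1 = not yet evaluated, 0 = off, 1 = on) may be easier to follow outside the kernel. Below is a minimal userspace sketch of the same pattern; struct iommu_caps and both function names are invented for this illustration, and the IOMMU list is modeled as a plain array.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IOMMU capability summary. */
struct iommu_caps {
	bool scalable_mode;	/* stands in for sm_supported() */
	bool first_level;	/* stands in for ecap_flts() */
};

/* First level is usable only if every unit supports it. */
static bool all_support_first_level(const struct iommu_caps *units, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (!units[i].scalable_mode || !units[i].first_level)
			return false;
	return true;
}

/*
 * Cached decision: *cache is -1 until the first evaluation, then 0 or 1.
 * Starting the cache at 0, as the patch currently does, keeps the feature
 * off without ever scanning the units.
 */
static int cached_first_level(int *cache, const struct iommu_caps *units,
			      size_t n)
{
	if (*cache == -1)
		*cache = all_support_first_level(units, n) ? 1 : 0;
	return *cache;
}
```

Once the cache holds 0 or 1, later calls return that value regardless of what the units report, which matches the "evaluate once" behavior of the static variable in the patch.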
[PATCH v5 2/9] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr
This adds the Intel VT-d specific callback for setting the
DOMAIN_ATTR_NESTING domain attribute. It is necessary to let the VT-d
driver know that the domain represents a virtual machine which requires
the IOMMU hardware to support nested translation mode. Return success if
the IOMMU hardware supports nested mode, otherwise failure.

Signed-off-by: Yi Sun
Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 56 +
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 71ad5e5feae2..35f65628202c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,6 +315,12 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_USE_FIRST_LEVEL		BIT(2)

+/*
+ * Domain represents a virtual machine which demands iommu nested
+ * translation mode support.
+ */
+#define DOMAIN_FLAG_NESTING_MODE		BIT(3)
+
 #define for_each_domain_iommu(idx, domain)			\
 	for (idx = 0; idx < g_num_of_iommus; idx++)		\
 		if (domain->iommu_refcnt[idx])
@@ -5640,6 +5646,24 @@ static inline bool iommu_pasid_support(void)
 	return ret;
 }

+static inline bool nested_mode_support(void)
+{
+	struct dmar_drhd_unit *drhd;
+	struct intel_iommu *iommu;
+	bool ret = true;
+
+	rcu_read_lock();
+	for_each_active_iommu(iommu, drhd) {
+		if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+			ret = false;
+			break;
+		}
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
 	if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -6018,10 +6042,42 @@ static bool intel_iommu_is_attach_deferred(struct iommu_domain *domain,
 	return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
 }

+static int
+intel_iommu_domain_set_attr(struct iommu_domain *domain,
+			    enum iommu_attr attr, void *data)
+{
+	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+	unsigned long flags;
+	int ret = 0;
+
+	if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+		return -EINVAL;
+
+	switch (attr) {
+	case DOMAIN_ATTR_NESTING:
+		spin_lock_irqsave(&device_domain_lock, flags);
+		if (nested_mode_support() &&
+		    list_empty(&dmar_domain->devices)) {
+			dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
+			dmar_domain->flags &= ~DOMAIN_FLAG_USE_FIRST_LEVEL;
+		} else {
+			ret = -ENODEV;
+		}
+		spin_unlock_irqrestore(&device_domain_lock, flags);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
 	.capable		= intel_iommu_capable,
 	.domain_alloc		= intel_iommu_domain_alloc,
 	.domain_free		= intel_iommu_domain_free,
+	.domain_set_attr	= intel_iommu_domain_set_attr,
 	.attach_dev		= intel_iommu_attach_device,
 	.detach_dev		= intel_iommu_detach_device,
 	.aux_attach_dev		= intel_iommu_aux_attach_device,
--
2.17.1
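The flag handling in the DOMAIN_ATTR_NESTING case boils down to: refuse unless the hardware supports nesting and no device is attached yet, then set the nesting bit and clear the first-level bit. A stripped-down sketch, without locking and with a hypothetical struct toy_domain in place of struct dmar_domain; the two flag values mirror BIT(2) and BIT(3) from the patches in this series.

```c
#include <errno.h>
#include <stdbool.h>

#define DOMAIN_FLAG_USE_FIRST_LEVEL	(1u << 2)
#define DOMAIN_FLAG_NESTING_MODE	(1u << 3)

/* Hypothetical, simplified domain: just the flags and a device count. */
struct toy_domain {
	unsigned int flags;
	unsigned int nr_devices;
};

/*
 * Mirrors the decision in the patch: nesting can only be requested on a
 * domain with no devices attached, and only if the hardware supports it.
 * On failure the flags are left untouched.
 */
static int toy_set_nesting(struct toy_domain *d, bool hw_nested)
{
	if (!hw_nested || d->nr_devices != 0)
		return -ENODEV;

	d->flags |= DOMAIN_FLAG_NESTING_MODE;
	d->flags &= ~DOMAIN_FLAG_USE_FIRST_LEVEL;
	return 0;
}
```

Clearing DOMAIN_FLAG_USE_FIRST_LEVEL here means a nested domain never takes the first-level IOVA path that the other patches in the series add.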
[PATCH v5 4/9] iommu/vt-d: Setup pasid entries for iova over first level
Intel VT-d in scalable mode supports two types of page tables for IOVA translation: first level and second level. The IOMMU driver can choose one from both for IOVA translation according to the use case. This sets up the pasid entry if a domain is selected to use the first-level page table for iova translation. Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu.c | 57 + include/linux/intel-iommu.h | 16 +++ 2 files changed, 62 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 35f65628202c..071cbc172ce8 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain *domain) return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY; } +static inline bool domain_use_first_level(struct dmar_domain *domain) +{ + return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL; +} + static inline int domain_pfn_supported(struct dmar_domain *domain, unsigned long pfn) { @@ -932,6 +937,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain, domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE); pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE; + if (domain_use_first_level(domain)) + pteval |= DMA_FL_PTE_XD; if (cmpxchg64(&pte->val, 0ULL, pteval)) /* Someone else set it while we were thinking; use theirs. 
*/ free_pgtable_page(tmp_page); @@ -2281,17 +2288,20 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, unsigned long sg_res = 0; unsigned int largepage_lvl = 0; unsigned long lvl_pages = 0; + u64 attr; BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1)); if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0) return -EINVAL; - prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP; + attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP); + if (domain_use_first_level(domain)) + attr |= DMA_FL_PTE_PRESENT | DMA_FL_PTE_XD; if (!sg) { sg_res = nr_pages; - pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | prot; + pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr; } while (nr_pages > 0) { @@ -2303,7 +2313,7 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, sg_res = aligned_nrpages(sg->offset, sg->length); sg->dma_address = ((dma_addr_t)iov_pfn << VTD_PAGE_SHIFT) + pgoff; sg->dma_length = sg->length; - pteval = (sg_phys(sg) - pgoff) | prot; + pteval = (sg_phys(sg) - pgoff) | attr; phys_pfn = pteval >> VTD_PAGE_SHIFT; } @@ -2515,6 +2525,36 @@ dmar_search_domain_by_dev_info(int segment, int bus, int devfn) return NULL; } +static int domain_setup_first_level(struct intel_iommu *iommu, + struct dmar_domain *domain, + struct device *dev, + int pasid) +{ + int flags = PASID_FLAG_SUPERVISOR_MODE; + struct dma_pte *pgd = domain->pgd; + int agaw, level; + + /* +* Skip top levels of page tables for iommu which has +* less agaw than default. Unnecessary for PT mode. +*/ + for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) { + pgd = phys_to_virt(dma_pte_addr(pgd)); + if (!dma_pte_present(pgd)) + return -ENOMEM; + } + + level = agaw_to_level(agaw); + if (level != 4 && level != 5) + return -EINVAL; + + flags |= (level == 5) ? 
PASID_FLAG_FL5LP : 0; + + return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid, +domain->iommu_did[iommu->seq_id], +flags); +} + static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, int bus, int devfn, struct device *dev, @@ -2614,6 +2654,9 @@ static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu, if (hw_pass_through && domain_type_is_si(domain)) ret = intel_pasid_setup_pass_through(iommu, domain, dev, PASID_RID2PASID); + else if (domain_use_first_level(domain)) + ret = domain_setup_first_level(iommu,
[PATCH v5 7/9] iommu/vt-d: Update first level super page capability
First-level translation may map input addresses to 4-KByte pages,
2-MByte pages, or 1-GByte pages. Support for 4-KByte pages and
2-MByte pages is mandatory for first-level translation. Hardware
support for 1-GByte pages is reported through the FL1GP field in the
Capability Register.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1ebf5ed460cf..34e619318f64 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -685,11 +685,12 @@ static int domain_update_iommu_snooping(struct intel_iommu *skip)
 	return ret;
 }

-static int domain_update_iommu_superpage(struct intel_iommu *skip)
+static int domain_update_iommu_superpage(struct dmar_domain *domain,
+					 struct intel_iommu *skip)
 {
 	struct dmar_drhd_unit *drhd;
 	struct intel_iommu *iommu;
-	int mask = 0xf;
+	int mask = 0x3;

 	if (!intel_iommu_superpage) {
 		return 0;
@@ -699,7 +700,13 @@ static int domain_update_iommu_superpage(struct intel_iommu *skip)
 	rcu_read_lock();
 	for_each_active_iommu(iommu, drhd) {
 		if (iommu != skip) {
-			mask &= cap_super_page_val(iommu->cap);
+			if (domain && domain_use_first_level(domain)) {
+				if (!cap_fl1gp_support(iommu->cap))
+					mask = 0x1;
+			} else {
+				mask &= cap_super_page_val(iommu->cap);
+			}
+
 			if (!mask)
 				break;
 		}
@@ -714,7 +721,7 @@ static void domain_update_iommu_cap(struct dmar_domain *domain)
 {
 	domain_update_iommu_coherency(domain);
 	domain->iommu_snooping = domain_update_iommu_snooping(NULL);
-	domain->iommu_superpage = domain_update_iommu_superpage(NULL);
+	domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
 }

 struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
@@ -4604,7 +4611,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
 			iommu->name);
 		return -ENXIO;
 	}
-	sp = domain_update_iommu_superpage(iommu) - 1;
+	sp = domain_update_iommu_superpage(NULL, iommu) - 1;
 	if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
 		pr_warn("%s: Doesn't support large page.\n",
 			iommu->name);
--
2.17.1
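The mask logic can be tried out in isolation. In the sketch below, bit 0 of the mask stands for 2-MByte pages and bit 1 for 1-GByte pages. It assumes the function ends by returning the 1-based index of the highest set bit, which the "- 1" in intel_iommu_add() hints at but which is not visible in the hunks above. struct unit_caps and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-IOMMU summary of the relevant capability bits. */
struct unit_caps {
	unsigned int sllps;	/* second-level large page bits (bit0=2M, bit1=1G) */
	bool fl1gp;		/* first-level 1-GByte page support */
};

/* 1-based index of the highest set bit, 0 if none (same idea as fls()). */
static int highest_bit(unsigned int v)
{
	int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/*
 * Mirrors the loop in the patch: for first-level domains 2M is mandatory
 * and 1G depends on FL1GP; for second-level domains the per-unit
 * capability bits are intersected.
 */
static int superpage_level(const struct unit_caps *u, size_t n, bool first_level)
{
	unsigned int mask = 0x3;

	for (size_t i = 0; i < n; i++) {
		if (first_level) {
			if (!u[i].fl1gp)
				mask = 0x1;
		} else {
			mask &= u[i].sllps;
		}
		if (!mask)
			break;
	}
	return highest_bit(mask);
}
```

A result of 2 means "up to 1-GByte pages", 1 means "2-MByte pages only", and 0 means no superpage support.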
[PATCH v5 5/9] iommu/vt-d: Flush PASID-based iotlb for iova over first level
When software has changed first-level tables, it should invalidate the affected IOTLB and the paging-structure-caches using the PASID- based-IOTLB Invalidate Descriptor defined in spec 6.5.2.4. Signed-off-by: Lu Baolu --- drivers/iommu/dmar.c| 41 +++ drivers/iommu/intel-iommu.c | 56 +++-- include/linux/intel-iommu.h | 2 ++ 3 files changed, 84 insertions(+), 15 deletions(-) diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index 3acfa6a25fa2..fb30d5053664 100644 --- a/drivers/iommu/dmar.c +++ b/drivers/iommu/dmar.c @@ -1371,6 +1371,47 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 sid, u16 pfsid, qi_submit_sync(&desc, iommu); } +/* PASID-based IOTLB invalidation */ +void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr, +unsigned long npages, bool ih) +{ + struct qi_desc desc = {.qw2 = 0, .qw3 = 0}; + + /* +* npages == -1 means a PASID-selective invalidation, otherwise, +* a positive value for Page-selective-within-PASID invalidation. +* 0 is not a valid input. +*/ + if (WARN_ON(!npages)) { + pr_err("Invalid input npages = %ld\n", npages); + return; + } + + if (npages == -1) { + desc.qw0 = QI_EIOTLB_PASID(pasid) | + QI_EIOTLB_DID(did) | + QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) | + QI_EIOTLB_TYPE; + desc.qw1 = 0; + } else { + int mask = ilog2(__roundup_pow_of_two(npages)); + unsigned long align = (1ULL << (VTD_PAGE_SHIFT + mask)); + + if (WARN_ON_ONCE(!ALIGN(addr, align))) + addr &= ~(align - 1); + + desc.qw0 = QI_EIOTLB_PASID(pasid) | + QI_EIOTLB_DID(did) | + QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) | + QI_EIOTLB_TYPE; + desc.qw1 = QI_EIOTLB_ADDR(addr) | + QI_EIOTLB_IH(ih) | + QI_EIOTLB_AM(mask); + } + + qi_submit_sync(&desc, iommu); +} + /* * Disable Queued Invalidation interface. 
*/ diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c index 071cbc172ce8..54db6bc0b281 100644 --- a/drivers/iommu/intel-iommu.c +++ b/drivers/iommu/intel-iommu.c @@ -1509,6 +1509,20 @@ static void iommu_flush_dev_iotlb(struct dmar_domain *domain, spin_unlock_irqrestore(&device_domain_lock, flags); } +static void domain_flush_piotlb(struct intel_iommu *iommu, + struct dmar_domain *domain, + u64 addr, unsigned long npages, bool ih) +{ + u16 did = domain->iommu_did[iommu->seq_id]; + + if (domain->default_pasid) + qi_flush_piotlb(iommu, did, domain->default_pasid, + addr, npages, ih); + + if (!list_empty(&domain->devices)) + qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih); +} + static void iommu_flush_iotlb_psi(struct intel_iommu *iommu, struct dmar_domain *domain, unsigned long pfn, unsigned int pages, @@ -1522,18 +1536,23 @@ static void iommu_flush_iotlb_psi(struct intel_iommu *iommu, if (ih) ih = 1 << 6; - /* -* Fallback to domain selective flush if no PSI support or the size is -* too big. -* PSI requires page size to be 2 ^ x, and the base address is naturally -* aligned to the size -*/ - if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap)) - iommu->flush.flush_iotlb(iommu, did, 0, 0, - DMA_TLB_DSI_FLUSH); - else - iommu->flush.flush_iotlb(iommu, did, addr | ih, mask, - DMA_TLB_PSI_FLUSH); + + if (domain_use_first_level(domain)) { + domain_flush_piotlb(iommu, domain, addr, pages, ih); + } else { + /* +* Fallback to domain selective flush if no PSI support or +* the size is too big. PSI requires page size to be 2 ^ x, +* and the base address is naturally aligned to the size. +*/ + if (!cap_pgsel_inv(iommu->cap) || + mask > cap_max_amask_val(iommu->cap)) + iommu->flush.flush_iotlb(iommu, did, 0, 0, + DMA_TLB_DSI_FLUSH); + else + iommu->flush.flush_iotlb(iommu, did, addr | ih, mask, + DMA_TLB_PSI_FLUSH); + } /* * In caching mode, changes of pages fr
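For the page-selective case, qi_flush_piotlb() derives an address mask from the page count and aligns the base address to the resulting power-of-two region. The arithmetic can be sketched as below. Note that ALIGN() in the kernel rounds a value up to the next multiple rather than testing alignment, so this sketch simply masks the address unconditionally; that yields the same address an "is it aligned?" test followed by masking would. struct psi_range and the helper names are made up for this illustration.

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT 12

/* ceil(log2(npages)); npages must be at least 1 and at most 2^31. */
static unsigned int order_of_pages(unsigned long npages)
{
	unsigned int order = 0;

	while ((1UL << order) < npages)
		order++;
	return order;
}

/* Hypothetical result type: aligned base address plus address-mask order. */
struct psi_range {
	uint64_t addr;
	unsigned int am;
};

/*
 * Round the page count up to a power of two and align the base address
 * down to that size, as page-selective invalidation requires.
 */
static struct psi_range psi_range_for(uint64_t addr, unsigned long npages)
{
	struct psi_range r;
	uint64_t align;

	r.am = order_of_pages(npages);
	align = 1ULL << (VTD_PAGE_SHIFT + r.am);
	r.addr = addr & ~(align - 1);
	return r;
}
```

For example, a 3-page flush starting at 0x12345000 becomes a 4-page (am = 2) flush starting at 0x12344000, so the invalidated region can be larger than what was asked for.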
[PATCH v5 9/9] iommu/vt-d: debugfs: Add support to show page table internals
Export page table internals of the domain attached to each device. Example of such dump on a Skylake machine: $ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct [ ... ] Device :00:14.0 with pasid 0 @0x15f3d9000 IOVA_PFNPML5E PML4E 0x8ced0 | 0x 0x00015f3da003 0x8ced1 | 0x 0x00015f3da003 0x8ced2 | 0x 0x00015f3da003 0x8ced3 | 0x 0x00015f3da003 0x8ced4 | 0x 0x00015f3da003 0x8ced5 | 0x 0x00015f3da003 0x8ced6 | 0x 0x00015f3da003 0x8ced7 | 0x 0x00015f3da003 0x8ced8 | 0x 0x00015f3da003 0x8ced9 | 0x 0x00015f3da003 PDPEPDE PTE 0x00015f3db003 0x00015f3dc003 0x8ced0003 0x00015f3db003 0x00015f3dc003 0x8ced1003 0x00015f3db003 0x00015f3dc003 0x8ced2003 0x00015f3db003 0x00015f3dc003 0x8ced3003 0x00015f3db003 0x00015f3dc003 0x8ced4003 0x00015f3db003 0x00015f3dc003 0x8ced5003 0x00015f3db003 0x00015f3dc003 0x8ced6003 0x00015f3db003 0x00015f3dc003 0x8ced7003 0x00015f3db003 0x00015f3dc003 0x8ced8003 0x00015f3db003 0x00015f3dc003 0x8ced9003 [ ... ] Signed-off-by: Lu Baolu --- drivers/iommu/intel-iommu-debugfs.c | 75 + drivers/iommu/intel-iommu.c | 4 +- include/linux/intel-iommu.h | 2 + 3 files changed, 79 insertions(+), 2 deletions(-) diff --git a/drivers/iommu/intel-iommu-debugfs.c b/drivers/iommu/intel-iommu-debugfs.c index 471f05d452e0..c1257bef553c 100644 --- a/drivers/iommu/intel-iommu-debugfs.c +++ b/drivers/iommu/intel-iommu-debugfs.c @@ -5,6 +5,7 @@ * Authors: Gayatri Kammela * Sohil Mehta * Jacob Pan + * Lu Baolu */ #include @@ -283,6 +284,77 @@ static int dmar_translation_struct_show(struct seq_file *m, void *unused) } DEFINE_SHOW_ATTRIBUTE(dmar_translation_struct); +static inline unsigned long level_to_directory_size(int level) +{ + return BIT_ULL(VTD_PAGE_SHIFT + VTD_STRIDE_SHIFT * (level - 1)); +} + +static inline void +dump_page_info(struct seq_file *m, unsigned long iova, u64 *path) +{ + seq_printf(m, "0x%013lx |\t0x%016llx\t0x%016llx\t0x%016llx\t0x%016llx\t0x%016llx\n", + iova >> VTD_PAGE_SHIFT, path[5], path[4], + path[3], path[2], path[1]); +} + +static 
void pgtable_walk_level(struct seq_file *m, struct dma_pte *pde, + int level, unsigned long start, + u64 *path) +{ + int i; + + if (level > 5 || level < 1) + return; + + for (i = 0; i < BIT_ULL(VTD_STRIDE_SHIFT); + i++, pde++, start += level_to_directory_size(level)) { + if (!dma_pte_present(pde)) + continue; + + path[level] = pde->val; + if (dma_pte_superpage(pde) || level == 1) + dump_page_info(m, start, path); + else + pgtable_walk_level(m, phys_to_virt(dma_pte_addr(pde)), + level - 1, start, path); + path[level] = 0; + } +} + +static int show_device_domain_translation(struct device *dev, void *data) +{ + struct dmar_domain *domain = find_domain(dev); + struct seq_file *m = data; + u64 path[6] = { 0 }; + + if (!domain) + return 0; + + seq_printf(m, "Device %s with pasid %d @0x%llx\n", + dev_name(dev), domain->default_pasid, + (u64)virt_to_phys(domain->pgd)); + seq_puts(m, "IOVA_PFN\t\tPML5E\t\t\tPML4E\t\t\tPDPE\t\t\tPDE\t\t\tPTE\n"); + + pgtable_walk_level(m, domain->pgd, domain->agaw + 2, 0, path); + seq_putc(m, '\n'); + + return 0; +} + +static int domain_translation_struct_show(struct seq_file *m, void *unused) +{ + unsigned long flags; + int ret; + + spin_lock_irqsave(&device_domain_lock, flags); + ret = bus_for_each_dev(&pci_bus_type, NULL, m, + show_device_domain_translation); + spin_unlock_irqrestore(&device_domain_lock, flags); + + return ret; +} +DEFINE_SHOW_ATTRIBUTE(domain_translation_struct); + #ifdef CONFIG_IRQ_REMAP static void ir_tbl_remap_entry_show(struct seq_file *m, struct intel_iommu *iommu) @@ -396,6 +468,9 @@ void __init intel_iommu_debugfs_init(void)
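The walk advances the IOVA by level_to_directory_size(level) per entry, i.e. by the amount of address space one entry at that level covers. A userspace sketch of that arithmetic, assuming 4 KiB pages and the usual 9-bit stride (512 entries per table); level_span() and level_index() are hypothetical names.

```c
#include <stdint.h>

#define VTD_PAGE_SHIFT		12
#define VTD_STRIDE_SHIFT	9

/* Address space covered by one entry at the given level (1 = leaf). */
static uint64_t level_span(int level)
{
	return 1ULL << (VTD_PAGE_SHIFT + VTD_STRIDE_SHIFT * (level - 1));
}

/* Index into the table at the given level for this IOVA. */
static unsigned int level_index(uint64_t iova, int level)
{
	unsigned int shift = VTD_PAGE_SHIFT + VTD_STRIDE_SHIFT * (level - 1);

	return (unsigned int)((iova >> shift) & ((1u << VTD_STRIDE_SHIFT) - 1));
}
```

For the first row of the example dump (IOVA_PFN 0x8ced0, so IOVA 0x8ced0000), this gives index 0 at level 4, 2 at level 3, 0x67 at level 2 and 0xd0 at level 1.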
[PATCH v5 6/9] iommu/vt-d: Make first level IOVA canonical
First-level translation restricts the input-address to a canonical
address (i.e., address bits 63:N have the same value as address
bit [N-1], where N is 48-bits with 4-level paging and 57-bits with
5-level paging). (section 3.6 in the spec)

This makes first level IOVA canonical by using IOVA with bit [N-1]
always cleared.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 54db6bc0b281..1ebf5ed460cf 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3505,8 +3505,21 @@ static unsigned long intel_alloc_iova(struct device *dev,
 {
 	unsigned long iova_pfn;

-	/* Restrict dma_mask to the width that the iommu can handle */
-	dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
+	/*
+	 * Restrict dma_mask to the width that the iommu can handle.
+	 * First-level translation restricts the input-address to a
+	 * canonical address (i.e., address bits 63:N have the same
+	 * value as address bit [N-1], where N is 48-bits with 4-level
+	 * paging and 57-bits with 5-level paging). Hence, skip bit
+	 * [N-1].
+	 */
+	if (domain_use_first_level(domain))
+		dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw - 1),
+				 dma_mask);
+	else
+		dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw),
+				 dma_mask);
+
 	/* Ensure we reserve the whole size-aligned region */
 	nrpages = __roundup_pow_of_two(nrpages);

--
2.17.1
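Why dropping one bit from the address width is enough can be checked with a small canonicality test. The sketch below is only an illustration: max_addr() is a simplified stand-in for a "largest address of this width" helper and is not the kernel's DOMAIN_MAX_ADDR() macro, and is_canonical() is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Largest address representable in `width` bits (width in 1..63). */
static uint64_t max_addr(unsigned int width)
{
	return (1ULL << width) - 1;
}

/*
 * An address is canonical for an N-bit input width when bits 63:N all
 * equal bit N-1, i.e. bits 63:(N-1) are either all zeros or all ones.
 */
static bool is_canonical(uint64_t addr, unsigned int n)
{
	uint64_t upper = addr >> (n - 1);

	return upper == 0 || upper == (UINT64_MAX >> (n - 1));
}
```

With N = 48, every address up to max_addr(47) is canonical, while max_addr(48) has bit 47 set and bits 63:48 clear, so it is not. Capping the allocator at N - 1 bits therefore keeps all IOVAs in the lower canonical half.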
[PATCH v5 3/9] iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup
Currently, intel_pasid_setup_first_level() uses 5-level paging for first
level translation if the CPUs use 5-level paging mode too. This makes
sense for SVA usages since the page table is shared between CPUs and
IOMMUs. But it makes no sense if we only want to use first level for
IOVA translation. Add a PASID_FLAG_FL5LP bit to the flags which
indicates whether the 5-level paging mode should be used.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-pasid.c | 7 ++-
 drivers/iommu/intel-pasid.h | 6 ++
 drivers/iommu/intel-svm.c   | 8 ++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 3cb569e76642..22b30f10b396 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -477,18 +477,15 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 		pasid_set_sre(pte);
 	}
 
-#ifdef CONFIG_X86
-	/* Both CPU and IOMMU paging mode need to match */
-	if (cpu_feature_enabled(X86_FEATURE_LA57)) {
+	if (flags & PASID_FLAG_FL5LP) {
 		if (cap_5lp_support(iommu->cap)) {
 			pasid_set_flpm(pte, 1);
 		} else {
-			pr_err("VT-d has no 5-level paging support for CPU\n");
+			pr_err("No 5-level paging support for first-level\n");
 			pasid_clear_entry(pte);
 			return -EINVAL;
 		}
 	}
-#endif /* CONFIG_X86 */
 
 	pasid_set_domain_id(pte, did);
 	pasid_set_address_width(pte, iommu->agaw);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index fc8cd8f17de1..92de6df24ccb 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -37,6 +37,12 @@
  */
 #define PASID_FLAG_SUPERVISOR_MODE	BIT(0)
 
+/*
+ * The PASID_FLAG_FL5LP flag Indicates using 5-level paging for first-
+ * level translation, otherwise, 4-level paging will be used.
+ */
+#define PASID_FLAG_FL5LP		BIT(1)
+
 struct pasid_dir_entry {
 	u64 val;
 };
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 04023033b79f..d7f2a5358900 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -364,7 +364,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 			ret = intel_pasid_setup_first_level(iommu, dev,
 					mm ? mm->pgd : init_mm.pgd,
 					svm->pasid, FLPT_DEFAULT_DID,
-					mm ? 0 : PASID_FLAG_SUPERVISOR_MODE);
+					(mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
+					(cpu_feature_enabled(X86_FEATURE_LA57) ?
+					 PASID_FLAG_FL5LP : 0));
 			spin_unlock(&iommu->lock);
 			if (ret) {
 				if (mm)
@@ -385,7 +387,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
 		ret = intel_pasid_setup_first_level(iommu, dev,
 						mm ? mm->pgd : init_mm.pgd,
 						svm->pasid, FLPT_DEFAULT_DID,
-						mm ? 0 : PASID_FLAG_SUPERVISOR_MODE);
+						(mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
+						(cpu_feature_enabled(X86_FEATURE_LA57) ?
+						 PASID_FLAG_FL5LP : 0));
 		spin_unlock(&iommu->lock);
 		if (ret) {
 			kfree(sdev);
-- 
2.17.1
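The change moves the "does the CPU use 5-level paging?" decision out of the
PASID setup routine and into its callers. A stand-alone sketch of that
split follows. The two flag values are taken from the patch; svm_bind_flags()
and choose_flpm() are hypothetical names, and the booleans stand in for
cpu_feature_enabled(X86_FEATURE_LA57) and cap_5lp_support(iommu->cap).

```c
#include <errno.h>
#include <stdbool.h>

#define PASID_FLAG_SUPERVISOR_MODE	(1u << 0)
#define PASID_FLAG_FL5LP		(1u << 1)

/* Caller side (SVA bind): the page table is shared with the CPU, so the
 * paging mode must follow the CPU. Hypothetical helper. */
static inline unsigned int svm_bind_flags(bool has_mm, bool cpu_la57)
{
	return (has_mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
	       (cpu_la57 ? PASID_FLAG_FL5LP : 0);
}

/* Callee side: honour the flag only, without consulting the CPU.
 * On success stores the first-level paging mode (0: 4-level, 1: 5-level)
 * in *flpm; fails if 5-level is requested but the IOMMU lacks it. */
static inline int choose_flpm(unsigned int flags, bool iommu_5lp, int *flpm)
{
	*flpm = 0;
	if (flags & PASID_FLAG_FL5LP) {
		if (!iommu_5lp)
			return -EINVAL;
		*flpm = 1;
	}
	return 0;
}
```

A caller that only wants IOVA translation can simply leave PASID_FLAG_FL5LP
clear and get 4-level paging even on a CPU running with LA57, which is the
case the commit message describes.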
[PATCH v5 0/9] Use 1st-level for IOVA translation
Intel VT-d in scalable mode supports two types of page tables for DMA
translation: the first level page table and the second level page
table. The first level page table uses the same format as the CPU page
table, while the second level page table remains compatible with
previous formats. Software can choose either of them for DMA remapping
according to the use case.

This patchset aims to move IOVA (I/O Virtual Address) translation to
the 1st-level page table in scalable mode. This will simplify vIOMMU
(IOMMU simulated by VM hypervisor) design by using two-stage
translation, a.k.a. nested mode translation.

As the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow page manner. The device
simulation software, like QEMU, has to figure out GIOVA->GPA mappings
and write them to a shadowed page table, which will be used by the
physical IOMMU. Each time mappings are created or destroyed in the
vIOMMU, the simulation software has to intervene so that the changes to
GIOVA->GPA can be shadowed to the host.

[Diagram: the vIOMMU (GIOVA->GPA) traps IOTLB flushes (map/unmap) to
QEMU, which maintains GIOVA->HPA and programs it through the VFIO/IOMMU
API into the pIOMMU (GIOVA->HPA).]

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving the GIOVA support over
to the 1st-level page table to store the GIOVA->GPA mapping in the
vIOMMU, 2) binding the vIOMMU 1st level page table to the pIOMMU,
3) using the pIOMMU second level for GPA->HPA translation, and
4) enabling nested (a.k.a. dual-stage) translation in the host.
Compared with the current shadow GIOVA support, the new approach makes
the vIOMMU design simpler and more efficient as we only need to flush
the pIOMMU IOTLB and possibly the device-IOTLB when an IOVA mapping in
the vIOMMU is torn down.

[Diagram: the vIOMMU (GIOVA->GPA) traps IOTLB flushes (unmap only) to
QEMU, which forwards them through the VFIO/IOMMU cache invalidation and
guest gpd bind interfaces to the pIOMMU, where the first level holds
GIOVA->GPA and the second level holds GPA->HPA.]

This patch applies the first level page table for IOVA translation
unless the DOMAIN_ATTR_NESTING domain attribute has been set. Setting
this attribute means the second level will be used to map gPA (guest
physical address) to hPA (host physical address), and the mappings
between gVA (guest virtual address) and gPA will be maintained by the
guest with the page table address bound to the host's first level.

Based-on-idea-by: Ashok Raj
Based-on-idea-by: Kevin Tian
Based-on-idea-by: Liu Yi L
Based-on-idea-by: Jacob Pan
Based-on-idea-by: Sanjay Kumar
Based-on-idea-by: Lu Baolu

Change log:

v4->v5:
 - The previous version was posted here
   https://lkml.org/lkml/2019/12/18/1371
 - Set Execute Disable in first level page directory entries.
 - Make first level IOVA canonical.
 - Update first level super page capability.

v3->v4:
 - The previous version was posted here
   https://lkml.org/lkml/2019/12/10/2126
 - Set Execute Disable (bit 63) in first level table entries.
 - Enhance pasid-based iotlb invalidation for both default domain
   and auxiliary domain.
 - Add debugfs file to expose page table internals.

v2->v3:
 - The previous version was posted here
   https://lkml.org/lkml/2019/11/27/1831
 - Accept Jacob's suggestion on merging two page tables.

v1->v2:
 - The first series was posted here
   https://lkml.org/lkml/2019/9/23/297
 - Use per domain page table ops to handle different page tables.
 - Use first level for DMA remapping by default on both bare metal
   and vm guest.
 - Refine code according to review comments for v1.

Lu Baolu (9):
  iommu/vt-d: Identify domains using first level
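As a rough illustration of the nested (dual-stage) data path described in
the cover letter, the toy model below composes a first-level lookup
(GIOVA->GPA, owned by the guest) with a second-level lookup (GPA->HPA,
owned by the host). Everything here is hypothetical: real VT-d walks
multi-level page tables in hardware and also translates the first-level
table pointers through the second level, which this sketch leaves out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12	/* 4 KiB pages, chosen for the example */

/* One page-granular mapping: input page frame -> output page frame. */
struct toy_map {
	uint64_t in_pfn;
	uint64_t out_pfn;
};

/* Single-stage lookup; keeps the offset within the page unchanged. */
static inline bool toy_lookup(const struct toy_map *tbl, size_t n,
			      uint64_t in, uint64_t *out)
{
	uint64_t pfn = in >> TOY_PAGE_SHIFT;
	uint64_t off = in & ((UINT64_C(1) << TOY_PAGE_SHIFT) - 1);

	for (size_t i = 0; i < n; i++) {
		if (tbl[i].in_pfn == pfn) {
			*out = (tbl[i].out_pfn << TOY_PAGE_SHIFT) | off;
			return true;
		}
	}
	return false;	/* not present: would be a translation fault */
}

/* Nested translation: first level gives a GPA, second level an HPA. */
static inline bool toy_nested_translate(const struct toy_map *first, size_t n1,
					const struct toy_map *second, size_t n2,
					uint64_t giova, uint64_t *hpa)
{
	uint64_t gpa;

	return toy_lookup(first, n1, giova, &gpa) &&
	       toy_lookup(second, n2, gpa, hpa);
}
```

In this model the guest can add entries to its first-level table without
any host involvement; only removals need an invalidation to reach the
physical IOMMU, which matches the "unmap only" trap in the second diagram.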
[PATCH v5 8/9] iommu/vt-d: Use iova over first level
Now that all map/unmap paths support the first level page table, turn
it on if the hardware supports scalable mode.

Signed-off-by: Lu Baolu
---
 drivers/iommu/intel-iommu.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 34e619318f64..51d60bad0b1d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1770,15 +1770,13 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 
 /*
  * Check and return whether first level is used by default for
- * DMA translation. Currently, we make it off by setting
- * first_level_support = 0, and will change it to -1 after all
- * map/unmap paths support first level page table.
+ * DMA translation.
  */
 static bool first_level_by_default(void)
 {
 	struct dmar_drhd_unit *drhd;
 	struct intel_iommu *iommu;
-	static int first_level_support = 0;
+	static int first_level_support = -1;
 
 	if (likely(first_level_support != -1))
 		return first_level_support;
-- 
2.17.1
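Why a one-character change from 0 to -1 turns the feature on: the static
variable is a tri-state cache (-1 unknown, 0 no, 1 yes), and the early
return fires for anything other than -1. The sketch below models that
pattern in user space. The probe loop (every unit must qualify) is an
assumption, since the rest of the function is not shown in the hunk;
struct fl_cache and the unit_ok array are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Tri-state cache: -1 = not probed yet, 0 = unsupported, 1 = supported. */
struct fl_cache {
	int support;
	int probes;	/* counts how often the slow path actually ran */
};

/* unit_ok[i] stands in for "IOMMU i supports scalable mode with
 * first-level translation" (assumed criterion). */
static inline bool first_level_by_default(struct fl_cache *c,
					  const bool *unit_ok, size_t n)
{
	/* Fast path: any cached answer, including a preset 0, is final. */
	if (c->support != -1)
		return c->support;

	c->probes++;
	c->support = 1;
	for (size_t i = 0; i < n; i++) {
		if (!unit_ok[i]) {
			c->support = 0;
			break;
		}
	}
	return c->support;
}
```

Initialising the cache to 0 therefore acts as a hard "off" switch that
never looks at the hardware, while -1 lets the first call probe once and
caches the answer for every later call.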