RE: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch

2019-12-23 Thread Jim,Yan
> -Original Message-
> From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
> Sent: 2019-12-20 17:23
> To: Jim,Yan 
> Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch
> 
> On Fri Dec 20 19, jimyan wrote:
> >On a system with an Intel PCIe port configured as an NVMe host device,
> >iommu initialization fails with
> >
> >DMAR: Device scope type does not match for 0000:80:00.0
> >
> >This is because the DMAR table reports this device as having scope 2
> >(ACPI_DMAR_SCOPE_TYPE_BRIDGE):
> >
> 
> Isn't that a problem to be fixed in the DMAR table then?
> 
> >but the device has a type 0 PCI header:
> >80:00.0 Class 0600: Device 8086:2020 (rev 06)
> >00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
> >10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
> >30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
> >
> >VT-d works perfectly on this system, so there's no reason to bail out
> >on initialization due to this apparent scope mismatch. Add the class
> >0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing DMAR
> >initialization for non-bridge PCI devices listed with scope bridge.
> >
> >Signed-off-by: jimyan 
> >---
> > drivers/iommu/dmar.c | 1 +
> > 1 file changed, 1 insertion(+)
> >
> >diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
> >eecd6a421667..9faf2f0e0237 100644
> >--- a/drivers/iommu/dmar.c
> >+++ b/drivers/iommu/dmar.c
> >@@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
> >  info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
> > (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
> >  (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> >+  info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
> >   info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
> > pr_warn("Device scope type does not match for %s\n",
> > pci_name(info->dev));
> >--
> >2.11.0
> >
> >___
> >iommu mailing list
> >iommu@lists.linux-foundation.org
> >https://lists.linuxfoundation.org/mailman/listinfo/iommu
> >
Actually this patch is similar to commit ffb2d1eb88c3 ("iommu/vt-d: Don't
reject NTB devices due to scope mismatch"). Besides, modifying the DMAR table
would require the OEM to update the BIOS, which is hard to arrange.

Jim

Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api

2019-12-23 Thread Jani Nikula
On Sat, 21 Dec 2019, Tom Murphy  wrote:
> This patchset converts the intel iommu driver to the dma-iommu api.
>
> While converting the driver I exposed a bug in the intel i915 driver
> which causes a huge amount of artifacts on the screen of my
> laptop. You can see a picture of it here:
> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>
> This issue is most likely in the i915 driver and is most likely caused
> by the driver not respecting the return value of the
> dma_map_ops::map_sg function. You can see the driver ignoring the
> return value here:
> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>
> Previously this didn’t cause issues because the intel map_sg always
> returned the same number of elements as the input scatter gather list
> but with the change to this dma-iommu api this is no longer the
> case. I wasn’t able to track the bug down to a specific line of code
> unfortunately.
>
> Could someone from the intel team look at this?

Let me get this straight. There is current API that on success always
returns the same number of elements as the input scatter gather
list. You propose to change the API so that this is no longer the case?

A quick check of various dma_map_sg() calls in the kernel seems to
indicate checking for 0 for errors and then ignoring the non-zero return
is a common pattern. Are you sure it's okay to make the change you're
proposing?

Anyway, due to the time of year and all, I'd like to ask you to file a
bug against i915 at [1] so this is not forgotten, and please let's not
merge the changes before this is resolved.


Thanks,
Jani.


[1] https://gitlab.freedesktop.org/drm/intel/issues/new


-- 
Jani Nikula, Intel Open Source Graphics Center

Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api

2019-12-23 Thread Robin Murphy

On 2019-12-23 10:37 am, Jani Nikula wrote:

On Sat, 21 Dec 2019, Tom Murphy  wrote:

This patchset converts the intel iommu driver to the dma-iommu api.

While converting the driver I exposed a bug in the intel i915 driver
which causes a huge amount of artifacts on the screen of my
laptop. You can see a picture of it here:
https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg

This issue is most likely in the i915 driver and is most likely caused
by the driver not respecting the return value of the
dma_map_ops::map_sg function. You can see the driver ignoring the
return value here:
https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51

Previously this didn’t cause issues because the intel map_sg always
returned the same number of elements as the input scatter gather list
but with the change to this dma-iommu api this is no longer the
case. I wasn’t able to track the bug down to a specific line of code
unfortunately.

Could someone from the intel team look at this?


Let me get this straight. There is current API that on success always
returns the same number of elements as the input scatter gather
list. You propose to change the API so that this is no longer the case?


No, the API for dma_map_sg() has always been that it may return fewer 
DMA segments than nents - see Documentation/DMA-API.txt (and otherwise, 
the return value would surely be a simple success/fail condition). 
Relying on a particular implementation behaviour has never been strictly 
correct, even if it does happen to be a very common behaviour.



A quick check of various dma_map_sg() calls in the kernel seems to
indicate checking for 0 for errors and then ignoring the non-zero return
is a common pattern. Are you sure it's okay to make the change you're
proposing?


Various code uses tricks like just iterating the mapped list until the 
first segment with zero sg_dma_len(). Others may well simply have bugs.


Robin.


Anyway, due to the time of year and all, I'd like to ask you to file a
bug against i915 at [1] so this is not forgotten, and please let's not
merge the changes before this is resolved.


Thanks,
Jani.


[1] https://gitlab.freedesktop.org/drm/intel/issues/new




Re: [PATCH 0/8] Convert the intel iommu driver to the dma-iommu api

2019-12-23 Thread Jani Nikula
On Mon, 23 Dec 2019, Robin Murphy  wrote:
> On 2019-12-23 10:37 am, Jani Nikula wrote:
>> On Sat, 21 Dec 2019, Tom Murphy  wrote:
>>> This patchset converts the intel iommu driver to the dma-iommu api.
>>>
>>> While converting the driver I exposed a bug in the intel i915 driver
>>> which causes a huge amount of artifacts on the screen of my
>>> laptop. You can see a picture of it here:
>>> https://github.com/pippy360/kernelPatches/blob/master/IMG_20191219_225922.jpg
>>>
>>> This issue is most likely in the i915 driver and is most likely caused
>>> by the driver not respecting the return value of the
>>> dma_map_ops::map_sg function. You can see the driver ignoring the
>>> return value here:
>>> https://github.com/torvalds/linux/blob/7e0165b2f1a912a06e381e91f0f4e495f4ac3736/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c#L51
>>>
>>> Previously this didn’t cause issues because the intel map_sg always
>>> returned the same number of elements as the input scatter gather list
>>> but with the change to this dma-iommu api this is no longer the
>>> case. I wasn’t able to track the bug down to a specific line of code
>>> unfortunately.
>>>
>>> Could someone from the intel team look at this?
>> 
>> Let me get this straight. There is current API that on success always
>> returns the same number of elements as the input scatter gather
>> list. You propose to change the API so that this is no longer the case?
>
> No, the API for dma_map_sg() has always been that it may return fewer 
> DMA segments than nents - see Documentation/DMA-API.txt (and otherwise, 
> the return value would surely be a simple success/fail condition). 
> Relying on a particular implementation behaviour has never been strictly 
> correct, even if it does happen to be a very common behaviour.
>
>> A quick check of various dma_map_sg() calls in the kernel seems to
>> indicate checking for 0 for errors and then ignoring the non-zero return
>> is a common pattern. Are you sure it's okay to make the change you're
>> proposing?
>
> Various code uses tricks like just iterating the mapped list until the 
> first segment with zero sg_dma_len(). Others may well simply have bugs.

Thanks for the clarification.

BR,
Jani.

>
> Robin.
>
>> Anyway, due to the time of year and all, I'd like to ask you to file a
>> bug against i915 at [1] so this is not forgotten, and please let's not
>> merge the changes before this is resolved.
>> 
>> 
>> Thanks,
>> Jani.
>> 
>> 
>> [1] https://gitlab.freedesktop.org/drm/intel/issues/new
>> 
>> 

-- 
Jani Nikula, Intel Open Source Graphics Center

Re: RE: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch

2019-12-23 Thread Lu Baolu

Hi,

On 2019/12/23 15:59, Jim,Yan wrote:

-Original Message-
From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
Sent: 2019-12-20 17:23
To: Jim,Yan 
Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
linux-ker...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch

On Fri Dec 20 19, jimyan wrote:

On a system with an Intel PCIe port configured as an NVMe host device,
iommu initialization fails with

DMAR: Device scope type does not match for 0000:80:00.0

This is because the DMAR table reports this device as having scope 2
(ACPI_DMAR_SCOPE_TYPE_BRIDGE):



Isn't that a problem to be fixed in the DMAR table then?


but the device has a type 0 PCI header:
80:00.0 Class 0600: Device 8086:2020 (rev 06)
00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00

VT-d works perfectly on this system, so there's no reason to bail out
on initialization due to this apparent scope mismatch. Add the class
0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing DMAR
initialization for non-bridge PCI devices listed with scope bridge.

Signed-off-by: jimyan 
---
drivers/iommu/dmar.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
eecd6a421667..9faf2f0e0237 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,

 info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
(scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
 (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
+ info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
  info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
pr_warn("Device scope type does not match for %s\n",
pci_name(info->dev));
--
2.11.0



Actually this patch is similar to commit ffb2d1eb88c3 ("iommu/vt-d: Don't reject
NTB devices due to scope mismatch"). Besides, modifying the DMAR table would
require the OEM to update the BIOS, which is hard to arrange.



For both cases, a quirk flag seems to be more reasonable, so that
unrelated devices will not be impacted.

Best regards,
baolu

[PATCH] iommu/amd: Remove unused variable

2019-12-23 Thread Joerg Roedel
From: Joerg Roedel 

The iommu variable in set_device_exclusion_range() is now unused
and causes a compiler warning. Remove it.

Fixes: 387caf0b759a ("iommu/amd: Treat per-device exclusion ranges as r/w unity-mapped regions")
Signed-off-by: Joerg Roedel 
---
 drivers/iommu/amd_iommu_init.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 36649592ddf3..ba7ee4aa04f9 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1118,8 +1118,6 @@ static int __init add_early_maps(void)
  */
 static void __init set_device_exclusion_range(u16 devid, struct ivmd_header *m)
 {
-   struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
-
if (!(m->flags & IVMD_FLAG_EXCL_RANGE))
return;
 
-- 
2.16.4



Re: [PATCH v10 0/4] Add uacce module for Accelerator

2019-12-23 Thread zhangfei

Hi, Greg

On 2019/12/16 11:08 AM, Zhangfei Gao wrote:

Uacce (Unified/User-space-access-intended Accelerator Framework) aims to
provide Shared Virtual Addressing (SVA) between accelerators and processes,
so an accelerator can access any data structure of the main CPU.
This differs from data sharing between a CPU and an I/O device, which share
data content rather than addresses.
Because of the unified address space, the hardware and the user-space process
can share the same virtual addresses for communication.

Uacce is intended to be used with Jean-Philippe Brucker's SVA
patchset [1], which enables I/O-side page faults and PASID support.
We keep verifying with Jean's SVA patchset [2]
and with Eric's SMMUv3 nested-stage patches [3].

This series and related zip & qm driver
https://github.com/Linaro/linux-kernel-warpdrive/tree/v5.5-rc1-uacce-v10

The library and user application:
https://github.com/Linaro/warpdrive/tree/wdprd-upstream-v10

References:
[1] http://jpbrucker.net/sva/
[2] http://jpbrucker.net/git/linux/log/?h=sva/zip-devel
[3] https://github.com/eauger/linux/tree/v5.3.0-rc0-2stage-v9

Change History:
v10:
Modify the include header to fix a kbuild test error on other architectures.


Kenneth Lee (2):
   uacce: Add documents for uacce
   uacce: add uacce driver

Zhangfei Gao (2):
   crypto: hisilicon - Remove module_param uacce_mode
   crypto: hisilicon - register zip engine to uacce




Would you mind taking a look at the patch set?

The patches are also used for verifying the sva feature.
https://lore.kernel.org/linux-iommu/20191219163033.2608177-1-jean-phili...@linaro.org/

Thanks

Patch "iommu: set group default domain before creating direct mappings" has been added to the 5.4-stable tree

2019-12-23 Thread gregkh


This is a note to let you know that I've just added the patch titled

iommu: set group default domain before creating direct mappings

to the 5.4-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 iommu-set-group-default-domain-before-creating-direct-mappings.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From d360211524bece6db9920f32c91808235290b51c Mon Sep 17 00:00:00 2001
From: Jerry Snitselaar 
Date: Tue, 10 Dec 2019 11:56:06 -0700
Subject: iommu: set group default domain before creating direct mappings

From: Jerry Snitselaar 

commit d360211524bece6db9920f32c91808235290b51c upstream.

iommu_group_create_direct_mappings uses group->default_domain, but
right after it is called, request_default_domain_for_dev calls
iommu_domain_free for the default domain, and sets the group default
domain to a different domain. Move the
iommu_group_create_direct_mappings call to after the group default
domain is set, so the direct mappings get associated with that domain.

Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: sta...@vger.kernel.org
Fixes: 7423e01741dd ("iommu: Add API to request DMA domain for device")
Signed-off-by: Jerry Snitselaar 
Reviewed-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/iommu/iommu.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2221,13 +2221,13 @@ request_default_domain_for_dev(struct de
goto out;
}
 
-   iommu_group_create_direct_mappings(group, dev);
-
/* Make the domain the default for this group */
if (group->default_domain)
iommu_domain_free(group->default_domain);
group->default_domain = domain;
 
+   iommu_group_create_direct_mappings(group, dev);
+
dev_info(dev, "Using iommu %s mapping\n",
 type == IOMMU_DOMAIN_DMA ? "dma" : "direct");
 


Patches currently in stable-queue which might be from jsnit...@redhat.com are

queue-5.4/iommu-fix-kasan-use-after-free-in-iommu_insert_resv_region.patch
queue-5.4/iommu-vt-d-fix-dmar-pte-read-access-not-set-error.patch
queue-5.4/iommu-set-group-default-domain-before-creating-direct-mappings.patch
queue-5.4/tpm_tis-reserve-chip-for-duration-of-tpm_tis_core_init.patch
queue-5.4/iommu-vt-d-allocate-reserved-region-for-isa-with-correct-permission.patch
queue-5.4/iommu-vt-d-set-isa-bridge-reserved-region-as-relaxable.patch


Patch "iommu/vt-d: Allocate reserved region for ISA with correct permission" has been added to the 5.4-stable tree

2019-12-23 Thread gregkh


This is a note to let you know that I've just added the patch titled

iommu/vt-d: Allocate reserved region for ISA with correct permission

to the 5.4-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 iommu-vt-d-allocate-reserved-region-for-isa-with-correct-permission.patch
and it can be found in the queue-5.4 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From cde9319e884eb6267a0df446f3c131fe1108defb Mon Sep 17 00:00:00 2001
From: Jerry Snitselaar 
Date: Thu, 12 Dec 2019 22:36:42 -0700
Subject: iommu/vt-d: Allocate reserved region for ISA with correct permission

From: Jerry Snitselaar 

commit cde9319e884eb6267a0df446f3c131fe1108defb upstream.

Currently the reserved region for ISA is allocated with no
permissions. If a dma domain is being used, mapping this region will
fail. Set the permissions to DMA_PTE_READ|DMA_PTE_WRITE.

Cc: Joerg Roedel 
Cc: Lu Baolu 
Cc: iommu@lists.linux-foundation.org
Cc: sta...@vger.kernel.org # v5.3+
Fixes: d850c2ee5fe2 ("iommu/vt-d: Expose ISA direct mapping region via iommu_get_resv_regions")
Signed-off-by: Jerry Snitselaar 
Acked-by: Lu Baolu 
Signed-off-by: Joerg Roedel 
Signed-off-by: Greg Kroah-Hartman 

---
 drivers/iommu/intel-iommu.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5697,7 +5697,7 @@ static void intel_iommu_get_resv_regions
struct pci_dev *pdev = to_pci_dev(device);
 
if ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA) {
-   reg = iommu_alloc_resv_region(0, 1UL << 24, 0,
+   reg = iommu_alloc_resv_region(0, 1UL << 24, prot,
   IOMMU_RESV_DIRECT_RELAXABLE);
if (reg)
list_add_tail(&reg->list, head);


Patches currently in stable-queue which might be from jsnit...@redhat.com are

queue-5.4/iommu-fix-kasan-use-after-free-in-iommu_insert_resv_region.patch
queue-5.4/iommu-vt-d-fix-dmar-pte-read-access-not-set-error.patch
queue-5.4/iommu-set-group-default-domain-before-creating-direct-mappings.patch
queue-5.4/tpm_tis-reserve-chip-for-duration-of-tpm_tis_core_init.patch
queue-5.4/iommu-vt-d-allocate-reserved-region-for-isa-with-correct-permission.patch
queue-5.4/iommu-vt-d-set-isa-bridge-reserved-region-as-relaxable.patch


[PATCH] virtio-mmio: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code, which
contains platform_get_resource, devm_request_mem_region and
devm_ioremap.

Signed-off-by: Yangtao Li 
---
 drivers/virtio/virtio_mmio.c | 15 +++
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index e09edb5c5e06..97d5725fd9a2 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -531,18 +531,9 @@ static void virtio_mmio_release_dev(struct device *_d)
 static int virtio_mmio_probe(struct platform_device *pdev)
 {
struct virtio_mmio_device *vm_dev;
-   struct resource *mem;
unsigned long magic;
int rc;
 
-   mem = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   if (!mem)
-   return -EINVAL;
-
-   if (!devm_request_mem_region(&pdev->dev, mem->start,
-   resource_size(mem), pdev->name))
-   return -EBUSY;
-
vm_dev = devm_kzalloc(&pdev->dev, sizeof(*vm_dev), GFP_KERNEL);
if (!vm_dev)
return -ENOMEM;
@@ -554,9 +545,9 @@ static int virtio_mmio_probe(struct platform_device *pdev)
INIT_LIST_HEAD(&vm_dev->virtqueues);
spin_lock_init(&vm_dev->lock);
 
-   vm_dev->base = devm_ioremap(&pdev->dev, mem->start, resource_size(mem));
-   if (vm_dev->base == NULL)
-   return -EFAULT;
+   vm_dev->base = devm_platform_ioremap_resource(pdev, 0);
+   if (IS_ERR(vm_dev->base))
+   return PTR_ERR(vm_dev->base);
 
/* Check magic value */
magic = readl(vm_dev->base + VIRTIO_MMIO_MAGIC_VALUE);
-- 
2.17.1



[PATCH 1/6] iommu/omap: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li 
---
 drivers/iommu/omap-iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/omap-iommu.c b/drivers/iommu/omap-iommu.c
index be551cc34be4..297c1be7ecb0 100644
--- a/drivers/iommu/omap-iommu.c
+++ b/drivers/iommu/omap-iommu.c
@@ -1175,7 +1175,6 @@ static int omap_iommu_probe(struct platform_device *pdev)
int err = -ENODEV;
int irq;
struct omap_iommu *obj;
-   struct resource *res;
struct device_node *of = pdev->dev.of_node;
struct orphan_dev *orphan_dev, *tmp;
 
@@ -1218,8 +1217,7 @@ static int omap_iommu_probe(struct platform_device *pdev)
spin_lock_init(&obj->iommu_lock);
spin_lock_init(&obj->page_table_lock);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   obj->regbase = devm_ioremap_resource(obj->dev, res);
+   obj->regbase = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(obj->regbase))
return PTR_ERR(obj->regbase);
 
-- 
2.17.1



[PATCH 2/6] iommu/exynos: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li 
---
 drivers/iommu/exynos-iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/exynos-iommu.c b/drivers/iommu/exynos-iommu.c
index 186ff5cc975c..42d8407267ef 100644
--- a/drivers/iommu/exynos-iommu.c
+++ b/drivers/iommu/exynos-iommu.c
@@ -571,14 +571,12 @@ static int exynos_sysmmu_probe(struct platform_device *pdev)
int irq, ret;
struct device *dev = &pdev->dev;
struct sysmmu_drvdata *data;
-   struct resource *res;
 
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data)
return -ENOMEM;
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   data->sfrbase = devm_ioremap_resource(dev, res);
+   data->sfrbase = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(data->sfrbase))
return PTR_ERR(data->sfrbase);
 
-- 
2.17.1



[PATCH 4/6] iommu/ipmmu-vmsa: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li 
---
 drivers/iommu/ipmmu-vmsa.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/ipmmu-vmsa.c b/drivers/iommu/ipmmu-vmsa.c
index d02edd2751f3..3124e28fee85 100644
--- a/drivers/iommu/ipmmu-vmsa.c
+++ b/drivers/iommu/ipmmu-vmsa.c
@@ -1015,7 +1015,6 @@ static const struct of_device_id ipmmu_of_ids[] = {
 static int ipmmu_probe(struct platform_device *pdev)
 {
struct ipmmu_vmsa_device *mmu;
-   struct resource *res;
int irq;
int ret;
 
@@ -1033,8 +1032,7 @@ static int ipmmu_probe(struct platform_device *pdev)
dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(40));
 
/* Map I/O memory and request IRQ. */
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   mmu->base = devm_ioremap_resource(&pdev->dev, res);
+   mmu->base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(mmu->base))
return PTR_ERR(mmu->base);
 
-- 
2.17.1



[PATCH 5/6] iommu/mediatek: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li 
---
 drivers/iommu/mtk_iommu_v1.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/mtk_iommu_v1.c b/drivers/iommu/mtk_iommu_v1.c
index e93b94ecac45..3d6bb08b2a54 100644
--- a/drivers/iommu/mtk_iommu_v1.c
+++ b/drivers/iommu/mtk_iommu_v1.c
@@ -553,7 +553,6 @@ static int mtk_iommu_probe(struct platform_device *pdev)
 {
struct mtk_iommu_data   *data;
struct device   *dev = &pdev->dev;
-   struct resource *res;
struct component_match  *match = NULL;
struct of_phandle_args  larb_spec;
struct of_phandle_iterator  it;
@@ -573,8 +572,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
return -ENOMEM;
data->protect_base = ALIGN(virt_to_phys(protect), MTK_PROTECT_PA_ALIGN);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   data->base = devm_ioremap_resource(dev, res);
+   data->base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(data->base))
return PTR_ERR(data->base);
 
-- 
2.17.1



[PATCH 6/6] iommu/rockchip: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li 
---
 drivers/iommu/rockchip-iommu.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/iommu/rockchip-iommu.c b/drivers/iommu/rockchip-iommu.c
index b33cdd5aad81..c6d50396f4c2 100644
--- a/drivers/iommu/rockchip-iommu.c
+++ b/drivers/iommu/rockchip-iommu.c
@@ -1138,7 +1138,6 @@ static int rk_iommu_probe(struct platform_device *pdev)
 {
struct device *dev = &pdev->dev;
struct rk_iommu *iommu;
-   struct resource *res;
int num_res = pdev->num_resources;
int err, i;
 
@@ -1156,10 +1155,7 @@ static int rk_iommu_probe(struct platform_device *pdev)
return -ENOMEM;
 
for (i = 0; i < num_res; i++) {
-   res = platform_get_resource(pdev, IORESOURCE_MEM, i);
-   if (!res)
-   continue;
-   iommu->bases[i] = devm_ioremap_resource(&pdev->dev, res);
+   iommu->bases[i] = devm_platform_ioremap_resource(pdev, i);
if (IS_ERR(iommu->bases[i]))
continue;
iommu->num_mmu++;
-- 
2.17.1



[PATCH 3/6] iommu/qcom: convert to devm_platform_ioremap_resource

2019-12-23 Thread Yangtao Li
Use devm_platform_ioremap_resource() to simplify code.

Signed-off-by: Yangtao Li 
---
 drivers/iommu/qcom_iommu.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/iommu/qcom_iommu.c b/drivers/iommu/qcom_iommu.c
index 52f38292df5b..bf94d4d67da4 100644
--- a/drivers/iommu/qcom_iommu.c
+++ b/drivers/iommu/qcom_iommu.c
@@ -709,7 +709,6 @@ static int qcom_iommu_ctx_probe(struct platform_device *pdev)
struct qcom_iommu_ctx *ctx;
struct device *dev = &pdev->dev;
struct qcom_iommu_dev *qcom_iommu = dev_get_drvdata(dev->parent);
-   struct resource *res;
int ret, irq;
 
ctx = devm_kzalloc(dev, sizeof(*ctx), GFP_KERNEL);
@@ -719,8 +718,7 @@ static int qcom_iommu_ctx_probe(struct platform_device *pdev)
ctx->dev = dev;
platform_set_drvdata(pdev, ctx);
 
-   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
-   ctx->base = devm_ioremap_resource(dev, res);
+   ctx->base = devm_platform_ioremap_resource(pdev, 0);
if (IS_ERR(ctx->base))
return PTR_ERR(ctx->base);
 
-- 
2.17.1



Re: [PATCH] virtio-mmio: convert to devm_platform_ioremap_resource

2019-12-23 Thread Frank Lee
Please ignore this patch.

Thx!


Re: [PATCH 1/3] iommu/vt-d: skip RMRR entries that fail the sanity check

2019-12-23 Thread Barret Rhoden via iommu

On 12/17/19 2:19 PM, Chen, Yian wrote:
Regardless, I have two other patches in this series that could resolve 
the problem for me and probably other people.  I'd just like at least 
one of the three patches to get merged so that my machine boots when 
the original commit f036c7fa0ab6 ("iommu/vt-d: Check VT-d RMRR region 
in BIOS is reported as reserved") gets released.


when a firmware bug appears, the potential problem may extend beyond the scope
of its visible impacts, so introducing a workaround in an official
implementation should be considered very carefully.


Agreed.  I think that in the RMRR case, it wouldn't surprise me if these 
problems are already occurring, and we just didn't know about it, so I'd 
like to think about sane workarounds.  I only noticed it on a kexec. 
Not sure how many people with similarly-broken firmware are kexecing 
kernels on linus/master kernels yet.


Specifically, my firmware reports an RMRR with start == 0 and end == 0 
(end should be page-aligned-minus-one).  The only reason commit 
f036c7fa0ab6 didn't catch it on a full reboot is that trim_bios_range() 
reserved the first page, assuming that the BIOS meant to reserve it but 
just didn't tell us in the e820 map.  My firmware didn't mark that first 
page E820_RESERVED.  On a kexec, the range that got trimmed was 
0x100-0xfff instead of 0x000-0xfff.  In both cases, the kernel won't use 
the region the broken RMRR points to, but in the kexec case, it wasn't 
E820_RESERVED, so the new commit aborted the DMAR setup.


If the workaround is really needed at this point, I would recommend 
adding a WARN_TAINT with TAINT_FIRMWARE_WORKAROUND, to tell the 
workaround is in the place.


Sounds good.  I can rework the patchset so that whenever I skip an RMRR 
entry or whatnot, I'll put in a WARN_TAINT.  I see a few other examples 
in dmar.c to work from.


If any of the three changes are too aggressive, I'm OK with you all 
taking just one of them.  I'd like to be able to kexec with the new 
kernel.  I'm likely not the only one with bad firmware, and any bug that 
only shows up on a kexec often a pain to detect.


Thanks,

Barret


Re: [PATCH v3 5/5] drm/msm/a6xx: Support split pagetables

2019-12-23 Thread smasetty

On 2019-12-16 22:07, Jordan Crouse wrote:

Attempt to enable split pagetables if the arm-smmu driver supports it.
This will move the default address space from the default region to
the address range assigned to TTBR1. The behavior should be transparent
to the driver for now but it gets the default buffers out of the way
when we want to start swapping TTBR0 for context-specific pagetables.

Signed-off-by: Jordan Crouse 
---

 drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 52 
++-

 1 file changed, 51 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
index 5dc0b2c..1c6da93 100644
--- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
+++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c
@@ -811,6 +811,56 @@ static unsigned long a6xx_gpu_busy(struct msm_gpu 
*gpu)

return (unsigned long)busy_time;
 }

+static struct msm_gem_address_space *
+a6xx_create_address_space(struct msm_gpu *gpu, struct platform_device 
*pdev)

+{
+   struct iommu_domain *iommu = iommu_domain_alloc(&platform_bus_type);
+   struct msm_gem_address_space *aspace;
+   struct msm_mmu *mmu;
+   u64 start, size;
+   u32 val = 1;
+   int ret;
+
+   if (!iommu)
+   return ERR_PTR(-ENOMEM);
+
+   /*
+    * Try to request split pagetables - the request has to be made
+    * before the domain is attached
+    */
+   iommu_domain_set_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+
+   mmu = msm_iommu_new(&pdev->dev, iommu);
+   if (IS_ERR(mmu)) {
+   iommu_domain_free(iommu);
+   return ERR_CAST(mmu);
+   }
+
+   /*
+    * After the domain is attached, see if the split tables were
+    * actually successful.
+    */
+   ret = iommu_domain_get_attr(iommu, DOMAIN_ATTR_SPLIT_TABLES, &val);
+   if (!ret && val) {
+   /*
+* The aperture start will be at the beginning of the TTBR1
+* space so use that as a base
+*/
+   start = iommu->geometry.aperture_start;
+   size = 0xffffffff;

This should be the va_end and not the size

+   } else {
+   /* Otherwise use the legacy 32 bit region */
+   start = SZ_16M;
+   size = 0xffffffff - SZ_16M;

same as above

+   }
+
+   aspace = msm_gem_address_space_create(mmu, "gpu", start, size);
+   if (IS_ERR(aspace))
+   iommu_domain_free(iommu);
+
+   return aspace;
+}
+
 static const struct adreno_gpu_funcs funcs = {
.base = {
.get_param = adreno_get_param,
@@ -832,7 +882,7 @@ static const struct adreno_gpu_funcs funcs = {
 #if defined(CONFIG_DRM_MSM_GPU_STATE)
.gpu_state_get = a6xx_gpu_state_get,
.gpu_state_put = a6xx_gpu_state_put,
-   .create_address_space = adreno_iommu_create_address_space,
+   .create_address_space = a6xx_create_address_space,
 #endif
},
.get_timestamp = a6xx_get_timestamp,



Re: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch

2019-12-23 Thread Jim,Yan
> -Original Message-
> From: Lu Baolu [mailto:baolu...@linux.intel.com]
> Sent: December 23, 2019 21:05
> To: Jim,Yan ; Jerry Snitselaar 
> Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
> Subject: Re: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
> mismatch
> 
> Hi,
> 
> On 2019/12/23 15:59, Jim,Yan wrote:
> >> -Original Message-
> >> From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
> >> Sent: December 20, 2019 17:23
> >> To: Jim,Yan 
> >> Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
> >> linux-ker...@vger.kernel.org
> >> Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
> >> mismatch
> >>
> >> On Fri Dec 20 19, jimyan wrote:
> >>> On a system with an Intel PCIe port configured as a nvme host
> >>> device, iommu initialization fails with
> >>>
> >>> DMAR: Device scope type does not match for 0000:80:00.0
> >>>
> >>> This is because the DMAR table reports this device as having scope 2
> >>> (ACPI_DMAR_SCOPE_TYPE_BRIDGE):
> >>>
> >>
> >> Isn't that a problem to be fixed in the DMAR table then?
> >>
> >>> but the device has a type 0 PCI header:
> >>> 80:00.0 Class 0600: Device 8086:2020 (rev 06)
> >>> 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
> >>> 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> >>> 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
> >>> 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
> >>>
> >>> VT-d works perfectly on this system, so there's no reason to bail
> >>> out on initialization due to this apparent scope mismatch. Add the
> >>> class
> >>> 0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing DMAR
> >>> initialization for non-bridge PCI devices listed with scope bridge.
> >>>
> >>> Signed-off-by: jimyan 
> >>> ---
> >>> drivers/iommu/dmar.c | 1 +
> >>> 1 file changed, 1 insertion(+)
> >>>
> >>> diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
> >>> eecd6a421667..9faf2f0e0237 100644
> >>> --- a/drivers/iommu/dmar.c
> >>> +++ b/drivers/iommu/dmar.c
> >>> @@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct
> >> dmar_pci_notify_info *info,
> >>>info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
> >>>   (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE
> &&
> >>>(info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
> >>> +   info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
> >>> info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
> >>>   pr_warn("Device scope type does not match for %s\n",
> >>>   pci_name(info->dev));
> >>> --
> >>> 2.11.0
> >>>
> >>>
> > Actually this patch is similar to commit ffb2d1eb88c3 ("iommu/vt-d: Don't
> > reject NTB devices due to scope mismatch"). Besides, modifying the DMAR
> > table needs the OEM to update the BIOS, which is hard to implement.
> >
> 
> For both cases, a quirk flag seems to be more reasonable, so that unrelated
> devices will not be impacted.
> 
> Best regards,
> baolu

Hi Baolu,
Thanks for your advice. I have modified the patch as follows.

On a system with an Intel PCIe port configured as a nvme host device, iommu
initialization fails with

DMAR: Device scope type does not match for 0000:80:00.0

This is because the DMAR table reports this device as having scope 2
(ACPI_DMAR_SCOPE_TYPE_BRIDGE):

but the device has a type 0 PCI header:
80:00.0 Class 0600: Device 8086:2020 (rev 06)
00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00

VT-d works perfectly on this system, so there's no reason to bail out
on initialization due to this apparent scope mismatch. Add the class
0x06 ("PCI_BASE_CLASS_BRIDGE") as a heuristic for allowing DMAR
initialization for non-bridge PCI devices listed with scope bridge.

Signed-off-by: jimyan 

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index eecd6a421667..50c92eb23ee4 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -244,7 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
 info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
(scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
 (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
- info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
+ info->dev->class >> 16 != PCI_BASE_CLASS_BRIDGE))) {
pr_warn("Device scope type does not match for %s\n",
pci_name(info->dev));
return -EINVAL;


Jim


[PATCH 1/1] iommu/vt-d: Add a quirk flag for scope mismatched devices

2019-12-23 Thread Lu Baolu
We expect devices with endpoint scope to have normal PCI headers,
and devices with bridge scope to have bridge PCI headers.  However,
some PCI devices may be listed in the DMAR table with bridge scope
even though they have a normal PCI header. Add a quirk flag for
those special devices.

Cc: Roland Dreier 
Cc: Jim Yan 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/dmar.c | 37 +++--
 1 file changed, 23 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index fb30d5053664..fc24abc70a05 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -65,6 +65,26 @@ static void free_iommu(struct intel_iommu *iommu);
 
 extern const struct iommu_ops intel_iommu_ops;
 
+static int scope_mismatch_quirk;
+static void quirk_dmar_scope_mismatch(struct pci_dev *dev)
+{
+   pci_info(dev, "scope mismatch ignored\n");
+   scope_mismatch_quirk = 1;
+}
+
+/*
+ * We expect devices with endpoint scope to have normal PCI
+ * headers, and devices with bridge scope to have bridge PCI
+ * headers.  However some PCI devices may be listed in the
+ * DMAR table with bridge scope, even though they have a
+ * normal PCI header. We don't declare a scope mismatch for
+ * the special cases below.
+ */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2f0d,  /* NTB devices  */
+quirk_dmar_scope_mismatch);
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2020,  /* NVME host */
+quirk_dmar_scope_mismatch);
+
 static void dmar_register_drhd_unit(struct dmar_drhd_unit *drhd)
 {
/*
@@ -231,20 +251,9 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info 
*info,
if (!dmar_match_pci_path(info, scope->bus, path, level))
continue;
 
-   /*
-* We expect devices with endpoint scope to have normal PCI
-* headers, and devices with bridge scope to have bridge PCI
-* headers.  However PCI NTB devices may be listed in the
-* DMAR table with bridge scope, even though they have a
-* normal PCI header.  NTB devices are identified by class
-* "BRIDGE_OTHER" (0680h) - we don't declare a socpe mismatch
-* for this special case.
-*/
-   if ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT &&
-info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
-   (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
-(info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
- info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
+   if (!scope_mismatch_quirk &&
+   ((scope->entry_type == ACPI_DMAR_SCOPE_TYPE_ENDPOINT) ^
+(info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL))) {
pr_warn("Device scope type does not match for %s\n",
pci_name(info->dev));
return -EINVAL;
-- 
2.17.1



Re: Re: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope mismatch

2019-12-23 Thread Lu Baolu

Hi Jim,

On 2019/12/24 11:24, Jim,Yan wrote:

-Original Message-
From: Lu Baolu [mailto:baolu...@linux.intel.com]
Sent: December 23, 2019 21:05
To: Jim,Yan ; Jerry Snitselaar 
Cc: iommu@lists.linux-foundation.org; linux-ker...@vger.kernel.org
Subject: Re: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
mismatch

Hi,

On 2019/12/23 15:59, Jim,Yan wrote:

-Original Message-
From: Jerry Snitselaar [mailto:jsnit...@redhat.com]
Sent: December 20, 2019 17:23
To: Jim,Yan 
Cc: j...@8bytes.org; iommu@lists.linux-foundation.org;
linux-ker...@vger.kernel.org
Subject: Re: [PATCH] iommu/vt-d: Don't reject nvme host due to scope
mismatch

On Fri Dec 20 19, jimyan wrote:

On a system with an Intel PCIe port configured as a nvme host
device, iommu initialization fails with

 DMAR: Device scope type does not match for 0000:80:00.0

This is because the DMAR table reports this device as having scope 2
(ACPI_DMAR_SCOPE_TYPE_BRIDGE):



Isn't that a problem to be fixed in the DMAR table then?


but the device has a type 0 PCI header:
80:00.0 Class 0600: Device 8086:2020 (rev 06)
00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00

VT-d works perfectly on this system, so there's no reason to bail
out on initialization due to this apparent scope mismatch. Add the
class
0x600 ("PCI_CLASS_BRIDGE_HOST") as a heuristic for allowing DMAR
initialization for non-bridge PCI devices listed with scope bridge.

Signed-off-by: jimyan 
---
drivers/iommu/dmar.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c index
eecd6a421667..9faf2f0e0237 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -244,6 +244,7 @@ int dmar_insert_dev_scope(struct

dmar_pci_notify_info *info,

 info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
(scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE

&&

 (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
+ info->dev->class >> 8 != PCI_CLASS_BRIDGE_HOST &&
  info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
pr_warn("Device scope type does not match for %s\n",
pci_name(info->dev));
--
2.11.0



Actually this patch is similar to the commit: ffb2d1eb88c3("iommu/vt-d: Don't

reject NTB devices due to scope mismatch"). Besides, modifying DMAR table
need OEM update BIOS. It is hard to implement.




For both cases, a quirk flag seems to be more reasonable, so that unrelated
devices will not be impacted.

Best regards,
baolu


Hi Baolu,
Thanks for your advice. I have modified the patch as follows.


I just posted a patch for both NTB and NVME cases. Can you please take a
look? Does it work for you?

Best regards,
baolu



 On a system with an Intel PCIe port configured as a nvme host device, iommu
 initialization fails with
 
  DMAR: Device scope type does not match for 0000:80:00.0
 
 This is because the DMAR table reports this device as having scope 2

 (ACPI_DMAR_SCOPE_TYPE_BRIDGE):
 
 but the device has a type 0 PCI header:

 80:00.0 Class 0600: Device 8086:2020 (rev 06)
 00: 86 80 20 20 47 05 10 00 06 00 00 06 10 00 00 00
 10: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
 20: 00 00 00 00 00 00 00 00 00 00 00 00 86 80 00 00
 30: 00 00 00 00 90 00 00 00 00 00 00 00 00 01 00 00
 
 VT-d works perfectly on this system, so there's no reason to bail out

 on initialization due to this apparent scope mismatch. Add the class
 0x06 ("PCI_BASE_CLASS_BRIDGE") as a heuristic for allowing DMAR
 initialization for non-bridge PCI devices listed with scope bridge.
 
 Signed-off-by: jimyan 


diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index eecd6a421667..50c92eb23ee4 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -244,7 +244,7 @@ int dmar_insert_dev_scope(struct dmar_pci_notify_info *info,
  info->dev->hdr_type != PCI_HEADER_TYPE_NORMAL) ||
 (scope->entry_type == ACPI_DMAR_SCOPE_TYPE_BRIDGE &&
  (info->dev->hdr_type == PCI_HEADER_TYPE_NORMAL &&
- info->dev->class >> 8 != PCI_CLASS_BRIDGE_OTHER))) {
+ info->dev->class >> 16 != PCI_BASE_CLASS_BRIDGE))) {
 pr_warn("Device scope type does not match for %s\n",
 pci_name(info->dev));
 return -EINVAL;


Jim



[PATCH v5 1/9] iommu/vt-d: Identify domains using first level page table

2019-12-23 Thread Lu Baolu
This checks whether a domain should use the first level page
table for map/unmap and marks it in the domain structure.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 39 +
 1 file changed, 39 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 34723f6be672..71ad5e5feae2 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -307,6 +307,14 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_LOSE_CHILDREN  BIT(1)
 
+/*
+ * When VT-d works in the scalable mode, it allows DMA translation to
+ * happen through either first level or second level page table. This
+ * bit marks that the DMA translation for the domain goes through the
+ * first level page table, otherwise, it goes through the second level.
+ */
+#define DOMAIN_FLAG_USE_FIRST_LEVELBIT(2)
+
 #define for_each_domain_iommu(idx, domain) \
for (idx = 0; idx < g_num_of_iommus; idx++) \
if (domain->iommu_refcnt[idx])
@@ -1714,6 +1722,35 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 #endif
 }
 
+/*
+ * Check and return whether first level is used by default for
+ * DMA translation. Currently it is forced off by statically
+ * setting first_level_support = 0; this will change to -1
+ * (probe hardware capability) once all map/unmap paths support
+ * the first level page table.
+ */
+static bool first_level_by_default(void)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   static int first_level_support = 0;
+
+   if (likely(first_level_support != -1))
+   return first_level_support;
+
+   first_level_support = 1;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (!sm_supported(iommu) || !ecap_flts(iommu->ecap)) {
+   first_level_support = 0;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return first_level_support;
+}
+
 static struct dmar_domain *alloc_domain(int flags)
 {
struct dmar_domain *domain;
@@ -1725,6 +1762,8 @@ static struct dmar_domain *alloc_domain(int flags)
memset(domain, 0, sizeof(*domain));
domain->nid = NUMA_NO_NODE;
domain->flags = flags;
+   if (first_level_by_default())
+   domain->flags |= DOMAIN_FLAG_USE_FIRST_LEVEL;
domain->has_iotlb_device = false;
INIT_LIST_HEAD(&domain->devices);
 
-- 
2.17.1



[PATCH v5 2/9] iommu/vt-d: Add set domain DOMAIN_ATTR_NESTING attr

2019-12-23 Thread Lu Baolu
This adds the Intel VT-d specific callback of setting
DOMAIN_ATTR_NESTING domain attribution. It is necessary
to let the VT-d driver know that the domain represents
a virtual machine which requires the IOMMU hardware to
support nested translation mode. Return success if the
IOMMU hardware supports nested mode, otherwise failure.

Signed-off-by: Yi Sun 
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 56 +
 1 file changed, 56 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 71ad5e5feae2..35f65628202c 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -315,6 +315,12 @@ static int hw_pass_through = 1;
  */
 #define DOMAIN_FLAG_USE_FIRST_LEVELBIT(2)
 
+/*
+ * Domain represents a virtual machine which demands iommu nested
+ * translation mode support.
+ */
+#define DOMAIN_FLAG_NESTING_MODE   BIT(3)
+
 #define for_each_domain_iommu(idx, domain) \
for (idx = 0; idx < g_num_of_iommus; idx++) \
if (domain->iommu_refcnt[idx])
@@ -5640,6 +5646,24 @@ static inline bool iommu_pasid_support(void)
return ret;
 }
 
+static inline bool nested_mode_support(void)
+{
+   struct dmar_drhd_unit *drhd;
+   struct intel_iommu *iommu;
+   bool ret = true;
+
+   rcu_read_lock();
+   for_each_active_iommu(iommu, drhd) {
+   if (!sm_supported(iommu) || !ecap_nest(iommu->ecap)) {
+   ret = false;
+   break;
+   }
+   }
+   rcu_read_unlock();
+
+   return ret;
+}
+
 static bool intel_iommu_capable(enum iommu_cap cap)
 {
if (cap == IOMMU_CAP_CACHE_COHERENCY)
@@ -6018,10 +6042,42 @@ static bool intel_iommu_is_attach_deferred(struct 
iommu_domain *domain,
return dev->archdata.iommu == DEFER_DEVICE_DOMAIN_INFO;
 }
 
+static int
+intel_iommu_domain_set_attr(struct iommu_domain *domain,
+   enum iommu_attr attr, void *data)
+{
+   struct dmar_domain *dmar_domain = to_dmar_domain(domain);
+   unsigned long flags;
+   int ret = 0;
+
+   if (domain->type != IOMMU_DOMAIN_UNMANAGED)
+   return -EINVAL;
+
+   switch (attr) {
+   case DOMAIN_ATTR_NESTING:
+   spin_lock_irqsave(&device_domain_lock, flags);
+   if (nested_mode_support() &&
+   list_empty(&dmar_domain->devices)) {
+   dmar_domain->flags |= DOMAIN_FLAG_NESTING_MODE;
+   dmar_domain->flags &= ~DOMAIN_FLAG_USE_FIRST_LEVEL;
+   } else {
+   ret = -ENODEV;
+   }
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+   break;
+   default:
+   ret = -EINVAL;
+   break;
+   }
+
+   return ret;
+}
+
 const struct iommu_ops intel_iommu_ops = {
.capable= intel_iommu_capable,
.domain_alloc   = intel_iommu_domain_alloc,
.domain_free= intel_iommu_domain_free,
+   .domain_set_attr= intel_iommu_domain_set_attr,
.attach_dev = intel_iommu_attach_device,
.detach_dev = intel_iommu_detach_device,
.aux_attach_dev = intel_iommu_aux_attach_device,
-- 
2.17.1



[PATCH v5 4/9] iommu/vt-d: Setup pasid entries for iova over first level

2019-12-23 Thread Lu Baolu
Intel VT-d in scalable mode supports two types of page tables for
IOVA translation: first level and second level. The IOMMU driver
can choose one from both for IOVA translation according to the use
case. This sets up the pasid entry if a domain is selected to use
the first-level page table for iova translation.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 57 +
 include/linux/intel-iommu.h | 16 +++
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 35f65628202c..071cbc172ce8 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -571,6 +571,11 @@ static inline int domain_type_is_si(struct dmar_domain 
*domain)
return domain->flags & DOMAIN_FLAG_STATIC_IDENTITY;
 }
 
+static inline bool domain_use_first_level(struct dmar_domain *domain)
+{
+   return domain->flags & DOMAIN_FLAG_USE_FIRST_LEVEL;
+}
+
 static inline int domain_pfn_supported(struct dmar_domain *domain,
   unsigned long pfn)
 {
@@ -932,6 +937,8 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain 
*domain,
 
domain_flush_cache(domain, tmp_page, VTD_PAGE_SIZE);
pteval = ((uint64_t)virt_to_dma_pfn(tmp_page) << 
VTD_PAGE_SHIFT) | DMA_PTE_READ | DMA_PTE_WRITE;
+   if (domain_use_first_level(domain))
+   pteval |= DMA_FL_PTE_XD;
if (cmpxchg64(&pte->val, 0ULL, pteval))
/* Someone else set it while we were thinking; 
use theirs. */
free_pgtable_page(tmp_page);
@@ -2281,17 +2288,20 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
unsigned long sg_res = 0;
unsigned int largepage_lvl = 0;
unsigned long lvl_pages = 0;
+   u64 attr;
 
BUG_ON(!domain_pfn_supported(domain, iov_pfn + nr_pages - 1));
 
if ((prot & (DMA_PTE_READ|DMA_PTE_WRITE)) == 0)
return -EINVAL;
 
-   prot &= DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP;
+   attr = prot & (DMA_PTE_READ | DMA_PTE_WRITE | DMA_PTE_SNP);
+   if (domain_use_first_level(domain))
+   attr |= DMA_FL_PTE_PRESENT | DMA_FL_PTE_XD;
 
if (!sg) {
sg_res = nr_pages;
-   pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | prot;
+   pteval = ((phys_addr_t)phys_pfn << VTD_PAGE_SHIFT) | attr;
}
 
while (nr_pages > 0) {
@@ -2303,7 +2313,7 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
sg_res = aligned_nrpages(sg->offset, sg->length);
sg->dma_address = ((dma_addr_t)iov_pfn << 
VTD_PAGE_SHIFT) + pgoff;
sg->dma_length = sg->length;
-   pteval = (sg_phys(sg) - pgoff) | prot;
+   pteval = (sg_phys(sg) - pgoff) | attr;
phys_pfn = pteval >> VTD_PAGE_SHIFT;
}
 
@@ -2515,6 +2525,36 @@ dmar_search_domain_by_dev_info(int segment, int bus, int 
devfn)
return NULL;
 }
 
+static int domain_setup_first_level(struct intel_iommu *iommu,
+   struct dmar_domain *domain,
+   struct device *dev,
+   int pasid)
+{
+   int flags = PASID_FLAG_SUPERVISOR_MODE;
+   struct dma_pte *pgd = domain->pgd;
+   int agaw, level;
+
+   /*
+* Skip top levels of page tables for iommu which has
+* less agaw than default. Unnecessary for PT mode.
+*/
+   for (agaw = domain->agaw; agaw > iommu->agaw; agaw--) {
+   pgd = phys_to_virt(dma_pte_addr(pgd));
+   if (!dma_pte_present(pgd))
+   return -ENOMEM;
+   }
+
+   level = agaw_to_level(agaw);
+   if (level != 4 && level != 5)
+   return -EINVAL;
+
+   flags |= (level == 5) ? PASID_FLAG_FL5LP : 0;
+
+   return intel_pasid_setup_first_level(iommu, dev, (pgd_t *)pgd, pasid,
+domain->iommu_did[iommu->seq_id],
+flags);
+}
+
 static struct dmar_domain *dmar_insert_one_dev_info(struct intel_iommu *iommu,
int bus, int devfn,
struct device *dev,
@@ -2614,6 +2654,9 @@ static struct dmar_domain 
*dmar_insert_one_dev_info(struct intel_iommu *iommu,
if (hw_pass_through && domain_type_is_si(domain))
ret = intel_pasid_setup_pass_through(iommu, domain,
dev, PASID_RID2PASID);
+   else if (domain_use_first_level(domain))
+   ret = domain_setup_first_level(iommu, 

[PATCH v5 7/9] iommu/vt-d: Update first level super page capability

2019-12-23 Thread Lu Baolu
First-level translation may map input addresses to 4-KByte pages,
2-MByte pages, or 1-GByte pages. Support for 4-KByte pages and
2-MByte pages is mandatory for first-level translation. Hardware
support for 1-GByte page is reported through the FL1GP field in
the Capability Register.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 1ebf5ed460cf..34e619318f64 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -685,11 +685,12 @@ static int domain_update_iommu_snooping(struct 
intel_iommu *skip)
return ret;
 }
 
-static int domain_update_iommu_superpage(struct intel_iommu *skip)
+static int domain_update_iommu_superpage(struct dmar_domain *domain,
+struct intel_iommu *skip)
 {
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu;
-   int mask = 0xf;
+   int mask = 0x3;
 
if (!intel_iommu_superpage) {
return 0;
@@ -699,7 +700,13 @@ static int domain_update_iommu_superpage(struct 
intel_iommu *skip)
rcu_read_lock();
for_each_active_iommu(iommu, drhd) {
if (iommu != skip) {
-   mask &= cap_super_page_val(iommu->cap);
+   if (domain && domain_use_first_level(domain)) {
+   if (!cap_fl1gp_support(iommu->cap))
+   mask = 0x1;
+   } else {
+   mask &= cap_super_page_val(iommu->cap);
+   }
+
if (!mask)
break;
}
@@ -714,7 +721,7 @@ static void domain_update_iommu_cap(struct dmar_domain 
*domain)
 {
domain_update_iommu_coherency(domain);
domain->iommu_snooping = domain_update_iommu_snooping(NULL);
-   domain->iommu_superpage = domain_update_iommu_superpage(NULL);
+   domain->iommu_superpage = domain_update_iommu_superpage(domain, NULL);
 }
 
 struct context_entry *iommu_context_addr(struct intel_iommu *iommu, u8 bus,
@@ -4604,7 +4611,7 @@ static int intel_iommu_add(struct dmar_drhd_unit *dmaru)
iommu->name);
return -ENXIO;
}
-   sp = domain_update_iommu_superpage(iommu) - 1;
+   sp = domain_update_iommu_superpage(NULL, iommu) - 1;
if (sp >= 0 && !(cap_super_page_val(iommu->cap) & (1 << sp))) {
pr_warn("%s: Doesn't support large page.\n",
iommu->name);
-- 
2.17.1



[PATCH v5 5/9] iommu/vt-d: Flush PASID-based iotlb for iova over first level

2019-12-23 Thread Lu Baolu
When software has changed first-level tables, it should invalidate
the affected IOTLB and the paging-structure-caches using the PASID-
based-IOTLB Invalidate Descriptor defined in the VT-d spec, Section 6.5.2.4.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/dmar.c| 41 +++
 drivers/iommu/intel-iommu.c | 56 +++--
 include/linux/intel-iommu.h |  2 ++
 3 files changed, 84 insertions(+), 15 deletions(-)

diff --git a/drivers/iommu/dmar.c b/drivers/iommu/dmar.c
index 3acfa6a25fa2..fb30d5053664 100644
--- a/drivers/iommu/dmar.c
+++ b/drivers/iommu/dmar.c
@@ -1371,6 +1371,47 @@ void qi_flush_dev_iotlb(struct intel_iommu *iommu, u16 
sid, u16 pfsid,
qi_submit_sync(&desc, iommu);
 }
 
+/* PASID-based IOTLB invalidation */
+void qi_flush_piotlb(struct intel_iommu *iommu, u16 did, u32 pasid, u64 addr,
+unsigned long npages, bool ih)
+{
+   struct qi_desc desc = {.qw2 = 0, .qw3 = 0};
+
+   /*
+* npages == -1 means a PASID-selective invalidation, otherwise,
+* a positive value for Page-selective-within-PASID invalidation.
+* 0 is not a valid input.
+*/
+   if (WARN_ON(!npages)) {
+   pr_err("Invalid input npages = %ld\n", npages);
+   return;
+   }
+
+   if (npages == -1) {
+   desc.qw0 = QI_EIOTLB_PASID(pasid) |
+   QI_EIOTLB_DID(did) |
+   QI_EIOTLB_GRAN(QI_GRAN_NONG_PASID) |
+   QI_EIOTLB_TYPE;
+   desc.qw1 = 0;
+   } else {
+   int mask = ilog2(__roundup_pow_of_two(npages));
+   unsigned long align = (1ULL << (VTD_PAGE_SHIFT + mask));
+
+   if (WARN_ON_ONCE(!IS_ALIGNED(addr, align)))
+   addr &= ~(align - 1);
+
+   desc.qw0 = QI_EIOTLB_PASID(pasid) |
+   QI_EIOTLB_DID(did) |
+   QI_EIOTLB_GRAN(QI_GRAN_PSI_PASID) |
+   QI_EIOTLB_TYPE;
+   desc.qw1 = QI_EIOTLB_ADDR(addr) |
+   QI_EIOTLB_IH(ih) |
+   QI_EIOTLB_AM(mask);
+   }
+
+   qi_submit_sync(&desc, iommu);
+}
+
 /*
  * Disable Queued Invalidation interface.
  */
diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 071cbc172ce8..54db6bc0b281 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1509,6 +1509,20 @@ static void iommu_flush_dev_iotlb(struct dmar_domain 
*domain,
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
+static void domain_flush_piotlb(struct intel_iommu *iommu,
+   struct dmar_domain *domain,
+   u64 addr, unsigned long npages, bool ih)
+{
+   u16 did = domain->iommu_did[iommu->seq_id];
+
+   if (domain->default_pasid)
+   qi_flush_piotlb(iommu, did, domain->default_pasid,
+   addr, npages, ih);
+
+   if (!list_empty(&domain->devices))
+   qi_flush_piotlb(iommu, did, PASID_RID2PASID, addr, npages, ih);
+}
+
 static void iommu_flush_iotlb_psi(struct intel_iommu *iommu,
  struct dmar_domain *domain,
  unsigned long pfn, unsigned int pages,
@@ -1522,18 +1536,23 @@ static void iommu_flush_iotlb_psi(struct intel_iommu 
*iommu,
 
if (ih)
ih = 1 << 6;
-   /*
-* Fallback to domain selective flush if no PSI support or the size is
-* too big.
-* PSI requires page size to be 2 ^ x, and the base address is naturally
-* aligned to the size
-*/
-   if (!cap_pgsel_inv(iommu->cap) || mask > cap_max_amask_val(iommu->cap))
-   iommu->flush.flush_iotlb(iommu, did, 0, 0,
-   DMA_TLB_DSI_FLUSH);
-   else
-   iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
-   DMA_TLB_PSI_FLUSH);
+
+   if (domain_use_first_level(domain)) {
+   domain_flush_piotlb(iommu, domain, addr, pages, ih);
+   } else {
+   /*
+* Fallback to domain selective flush if no PSI support or
+* the size is too big. PSI requires page size to be 2 ^ x,
+* and the base address is naturally aligned to the size.
+*/
+   if (!cap_pgsel_inv(iommu->cap) ||
+   mask > cap_max_amask_val(iommu->cap))
+   iommu->flush.flush_iotlb(iommu, did, 0, 0,
+   DMA_TLB_DSI_FLUSH);
+   else
+   iommu->flush.flush_iotlb(iommu, did, addr | ih, mask,
+   DMA_TLB_PSI_FLUSH);
+   }
 
/*
 * In caching mode, changes of pages fr

[PATCH v5 9/9] iommu/vt-d: debugfs: Add support to show page table internals

2019-12-23 Thread Lu Baolu
Export page table internals of the domain attached to each device.
Example of such dump on a Skylake machine:

$ sudo cat /sys/kernel/debug/iommu/intel/domain_translation_struct
[ ... ]
Device 0000:00:14.0 with pasid 0 @0x15f3d9000
IOVA_PFN                PML5E                   PML4E
0x8ced0 |   0x0000000000000000  0x00015f3da003
0x8ced1 |   0x0000000000000000  0x00015f3da003
0x8ced2 |   0x0000000000000000  0x00015f3da003
0x8ced3 |   0x0000000000000000  0x00015f3da003
0x8ced4 |   0x0000000000000000  0x00015f3da003
0x8ced5 |   0x0000000000000000  0x00015f3da003
0x8ced6 |   0x0000000000000000  0x00015f3da003
0x8ced7 |   0x0000000000000000  0x00015f3da003
0x8ced8 |   0x0000000000000000  0x00015f3da003
0x8ced9 |   0x0000000000000000  0x00015f3da003

PDPEPDE PTE
0x00015f3db003  0x00015f3dc003  0x8ced0003
0x00015f3db003  0x00015f3dc003  0x8ced1003
0x00015f3db003  0x00015f3dc003  0x8ced2003
0x00015f3db003  0x00015f3dc003  0x8ced3003
0x00015f3db003  0x00015f3dc003  0x8ced4003
0x00015f3db003  0x00015f3dc003  0x8ced5003
0x00015f3db003  0x00015f3dc003  0x8ced6003
0x00015f3db003  0x00015f3dc003  0x8ced7003
0x00015f3db003  0x00015f3dc003  0x8ced8003
0x00015f3db003  0x00015f3dc003  0x8ced9003
[ ... ]

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu-debugfs.c | 75 +
 drivers/iommu/intel-iommu.c |  4 +-
 include/linux/intel-iommu.h |  2 +
 3 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu-debugfs.c 
b/drivers/iommu/intel-iommu-debugfs.c
index 471f05d452e0..c1257bef553c 100644
--- a/drivers/iommu/intel-iommu-debugfs.c
+++ b/drivers/iommu/intel-iommu-debugfs.c
@@ -5,6 +5,7 @@
  * Authors: Gayatri Kammela 
  * Sohil Mehta 
  * Jacob Pan 
+ * Lu Baolu 
  */
 
 #include 
@@ -283,6 +284,77 @@ static int dmar_translation_struct_show(struct seq_file *m, void *unused)
 }
 DEFINE_SHOW_ATTRIBUTE(dmar_translation_struct);
 
+static inline unsigned long level_to_directory_size(int level)
+{
+   return BIT_ULL(VTD_PAGE_SHIFT + VTD_STRIDE_SHIFT * (level - 1));
+}
+
+static inline void
+dump_page_info(struct seq_file *m, unsigned long iova, u64 *path)
+{
+   seq_printf(m, "0x%013lx |\t0x%016llx\t0x%016llx\t0x%016llx\t0x%016llx\t0x%016llx\n",
+      iova >> VTD_PAGE_SHIFT, path[5], path[4],
+      path[3], path[2], path[1]);
+}
+
+static void pgtable_walk_level(struct seq_file *m, struct dma_pte *pde,
+  int level, unsigned long start,
+  u64 *path)
+{
+   int i;
+
+   if (level > 5 || level < 1)
+   return;
+
+   for (i = 0; i < BIT_ULL(VTD_STRIDE_SHIFT);
+   i++, pde++, start += level_to_directory_size(level)) {
+   if (!dma_pte_present(pde))
+   continue;
+
+   path[level] = pde->val;
+   if (dma_pte_superpage(pde) || level == 1)
+   dump_page_info(m, start, path);
+   else
+   pgtable_walk_level(m, phys_to_virt(dma_pte_addr(pde)),
+  level - 1, start, path);
+   path[level] = 0;
+   }
+}
+
+static int show_device_domain_translation(struct device *dev, void *data)
+{
+   struct dmar_domain *domain = find_domain(dev);
+   struct seq_file *m = data;
+   u64 path[6] = { 0 };
+
+   if (!domain)
+   return 0;
+
+   seq_printf(m, "Device %s with pasid %d @0x%llx\n",
+  dev_name(dev), domain->default_pasid,
+  (u64)virt_to_phys(domain->pgd));
+   seq_puts(m, "IOVA_PFN\t\tPML5E\t\t\tPML4E\t\t\tPDPE\t\t\tPDE\t\t\tPTE\n");
+
+   pgtable_walk_level(m, domain->pgd, domain->agaw + 2, 0, path);
+   seq_putc(m, '\n');
+
+   return 0;
+}
+
+static int domain_translation_struct_show(struct seq_file *m, void *unused)
+{
+   unsigned long flags;
+   int ret;
+
+   spin_lock_irqsave(&device_domain_lock, flags);
+   ret = bus_for_each_dev(&pci_bus_type, NULL, m,
+  show_device_domain_translation);
+   spin_unlock_irqrestore(&device_domain_lock, flags);
+
+   return ret;
+}
+DEFINE_SHOW_ATTRIBUTE(domain_translation_struct);
+
 #ifdef CONFIG_IRQ_REMAP
 static void ir_tbl_remap_entry_show(struct seq_file *m,
struct intel_iommu *iommu)
@@ -396,6 +468,9 @@ void __init intel_iommu_debugfs_init(void)
   

[PATCH v5 6/9] iommu/vt-d: Make first level IOVA canonical

2019-12-23 Thread Lu Baolu
First-level translation restricts the input address to a canonical
address (i.e., address bits 63:N have the same value as address
bit [N-1], where N is 48 with 4-level paging and 57 with 5-level
paging; see section 3.6 of the spec).

Make first-level IOVAs canonical by always keeping bit [N-1] clear.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 17 +++--
 1 file changed, 15 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 54db6bc0b281..1ebf5ed460cf 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3505,8 +3505,21 @@ static unsigned long intel_alloc_iova(struct device *dev,
 {
unsigned long iova_pfn;
 
-   /* Restrict dma_mask to the width that the iommu can handle */
-   dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw), dma_mask);
+   /*
+* Restrict dma_mask to the width that the iommu can handle.
+* First-level translation restricts the input-address to a
+* canonical address (i.e., address bits 63:N have the same
+* value as address bit [N-1], where N is 48-bits with 4-level
+* paging and 57-bits with 5-level paging). Hence, skip bit
+* [N-1].
+*/
+   if (domain_use_first_level(domain))
+   dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw - 1),
+dma_mask);
+   else
+   dma_mask = min_t(uint64_t, DOMAIN_MAX_ADDR(domain->gaw),
+dma_mask);
+
/* Ensure we reserve the whole size-aligned region */
nrpages = __roundup_pow_of_two(nrpages);
 
-- 
2.17.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 3/9] iommu/vt-d: Add PASID_FLAG_FL5LP for first-level pasid setup

2019-12-23 Thread Lu Baolu
Currently, intel_pasid_setup_first_level() uses 5-level paging for
first-level translation if the CPUs use 5-level paging mode too.
This makes sense for SVA usage, since the page table is shared
between CPUs and IOMMUs. But it makes no sense if we only want
to use the first level for IOVA translation. Add a PASID_FLAG_FL5LP
bit to the flags which indicates whether 5-level paging
mode should be used.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-pasid.c | 7 ++-
 drivers/iommu/intel-pasid.h | 6 ++
 drivers/iommu/intel-svm.c   | 8 ++--
 3 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/intel-pasid.c b/drivers/iommu/intel-pasid.c
index 3cb569e76642..22b30f10b396 100644
--- a/drivers/iommu/intel-pasid.c
+++ b/drivers/iommu/intel-pasid.c
@@ -477,18 +477,15 @@ int intel_pasid_setup_first_level(struct intel_iommu 
*iommu,
pasid_set_sre(pte);
}
 
-#ifdef CONFIG_X86
-   /* Both CPU and IOMMU paging mode need to match */
-   if (cpu_feature_enabled(X86_FEATURE_LA57)) {
+   if (flags & PASID_FLAG_FL5LP) {
if (cap_5lp_support(iommu->cap)) {
pasid_set_flpm(pte, 1);
} else {
-   pr_err("VT-d has no 5-level paging support for CPU\n");
+   pr_err("No 5-level paging support for first-level\n");
pasid_clear_entry(pte);
return -EINVAL;
}
}
-#endif /* CONFIG_X86 */
 
pasid_set_domain_id(pte, did);
pasid_set_address_width(pte, iommu->agaw);
diff --git a/drivers/iommu/intel-pasid.h b/drivers/iommu/intel-pasid.h
index fc8cd8f17de1..92de6df24ccb 100644
--- a/drivers/iommu/intel-pasid.h
+++ b/drivers/iommu/intel-pasid.h
@@ -37,6 +37,12 @@
  */
 #define PASID_FLAG_SUPERVISOR_MODE BIT(0)
 
+/*
+ * The PASID_FLAG_FL5LP flag indicates that 5-level paging is used for
+ * first-level translation; otherwise, 4-level paging will be used.
+ */
+#define PASID_FLAG_FL5LP   BIT(1)
+
 struct pasid_dir_entry {
u64 val;
 };
diff --git a/drivers/iommu/intel-svm.c b/drivers/iommu/intel-svm.c
index 04023033b79f..d7f2a5358900 100644
--- a/drivers/iommu/intel-svm.c
+++ b/drivers/iommu/intel-svm.c
@@ -364,7 +364,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
ret = intel_pasid_setup_first_level(iommu, dev,
mm ? mm->pgd : init_mm.pgd,
svm->pasid, FLPT_DEFAULT_DID,
-   mm ? 0 : PASID_FLAG_SUPERVISOR_MODE);
+   (mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
+   (cpu_feature_enabled(X86_FEATURE_LA57) ?
+PASID_FLAG_FL5LP : 0));
spin_unlock(&iommu->lock);
if (ret) {
if (mm)
@@ -385,7 +387,9 @@ int intel_svm_bind_mm(struct device *dev, int *pasid, int flags, struct svm_dev_
ret = intel_pasid_setup_first_level(iommu, dev,
mm ? mm->pgd : init_mm.pgd,
svm->pasid, FLPT_DEFAULT_DID,
-   mm ? 0 : PASID_FLAG_SUPERVISOR_MODE);
+   (mm ? 0 : PASID_FLAG_SUPERVISOR_MODE) |
+   (cpu_feature_enabled(X86_FEATURE_LA57) ?
+    PASID_FLAG_FL5LP : 0));
spin_unlock(&iommu->lock);
if (ret) {
kfree(sdev);
-- 
2.17.1

https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v5 0/9] Use 1st-level for IOVA translation

2019-12-23 Thread Lu Baolu
Intel VT-d in scalable mode supports two types of page tables
for DMA translation: the first-level page table and the second-level
page table. The first-level page table uses the same
format as the CPU page table, while the second-level page table
remains compatible with previous formats. Software can choose
either of them for DMA remapping according to the use case.

This patchset aims to move IOVA (I/O Virtual Address) translation
to 1st-level page table in scalable mode. This will simplify vIOMMU
(IOMMU simulated by VM hypervisor) design by using the two-stage
translation, a.k.a. nested mode translation.

Since the Intel VT-d architecture offers caching mode, guest IOVA (GIOVA)
support is currently implemented in a shadow-page manner. The device
simulation software, such as QEMU, has to figure out the GIOVA->GPA mappings
and write them to a shadow page table, which is then used by the
physical IOMMU. Each time mappings are created or destroyed in the
vIOMMU, the simulation software has to intervene, so that the changes
to GIOVA->GPA can be shadowed to the host.


     .-----------.
     |  vIOMMU   |
     |-----------|                 .------------------.
     |           |IOTLB flush trap |       QEMU       |
     .-----------.   (map/unmap)   |  .------------.  |
     |GIOVA->GPA |---------------->|  | GIOVA->HPA |  |
     '-----------'                 |  '------------'  |
     |           |                 |                  |
     '-----------'                 '------------------'
                                             |
                <----------------------------'
                |
                v  VFIO/IOMMU API
          .-----------.
          |  pIOMMU   |
          |-----------|
          |           |
          .-----------.
          |GIOVA->HPA |
          '-----------'
          |           |
          '-----------'

In VT-d 3.0, scalable mode is introduced, which offers two-level
translation page tables and a nested translation mode. With regard to
GIOVA support, it can be simplified by 1) moving GIOVA support
over to the first-level page table to store the GIOVA->GPA mapping in the
vIOMMU, 2) binding the vIOMMU first-level page table to the pIOMMU,
3) using the pIOMMU second level for GPA->HPA translation, and
4) enabling nested (a.k.a. dual-stage) translation in the host.
Compared with the current shadow GIOVA support, the new approach makes
the vIOMMU design simpler and more efficient, as we only need to flush
the pIOMMU IOTLB and possibly the device IOTLB when an IOVA mapping
in the vIOMMU is torn down.

     .-----------.
     |  vIOMMU   |
     |-----------|                 .-----------.
     |           |IOTLB flush trap |   QEMU    |
     .-----------.     (unmap)     |-----------|
     |GIOVA->GPA |---------------->|           |
     '-----------'                 '-----------'
     |           |                       |
     '-----------'                       |
                <------------------------'
                |       VFIO/IOMMU
                |  cache invalidation and
                | guest gpd bind interfaces
                v
     .-----------.
     |  pIOMMU   |
     |-----------|
     .-----------.
     |GIOVA->GPA |<---First level
     '-----------'
     | GPA->HPA  |<---Second level
     '-----------'
     '-----------'

This patch series applies the first-level page table for IOVA translation
unless the DOMAIN_ATTR_NESTING domain attribute has been set.
Setting this attribute means the second level will be used to
map GPA (guest physical address) to HPA (host physical address), while
the mappings between GVA (guest virtual address) and GPA are
maintained by the guest, with the page table address bound to the host's
first level.

Based-on-idea-by: Ashok Raj 
Based-on-idea-by: Kevin Tian 
Based-on-idea-by: Liu Yi L 
Based-on-idea-by: Jacob Pan 
Based-on-idea-by: Sanjay Kumar 
Based-on-idea-by: Lu Baolu 

Change log:
v4->v5:
 - The previous version was posted here
   https://lkml.org/lkml/2019/12/18/1371
 - Set Execute Disable in first level page directory entries.
 - Make first level IOVA canonical.
 - Update first level super page capability.

v3->v4:
 - The previous version was posted here
   https://lkml.org/lkml/2019/12/10/2126
 - Set Execute Disable (bit 63) in first level table entries.
 - Enhance pasid-based iotlb invalidation for both default domain
   and auxiliary domain.
 - Add debugfs file to expose page table internals.

v2->v3:
 - The previous version was posted here
   https://lkml.org/lkml/2019/11/27/1831
 - Accept Jacob's suggestion on merging two page tables.

v1->v2:
 - The first series was posted here
   https://lkml.org/lkml/2019/9/23/297
 - Use per domain page table ops to handle different page tables.
 - Use first level for DMA remapping by default on both bare metal
   and vm guest.
 - Code refine according to code review comments for v1.

Lu Baolu (9):
  iommu/vt-d: Identify domains using first level 

[PATCH v5 8/9] iommu/vt-d: Use iova over first level

2019-12-23 Thread Lu Baolu
Now that all map/unmap paths support the first-level page table,
turn it on if the hardware supports scalable mode.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 34e619318f64..51d60bad0b1d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1770,15 +1770,13 @@ static void free_dmar_iommu(struct intel_iommu *iommu)
 
 /*
  * Check and return whether first level is used by default for
- * DMA translation. Currently, we make it off by setting
- * first_level_support = 0, and will change it to -1 after all
- * map/unmap paths support first level page table.
+ * DMA translation.
  */
 static bool first_level_by_default(void)
 {
struct dmar_drhd_unit *drhd;
struct intel_iommu *iommu;
-   static int first_level_support = 0;
+   static int first_level_support = -1;
 
if (likely(first_level_support != -1))
return first_level_support;
-- 
2.17.1
