Re: [PATCH V7 0/3] iommu: Add support to change default domain of an iommu group

2020-09-25 Thread Joerg Roedel
Hi Ashok,

On Thu, Sep 24, 2020 at 10:21:48AM -0700, Raj, Ashok wrote:
> Just trying to follow up on this series.
> 
> Sai has left Intel, hence I'm trying to follow up on his behalf.
> 
> Let me know if you have queued this for the next release.

Not yet, but I think this is mostly ready. Can you please send a new
version in a new mail thread so that I can pick it up with b4?

Thanks,

Joerg


Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-25 Thread Jean-Philippe Brucker
On Thu, Sep 24, 2020 at 10:22:03AM -0500, Bjorn Helgaas wrote:
> On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:
> > Platforms without device-tree nor ACPI can provide a topology
> > description embedded into the virtio config space. Parse it.
> > 
> > Use PCI FIXUP to probe the config space early, because we need to
> > discover the topology before any DMA configuration takes place, and the
> > virtio driver may be loaded much later. Since we discover the topology
> > description when probing the PCI hierarchy, the virtual IOMMU cannot
> > manage other platform devices discovered earlier.
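
Such a fixup registration boils down to something like the sketch below
(illustrative only; the fixup phase and the device ID matched here are
placeholders, not necessarily what the patch uses):

#include <linux/pci.h>

/* Run viommu_pci_parse_topology() for every matching PCI device as soon
 * as it is discovered, before any driver gets a chance to bind to it. */
DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_REDHAT_QUMRANET, PCI_ANY_ID,
			viommu_pci_parse_topology);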
> 
> > +struct viommu_cap_config {
> > +   u8 bar;
> > +   u32 length; /* structure size */
> > +   u32 offset; /* structure offset within the bar */
> 
> s/the bar/the BAR/ (to match comment below).
> 
> > +static void viommu_pci_parse_topology(struct pci_dev *dev)
> > +{
> > +   int ret;
> > +   u32 features;
> > +   void __iomem *regs, *common_regs;
> > +   struct viommu_cap_config cap = {0};
> > +   struct virtio_pci_common_cfg __iomem *common_cfg;
> > +
> > +   /*
> > +* The virtio infrastructure might not be loaded at this point. We need
> > +* to access the BARs ourselves.
> > +*/
> > +   ret = viommu_pci_find_capability(dev, VIRTIO_PCI_CAP_COMMON_CFG, &cap);
> > +   if (!ret) {
> > +   pci_warn(dev, "common capability not found\n");
> 
> Is the lack of this capability really an error, i.e., is this
> pci_warn() or pci_info()?  The "device doesn't have topology
> description" below is only pci_dbg(), which suggests that we can live
> without this.

At this point we know that this is a (modern) virtio-pci device which,
according to the virtio 1.0 specification, must have this capability. So
this is definitely an error, but the topology description is an optional
feature.
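
For reference, finding that capability boils down to walking the PCI
vendor-specific capability list until an entry with the right cfg_type is
found. A rough sketch (illustrative only, not the code from this patch):

#include <linux/pci.h>
#include <linux/virtio_pci.h>

static int viommu_find_virtio_cap(struct pci_dev *dev, u8 cfg_type,
				  struct viommu_cap_config *cap)
{
	u8 pos;

	for (pos = pci_find_capability(dev, PCI_CAP_ID_VNDR); pos;
	     pos = pci_find_next_capability(dev, pos, PCI_CAP_ID_VNDR)) {
		u8 type;

		/* cfg_type says which virtio structure this entry describes */
		pci_read_config_byte(dev,
				     pos + offsetof(struct virtio_pci_cap, cfg_type),
				     &type);
		if (type != cfg_type)
			continue;

		/* Found it: report which BAR it lives in, and where */
		pci_read_config_byte(dev,
				     pos + offsetof(struct virtio_pci_cap, bar),
				     &cap->bar);
		pci_read_config_dword(dev,
				      pos + offsetof(struct virtio_pci_cap, offset),
				      &cap->offset);
		pci_read_config_dword(dev,
				      pos + offsetof(struct virtio_pci_cap, length),
				      &cap->length);
		return pos;
	}
	return 0;
}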

> 
> Maybe a hint about what "common capability" means?

Yes, "virtio-pci common configuration capability" would be more
appropriate

> 
> > +   return;
> > +   }
> > +
> > +   if (pci_enable_device_mem(dev))
> > +   return;
> > +
> > +   common_regs = pci_iomap(dev, cap.bar, 0);
> > +   if (!common_regs)
> > +   return;
> > +
> > +   common_cfg = common_regs + cap.offset;
> > +
> > +   /* Perform the init sequence before we can read the config */
> > +   ret = viommu_pci_reset(common_cfg);
> 
> I guess this is some special device-specific reset, not any kind of
> standard PCI reset?

Yes, it's the virtio reset: writing 0 to the status register in the BAR.
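
A minimal sketch of that sequence (illustrative, not the actual patch code;
it assumes the register layout of struct virtio_pci_common_cfg and an
arbitrary timeout):

#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/io.h>
#include <linux/virtio_pci.h>

static int viommu_pci_reset(struct virtio_pci_common_cfg __iomem *cfg)
{
	int timeout = 100;

	/* Writing 0 to device_status resets the device... */
	iowrite8(0, &cfg->device_status);

	/* ...and the device signals completion by reading back as 0. */
	while (ioread8(&cfg->device_status) && --timeout)
		msleep(1);

	return timeout ? 0 : -ETIMEDOUT;
}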

Thanks,
Jean


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Jean-Philippe Brucker
On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> Add a topology description to the virtio-iommu driver and enable x86
> platforms.
> 
> Since [v2] we have made some progress on adding ACPI support for
> virtio-iommu, which is the preferred boot method on x86. It will be a
> new vendor-agnostic table describing para-virtual topologies in a
> minimal format. However some platforms don't use either ACPI or DT for
> booting (for example microvm), and will need the alternative topology
> description method proposed here. In addition, since the process to get
> a new ACPI table will take a long time, this provides a boot method even
> to ACPI-based platforms, if only temporarily for testing and
> development.
> 
> v3:
> * Add patch 1 that moves virtio-iommu to a subfolder.
> * Split the rest:
>   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> support.
>   * Patch 4 adds definitions.
>   * Patch 5 adds parser in topology.c.
> * Address other comments.
> 
> Linux and QEMU patches available at:
> https://jpbrucker.net/git/linux virtio-iommu/devel
> https://jpbrucker.net/git/qemu virtio-iommu/devel

I'm parking this work again, until we make progress on the ACPI table, or
until a platform without ACPI and DT needs it. Until then, I've pushed v4
to my virtio-iommu/topo branch and will keep it rebased on master.

Thanks,
Jean



Re: [PATCH v11 5/6] iommu/uapi: Handle data and argsz filled by users

2020-09-25 Thread Jean-Philippe Brucker
On Thu, Sep 24, 2020 at 12:24:19PM -0700, Jacob Pan wrote:
> IOMMU user APIs are responsible for processing user data. This patch
> changes the interface such that user pointers can be passed into IOMMU
> code directly. Separate kernel APIs without user pointers are introduced
> for in-kernel users of the UAPI functionality.
> 
> IOMMU UAPI data has a user filled argsz field which indicates the data
> length of the structure. User data is not trusted, argsz must be
> validated based on the current kernel data size, mandatory data size,
> and feature flags.
> 
> User data may also be extended, resulting in a possible argsz increase.
> Backward compatibility is ensured by checking size and flags (or
> their functional equivalent fields).
> 
> This patch adds sanity checks in the IOMMU layer. In addition to argsz,
> reserved/unused fields in padding, flags, and version are also checked.
> Details are documented in Documentation/userspace-api/iommu.rst
> 
> Signed-off-by: Liu Yi L 
> Signed-off-by: Jacob Pan 

Reviewed-by: Jean-Philippe Brucker 

Some comments below in case you're resending, but nothing important.

> ---
>  drivers/iommu/iommu.c  | 199 
> +++--
>  include/linux/iommu.h  |  28 +--
>  include/uapi/linux/iommu.h |   1 +
>  3 files changed, 212 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 4ae02291ccc2..5c1b7ae48aae 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -1961,34 +1961,219 @@ int iommu_attach_device(struct iommu_domain *domain, 
> struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(iommu_attach_device);
>  
> +/*
> + * Check flags and other user provided data for valid combinations. We also
> + * make sure no reserved fields or unused flags are set. This is to ensure
> + * not breaking userspace in the future when these fields or flags are used.
> + */
> +static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
> *info)
> +{
> + u32 mask;
> + int i;
> +
> + if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> + return -EINVAL;
> +
> + mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
> + if (info->cache & ~mask)
> + return -EINVAL;
> +
> + if (info->granularity >= IOMMU_INV_GRANU_NR)
> + return -EINVAL;
> +
> + switch (info->granularity) {
> + case IOMMU_INV_GRANU_ADDR:
> + if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
> + return -EINVAL;
> +
> + mask = IOMMU_INV_ADDR_FLAGS_PASID |
> + IOMMU_INV_ADDR_FLAGS_ARCHID |
> + IOMMU_INV_ADDR_FLAGS_LEAF;
> +
> + if (info->granu.addr_info.flags & ~mask)
> + return -EINVAL;
> + break;
> + case IOMMU_INV_GRANU_PASID:
> + mask = IOMMU_INV_PASID_FLAGS_PASID |
> + IOMMU_INV_PASID_FLAGS_ARCHID;
> + if (info->granu.pasid_info.flags & ~mask)
> + return -EINVAL;
> +
> + break;
> + case IOMMU_INV_GRANU_DOMAIN:
> + if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
> + return -EINVAL;
> + break;
> + default:
> + return -EINVAL;
> + }
> +
> + /* Check reserved padding fields */
> + for (i = 0; i < sizeof(info->padding); i++) {
> + if (info->padding[i])
> + return -EINVAL;
> + }
> +
> + return 0;
> +}
> +
>  int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
> *dev,
> - struct iommu_cache_invalidate_info *inv_info)
> + void __user *uinfo)
>  {
> + struct iommu_cache_invalidate_info inv_info = { 0 };
> + u32 minsz;
> + int ret = 0;

nit: no need to initialize it

> +
>   if (unlikely(!domain->ops->cache_invalidate))
>   return -ENODEV;
>  
> - return domain->ops->cache_invalidate(domain, dev, inv_info);
> + /*
> +  * No new spaces can be added before the variable sized union, the
> +  * minimum size is the offset to the union.
> +  */
> + minsz = offsetof(struct iommu_cache_invalidate_info, granu);

Why not use offsetofend() to avoid naming the unions?
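
That is, something like the following (illustrative, assuming 'padding' is
the last field before the variable-sized union):

	minsz = offsetofend(struct iommu_cache_invalidate_info, padding);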

> +
> + /* Copy minsz from user to get flags and argsz */
> + if (copy_from_user(&inv_info, uinfo, minsz))
> + return -EFAULT;
> +
> + /* Fields before variable size union is mandatory */
> + if (inv_info.argsz < minsz)
> + return -EINVAL;
> +
> + /* PASID and address granu require additional info beyond minsz */
> + if (inv_info.argsz == minsz &&
> + ((inv_info.granularity == IOMMU_INV_GRANU_PASID) ||
> + (inv_info.granularity == IOMMU_INV_GRANU_ADDR)))
> + return -EINVAL;

Made redundant by the two checks below

> +
> + if (inv_info.granularity == IOMMU_INV_GRANU_P

[PATCH 2/2] iommu: avoid taking iova_rbtree_lock twice

2020-09-25 Thread John Garry
From: Cong Wang 

Both find_iova() and __free_iova() take iova_rbtree_lock,
there is no reason to take and release it twice inside
free_iova().

Fold them into one critical section by calling the unlock
versions instead.

Signed-off-by: Cong Wang 
Reviewed-by: Robin Murphy 
Tested-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 05e0b462e0d9..921e80f64ae5 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -390,10 +390,14 @@ EXPORT_SYMBOL_GPL(__free_iova);
 void
 free_iova(struct iova_domain *iovad, unsigned long pfn)
 {
-   struct iova *iova = find_iova(iovad, pfn);
+   unsigned long flags;
+   struct iova *iova;
 
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+   iova = private_find_iova(iovad, pfn);
if (iova)
-   __free_iova(iovad, iova);
+   private_free_iova(iovad, iova);
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
 
 }
 EXPORT_SYMBOL_GPL(free_iova);
-- 
2.26.2



[PATCH 1/2] iommu/iova: Flush CPU rcache for when a depot fills

2020-09-25 Thread John Garry
Leizhen reported some time ago that IOVA performance may degrade over time
[0], but unfortunately his solution to fix this problem was not given
attention.

To summarize, the issue is that as time goes by, the CPU rcache and depot
rcache continue to grow. As such, IOVA RB tree access time also continues
to grow.

At a certain point, a depot may become full, and some CPU rcaches may
also be full when we try to insert another IOVA. For this scenario,
currently we free the "loaded" CPU rcache and create a new one. This
free'ing means that we need to free many IOVAs in the RB tree, which
makes IO throughput performance fall off a cliff in our storage scenario:

Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]

And it continues in this fashion, without recovering. Note that in this
example we had to wait 16 hours for this to occur. Also note that IO
throughput gradually becomes more unstable leading up to this
point.

As a solution to this issue, we judge that the IOVA rcaches have grown too
big, and just flush all the CPU rcaches instead.

The depot rcaches, however, are not flushed, as they can be used to
immediately replenish active CPUs.

In future, some IOVA rcache compaction could be implemented to solve the
instability issue, which I figure could be quite complex to implement.

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/

Reported-by: Xiang Chen 
Tested-by: Xiang Chen 
Signed-off-by: John Garry 
---
 drivers/iommu/iova.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 45a251da5453..05e0b462e0d9 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -892,9 +892,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 struct iova_rcache *rcache,
 unsigned long iova_pfn)
 {
-   struct iova_magazine *mag_to_free = NULL;
struct iova_cpu_rcache *cpu_rcache;
-   bool can_insert = false;
+   bool can_insert = false, flush = false;
unsigned long flags;
 
cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
@@ -913,13 +912,19 @@ static bool __iova_rcache_insert(struct iova_domain 
*iovad,
if (rcache->depot_size < MAX_GLOBAL_MAGS) {
rcache->depot[rcache->depot_size++] =
cpu_rcache->loaded;
+   can_insert = true;
+   cpu_rcache->loaded = new_mag;
} else {
-   mag_to_free = cpu_rcache->loaded;
+   /*
+* The depot is full, meaning that a very large
+* cache of IOVAs has built up, which slows
+* down RB tree accesses significantly
+* -> let's flush at this point.
+*/
+   flush = true;
+   iova_magazine_free(new_mag);
}
spin_unlock(&rcache->lock);
-
-   cpu_rcache->loaded = new_mag;
-   can_insert = true;
}
}
 
@@ -928,9 +933,11 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 
spin_unlock_irqrestore(&cpu_rcache->lock, flags);
 
-   if (mag_to_free) {
-   iova_magazine_free_pfns(mag_to_free, iovad);
-   iova_magazine_free(mag_to_free);
+   if (flush)

Re: [PATCH 02/13] iommu: amd: Prepare for generic IO page table framework

2020-09-25 Thread Suravee Suthikulpanit

Robin,

On 9/24/20 7:25 PM, Robin Murphy wrote:

+struct io_pgtable_ops *amd_iommu_setup_io_pgtable_ops(struct iommu_dev_data *dev_data,
+						       struct protection_domain *domain)
+{
+	domain->iop.pgtbl_cfg = (struct io_pgtable_cfg) {
+		.pgsize_bitmap	= AMD_IOMMU_PGSIZES,
+		.ias		= IOMMU_IN_ADDR_BIT_SIZE,
+		.oas		= IOMMU_OUT_ADDR_BIT_SIZE,
+		.coherent_walk	= false,


Is that right? Given that you seem to use regular kernel addresses for pagetable pages and don't have any obvious cache 
maintenance around PTE manipulation, I suspect not ;)

> It's fair enough if your implementation doesn't use this and simply assumes 
coherency, but in that case it would be less
confusing to have the driver set it to true for the sake of honesty, or just leave it out 
entirely - explicitly setting false gives the illusion of being meaningful.


The AMD IOMMU can be configured to disable snooping of the page table walk for a particular device (DTE[SD]=1). However, the
current Linux driver does not set this bit, so coherency can be assumed. We can just leave this out for now. I can
remove it when I send out V2 along with the other changes.


Otherwise, the io-pgtable parts all look OK to me - it's nice to finally 
fulfil the original intent of not being an Arm-specific thing :D


Robin.


Thanks,
Suravee

[PATCH 0/2] iommu/iova: Solve longterm IOVA issue

2020-09-25 Thread John Garry
This series contains a patch to solve the longterm IOVA issue which
leizhen originally tried to address at [0].

I also included the small optimisation from Cong Wang, which never seems
to have been accepted [1]. There was some debate about the other patches
in that series, but this one is quite straightforward.

@Cong Wang, please resend your series if you would prefer that I not upstream
your patch.

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/
[1] 
https://lore.kernel.org/linux-iommu/4b74d40a-22d1-af53-fcb6-5d7018370...@huawei.com/

Cong Wang (1):
  iommu: avoid taking iova_rbtree_lock twice

John Garry (1):
  iommu/iova: Flush CPU rcache for when a depot fills

 drivers/iommu/iova.c | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

-- 
2.26.2



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Thu, Sep 24, 2020 at 02:50:46PM +0200, Joerg Roedel wrote:
> On Thu, Sep 24, 2020 at 08:41:21AM -0400, Michael S. Tsirkin wrote:
> > But this has nothing to do with Linux.  There is also no guarantee that
> > the two committees will decide to use exactly the same format. Once one
> > of them sets the format in stone, we can add support for that format to
> > linux. If another one is playing nice and uses the same format, we can
> > use the same parsers. If it doesn't linux will have to follow suit.
> 
> Or Linux decides to support only one of the formats, which would then be
> ACPI.
> 
> Regards,
> 
>   Joerg

It's really up to hypervisors, not guests. Linux as a guest can for sure
refuse to work somewhere, but that's normally not very attractive.

-- 
MST



Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > Add a topology description to the virtio-iommu driver and enable x86
> > platforms.
> > 
> > Since [v2] we have made some progress on adding ACPI support for
> > virtio-iommu, which is the preferred boot method on x86. It will be a
> > new vendor-agnostic table describing para-virtual topologies in a
> > minimal format. However some platforms don't use either ACPI or DT for
> > booting (for example microvm), and will need the alternative topology
> > description method proposed here. In addition, since the process to get
> > a new ACPI table will take a long time, this provides a boot method even
> > to ACPI-based platforms, if only temporarily for testing and
> > development.
> > 
> > v3:
> > * Add patch 1 that moves virtio-iommu to a subfolder.
> > * Split the rest:
> >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > support.
> >   * Patch 4 adds definitions.
> >   * Patch 5 adds parser in topology.c.
> > * Address other comments.
> > 
> > Linux and QEMU patches available at:
> > https://jpbrucker.net/git/linux virtio-iommu/devel
> > https://jpbrucker.net/git/qemu virtio-iommu/devel
> 
> I'm parking this work again, until we make progress on the ACPI table, or
> until a platform without ACPI and DT needs it. Until then, I've pushed v4
> to my virtio-iommu/topo branch and will keep it rebased on master.
> 
> Thanks,
> Jean

I think you guys need to work on virtio spec too, not too much left to
do there ...

-- 
MST



Re: [PATCH 08/18] dma-mapping: add a new dma_alloc_noncoherent API

2020-09-25 Thread Robin Murphy

On 2020-09-15 16:51, Christoph Hellwig wrote:
[...]

+These APIs allow to allocate pages in the kernel direct mapping that are
+guaranteed to be DMA addressable.  This means that unlike dma_alloc_coherent,
+virt_to_page can be called on the resulting address, and the resulting


Nit: if we explicitly describe this as if it's a guarantee that can be 
relied upon...



+struct page can be used for everything a struct page is suitable for.


[...]

+This routine allocates a region of  bytes of consistent memory.  It
+returns a pointer to the allocated region (in the processor's virtual address
+space) or NULL if the allocation failed.  The returned memory may or may not
+be in the kernels direct mapping.  Drivers must not call virt_to_page on
+the returned memory region.


...then forbid this document's target audience from relying on it, 
something seems off. At the very least it's unhelpfully unclear :/


Given patch #17, I suspect that the first paragraph is the one that's no 
longer true.


Robin.


Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Jean-Philippe Brucker
On Fri, Sep 25, 2020 at 06:22:57AM -0400, Michael S. Tsirkin wrote:
> On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> > On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > > Add a topology description to the virtio-iommu driver and enable x86
> > > platforms.
> > > 
> > > Since [v2] we have made some progress on adding ACPI support for
> > > virtio-iommu, which is the preferred boot method on x86. It will be a
> > > new vendor-agnostic table describing para-virtual topologies in a
> > > minimal format. However some platforms don't use either ACPI or DT for
> > > booting (for example microvm), and will need the alternative topology
> > > description method proposed here. In addition, since the process to get
> > > a new ACPI table will take a long time, this provides a boot method even
> > > to ACPI-based platforms, if only temporarily for testing and
> > > development.
> > > 
> > > v3:
> > > * Add patch 1 that moves virtio-iommu to a subfolder.
> > > * Split the rest:
> > >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > > support.
> > >   * Patch 4 adds definitions.
> > >   * Patch 5 adds parser in topology.c.
> > > * Address other comments.
> > > 
> > > Linux and QEMU patches available at:
> > > https://jpbrucker.net/git/linux virtio-iommu/devel
> > > https://jpbrucker.net/git/qemu virtio-iommu/devel
> > 
> > I'm parking this work again, until we make progress on the ACPI table, or
> > until a platform without ACPI and DT needs it. Until then, I've pushed v4
> > to my virtio-iommu/topo branch and will keep it rebased on master.
> > 
> > Thanks,
> > Jean
> 
> I think you guys need to work on virtio spec too, not too much left to
> do there ...

I know it's ready and I'd really like to move on with this, but I'd rather
not commit it to the spec until we know it's going to be used at all. As
Gerd pointed out, the one example we had, microvm, now supports ACPI. Since
we've kicked off the ACPI work anyway it isn't clear that the built-in
topology will be useful.

Thanks,
Jean


[PATCH] iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE

2020-09-25 Thread Suravee Suthikulpanit
When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
current driver disables interrupt remapping when it updates the IRTE
so that the upper and lower 64-bit values can be updated safely.

However, this creates a small window, where the interrupt could
arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.

  IOMMU Driver                Device IRQ
  ============                ==========
  irte.RemapEn=0
       ...
  change IRTE                 IRQ from device ==> IO_PAGE_FAULT !!
       ...
  irte.RemapEn=1

This scenario has been observed when changing irq affinity on a system
running I/O-intensive workload, in which the destination APIC ID
in the IRTE is updated.

Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
disabling the interrupt remapping. However, this means several features,
which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
is not supported (which is unprecedented for AMD processors w/ IOMMU).

Cc: sta...@vger.kernel.org
Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure")
Reported-by: Sean Osborne 
Signed-off-by: Suravee Suthikulpanit 
Tested-by: Erik Rockstrom 
Reviewed-by: Joao Martins 
Link: 
https://lore.kernel.org/r/20200903093822.52012-3-suravee.suthikulpa...@amd.com
Signed-off-by: Joerg Roedel 
---
Note: This patch is the back-port on top of the stable branch linux-5.4.y
for the upstream commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when
updating 128-bit IRTE") since the original patch does not apply cleanly.

 drivers/iommu/Kconfig  |  2 +-
 drivers/iommu/amd_iommu.c  | 17 +
 drivers/iommu/amd_iommu_init.c | 21 +++--
 3 files changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 390568afee9f..fc0160e8ed33 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -138,7 +138,7 @@ config AMD_IOMMU
select PCI_PASID
select IOMMU_API
select IOMMU_IOVA
-   depends on X86_64 && PCI && ACPI
+   depends on X86_64 && PCI && ACPI && HAVE_CMPXCHG_DOUBLE
---help---
  With this option you can enable support for AMD IOMMU hardware in
  your system. An IOMMU is a hardware component which provides
diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index fa91d856a43e..7b724f7b27a9 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -3873,6 +3873,7 @@ static int alloc_irq_index(u16 devid, int count, bool 
align,
 static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
  struct amd_ir_data *data)
 {
+   bool ret;
struct irq_remap_table *table;
struct amd_iommu *iommu;
unsigned long flags;
@@ -3890,10 +3891,18 @@ static int modify_irte_ga(u16 devid, int index, struct 
irte_ga *irte,
 
entry = (struct irte_ga *)table->table;
entry = &entry[index];
-   entry->lo.fields_remap.valid = 0;
-   entry->hi.val = irte->hi.val;
-   entry->lo.val = irte->lo.val;
-   entry->lo.fields_remap.valid = 1;
+
+   ret = cmpxchg_double(&entry->lo.val, &entry->hi.val,
+entry->lo.val, entry->hi.val,
+irte->lo.val, irte->hi.val);
+   /*
+* We use cmpxchg16 to atomically update the 128-bit IRTE,
+* and it cannot be updated by the hardware or other processors
+* behind us, so the return value of cmpxchg16 should be the
+* same as the old value.
+*/
+   WARN_ON(!ret);
+
if (data)
data->ref = entry;
 
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index 135ae5222cf3..31d7e2d4f304 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1522,7 +1522,14 @@ static int __init init_iommu_one(struct amd_iommu 
*iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
-   if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+
+   /*
+* Note: GA (128-bit IRTE) mode requires cmpxchg16b support.
+* GAM also requires GA mode. Therefore, we need to
+* check cmpxchg16b support before enabling it.
+*/
+   if (!boot_cpu_has(X86_FEATURE_CX16) ||
+   ((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
break;
case 0x11:
@@ -1531,8 +1538,18 @@ static int __init init_iommu_one(struct amd_iommu 
*iommu, struct ivhd_header *h)
iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
else
iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
-   if (((h->efr_reg & (0x1 <<

Re: [PATCH 1/2] iommu/iova: Flush CPU rcache for when a depot fills

2020-09-25 Thread Robin Murphy

On 2020-09-25 10:51, John Garry wrote:

Leizhen reported some time ago that IOVA performance may degrade over time
[0], but unfortunately his solution to fix this problem was not given
attention.

To summarize, the issue is that as time goes by, the CPU rcache and depot
rcache continue to grow. As such, IOVA RB tree access time also continues
to grow.

At a certain point, a depot may become full, and some CPU rcaches may
also be full when we try to insert another IOVA. For this scenario,
currently we free the "loaded" CPU rcache and create a new one. This
free'ing means that we need to free many IOVAs in the RB tree, which
makes IO throughput performance fall off a cliff in our storage scenario:

Jobs: 12 (f=12): [] [0.0% done] [6314MB/0KB/0KB /s] [1616K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [5669MB/0KB/0KB /s] [1451K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6673MB/0KB/0KB /s] [1708K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6031MB/0KB/0KB /s] [1544K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6761MB/0KB/0KB /s] [1731K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6705MB/0KB/0KB /s] [1717K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6685MB/0KB/0KB /s] [1711K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6178MB/0KB/0KB /s] [1582K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [6731MB/0KB/0KB /s] [1723K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2387MB/0KB/0KB /s] [611K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2689MB/0KB/0KB /s] [688K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [2278MB/0KB/0KB /s] [583K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1288MB/0KB/0KB /s] [330K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1632MB/0KB/0KB /s] [418K/0/0 iops]
Jobs: 12 (f=12): [] [0.0% done] [1765MB/0KB/0KB /s] [452K/0/0 iops]

And it continues in this fashion, without recovering. Note that in this
example we had to wait 16 hours for this to occur. Also note that IO
throughput gradually becomes more unstable leading up to this
point.

As a solution to this issue, we judge that the IOVA rcaches have grown too
big, and just flush all the CPU rcaches instead.

The depot rcaches, however, are not flushed, as they can be used to
immediately replenish active CPUs.

In future, some IOVA rcache compaction could be implemented to solve the
instability issue, which I figure could be quite complex to implement.

[0] 
https://lore.kernel.org/linux-iommu/20190815121104.29140-3-thunder.leiz...@huawei.com/

Reported-by: Xiang Chen 
Tested-by: Xiang Chen 
Signed-off-by: John Garry 
---
  drivers/iommu/iova.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 45a251da5453..05e0b462e0d9 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -892,9 +892,8 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
 struct iova_rcache *rcache,
 unsigned long iova_pfn)
  {
-   struct iova_magazine *mag_to_free = NULL;
struct iova_cpu_rcache *cpu_rcache;
-   bool can_insert = false;
+   bool can_insert = false, flush = false;
unsigned long flags;
  
  	cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);

@@ -913,13 +912,19 @@ static bool __iova_rcache_insert(struct iova_domain 
*iovad,
if (rcache->depot_size < MAX_GLOBAL_MAGS) {
rcache->depot[rcache->depot_size++] =
cpu_rcache->loaded;
+   can_insert = true;
+   cpu_rcache->loaded = new_mag;
} else {
-   mag_to_free = cpu_rcache->loaded;
+   /*
+* The depot is full, meaning that a very large
+* cache of IOVAs has built up, which slows
+* down RB tree accesses significantly
+* -> let's flush at this point.
+*/
+   flush = true;
+   iova_magazine_free(new_mag);
}
spin_unlock(&rcache->lock);
-
-   cpu_rcache->loaded = new_mag;
-   can_insert = true;
}
}
  
@@ -928,9 +933,11 @@ static bool __iova_rcache_insert(struct iova_domain *iovad,
  
  	spin_unlock_irqrestore(&cpu_rcache->lock, flags);
  
-	if (mag_to_free) {

-   iova_magazine_free_pfns(mag_to_free, iovad);
-   iova_magazine_free(mag_to_free);

Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-25 Thread Robin Murphy

On 2020-09-24 17:23, Dmitry Osipenko wrote:

24.09.2020 17:01, Thierry Reding writes:

On Thu, Sep 24, 2020 at 04:23:59PM +0300, Dmitry Osipenko wrote:

04.09.2020 15:59, Thierry Reding writes:

From: Thierry Reding 

Reserved memory regions can be marked as "active" if hardware is
expected to access the regions during boot and before the operating
system can take control. One example where this is useful is for the
operating system to infer whether the region needs to be identity-
mapped through an IOMMU.

Signed-off-by: Thierry Reding 
---
  .../bindings/reserved-memory/reserved-memory.txt   | 7 +++
  1 file changed, 7 insertions(+)

diff --git 
a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt 
b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index 4dd20de6977f..163d2927e4fc 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -63,6 +63,13 @@ reusable (optional) - empty property
able to reclaim it back. Typically that means that the operating
system can use that region to store volatile or cached data that
can be otherwise regenerated or migrated elsewhere.
+active (optional) - empty property
+- If this property is set for a reserved memory region, it indicates
+  that some piece of hardware may be actively accessing this region.
+  Should the operating system want to enable IOMMU protection for a
+  device, all active memory regions must have been identity-mapped
+  in order to ensure that non-quiescent hardware during boot can
+  continue to access the memory.
  
  Linux implementation note:

  - If a "linux,cma-default" property is present, then Linux will use the



Hi,

Could you please explain what devices need this quirk? I see that you're
targeting Tegra SMMU driver, which means that it should be some pre-T186
device.


Primarily I'm looking at Tegra210 and later, because on earlier devices
the bootloader doesn't consistently initialize display. I know that it
does on some devices, but not all of them.


AFAIK, all tablet devices starting with Tegra20 that have a display panel
initialize the display at boot time to show a splash screen. This
includes all T20/T30/T114 tablets that are already supported by the upstream
kernel.


This same code should also
work on Tegra186 and later (with an ARM SMMU) although the situation is
slightly more complicated there because IOMMU translations will fault by
default long before these identity mappings can be established.


Is this reservation needed for some device that has display
hardwired to a very specific IOMMU domain at the boot time?


No, this is only used to convey information about the active framebuffer
to the kernel. In practice the DMA/IOMMU code will use this information
to establish a 1:1 mapping on whatever IOMMU domain was picked for
display.


If you're targeting devices that don't have the IOMMU enabled by default at
boot time, then this approach won't work for the existing devices,
which won't ever get an updated bootloader.


If the devices don't use an IOMMU, then there should be no problem. The
extra reserved-memory nodes would still be necessary to ensure that the
kernel doesn't reuse the framebuffer memory for the slab allocator, but
if no IOMMU is used, then the display controller accessing the memory
isn't going to cause problems other than perhaps scanning out data that
is no longer a framebuffer.

There should also be no problem for devices with an old bootloader
because this code is triggered by the presence of a reserved-memory node
referenced via the memory-region property. Devices with an old
bootloader should continue to work as they did before. Although I
suppose they would start faulting once we enable DMA/IOMMU integration
for Tegra SMMU if they have a bootloader that does initialize display to
actively scan out during boot.


I think Robin Murphy already suggested that we should simply create
a dummy "identity" IOMMU domain by default for the DRM/VDE devices and
then replace it with an explicitly created domain within the drivers.


I don't recall reading about that suggestion. So does this mean that for
certain devices we'd want to basically passthrough by default and then
at some point during boot take over with a properly managed IOMMU
domain?


Yes, my understanding is that this is what Robin suggested here:

https://lore.kernel.org/linux-iommu/cb12808b-7316-19db-7413-b7f852a6f...@arm.com/


Just to clarify, what I was talking about there is largely orthogonal to 
the issue here. That was about systems with limited translation 
resources letting translation be specifically opt-in by IOMMU-aware 
drivers. It probably *would* happen to obviate the issue of disrupting 
live DMA at boot time on these particular Tegra platforms, but we still 
need something like Thierry's solution in general, since IOMMU drivers
may have no other way to determine whether devices are active at boot
and they have to take care to avoid breaking anything - e.g. SMMUv3 will
at a bare minimum need to set up *some* form of valid stream table entry
for the relevant device(s) right at the beginning where we first probe
and reset the SMMU itself, regardless of what happens with domains and
addresses later down the line.

Re: [PATCH] iommu/amd: Use cmpxchg_double() when updating 128-bit IRTE

2020-09-25 Thread Greg KH
On Fri, Sep 25, 2020 at 11:45:05AM +, Suravee Suthikulpanit wrote:
> When using 128-bit interrupt-remapping table entry (IRTE) (a.k.a GA mode),
> current driver disables interrupt remapping when it updates the IRTE
> so that the upper and lower 64-bit values can be updated safely.
> 
> However, this creates a small window, where the interrupt could
> arrive and result in IO_PAGE_FAULT (for interrupt) as shown below.
> 
>   IOMMU Driver                Device IRQ
>   ============                ==========
>   irte.RemapEn=0
>        ...
>   change IRTE                 IRQ from device ==> IO_PAGE_FAULT !!
>        ...
>   irte.RemapEn=1
> 
> This scenario has been observed when changing irq affinity on a system
> running I/O-intensive workload, in which the destination APIC ID
> in the IRTE is updated.
> 
> Instead, use cmpxchg_double() to update the 128-bit IRTE at once without
> disabling the interrupt remapping. However, this means several features,
> which require GA (128-bit IRTE) support will also be affected if cmpxchg16b
> is not supported (which is unprecedented for AMD processors w/ IOMMU).
> 
> Cc: sta...@vger.kernel.org
> Fixes: 880ac60e2538 ("iommu/amd: Introduce interrupt remapping ops structure")
> Reported-by: Sean Osborne 
> Signed-off-by: Suravee Suthikulpanit 
> Tested-by: Erik Rockstrom 
> Reviewed-by: Joao Martins 
> Link: 
> https://lore.kernel.org/r/20200903093822.52012-3-suravee.suthikulpa...@amd.com
> Signed-off-by: Joerg Roedel 
> ---
> Note: This patch is the back-port on top of the stable branch linux-5.4.y
> for the upstream commit e52d58d54a32 ("iommu/amd: Use cmpxchg_double() when
> updating 128-bit IRTE") since the original patch does not apply cleanly.

Now queued up, thanks.

greg k-h


Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-25 Thread Dmitry Osipenko
25.09.2020 15:39, Robin Murphy writes:
...
>> IIRC, in the past Robin Murphy was suggesting to read out hardware state
>> early during kernel boot in order to find what regions are in use by
>> hardware.
> 
> I doubt I suggested that in general, because I've always firmly believed
> it to be a terrible idea. I've debugged too many cases where firmware or
> kexec has inadvertently left DMA running and corrupted kernel memory, so
> in general we definitely *don't* want to blindly trust random hardware
> state. Anything I may have said in relation to Qualcomm's fundamentally
> broken hypervisor/bootloader setup should not be considered outside that
> specific context ;)
> 
> Robin.
> 
>> I think it should be easy to do for the display controller since we
>> could check clock and PD states in order to decide whether DC's IO could
>> be accessed and then read out the FB pointer and size. I guess it should
>> take about hundred lines of code.

Active DMA is indeed very dangerous, but it's a bit less dangerous
in the case of read-only DMA.

I got another idea of how we could benefit from the active display
hardware. Maybe we could do the following:

1. Check whether display is active

2. Allocate CMA that matches the FB size

3. Create identity mapping for the CMA

4. Switch display framebuffer to our CMA

5. Create very early simple-framebuffer out of the CMA

6. Once Tegra DRM driver is loaded, it will kick out the simple-fb, and
thus, release temporal CMA and identity mapping.

This will provide us with a very early framebuffer output and it will
work on all devices out-of-the-box!

Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Michael S. Tsirkin
On Fri, Sep 25, 2020 at 01:26:29PM +0200, Jean-Philippe Brucker wrote:
> On Fri, Sep 25, 2020 at 06:22:57AM -0400, Michael S. Tsirkin wrote:
> > On Fri, Sep 25, 2020 at 10:48:06AM +0200, Jean-Philippe Brucker wrote:
> > > On Fri, Aug 21, 2020 at 03:15:34PM +0200, Jean-Philippe Brucker wrote:
> > > > Add a topology description to the virtio-iommu driver and enable x86
> > > > platforms.
> > > > 
> > > > Since [v2] we have made some progress on adding ACPI support for
> > > > virtio-iommu, which is the preferred boot method on x86. It will be a
> > > > new vendor-agnostic table describing para-virtual topologies in a
> > > > minimal format. However some platforms don't use either ACPI or DT for
> > > > booting (for example microvm), and will need the alternative topology
> > > > description method proposed here. In addition, since the process to get
> > > > a new ACPI table will take a long time, this provides a boot method even
> > > > to ACPI-based platforms, if only temporarily for testing and
> > > > development.
> > > > 
> > > > v3:
> > > > * Add patch 1 that moves virtio-iommu to a subfolder.
> > > > * Split the rest:
> > > >   * Patch 2 adds topology-helper.c, which will be shared with the ACPI
> > > > support.
> > > >   * Patch 4 adds definitions.
> > > >   * Patch 5 adds parser in topology.c.
> > > > * Address other comments.
> > > > 
> > > > Linux and QEMU patches available at:
> > > > https://jpbrucker.net/git/linux virtio-iommu/devel
> > > > https://jpbrucker.net/git/qemu virtio-iommu/devel
> > > 
> > > I'm parking this work again, until we make progress on the ACPI table, or
> > > until a platform without ACPI and DT needs it. Until then, I've pushed v4
> > > to my virtio-iommu/topo branch and will keep it rebased on master.
> > > 
> > > Thanks,
> > > Jean
> > 
> > I think you guys need to work on virtio spec too, not too much left to
> > do there ...
> 
> I know it's ready and I'd really like to move on with this, but I'd rather
> not commit it to the spec until we know it's going to be used at all. As
> Gerd pointed out, the one example we had, microvm, now supports ACPI. Since
> we've kicked off the ACPI work anyway it isn't clear that the built-in
> topology will be useful.
> 
> Thanks,
> Jean

Many power platforms are OF based, thus without ACPI or DT support.

-- 
MST



Re: [PATCH v2 1/4] dt-bindings: reserved-memory: Document "active" property

2020-09-25 Thread Dmitry Osipenko
25.09.2020 15:39, Robin Murphy writes:
...
>> Yes, my understanding is that this is what Robin suggested here:
>>
>> https://lore.kernel.org/linux-iommu/cb12808b-7316-19db-7413-b7f852a6f...@arm.com/
>>
> 
> Just to clarify, what I was talking about there is largely orthogonal to
> the issue here. That was about systems with limited translation
> resources letting translation be specifically opt-in by IOMMU-aware
> drivers. It probably *would* happen to obviate the issue of disrupting
> live DMA at boot time on these particular Tegra platforms, but we still
> need something like Thierry's solution in general, since IOMMU drivers
> may have no other way to determine whether devices are active at boot
> and they have to take care to avoid breaking anything - e.g. SMMUv3 will
> at a bare minimum need to set up *some* form of valid stream table entry
> for the relevant device(s) right at the beginning where we first probe
> and reset the SMMU itself, regardless of what happens with domains and
> addresses later down the line.

Yes, I only meant that your suggestion should also be useful here.
Anyway, thank you for the clarification :)

I agree that Thierry's proposal is good! But it needs some more
thought because it's not very applicable to the current devices.

>>> The primary goal here is to move towards using the DMA API rather than
>>> the IOMMU API directly, so we don't really have the option of replacing
>>> with an explicitly created domain. Unless we have code in the DMA/IOMMU
>>> code that does this somehow.
>>>
>>> But I'm not sure what would be a good way to mark certain devices as
>>> needing an identity domain by default. Do we still use the reserved-
>>> memory node for that?
>>
>> The reserved-memory indeed shouldn't be needed for resolving the
>> implicit IOMMU problem since we could mark certain devices within the
>> kernel IOMMU driver.
>>
>> I haven't got around to trying to implement the implicit IOMMU support
>> yet, but I suppose we could implement the def_domain_type() hook in the
>> SMMU driver and then return IOMMU_DOMAIN_IDENTITY for the Display/VDE
>> devices. Then the Display/VDE drivers will take over the identity domain
>> and replace it with the explicit domain.
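
A rough sketch of that idea (hypothetical code, not from any posted patch;
the compatible string is only a placeholder), wired up via the driver's
iommu_ops.def_domain_type callback:

#include <linux/iommu.h>
#include <linux/of.h>

static int tegra_smmu_def_domain_type(struct device *dev)
{
	/* Let devices that may be scanning out at boot start in passthrough */
	if (of_device_is_compatible(dev->of_node, "nvidia,tegra210-dc"))
		return IOMMU_DOMAIN_IDENTITY;

	return 0; /* no preference, the core picks the normal default */
}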
> 
> FWIW I've already cooked up identity domain support for tegra-gart; I
> was planning on tackling it for tegra-smmu as well for the next version
> of my arm default domains series (which will be after the next -rc1 now
> since I'm just about to take some long-overdue holiday).

Very nice! Maybe we will have some more food for discussion by the
time you return. Have a good time!

[PATCH 2/8] iommu: iova: properly handle 0 as a valid IOVA address

2020-09-25 Thread Marek Szyprowski
Zero is a valid DMA and IOVA address on many architectures, so adjust the
IOVA management code to properly handle it. A new value IOVA_BAD_ADDR
(~0UL) is introduced as a generic value for the error case. Adjust all
callers of the alloc_iova_fast() function for the new return value.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/dma-iommu.c   | 18 ++
 drivers/iommu/intel/iommu.c | 12 ++--
 drivers/iommu/iova.c| 10 ++
 include/linux/iova.h|  2 ++
 4 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index cd6e3c70ebb3..91dd8f46dae1 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -405,7 +405,7 @@ static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain 
*domain,
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
-   unsigned long shift, iova_len, iova = 0;
+   unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -433,11 +433,13 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
iova = alloc_iova_fast(iovad, iova_len,
   DMA_BIT_MASK(32) >> shift, false);
 
-   if (!iova)
+   if (iova == IOVA_BAD_ADDR)
iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
   true);
 
-   return (dma_addr_t)iova << shift;
+   if (iova != IOVA_BAD_ADDR)
+   return (dma_addr_t)iova << shift;
+   return DMA_MAPPING_ERROR;
 }
 
 static void iommu_dma_free_iova(struct iommu_dma_cookie *cookie,
@@ -493,8 +495,8 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
size = iova_align(iovad, size + iova_off);
 
iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
-   if (!iova)
-   return DMA_MAPPING_ERROR;
+   if (iova == DMA_MAPPING_ERROR)
+   return iova;
 
if (iommu_map_atomic(domain, iova, phys - iova_off, size, prot)) {
iommu_dma_free_iova(cookie, iova, size);
@@ -617,7 +619,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
 
size = iova_align(iovad, size);
iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
if (sg_alloc_table_from_pages(&sgt, pages, count, 0, size, GFP_KERNEL))
@@ -887,7 +889,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
}
 
iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_restore_sg;
 
/*
@@ -1181,7 +1183,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
return NULL;
 
iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
-   if (!iova)
+   if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
 
if (iommu_map(domain, iova, msi_addr, size, prot))
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 00963cedfd83..885d0dee39cc 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3416,15 +3416,15 @@ static unsigned long intel_alloc_iova(struct device 
*dev,
 */
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(DMA_BIT_MASK(32)), false);
-   if (iova_pfn)
+   if (iova_pfn != IOVA_BAD_ADDR)
return iova_pfn;
}
iova_pfn = alloc_iova_fast(&domain->iovad, nrpages,
   IOVA_PFN(dma_mask), true);
-   if (unlikely(!iova_pfn)) {
+   if (unlikely(iova_pfn == IOVA_BAD_ADDR)) {
dev_err_once(dev, "Allocating %ld-page iova failed\n",
 nrpages);
-   return 0;
+   return IOVA_BAD_ADDR;
}
 
return iova_pfn;
@@ -3454,7 +3454,7 @@ static dma_addr_t __intel_map_single(struct device *dev, 
phys_addr_t paddr,
size = aligned_nrpages(paddr, size);
 
iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size), dma_mask);
-   if (!iova_pfn)
+   if (iova_pfn == IOVA_BAD_ADDR)
goto error;
 
/*
@@ -3663,7 +3663,7 @@ static int intel_map_sg(struct device *dev, struct 
scatterlist *sglist, int nele
 
iova_pfn = intel_alloc_iova(dev, domain, dma_to_mm_pfn(size),
*dev->dma_mask);
-   if (!iova_pfn) {
+   if (iova_pfn == IOVA_BAD_ADDR) {
sglist->dma_length = 0;
return 0;
}
@@ -3760,7 +3760,7 @@ bounce_map_sing

[PATCH 1/8] dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute

2020-09-25 Thread Marek Szyprowski
Some devices require allocating a special buffer (usually for the
firmware) right at the beginning of the address space to ensure that all
further allocations can be expressed as a positive offset from that
special buffer. When an IOMMU is used for managing the DMA address space,
such a requirement can be easily fulfilled, simply by enforcing the
'first-fit' IOVA allocation algorithm.

This patch adds a DMA attribute for such case.

Signed-off-by: Marek Szyprowski 
---
 include/linux/dma-mapping.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index bb138ac6f5e6..c8c568ba375b 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -66,6 +66,12 @@
  * at least read-only at lesser-privileged levels).
  */
 #define DMA_ATTR_PRIVILEGED(1UL << 9)
+/*
+ * DMA_ATTR_LOW_ADDRESS: used to indicate that the buffer should be allocated
+ * at the lowest possible DMA address, usually just at the beginning of the
+ * DMA/IOVA address space ('first-fit' allocation algorithm).
+ */
+#define DMA_ATTR_LOW_ADDRESS   (1UL << 10)
 
 /*
  * A dma_addr_t can hold any valid DMA or bus address for the platform.
-- 
2.17.1
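
As an aside, an illustrative (hypothetical) use from a driver's point of
view would look something like this:

#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/gfp.h>

static int example_alloc_fw_buffer(struct device *dev, size_t fw_size,
				   void **cpu_addr, dma_addr_t *dma_addr)
{
	/*
	 * Ask for the buffer at the lowest possible DMA address so that all
	 * later allocations can be expressed as positive offsets from it.
	 */
	*cpu_addr = dma_alloc_attrs(dev, fw_size, dma_addr, GFP_KERNEL,
				    DMA_ATTR_LOW_ADDRESS);
	return *cpu_addr ? 0 : -ENOMEM;
}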



[PATCH 3/8] iommu: iova: add support for 'first-fit' algorithm

2020-09-25 Thread Marek Szyprowski
Add support for the 'first-fit' allocation algorithm. It will be used for
the special case of implementing DMA_ATTR_LOW_ADDRESS, so this path
doesn't use the IOVA rcache.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/iova.c | 78 
 include/linux/iova.h |  2 ++
 2 files changed, 80 insertions(+)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 87555ed1737a..0911d36f7ee5 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -227,6 +227,59 @@ static int __alloc_and_insert_iova_range(struct 
iova_domain *iovad,
return -ENOMEM;
 }
 
+static unsigned long
+__iova_get_aligned_start(unsigned long start, unsigned long size)
+{
+   unsigned long mask = __roundup_pow_of_two(size) - 1;
+
+   return (start + mask) & ~mask;
+}
+
+static int __alloc_and_insert_iova_range_forward(struct iova_domain *iovad,
+   unsigned long size, unsigned long limit_pfn,
+   struct iova *new)
+{
+   struct rb_node *curr;
+   unsigned long flags;
+   unsigned long start, limit;
+
+   spin_lock_irqsave(&iovad->iova_rbtree_lock, flags);
+
+   curr = rb_first(&iovad->rbroot);
+   limit = limit_pfn;
+   start = __iova_get_aligned_start(iovad->start_pfn, size);
+
+   while (curr) {
+   struct iova *curr_iova = rb_entry(curr, struct iova, node);
+   struct rb_node *next = rb_next(curr);
+
+   start = __iova_get_aligned_start(curr_iova->pfn_hi + 1, size);
+   if (next) {
+   struct iova *next_iova = rb_entry(next, struct iova, 
node);
+   limit = next_iova->pfn_lo - 1;
+   } else {
+   limit = limit_pfn;
+   }
+
+   if ((start + size) <= limit)
+   break;  /* found a free slot */
+   curr = next;
+   }
+
+   if (!curr && start + size > limit) {
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+   return -ENOMEM;
+   }
+
+   new->pfn_lo = start;
+   new->pfn_hi = new->pfn_lo + size - 1;
+   iova_insert_rbtree(&iovad->rbroot, new, curr);
+
+   spin_unlock_irqrestore(&iovad->iova_rbtree_lock, flags);
+
+   return 0;
+}
+
 static struct kmem_cache *iova_cache;
 static unsigned int iova_cache_users;
 static DEFINE_MUTEX(iova_cache_mutex);
@@ -398,6 +451,31 @@ free_iova(struct iova_domain *iovad, unsigned long pfn)
 }
 EXPORT_SYMBOL_GPL(free_iova);
 
+/**
+ * alloc_iova_first_fit - allocates an iova from the beginning of address space
+ * @iovad: - iova domain in question
+ * @size: - size of page frames to allocate
+ * @limit_pfn: - max limit address
+ * Returns a pfn the allocated iova starts at or IOVA_BAD_ADDR in the case
+ * of a failure.
+*/
+unsigned long
+alloc_iova_first_fit(struct iova_domain *iovad, unsigned long size,
+unsigned long limit_pfn)
+{
+   struct iova *new_iova = alloc_iova_mem();
+
+   if (!new_iova)
+   return IOVA_BAD_ADDR;
+
+   if (__alloc_and_insert_iova_range_forward(iovad, size, limit_pfn, 
new_iova)) {
+   free_iova_mem(new_iova);
+   return IOVA_BAD_ADDR;
+   }
+   return new_iova->pfn_lo;
+}
+EXPORT_SYMBOL_GPL(alloc_iova_first_fit);
+
 /**
  * alloc_iova_fast - allocates an iova from rcache
  * @iovad: - iova domain in question
diff --git a/include/linux/iova.h b/include/linux/iova.h
index 69737e6bcef6..01c29044488c 100644
--- a/include/linux/iova.h
+++ b/include/linux/iova.h
@@ -152,6 +152,8 @@ void queue_iova(struct iova_domain *iovad,
unsigned long data);
 unsigned long alloc_iova_fast(struct iova_domain *iovad, unsigned long size,
  unsigned long limit_pfn, bool flush_rcache);
+unsigned long alloc_iova_first_fit(struct iova_domain *iovad, unsigned long 
size,
+  unsigned long limit_pfn);
 struct iova *reserve_iova(struct iova_domain *iovad, unsigned long pfn_lo,
unsigned long pfn_hi);
 void copy_reserved_iova(struct iova_domain *from, struct iova_domain *to);
-- 
2.17.1



[PATCH 0/8] IOMMU-DMA - support old allocation algorithm used on ARM

2020-09-25 Thread Marek Szyprowski
Hi,

This patchset is a continuation of the planned rework of the ARM
IOMMU/DMA-mapping code proposed by Robin Murphy in [1]. However, there
are drivers (for example S5P-MFC and Exynos4-IS) which depend on the way
the old ARM IOMMU/DMA-mapping glue code worked (it used the 'first-fit' IOVA
allocation algorithm), so before switching ARM to the generic code, such
drivers have to be updated.

This patchset provides the needed extensions to the generic IOMMU-DMA
framework to enable support for the drivers that relied on the old ARM
IOMMU/DMA-mapping behavior. This patchset is based on the idea proposed
by Robin Murphy in [2] after the discussion of the workaround implemented
directly in the mentioned drivers [3].

Here is a git branch with this patchset and [1] patches applied on top of
linux next-20200925:
https://github.com/mszyprow/linux/tree/v5.9-next-20200925-arm-dma-iommu-low-address

Best regards,
Marek Szyprowski


References:

[1] https://lore.kernel.org/lkml/cover.1597931875.git.robin.mur...@arm.com/
[2] 
https://lore.kernel.org/linux-iommu/bff57cbe-2247-05e1-9059-d9c66d64c...@arm.com/
[3] 
https://lore.kernel.org/linux-samsung-soc/20200918144833.14618-1-m.szyprow...@samsung.com/T/


Patch summary:

Marek Szyprowski (8):
  dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute
  iommu: iova: properly handle 0 as a valid IOVA address
  iommu: iova: add support for 'first-fit' algorithm
  iommu: dma-iommu: refactor iommu_dma_alloc_iova()
  iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS
  media: platform: exynos4-is: remove all references to physical
addresses
  media: platform: exynos4-is: use DMA_ATTR_LOW_ADDRESS
  media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS

 drivers/iommu/dma-iommu.c | 79 -
 drivers/iommu/intel/iommu.c   | 12 +--
 drivers/iommu/iova.c  | 88 ++-
 .../media/platform/exynos4-is/fimc-capture.c  |  6 +-
 drivers/media/platform/exynos4-is/fimc-core.c | 28 +++---
 drivers/media/platform/exynos4-is/fimc-core.h | 18 ++--
 drivers/media/platform/exynos4-is/fimc-is.c   | 23 ++---
 drivers/media/platform/exynos4-is/fimc-is.h   |  6 +-
 .../media/platform/exynos4-is/fimc-lite-reg.c |  4 +-
 drivers/media/platform/exynos4-is/fimc-lite.c |  2 +-
 drivers/media/platform/exynos4-is/fimc-lite.h |  4 +-
 drivers/media/platform/exynos4-is/fimc-m2m.c  |  8 +-
 drivers/media/platform/exynos4-is/fimc-reg.c  | 18 ++--
 drivers/media/platform/exynos4-is/fimc-reg.h  |  4 +-
 drivers/media/platform/s5p-mfc/s5p_mfc.c  |  8 +-
 include/linux/dma-mapping.h   |  6 ++
 include/linux/iova.h  |  4 +
 17 files changed, 221 insertions(+), 97 deletions(-)

-- 
2.17.1



[PATCH 4/8] iommu: dma-iommu: refactor iommu_dma_alloc_iova()

2020-09-25 Thread Marek Szyprowski
Change the parameters passed to iommu_dma_alloc_iova(): the dma_limit can
be easily extracted from the parameters of the passed struct device, so
replace it with a flags parameter, which can later hold more information
about the way the IOVA allocator should do its job. While touching the
parameter list, move struct device to the second position to better match
the convention of the DMA-mapping related functions.
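
In other words (a sketch of the prototypes only, see the diff below for the
actual change):

	/* before */
	iommu_dma_alloc_iova(domain, size, dma_limit, dev);
	/* after */
	iommu_dma_alloc_iova(domain, dev, size, flags);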

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/dma-iommu.c | 23 +--
 1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 91dd8f46dae1..0ea87023306f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -400,12 +400,16 @@ static int dma_info_to_prot(enum dma_data_direction dir, 
bool coherent,
}
 }
 
+#define DMA_ALLOC_IOVA_COHERENTBIT(0)
+
 static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
-   size_t size, u64 dma_limit, struct device *dev)
+   struct device *dev, size_t size, unsigned int flags)
 {
struct iommu_dma_cookie *cookie = domain->iova_cookie;
struct iova_domain *iovad = &cookie->iovad;
unsigned long shift, iova_len, iova = IOVA_BAD_ADDR;
+   u64 dma_limit = (flags & DMA_ALLOC_IOVA_COHERENT) ?
+   dev->coherent_dma_mask : dma_get_mask(dev);
 
if (cookie->type == IOMMU_DMA_MSI_COOKIE) {
cookie->msi_iova += size;
@@ -481,7 +485,7 @@ static void __iommu_dma_unmap(struct device *dev, 
dma_addr_t dma_addr,
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-   size_t size, int prot, u64 dma_mask)
+   size_t size, int prot, unsigned int flags)
 {
struct iommu_domain *domain = iommu_get_dma_domain(dev);
struct iommu_dma_cookie *cookie = domain->iova_cookie;
@@ -494,7 +498,7 @@ static dma_addr_t __iommu_dma_map(struct device *dev, 
phys_addr_t phys,
 
size = iova_align(iovad, size + iova_off);
 
-   iova = iommu_dma_alloc_iova(domain, size, dma_mask, dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, flags);
if (iova == DMA_MAPPING_ERROR)
return iova;
 
@@ -618,7 +622,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 
size = iova_align(iovad, size);
-   iova = iommu_dma_alloc_iova(domain, size, dev->coherent_dma_mask, dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);
if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
@@ -733,7 +737,7 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map(dev, phys, size, prot, dma_get_mask(dev));
+   dma_handle = __iommu_dma_map(dev, phys, size, prot, 0);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(phys, size, dir);
@@ -888,7 +892,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
-   iova = iommu_dma_alloc_iova(domain, iova_len, dma_get_mask(dev), dev);
+   iova = iommu_dma_alloc_iova(domain, dev, iova_len, 0);
if (iova == DMA_MAPPING_ERROR)
goto out_restore_sg;
 
@@ -936,8 +940,7 @@ static dma_addr_t iommu_dma_map_resource(struct device 
*dev, phys_addr_t phys,
size_t size, enum dma_data_direction dir, unsigned long attrs)
 {
return __iommu_dma_map(dev, phys, size,
-   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO,
-   dma_get_mask(dev));
+   dma_info_to_prot(dir, false, attrs) | IOMMU_MMIO, 0);
 }
 
 static void iommu_dma_unmap_resource(struct device *dev, dma_addr_t handle,
@@ -1045,7 +1048,7 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
return NULL;
 
*handle = __iommu_dma_map(dev, page_to_phys(page), size, ioprot,
-   dev->coherent_dma_mask);
+ DMA_ALLOC_IOVA_COHERENT);
if (*handle == DMA_MAPPING_ERROR) {
__iommu_dma_free(dev, size, cpu_addr);
return NULL;
@@ -1182,7 +1185,7 @@ static struct iommu_dma_msi_page 
*iommu_dma_get_msi_page(struct device *dev,
if (!msi_page)
return NULL;
 
-   iova = iommu_dma_alloc_iova(domain, size, dma_get_mask(dev), dev);
+   iova = iommu_dma_alloc_iova(domain, dev, size, 0);
if (iova == DMA_MAPPING_ERROR)
goto out_free_page;
 
-- 
2.17.1



[PATCH 5/8] iommu: dma-iommu: add support for DMA_ATTR_LOW_ADDRESS

2020-09-25 Thread Marek Szyprowski
Implement support for the DMA_ATTR_LOW_ADDRESS DMA attribute. If it has
been set, call alloc_iova_first_fit() instead of alloc_iova_fast() to
allocate the new IOVA from the beginning of the address space.

Signed-off-by: Marek Szyprowski 
---
 drivers/iommu/dma-iommu.c | 50 +--
 1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 0ea87023306f..ab39659c727a 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -401,6 +401,18 @@ static int dma_info_to_prot(enum dma_data_direction dir, 
bool coherent,
 }
 
 #define DMA_ALLOC_IOVA_COHERENTBIT(0)
+#define DMA_ALLOC_IOVA_FIRST_FIT   BIT(1)
+
+static unsigned int dma_attrs_to_alloc_flags(unsigned long attrs, bool 
coherent)
+{
+   unsigned int flags = 0;
+
+   if (coherent)
+   flags |= DMA_ALLOC_IOVA_COHERENT;
+   if (attrs & DMA_ATTR_LOW_ADDRESS)
+   flags |= DMA_ALLOC_IOVA_FIRST_FIT;
+   return flags;
+}
 
 static dma_addr_t iommu_dma_alloc_iova(struct iommu_domain *domain,
struct device *dev, size_t size, unsigned int flags)
@@ -433,13 +445,23 @@ static dma_addr_t iommu_dma_alloc_iova(struct 
iommu_domain *domain,
dma_limit = min(dma_limit, (u64)domain->geometry.aperture_end);
 
/* Try to get PCI devices a SAC address */
-   if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev))
-   iova = alloc_iova_fast(iovad, iova_len,
-  DMA_BIT_MASK(32) >> shift, false);
+   if (dma_limit > DMA_BIT_MASK(32) && dev_is_pci(dev)) {
+   if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
+   iova = alloc_iova_first_fit(iovad, iova_len,
+   DMA_BIT_MASK(32) >> shift);
+   else
+   iova = alloc_iova_fast(iovad, iova_len,
+ DMA_BIT_MASK(32) >> shift, false);
+   }
 
-   if (iova == IOVA_BAD_ADDR)
-   iova = alloc_iova_fast(iovad, iova_len, dma_limit >> shift,
-  true);
+   if (iova == IOVA_BAD_ADDR) {
+   if (unlikely(flags & DMA_ALLOC_IOVA_FIRST_FIT))
+   iova = alloc_iova_first_fit(iovad, iova_len,
+   dma_limit >> shift);
+   else
+   iova = alloc_iova_fast(iovad, iova_len,
+  dma_limit >> shift, true);
+   }
 
if (iova != IOVA_BAD_ADDR)
return (dma_addr_t)iova << shift;
@@ -593,6 +615,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
struct iova_domain *iovad = &cookie->iovad;
bool coherent = dev_is_dma_coherent(dev);
int ioprot = dma_info_to_prot(DMA_BIDIRECTIONAL, coherent, attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, true);
pgprot_t prot = dma_pgprot(dev, PAGE_KERNEL, attrs);
unsigned int count, min_size, alloc_sizes = domain->pgsize_bitmap;
struct page **pages;
@@ -622,7 +645,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
return NULL;
 
size = iova_align(iovad, size);
-   iova = iommu_dma_alloc_iova(domain, dev, size, DMA_ALLOC_IOVA_COHERENT);
+   iova = iommu_dma_alloc_iova(domain, dev, size, flags);
if (iova == DMA_MAPPING_ERROR)
goto out_free_pages;
 
@@ -732,12 +755,13 @@ static dma_addr_t iommu_dma_map_page(struct device *dev, 
struct page *page,
unsigned long offset, size_t size, enum dma_data_direction dir,
unsigned long attrs)
 {
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, false);
phys_addr_t phys = page_to_phys(page) + offset;
bool coherent = dev_is_dma_coherent(dev);
int prot = dma_info_to_prot(dir, coherent, attrs);
dma_addr_t dma_handle;
 
-   dma_handle = __iommu_dma_map(dev, phys, size, prot, 0);
+   dma_handle = __iommu_dma_map(dev, phys, size, prot, flags);
if (!coherent && !(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
dma_handle != DMA_MAPPING_ERROR)
arch_sync_dma_for_device(phys, size, dir);
@@ -842,6 +866,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
struct iova_domain *iovad = &cookie->iovad;
struct scatterlist *s, *prev = NULL;
int prot = dma_info_to_prot(dir, dev_is_dma_coherent(dev), attrs);
+   unsigned int flags = dma_attrs_to_alloc_flags(attrs, false);
dma_addr_t iova;
size_t iova_len = 0;
unsigned long mask = dma_get_seg_boundary(dev);
@@ -892,7 +917,7 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
prev = s;
}
 
-   iova = iommu_dma_alloc_iova(dom

[PATCH 6/8] media: platform: exynos4-is: remove all references to physical addresses

2020-09-25 Thread Marek Szyprowski
This driver always operates on DMA/IOVA addresses, so calling them
physical addresses is misleading, although when no IOMMU is used the two
are equal. Fix this by renaming all such entries to 'addr' and adjusting
the comments.

Signed-off-by: Marek Szyprowski 
---
 .../media/platform/exynos4-is/fimc-capture.c  |  6 ++--
 drivers/media/platform/exynos4-is/fimc-core.c | 28 +--
 drivers/media/platform/exynos4-is/fimc-core.h | 18 ++--
 drivers/media/platform/exynos4-is/fimc-is.c   | 20 ++---
 drivers/media/platform/exynos4-is/fimc-is.h   |  6 ++--
 .../media/platform/exynos4-is/fimc-lite-reg.c |  4 +--
 drivers/media/platform/exynos4-is/fimc-lite.c |  2 +-
 drivers/media/platform/exynos4-is/fimc-lite.h |  4 +--
 drivers/media/platform/exynos4-is/fimc-m2m.c  |  8 +++---
 drivers/media/platform/exynos4-is/fimc-reg.c  | 18 ++--
 drivers/media/platform/exynos4-is/fimc-reg.h  |  4 +--
 11 files changed, 58 insertions(+), 60 deletions(-)

diff --git a/drivers/media/platform/exynos4-is/fimc-capture.c 
b/drivers/media/platform/exynos4-is/fimc-capture.c
index 6000a4e789ad..13c838d3f947 100644
--- a/drivers/media/platform/exynos4-is/fimc-capture.c
+++ b/drivers/media/platform/exynos4-is/fimc-capture.c
@@ -201,7 +201,7 @@ void fimc_capture_irq_handler(struct fimc_dev *fimc, int 
deq_buf)
if (!list_empty(&cap->pending_buf_q)) {
 
v_buf = fimc_pending_queue_pop(cap);
-   fimc_hw_set_output_addr(fimc, &v_buf->paddr, cap->buf_index);
+   fimc_hw_set_output_addr(fimc, &v_buf->addr, cap->buf_index);
v_buf->index = cap->buf_index;
 
/* Move the buffer to the capture active queue */
@@ -410,7 +410,7 @@ static void buffer_queue(struct vb2_buffer *vb)
int min_bufs;
 
spin_lock_irqsave(&fimc->slock, flags);
-   fimc_prepare_addr(ctx, &buf->vb.vb2_buf, &ctx->d_frame, &buf->paddr);
+   fimc_prepare_addr(ctx, &buf->vb.vb2_buf, &ctx->d_frame, &buf->addr);
 
if (!test_bit(ST_CAPT_SUSPENDED, &fimc->state) &&
!test_bit(ST_CAPT_STREAM, &fimc->state) &&
@@ -419,7 +419,7 @@ static void buffer_queue(struct vb2_buffer *vb)
int buf_id = (vid_cap->reqbufs_count == 1) ? -1 :
vid_cap->buf_index;
 
-   fimc_hw_set_output_addr(fimc, &buf->paddr, buf_id);
+   fimc_hw_set_output_addr(fimc, &buf->addr, buf_id);
buf->index = vid_cap->buf_index;
fimc_active_queue_add(vid_cap, buf);
 
diff --git a/drivers/media/platform/exynos4-is/fimc-core.c 
b/drivers/media/platform/exynos4-is/fimc-core.c
index 08d1f39a914c..c989abeb478e 100644
--- a/drivers/media/platform/exynos4-is/fimc-core.c
+++ b/drivers/media/platform/exynos4-is/fimc-core.c
@@ -325,7 +325,7 @@ static irqreturn_t fimc_irq_handler(int irq, void *priv)
 
 /* The color format (colplanes, memplanes) must be already configured. */
 int fimc_prepare_addr(struct fimc_ctx *ctx, struct vb2_buffer *vb,
- struct fimc_frame *frame, struct fimc_addr *paddr)
+ struct fimc_frame *frame, struct fimc_addr *addr)
 {
int ret = 0;
u32 pix_size;
@@ -338,42 +338,40 @@ int fimc_prepare_addr(struct fimc_ctx *ctx, struct 
vb2_buffer *vb,
dbg("memplanes= %d, colplanes= %d, pix_size= %d",
frame->fmt->memplanes, frame->fmt->colplanes, pix_size);
 
-   paddr->y = vb2_dma_contig_plane_dma_addr(vb, 0);
+   addr->y = vb2_dma_contig_plane_dma_addr(vb, 0);
 
if (frame->fmt->memplanes == 1) {
switch (frame->fmt->colplanes) {
case 1:
-   paddr->cb = 0;
-   paddr->cr = 0;
+   addr->cb = 0;
+   addr->cr = 0;
break;
case 2:
/* decompose Y into Y/Cb */
-   paddr->cb = (u32)(paddr->y + pix_size);
-   paddr->cr = 0;
+   addr->cb = (u32)(addr->y + pix_size);
+   addr->cr = 0;
break;
case 3:
-   paddr->cb = (u32)(paddr->y + pix_size);
+   addr->cb = (u32)(addr->y + pix_size);
/* decompose Y into Y/Cb/Cr */
if (FIMC_FMT_YCBCR420 == frame->fmt->color)
-   paddr->cr = (u32)(paddr->cb
-   + (pix_size >> 2));
+   addr->cr = (u32)(addr->cb + (pix_size >> 2));
else /* 422 */
-   paddr->cr = (u32)(paddr->cb
-   + (pix_size >> 1));
+   addr->cr = (u32)(addr->cb + (pix_size >> 1));
break;
default:
return -EINVAL;
}

[PATCH 8/8] media: platform: s5p-mfc: use DMA_ATTR_LOW_ADDRESS

2020-09-25 Thread Marek Szyprowski
The S5P-MFC driver relied on the way the ARM DMA-IOMMU glue code worked -
mainly on the fact that the allocator used a first-fit algorithm and that
the first allocated buffer was at the 0x0 DMA/IOVA address. This is not
true for the generic IOMMU-DMA glue code that will soon be used for the
ARM architecture, so limit the dma_mask to the size of the DMA window the
hardware can use and add the needed DMA attribute to force proper IOVA
allocation of the firmware buffer.

Signed-off-by: Marek Szyprowski 
---
 drivers/media/platform/s5p-mfc/s5p_mfc.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc.c 
b/drivers/media/platform/s5p-mfc/s5p_mfc.c
index eba2b9f040df..171fd9fd22e4 100644
--- a/drivers/media/platform/s5p-mfc/s5p_mfc.c
+++ b/drivers/media/platform/s5p-mfc/s5p_mfc.c
@@ -1199,8 +1199,12 @@ static int s5p_mfc_configure_common_memory(struct 
s5p_mfc_dev *mfc_dev)
if (!mfc_dev->mem_bitmap)
return -ENOMEM;
 
-   mfc_dev->mem_virt = dma_alloc_coherent(dev, mem_size,
-  &mfc_dev->mem_base, GFP_KERNEL);
+   /* MFC v5 can access memory only via the 256M window */
+   if (exynos_is_iommu_available(dev) && !IS_MFCV6_PLUS(mfc_dev))
+   dma_set_mask_and_coherent(dev, SZ_256M - 1);
+
+   mfc_dev->mem_virt = dma_alloc_attrs(dev, mem_size, &mfc_dev->mem_base,
+   GFP_KERNEL, DMA_ATTR_LOW_ADDRESS);
if (!mfc_dev->mem_virt) {
kfree(mfc_dev->mem_bitmap);
dev_err(dev, "failed to preallocate %ld MiB for the firmware 
and context buffers\n",
-- 
2.17.1



[PATCH 7/8] media: platform: exynos4-is: use DMA_ATTR_LOW_ADDRESS

2020-09-25 Thread Marek Szyprowski
The Exynos4-IS driver relied on the way the ARM DMA-IOMMU glue code worked -
mainly on the fact that the allocator used a first-fit algorithm and that
the first allocated buffer was at the 0x0 DMA/IOVA address. This is not
true for the generic IOMMU-DMA glue code that will soon be used for the
ARM architecture, so add the needed DMA attribute to force such behavior
of the DMA-mapping code.

Signed-off-by: Marek Szyprowski 
---
 drivers/media/platform/exynos4-is/fimc-is.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/media/platform/exynos4-is/fimc-is.c 
b/drivers/media/platform/exynos4-is/fimc-is.c
index 41b841a96338..9d3556eae5d3 100644
--- a/drivers/media/platform/exynos4-is/fimc-is.c
+++ b/drivers/media/platform/exynos4-is/fimc-is.c
@@ -335,8 +335,9 @@ static int fimc_is_alloc_cpu_memory(struct fimc_is *is)
 {
struct device *dev = &is->pdev->dev;
 
-   is->memory.vaddr = dma_alloc_coherent(dev, FIMC_IS_CPU_MEM_SIZE,
- &is->memory.addr, GFP_KERNEL);
+   is->memory.vaddr = dma_alloc_attrs(dev, FIMC_IS_CPU_MEM_SIZE,
+  &is->memory.addr, GFP_KERNEL,
+  DMA_ATTR_LOW_ADDRESS);
if (is->memory.vaddr == NULL)
return -ENOMEM;
 
-- 
2.17.1



Re: [virtio-dev] Re: [PATCH v3 0/6] Add virtio-iommu built-in topology

2020-09-25 Thread Gerd Hoffmann
  Hi,

> Many power platforms are OF based, thus without ACPI or DT support.

pseries has lots of stuff below /proc/device-tree.  Dunno whether that
is the same kind of device tree we have on arm ...

take care,
  Gerd



Re: [PATCH 1/2] iommu/iova: Flush CPU rcache for when a depot fills

2020-09-25 Thread John Garry

On 25/09/2020 12:53, Robin Murphy wrote:

---
  drivers/iommu/iova.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 45a251da5453..05e0b462e0d9 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -892,9 +892,8 @@ static bool __iova_rcache_insert(struct 
iova_domain *iovad,

   struct iova_rcache *rcache,
   unsigned long iova_pfn)
  {
-    struct iova_magazine *mag_to_free = NULL;
  struct iova_cpu_rcache *cpu_rcache;
-    bool can_insert = false;
+    bool can_insert = false, flush = false;
  unsigned long flags;
  cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
@@ -913,13 +912,19 @@ static bool __iova_rcache_insert(struct 
iova_domain *iovad,

  if (rcache->depot_size < MAX_GLOBAL_MAGS) {
  rcache->depot[rcache->depot_size++] =
  cpu_rcache->loaded;
+    can_insert = true;
+    cpu_rcache->loaded = new_mag;
  } else {
-    mag_to_free = cpu_rcache->loaded;
+    /*
+ * The depot is full, meaning that a very large
+ * cache of IOVAs has built up, which slows
+ * down RB tree accesses significantly
+ * -> let's flush at this point.
+ */
+    flush = true;
+    iova_magazine_free(new_mag);
  }
  spin_unlock(&rcache->lock);
-
-    cpu_rcache->loaded = new_mag;
-    can_insert = true;
  }
  }
@@ -928,9 +933,11 @@ static bool __iova_rcache_insert(struct 
iova_domain *iovad,

  spin_unlock_irqrestore(&cpu_rcache->lock, flags);
-    if (mag_to_free) {
-    iova_magazine_free_pfns(mag_to_free, iovad);
-    iova_magazine_free(mag_to_free);
+    if (flush) {


Do you really need this flag, or is it effectively just mirroring 
"!can_insert" - in theory if there wasn't enough memory to allocate a 
new magazine, then freeing some more IOVAs wouldn't necessarily be a bad 
thing to do anyway.


Right, I can reuse can_insert.



Other than that, I think this looks reasonable. Every time I look at 
__iova_rcache_insert() I'm convinced there must be a way to restructure 
it to be more streamlined overall, but I can never quite see exactly how...




We could remove the new_mag check, but the code cannot safely handle 
loaded/prev = NULL. Indeed, I think that the mainline code has a bug:


If the initial allocations for the loaded/prev magazines fail (return NULL)
in init_iova_rcaches(), then in __iova_rcache_insert():


if (!iova_magazine_full(cpu_rcache->loaded)) {
can_insert = true;

If cpu_rcache->loaded == NULL, then can_insert is assigned true -> bang, 
as I experimented, below. This needs to be fixed...


Thanks,
john



ereference at virtual address 
[ 10.195299] Mem abort info:
[ 10.198080] ESR = 0x9604
[ 10.201121] EC = 0x25: DABT (current EL), IL = 32 bits
[ 10.206418] SET = 0, FnV = 0
[ 10.209459] EA = 0, S1PTW = 0
[ 10.212585] Data abort info:
[ 10.215452] ISV = 0, ISS = 0x0004
[ 10.219274] CM = 0, WnR = 0
[ 10.28] [] user address but active_mm is swapper
[ 10.228569] Internal error: Oops: 9604 [#1] PREEMPT SMP
[ 10.234127] Modules linked in:
[ 10.237170] CPU: 11 PID: 696 Comm: irq/40-hisi_sas Not tainted 
5.9.0-rc5-47738-gb1ead657a3fa-dirty #658
[ 10.246548] Hardware name: Huawei D06 /D06, BIOS Hisilicon D06 UEFI RC0 
- V1.16.01 03/15/2019

[ 10.255058] pstate: 60c00089 (nZCv daIf +PAN +UAO BTYPE=--)
[ 10.260620] pc : free_iova_fast+0xfc/0x280
[ 10.264703] lr : free_iova_fast+0x94/0x280
[ 10.268785] sp : 80002477bbb0
[ 10.272086] x29: 80002477bbb0 x28: 
[ 10.277385] x27: 002bc8fbb940 x26: 002bc727e26c
[ 10.282684] x25:  x24: 002bc9439008
[ 10.287982] x23: 000fdffe x22: 0080
[ 10.293280] x21: 002bc9439008 x20: 
[ 10.298579] x19: f403e9ebb700 x18: 
[ 10.303877] x17: 0001 x16: 
[ 10.309176] x15:  x14: 0040
[ 10.314474] x13: 7fff x12: 0001
[ 10.319772] x11: 000f x10: 6000
[ 10.325070] x9 :  x8 : 80002477b768
[ 10.330368] x7 :  x6 : 003f
[ 10.335666] x5 : 0040 x4 : 
[ 10.340964] x3 : f403e9ebb700 x2 : 
[ 10.346262] x1 :  x0 : 
[ 10.351561] Call trace:
[ 10.353995]free_iova_fast+0xfc/0x280
[ 10.357731]iommu_dma_free_iova+0x64/0x70
[ 10.361814]__iommu_dma_unmap+0x9c/0xf8
[ 10.365723]iommu_dma_unmap_sg+0xa8/0xc8
[ 10.369720]dma_unmap_sg_attrs+0x28/0x50
[ 10.373717]cq_thread_v3_hw+0x2dc/0x528
[ 10.377626]irq_thread_fn+0x2c/0xa0
[ 10.381188]irq_thread+0x130/0x1e0
[ 10.384664]kthread+0x154/0x158
[ 10.38787

Re: [patch V2 34/46] PCI/MSI: Make arch_.*_msi_irq[s] fallbacks selectable

2020-09-25 Thread Qian Cai
On Wed, 2020-08-26 at 13:17 +0200, Thomas Gleixner wrote:
> From: Thomas Gleixner 
> 
> The arch_.*_msi_irq[s] fallbacks are compiled in whether an architecture
> requires them or not. Architectures which are fully utilizing hierarchical
> irq domains should never call into that code.
> 
> It's not only architectures which depend on that by implementing one or
> more of the weak functions, there is also a bunch of drivers which relies
> on the weak functions which invoke msi_controller::setup_irq[s] and
> msi_controller::teardown_irq.
> 
> Make the architectures and drivers which rely on them select them in Kconfig
> and if not selected replace them by stub functions which emit a warning and
> fail the PCI/MSI interrupt allocation.
> 
> Signed-off-by: Thomas Gleixner 

Today's linux-next will have some warnings on s390x:

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/s390.config

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]

WARNING: unmet direct dependencies detected for PCI_MSI_ARCH_FALLBACKS
  Depends on [n]: PCI [=n]
  Selected by [y]:
  - S390 [=y]
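
Presumably the select just needs to be made conditional on PCI for such
configs; an untested sketch (not an actual patch):

	config S390
		...
		select PCI_MSI_ARCH_FALLBACKS	if PCI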

> ---
> V2: Make the architectures (and drivers) which need the fallbacks select them
> and not the other way round (Bjorn).
> ---
>  arch/ia64/Kconfig  |1 +
>  arch/mips/Kconfig  |1 +
>  arch/powerpc/Kconfig   |1 +
>  arch/s390/Kconfig  |1 +
>  arch/sparc/Kconfig |1 +
>  arch/x86/Kconfig   |1 +
>  drivers/pci/Kconfig|3 +++
>  drivers/pci/controller/Kconfig |3 +++
>  drivers/pci/msi.c  |3 ++-
>  include/linux/msi.h|   31 ++-
>  10 files changed, 40 insertions(+), 6 deletions(-)
> 
> --- a/arch/ia64/Kconfig
> +++ b/arch/ia64/Kconfig
> @@ -56,6 +56,7 @@ config IA64
>   select NEED_DMA_MAP_STATE
>   select NEED_SG_DMA_LENGTH
>   select NUMA if !FLATMEM
> + select PCI_MSI_ARCH_FALLBACKS
>   default y
>   help
> The Itanium Processor Family is Intel's 64-bit successor to
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -86,6 +86,7 @@ config MIPS
>   select MODULES_USE_ELF_REL if MODULES
>   select MODULES_USE_ELF_RELA if MODULES && 64BIT
>   select PERF_USE_VMALLOC
> + select PCI_MSI_ARCH_FALLBACKS
>   select RTC_LIB
>   select SYSCTL_EXCEPTION_TRACE
>   select VIRT_TO_BUS
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -246,6 +246,7 @@ config PPC
>   select OLD_SIGACTIONif PPC32
>   select OLD_SIGSUSPEND
>   select PCI_DOMAINS  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select PCI_SYSCALL  if PCI
>   select PPC_DAWR if PPC64
>   select RTC_LIB
> --- a/arch/s390/Kconfig
> +++ b/arch/s390/Kconfig
> @@ -185,6 +185,7 @@ config S390
>   select OLD_SIGSUSPEND3
>   select PCI_DOMAINS  if PCI
>   select PCI_MSI  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select SPARSE_IRQ
>   select SYSCTL_EXCEPTION_TRACE
>   select THREAD_INFO_IN_TASK
> --- a/arch/sparc/Kconfig
> +++ b/arch/sparc/Kconfig
> @@ -43,6 +43,7 @@ config SPARC
>   select GENERIC_STRNLEN_USER
>   select MODULES_USE_ELF_RELA
>   select PCI_SYSCALL if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select ODD_RT_SIGACTION
>   select OLD_SIGSUSPEND
>   select CPU_NO_EFFICIENT_FFS
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -225,6 +225,7 @@ config X86
>   select NEED_SG_DMA_LENGTH
>   select PCI_DOMAINS  if PCI
>   select PCI_LOCKLESS_CONFIG  if PCI
> + select PCI_MSI_ARCH_FALLBACKS
>   select PERF_EVENTS
>   select RTC_LIB
>   select RTC_MC146818_LIB
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -56,6 +56,9 @@ config PCI_MSI_IRQ_DOMAIN
>   depends on PCI_MSI
>   select GENERIC_MSI_IRQ_DOMAIN
>  
> +config PCI_MSI_ARCH_FALLBACKS
> + bool
> +
>  config PCI_QUIRKS
>   default y
>   bool "Enable PCI quirk workarounds" if EXPERT
> --- a/drivers/pci/controller/Kconfig
> +++ b/drivers/pci/controller/Kconfig
> @@ -41,6 +41,7 @@ config PCI_TEGRA
>   bool "NVIDIA Tegra PCIe controller"
>   depends on ARCH_TEGRA || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
> + select PCI_MSI_ARCH_FALLBACKS
>   help
> Say Y here if you want support for the PCIe host controller found
> on NVIDIA Tegra SoCs.
> @@ -67,6 +68,7 @@ config PCIE_RCAR_HOST
>   bool "Renesas R-Car PCIe host controller"
>   depends on ARCH_RENESAS || COMPILE_TEST
>   depends on PCI_MSI_IRQ_DOMAIN
> + select PCI_MSI_ARCH_FALLBACKS
>   help
> Say Y here if you want PCIe controller support on R-Car SoCs in host
> mode.
> @@ -103,6 +105,7 @@ config PCIE_X

Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-09-25 Thread Ross Philipson
On 9/24/20 1:38 PM, Arvind Sankar wrote:
> On Thu, Sep 24, 2020 at 10:58:35AM -0400, Ross Philipson wrote:
>> The Secure Launch (SL) stub provides the entry point for Intel TXT (and
>> later AMD SKINIT) to vector to during the late launch. The symbol
>> sl_stub_entry is that entry point and its offset into the kernel is
>> conveyed to the launching code using the MLE (Measured Launch
>> Environment) header in the structure named mle_header. The offset of the
>> MLE header is set in the kernel_info. The routine sl_stub contains the
>> very early late launch setup code responsible for setting up the basic
>> environment to allow the normal kernel startup_32 code to proceed. It is
>> also responsible for properly waking and handling the APs on Intel
>> platforms. The routine sl_main which runs after entering 64b mode is
>> responsible for measuring configuration and module information before
>> it is used like the boot params, the kernel command line, the TXT heap,
>> an external initramfs, etc.
>>
>> Signed-off-by: Ross Philipson 
> 
> Which version of the kernel is this based on?

git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

master branch

> 
>> diff --git a/arch/x86/boot/compressed/head_64.S 
>> b/arch/x86/boot/compressed/head_64.S
>> index 97d37f0..42043bf 100644
>> --- a/arch/x86/boot/compressed/head_64.S
>> +++ b/arch/x86/boot/compressed/head_64.S
>> @@ -279,6 +279,21 @@ SYM_INNER_LABEL(efi32_pe_stub_entry, SYM_L_LOCAL)
>>  SYM_FUNC_END(efi32_stub_entry)
>>  #endif
>>  
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +SYM_FUNC_START(sl_stub_entry)
>> +/*
>> + * On entry, %ebx has the entry abs offset to sl_stub_entry. To
>> + * find the beginning of where we are loaded, sub off from the
>> + * beginning.
>> + */
> 
> This requirement should be added to the documentation. Is it necessary
> or can this stub just figure out the address the same way as the other
> 32-bit entry points, using the scratch space in bootparams as a little
> stack?

It is based on the state of the BSP when TXT vectors to the measured
launch environment. It is documented in the TXT spec and the SDMs.

> 
>> +leal(startup_32 - sl_stub_entry)(%ebx), %ebx
>> +
>> +/* More room to work in sl_stub in the text section */
>> +jmp sl_stub
>> +
>> +SYM_FUNC_END(sl_stub_entry)
>> +#endif
>> +
>>  .code64
>>  .org 0x200
>>  SYM_CODE_START(startup_64)
>> @@ -537,6 +552,25 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
>>  shrq$3, %rcx
>>  rep stosq
>>  
>> +#ifdef CONFIG_SECURE_LAUNCH
>> +/*
>> + * Have to do the final early sl stub work in 64b area.
>> + *
>> + * *** NOTE ***
>> + *
>> + * Several boot params get used before we get a chance to measure
>> + * them in this call. This is a known issue and we currently don't
>> + * have a solution. The scratch field doesn't matter and loadflags
>> + * have KEEP_SEGMENTS set by the stub code. There is no obvious way
>> + * to do anything about the use of kernel_alignment or init_size
>> + * though these seem low risk.
>> + */
> 
> There are various fields in bootparams that depend on where the
> kernel/initrd and cmdline are loaded in memory. If the entire bootparams
> page is getting measured, does that mean they all have to be at fixed
> addresses on every boot?

Yes that is a very good point. In other places when measuring we make
sure to skip things like addresses and sizes of things outside of the
structure being measured. This needs to be done with boot params too.

> 
> Also KEEP_SEGMENTS support is gone from the kernel since v5.7, since it
> was unused. startup_32 now always loads a GDT and then the segment
> registers. I think this should be ok for you as the only thing the flag
> used to do in the 64-bit kernel was to stop startup_32 from blindly
> loading __BOOT_DS into the segment registers before it had setup its own
> GDT.

Yea this was there to prevent that blind loading of __BOOT_DS. I see it
is gone so I will remove the comment and the place where the flag is set.

> 
> For the 32-bit assembler code that's being added, tip/master now has
> changes that prevent the compressed kernel from having any runtime
> relocations.  You'll need to revise some of the code and the data
> structures initial values to avoid creating relocations.

Could you elaborate on this some more? I am not sure I see places in the
secure launch asm that would be creating relocations like this.

Thank you,
Ross

> 
> Thanks.
> 



Re: [PATCH 01/13] x86: Secure Launch Kconfig

2020-09-25 Thread Ross Philipson
On 9/24/20 10:08 PM, Randy Dunlap wrote:
> On 9/24/20 7:58 AM, Ross Philipson wrote:
>> Initial bits to bring in Secure Launch functionality. Add Kconfig
>> options for compiling in/out the Secure Launch code.
>>
>> Signed-off-by: Ross Philipson 
> 
> Hi,
> from Documentation/process/coding-style.rst:
> 
> Lines under a ``config`` definition
> are indented with one tab, while help text is indented an additional two
> spaces.

Ok sorry about that. I probably just copied what the previous entry was
doing. Will fix.
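
For reference, re-indented per that rule the first entry would look roughly
like this (whitespace sketch only):

config SECURE_LAUNCH
	bool "Secure Launch support"
	default n
	depends on X86_64
	help
	  The Secure Launch feature allows a kernel to be loaded
	  directly through an Intel TXT measured launch. ...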

Thanks
Ross

> 
>> ---
>>  arch/x86/Kconfig | 36 
>>  1 file changed, 36 insertions(+)
>>
>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>> index 7101ac6..8957981 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1968,6 +1968,42 @@ config EFI_MIXED
>>  
>> If unsure, say N.
>>  
>> +config SECURE_LAUNCH
>> +bool "Secure Launch support"
>> +default n
>> +depends on X86_64
>> +help
>> +   The Secure Launch feature allows a kernel to be loaded
>> +   directly through an Intel TXT measured launch. Intel TXT
>> +   establishes a Dynamic Root of Trust for Measurement (DRTM)
>> +   where the CPU measures the kernel image. This feature then
>> +   continues the measurement chain over kernel configuration
>> +   information and init images.
>> +
>> +choice
>> +prompt "Select Secure Launch Algorithm for TPM2"
>> +depends on SECURE_LAUNCH
>> +
>> +config SECURE_LAUNCH_SHA1
>> +bool "Secure Launch TPM1 SHA1"
>> +help
>> +   When using Secure Launch and TPM1 is present, use SHA1 hash
>> +   algorithm for measurements.
>> +
>> +config SECURE_LAUNCH_SHA256
>> +bool "Secure Launch TPM2 SHA256"
>> +help
>> +   When using Secure Launch and TPM2 is present, use SHA256 hash
>> +   algorithm for measurements.
>> +
>> +config SECURE_LAUNCH_SHA512
>> +bool "Secure Launch TPM2 SHA512"
>> +help
>> +   When using Secure Launch and TPM2 is present, use SHA512 hash
>> +   algorithm for measurements.
>> +
>> +endchoice
>> +
> 
> 
> thanks.
> 



Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Qian Cai
On Wed, 2020-08-26 at 13:16 +0200, Thomas Gleixner wrote:
> This is the second version of providing a base to support device MSI (non
> PCI based) and on top of that support for IMS (Interrupt Message Store)
> based devices in a halfway architecture-independent way.
> 
> The first version can be found here:
> 
> https://lore.kernel.org/r/20200821002424.119492...@linutronix.de
> 
> It's still a mixed bag of bug fixes, cleanups and general improvements
> which are worthwhile independent of device MSI.

Reverting the part of this patchset on top of today's linux-next fixed a
boot issue on HPE ProLiant DL560 Gen10, i.e.,

$ git revert --no-edit 13b90cadfc29..bc95fd0d7c42

.config: https://gitlab.com/cailca/linux-mm/-/blob/master/x86.config

It looks like the crashes happen in the interrupt remapping code where they are
only able to generate partial call traces.

[1.912386][T0] ACPI: X2APIC_NMI (uid[0xf5] high level 9983][T0] ... 
MAX_LOCK_DEPTH:  48
[7.914876][T0] ... MAX_LOCKDEP_KEYS:8192
[7.919942][T0] ... CLASSHASH_SIZE:  4096
[7.925009][T0] ... MAX_LOCKDEP_ENTRIES: 32768
[7.930163][T0] ... MAX_LOCKDEP_CHAINS:  65536
[7.935318][T0] ... CHAINHASH_SIZE:  32768
[7.940473][T0]  memory used by lock dependency info: 6301 kB
[7.946586][T0]  memory used for stack traces: 4224 kB
[7.952088][T0]  per task-struct memory footprint: 1920 bytes
[7.968312][T0] mempolicy: Enabling automatic NUMA balancing. Configure 
with numa_balancing= or the kernel.numa_balancing sysctl
[7.980281][T0] ACPI: Core revision 20200717
[7.993343][T0] clocksource: hpet: mask: 0x max_cycles: 
0x, max_idle_ns: 79635855245 ns
[8.003270][T0] APIC: Switch to symmetric I/O mode setup
[8.008951][T0] DMAR: Host address width 46
[8.013512][T0] DMAR: DRHD base: 0x00e5ffc000 flags: 0x0
[8.019680][T0] DMAR: dmar0: reg_base_addr e5ffc000 ver 1:0 cap 
8d2078c106f0466 [T0] DMAR-IR: IOAPIC id 15 under DRHD base  0xe5ffc000 
IOMMU 0
[8.420990][T0] DMAR-IR: IOAPIC id 8 under DRHD base  0xddffc000 IOMMU 15
[8.428166][T0] DMAR-IR: IOAPIC id 9 under DRHD base  0xddffc000 IOMMU 15
[8.435341][T0] DMAR-IR: HPET id 0 under DRHD base 0xddffc000
[8.441456][T0] DMAR-IR: Queued invalidation will be enabled to support 
x2apic and Intr-remapping.
[8.457911][T0] DMAR-IR: Enabled IRQ remapping in x2apic mode
[8.466614][T0] BUG: kernel NULL pointer dereference, address: 

[8.474295][T0] #PF: supervisor instruction fetch in kernel mode
[8.480669][T0] #PF: error_code(0x0010) - not-present page
[8.486518][T0] PGD 0 P4D 0 
[8.489757][T0] Oops: 0010 [#1] SMP KASAN PTI
[8.494476][T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I  
 5.9.0-rc6-next-20200925 #2
[8.503987][T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 
Gen10, BIOS U34 11/13/2019
[8.513238][T0] RIP: 0010:0x0
[8.516562][T0] Code: Bad RIP v

or

[2.906744][T0] ACPI: X2API32, address 0xfec68000, GSI 128-135
[2.907063][T0] IOAPIC[15]: apic_id 29, version 32, address 0xfec7, 
GSI 136-143
[2.907071][T0] IOAPIC[16]: apic_id 30, version 32, address 0xfec78000, 
GSI 144-151
[2.907079][T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[2.907084][T0] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high 
level)
[2.907100][T0] Using ACPI (MADT) for SMP configuration information
[2.907105][T0] ACPI: HPET id: 0x8086a701 base: 0xfed0
[2.907116][T0] ACPI: SPCR: console: uart,mmio,0x0,115200
[2.907121][T0] TSC deadline timer available
[2.907126][T0] smpboot: Allowing 144 CPUs, 0 hotplug CPUs
[2.907163][T0] [mem 0xd000-0xfdff] available for PCI devices
[2.907175][T0] clocksource: refined-jiffies: mask: 0x 
max_cycles: 0x, max_idle_ns: 1911260446275 ns
[2.914541][T0] setup_percpu: NR_CPUS:256 nr_cpumask_bits:144 
nr_cpu_ids:144 nr_node_ids:4
[2.926109][   466 ecap f020df
[9.134709][T0] DMAR: DRHD base: 0x00f5ffc000 flags: 0x0
[9.140867][T0] DMAR: dmar8: reg_base_addr f5ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.149610][T0] DMAR: DRHD base: 0x00f7ffc000 flags: 0x0
[9.155762][T0] DMAR: dmar9: reg_base_addr f7ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.164491][T0] DMAR: DRHD base: 0x00f9ffc000 flags: 0x0
[9.170645][T0] DMAR: dmar10: reg_base_addr f9ffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.179476][T0] DMAR: DRHD base: 0x00fbffc000 flags: 0x0
[9.185626][T0] DMAR: dmar11: reg_base_addr fbffc000 ver 1:0 cap 
8d2078c106f0466 ecap f020df
[9.194442][T0] DMAR: DRHD base: 0x00dfffc000 flags:

Re: [PATCH v3 5/6] iommu/virtio: Support topology description in config space

2020-09-25 Thread Bjorn Helgaas
On Fri, Sep 25, 2020 at 10:12:43AM +0200, Jean-Philippe Brucker wrote:
> On Thu, Sep 24, 2020 at 10:22:03AM -0500, Bjorn Helgaas wrote:
> > On Fri, Aug 21, 2020 at 03:15:39PM +0200, Jean-Philippe Brucker wrote:

> > > + /* Perform the init sequence before we can read the config */
> > > + ret = viommu_pci_reset(common_cfg);
> > 
> > I guess this is some special device-specific reset, not any kind of
> > standard PCI reset?
> 
> Yes it's the virtio reset - writing 0 to the status register in the BAR.

I wonder if this should be named something like viommu_virtio_reset(),
so there's no confusion with PCI resets and all the timing
restrictions, config space restoration, etc. associated with them.

Bjorn


Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Peter Zijlstra
On Fri, Sep 25, 2020 at 11:29:13AM -0400, Qian Cai wrote:

> It looks like the crashes happen in the interrupt remapping code where they
> are only able to generate partial call traces.

> [8.466614][T0] BUG: kernel NULL pointer dereference, address: 
> 
> [8.474295][T0] #PF: supervisor instruction fetch in kernel mode
> [8.480669][T0] #PF: error_code(0x0010) - not-present page
> [8.486518][T0] PGD 0 P4D 0 
> [8.489757][T0] Oops: 0010 [#1] SMP KASAN PTI
> [8.494476][T0] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G  I    
>5.9.0-rc6-next-20200925 #2
> [8.503987][T0] Hardware name: HPE ProLiant DL560 Gen10/ProLiant DL560 
> Gen10, BIOS U34 11/13/2019
> [8.513238][T0] RIP: 0010:0x0
> [8.516562][T0] Code: Bad RIP v

Here it looks like this:

[1.830276] BUG: kernel NULL pointer dereference, address: 
[1.838043] #PF: supervisor instruction fetch in kernel mode
[1.844357] #PF: error_code(0x0010) - not-present page
[1.850090] PGD 0 P4D 0
[1.852915] Oops: 0010 [#1] SMP
[1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
5.9.0-rc6-00700-g0248dedd12d4 #419
[1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS 
SE5C600.86B.02.02.0002.122320131210 12/23/2013
[1.876902] RIP: 0010:0x0
[1.879824] Code: Bad RIP value.
[1.883423] RSP: :82803da0 EFLAGS: 00010282
[1.889251] RAX:  RBX: 8282b980 RCX: 82803e40
[1.897241] RDX: 0001 RSI: 82803e40 RDI: 8282b980
[1.905201] RBP: 88842f331000 R08:  R09: 0001
[1.913162] R10: 0001 R11:  R12: 0048
[1.921123] R13: 82803e40 R14: 8282b9c0 R15: 
[1.929085] FS:  () GS:88842f40() 
knlGS:
[1.938113] CS:  0010 DS:  ES:  CR0: 80050033
[1.944524] CR2: ffd6 CR3: 02811001 CR4: 000606b0
[1.952484] Call Trace:
[1.955214]  msi_domain_alloc+0x36/0x130
[1.959594]  __irq_domain_alloc_irqs+0x165/0x380
[1.964748]  dmar_alloc_hwirq+0x9a/0x120
[1.969127]  dmar_set_interrupt.part.0+0x1c/0x60
[1.974281]  enable_drhd_fault_handling+0x2c/0x6c
[1.979532]  apic_intr_mode_init+0xfa/0x100
[1.984191]  x86_late_time_init+0x20/0x30
[1.988662]  start_kernel+0x723/0x7e6
[1.992748]  secondary_startup_64_no_verify+0xa6/0xab
[1.998386] Modules linked in:
[2.001794] CR2: 
[2.005510] ---[ end trace 837dc60d7c66efa2 ]---



Re: [PATCH v11 5/6] iommu/uapi: Handle data and argsz filled by users

2020-09-25 Thread Jacob Pan
Hi Jean-Philippe,

On Fri, 25 Sep 2020 11:46:36 +0200, Jean-Philippe Brucker
 wrote:

> On Thu, Sep 24, 2020 at 12:24:19PM -0700, Jacob Pan wrote:
> > IOMMU user APIs are responsible for processing user data. This patch
> > changes the interface such that user pointers can be passed into IOMMU
> > code directly. Separate kernel APIs without user pointers are introduced
> > for in-kernel users of the UAPI functionality.
> > 
> > IOMMU UAPI data has a user filled argsz field which indicates the data
> > length of the structure. User data is not trusted, argsz must be
> > validated based on the current kernel data size, mandatory data size,
> > and feature flags.
> > 
> > User data may also be extended, resulting in possible argsz increase.
> > Backward compatibility is ensured based on size and flags (or
> > the functional equivalent fields) checking.
> > 
> > This patch adds sanity checks in the IOMMU layer. In addition to argsz,
> > reserved/unused fields in padding, flags, and version are also checked.
> > Details are documented in Documentation/userspace-api/iommu.rst
> > 
> > Signed-off-by: Liu Yi L 
> > Signed-off-by: Jacob Pan   
> 
> Reviewed-by: Jean-Philippe Brucker 
> 
> Some comments below in case you're resending, but nothing important.
> 
Thanks for the review, I will respin.

> > ---
> >  drivers/iommu/iommu.c  | 199
> > +++--
> > include/linux/iommu.h  |  28 +-- include/uapi/linux/iommu.h |
> > 1 + 3 files changed, 212 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> > index 4ae02291ccc2..5c1b7ae48aae 100644
> > --- a/drivers/iommu/iommu.c
> > +++ b/drivers/iommu/iommu.c
> > @@ -1961,34 +1961,219 @@ int iommu_attach_device(struct iommu_domain
> > *domain, struct device *dev) }
> >  EXPORT_SYMBOL_GPL(iommu_attach_device);
> >  
> > +/*
> > + * Check flags and other user provided data for valid combinations. We
> > also
> > + * make sure no reserved fields or unused flags are set. This is to
> > ensure
> > + * not breaking userspace in the future when these fields or flags are
> > used.
> > + */
> > +static int iommu_check_cache_invl_data(struct
> > iommu_cache_invalidate_info *info) +{
> > +   u32 mask;
> > +   int i;
> > +
> > +   if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
> > +   return -EINVAL;
> > +
> > +   mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
> > +   if (info->cache & ~mask)
> > +   return -EINVAL;
> > +
> > +   if (info->granularity >= IOMMU_INV_GRANU_NR)
> > +   return -EINVAL;
> > +
> > +   switch (info->granularity) {
> > +   case IOMMU_INV_GRANU_ADDR:
> > +   if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
> > +   return -EINVAL;
> > +
> > +   mask = IOMMU_INV_ADDR_FLAGS_PASID |
> > +   IOMMU_INV_ADDR_FLAGS_ARCHID |
> > +   IOMMU_INV_ADDR_FLAGS_LEAF;
> > +
> > +   if (info->granu.addr_info.flags & ~mask)
> > +   return -EINVAL;
> > +   break;
> > +   case IOMMU_INV_GRANU_PASID:
> > +   mask = IOMMU_INV_PASID_FLAGS_PASID |
> > +   IOMMU_INV_PASID_FLAGS_ARCHID;
> > +   if (info->granu.pasid_info.flags & ~mask)
> > +   return -EINVAL;
> > +
> > +   break;
> > +   case IOMMU_INV_GRANU_DOMAIN:
> > +   if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
> > +   return -EINVAL;
> > +   break;
> > +   default:
> > +   return -EINVAL;
> > +   }
> > +
> > +   /* Check reserved padding fields */
> > +   for (i = 0; i < sizeof(info->padding); i++) {
> > +   if (info->padding[i])
> > +   return -EINVAL;
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> >  int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct
> > device *dev,
> > -   struct iommu_cache_invalidate_info
> > *inv_info)
> > +   void __user *uinfo)
> >  {
> > +   struct iommu_cache_invalidate_info inv_info = { 0 };
> > +   u32 minsz;
> > +   int ret = 0;  
> 
> nit: no need to initialize it
> 
got it.

> > +
> > if (unlikely(!domain->ops->cache_invalidate))
> > return -ENODEV;
> >  
> > -   return domain->ops->cache_invalidate(domain, dev, inv_info);
> > +   /*
> > +* No new spaces can be added before the variable sized union,
> > the
> > +* minimum size is the offset to the union.
> > +*/
> > +   minsz = offsetof(struct iommu_cache_invalidate_info, granu);  
> 
> Why not use offsetofend() to avoid naming the unions?
> 
offsetofend() was used in an earlier version, but the named union would avoid
a future code change if we were to re-purpose the padding fields.
minsz is always at the offsetof the union due to our expansion rules.
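
For illustration (my own sketch based on the current layout, not code from
the patch), both expressions name the same offset today:

	/* argsz(4) + version(4) + cache(1) + granularity(1) + padding(6) == 16 */
	minsz = offsetofend(struct iommu_cache_invalidate_info, padding);
	minsz = offsetof(struct iommu_cache_invalidate_info, granu);

but only the latter stays correct without edits if the padding bytes are
ever renamed or re-purposed.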

> > +
> > +   /* Copy minsz from user to get flags and argsz */
> > +   if (copy_from_user(&inv_info, uinfo, minsz))
> > +   return -EFAULT;
> > +
> > +   /* Fiel

Re: [PATCH 1/2] iommu/iova: Flush CPU rcache for when a depot fills

2020-09-25 Thread John Garry

On 25/09/2020 15:34, John Garry wrote:

Indeed, I think that the mainline code has a bug:

If the initial allocations for the loaded/prev magazines fail (return NULL)
in init_iova_rcaches(), then in __iova_rcache_insert():


if (!iova_magazine_full(cpu_rcache->loaded)) {
 can_insert = true;

If cpu_rcache->loaded == NULL, then can_insert is assigned true -> bang, 
as I experimented, below. This needs to be fixed...




This looks better:

Subject: [PATCH] iommu/iova: Avoid double-negatives with magazine helpers

Expression !iova_magazine_full(mag) evaluates true when mag == NULL.

This falls over in __iova_rcache_insert() when loaded == NULL:

if (!iova_magazine_full(cpu_rcache->loaded)) {
can_insert = true;

...

if (can_insert)
iova_magazine_push(cpu_rcache->loaded, iova_pfn);

Here, can_insert is evaluated true, which is wrong. Members
loaded/prev can possibly be NULL if the initial allocations fail in
init_iova_rcaches().

Let's stop using double-negatives, like !iova_magazine_full(), and use
iova_magazine_has_space() instead in this case. And similarly for
!iova_magazine_empty().

Signed-off-by: John Garry 

diff --git a/drivers/iommu/iova.c b/drivers/iommu/iova.c
index 5b4ffab7140b..42ca9d0f39b7 100644
--- a/drivers/iommu/iova.c
+++ b/drivers/iommu/iova.c
@@ -827,14 +827,18 @@ iova_magazine_free_pfns(struct iova_magazine *mag, 
struct iova_domain *iovad)

mag->size = 0;
 }

-static bool iova_magazine_full(struct iova_magazine *mag)
+static bool iova_magazine_has_space(struct iova_magazine *mag)
 {
-   return (mag && mag->size == IOVA_MAG_SIZE);
+   if (!mag)
+   return false;
+   return mag->size < IOVA_MAG_SIZE;
 }

-static bool iova_magazine_empty(struct iova_magazine *mag)
+static bool iova_magazine_has_pfns(struct iova_magazine *mag)
 {
-   return (!mag || mag->size == 0);
+   if (!mag)
+   return false;
+   return mag->size;
 }

 static unsigned long iova_magazine_pop(struct iova_magazine *mag,
@@ -843,7 +847,7 @@ static unsigned long iova_magazine_pop(struct 
iova_magazine *mag,

int i;
unsigned long pfn;

-   BUG_ON(iova_magazine_empty(mag));
+   BUG_ON(!iova_magazine_has_pfns(mag));

/* Only fall back to the rbtree if we have no suitable pfns at all */
for (i = mag->size - 1; mag->pfns[i] > limit_pfn; i--)
@@ -859,7 +863,7 @@ static unsigned long iova_magazine_pop(struct 
iova_magazine *mag,


 static void iova_magazine_push(struct iova_magazine *mag, unsigned 
long pfn)

 {
-   BUG_ON(iova_magazine_full(mag));
+   BUG_ON(!iova_magazine_has_space(mag));

mag->pfns[mag->size++] = pfn;
 }
@@ -905,9 +909,9 @@ static bool __iova_rcache_insert(struct iova_domain 
*iovad,

cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
spin_lock_irqsave(&cpu_rcache->lock, flags);

-   if (!iova_magazine_full(cpu_rcache->loaded)) {
+   if (iova_magazine_has_space(cpu_rcache->loaded)) {
can_insert = true;
-   } else if (!iova_magazine_full(cpu_rcache->prev)) {
+   } else if (iova_magazine_has_space(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
can_insert = true;
} else {
@@ -915,7 +919,8 @@ static bool __iova_rcache_insert(struct iova_domain 
*iovad,


if (new_mag) {
spin_lock(&rcache->lock);
-   if (rcache->depot_size < MAX_GLOBAL_MAGS) {
+   if (rcache->depot_size < MAX_GLOBAL_MAGS &&
+   cpu_rcache->loaded) {
rcache->depot[rcache->depot_size++] =
cpu_rcache->loaded;
} else {
@@ -968,9 +973,9 @@ static unsigned long __iova_rcache_get(struct 
iova_rcache *rcache,

cpu_rcache = raw_cpu_ptr(rcache->cpu_rcaches);
spin_lock_irqsave(&cpu_rcache->lock, flags);

-   if (!iova_magazine_empty(cpu_rcache->loaded)) {
+   if (iova_magazine_has_pfns(cpu_rcache->loaded)) {
has_pfn = true;
-   } else if (!iova_magazine_empty(cpu_rcache->prev)) {
+   } else if (iova_magazine_has_pfns(cpu_rcache->prev)) {
swap(cpu_rcache->prev, cpu_rcache->loaded);
has_pfn = true;
} else {
--
2.26.2


Re: [PATCH 08/18] dma-mapping: add a new dma_alloc_noncoherent API

2020-09-25 Thread Christoph Hellwig
On Fri, Sep 25, 2020 at 12:15:37PM +0100, Robin Murphy wrote:
> On 2020-09-15 16:51, Christoph Hellwig wrote:
> [...]
>> +These APIs allow to allocate pages in the kernel direct mapping that are
>> +guaranteed to be DMA addressable.  This means that unlike 
>> dma_alloc_coherent,
>> +virt_to_page can be called on the resulting address, and the resulting
>
> Nit: if we explicitly describe this as if it's a guarantee that can be 
> relied upon...
>
>> +struct page can be used for everything a struct page is suitable for.
>
> [...]
>> +This routine allocates a region of  bytes of consistent memory.  It
>> +returns a pointer to the allocated region (in the processor's virtual 
>> address
>> +space) or NULL if the allocation failed.  The returned memory may or may not
>> +be in the kernels direct mapping.  Drivers must not call virt_to_page on
>> +the returned memory region.
>
> ...then forbid this document's target audience from relying on it, 
> something seems off. At the very least it's unhelpfully unclear :/
>
> Given patch #17, I suspect that the first paragraph is the one that's no 
> longer true.

Yes.  dma_alloc_pages is the replacement for allocations that need the
direct mapping.  I'll send a patch to document dma_alloc_pages and
fix this up.
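
Roughly, for a driver that needs pages in the direct mapping (a sketch only;
the prototypes are my reading of this series, check the documentation patch
for the final form):

	struct page *page;
	dma_addr_t dma_handle;

	page = dma_alloc_pages(dev, size, &dma_handle, DMA_FROM_DEVICE,
			       GFP_KERNEL);
	if (!page)
		return -ENOMEM;
	/* page is in the direct mapping, so page_address() etc. work */
	dma_free_pages(dev, size, page, dma_handle, DMA_FROM_DEVICE);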


Re: [PATCH 1/8] dma-mapping: add DMA_ATTR_LOW_ADDRESS attribute

2020-09-25 Thread Christoph Hellwig
>  #define DMA_ATTR_PRIVILEGED  (1UL << 9)
> +/*
> + * DMA_ATTR_LOW_ADDRESS: used to indicate that the buffer should be allocated
> + * at the lowest possible DMA address, usually just at the beginning of the
> + * DMA/IOVA address space ('first-fit' allocation algorithm).
> + */
> +#define DMA_ATTR_LOW_ADDRESS (1UL << 10)

I think we need better comments explaining that this is best effort
and only applies to DMA API implementations that actually have an
allocatable IOVA space.
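
Something along these lines perhaps (wording sketch only, not a patch):

/*
 * DMA_ATTR_LOW_ADDRESS: used to indicate that the buffer should be allocated
 * at the lowest possible DMA address, usually just at the beginning of the
 * DMA/IOVA address space ('first-fit' allocation algorithm). This is a best
 * effort hint and it only applies to DMA API implementations that actually
 * manage an allocatable IOVA space.
 */
#define DMA_ATTR_LOW_ADDRESS	(1UL << 10)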


[PATCH v12 2/6] iommu/uapi: Add argsz for user filled data

2020-09-25 Thread Jacob Pan
As the IOMMU UAPI gets extended, the user data size may increase. To support
backward compatibility, this patch introduces a size field to each UAPI
data structure. It is *always* the responsibility of the user to fill in
the correct size. Padding fields are adjusted to ensure 8 byte alignment.

Specific scenarios for user data handling are documented in:
Documentation/userspace-api/iommu.rst

As there are no current users of the API, the struct version is not
incremented.
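
From the user point of view this simply means setting argsz to the size of
the structure being passed, e.g. (illustrative sketch only, not part of this
patch):

	struct iommu_cache_invalidate_info inv_info = {
		.argsz   = sizeof(inv_info),
		.version = IOMMU_CACHE_INVALIDATE_INFO_VERSION_1,
		/* cache, granularity and the union members filled as needed */
	};

The kernel then uses argsz (together with the flags) to decide how much of
the user structure it can safely copy and interpret.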

Reviewed-by: Eric Auger 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 include/uapi/linux/iommu.h | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index c2b2caf9ed41..b42acc8fe007 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -139,6 +139,7 @@ enum iommu_page_response_code {
 
 /**
  * struct iommu_page_response - Generic page response information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @flags: encodes whether the corresponding fields are valid
  * (IOMMU_FAULT_PAGE_RESPONSE_* values)
@@ -147,6 +148,7 @@ enum iommu_page_response_code {
  * @code: response code from &enum iommu_page_response_code
  */
 struct iommu_page_response {
+   __u32   argsz;
 #define IOMMU_PAGE_RESP_VERSION_1  1
__u32   version;
 #define IOMMU_PAGE_RESP_PASID_VALID(1 << 0)
@@ -222,6 +224,7 @@ struct iommu_inv_pasid_info {
 /**
  * struct iommu_cache_invalidate_info - First level/stage invalidation
  * information
+ * @argsz: User filled size of this data
  * @version: API version of this structure
  * @cache: bitfield that allows to select which caches to invalidate
  * @granularity: defines the lowest granularity used for the invalidation:
@@ -250,6 +253,7 @@ struct iommu_inv_pasid_info {
  * must support the used granularity.
  */
 struct iommu_cache_invalidate_info {
+   __u32   argsz;
 #define IOMMU_CACHE_INVALIDATE_INFO_VERSION_1 1
__u32   version;
 /* IOMMU paging structure cache */
@@ -259,7 +263,7 @@ struct iommu_cache_invalidate_info {
 #define IOMMU_CACHE_INV_TYPE_NR(3)
__u8cache;
__u8granularity;
-   __u8padding[2];
+   __u8padding[6];
union {
struct iommu_inv_pasid_info pasid_info;
struct iommu_inv_addr_info addr_info;
@@ -296,6 +300,7 @@ struct iommu_gpasid_bind_data_vtd {
 
 /**
  * struct iommu_gpasid_bind_data - Information about device and guest PASID 
binding
+ * @argsz: User filled size of this data
  * @version:   Version of this data structure
  * @format:PASID table entry format
  * @flags: Additional information on guest bind request
@@ -313,17 +318,18 @@ struct iommu_gpasid_bind_data_vtd {
  * PASID to host PASID based on this bind data.
  */
 struct iommu_gpasid_bind_data {
+   __u32 argsz;
 #define IOMMU_GPASID_BIND_VERSION_11
__u32 version;
 #define IOMMU_PASID_FORMAT_INTEL_VTD   1
__u32 format;
+   __u32 addr_width;
 #define IOMMU_SVA_GPASID_VAL   (1 << 0) /* guest PASID valid */
__u64 flags;
__u64 gpgd;
__u64 hpasid;
__u64 gpasid;
-   __u32 addr_width;
-   __u8  padding[12];
+   __u8  padding[8];
/* Vendor specific data */
union {
struct iommu_gpasid_bind_data_vtd vtd;
-- 
2.7.4



[PATCH v12 6/6] iommu/vt-d: Check UAPI data processed by IOMMU core

2020-09-25 Thread Jacob Pan
The generic IOMMU layer already sanity checks UAPI data for version
match and argsz range based on generic information.

This patch adjusts the following data checking responsibilities:
- removes the redundant version check from the VT-d driver
- removes the check for vendor specific data size
- adds a check for the use of reserved/undefined flags

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/iommu.c |  3 +--
 drivers/iommu/intel/svm.c   | 11 +--
 include/uapi/linux/iommu.h  |  1 +
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 461f3a6864d4..18ed3b3c70d7 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5408,8 +5408,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
int ret = 0;
u64 size = 0;
 
-   if (!inv_info || !dmar_domain ||
-   inv_info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   if (!inv_info || !dmar_domain)
return -EINVAL;
 
if (!dev || !dev_is_pci(dev))
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 99353d6468fa..0cb9a15f1112 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -284,8 +284,15 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
if (WARN_ON(!iommu) || !data)
return -EINVAL;
 
-   if (data->version != IOMMU_GPASID_BIND_VERSION_1 ||
-   data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   if (data->format != IOMMU_PASID_FORMAT_INTEL_VTD)
+   return -EINVAL;
+
+   /* IOMMU core ensures argsz is more than the start of the union */
+   if (data->argsz < offsetofend(struct iommu_gpasid_bind_data, 
vendor.vtd))
+   return -EINVAL;
+
+   /* Make sure no undefined flags are used in vendor data */
+   if (data->vendor.vtd.flags & ~(IOMMU_SVA_VTD_GPASID_LAST - 1))
return -EINVAL;
 
if (!dev_is_pci(dev))
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 66d4ca40b40f..e1d9e75f2c94 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -288,6 +288,7 @@ struct iommu_gpasid_bind_data_vtd {
 #define IOMMU_SVA_VTD_GPASID_PWT   (1 << 3) /* page-level write through */
 #define IOMMU_SVA_VTD_GPASID_EMTE  (1 << 4) /* extended mem type enable */
 #define IOMMU_SVA_VTD_GPASID_CD(1 << 5) /* PASID-level cache 
disable */
+#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 6)
__u64 flags;
__u32 pat;
__u32 emt;
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v12 0/6] IOMMU user API enhancement

2020-09-25 Thread Jacob Pan
IOMMU user API header was introduced to support nested DMA translation and
related fault handling. The current UAPI data structures consist of three
areas that cover the interactions between host kernel and guest:
 - fault handling
 - cache invalidation
 - bind guest page tables, i.e. guest PASID

Future extensions are likely to support more architectures and vIOMMU features.

In the previous discussion, using a user-filled data size and feature flags
was preferred over a unified version number.
https://lkml.org/lkml/2020/1/29/45

In addition to introducing an argsz field to the data structures, this patchset
also documents the UAPI design, usage, and extension rules. VT-d driver
changes utilizing the new argsz field are included; VFIO usage is to follow.
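
For illustration, a user application is expected to fill argsz with the size
of the structure it was compiled against (a sketch only; the ioctl plumbing
through VFIO is not part of this set):

	#include <linux/iommu.h>	/* UAPI header extended by this series */

	struct iommu_gpasid_bind_data bind = {
		.argsz   = sizeof(bind),
		.version = IOMMU_GPASID_BIND_VERSION_1,
		.format  = IOMMU_PASID_FORMAT_INTEL_VTD,
	};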

This set is available at:
https://github.com/jacobpan/linux.git vsva_v5.9_uapi_v12

Thanks,

Jacob


Changelog:
v12
- Removed a redundant check in cache invalidate API
v11
- Use #define instead of enum in PASID data format, squashed change
  into "iommu/uapi: Handle data and argsz filled by users"
- Remove alloc/free from documentation per Yi's comment. IOMMU UAPI
  does not perform IOASID alloc/free.
v10
- Documentation grammar fixes based on Randy's review
v9
- Directly pass PASID value to iommu_sva_unbind_gpasid() without
  the superfluous data in struct iommu_gpasid_bind_data.
v8
- Rebased to v5.9-rc2
- Addressed review comments from Eric Auger
  1. added a check for the unused vendor flags
  2. commit message improvements
v7
- Added PASID data format enum for range checking
- Tidy up based on reviews from Alex W.
- Removed doc section for vIOMMU fault handling
v6
- Renamed all UAPI functions with iommu_uapi_ prefix
- Replaced argsz maxsz checking with flag specific size checks
- Documentation improvements based on suggestions by Eric Auger
  Replaced example code with a pointer to the actual code
- Added more checks for illegal flags combinations
- Added doc file to MAINTAINERS
v5
- Adjusted padding in UAPI data to be 8-byte aligned
- Do not clobber argsz in IOMMU core before passing on to vendor driver
- Removed pr_warn_ for invalid UAPI data check, just return -EINVAL
- Clarified VFIO responsibility in UAPI data handling
- Use iommu_uapi prefix to differentiate APIs that have in-kernel callers
- Added comment for unchecked flags of invalidation granularity
- Added example in doc to show vendor data checking

v4
- Added checks of UAPI data for reserved fields, version, and flags.
- Removed version check from vendor driver (vt-d)
- Relaxed argsz check to match the UAPI struct size instead of variable
  union size
- Updated documentation

v3:
- Rewrote backward compatibility rule to support existing code
  re-compiled with newer kernel UAPI header that runs on older
  kernel. Based on review comment from Alex W.
  https://lore.kernel.org/linux-iommu/20200611094741.6d118...@w520.home/
- Take user pointer directly in UAPI functions. Perform argsz check
  and copy_from_user() in IOMMU driver. Eliminate the need for
  VFIO or other upper layer to parse IOMMU data.
- Create wrapper function for in-kernel users of UAPI functions
v2:
- Removed unified API version and helper
- Introduced argsz for each UAPI data
- Introduced UAPI doc


Jacob Pan (6):
  docs: IOMMU user API
  iommu/uapi: Add argsz for user filled data
  iommu/uapi: Use named union for user data
  iommu/uapi: Rename uapi functions
  iommu/uapi: Handle data and argsz filled by users
  iommu/vt-d: Check UAPI data processed by IOMMU core

 Documentation/userspace-api/iommu.rst | 209 ++
 MAINTAINERS   |   1 +
 drivers/iommu/intel/iommu.c   |  25 ++--
 drivers/iommu/intel/svm.c |  13 ++-
 drivers/iommu/iommu.c | 196 +--
 include/linux/iommu.h |  35 --
 include/uapi/linux/iommu.h|  18 ++-
 7 files changed, 456 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/userspace-api/iommu.rst

-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v12 5/6] iommu/uapi: Handle data and argsz filled by users

2020-09-25 Thread Jacob Pan
IOMMU user APIs are responsible for processing user data. This patch
changes the interface such that user pointers can be passed into IOMMU
code directly. Separate kernel APIs without user pointers are introduced
for in-kernel users of the UAPI functionality.

IOMMU UAPI data has a user-filled argsz field which indicates the data
length of the structure. User data is not trusted, so argsz must be
validated based on the current kernel data size, mandatory data size,
and feature flags.

User data may also be extended, resulting in a possible argsz increase.
Backward compatibility is ensured by checking the size and flags (or
their functional equivalents).

This patch adds sanity checks in the IOMMU layer. In addition to argsz,
reserved/unused fields in padding, flags, and version are also checked.
Details are documented in Documentation/userspace-api/iommu.rst
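
The remainder of the user data is then copied in, capped at the structure
size the running kernel was built with. Roughly (a sketch reconstructed from
the comment in the hunk below, not necessarily the exact code):

	if (inv_info.argsz > minsz) {
		u32 copysz = min_t(u32, inv_info.argsz, sizeof(inv_info)) - minsz;

		if (copy_from_user((void *)&inv_info + minsz, uinfo + minsz,
				   copysz))
			return -EFAULT;
	}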

Reviewed-by: Jean-Philippe Brucker 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c  | 194 +++--
 include/linux/iommu.h  |  28 ---
 include/uapi/linux/iommu.h |   1 +
 3 files changed, 207 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 4ae02291ccc2..a11f2733dc54 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1961,34 +1961,214 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
+/*
+ * Check flags and other user provided data for valid combinations. We also
+ * make sure no reserved fields or unused flags are set. This is to ensure
+ * not breaking userspace in the future when these fields or flags are used.
+ */
+static int iommu_check_cache_invl_data(struct iommu_cache_invalidate_info 
*info)
+{
+   u32 mask;
+   int i;
+
+   if (info->version != IOMMU_CACHE_INVALIDATE_INFO_VERSION_1)
+   return -EINVAL;
+
+   mask = (1 << IOMMU_CACHE_INV_TYPE_NR) - 1;
+   if (info->cache & ~mask)
+   return -EINVAL;
+
+   if (info->granularity >= IOMMU_INV_GRANU_NR)
+   return -EINVAL;
+
+   switch (info->granularity) {
+   case IOMMU_INV_GRANU_ADDR:
+   if (info->cache & IOMMU_CACHE_INV_TYPE_PASID)
+   return -EINVAL;
+
+   mask = IOMMU_INV_ADDR_FLAGS_PASID |
+   IOMMU_INV_ADDR_FLAGS_ARCHID |
+   IOMMU_INV_ADDR_FLAGS_LEAF;
+
+   if (info->granu.addr_info.flags & ~mask)
+   return -EINVAL;
+   break;
+   case IOMMU_INV_GRANU_PASID:
+   mask = IOMMU_INV_PASID_FLAGS_PASID |
+   IOMMU_INV_PASID_FLAGS_ARCHID;
+   if (info->granu.pasid_info.flags & ~mask)
+   return -EINVAL;
+
+   break;
+   case IOMMU_INV_GRANU_DOMAIN:
+   if (info->cache & IOMMU_CACHE_INV_TYPE_DEV_IOTLB)
+   return -EINVAL;
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   /* Check reserved padding fields */
+   for (i = 0; i < sizeof(info->padding); i++) {
+   if (info->padding[i])
+   return -EINVAL;
+   }
+
+   return 0;
+}
+
 int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
*dev,
-   struct iommu_cache_invalidate_info *inv_info)
+   void __user *uinfo)
 {
+   struct iommu_cache_invalidate_info inv_info = { 0 };
+   u32 minsz;
+   int ret;
+
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
-   return domain->ops->cache_invalidate(domain, dev, inv_info);
+   /*
+* No new spaces can be added before the variable sized union, the
+* minimum size is the offset to the union.
+*/
+   minsz = offsetof(struct iommu_cache_invalidate_info, granu);
+
+   /* Copy minsz from user to get flags and argsz */
+   if (copy_from_user(&inv_info, uinfo, minsz))
+   return -EFAULT;
+
+   /* Fields before the variable size union are mandatory */
+   if (inv_info.argsz < minsz)
+   return -EINVAL;
+
+   /* PASID and address granu require additional info beyond minsz */
+   if (inv_info.granularity == IOMMU_INV_GRANU_PASID &&
+   inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
granu.pasid_info))
+   return -EINVAL;
+
+   if (inv_info.granularity == IOMMU_INV_GRANU_ADDR &&
+   inv_info.argsz < offsetofend(struct iommu_cache_invalidate_info, 
granu.addr_info))
+   return -EINVAL;
+
+   /*
+* User might be using a newer UAPI header which has a larger data
+* size, we shall support the existing flags within the current
+* size. Copy the remaining user data _after_ minsz but not more
+* than the current kernel

[PATCH v12 4/6] iommu/uapi: Rename uapi functions

2020-09-25 Thread Jacob Pan
User APIs such as iommu_sva_unbind_gpasid() may also be used by the
kernel. Since we introduced user pointers to the UAPI functions,
in-kernel callers cannot share the same APIs. In-kernel callers are also
trusted, so there is no need to validate the data.

We plan to have two flavors of the same API functions, one called
through ioctls, carrying a user pointer and one called directly with
valid IOMMU UAPI structs. To differentiate both, let's rename existing
functions with an iommu_uapi_ prefix.
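
In sketch form, the intended split looks like this (prototypes are
illustrative of the plan described above, not necessarily the final
signatures; the user-pointer conversion happens in a later patch):

	/* ioctl path: takes a user pointer, validates argsz/version/flags */
	int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
				       struct device *dev, void __user *udata);

	/* in-kernel path: takes an already-filled, trusted struct */
	int iommu_sva_bind_gpasid(struct iommu_domain *domain, struct device *dev,
				  struct iommu_gpasid_bind_data *data);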

Suggested-by: Alex Williamson 
Reviewed-by: Eric Auger 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/iommu.c | 18 +-
 include/linux/iommu.h | 31 ---
 2 files changed, 25 insertions(+), 24 deletions(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 609bd25bf154..4ae02291ccc2 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -1961,35 +1961,35 @@ int iommu_attach_device(struct iommu_domain *domain, 
struct device *dev)
 }
 EXPORT_SYMBOL_GPL(iommu_attach_device);
 
-int iommu_cache_invalidate(struct iommu_domain *domain, struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+int iommu_uapi_cache_invalidate(struct iommu_domain *domain, struct device 
*dev,
+   struct iommu_cache_invalidate_info *inv_info)
 {
if (unlikely(!domain->ops->cache_invalidate))
return -ENODEV;
 
return domain->ops->cache_invalidate(domain, dev, inv_info);
 }
-EXPORT_SYMBOL_GPL(iommu_cache_invalidate);
+EXPORT_SYMBOL_GPL(iommu_uapi_cache_invalidate);
 
-int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-  struct device *dev, struct iommu_gpasid_bind_data 
*data)
+int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
+  struct device *dev, struct 
iommu_gpasid_bind_data *data)
 {
if (unlikely(!domain->ops->sva_bind_gpasid))
return -ENODEV;
 
return domain->ops->sva_bind_gpasid(domain, dev, data);
 }
-EXPORT_SYMBOL_GPL(iommu_sva_bind_gpasid);
+EXPORT_SYMBOL_GPL(iommu_uapi_sva_bind_gpasid);
 
-int iommu_sva_unbind_gpasid(struct iommu_domain *domain, struct device *dev,
-ioasid_t pasid)
+int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain, struct device 
*dev,
+ioasid_t pasid)
 {
if (unlikely(!domain->ops->sva_unbind_gpasid))
return -ENODEV;
 
return domain->ops->sva_unbind_gpasid(dev, pasid);
 }
-EXPORT_SYMBOL_GPL(iommu_sva_unbind_gpasid);
+EXPORT_SYMBOL_GPL(iommu_uapi_sva_unbind_gpasid);
 
 static void __iommu_detach_device(struct iommu_domain *domain,
  struct device *dev)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h
index fee209efb756..710d5d2691eb 100644
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -424,13 +424,13 @@ extern int iommu_attach_device(struct iommu_domain 
*domain,
   struct device *dev);
 extern void iommu_detach_device(struct iommu_domain *domain,
struct device *dev);
-extern int iommu_cache_invalidate(struct iommu_domain *domain,
- struct device *dev,
- struct iommu_cache_invalidate_info *inv_info);
-extern int iommu_sva_bind_gpasid(struct iommu_domain *domain,
-   struct device *dev, struct iommu_gpasid_bind_data *data);
-extern int iommu_sva_unbind_gpasid(struct iommu_domain *domain,
-   struct device *dev, ioasid_t pasid);
+extern int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
+  struct device *dev,
+  struct iommu_cache_invalidate_info 
*inv_info);
+extern int iommu_uapi_sva_bind_gpasid(struct iommu_domain *domain,
+ struct device *dev, struct 
iommu_gpasid_bind_data *data);
+extern int iommu_uapi_sva_unbind_gpasid(struct iommu_domain *domain,
+   struct device *dev, ioasid_t pasid);
 extern struct iommu_domain *iommu_get_domain_for_dev(struct device *dev);
 extern struct iommu_domain *iommu_get_dma_domain(struct device *dev);
 extern int iommu_map(struct iommu_domain *domain, unsigned long iova,
@@ -1032,21 +1032,22 @@ static inline int iommu_sva_get_pasid(struct iommu_sva 
*handle)
return IOMMU_PASID_INVALID;
 }
 
-static inline int
-iommu_cache_invalidate(struct iommu_domain *domain,
-  struct device *dev,
-  struct iommu_cache_invalidate_info *inv_info)
+static inline int iommu_uapi_cache_invalidate(struct iommu_domain *domain,
+ struct device *dev,
+ struct 
iommu_cache_invalidate_info *inv_info)
 {
return -ENODEV;
 }
-static inline int iom

[PATCH v12 3/6] iommu/uapi: Use named union for user data

2020-09-25 Thread Jacob Pan
The IOMMU UAPI data size is filled by user space and must be validated
by the kernel. To ensure backward compatibility, user data can only be
extended by either re-purposing padding bytes or extending the variable
sized union at the end. No size change is allowed before the union.
Therefore, the minimum size is the offset of the union.

To use offsetof() on the union, we must make it named.
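
For instance, with the union named "granu" (as used in the hunks below and
later in this series), the mandatory minimum size can be computed directly:

	/* everything before the variable sized union is mandatory */
	u32 minsz = offsetof(struct iommu_cache_invalidate_info, granu);

	if (inv_info->argsz < minsz)
		return -EINVAL;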

Link: https://lore.kernel.org/linux-iommu/20200611145518.0c281...@x1.home/
Signed-off-by: Jacob Pan 
Reviewed-by: Lu Baolu 
Reviewed-by: Eric Auger 
---
 drivers/iommu/intel/iommu.c | 22 +++---
 drivers/iommu/intel/svm.c   |  2 +-
 include/uapi/linux/iommu.h  |  4 ++--
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 87b17bac04c2..461f3a6864d4 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5434,8 +5434,8 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 
/* Size is only valid in address selective invalidation */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR)
-   size = to_vtd_size(inv_info->addr_info.granule_size,
-  inv_info->addr_info.nb_granules);
+   size = to_vtd_size(inv_info->granu.addr_info.granule_size,
+  inv_info->granu.addr_info.nb_granules);
 
for_each_set_bit(cache_type,
 (unsigned long *)&inv_info->cache,
@@ -5456,20 +5456,20 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 * granularity.
 */
if (inv_info->granularity == IOMMU_INV_GRANU_PASID &&
-   (inv_info->pasid_info.flags & IOMMU_INV_PASID_FLAGS_PASID))
-   pasid = inv_info->pasid_info.pasid;
+   (inv_info->granu.pasid_info.flags & 
IOMMU_INV_PASID_FLAGS_PASID))
+   pasid = inv_info->granu.pasid_info.pasid;
else if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
-(inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
-   pasid = inv_info->addr_info.pasid;
+(inv_info->granu.addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_PASID))
+   pasid = inv_info->granu.addr_info.pasid;
 
switch (BIT(cache_type)) {
case IOMMU_CACHE_INV_TYPE_IOTLB:
/* HW will ignore LSB bits based on address mask */
if (inv_info->granularity == IOMMU_INV_GRANU_ADDR &&
size &&
-   (inv_info->addr_info.addr & ((BIT(VTD_PAGE_SHIFT + 
size)) - 1))) {
+   (inv_info->granu.addr_info.addr & 
((BIT(VTD_PAGE_SHIFT + size)) - 1))) {
pr_err_ratelimited("User address not aligned, 
0x%llx, size order %llu\n",
-  inv_info->addr_info.addr, 
size);
+  
inv_info->granu.addr_info.addr, size);
}
 
/*
@@ -5477,9 +5477,9 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
 * We use npages = -1 to indicate that.
 */
qi_flush_piotlb(iommu, did, pasid,
-   mm_to_dma_pfn(inv_info->addr_info.addr),
+   
mm_to_dma_pfn(inv_info->granu.addr_info.addr),
(granu == QI_GRAN_NONG_PASID) ? -1 : 1 
<< size,
-   inv_info->addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
+   inv_info->granu.addr_info.flags & 
IOMMU_INV_ADDR_FLAGS_LEAF);
 
if (!info->ats_enabled)
break;
@@ -5502,7 +5502,7 @@ intel_iommu_sva_invalidate(struct iommu_domain *domain, 
struct device *dev,
size = 64 - VTD_PAGE_SHIFT;
addr = 0;
} else if (inv_info->granularity == 
IOMMU_INV_GRANU_ADDR) {
-   addr = inv_info->addr_info.addr;
+   addr = inv_info->granu.addr_info.addr;
}
 
if (info->ats_enabled)
diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 95c3164a2302..99353d6468fa 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -370,7 +370,7 @@ int intel_svm_bind_gpasid(struct iommu_domain *domain, 
struct device *dev,
spin_lock(&iommu->lock);
ret = intel_pasid_setup_nested(iommu, dev,
   (pgd_t *)(uintptr_t)data->gpgd,
-  data->hpasid,

[PATCH v12 1/6] docs: IOMMU user API

2020-09-25 Thread Jacob Pan
IOMMU UAPI is newly introduced to support communications between guest
virtual IOMMU and host IOMMU. There have been lots of discussions on how
it should work with the VFIO UAPI and userspace in general.

This document is intended to clarify the UAPI design and usage. The
mechanics of how future extensions should be achieved are also covered
in this documentation.

Cc: linux-...@vger.kernel.org
Cc: Jonathan Corbet 
Cc: linux-...@vger.kernel.org
Reviewed-by: Eric Auger 
Signed-off-by: Liu Yi L 
Signed-off-by: Jacob Pan 
---
 Documentation/userspace-api/iommu.rst | 209 ++
 MAINTAINERS   |   1 +
 2 files changed, 210 insertions(+)
 create mode 100644 Documentation/userspace-api/iommu.rst

diff --git a/Documentation/userspace-api/iommu.rst 
b/Documentation/userspace-api/iommu.rst
new file mode 100644
index ..d3108c1519d5
--- /dev/null
+++ b/Documentation/userspace-api/iommu.rst
@@ -0,0 +1,209 @@
+.. SPDX-License-Identifier: GPL-2.0
+.. iommu:
+
+=
+IOMMU Userspace API
+=
+
+IOMMU UAPI is used for virtualization cases where communications are
+needed between physical and virtual IOMMU drivers. For baremetal
+usage, the IOMMU is a system device which does not need to communicate
+with userspace directly.
+
+The primary use cases are guest Shared Virtual Address (SVA) and
+guest IO virtual address (IOVA), wherein the vIOMMU implementation
+relies on the physical IOMMU and for this reason requires interactions
+with the host driver.
+
+.. contents:: :local:
+
+Functionalities
+===
+Communications of user and kernel involve both directions. The
+supported user-kernel APIs are as follows:
+
+1. Bind/Unbind guest PASID (e.g. Intel VT-d)
+2. Bind/Unbind guest PASID table (e.g. ARM SMMU)
+3. Invalidate IOMMU caches upon guest requests
+4. Report errors to the guest and serve page requests
+
+Requirements
+
+The IOMMU UAPIs are generic and extensible to meet the following
+requirements:
+
+1. Emulated and para-virtualised vIOMMUs
+2. Multiple vendors (Intel VT-d, ARM SMMU, etc.)
+3. Extensions to the UAPI shall not break existing userspace
+
+Interfaces
+==
+Although the data structures defined in IOMMU UAPI are self-contained,
+there are no user API functions introduced. Instead, IOMMU UAPI is
+designed to work with existing user driver frameworks such as VFIO.
+
+Extension Rules & Precautions
+-
+When IOMMU UAPI gets extended, the data structures can *only* be
+modified in two ways:
+
+1. Adding new fields by re-purposing the padding[] field. No size change.
+2. Adding new union members at the end. May increase the structure sizes.
+
+No new fields can be added *after* the variable sized union in that it
+will break backward compatibility when offset moves. A new flag must
+be introduced whenever a change affects the structure using either
+method. The IOMMU driver processes the data based on flags which
+ensures backward compatibility.
+
+The version field is reserved only for the unlikely event of the UAPI
+being upgraded in its entirety.
+
+It's *always* the caller's responsibility to indicate the size of the
+structure passed by setting argsz appropriately.
+Though at the same time, argsz is user provided data which is not
+trusted. The argsz field allows the user app to indicate how much data
+it is providing; it's still the kernel's responsibility to validate
+whether it's correct and sufficient for the requested operation.
+
+Compatibility Checking
+--
+When IOMMU UAPI extension results in some structure size increase,
+IOMMU UAPI code shall handle the following cases:
+
+1. User and kernel have an exact size match
+2. An older user with older kernel header (smaller UAPI size) running on a
+   newer kernel (larger UAPI size)
+3. A newer user with newer kernel header (larger UAPI size) running
+   on an older kernel.
+4. A malicious/misbehaving user passing illegal/invalid size but within
+   range. The data may contain garbage.
+
+Feature Checking
+
+While launching a guest with vIOMMU, it is strongly advised to check
+the compatibility upfront, as some subsequent errors happening during
+vIOMMU operation, such as cache invalidation failures cannot be nicely
+escalated to the guest due to IOMMU specifications. This can lead to
+catastrophic failures for the users.
+
+User applications such as QEMU are expected to import kernel UAPI
+headers. Backward compatibility is supported per feature flags.
+For example, an older QEMU (with older kernel header) can run on newer
+kernel. Newer QEMU (with new kernel header) may refuse to initialize
+on an older kernel if new feature flags are not supported by older
+kernel. Simply recompiling existing code with newer kernel header should
+not be an issue in that only existing flags are used.
+
+IOMMU vendor driver should report the below features to IOMMU UAPI
+consu

Re: [PATCH V7 0/3] iommu: Add support to change default domain of an iommu group

2020-09-25 Thread Raj, Ashok
Hi Joerg,

thanks!

On Fri, Sep 25, 2020 at 09:34:23AM +0200, Joerg Roedel wrote:
> Hi Ashok,
> 
> On Thu, Sep 24, 2020 at 10:21:48AM -0700, Raj, Ashok wrote:
> > Just trying to followup on this series.
> > 
> > Sai has moved out of Intel, hence I'm trying to followup on his behalf.
> > 
> > Let me know if you have queued this for the next release.
> 
> Not yet, but I think this is mostly ready. Can you please send a new
> version in a new mail thread so that I can pick it up with b4?

So just another version bump, no other changes?

I thought v6-v7 was a version bump...
> 
> Thanks,
> 
>   Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 17/18] dma-iommu: implement ->alloc_noncoherent

2020-09-25 Thread Tomasz Figa
Hi Christoph,

On Tue, Sep 15, 2020 at 05:51:21PM +0200, Christoph Hellwig wrote:
> Implement the alloc_noncoherent method to provide memory that is neither
> coherent nor contiguous.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/iommu/dma-iommu.c | 41 +++
>  1 file changed, 37 insertions(+), 4 deletions(-)
> 

Sorry for being late to the party and thanks a lot for the patch. Please see my
comments inline.

[snip]
> @@ -1052,6 +1055,34 @@ static void *iommu_dma_alloc(struct device *dev, 
> size_t size,
>   return cpu_addr;
>  }
>  
> +#ifdef CONFIG_DMA_REMAP
> +static void *iommu_dma_alloc_noncoherent(struct device *dev, size_t size,
> + dma_addr_t *handle, enum dma_data_direction dir, gfp_t gfp)
> +{
> + if (!gfpflags_allow_blocking(gfp)) {
> + struct page *page;
> +
> + page = dma_common_alloc_pages(dev, size, handle, dir, gfp);
> + if (!page)
> + return NULL;
> + return page_address(page);
> + }
> +
> + return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
> +  PAGE_KERNEL, 0);

iommu_dma_alloc_remap() makes use of the DMA_ATTR_ALLOC_SINGLE_PAGES attribute
to optimize the allocations for devices which don't care about how contiguous
the backing memory is. Do you think we could add an attrs argument to this
function and pass it there?
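
For illustration, something like this is what I have in mind (a sketch only;
"attrs" here is the proposed new parameter, not something the current
function takes):

	return iommu_dma_alloc_remap(dev, size, handle, gfp | __GFP_ZERO,
				     PAGE_KERNEL, attrs);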

As ARM is being moved to the common iommu-dma layer as well, we'll probably
make use of the argument to support the DMA_ATTR_NO_KERNEL_MAPPING attribute to
conserve the vmalloc area.

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 01/18] media/v4l2: remove V4L2-FLAG-MEMORY-NON-CONSISTENT flag

2020-09-25 Thread Tomasz Figa
Hi Christoph,

On Tue, Sep 15, 2020 at 05:51:05PM +0200, Christoph Hellwig wrote:
> From: Sergey Senozhatsky 
> 
> The patch partially reverts some of the UAPI bits of the buffer
> cache management hints. Namely, the queue consistency (memory
> coherency) user-space hint because, as it turned out, the kernel
> implementation of this feature was misusing DMA_ATTR_NON_CONSISTENT.
> 
> The patch reverts both kernel and user space parts: removes the
> DMA consistency attr functions, rolls back changes to the v4l2_requestbuffers
> and v4l2_create_buffers structures and corresponding UAPI functions
> (plus compat32 layer) and cleans up the documentation.
> 
> Signed-off-by: Christoph Hellwig 
> Signed-off-by: Sergey Senozhatsky 
> Signed-off-by: Christoph Hellwig 
> ---
>  .../userspace-api/media/v4l/buffer.rst| 17 ---
>  .../media/v4l/vidioc-create-bufs.rst  |  6 +--
>  .../media/v4l/vidioc-reqbufs.rst  | 12 +
>  .../media/common/videobuf2/videobuf2-core.c   | 46 +++
>  .../common/videobuf2/videobuf2-dma-contig.c   | 19 
>  .../media/common/videobuf2/videobuf2-dma-sg.c |  3 +-
>  .../media/common/videobuf2/videobuf2-v4l2.c   | 18 +---
>  drivers/media/v4l2-core/v4l2-compat-ioctl32.c | 10 +---
>  drivers/media/v4l2-core/v4l2-ioctl.c  |  5 +-
>  include/media/videobuf2-core.h|  7 +--
>  include/uapi/linux/videodev2.h| 13 +-
>  11 files changed, 22 insertions(+), 134 deletions(-)

Acked-by: Tomasz Figa 

Best regards,
Tomasz
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch V8 0/3] iommu: Add support to change default domain of an iommu group

2020-09-25 Thread Ashok Raj
Presently, the default domain of an iommu group is allocated during boot time
and it cannot be changed later. So, the device would typically be either in
identity (pass_through) mode or the device would be in DMA mode as long as the
system is up and running. There is no way to change the default domain type
dynamically i.e. after booting, a device cannot switch between identity mode and
DMA mode.

Assume a use case wherein a privileged user wants to use the device in
pass-through mode when it is used by the host, so that it performs at its
best. Presently, this is not supported. Hence, add support to change the
default domain of an iommu group dynamically.

Support this by writing to a sysfs file, namely
"/sys/kernel/iommu_groups//type".

Testing:

Tested by dynamically changing storage device (nvme) from
1. identity mode to DMA and making sure file transfer works
2. DMA mode to identity mode and making sure file transfer works
Tested only for intel_iommu/vt-d. Would appreciate if someone could test on AMD
and ARM based machines.

Based on iommu maintainer's 'next' branch.

Changes from V6,v7:

1. None except for version bump.
https://lore.kernel.org/linux-iommu/20200925073423.gt27...@8bytes.org/

Changes from V5:

1. None except for version bump because Joerg had asked to resend the patches
   after the merge window closes.

Changes from V4:

1. Created device direct mappings before attaching the device to the domain
2. Used list_first_entry() instead of list_for_each_entry() to get the first
   element of a linked list.
3. Used get_device() and put_device() before and after device_lock()
4. Passed device as an argument to iommu_change_dev_def_domain() to check that
   the device hasn't changed between calls.
5. Changed error message from "Group assigned to user level for direct access"
   to "Group not assigned to default domain".
6. Changed error message from "Cannot change default domain of a group with two
   or more devices" to "Cannot change default domain: Group has more than one
   device".
7. Removed printing error message "'def_domain_type' call back isn't registered"

Changes from V3:

1. Made changes to commit message as suggested by Baolu.
2. Don't pass "prev_dom" and "dev" as parameters to
   iommu_change_dev_def_domain(). Instead get them from group.
3. Sanitize the logic to validate the user's default domain type request. The logic
   remains the same but is implemented differently.
4. Push lot of error checking into iommu_change_dev_def_domain() from
   iommu_group_store_type().
5. iommu_change_dev_def_domain() takes/releases group mutex as needed. So, it
   shouldn't be called holding a group mutex.
6. Use pr_err_ratelimited() instead of pr_err() to avoid DOS attack.

Changes from V2:

1. Change the logic of updating default domain from V2 because
   ops->probe_finalize() could be used to update dma_ops.
2. Drop 1st and 2nd patch of V2 series because they are no longer needed on
   iommu maintainer's 'next' branch.
3. Limit this feature to iommu groups with only one device.
4. Hold device_lock and group mutex until the default domain is changed.

Changes from V1:

1. V1 patch set wasn't updating dma_ops for some vendors (Eg: AMD), hence,
   change the logic of updating default domain as below (because adding a device
   to iommu_group automatically updates dma_ops)
   a. Allocate a new domain
   b. For every device in the group
i. Remove the device from the group
ii. Add the device back to the group
   c. Free previous domain
2. Drop 1st patch of V1 (iommu/vt-d: Modify device_def_domain_type() to use at
   runtime) because "iommu=pt" has no effect on this function anymore.
3. Added a patch to take/release lock while reading 
iommu_group->default_domain->type
   because it can be changed any time by user.
4. Before changing default domain type of a group, check if the group is
   directly assigned for user level access. If so, abort.
5. Sanitize return path (using ternary operator) in iommu_group_store_type()
6. Split 2nd patch of V1 (iommu: Add device_def_domain_type() call back function
   to iommu_ops) into two patches such that iommu generic changes are now in 1st
   patch of V2 and vt-d specific changes are in 2nd patch of V2.
7. Rename device_def_domain_type() to dev_def_domain_type()
8. Remove example from documentation
9. Change the value written to file "/sys/kernel/iommu_groups//type"
   from "dma" to "DMA".

Changes from RFC:
-
1. Added support for "auto" type, so that kernel selects one among identity or
   dma mode.
2. Use "system_state" in device_def_domain_type() instead of an argument.

Sai Praneeth Prakhya (3):
  iommu: Add support to change default domain of an iommu group
  iommu: Take lock before reading iommu group default domain type
  iommu: Document usage of "/sys/kernel/iommu_groups//type" file


Sai Praneeth Prakhya (3):
  iom

[Patch V8 3/3] iommu: Document usage of "/sys/kernel/iommu_groups//type" file

2020-09-25 Thread Ashok Raj
From: Sai Praneeth Prakhya 

The default domain type of an iommu group can be changed by writing to
the "/sys/kernel/iommu_groups//type" file. Hence, document its usage
and, more importantly, spell out its limitations.

Cc: Christoph Hellwig 
Cc: Joerg Roedel 
Cc: Ashok Raj 
Cc: Will Deacon 
Cc: Lu Baolu 
Cc: Sohil Mehta 
Cc: Robin Murphy 
Cc: Jacob Pan 
Reviewed-by: Lu Baolu 
Signed-off-by: Sai Praneeth Prakhya 
---
 .../ABI/testing/sysfs-kernel-iommu_groups  | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/Documentation/ABI/testing/sysfs-kernel-iommu_groups 
b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
index 017f5bc3920c..effde9d23f4f 100644
--- a/Documentation/ABI/testing/sysfs-kernel-iommu_groups
+++ b/Documentation/ABI/testing/sysfs-kernel-iommu_groups
@@ -33,3 +33,33 @@ Description:In case an RMRR is used only by graphics or 
USB devices
it is now exposed as "direct-relaxable" instead of "direct".
In device assignment use case, for instance, those RMRR
are considered to be relaxable and safe.
+
+What:  /sys/kernel/iommu_groups//type
+Date:  September 2020
+KernelVersion: v5.10
+Contact:   Sai Praneeth Prakhya 
+Description:   Let the user know the type of default domain in use by iommu
+   for this group. A privileged user could request kernel to change
+   the group type by writing to this file. Presently, only three
+   types are supported
+   1. DMA: All the DMA transactions from the device in this group
+   are translated by the iommu.
+   2. identity: All the DMA transactions from the device in this
+group are *not* translated by the iommu.
+   3. auto: Change to the type the device was booted with. When the
+user reads the file he would never see "auto". This is
+just a write only value.
+   Note:
+   -
+   A group type could be modified only when
+   1. The group has *only* one device
+   2. The device in the group is not bound to any device driver.
+  So, the user must first unbind the appropriate driver and
+  then change the default domain type.
+   Caution:
+   
+   Unbinding a device driver will take away the driver's control
+   over the device and if done on devices that host root file
+   system could lead to catastrophic effects (the user might
+   need to reboot the machine to get it to normal state). So, it's
+   expected that the user understands what he is doing.
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch V8 2/3] iommu: Take lock before reading iommu group default domain type

2020-09-25 Thread Ashok Raj
From: Sai Praneeth Prakhya 

"/sys/kernel/iommu_groups//type" file could be read to find out the
default domain type of an iommu group. The default domain of an iommu group
doesn't change after booting and hence could be read directly. But,
after adding support to dynamically change the iommu group default domain,
the above assumption no longer holds.

iommu group default domain type could be changed at any time by writing to
"/sys/kernel/iommu_groups//type". So, take group mutex before
reading the iommu group default domain type so that the user doesn't see stale
values and iommu_group_show_type() doesn't try to dereference stale pointers.

Cc: Christoph Hellwig 
Cc: Joerg Roedel 
Cc: Ashok Raj 
Cc: Will Deacon 
Cc: Lu Baolu 
Cc: Sohil Mehta 
Cc: Robin Murphy 
Cc: Jacob Pan 
Reviewed-by: Lu Baolu 
Signed-off-by: Sai Praneeth Prakhya 
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2e93c48ce248..b540ae1e679d 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -501,6 +501,7 @@ static ssize_t iommu_group_show_type(struct iommu_group 
*group,
 {
char *type = "unknown\n";
 
+   mutex_lock(&group->mutex);
if (group->default_domain) {
switch (group->default_domain->type) {
case IOMMU_DOMAIN_BLOCKED:
@@ -517,6 +518,7 @@ static ssize_t iommu_group_show_type(struct iommu_group 
*group,
break;
}
}
+   mutex_unlock(&group->mutex);
strcpy(buf, type);
 
return strlen(type);
-- 
2.7.4

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[Patch V8 1/3] iommu: Add support to change default domain of an iommu group

2020-09-25 Thread Ashok Raj
From: Sai Praneeth Prakhya 

Presently, the default domain of an iommu group is allocated during boot
time and it cannot be changed later. So, the device would typically be
either in identity (also known as pass_through) mode or the device would be
in DMA mode as long as the machine is up and running. There is no way to
change the default domain type dynamically i.e. after booting, a device
cannot switch between identity mode and DMA mode.

But, assume a use case wherein the user trusts the device and believes that
the OS is secure enough and hence wants *only* this device to bypass IOMMU
(so that it could be high performing) whereas all the other devices to go
through IOMMU (so that the system is protected). Presently, this use case
is not supported. It will be helpful if there is some way to change the
default domain of an iommu group dynamically. Hence, add such support.

A privileged user could request the kernel to change the default domain
type of an iommu group by writing to the
"/sys/kernel/iommu_groups//type" file. Presently, only three values
are supported
1. identity: all the DMA transactions from the device in this group are
 *not* translated by the iommu
2. DMA: all the DMA transactions from the device in this group are
translated by the iommu
3. auto: change to the type the device was booted with

Note:
1. Default domain of an iommu group with two or more devices cannot be
   changed.
2. The device in the iommu group shouldn't be bound to any driver.
3. The device shouldn't be assigned to user for direct access.
4. The vendor iommu driver is required to add def_domain_type() callback.
   The change request will fail if the request type conflicts with that
   returned from the callback.

Please see "Documentation/ABI/testing/sysfs-kernel-iommu_groups" for more
information.
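
For reference, the def_domain_type() callback mentioned in note 4 has roughly
this shape (a sketch; the function name and the helper predicate are
hypothetical):

	static int vendor_def_domain_type(struct device *dev)
	{
		/* hypothetical vendor-specific check */
		if (device_requires_identity(dev))
			return IOMMU_DOMAIN_IDENTITY;

		return 0;	/* no preference; honor the requested type */
	}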

Cc: Christoph Hellwig 
Cc: Joerg Roedel 
Cc: Ashok Raj 
Cc: Will Deacon 
Cc: Lu Baolu 
Cc: Sohil Mehta 
Cc: Robin Murphy 
Cc: Jacob Pan 
Reviewed-by: Lu Baolu 
Signed-off-by: Sai Praneeth Prakhya 
---
 drivers/iommu/iommu.c | 225 +-
 1 file changed, 224 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 6c14c88cd525..2e93c48ce248 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -93,6 +93,8 @@ static void __iommu_detach_group(struct iommu_domain *domain,
 static int iommu_create_device_direct_mappings(struct iommu_group *group,
   struct device *dev);
 static struct iommu_group *iommu_group_get_for_dev(struct device *dev);
+static ssize_t iommu_group_store_type(struct iommu_group *group,
+ const char *buf, size_t count);
 
 #define IOMMU_GROUP_ATTR(_name, _mode, _show, _store)  \
 struct iommu_group_attribute iommu_group_attr_##_name =\
@@ -525,7 +527,8 @@ static IOMMU_GROUP_ATTR(name, S_IRUGO, 
iommu_group_show_name, NULL);
 static IOMMU_GROUP_ATTR(reserved_regions, 0444,
iommu_group_show_resv_regions, NULL);
 
-static IOMMU_GROUP_ATTR(type, 0444, iommu_group_show_type, NULL);
+static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
+   iommu_group_store_type);
 
 static void iommu_group_release(struct kobject *kobj)
 {
@@ -2849,3 +2852,223 @@ int iommu_sva_get_pasid(struct iommu_sva *handle)
return ops->sva_get_pasid(handle);
 }
 EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+/*
+ * Changes the default domain of an iommu group that has *only* one device
+ *
+ * @group: The group for which the default domain should be changed
+ * @prev_dev: The device in the group (this is used to make sure that the 
device
+ *  hasn't changed after the caller has called this function)
+ * @type: The type of the new default domain that gets associated with the 
group
+ *
+ * Returns 0 on success and error code on failure
+ *
+ * Note:
+ * 1. Presently, this function is called only when user requests to change the
+ *group's default domain type through 
/sys/kernel/iommu_groups//type
+ *Please take a closer look if intended to use for other purposes.
+ */
+static int iommu_change_dev_def_domain(struct iommu_group *group,
+  struct device *prev_dev, int type)
+{
+   struct iommu_domain *prev_dom;
+   struct group_device *grp_dev;
+   const struct iommu_ops *ops;
+   int ret, dev_def_dom;
+   struct device *dev;
+
+   if (!group)
+   return -EINVAL;
+
+   mutex_lock(&group->mutex);
+
+   if (group->default_domain != group->domain) {
+   pr_err_ratelimited("Group not assigned to default domain\n");
+   ret = -EBUSY;
+   goto out;
+   }
+
+   /*
+* iommu group wasn't locked while acquiring device lock in
+* iommu_group_store_type(). So, make sure that the device count hasn't
+* changed while acquiring device lock.

Re: [PATCH 07/13] x86: Secure Launch kernel early boot stub

2020-09-25 Thread Arvind Sankar
On Fri, Sep 25, 2020 at 10:56:43AM -0400, Ross Philipson wrote:
> On 9/24/20 1:38 PM, Arvind Sankar wrote:
> > On Thu, Sep 24, 2020 at 10:58:35AM -0400, Ross Philipson wrote:
> > 
> >> diff --git a/arch/x86/boot/compressed/head_64.S 
> >> b/arch/x86/boot/compressed/head_64.S
> >> index 97d37f0..42043bf 100644
> >> --- a/arch/x86/boot/compressed/head_64.S
> >> +++ b/arch/x86/boot/compressed/head_64.S
> >> @@ -279,6 +279,21 @@ SYM_INNER_LABEL(efi32_pe_stub_entry, SYM_L_LOCAL)
> >>  SYM_FUNC_END(efi32_stub_entry)
> >>  #endif
> >>  
> >> +#ifdef CONFIG_SECURE_LAUNCH
> >> +SYM_FUNC_START(sl_stub_entry)
> >> +  /*
> >> +   * On entry, %ebx has the entry abs offset to sl_stub_entry. To
> >> +   * find the beginning of where we are loaded, sub off from the
> >> +   * beginning.
> >> +   */
> > 
> > This requirement should be added to the documentation. Is it necessary
> > or can this stub just figure out the address the same way as the other
> > 32-bit entry points, using the scratch space in bootparams as a little
> > stack?
> 
> It is based on the state of the BSP when TXT vectors to the measured
> launch environment. It is documented in the TXT spec and the SDMs.
> 

I think it would be useful to add to the x86 boot documentation how
exactly this new entry point is called, even if it's just adding a link
to some section of those specs. The doc should also say that an
mle_header_offset of 0 means the kernel isn't secure launch enabled.

> > 
> > For the 32-bit assembler code that's being added, tip/master now has
> > changes that prevent the compressed kernel from having any runtime
> > relocations.  You'll need to revise some of the code and the data
> > structures initial values to avoid creating relocations.
> 
> Could you elaborate on this some more? I am not sure I see places in the
> secure launch asm that would be creating relocations like this.
> 
> Thank you,
> Ross
> 

You should see them if you do
readelf -r arch/x86/boot/compressed/vmlinux

In terms of the code, things like:

addl%ebx, (sl_gdt_desc + 2)(%ebx)

will create a relocation, because the linker interprets this as wanting
the runtime address of sl_gdt_desc, rather than just the offset from
startup_32.

https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/tree/arch/x86/boot/compressed/head_64.S#n48

has a comment with some explanation and a macro that the 32-bit code in
startup_32 uses to avoid creating relocations.

Since the SL code is in a different assembler file (and a different
section), you can't directly use the same macro. I would suggest getting
rid of sl_stub_entry and entering directly at sl_stub, and then the code
in sl_stub.S can use sl_stub for the base address, defining the rva()
macro there as

#define rva(X) ((X) - sl_stub)

You will also need to avoid initializing data with symbol addresses.

.long mle_header
.long sl_stub_entry
.long sl_gdt

will create relocations. The third one is easy, just replace it with
sl_gdt - sl_gdt_desc and initialize it at runtime with

lealrva(sl_gdt_desc)(%ebx), %eax
addl%eax, 2(%eax)
lgdt(%eax)

The other two are messier; unfortunately there is no easy way to tell
the linker what we want here. The other entry point addresses (for the
EFI stub) are populated in a post-processing step after the compressed
kernel has been linked; we could teach it to also update kernel_info.

Without that, for kernel_info, you could change it to store the offset
of the MLE header from kernel_info, instead of from the start of the
image.

For the MLE header, it could be moved to .head.text in head_64.S, and
initialized with
.long rva(sl_stub)
This will also let it be placed at a fixed offset from startup_32, so
that kernel_info can just be populated with a constant.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 0/2] iommu/iova: Solve longterm IOVA issue

2020-09-25 Thread Cong Wang
On Fri, Sep 25, 2020 at 2:56 AM John Garry  wrote:
>
> This series contains a patch to solve the longterm IOVA issue which
> leizhen originally tried to address at [0].
>
> I also included the small optimisation from Cong Wang, which never seems
> to have been accepted [1]. There was some debate about the other patches
> in that series, but this one is quite straightforward.
>
> @Cong Wang, Please resend your series if prefer I didn't upstream your
> patch.

Thanks for letting me know. But I still don't think it is worth any effort,
given it is hard to work with Robin. Users who care about latency here
should just disable IOMMU, it is really hard to optimize IOVA cache
performance to catch up with !IOMMU case.

So, please feel free to carry it on your own, I have no problem with it.

Thanks.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Linaro-mm-sig] [PATCH v5 05/38] drm: prime: use sgtable iterators in drm_prime_sg_to_page_addr_arrays()

2020-09-25 Thread Alex Deucher
On Tue, Sep 22, 2020 at 2:28 AM Marek Szyprowski
 wrote:
>
> Hi Alex,
>
> On 22.09.2020 01:15, Alex Goins wrote:
> > Tested-by: Alex Goins 
> >
> > This change fixes a regression with drm_prime_sg_to_page_addr_arrays() and
> > AMDGPU in v5.9.
>
> Thanks for testing!
>
> > Commit 39913934 similarly revamped AMDGPU to use sgtable helper functions. 
> > When
> > it changed from dma_map_sg_attrs() to dma_map_sgtable(), as a side effect it
> > started correctly updating sgt->nents to the return value of 
> > dma_map_sg_attrs().
> > However, drm_prime_sg_to_page_addr_arrays() incorrectly uses sgt->nents to
> > iterate over pages, rather than sgt->orig_nents, resulting in it now 
> > returning
> > the incorrect number of pages on AMDGPU.
> >
> > I had written a patch that changes drm_prime_sg_to_page_addr_arrays() to use
> > for_each_sgtable_sg() instead of for_each_sg(), iterating using 
> > sgt->orig_nents:
> >
> > -   for_each_sg(sgt->sgl, sg, sgt->nents, count) {
> > +   for_each_sgtable_sg(sgt, sg, count) {
> >
> > This patch takes it further, but still has the effect of fixing the number 
> > of
> > pages that drm_prime_sg_to_page_addr_arrays() returns. Something like this
> > should be included in v5.9 to prevent a regression with AMDGPU.
>
> Probably the easiest way to handle a fix for v5.9 would be to simply
> merge the latest version of this patch also to v5.9-rcX:
> https://lore.kernel.org/dri-devel/20200904131711.12950-3-m.szyprow...@samsung.com/
>
>
> This way we would get it fixed and avoid possible conflict in the -next.
> Do you have any AMDGPU fixes for v5.9 in the queue? Maybe you can add
> that patch to the queue? Dave: would it be okay that way?

I think this should go into drm-misc for 5.9 since it's an update to
drm_prime.c.  Is that patch ready to merge?
Acked-by: Alex Deucher 

Alex

>
> Best regards
> --
> Marek Szyprowski, PhD
> Samsung R&D Institute Poland
>
> ___
> Linaro-mm-sig mailing list
> linaro-mm-...@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/linaro-mm-sig
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH 1/1] pci: pciehp: Handle MRL interrupts to enable slot for hotplug.

2020-09-25 Thread Ashok Raj
When a Mechanical Retention Lock (MRL) is present, Linux doesn't process
MRL change events.

The following changes need to be enabled when MRL is present.

1. Subscribe to MRL change events in SlotControl.
2. When MRL is closed,
   - If there is no ATTN button, then POWER on the slot.
   - If there is an ATTN button and an MRL event is pending, ignore
 Presence Detect, since we want the ATTN button to drive the
 hotplug event.


Signed-off-by: Ashok Raj 
Co-developed-by: Kuppuswamy Sathyanarayanan 

---
 drivers/pci/hotplug/pciehp.h  |  1 +
 drivers/pci/hotplug/pciehp_ctrl.c | 69 +++
 drivers/pci/hotplug/pciehp_hpc.c  | 27 ++-
 3 files changed, 96 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp.h b/drivers/pci/hotplug/pciehp.h
index 4fd200d8b0a9..24a1c9c8ac78 100644
--- a/drivers/pci/hotplug/pciehp.h
+++ b/drivers/pci/hotplug/pciehp.h
@@ -155,6 +155,7 @@ void pciehp_request(struct controller *ctrl, int action);
 void pciehp_handle_button_press(struct controller *ctrl);
 void pciehp_handle_disable_request(struct controller *ctrl);
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 
events);
+void pciehp_handle_mrl_change(struct controller *ctrl);
 int pciehp_configure_device(struct controller *ctrl);
 void pciehp_unconfigure_device(struct controller *ctrl, bool presence);
 void pciehp_queue_pushbutton_work(struct work_struct *work);
diff --git a/drivers/pci/hotplug/pciehp_ctrl.c 
b/drivers/pci/hotplug/pciehp_ctrl.c
index 9f85815b4f53..c4310ee3678b 100644
--- a/drivers/pci/hotplug/pciehp_ctrl.c
+++ b/drivers/pci/hotplug/pciehp_ctrl.c
@@ -227,6 +227,7 @@ void pciehp_handle_disable_request(struct controller *ctrl)
 void pciehp_handle_presence_or_link_change(struct controller *ctrl, u32 events)
 {
int present, link_active;
+   u8 getstatus = 0;
 
/*
 * If the slot is on and presence or link has changed, turn it off.
@@ -275,6 +276,16 @@ void pciehp_handle_presence_or_link_change(struct 
controller *ctrl, u32 events)
if (link_active)
ctrl_info(ctrl, "Slot(%s): Link Up\n",
  slot_name(ctrl));
+   if (MRL_SENS(ctrl)) {
+   pciehp_get_latch_status(ctrl, &getstatus);
+   /*
+* If slot is closed && ATTN button exists
+* don't continue, let the ATTN button
+* drive the hot-plug
+*/
+   if (!getstatus && ATTN_BUTTN(ctrl))
+   return;
+   }
ctrl->request_result = pciehp_enable_slot(ctrl);
break;
default:
@@ -283,6 +294,64 @@ void pciehp_handle_presence_or_link_change(struct 
controller *ctrl, u32 events)
}
 }
 
+void pciehp_handle_mrl_change(struct controller *ctrl)
+{
+   u8 getstatus = 0;
+   int present, link_active;
+
+   pciehp_get_latch_status(ctrl, &getstatus);
+
+   present = pciehp_card_present(ctrl);
+   link_active = pciehp_check_link_active(ctrl);
+
+   ctrl_info(ctrl, "Slot(%s): Card %spresent\n",
+ slot_name(ctrl), present ? "" : "not ");
+
+   ctrl_info(ctrl, "Slot(%s): Link %s\n",
+ slot_name(ctrl), link_active ? "Up" : "Down");
+
+   ctrl_info(ctrl, "Slot(%s): Latch %s\n",
+ slot_name(ctrl), getstatus ? "Open" : "Closed");
+
+   /*
+* Need to handle only MRL Open. When MRL is closed with
+* a Card Present, either the ATTN button, or the PDC indication
+* should power the slot and add the card in the slot
+*/
+   if (getstatus) {
+   /*
+* If slot was powered on, time to power off
+* and remove the card
+*/
+   mutex_lock(&ctrl->state_lock);
+   if (ctrl->state == ON_STATE) {
+   mutex_unlock(&ctrl->state_lock);
+   pciehp_handle_disable_request(ctrl);
+   } else
+   mutex_unlock(&ctrl->state_lock);
+   } else {
+   /*
+* If latch is closed, and previous state is OFF
+* Then enable the slot
+*/
+   mutex_lock(&ctrl->state_lock);
+   if (ctrl->state == OFF_STATE) {
+   /*
+* Only continue to power on the slot when the
+* Attention button is not present. When button
+* present, button press event will process the
+* hot-add part of the flow.
+*/
+   if ((present || link_active) && !ATTN_BUTTN(ctrl)) {
+   ctrl->state = POWERON_STATE;
+   mutex_unlock(&ctrl->state_lock);
+

Re: [PATCH 00/13] x86: Trenchboot secure dynamic launch Linux kernel support

2020-09-25 Thread Daniel P. Smith
On 9/25/20 1:30 AM, Jarkko Sakkinen wrote:
> On Thu, Sep 24, 2020 at 10:58:28AM -0400, Ross Philipson wrote:
>> The Trenchboot project focus on boot security has led to the enabling of
>> the Linux kernel to be directly invocable by the x86 Dynamic Launch
>> instruction(s) for establishing a Dynamic Root of Trust for Measurement
>> (DRTM). The dynamic launch will be initiated by a boot loader with
> 
> What is "the dynamic launch"?

Dynamic launch is the term used to reference the event/process of
restarting a system without reboot to establish the DRTM. It is defined
in the TCG Glossary[1], is discussed in detail in the TCG D-RTM
Architecture specification[2], and covered in minimal detail in sections
9.5.5 and 34.2 of the TCG TPM2.0 Architecture specification[3].

[1]
https://trustedcomputinggroup.org/wp-content/uploads/TCG-Glossary-V1.1-Rev-1.0.pdf
[2]
https://trustedcomputinggroup.org/wp-content/uploads/TCG_D-RTM_Architecture_v1-0_Published_06172013.pdf
[3]
https://trustedcomputinggroup.org/wp-content/uploads/TCG_TPM2_r1p59_Part1_Architecture_pub.pdf

>> associated support added to it, for example the first targeted boot
>> loader will be GRUB2. An integral part of establishing the DRTM involves
>> measuring everything that is intended to be run (kernel image, initrd,
>> etc) and everything that will configure that kernel to run (command
>> line, boot params, etc) into specific PCRs, the DRTM PCRs (17-22), in
>> the TPM. Another key aspect is the dynamic launch is rooted in hardware,
>> that is to say the hardware (CPU) is what takes the first measurement
>> for the chain of integrity measurements. On Intel this is done using
>> the GETSEC instruction provided by Intel's TXT and the SKINIT
>> instruction provided by AMD's AMD-V. Information on these technologies
>> can be readily found online. This patchset introduces Intel TXT support.
> 
> Why not both Intel and AMD? You should explain this in the cover letter.

The work for this is split across different teams with different
resourcing levels, resulting in one organization working on Intel and
another working on AMD. This then raised the concern over submitting a
single patch set developed by two groups pseudo-independently. In this
situation the result would be patches being submitted from one
organization that had no direct development or testing and therefore
could not sign off on a subset of the patches being submitted.

> I'd be more motivated to review and test a full all encompassing x86
> solution. It would increase the patch set size but would also give it
> a better test coverage, which I think would be a huge plus in such a
> complex patch set.

We would not disagree with those sentiments but see the previous
response about the conflict that exists.

>> To enable the kernel to be launched by GETSEC, a stub must be built
>> into the setup section of the compressed kernel to handle the specific
>> state that the dynamic launch process leaves the BSP in. This is
>> analogous to the EFI stub that is found in the same area. Also this stub
> 
> How is it analogous?

It is analogous as we used it as the pattern to follow for adding a
configurable entry point to the kernel. There was a discussion on this
when we published the RFC patches[4].

[4] https://lkml.org/lkml/2020/3/25/982

>> must measure everything that is going to be used as early as possible.
>> This stub code and subsequent code must also deal with the specific
>> state that the dynamic launch leaves the APs in.
> 
> What is "the specific state"?

The details are a bit more than I would prefer to explain here; I would
recommend reading section 2.3 and 2.4 of Intel's TXT Software
Development Guide[5] for all the details of the state and the prescribed
initialization sequence.

[5]
https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf

>> A quick note on terminology. The larger open source project itself is
>> called Trenchboot, which is hosted on Github (links below). The kernel
>> feature enabling the use of the x86 technology is referred to as "Secure
>> Launch" within the kernel code. As such the prefixes sl_/SL_ or
>> slaunch/SLAUNCH will be seen in the code. The stub code discussed above
>> is referred to as the SL stub.
> 
> Is this only for Trenchboot? I'm a bit lost. What is it anyway?

TrenchBoot is a meta-project that is working to bring a unified approach
to using DRTM across CPU implementations and open source projects.
Currently we are working to integrate a dynamic launch preamble (the
code that sets up for calling the dynamic launch CPU instruction) in
GRUB, building an open AMD Secure Loader that aligns with how Intel's
SINIT ACM hands off to an MLE, bring the first directly launchable
implementation to Linux (Secure Launch) with Xen being the next directly
launchable implementation, providing the u-root project a secure launch
initramfs init routine to demonstrate a policy driven measurement and
attestation framework th

Re: [patch V2 00/46] x86, PCI, XEN, genirq ...: Prepare for device MSI

2020-09-25 Thread Thomas Gleixner
On Fri, Sep 25 2020 at 17:49, Peter Zijlstra wrote:
> Here it looks like this:
>
> [1.830276] BUG: kernel NULL pointer dereference, address: 
> [1.838043] #PF: supervisor instruction fetch in kernel mode
> [1.844357] #PF: error_code(0x0010) - not-present page
> [1.850090] PGD 0 P4D 0
> [1.852915] Oops: 0010 [#1] SMP
> [1.856419] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 
> 5.9.0-rc6-00700-g0248dedd12d4 #419
> [1.865447] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS 
> SE5C600.86B.02.02.0002.122320131210 12/23/2013
> [1.876902] RIP: 0010:0x0
> [1.879824] Code: Bad RIP value.
> [1.883423] RSP: :82803da0 EFLAGS: 00010282
> [1.889251] RAX:  RBX: 8282b980 RCX: 
> 82803e40
> [1.897241] RDX: 0001 RSI: 82803e40 RDI: 
> 8282b980
> [1.905201] RBP: 88842f331000 R08:  R09: 
> 0001
> [1.913162] R10: 0001 R11:  R12: 
> 0048
> [1.921123] R13: 82803e40 R14: 8282b9c0 R15: 
> 
> [1.929085] FS:  () GS:88842f40() 
> knlGS:
> [1.938113] CS:  0010 DS:  ES:  CR0: 80050033
> [1.944524] CR2: ffd6 CR3: 02811001 CR4: 
> 000606b0
> [1.952484] Call Trace:
> [1.955214]  msi_domain_alloc+0x36/0x130

Hrm. That looks like a not initialized mandatory callback. Confused.

Is this on -next and if so, does this happen on tip:x86/irq as well?

Can you provide your config please?

Thanks,

tglx
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu