Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Harsh Jain

On 26-09-2017 00:16, Casey Leedom wrote:
> | From: Raj, Ashok 
> | Sent: Monday, September 25, 2017 8:54 AM
> |
> | Not sure how the page->offset would end up being greater than page-size?
Refer below
> |
> | If you have additional traces, please send them by.
> |
> | Is this a new driver? wondering how we didn't run into this?
>
>   According to Herbert Xu and one of our own engineers, it's actually legal
> for Scatter/Gather Lists to have this.  This isn't my area of expertise
> though so I'm just passing that on.
>
>   I've asked our team to produce a detailed trace of the exact
> Scatter/Gather Lists they're seeing and what ends up coming out of the DMA
> Mappings, etc.  They're in India, so I expect that they'll have this for you
> by tomorrow morning.
Below mentioned log was already there in 1st mail. Copied here for easy 
reference. Let me know if you need
additional traces.

1) IN esp_output() "__skb_to_sgvec()" convert skb frags to scatter gather list. 
At that moment sg->offset was 4094.
2) From esp_output control reaches to "crypto_authenc_encrypt()". Here in 
"scatterwalk_ffwd()" sg->offset become 4110.
3) Same sg list received by chelsio crypto driver(chcr). When chcr try to do 
DMA mapping it starts giving DMA errors.

Following error observed. first two prints are added for debugging in chcr. 
Kernel version used to reproduce is 4.9.28 on x86_64 with Page size 4K.

Sep 15 12:40:52 heptagon kernel: process_cipher req src 8803cb41f0a8
Sep 15 12:40:52 heptagon kernel: = issuehit offset:4110 === 
dma_addr f24b000e ==> DMA mapped address returned by dma_map_sg()

Sep 15 12:40:52 heptagon kernel: DMAR: DRHD: handling fault status reg 2
Sep 15 12:40:52 heptagon kernel: DMAR: [DMA Write] Request device [02:00.4] 
fault addr f24b [fault reason 05] PTE Write access is not set

>
> Casey

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Casey Leedom
| From: Raj, Ashok 
| Sent: Monday, September 25, 2017 12:03 PM
|   
| On Mon, Sep 25, 2017 at 01:11:04PM -0700, Dan Williams wrote:
| > On Mon, Sep 25, 2017 at 1:05 PM, Casey Leedom  wrote:
| > > | From: Dan Williams 
| > > | Sent: Monday, September 25, 2017 12:31 PM
| > > | ...
| > > | IIUC it looks like this has been broken ever since commit e1605495c716
| > > | "intel-iommu: Introduce domain_sg_mapping() to speed up
| > > | intel_map_sg()". I.e. it looks like the calculation for pte_val should
| > > | be:
| > > |
| > > | pteval = (page_to_phys(sg_page(sg)) + sg->offset) | prot;
| > >
| > > Hhmmm, shouldn't that be:
| > >
| > > pteval = (page_to_phys(sg_page(sg)) + (sg->offset>>PAGE_SHIFT)) | 
prot;
| >
| > Yes, I think you're right. We do want to mask off the page-unaligned
| > portion of sg->offset.
|
| Shoulnd't we normalize the entire sg_page(sg) + sg_offset.
|
| if when you only mask the page-unaligned portion i suspect you might be
| pointing to a different region?
|
| something like (sg_page(sg) + (sg->offset << VTD_PAGE_SHIFT))
|
| then add the unaligned part.. sg->offset>>VTD_PAGE_SHIFT
|
| Is this happening because you are using a 2M page? not sure what triggers
| this or causes the driver to get passed in larger than 4K offset, or
| running 32bit kernel?
|
| if its legal to get passed in such odd values, we should fix IOMMU driver to
| handle it properly, otherwise we should atleast fail those requests.

  (woof) This is all above me.  I've spent a chunk of time fruitlessly
trying to find documentation which says that scatterlist's are allowed to
have offset/length values which extend outside the sg_page(sg).  So someone
much more familiar with this stuff is going to need to say what's allowed.

  As I said, I've asked Harsh to provide us with a detailed trace of exactly
what he's seeing and what the Scatter/Gather Lists are getting translated
into.  That information may make it easier to understand if/how
__domain_mapping() is screwing up ...

Casey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Raj, Ashok
Hi

On Mon, Sep 25, 2017 at 01:11:04PM -0700, Dan Williams wrote:
> On Mon, Sep 25, 2017 at 1:05 PM, Casey Leedom  wrote:
> > | From: Dan Williams 
> > | Sent: Monday, September 25, 2017 12:31 PM
> > | ...
> > | IIUC it looks like this has been broken ever since commit e1605495c716
> > | "intel-iommu: Introduce domain_sg_mapping() to speed up
> > | intel_map_sg()". I.e. it looks like the calculation for pte_val should
> > | be:
> > |
> > | pteval = (page_to_phys(sg_page(sg)) + sg->offset) | prot;
> >
> > Hhmmm, shouldn't that be:
> >
> > pteval = (page_to_phys(sg_page(sg)) + (sg->offset>>PAGE_SHIFT)) | prot;
> 
> Yes, I think you're right. We do want to mask off the page-unaligned
> portion of sg->offset.

Shoulnd't we normalize the entire sg_page(sg) + sg_offset.

if when you only mask the page-unaligned portion i suspect you might be
pointing to a different region?

something like (sg_page(sg) + (sg->offset << VTD_PAGE_SHIFT)) 

then add the unaligned part.. sg->offset>>VTD_PAGE_SHIFT

Is this happening because you are using a 2M page? not sure what triggers
this or causes the driver to get passed in larger than 4K offset, or 
running 32bit kernel?

if its legal to get passed in such odd values, we should fix IOMMU driver to 
handle it properly, otherwise we should atleast fail those requests.

Cheers,
Ashok


___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Casey Leedom
| From: David Woodhouse 
| Sent: Monday, September 25, 2017 11:45 AM
|
| On Wed, 2017-09-20 at 16:01 +0800, Herbert Xu wrote:
| > Harsh Jain  wrote:
| > >
| > > While debugging DMA mapping error in chelsio crypto driver we
| > observed that when scatter/gather list received by driver has some
| > entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA
| > error.  Without IOMMU it works fine.
| >
| > This is not a bug.  The network stack can and will feed us such
| > SG lists.
|
| Hm? Under what circumstances is the offset permitted to be >
| PAGE_SIZE?

  As I noted earlier, this is an area of the kernel with which I'm not super
familiar.  Both Herbert Xu and our local VM Expert have said that having
Scatter/Gather Lists with Offsets greater than Page Size is not a bug ...

  I'm mostly trying to help out keeping focus on this because Harsh is in
India (presumable enjoying a good night's sleep while we look at this.
Hopefully we'll have a present of a bug fix for him when he wakes up ... :-)

Casey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Dan Williams
On Mon, Sep 25, 2017 at 1:05 PM, Casey Leedom  wrote:
> | From: Dan Williams 
> | Sent: Monday, September 25, 2017 12:31 PM
> | ...
> | IIUC it looks like this has been broken ever since commit e1605495c716
> | "intel-iommu: Introduce domain_sg_mapping() to speed up
> | intel_map_sg()". I.e. it looks like the calculation for pte_val should
> | be:
> |
> | pteval = (page_to_phys(sg_page(sg)) + sg->offset) | prot;
>
> Hhmmm, shouldn't that be:
>
> pteval = (page_to_phys(sg_page(sg)) + (sg->offset>>PAGE_SHIFT)) | prot;

Yes, I think you're right. We do want to mask off the page-unaligned
portion of sg->offset.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Casey Leedom
| From: Dan Williams 
| Sent: Monday, September 25, 2017 12:31 PM
| ...
| IIUC it looks like this has been broken ever since commit e1605495c716
| "intel-iommu: Introduce domain_sg_mapping() to speed up
| intel_map_sg()". I.e. it looks like the calculation for pte_val should
| be:
| 
|     pteval = (page_to_phys(sg_page(sg)) + sg->offset) | prot;

Hhmmm, shouldn't that be:

pteval = (page_to_phys(sg_page(sg)) + (sg->offset>>PAGE_SHIFT)) | prot;

???

Casey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Dan Williams
On Mon, Sep 25, 2017 at 10:46 AM, Casey Leedom  wrote:
> | From: Robin Murphy 
> | Sent: Wednesday, September 20, 2017 3:12 AM
> |
> | On 20/09/17 09:01, Herbert Xu wrote:
> | >
> | > Harsh Jain  wrote:
> | >>
> | >> While debugging DMA mapping error in chelsio crypto driver we
> | >> observed that when scatter/gather list received by driver has
> | >> some entry with page->offset > 4096 (PAGE_SIZE). It starts
> | >> giving DMA error.  Without IOMMU it works fine.
> | >
> | > This is not a bug.  The network stack can and will feed us such
> | > SG lists.
> | >
> | >> 2) It cannot be driver's responsibilty to update received sg
> | >> entries to adjust offset and page because we are not the only
> | >> one who directly uses received sg list.
> | >
> | > No the driver must deal with this.  Having said that, if we can
> | > improve our driver helper interface to make this easier then we
> | > should do that too.  What we certainly shouldn't do is to take a
> | > whack-a-mole approach like this patch does.
> |
> | AFAICS this is entirely on intel-iommu - from a brief look it appears
> | that all the IOVA calculations would handle the offset correctly, but
> | then __domain_mapping() blindly uses sg_page() for the physical address,
> | so if offset is larger than a page it would end up with the DMA mapping
> | covering the wrong part of the buffer.
> |
> | Does the diff below help?
> |
> | Robin.
> |
> | ->8-
> | diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
> | index b3914fce8254..2ed43d928135 100644
> | --- a/drivers/iommu/intel-iommu.c
> | +++ b/drivers/iommu/intel-iommu.c
> | @@ -2253,7 +2253,7 @@ static int __domain_mapping(struct dmar_domain 
> *domain, unsigned long iov_pfn,
> |  sg_res = aligned_nrpages(sg->offset, sg->length);
> |  sg->dma_address = ((dma_addr_t)iov_pfn << 
> VTD_PAGE_SHIFT) + sg->offset;
> |  sg->dma_length = sg->length;
> | -   pteval = page_to_phys(sg_page(sg)) | prot;
> | +   pteval = (sg_phys(sg) & PAGE_MASK) | prot;

This breaks on platforms where sizeof(phys_addr_t) > sizeof(unsigned
long). I.e. it's not always safe to assume that PAGE_MASK is the
correct width.

> |  phys_pfn = pteval >> VTD_PAGE_SHIFT;
> |  }
>
>   Adding some likely people to the Cc list so they can comment on this.
> Dan Williams submitted that specific piece of code in kernel.org:3e6110fd54
> ... but there are lots of similar bits in that function.  Hopefully one of
> the Intel I/O MMU Gurus will have a better idea of what may be going wrong
> here.  In the mean time I've asked our team to gather far more detailed
> debug traces showing the exact Scatter/Gather Lists we're getting, what they
> get translated to in the DMA Mappings, and what DMA Addresses were seeing in
> error.

IIUC it looks like this has been broken ever since commit e1605495c716
"intel-iommu: Introduce domain_sg_mapping() to speed up
intel_map_sg()". I.e. it looks like the calculation for pte_val should
be:

pteval = (page_to_phys(sg_page(sg)) + sg->offset) | prot;
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Casey Leedom
| From: Raj, Ashok 
| Sent: Monday, September 25, 2017 8:54 AM
|
| Not sure how the page->offset would end up being greater than page-size?
|
| If you have additional traces, please send them by.
|
| Is this a new driver? wondering how we didn't run into this?

  According to Herbert Xu and one of our own engineers, it's actually legal
for Scatter/Gather Lists to have this.  This isn't my area of expertise
though so I'm just passing that on.

  I've asked our team to produce a detailed trace of the exact
Scatter/Gather Lists they're seeing and what ends up coming out of the DMA
Mappings, etc.  They're in India, so I expect that they'll have this for you
by tomorrow morning.

Casey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread David Woodhouse
On Wed, 2017-09-20 at 16:01 +0800, Herbert Xu wrote:
> Harsh Jain  wrote:
> > 
> > While debugging DMA mapping error in chelsio crypto driver we
> observed that when scatter/gather list received by driver has some
> entry with page->offset > 4096 (PAGE_SIZE). It starts giving DMA
> error.  Without IOMMU it works fine.
> 
> This is not a bug.  The network stack can and will feed us such
> SG lists.

Hm? Under what circumstances is the offset permitted to be >
PAGE_SIZE? 

smime.p7s
Description: S/MIME cryptographic signature
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Raj, Ashok
Hi Casey

Sorry, somehow didn't see this one come by.


On Mon, Sep 25, 2017 at 05:46:40PM +, Casey Leedom wrote:
> | From: Robin Murphy 
> | Sent: Wednesday, September 20, 2017 3:12 AM
> |
> | On 20/09/17 09:01, Herbert Xu wrote:
> | >
> | > Harsh Jain  wrote:
> | >>
> | >> While debugging DMA mapping error in chelsio crypto driver we
> | >> observed that when scatter/gather list received by driver has
> | >> some entry with page->offset > 4096 (PAGE_SIZE). It starts
> | >> giving DMA error.  Without IOMMU it works fine.

Not sure how the page->offset would end up being greater than page-size?

If you have additional traces, please send them by. 

Is this a new driver? wondering how we didn't run into this?


Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: DMA error when sg->offset value is greater than PAGE_SIZE in Intel IOMMU

2017-09-25 Thread Casey Leedom
| From: Robin Murphy 
| Sent: Wednesday, September 20, 2017 3:12 AM
|
| On 20/09/17 09:01, Herbert Xu wrote:
| >
| > Harsh Jain  wrote:
| >>
| >> While debugging DMA mapping error in chelsio crypto driver we
| >> observed that when scatter/gather list received by driver has
| >> some entry with page->offset > 4096 (PAGE_SIZE). It starts
| >> giving DMA error.  Without IOMMU it works fine.
| >
| > This is not a bug.  The network stack can and will feed us such
| > SG lists.
| >
| >> 2) It cannot be driver's responsibilty to update received sg
| >> entries to adjust offset and page because we are not the only
| >> one who directly uses received sg list.
| >
| > No the driver must deal with this.  Having said that, if we can
| > improve our driver helper interface to make this easier then we
| > should do that too.  What we certainly shouldn't do is to take a
| > whack-a-mole approach like this patch does.
|
| AFAICS this is entirely on intel-iommu - from a brief look it appears
| that all the IOVA calculations would handle the offset correctly, but
| then __domain_mapping() blindly uses sg_page() for the physical address,
| so if offset is larger than a page it would end up with the DMA mapping
| covering the wrong part of the buffer.
|
| Does the diff below help?
|
| Robin.
|
| ->8-
| diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
| index b3914fce8254..2ed43d928135 100644
| --- a/drivers/iommu/intel-iommu.c
| +++ b/drivers/iommu/intel-iommu.c
| @@ -2253,7 +2253,7 @@ static int __domain_mapping(struct dmar_domain *domain, 
unsigned long iov_pfn,
|  sg_res = aligned_nrpages(sg->offset, sg->length);
|  sg->dma_address = ((dma_addr_t)iov_pfn << 
VTD_PAGE_SHIFT) + sg->offset;
|  sg->dma_length = sg->length;
| -   pteval = page_to_phys(sg_page(sg)) | prot;
| +   pteval = (sg_phys(sg) & PAGE_MASK) | prot;
|  phys_pfn = pteval >> VTD_PAGE_SHIFT;
|  }

  Adding some likely people to the Cc list so they can comment on this.
Dan Williams submitted that specific piece of code in kernel.org:3e6110fd54
... but there are lots of similar bits in that function.  Hopefully one of
the Intel I/O MMU Gurus will have a better idea of what may be going wrong
here.  In the mean time I've asked our team to gather far more detailed
debug traces showing the exact Scatter/Gather Lists we're getting, what they
get translated to in the DMA Mappings, and what DMA Addresses were seeing in
error.

Casey
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: bind pasid table API

2017-09-25 Thread Raj, Ashok
On Mon, Sep 25, 2017 at 12:45:00PM +0100, Jean-Philippe Brucker wrote:
[snip]

> This format tells how the guest organizes its PASID tables. Depending on
> 'format', the PASID table can be:
> * A flat array of descriptors
> * One array of 1st-level descriptors pointing to a 2nd level of
> descriptors. When translating an access, the SMMU splits the PASID at bit
> 6, uses the upper half to get the 1st level descriptor and the lower half
> to get the 2nd level one (4k leaf table).
> * The same format, but instead split at PASID bit 10 (64k leaf table)

When in multilevel PASID table, do you shadow it in the host? or every 
level is all hosted in guest space? in Intel vt-d, when context is in 
a nested mode, PASID table is a gPA. Its a single level table.


Cheers,
Ashok
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] pci: Add dummy for pci_acs_enabled() if CONFIG_PCI=n to fix iommmu build

2017-09-25 Thread Bjorn Helgaas
On Fri, Sep 22, 2017 at 08:12:46PM +0200, Geert Uytterhoeven wrote:
> Hi Björn,
> 
> On Fri, Sep 22, 2017 at 5:56 PM, Bjorn Helgaas  wrote:
> > On Mon, Sep 11, 2017 at 02:29:15PM +0200, Geert Uytterhoeven wrote:
> >> If CONFIG_PCI=n, and gcc (e.g. 4.1.2) decides not to inline
> >> get_pci_function_alias_group(), the build fails with:
> >>
> >> drivers/iommu/iommu.o: In function `get_pci_function_alias_group':
> >> iommu.c:(.text+0xfdc): undefined reference to `pci_acs_enabled'
> >>
> >> Due to the various dummies for PCI calls in the CONFIG_PCI=n case,
> >> pci_acs_enabled() isn't actually ever called, but not all versions of
> >> gcc are smart enough to realize that.
> >>
> >> While explicitly marking get_pci_function_alias_group() inline would fix
> >> the build, this would inflate the code for the CONFIG_PCI=y case, as
> >> get_pci_function_alias_group() is a not-so-small function called from
> >> two places.
> >>
> >> Hence fix the issue by introducing a dummy for pci_acs_enabled()
> >> instead.
> >>
> >> Signed-off-by: Geert Uytterhoeven 
> >
> > Acked-by: Bjorn Helgaas 
> >
> > Joerg, if you pick this up, would you mind capitalizing the subject
> > line to match the PCI convention, e.g.,
> >
> >   PCI: Add dummy pci_acs_enabled() for CONFIG_PCI=n build
> >
> > If it's too late for you to pick this up this week, I can include it
> > next week.  I assume this is not related to a specific change, i.e.,
> > it's not a regression?  Should it be marked for stable?
> 
> It was introduced by commit 0ae349a0f33fb040 ("iommu/qcom: Add qcom_iommu"),
> which enabled IOMMU support for compile-testing in e.g. allmodconfig on
> platforms that don't have IOMMUs.

I put this on my for-linus branch for v4.14 with Alex's reviewed-by.

Since 0ae349a0f33f was merged in the v4.14 merge window (it first
appeared in v4.14-rc1), I don't think we need to mark it for stable.

Bjorn
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE

2017-09-25 Thread Will Deacon
[+PeterZ because he enjoys things like this]

On Mon, Sep 25, 2017 at 05:37:46PM +0200, Geert Uytterhoeven wrote:
> On Mon, Sep 25, 2017 at 5:21 PM, Will Deacon  wrote:
> > On Mon, Sep 25, 2017 at 09:16:22AM +0200, Geert Uytterhoeven wrote:
> >> On Wed, Jul 12, 2017 at 7:16 PM, Linux Kernel Mailing List
> >>  wrote:
> >> > Web:
> >> > https://git.kernel.org/torvalds/c/c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
> >> > Commit: c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
> >> > Parent: 58188afeb727e0f73706f1460707bd3ba6ccc221
> >> > Refname:refs/heads/master
> >> > Author: Will Deacon 
> >> > AuthorDate: Fri Jun 23 11:45:57 2017 +0100
> >> > Committer:  Will Deacon 
> >> > CommitDate: Fri Jun 23 17:58:02 2017 +0100
> >> >
> >> > iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using 
> >> > COMPILE_TEST with LPAE
> >> >
> >> > The LPAE/ARMv8 page table format relies on the ability to read and 
> >> > write
> >> > 64-bit page table entries in an atomic fashion. With the move to a 
> >> > lockless
> >> > implementation, we also need support for cmpxchg64 to resolve races 
> >> > when
> >> > installing table entries concurrently.
> >> >
> >> > Unfortunately, not all architectures support cmpxchg64, so the code 
> >> > can
> >> > fail to compiler when building for these architectures using 
> >> > COMPILE_TEST.
> >> > Rather than disable COMPILE_TEST altogether, instead check that
> >> > GENERIC_ATOMIC64 is not selected, which is a reasonable indication 
> >> > that
> >> > the architecture has support for 64-bit cmpxchg.
> >> >
> >> > Reported-by: kbuild test robot 
> >> > Signed-off-by: Will Deacon 
> >> > ---
> >> >  drivers/iommu/Kconfig | 2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> >> > index 6ee3a25ae731..c88cfa7522b2 100644
> >> > --- a/drivers/iommu/Kconfig
> >> > +++ b/drivers/iommu/Kconfig
> >> > @@ -23,7 +23,7 @@ config IOMMU_IO_PGTABLE
> >> >  config IOMMU_IO_PGTABLE_LPAE
> >> > bool "ARMv7/v8 Long Descriptor Format"
> >> > select IOMMU_IO_PGTABLE
> >> > -   depends on HAS_DMA && (ARM || ARM64 || COMPILE_TEST)
> >> > +   depends on HAS_DMA && (ARM || ARM64 || (COMPILE_TEST && 
> >> > !GENERIC_ATOMIC64))
> >> > help
> >> >   Enable support for the ARM long descriptor pagetable format.
> >> >   This allocator supports 4K/2M/1G, 16K/32M and 64K/512M page
> >>
> >> I can't find where this patch was submitted and discussed, so I'm replying
> >> to this email. On which architectures did it fail to compile?
> >
> > It was in response to a report from the kbuild test robot on m32r, but
> > looking back I now see that it didn't go to the lists for some reason. I've
> > included the report at the end of this email so you can have a look.
> 
> Thanks!
> 
> >> cmpxchg64() is defined by include/asm-generic/cmpxchg.h, so I fail to
> >> see what's the relation with GENERIC_ATOMIC64, which is related to
> >> lib/atomic64.c instead.
> >
> > Yeah, it's a bit of a hack, but we're basically relying on architectures
> > that don't select GENERIC_ATOMIC64 providing their own cmpxchg64
> > implementation, which seems to be the case.
> >
> >> E.g. on m68k, which uses GENERIC_ATOMIC64, it compile-tested fine before.
> >
> > FWIW, the lock-based atomics wouldn't be sufficient at runtime, but I
> > appreciate that we're only talking about COMPILE_TEST here.
> 
> Correct.
> 
> >> Perhaps there's another (SMP vs UP?) dependency, as
> >> include/asm-generic/cmpxchg.h cannot be used on SMP?
> >> Should it be COMPILE_TEST && (!GENERIC_ATOMIC64 || !SMP)?
> >
> > I don't see how that helps. Are you seeing build failures on a non-SMP
> > arch?
> 
> I'm not seeing build failures.
> I see reduced build coverage on m68k due to this change.
> 
> So m32r fails to provide a definition for cmpxchg64(), which has nothing to
> do with GENERIC_ATOMIC64.
> Probably m32r just needs the same treatment as in commit feb20ec2bb859d1d
> ("m68k: Add missing cmpxchg64() if CONFIG_RMW_INSNS=y")

I have no idea, tbh. m32r claims to be SMP in some cases but it's orphaned
so I don't know who we could ask about this. If we ended up with a
lock-based 64-bit cmpxchg but a native 8/16/32-bit cmpxchg then we're
asking for trouble because they don't interwork.

> Worse, people are now adding more depends on !GENERIC_ATOMIC64 for
> various IOMMU drivers, to fix up "selects IOMMU_IO_PGTABLE_LPAE which
> has unmet direct dependencies" warnings :-(

Well I'd certainly be willing to revisit the dependency if all architectures
provided cmpxchg64, but that's not something that I'm comfortable
implementing without the ability to test it since it could actually cause
runtime breakage just for the sake of getting a driver building that will
never be used on those architectures.

I didn't want to drop the || COMPILE_TEST completely because we get useful
coverage from it on

Re: iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE

2017-09-25 Thread Geert Uytterhoeven
Hi Will,

On Mon, Sep 25, 2017 at 5:21 PM, Will Deacon  wrote:
> On Mon, Sep 25, 2017 at 09:16:22AM +0200, Geert Uytterhoeven wrote:
>> On Wed, Jul 12, 2017 at 7:16 PM, Linux Kernel Mailing List
>>  wrote:
>> > Web:
>> > https://git.kernel.org/torvalds/c/c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
>> > Commit: c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
>> > Parent: 58188afeb727e0f73706f1460707bd3ba6ccc221
>> > Refname:refs/heads/master
>> > Author: Will Deacon 
>> > AuthorDate: Fri Jun 23 11:45:57 2017 +0100
>> > Committer:  Will Deacon 
>> > CommitDate: Fri Jun 23 17:58:02 2017 +0100
>> >
>> > iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST 
>> > with LPAE
>> >
>> > The LPAE/ARMv8 page table format relies on the ability to read and 
>> > write
>> > 64-bit page table entries in an atomic fashion. With the move to a 
>> > lockless
>> > implementation, we also need support for cmpxchg64 to resolve races 
>> > when
>> > installing table entries concurrently.
>> >
>> > Unfortunately, not all architectures support cmpxchg64, so the code can
>> > fail to compiler when building for these architectures using 
>> > COMPILE_TEST.
>> > Rather than disable COMPILE_TEST altogether, instead check that
>> > GENERIC_ATOMIC64 is not selected, which is a reasonable indication that
>> > the architecture has support for 64-bit cmpxchg.
>> >
>> > Reported-by: kbuild test robot 
>> > Signed-off-by: Will Deacon 
>> > ---
>> >  drivers/iommu/Kconfig | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
>> > index 6ee3a25ae731..c88cfa7522b2 100644
>> > --- a/drivers/iommu/Kconfig
>> > +++ b/drivers/iommu/Kconfig
>> > @@ -23,7 +23,7 @@ config IOMMU_IO_PGTABLE
>> >  config IOMMU_IO_PGTABLE_LPAE
>> > bool "ARMv7/v8 Long Descriptor Format"
>> > select IOMMU_IO_PGTABLE
>> > -   depends on HAS_DMA && (ARM || ARM64 || COMPILE_TEST)
>> > +   depends on HAS_DMA && (ARM || ARM64 || (COMPILE_TEST && 
>> > !GENERIC_ATOMIC64))
>> > help
>> >   Enable support for the ARM long descriptor pagetable format.
>> >   This allocator supports 4K/2M/1G, 16K/32M and 64K/512M page
>>
>> I can't find where this patch was submitted and discussed, so I'm replying
>> to this email. On which architectures did it fail to compile?
>
> It was in response to a report from the kbuild test robot on m32r, but
> looking back I now see that it didn't go to the lists for some reason. I've
> included the report at the end of this email so you can have a look.

Thanks!

>> cmpxchg64() is defined by include/asm-generic/cmpxchg.h, so I fail to
>> see what's the relation with GENERIC_ATOMIC64, which is related to
>> lib/atomic64.c instead.
>
> Yeah, it's a bit of a hack, but we're basically relying on architectures
> that don't select GENERIC_ATOMIC64 providing their own cmpxchg64
> implementation, which seems to be the case.
>
>> E.g. on m68k, which uses GENERIC_ATOMIC64, it compile-tested fine before.
>
> FWIW, the lock-based atomics wouldn't be sufficient at runtime, but I
> appreciate that we're only talking about COMPILE_TEST here.

Correct.

>> Perhaps there's another (SMP vs UP?) dependency, as
>> include/asm-generic/cmpxchg.h cannot be used on SMP?
>> Should it be COMPILE_TEST && (!GENERIC_ATOMIC64 || !SMP)?
>
> I don't see how that helps. Are you seeing build failures on a non-SMP
> arch?

I'm not seeing build failures.
I see reduced build coverage on m68k due to this change.

So m32r fails to provide a definition for cmpxchg64(), which has nothing to
do with GENERIC_ATOMIC64.
Probably m32r just needs the same treatment as in commit feb20ec2bb859d1d
("m68k: Add missing cmpxchg64() if CONFIG_RMW_INSNS=y")

Worse, people are now adding more depends on !GENERIC_ATOMIC64 for
various IOMMU drivers, to fix up "selects IOMMU_IO_PGTABLE_LPAE which
has unmet direct dependencies" warnings :-(

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE

2017-09-25 Thread Will Deacon
Hi Geert,

On Mon, Sep 25, 2017 at 09:16:22AM +0200, Geert Uytterhoeven wrote:
> On Wed, Jul 12, 2017 at 7:16 PM, Linux Kernel Mailing List
>  wrote:
> > Web:
> > https://git.kernel.org/torvalds/c/c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
> > Commit: c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
> > Parent: 58188afeb727e0f73706f1460707bd3ba6ccc221
> > Refname:refs/heads/master
> > Author: Will Deacon 
> > AuthorDate: Fri Jun 23 11:45:57 2017 +0100
> > Committer:  Will Deacon 
> > CommitDate: Fri Jun 23 17:58:02 2017 +0100
> >
> > iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST 
> > with LPAE
> >
> > The LPAE/ARMv8 page table format relies on the ability to read and write
> > 64-bit page table entries in an atomic fashion. With the move to a 
> > lockless
> > implementation, we also need support for cmpxchg64 to resolve races when
> > installing table entries concurrently.
> >
> > Unfortunately, not all architectures support cmpxchg64, so the code can
> > fail to compiler when building for these architectures using 
> > COMPILE_TEST.
> > Rather than disable COMPILE_TEST altogether, instead check that
> > GENERIC_ATOMIC64 is not selected, which is a reasonable indication that
> > the architecture has support for 64-bit cmpxchg.
> >
> > Reported-by: kbuild test robot 
> > Signed-off-by: Will Deacon 
> > ---
> >  drivers/iommu/Kconfig | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> > index 6ee3a25ae731..c88cfa7522b2 100644
> > --- a/drivers/iommu/Kconfig
> > +++ b/drivers/iommu/Kconfig
> > @@ -23,7 +23,7 @@ config IOMMU_IO_PGTABLE
> >  config IOMMU_IO_PGTABLE_LPAE
> > bool "ARMv7/v8 Long Descriptor Format"
> > select IOMMU_IO_PGTABLE
> > -   depends on HAS_DMA && (ARM || ARM64 || COMPILE_TEST)
> > +   depends on HAS_DMA && (ARM || ARM64 || (COMPILE_TEST && 
> > !GENERIC_ATOMIC64))
> > help
> >   Enable support for the ARM long descriptor pagetable format.
> >   This allocator supports 4K/2M/1G, 16K/32M and 64K/512M page
> 
> I can't find where this patch was submitted and discussed, so I'm replying
> to this email. On which architectures did it fail to compile?

It was in response to a report from the kbuild test robot on m32r, but
looking back I now see that it didn't go to the lists for some reason. I've
included the report at the end of this email so you can have a look.

> cmpxchg64() is defined by include/asm-generic/cmpxchg.h, so I fail to
> see what's the relation with GENERIC_ATOMIC64, which is related to
> lib/atomic64.c instead.

Yeah, it's a bit of a hack, but we're basically relying on architectures
that don't select GENERIC_ATOMIC64 providing their own cmpxchg64
implementation, which seems to be the case.

> E.g. on m68k, which uses GENERIC_ATOMIC64, it compile-tested fine before.

FWIW, the lock-based atomics wouldn't be sufficient at runtime, but I
appreciate that we're only talking about COMPILE_TEST here.

> Perhaps there's another (SMP vs UP?) dependency, as
> include/asm-generic/cmpxchg.h cannot be used on SMP?
> Should it be COMPILE_TEST && (!GENERIC_ATOMIC64 || !SMP)?

I don't see how that helps. Are you seeing build failures on a non-SMP
arch?

Will

--->8

>From fengguang...@intel.com Fri Jun 23 07:06:26 2017
Date: Fri, 23 Jun 2017 14:05:19 +0800
From: kbuild test robot 
To: Robin Murphy 
CC: kbuild-...@01.org, Will Deacon 
Subject: [arm-perf:for-joerg/arm-smmu/updates 10/13]
 include/linux/atomic.h:475:29: error: implicit declaration of function
 'cmpxchg64'
User-Agent: Mutt/1.5.23 (2014-03-12)
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=utf-8
Status: RO
X-Status: A

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git 
for-joerg/arm-smmu/updates
head:   5723c883174667dab5c42cbefa8d4d0e9acc16bc
commit: a22753392d403aba096a3b10b2bc6ed721a2eb8f [10/13] iommu/io-pgtable-arm: 
Support lockless operation
config: m32r-allyesconfig (attached as .config)
compiler: m32r-linux-gcc (GCC) 6.2.0
reproduce:
wget 
https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
git checkout a22753392d403aba096a3b10b2bc6ed721a2eb8f
# save the attached .config to linux build tree
make.cross ARCH=m32r 

All error/warnings (new ones prefixed by >>):

   In file included from drivers/iommu/io-pgtable-arm.c:23:0:
   drivers/iommu/io-pgtable-arm.c: In function 'arm_lpae_install_table':
>> include/linux/atomic.h:475:29: error: implicit declaration of function 
>> 'cmpxchg64' [-Werror=implicit-function-declaration]
#define  cmpxchg64_relaxed  cmpxchg64
^
>> drivers/iommu/io-pgtable-arm.c:337:8: note: in expansion of macro 
>> 'cmpxchg64_relaxed'
 old = cmpxchg64_relaxed(ptep, curr, new);
   ^~~

Re: [PATCH] iommu/arm-smmu-v3: Correct COHACC override message

2017-09-25 Thread Lorenzo Pieralisi
On Mon, Sep 25, 2017 at 02:55:40PM +0100, Robin Murphy wrote:
> Slightly confusingly, when reporting a mismatch of the ID register
> value, we still refer to the IORT COHACC override flag as the
> "dma-coherent property" if we booted with ACPI. Update the message
> to be firmware-agnostic in line with SMMUv2.
> 
> Reported-by: Will Deacon 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/arm-smmu-v3.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Apologies - missed while reshuffling the probe path to enable
ACPI probing:

Acked-by: Lorenzo Pieralisi 

> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 47f52b1ab838..4c63b3cf4eba 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2523,7 +2523,7 @@ static int arm_smmu_device_hw_probe(struct 
> arm_smmu_device *smmu)
>* register, but warn on mismatch.
>*/
>   if (!!(reg & IDR0_COHACC) != coherent)
> - dev_warn(smmu->dev, "IDR0.COHACC overridden by dma-coherent 
> property (%s)\n",
> + dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration 
> (%s)\n",
>coherent ? "true" : "false");
>  
>   switch (reg & IDR0_STALL_MODEL_MASK << IDR0_STALL_MODEL_SHIFT) {
> -- 
> 2.13.4.dirty
> 
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/arm-smmu-v3: Correct COHACC override message

2017-09-25 Thread Robin Murphy
Slightly confusingly, when reporting a mismatch of the ID register
value, we still refer to the IORT COHACC override flag as the
"dma-coherent property" if we booted with ACPI. Update the message
to be firmware-agnostic in line with SMMUv2.

Reported-by: Will Deacon 
Signed-off-by: Robin Murphy 
---
 drivers/iommu/arm-smmu-v3.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 47f52b1ab838..4c63b3cf4eba 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -2523,7 +2523,7 @@ static int arm_smmu_device_hw_probe(struct 
arm_smmu_device *smmu)
 * register, but warn on mismatch.
 */
if (!!(reg & IDR0_COHACC) != coherent)
-   dev_warn(smmu->dev, "IDR0.COHACC overridden by dma-coherent 
property (%s)\n",
+   dev_warn(smmu->dev, "IDR0.COHACC overridden by FW configuration 
(%s)\n",
 coherent ? "true" : "false");
 
switch (reg & IDR0_STALL_MODEL_MASK << IDR0_STALL_MODEL_SHIFT) {
-- 
2.13.4.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/mediatek: Limit the physical address in 32bit for v7s

2017-09-25 Thread Will Deacon
On Mon, Sep 25, 2017 at 06:15:26PM +0800, Yong Wu wrote:
> The ARM short descriptor has already limited the physical address
> to 32bit after the commit <76557391433c> ("iommu/io-pgtable: Sanitise
> map/unmap addresses"). But in MediaTek 4GB mode, the physical address
> is from 0x1__ to 0x1__. this will cause:
> 
> WARNING: CPU: 4 PID: 3900 at
> xxx/drivers/iommu/io-pgtable-arm-v7s.c:482 arm_v7s_map+0x40/0xf8
> Modules linked in:
> 
> CPU: 4 PID: 3900 Comm: weston Tainted: G S  W   4.9.44 #1
> Hardware name: MediaTek MT2712m1v1 board (DT)
> task: ffc0eaa5b280 task.stack: ffc0e9858000
> PC is at arm_v7s_map+0x40/0xf8
> LR is at mtk_iommu_map+0x64/0x90
> pc : [] lr : [] pstate: 01c5
> sp : ffc0e985b920
> x29: ffc0e985b920 x28: 000127d0
> x27: 0010 x26: ff8008f9e000
> x25: 0003 x24: 0010
> x23: 000127d0 x22: ff80
> x21: ffc0f7ec8ce0 x20: 0003
> x19: 0003 x18: 0002
> x17: 007f7e5d72c0 x16: ff80082b0f08
> x15: 0001 x14: 003f
> x13:  x12: 0028
> x11: 0088 x10: 
> x9 : ff80092fa000 x8 : ffc0e9858000
> x7 : ff80085b29d8 x6 : 
> x5 : ff80085b09a8 x4 : 0003
> x3 : 0010 x2 : 000127d0
> x1 : ff80 x0 : 0001
> ...
> Call trace:
> [] arm_v7s_map+0x40/0xf8
> [] mtk_iommu_map+0x64/0x90
> [] iommu_map+0x100/0x3a0
> [] default_iommu_map_sg+0x104/0x168
> [] iommu_dma_alloc+0x238/0x3f8
> [] __iommu_alloc_attrs+0xa8/0x260
> [] mtk_drm_gem_create+0xac/0x180
> [] mtk_drm_gem_dumb_create+0x54/0xc8
> [] drm_mode_create_dumb_ioctl+0xa4/0xd8
> [] drm_ioctl+0x1c0/0x490
> 
> In order to satify this, Limit the physical address to 32bit.
> 
> Signed-off-by: Yong Wu 

Based on my understanding of how this works, and from a conversation with
Robin:

Acked-by: Will Deacon 

I guess Joerg will pick this one up directly. I'll queue your other fix in
case I get any more SMMU fixes.

Will

> ---
> Base on v4.14-rc1.
> ---
>  drivers/iommu/mtk_iommu.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
> index bd515be..16d33ac 100644
> --- a/drivers/iommu/mtk_iommu.c
> +++ b/drivers/iommu/mtk_iommu.c
> @@ -371,7 +371,8 @@ static int mtk_iommu_map(struct iommu_domain *domain, 
> unsigned long iova,
>   int ret;
>  
>   spin_lock_irqsave(&dom->pgtlock, flags);
> - ret = dom->iop->map(dom->iop, iova, paddr, size, prot);
> + ret = dom->iop->map(dom->iop, iova, paddr & DMA_BIT_MASK(32),
> + size, prot);
>   spin_unlock_irqrestore(&dom->pgtlock, flags);
>  
>   return ret;
> -- 
> 1.8.1.1.dirty
> 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [virtio-dev] [RFC] virtio-iommu version 0.4

2017-09-25 Thread Jean-Philippe Brucker
On 21/09/17 07:41, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Wednesday, September 6, 2017 7:49 PM
>>
>>
>>> 2.6.8.2.1
>>> Multiple overlapping RESV_MEM properties are merged together. Device
>>> requirement? if same types I assume?
>>
>> Combination rules apply to device and driver. When the driver puts
>> multiple endpoints in the same domain, combination rules apply. They will
>> become important once the guest attempts to do things like combining
>> endpoints with different PASID capabilities in the same domain. I'm
>> considering these endpoints to be behind different physical IOMMUs.
>>
>> For reserved regions, we specify what the driver should do and what the
>> device should enforce with regard to mapping IOVAs of overlapping regions.
>> For example, if a device has some virtual address RESV_T_MSI and an other
>> device has the same virtual address RESV_T_IDENTITY, what should the
>> driver do?
>>
>> I think it should apply the RESV_T_IDENTITY. RESV_T_MSI is just a special
>> case of RESV_T_RESERVED, it's a hint for the IRQ subsystem and doesn't
>> have a meaning within a domain. From DMA mappings point of view, it is
>> effectively the same as RESV_T_RESERVED. When merging
>> RESV_T_RESERVED and
>> RESV_T_IDENTITY, we should make it RESV_T_IDENTITY. Because it is
>> required
>> for one endpoint to work (the one with IDENTITY) and is not accessed by
>> the other (the driver must not allocate this IOVA range for DMA).
>>
>> More generally, I'd like to add the following combination table to the
>> spec, that shows the resulting reserved region type after merging regions
>> from two endpoints. N: no reserved region, R: RESV_T_RESERVED, M:
>> RESV_T_MSI, I: RESV_T_IDENTITY. It is symmetric so I didn't fill the
>> bottom half.
>>
>>   | N | R | M | I
>> --
>> N | N | R | M | ?
>> --
>> R |   | R | R | I
>> --
>> M |   |   | M | I
>> --
>> I |   |   |   | I
>>
>> The 'I' column is problematic. If one endpoint has an IDENTITY region and
>> the other doesn't have anything at that virtual address, then making that
>> region an identity mapping will result in the second endpoint being able
>> to access firmware memory. If using nested translation, stage-2 can help
>> us here. But in general we should allow the device to reject an attach
>> that would result in a N+I combination if the host considers it too dodgy.
>> I think the other combinations are safe enough.
>>
> 
> will overlap happen in real? Can we simplify the spec to have device
> not reporting overlapping regions?

I think it's likely to happen, to have two endpoints with different
overlapping types. Reporting is easy, but handling mismatched reserved
regions is hard. So what I can do is leave these combination rules as
implementation detail, and let the device reject any domain attach that
overlaps mismatched RESV regions.

After all, putting multiple devices in the same domain is a nice feature
but might not be widely used in reality. It's not worth spending too much
time describing all possible behaviors for it (but still worth thinking
about to avoid introducing security holes).

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [virtio-dev] RE: [RFC] virtio-iommu version 0.4

2017-09-25 Thread Jean-Philippe Brucker
On 21/09/17 07:27, Tian, Kevin wrote:
>> From: Jean-Philippe Brucker
>> Sent: Wednesday, September 6, 2017 7:55 PM
>>
>> Hi Kevin,
>>
>> On 28/08/17 08:39, Tian, Kevin wrote:
>>> Here comes some comments:
>>>
>>> 1.1 Motivation
>>>
>>> You describe I/O page faults handling as future work. Seems you
>> considered
>>> only recoverable fault (since "aka. PCI PRI" being used). What about other
>>> unrecoverable faults e.g. what to do if a virtual DMA request doesn't find
>>> a valid mapping? Even when there is no PRI support, we need some basic
>>> form of fault reporting mechanism to indicate such errors to guest.
>>
>> I am considering recoverable faults as the end goal, but reporting
>> unrecoverable faults should use the same queue, with slightly different
>> fields and no need for the driver to reply to the device.
> 
> what about adding a placeholder for now? Though same mechanism
> can be reused, it's an essential part to make virtio-iommu architecture
> complete even before talking support for recoverable faults. :-)

I'll see if I can come up with something simple for v0.5, but it seems
like a big chunk of work. I don't really know what to report to the guest
at the moment. I don't want to report vendor-specific details about the
fault, but it should still be useful content, to let the guest decide
whether they need to reset/kill the device or just print something

[...]
>> Yes I think adding MEM_T_IDENTITY will be necessary. I can see they are
>> used for both iGPU and USB controllers on my x86 machines. Do you know
>> more precisely what they are used for by the firmware?
> 
> VTd spec has a clear description:
> 
> 3.14 Handling Requests to Reserved System Memory
> Reserved system memory regions are typically allocated by BIOS at boot 
> time and reported to OS as reserved address ranges in the system memory 
> map. Requests-without-PASID to these reserved regions may either occur 
> as a result of operations performed by the system software driver (for 
> example in the case of DMA from unified memory access (UMA) graphics 
> controllers to graphics reserved memory), or may be initiated by non 
> system software (for example in case of DMA performed by a USB 
> controller under BIOS SMM control for legacy keyboard emulation). 
> For proper functioning of these legacy reserved memory usages, when 
> system software enables DMA remapping, the second-level translation 
> structures for the respective devices are expected to be set up to provide
> identity mapping for the specified reserved memory regions with read 
> and write permissions.
> 
> (one specific example for GPU happens in legacy VGA usage in early
> boot time before actual graphics driver is loaded)

Thanks for the explanation. So it is only legacy, and enabling nested mode
would be forbidden for a device with Reserved System Memory regions? I'm
wondering if virtio-iommu RESV regions will be extended to affect a
specific PASIDs (or all requests-with-PASID) in the future.
>> It's not necessary with the base virtio-iommu device though (v0.4),
>> because the device can create the identity mappings itself and report them
>> to the guest as MEM_T_BYPASS. However, when we start handing page
> 
> when you say "the device can create ...", I think you really meant
> "host iommu driver can create identity mapping for assigned device",
> correct?
> 
> Then yes, I think above works.

Yes it can be the host IOMMU driver, or simply Qemu sending VFIO ioctls to
create those identity mappings (they are reported in sysfs reserved_regions).

>> table
>> control over to the guest, the host won't be in control of IOVA->GPA
>> mappings and will need to gracefully ask the guest to do it.
>>
>> I'm not aware of any firmware description resembling Intel RMRR or AMD
>> IVMD on ARM platforms. I do think ARM platforms could need
>> MEM_T_IDENTITY
>> for requesting the guest to map MSI windows when page-table handover is
>> in
>> use (MSI addresses are translated by the physical SMMU, so a IOVA->GPA
>> mapping must be installed by the guest). But since a vSMMU would need a
>> solution as well, I think I'll try to implement something more generic.
> 
> curious do you need identity mapping full IOVA->GPA->HPA translation, 
> or just in GPA->HPA stage sufficient for above MSI scenario?

It has to be IOVA->GPA->HPA. So it'll be a bit complicated to implement
for us, I think we're going to need a VFIO ioctl to tell the host what
IOVA the guest allocated for its MSI, but it's not ideal.

Thanks,
Jean
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: bind pasid table API

2017-09-25 Thread Jean-Philippe Brucker
On 21/09/17 04:00, Liu, Yi L wrote:
> Hi Jean,
> 
>> -Original Message-
>> From: Jean-Philippe Brucker [mailto:jean-philippe.bruc...@arm.com]
>> Sent: Wednesday, September 20, 2017 8:10 PM
>> To: Pan, Jacob jun ; iommu@lists.linux-
>> foundation.org
>> Cc: Liu, Yi L ; Raj, Ashok ; David
>> Woodhouse ; Joerg Roedel ; Tian,
>> Kevin ; Auger Eric 
>> Subject: Re: bind pasid table API
>>
>> Hi Jacob,
>>
>> [Adding Eric as he might need pasid_table_info for vSVM at some point]
>>
>> On 19/09/17 04:45, Jacob Pan wrote:
>>> Hi Jean and All,
>>>
>>> This is a follow-up on the LPC discussion we had last week.
>>> (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
>>>
>>> My understanding is that the data structure below can satisfy the
>>> needs from Intel (pointer + size) and AMD (pointer only). But ARM
>>> pvIOMMU would need additional info to indicate the page table format.
>>> Could you share your idea of the right addition for ARM such that we
>>> can have a unified API?
>>>
>>> /**
>>>  * PASID table data used to bind guest PASID table to the host IOMMU.
>>> This will
>>>  * enable guest managed first level page tables.
>>>  * @ptr:PASID table pointer
>>>  * @size_order: number of bits supported in the guest PASID table, must
>> be less
>>>  *  or equal than the host table size.
>>>  */
>>> struct pasid_table_info {
>>> __u64   ptr;
>>> __u64   size_order;
>>> };
>>
>> For the PASID table, Arm SMMUv3 would need two additional fields:
>> * 'format' telling whether the table has 1 or 2 levels and their
>>   dimensions,
>> * 'default_substream' telling if PASID0 is reserved for non-pasid traffic.
>>
>> I think that's it for the moment, but it does require to leave space for a 
>> vendor-
>> specific structure at the end. It is one reason why I'd prefer having a 
>> 'model' field
>> in the pasid_table_info structure telling what fields the whole structure 
>> actually
>> contains.
>>
>> Another reason is if some IOMMU is able to support multiple PASID table
>> formats, it could advertise them all in sysfs and Qemu could tell which one 
>> it
>> chose in 'model'. I'm not sure we'll ever see that in practice.
> 
> Regards to your idea, I think the whole flow may be:
> * Qemu queries underlying IOMMU for the capability(through sysfs)
> * Combined with requirement from user(the guy who starts the VM), Qemu chose a
>suitable model or exit if HW capability is incompatible with user's 
> requirement
> * In the coming bind_pasid_table() calling, Qemu pass the chosen model info 
> to host
> * Host checks the "model" and use the correct model specific structure to 
> parse
>the model specific data. may be kind of structure intel_pasid_table_info,
>amd_pasid_table_info or arm_pasid_table_info_v#
> 
> Does this flow show what you want? This would be a "model" + "model specific 
> data"
> proposal. And my concern is the model specific field may look like to be kind 
> of "opaque".

It doesn't have to be opaque, it has to be defined in UAPI and an
intermediate layer (VFIO) can verify that they are present. The layer can
also inspect their value more deeply if it wants, since it is clear what
they are supposed to contain. But (in VFIO's case) it doesn't have too,
since the IOMMU driver will validate the values anyway.

> Besides the comments above, is there also possibility for us to put all the 
> possible info
> in a super-set just as what we plan to do for the tlb_invalidate() API?

It's possible, but might require too much effort and might become more
difficult to do once the number of IOMMU drivers supporting PASID
increases. (See my other reply to Jacob)

>> For binding page tables instead of PASID tables (e.g. virtio-iommu), the 
>> generic
>> data would be:
>>
>> struct pgtable_info {
>>  __u32   pasid;
>>  __u64   ptr;
>>  __u32   model;
>>  __u8model_data[];

This should be an union actually, since it's internal to the kernel. It's
VFIO that uses this interface.

>> };
> 
> Besides bind_pasid_table API, you would also want to propose an extra API
> which likely to be named as bind_pgtable() for this page table binding?

Yes, but we could use the same bind() function taking an enum {process,
pgtable, pasid_table} and the parameter structure.

> What would be the "model" field indicate? "vendor" or "vendor+version" ? You
> may want a length field to indicate the size of "model_data" field.

I think it should be vendor+version.

> And same with the bind_pasid_table API, would model_data look like an 
> "opaque"?
> 
>> Followed by a few arch-specific configuration values. For Arm we can 
>> summarize
>> this to three registers, defined in the Armv8 Architecture Reference Manual:
>>
>> struct arm_lpae_pgtable_info {
>>  __u64   tcr;/* Translation Control Register */
>>  __u64   mair;   /* Memory Attributes Indirection Register */
>>  __u64   asid;   /* Address Space ID */
>> };
> 
> Hmmm, just for curious. What is the difference between "p

Re: bind pasid table API

2017-09-25 Thread Jean-Philippe Brucker
On 20/09/17 23:35, Jacob Pan wrote:
> On Wed, 20 Sep 2017 13:09:47 +0100
> Jean-Philippe Brucker  wrote:
> 
>> Hi Jacob,
>>
>> [Adding Eric as he might need pasid_table_info for vSVM at some point]
>>
>> On 19/09/17 04:45, Jacob Pan wrote:
>>> Hi Jean and All,
>>>
>>> This is a follow-up on the LPC discussion we had last week.
>>> (https://linuxplumbersconf.org/2017/ocw/proposals/4748)
>>>
>>> My understanding is that the data structure below can satisfy the
>>> needs from Intel (pointer + size) and AMD (pointer only). But ARM
>>> pvIOMMU would need additional info to indicate the page table
>>> format. Could you share your idea of the right addition for ARM
>>> such that we can have a unified API?
>>>
>>> /**
>>>  * PASID table data used to bind guest PASID table to the host
>>> IOMMU. This will
>>>  * enable guest managed first level page tables.
>>>  * @ptr:PASID table pointer
>>>  * @size_order: number of bits supported in the guest PASID
>>> table, must be less
>>>  *  or equal than the host table size.
>>>  */
>>> struct pasid_table_info {
>>> __u64   ptr;
>>> __u64   size_order;
>>> };  
>>
>> For the PASID table, Arm SMMUv3 would need two additional fields:
>> * 'format' telling whether the table has 1 or 2 levels and their
>>   dimensions,
> just wondering if the format will be different than whatever is used on
> the host? i.e. could there be a mismatch? I am ok with this field just
> wondering if this can be resolved via the sysfs query interface if that
> is a global thing.

This format tells how the guest organizes its PASID tables. Depending on
'format', the PASID table can be:
* A flat array of descriptors
* One array of 1st-level descriptors pointing to a 2nd level of
descriptors. When translating an access, the SMMU splits the PASID at bit
6, uses the upper half to get the 1st level descriptor and the lower half
to get the 2nd level one (4k leaf table).
* The same format, but instead split at PASID bit 10 (64k leaf table)

So the guest has to tell the host what format of PASID tables it chose,
but the configuration might be too vendor-specific to go in the generic
part of pasid_table_info.

It seems that AMD IOMMUv2 can have either one, two or three levels of
PASID tables. See definition of GLX field in the 3.2.2.1 Device Table
Entry Format.

We could try to come up with a generic definition to cover all these
cases, but it might be more complicated than appending a few
vendor-specific fields. Only AMD can have three levels, only ARM can split
the PASID in two different places. And we can expect a future architecture
to come up with a new colorful format...

I'm mostly concerned about VFIO though, as the IOMMU API doesn't have the
same backward compatible restriction and is therefore easier to update.

>> * 'default_substream' telling if PASID0 is reserved for non-pasid
>> traffic.
>>
> could it be part of a feature flag? i.e.
> 
> struct pasid_table_info {
>   __u64   ptr;
>   __u64   size_order;
>   __u64   flags;
> #define PASID0_FOR_DMA_WITHOUT_PASID;
>   enum pasid_table_format format;
> };  
> 
>> I think that's it for the moment, but it does require to leave space
>> for a vendor-specific structure at the end. It is one reason why I'd
>> prefer having a 'model' field in the pasid_table_info structure
>> telling what fields the whole structure actually contains.
>>
> I think we have been there before. the downside is that model specific
> knowledge is required in the generic VFIO layer, if its content is to
> be inspected.

Yes, all of these vendor-specific fields would have to be UAPI. VFIO
doesn't necessarily have to deeply inspect each field, it can just check
that the structure contains the right fields (is the right size) and let
the IOMMU driver, that knows how to read these values, validate that they
make sense.

I think VFIO should be able to trust the IOMMU driver to perform the
sanity checks.

>> Another reason is if some IOMMU is able to support multiple PASID
>> table formats, it could advertise them all in sysfs and Qemu could
>> tell which one it chose in 'model'. I'm not sure we'll ever see that
>> in practice.
>>
> I would expect when query interface between QEMU and sysfs would ensure
> matching format prior to issue bind_pasid_table call.
>>
>>
>> For binding page tables instead of PASID tables (e.g. virtio-iommu),
>> the generic data would be:
>>
>> struct pgtable_info {
>>  __u32   pasid;
>>  __u64   ptr;
>>  __u32   model;
>>  __u8model_data[];
>> };
>>
>> Followed by a few arch-specific configuration values. For Arm we can
>> summarize this to three registers, defined in the Armv8 Architecture
>> Reference Manual:
>>
>> struct arm_lpae_pgtable_info {
>>  __u64   tcr;/* Translation Control Register */
>>  __u64   mair;   /* Memory Attributes Indirection
>> Register */ __u64asid;   /* Address Space ID */
>> };
>>
>> Some data packed in the TCR might be common to most architectures,
>> like

[PATCH] iommu/mediatek: Limit the physical address in 32bit for v7s

2017-09-25 Thread Yong Wu
The ARM short descriptor has already limited the physical address
to 32bit after the commit <76557391433c> ("iommu/io-pgtable: Sanitise
map/unmap addresses"). But in MediaTek 4GB mode, the physical address
is from 0x1__ to 0x1__. this will cause:

WARNING: CPU: 4 PID: 3900 at
xxx/drivers/iommu/io-pgtable-arm-v7s.c:482 arm_v7s_map+0x40/0xf8
Modules linked in:

CPU: 4 PID: 3900 Comm: weston Tainted: G S  W   4.9.44 #1
Hardware name: MediaTek MT2712m1v1 board (DT)
task: ffc0eaa5b280 task.stack: ffc0e9858000
PC is at arm_v7s_map+0x40/0xf8
LR is at mtk_iommu_map+0x64/0x90
pc : [] lr : [] pstate: 01c5
sp : ffc0e985b920
x29: ffc0e985b920 x28: 000127d0
x27: 0010 x26: ff8008f9e000
x25: 0003 x24: 0010
x23: 000127d0 x22: ff80
x21: ffc0f7ec8ce0 x20: 0003
x19: 0003 x18: 0002
x17: 007f7e5d72c0 x16: ff80082b0f08
x15: 0001 x14: 003f
x13:  x12: 0028
x11: 0088 x10: 
x9 : ff80092fa000 x8 : ffc0e9858000
x7 : ff80085b29d8 x6 : 
x5 : ff80085b09a8 x4 : 0003
x3 : 0010 x2 : 000127d0
x1 : ff80 x0 : 0001
...
Call trace:
[] arm_v7s_map+0x40/0xf8
[] mtk_iommu_map+0x64/0x90
[] iommu_map+0x100/0x3a0
[] default_iommu_map_sg+0x104/0x168
[] iommu_dma_alloc+0x238/0x3f8
[] __iommu_alloc_attrs+0xa8/0x260
[] mtk_drm_gem_create+0xac/0x180
[] mtk_drm_gem_dumb_create+0x54/0xc8
[] drm_mode_create_dumb_ioctl+0xa4/0xd8
[] drm_ioctl+0x1c0/0x490

In order to satify this, Limit the physical address to 32bit.

Signed-off-by: Yong Wu 
---
Base on v4.14-rc1.
---
 drivers/iommu/mtk_iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index bd515be..16d33ac 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -371,7 +371,8 @@ static int mtk_iommu_map(struct iommu_domain *domain, 
unsigned long iova,
int ret;
 
spin_lock_irqsave(&dom->pgtlock, flags);
-   ret = dom->iop->map(dom->iop, iova, paddr, size, prot);
+   ret = dom->iop->map(dom->iop, iova, paddr & DMA_BIT_MASK(32),
+   size, prot);
spin_unlock_irqrestore(&dom->pgtlock, flags);
 
return ret;
-- 
1.8.1.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA

2017-09-25 Thread Robin Murphy
On 25/09/17 10:28, Yong Wu wrote:
> Fix the commit 81b3c2521844 ("iommu/io-pgtable: Introduce explicit
> coherency"). If there is no IO_PGTABLE_QUIRK_NO_DMA, we should call
> dma_sync_single_for_device for cache synchronization.

Oops!

Reviewed-by: Robin Murphy 

Thanks,
Robin.

> Signed-off-by: Yong Wu 
> ---
> Rebased on v4.14-rc1.
> ---
>  drivers/iommu/io-pgtable-arm-v7s.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
> b/drivers/iommu/io-pgtable-arm-v7s.c
> index d665d0d..6961fc3 100644
> --- a/drivers/iommu/io-pgtable-arm-v7s.c
> +++ b/drivers/iommu/io-pgtable-arm-v7s.c
> @@ -245,7 +245,7 @@ static void __arm_v7s_free_table(void *table, int lvl,
>  static void __arm_v7s_pte_sync(arm_v7s_iopte *ptep, int num_entries,
>  struct io_pgtable_cfg *cfg)
>  {
> - if (!(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA))
> + if (cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)
>   return;
>  
>   dma_sync_single_for_device(cfg->iommu_dev, __arm_v7s_dma_addr(ptep),
> 

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH] iommu/io-pgtable-arm-v7s: Need dma-sync while there is no QUIRK_NO_DMA

2017-09-25 Thread Yong Wu
Fix the commit 81b3c2521844 ("iommu/io-pgtable: Introduce explicit
coherency"). If there is no IO_PGTABLE_QUIRK_NO_DMA, we should call
dma_sync_single_for_device for cache synchronization.

Signed-off-by: Yong Wu 
---
Rebased on v4.14-rc1.
---
 drivers/iommu/io-pgtable-arm-v7s.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c 
b/drivers/iommu/io-pgtable-arm-v7s.c
index d665d0d..6961fc3 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -245,7 +245,7 @@ static void __arm_v7s_free_table(void *table, int lvl,
 static void __arm_v7s_pte_sync(arm_v7s_iopte *ptep, int num_entries,
   struct io_pgtable_cfg *cfg)
 {
-   if (!(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA))
+   if (cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)
return;
 
dma_sync_single_for_device(cfg->iommu_dev, __arm_v7s_dma_addr(ptep),
-- 
1.8.1.1.dirty

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST with LPAE

2017-09-25 Thread Geert Uytterhoeven
Hi Will,

On Wed, Jul 12, 2017 at 7:16 PM, Linux Kernel Mailing List
 wrote:
> Web:
> https://git.kernel.org/torvalds/c/c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
> Commit: c1004803b40596c1aabbbc78a6b1b33e4dfd96c6
> Parent: 58188afeb727e0f73706f1460707bd3ba6ccc221
> Refname:refs/heads/master
> Author: Will Deacon 
> AuthorDate: Fri Jun 23 11:45:57 2017 +0100
> Committer:  Will Deacon 
> CommitDate: Fri Jun 23 17:58:02 2017 +0100
>
> iommu/io-pgtable: depend on !GENERIC_ATOMIC64 when using COMPILE_TEST 
> with LPAE
>
> The LPAE/ARMv8 page table format relies on the ability to read and write
> 64-bit page table entries in an atomic fashion. With the move to a 
> lockless
> implementation, we also need support for cmpxchg64 to resolve races when
> installing table entries concurrently.
>
> Unfortunately, not all architectures support cmpxchg64, so the code can
> fail to compiler when building for these architectures using COMPILE_TEST.
> Rather than disable COMPILE_TEST altogether, instead check that
> GENERIC_ATOMIC64 is not selected, which is a reasonable indication that
> the architecture has support for 64-bit cmpxchg.
>
> Reported-by: kbuild test robot 
> Signed-off-by: Will Deacon 
> ---
>  drivers/iommu/Kconfig | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 6ee3a25ae731..c88cfa7522b2 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -23,7 +23,7 @@ config IOMMU_IO_PGTABLE
>  config IOMMU_IO_PGTABLE_LPAE
> bool "ARMv7/v8 Long Descriptor Format"
> select IOMMU_IO_PGTABLE
> -   depends on HAS_DMA && (ARM || ARM64 || COMPILE_TEST)
> +   depends on HAS_DMA && (ARM || ARM64 || (COMPILE_TEST && 
> !GENERIC_ATOMIC64))
> help
>   Enable support for the ARM long descriptor pagetable format.
>   This allocator supports 4K/2M/1G, 16K/32M and 64K/512M page

I can't find where this patch was submitted and discussed, so I'm replying
to this email. On which architectures did it fail to compile?

cmpxchg64() is defined by include/asm-generic/cmpxchg.h, so I fail to
see what's the relation with GENERIC_ATOMIC64, which is related to
lib/atomic64.c instead.

E.g. on m68k, which uses GENERIC_ATOMIC64, it compile-tested fine before.

Perhaps there's another (SMP vs UP?) dependency, as
include/asm-generic/cmpxchg.h cannot be used on SMP?
Should it be COMPILE_TEST && (!GENERIC_ATOMIC64 || !SMP)?

Thanks!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu