Re: [PATCH v8 2/7] x86/dma: use IS_ENABLED() to simplify the code

2019-06-11 Thread Borislav Petkov
On Thu, May 30, 2019 at 11:48:26AM +0800, Zhen Lei wrote:
> This patch removes the ifdefs around CONFIG_IOMMU_DEFAULT_PASSTHROUGH to
> improve readability.

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

> Signed-off-by: Zhen Lei 
> ---
>  arch/x86/kernel/pci-dma.c | 7 ++-
>  1 file changed, 2 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index dcd272dbd0a9330..9f2b19c35a060df 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -43,11 +43,8 @@
>   * It is also possible to disable by default in kernel config, and enable 
> with
>   * iommu=nopt at boot time.
>   */
> -#ifdef CONFIG_IOMMU_DEFAULT_PASSTHROUGH
> -int iommu_pass_through __read_mostly = 1;
> -#else
> -int iommu_pass_through __read_mostly;
> -#endif
> +int iommu_pass_through __read_mostly =
> + IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

Let that line stick out.
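(For reference, the single-line form being asked for would presumably be the
following, keeping the initializer on one line even though it exceeds 80
columns:

	int iommu_pass_through __read_mostly = IS_ENABLED(CONFIG_IOMMU_DEFAULT_PASSTHROUGH);

since breaking it across lines hurts readability more than the long line does.)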

Thx.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [PATCH v4 3/9] swiotlb: Zero out bounce buffer for untrusted device

2019-06-11 Thread Lu Baolu

Hi,

On 6/12/19 9:05 AM, Konrad Rzeszutek Wilk wrote:

On Wed, Jun 12, 2019 at 08:43:40AM +0800, Lu Baolu wrote:

Hi Konrad,

Thanks a lot for your reviewing.

On 6/10/19 11:45 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:14AM +0800, Lu Baolu wrote:

This is necessary to avoid exposing valid kernel data to any
milicious device.


malicious


Yes, thanks.





Suggested-by: Christoph Hellwig 
Signed-off-by: Lu Baolu 
---
   kernel/dma/swiotlb.c | 6 ++
   1 file changed, 6 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f956f785645a..ed41eb7f6131 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,7 @@
   #include 
   #include 
   #include 
+#include 
   #ifdef CONFIG_DEBUG_FS
   #include 
   #endif
@@ -560,6 +561,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+
+   /* Zero out the bounce buffer if the consumer is untrusted. */
+   if (dev_is_untrusted(hwdev))
+   memset(phys_to_virt(tlb_addr), 0, alloc_size);


What if the alloc_size is less than a PAGE? Should this at least have ALIGN or 
such?


It's the consumer (iommu subsystem) who requires this to be page
aligned. For swiotlb, it just clears out all data in the allocated
bounce buffer.


I am thinking that if you don't memset the full page, the malicious hardware
could read stale data from the rest of the page
that hasn't been cleared?


Yes. My point is that this should be guaranteed by the bounce page
implementation in iommu.
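To make that concrete, a condensed sketch of how the two layers are meant to
fit together in this series (names follow the quoted patches; the PAGE_SIZE
granule is an assumption for the common 4k IOMMU page case):

	/* IOMMU bounce-page layer (patch 4/9): round to whole IOMMU pages */
	aligned_size = ALIGN(offset_in_page(paddr) + size, PAGE_SIZE);
	tlb_addr = swiotlb_tbl_map_single(dev, __phys_to_dma(dev, io_tlb_start),
					  paddr, size, aligned_size, dir, attrs);

	/*
	 * swiotlb (this patch): alloc_size now spans whole pages, so the
	 * plain memset leaves no stale data reachable by the device.
	 */
	if (dev_is_untrusted(hwdev))
		memset(phys_to_virt(tlb_addr), 0, alloc_size);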

Best regards,
Baolu





Best regards,
Baolu




+
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
--
2.17.1







Re: [PATCH v4 0/9] iommu: Bounce page for untrusted devices

2019-06-11 Thread Lu Baolu

Hi,

On 6/11/19 12:10 AM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:11AM +0800, Lu Baolu wrote:

The Thunderbolt vulnerabilities are public and have a nice
name as Thunderclap [1] [3] nowadays. This patch series aims
to mitigate those concerns.


.. Forgot to ask but should the patches also include the CVE number?
Or at least the last one that enables this?


I am sorry, but what's a CVE number and where could I get one?

Best regards,
Baolu


Re: [PATCH v4 0/9] iommu: Bounce page for untrusted devices

2019-06-11 Thread Lu Baolu

Hi,

On 6/10/19 11:42 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:11AM +0800, Lu Baolu wrote:

The Thunderbolt vulnerabilities are public and have a nice
name as Thunderclap [1] [3] nowadays. This patch series aims
to mitigate those concerns.

An external PCI device is a PCI peripheral device connected
to the system through an external bus, such as Thunderbolt.
What makes it different is that it can't be trusted to the
same degree as the devices built into the system. Generally,
a trusted PCIe device will DMA into the designated buffers
and not overrun or otherwise write outside the specified
bounds. But it's different for an external device.

The minimum IOMMU mapping granularity is one page (4k), so
for DMA transfers smaller than that a malicious PCIe device
can access the whole page of memory even if it does not
belong to the driver in question. This opens a possibility
for DMA attack. For more information about DMA attacks
imposed by an untrusted PCI/PCIe device, please refer to [2].

This implements bounce buffer for the untrusted external
devices. The transfers should be limited in isolated pages
so the IOMMU window does not cover memory outside of what
the driver expects. Previously (v3 and before), we proposed
an optimisation to only copy the head and tail of the buffer
if it spans multiple pages, and directly map the ones in the
middle. Figure 1 gives a big picture about this solution.

 swiotlb System
 IOVA  bounce page   Memory
  .-.  .-..-.
  | |  | || |
  | |  | || |
buffer_start .-.  .-..-.
  | |->| |***>| |
  | |  | | swiotlb| |
  | |  | | mapping| |
  IOMMU Page  '-'  '-''-'
   Boundary   | | | |
  | | | |
  | | | |
  | |>| |
  | |IOMMU mapping| |
  | | | |
  IOMMU Page  .-. .-.
   Boundary   | | | |
  | | | |
  | |>| |
  | | IOMMU mapping   | |
  | | | |
  | | | |
  IOMMU Page  .-.  .-..-.
   Boundary   | |  | || |
  | |  | || |
  | |->| |***>| |
   buffer_end '-'  '-' swiotlb'-'
  | |  | | mapping| |
  | |  | || |
  '-'  '-''-'
   Figure 1: A big view of iommu bounce page

As Robin Murphy pointed out, this ties us to using strict mode for
TLB maintenance, which may not be an overall win depending on the
balance between invalidation bandwidth vs. memcpy bandwidth. If we
use standard SWIOTLB logic to always copy the whole thing, we should
be able to release the bounce pages via the flush queue to allow
'safe' lazy unmaps. So since v4 we start to use the standard swiotlb
logic.

 swiotlb System
 IOVA  bounce page   Memory
buffer_start .-.  .-..-.
  | |  | || |
  | |  | || |
  | |  | |.-.physical
  | |->| | -->| |_start
  | |iommu | | swiotlb| |
  | | map  | |   map  | |
  IOMMU Page  .-.  .-.'-'


The prior picture had 'buffer_start' at an offset in the page. I am
assuming you meant that here as well?


In the prior picture, since we only use the bounce buffer for head and tail
partial-page buffers, we need to return buffer_start at the same
offset as the physical buffer.

Here, we use a whole swiotlb bounce buffer, hence we should use the same
offset as the bounce buffer (a.k.a. offset = 0).



Meaning it starts at the same offset as 'physical_start' in the right
side box?


   Boundary   | |  | || 

Re: [PATCH v4 7/9] iommu/vt-d: Add trace events for domain map/unmap

2019-06-11 Thread Lu Baolu

Hi,

On 6/11/19 12:08 AM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:18AM +0800, Lu Baolu wrote:

This adds trace support for the Intel IOMMU driver. It
also declares some events which could be used to trace
the events when an IOVA is being mapped or unmapped in
a domain.


Is that even needed considering SWIOTLB also has tracing events?



Currently there isn't any trace point in swiotlb_tbl_map_single().
If we want to add a trace point there, I hope we can distinguish the
bounce page events from other use cases (such as the bounce buffer for
direct DMA), so that we can calculate what percentage of the DMA buffers
used by a specific device driver needs bounce pages.
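For illustration, a tracepoint dedicated to the bounce-page path could look
roughly like the sketch below. The event and field names are made up for this
example (they are not necessarily the ones the series adds), and the usual
trace-header boilerplate (#undef TRACE_SYSTEM, include guards,
trace/define_trace.h) is omitted:

	#include <linux/device.h>
	#include <linux/tracepoint.h>

	TRACE_EVENT(bounce_map_single,
		TP_PROTO(struct device *dev, dma_addr_t dev_addr, size_t size),
		TP_ARGS(dev, dev_addr, size),

		TP_STRUCT__entry(
			__string(dev_name, dev_name(dev))
			__field(u64, dev_addr)
			__field(size_t, size)
		),

		TP_fast_assign(
			__assign_str(dev_name, dev_name(dev));
			__entry->dev_addr = dev_addr;
			__entry->size = size;
		),

		TP_printk("dev=%s dma_addr=0x%llx size=%zu",
			  __get_str(dev_name),
			  (unsigned long long)__entry->dev_addr,
			  __entry->size)
	);

Filtering on such an event in ftrace/perf, compared against the generic
swiotlb tracing Konrad mentions, would give the per-driver ratio described
above.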

Best regards,
Baolu


Re: [PATCH v4 6/9] iommu/vt-d: Check whether device requires bounce buffer

2019-06-11 Thread Lu Baolu

Hi,

On 6/11/19 12:08 AM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:17AM +0800, Lu Baolu wrote:

This adds a helper to check whether a device needs to use bounce
buffer. It also provides a boot time option to disable the bounce
buffer. Users can use this to prevent the iommu driver from using
the bounce buffer for performance gain.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Signed-off-by: Lu Baolu 
Tested-by: Xu Pengfei 
Tested-by: Mika Westerberg 
---
 Documentation/admin-guide/kernel-parameters.txt | 5 +
 drivers/iommu/intel-iommu.c | 6 ++
 2 files changed, 11 insertions(+)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 138f6664b2e2..65685c6e53e4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1728,6 +1728,11 @@
 			Note that using this option lowers the security
 			provided by tboot because it makes the system
 			vulnerable to DMA attacks.
+	nobounce	[Default off]
+			Do not use the bounce buffer for untrusted devices like
+			the Thunderbolt devices. This will treat the untrusted


My brain has sometimes a hard time parsing 'Not' and 'un'. Could this
be:

Disable bounce buffer for untrusted devices ..?



Fair enough.



And perhaps call it 'noswiotlb' ? Not everyone knows that SWIOTLB =
bounce buffer.


As I said in the previous thread, swiotlb is not only used for the BOUNCE_PAGE
case, but also by the direct DMA APIs. Will it cause confusion?

Anyway, I have no strong feeling about 'nobounce' vs. 'noswiotlb'. It's
a driver-specific switch for debugging purposes. People suggested that we
should move this switch into the pci module, but I heard that it's more
helpful to implement a per-device switch for "trusted" or "untrusted".
So I kept this untouched in this version.




+			devices as the trusted ones, hence might expose security
+			risks of DMA attacks.

 	intel_idle.max_cstate=	[KNL,HW,ACPI,X86]
 			0	disables intel_idle and fall back on acpi_idle.

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 235837c50719..41439647f75d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -371,6 +371,7 @@ static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
 static int iommu_identity_mapping;
+static int intel_no_bounce;


intel_swiotlb_on = 1 ?



 #define IDENTMAP_ALL		1
 #define IDENTMAP_GFX		2
@@ -384,6 +385,8 @@
 EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
 static DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);

+#define device_needs_bounce(d) (!intel_no_bounce && dev_is_untrusted(d))
+
 /*
  * Iterate over elements in device_domain_list and call the specified
  * callback @fn against each element.
@@ -466,6 +469,9 @@ static int __init intel_iommu_setup(char *str)
 			printk(KERN_INFO "Intel-IOMMU: not forcing on after tboot. This could expose security risk for tboot\n");
 			intel_iommu_tboot_noforce = 1;
+		} else if (!strncmp(str, "nobounce", 8)) {
+			pr_info("Intel-IOMMU: No bounce buffer. This could expose security risks of DMA attacks\n");


Again, Intel-IOMMU: No SWIOTLB. T.. blah blah'

Asking for this as doing 'dmesg | grep SWIOTLB' will expose nicely
all the SWIOTLB invocations..


Yes. Will refine this.

Best regards,
Baolu


Re: [PATCH v4 5/9] iommu/vt-d: Don't switch off swiotlb if use direct dma

2019-06-11 Thread Lu Baolu

Hi,

On 6/10/19 11:54 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:16AM +0800, Lu Baolu wrote:

The direct dma implementation depends on swiotlb. Hence, don't
switch of swiotlb since direct dma interfaces are used in this


s/of/off/


Yes.




driver.


But I think you really want to leave the code as is but change
the #ifdef to check for IOMMU_BOUNCE_PAGE and not CONFIG_SWIOTLB.

As one could disable IOMMU_BOUNCE_PAGE.


SWIOTLB is not only used when IOMMU_BOUNCE_PAGE is enabled.

Please check this code:

static dma_addr_t intel_map_page(struct device *dev, struct page *page,
 unsigned long offset, size_t size,
 enum dma_data_direction dir,
 unsigned long attrs)
{
if (iommu_need_mapping(dev))
return __intel_map_single(dev, page_to_phys(page) + offset,
size, dir, *dev->dma_mask);
return dma_direct_map_page(dev, page, offset, size, dir, attrs);
}

The dma_direct_map_page() will eventually call swiotlb_map() if the
buffer is beyond the address capability of the device.
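For context, a simplified sketch of that fallback in the dma-direct path,
written from memory of the ~5.2 code (not a verbatim copy):

	phys_addr_t phys = page_to_phys(page) + offset;
	dma_addr_t dma_addr = phys_to_dma(dev, phys);

	/* Bounce through swiotlb when the device cannot reach the buffer. */
	if (!dma_capable(dev, dma_addr, size) &&
	    !swiotlb_map(dev, &phys, &dma_addr, size, dir, attrs))
		return DMA_MAPPING_ERROR;

	return dma_addr;

So disabling swiotlb would also break this direct-DMA fallback, which is the
point being made above.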

Best regards,
Baolu



Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Mika Westerberg 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/intel-iommu.c | 6 --
  1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d5a6c8064c56..235837c50719 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4625,9 +4625,6 @@ static int __init platform_optin_force_iommu(void)
iommu_identity_mapping |= IDENTMAP_ALL;
  
  	dmar_disabled = 0;

-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)
-   swiotlb = 0;
-#endif
no_iommu = 0;
  
  	return 1;

@@ -4765,9 +4762,6 @@ int __init intel_iommu_init(void)
}
up_write(_global_lock);
  
-#if defined(CONFIG_X86) && defined(CONFIG_SWIOTLB)

-   swiotlb = 0;
-#endif
dma_ops = _dma_ops;
  
  	init_iommu_pm_ops();

--
2.17.1





Re: [PATCH v4 3/9] swiotlb: Zero out bounce buffer for untrusted device

2019-06-11 Thread Konrad Rzeszutek Wilk
On Wed, Jun 12, 2019 at 08:43:40AM +0800, Lu Baolu wrote:
> Hi Konrad,
> 
> Thanks a lot for your reviewing.
> 
> On 6/10/19 11:45 PM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Jun 03, 2019 at 09:16:14AM +0800, Lu Baolu wrote:
> > > This is necessary to avoid exposing valid kernel data to any
> > > milicious device.
> > 
> > malicious
> 
> Yes, thanks.
> 
> > 
> > > 
> > > Suggested-by: Christoph Hellwig 
> > > Signed-off-by: Lu Baolu 
> > > ---
> > >   kernel/dma/swiotlb.c | 6 ++
> > >   1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> > > index f956f785645a..ed41eb7f6131 100644
> > > --- a/kernel/dma/swiotlb.c
> > > +++ b/kernel/dma/swiotlb.c
> > > @@ -35,6 +35,7 @@
> > >   #include 
> > >   #include 
> > >   #include 
> > > +#include 
> > >   #ifdef CONFIG_DEBUG_FS
> > >   #include 
> > >   #endif
> > > @@ -560,6 +561,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device 
> > > *hwdev,
> > >*/
> > >   for (i = 0; i < nslots; i++)
> > >   io_tlb_orig_addr[index+i] = orig_addr + (i << 
> > > IO_TLB_SHIFT);
> > > +
> > > + /* Zero out the bounce buffer if the consumer is untrusted. */
> > > + if (dev_is_untrusted(hwdev))
> > > + memset(phys_to_virt(tlb_addr), 0, alloc_size);
> > 
> > What if the alloc_size is less than a PAGE? Should this at least have ALIGN 
> > or such?
> 
> It's the consumer (iommu subsystem) who requires this to be page
> aligned. For swiotlb, it just clears out all data in the allocated
> bounce buffer.

I am thinking that the if you don't memset the full page the malicious hardware 
could read stale date from the rest of the page
that hasn't been cleared?

> 
> Best regards,
> Baolu
> 
> > 
> > > +
> > >   if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
> > >   (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
> > >   swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
> > > DMA_TO_DEVICE);
> > > -- 
> > > 2.17.1
> > > 
> > 


Re: [PATCH v4 4/9] iommu: Add bounce page APIs

2019-06-11 Thread Lu Baolu

Hi Pavel,

Thanks a lot for your reviewing.

On 6/11/19 8:10 PM, Pavel Begunkov wrote:



On 03/06/2019 04:16, Lu Baolu wrote:

IOMMU hardware always use paging for DMA remapping.  The
minimum mapped window is a page size. The device drivers
may map buffers not filling whole IOMMU window. It allows
device to access to possibly unrelated memory and various
malicious devices can exploit this to perform DMA attack.

This introduces the bounce buffer mechanism for DMA buffers
which doesn't fill a minimal IOMMU page. It could be used
by various vendor specific IOMMU drivers as long as the
DMA domain is managed by the generic IOMMU layer. Below
APIs are added:

* iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
   - Map a buffer start at DMA address @addr in bounce page
 manner. For buffer parts that doesn't cross a whole
 minimal IOMMU page, the bounce page policy is applied.
 A bounce page mapped by swiotlb will be used as the DMA
 target in the IOMMU page table. Otherwise, the physical
 address @paddr is mapped instead.

* iommu_bounce_unmap(dev, addr, size, dir, attrs)
   - Unmap the buffer mapped with iommu_bounce_map(). The bounce
 page will be torn down after the bounced data get synced.

* iommu_bounce_sync(dev, addr, size, dir, target)
   - Sync the bounced data in case the bounce mapped buffer is
 reused.

The whole APIs are included within a kernel option IOMMU_BOUNCE_PAGE.
It's useful for cases where the bounce page isn't needed, for example,
embedded cases.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Alan Cox 
Cc: Mika Westerberg 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/Kconfig |  14 +
  drivers/iommu/iommu.c | 119 ++
  include/linux/iommu.h |  35 +
  3 files changed, 168 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db5221d..d837ec3f359b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -86,6 +86,20 @@ config IOMMU_DEFAULT_PASSTHROUGH
  
  	  If unsure, say N here.
  
+config IOMMU_BOUNCE_PAGE

+   bool "Use bounce page for untrusted devices"
+   depends on IOMMU_API
+   select SWIOTLB
+   help
+ IOMMU hardware always use paging for DMA remapping. The minimum
+ mapped window is a page size. The device drivers may map buffers
+ not filling whole IOMMU window. This allows device to access to
+ possibly unrelated memory and malicious device can exploit this
+ to perform a DMA attack. Select this to use a bounce page for the
+ buffer which doesn't fill a whole IOMMU page.
+
+ If unsure, say N here.
+
  config OF_IOMMU
 def_bool y
 depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2a906386bb8e..fa44f681a82b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2246,3 +2246,122 @@ int iommu_sva_get_pasid(struct iommu_sva *handle)
return ops->sva_get_pasid(handle);
  }
  EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+
+/*
+ * Bounce buffer support for external devices:
+ *
+ * IOMMU hardware always use paging for DMA remapping. The minimum mapped
+ * window is a page size. The device drivers may map buffers not filling
+ * whole IOMMU window. This allows device to access to possibly unrelated
+ * memory and malicious device can exploit this to perform a DMA attack.
+ * Use bounce pages for the buffer which doesn't fill whole IOMMU pages.
+ */
+
+static inline size_t
+get_aligned_size(struct iommu_domain *domain, dma_addr_t addr, size_t size)
+{
+   unsigned long page_size = 1 << __ffs(domain->pgsize_bitmap);
+   unsigned long offset = page_size - 1;
+
+   return ALIGN((addr & offset) + size, page_size);
+}
+
+dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
+   phys_addr_t paddr, size_t size,
+   enum dma_data_direction dir,
+   unsigned long attrs)
+{
+   struct iommu_domain *domain;
+   unsigned int min_pagesz;
+   phys_addr_t tlb_addr;
+   size_t aligned_size;
+   int prot = 0;
+   int ret;
+
+   domain = iommu_get_dma_domain(dev);
+   if (!domain)
+   return DMA_MAPPING_ERROR;
+
+   if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= IOMMU_READ;
+   if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= IOMMU_WRITE;
+
+   aligned_size = get_aligned_size(domain, paddr, size);
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+   /*
+* If both the physical buffer start address and size are
+* page aligned, we don't need to use a bounce page.
+*/
+   if (!IS_ALIGNED(paddr | size, min_pagesz)) {
+   tlb_addr = swiotlb_tbl_map_single(dev,
+   __phys_to_dma(dev, io_tlb_start),
+ 

Re: [PATCH v4 4/9] iommu: Add bounce page APIs

2019-06-11 Thread Lu Baolu

Hi,

On 6/10/19 11:56 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:15AM +0800, Lu Baolu wrote:

IOMMU hardware always use paging for DMA remapping.  The
minimum mapped window is a page size. The device drivers
may map buffers not filling whole IOMMU window. It allows
device to access to possibly unrelated memory and various
malicious devices can exploit this to perform DMA attack.

This introduces the bounce buffer mechanism for DMA buffers
which doesn't fill a minimal IOMMU page. It could be used
by various vendor specific IOMMU drivers as long as the
DMA domain is managed by the generic IOMMU layer. Below
APIs are added:

* iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
   - Map a buffer start at DMA address @addr in bounce page
 manner. For buffer parts that doesn't cross a whole
 minimal IOMMU page, the bounce page policy is applied.
 A bounce page mapped by swiotlb will be used as the DMA
 target in the IOMMU page table. Otherwise, the physical
 address @paddr is mapped instead.

* iommu_bounce_unmap(dev, addr, size, dir, attrs)
   - Unmap the buffer mapped with iommu_bounce_map(). The bounce
 page will be torn down after the bounced data get synced.

* iommu_bounce_sync(dev, addr, size, dir, target)
   - Sync the bounced data in case the bounce mapped buffer is
 reused.

The whole APIs are included within a kernel option IOMMU_BOUNCE_PAGE.
It's useful for cases where the bounce page isn't needed, for example,
embedded cases.

Cc: Ashok Raj 
Cc: Jacob Pan 
Cc: Kevin Tian 
Cc: Alan Cox 
Cc: Mika Westerberg 
Signed-off-by: Lu Baolu 
---
  drivers/iommu/Kconfig |  14 +
  drivers/iommu/iommu.c | 119 ++
  include/linux/iommu.h |  35 +
  3 files changed, 168 insertions(+)

diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
index 83664db5221d..d837ec3f359b 100644
--- a/drivers/iommu/Kconfig
+++ b/drivers/iommu/Kconfig
@@ -86,6 +86,20 @@ config IOMMU_DEFAULT_PASSTHROUGH
  
  	  If unsure, say N here.
  
+config IOMMU_BOUNCE_PAGE

+   bool "Use bounce page for untrusted devices"
+   depends on IOMMU_API
+   select SWIOTLB


I think you want:
depends on IOMMU_API && SWIOTLB

As people may want to have IOMMU and SWIOTLB, and not IOMMU_BOUNCE_PAGE enabled.


Yes. Yours makes more sense.

Best regards,
Baolu




+   help
+ IOMMU hardware always use paging for DMA remapping. The minimum
+ mapped window is a page size. The device drivers may map buffers
+ not filling whole IOMMU window. This allows device to access to
+ possibly unrelated memory and malicious device can exploit this
+ to perform a DMA attack. Select this to use a bounce page for the
+ buffer which doesn't fill a whole IOMMU page.
+
+ If unsure, say N here.
+
  config OF_IOMMU
 def_bool y
 depends on OF && IOMMU_API
diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 2a906386bb8e..fa44f681a82b 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -2246,3 +2246,122 @@ int iommu_sva_get_pasid(struct iommu_sva *handle)
return ops->sva_get_pasid(handle);
  }
  EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
+
+#ifdef CONFIG_IOMMU_BOUNCE_PAGE
+
+/*
+ * Bounce buffer support for external devices:
+ *
+ * IOMMU hardware always use paging for DMA remapping. The minimum mapped
+ * window is a page size. The device drivers may map buffers not filling
+ * whole IOMMU window. This allows device to access to possibly unrelated
+ * memory and malicious device can exploit this to perform a DMA attack.
+ * Use bounce pages for the buffer which doesn't fill whole IOMMU pages.
+ */
+
+static inline size_t
+get_aligned_size(struct iommu_domain *domain, dma_addr_t addr, size_t size)
+{
+   unsigned long page_size = 1 << __ffs(domain->pgsize_bitmap);
+   unsigned long offset = page_size - 1;
+
+   return ALIGN((addr & offset) + size, page_size);
+}
+
+dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
+   phys_addr_t paddr, size_t size,
+   enum dma_data_direction dir,
+   unsigned long attrs)
+{
+   struct iommu_domain *domain;
+   unsigned int min_pagesz;
+   phys_addr_t tlb_addr;
+   size_t aligned_size;
+   int prot = 0;
+   int ret;
+
+   domain = iommu_get_dma_domain(dev);
+   if (!domain)
+   return DMA_MAPPING_ERROR;
+
+   if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= IOMMU_READ;
+   if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
+   prot |= IOMMU_WRITE;
+
+   aligned_size = get_aligned_size(domain, paddr, size);
+   min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
+
+   /*
+* If both the physical buffer start address and size are
+* page aligned, we don't need to use a bounce page.
+*/
+   if 

Re: [PATCH v4 3/9] swiotlb: Zero out bounce buffer for untrusted device

2019-06-11 Thread Lu Baolu

Hi Konrad,

Thanks a lot for your reviewing.

On 6/10/19 11:45 PM, Konrad Rzeszutek Wilk wrote:

On Mon, Jun 03, 2019 at 09:16:14AM +0800, Lu Baolu wrote:

This is necessary to avoid exposing valid kernel data to any
milicious device.


malicious


Yes, thanks.





Suggested-by: Christoph Hellwig 
Signed-off-by: Lu Baolu 
---
  kernel/dma/swiotlb.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index f956f785645a..ed41eb7f6131 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -35,6 +35,7 @@
  #include 
  #include 
  #include 
+#include 
  #ifdef CONFIG_DEBUG_FS
  #include 
  #endif
@@ -560,6 +561,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
 */
for (i = 0; i < nslots; i++)
io_tlb_orig_addr[index+i] = orig_addr + (i << IO_TLB_SHIFT);
+
+   /* Zero out the bounce buffer if the consumer is untrusted. */
+   if (dev_is_untrusted(hwdev))
+   memset(phys_to_virt(tlb_addr), 0, alloc_size);


What if the alloc_size is less than a PAGE? Should this at least have ALIGN or 
such?


It's the consumer (iommu subsystem) who requires this to be page
aligned. For swiotlb, it just clears out all data in the allocated
bounce buffer.

Best regards,
Baolu




+
if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC) &&
(dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL))
swiotlb_bounce(orig_addr, tlb_addr, mapping_size, 
DMA_TO_DEVICE);
--
2.17.1





[PATCH v2 5/7] iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()

2019-06-11 Thread Lu Baolu
The drhd and device scope list should be iterated with the
iommu global lock held. Otherwise, a suspicious RCU usage
message will be displayed.

[3.695886] =
[3.695917] WARNING: suspicious RCU usage
[3.695950] 5.2.0-rc2+ #2467 Not tainted
[3.695981] -
[3.696014] drivers/iommu/intel-iommu.c:4569 suspicious 
rcu_dereference_check() usage!
[3.696069]
   other info that might help us debug this:

[3.696126]
   rcu_scheduler_active = 2, debug_locks = 1
[3.696173] no locks held by swapper/0/1.
[3.696204]
   stack backtrace:
[3.696241] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.2.0-rc2+ #2467
[3.696370] Call Trace:
[3.696404]  dump_stack+0x85/0xcb
[3.696441]  intel_iommu_init+0x128c/0x13ce
[3.696478]  ? kmem_cache_free+0x16b/0x2c0
[3.696516]  ? __fput+0x14b/0x270
[3.696550]  ? __call_rcu+0xb7/0x300
[3.696583]  ? get_max_files+0x10/0x10
[3.696631]  ? set_debug_rodata+0x11/0x11
[3.696668]  ? e820__memblock_setup+0x60/0x60
[3.696704]  ? pci_iommu_init+0x16/0x3f
[3.696737]  ? set_debug_rodata+0x11/0x11
[3.696770]  pci_iommu_init+0x16/0x3f
[3.696805]  do_one_initcall+0x5d/0x2e4
[3.696844]  ? set_debug_rodata+0x11/0x11
[3.696880]  ? rcu_read_lock_sched_held+0x6b/0x80
[3.696924]  kernel_init_freeable+0x1f0/0x27c
[3.696961]  ? rest_init+0x260/0x260
[3.696997]  kernel_init+0xa/0x110
[3.697028]  ret_from_fork+0x3a/0x50

Fixes: fa212a97f3a36 ("iommu/vt-d: Probe DMA-capable ACPI name space devices")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 19c4c387a3f6..84e650c6a46d 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4793,8 +4793,10 @@ int __init intel_iommu_init(void)
cpuhp_setup_state(CPUHP_IOMMU_INTEL_DEAD, "iommu/intel:dead", NULL,
  intel_iommu_cpu_dead);
 
+   down_read(&dmar_global_lock);
if (probe_acpi_namespace_devices())
pr_warn("ACPI name space devices didn't probe correctly\n");
+   up_read(&dmar_global_lock);
 
/* Finally, we enable the DMA remapping hardware. */
for_each_iommu(iommu, drhd) {
-- 
2.17.1



[PATCH v2 3/7] iommu/vt-d: Don't enable iommu's which have been ignored

2019-06-11 Thread Lu Baolu
The iommu driver will ignore some iommu units if there's no
device under its scope or those devices have been explicitly
set to bypass the DMA translation. Don't enable those iommu
units, otherwise the devices under its scope won't work.

Fixes: d8190dc638866 ("iommu/vt-d: Enable DMA remapping after rmrr mapped")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d1a82039e835..5251533a18a4 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3232,7 +3232,12 @@ static int __init init_dmars(void)
goto error;
}
 
-   for_each_active_iommu(iommu, drhd) {
+   for_each_iommu(iommu, drhd) {
+   if (drhd->ignored) {
+   iommu_disable_translation(iommu);
+   continue;
+   }
+
/*
 * Find the max pasid size of all IOMMU's in the system.
 * We need to ensure the system pasid table is no bigger
@@ -4793,7 +4798,7 @@ int __init intel_iommu_init(void)
 
/* Finally, we enable the DMA remapping hardware. */
for_each_iommu(iommu, drhd) {
-   if (!translation_pre_enabled(iommu))
+   if (!drhd->ignored && !translation_pre_enabled(iommu))
iommu_enable_translation(iommu);
 
iommu_disable_protect_mem_regions(iommu);
-- 
2.17.1



[PATCH v2 6/7] iommu/vt-d: Cleanup after delegating DMA domain to generic iommu

2019-06-11 Thread Lu Baolu
From: Sai Praneeth Prakhya 

[No functional changes]

1. Starting with commit df4f3c603aeb ("iommu/vt-d: Remove static identity
map code") there are no callers for iommu_prepare_rmrr_dev() but the
implementation of the function still exists, so remove it. Also, as a
ripple effect remove get_domain_for_dev() and iommu_prepare_identity_map()
because they aren't being used either.

2. Remove extra new lines in a couple of places.

Signed-off-by: Sai Praneeth Prakhya 
---
 drivers/iommu/intel-iommu.c | 55 -
 1 file changed, 55 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 84e650c6a46d..5215dcd535a1 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -909,7 +909,6 @@ static struct dma_pte *pfn_to_dma_pte(struct dmar_domain 
*domain,
return pte;
 }
 
-
 /* return address's pte at specific level */
 static struct dma_pte *dma_pfn_level_pte(struct dmar_domain *domain,
 unsigned long pfn,
@@ -1578,7 +1577,6 @@ static void iommu_disable_translation(struct intel_iommu 
*iommu)
raw_spin_unlock_irqrestore(>register_lock, flag);
 }
 
-
 static int iommu_init_domains(struct intel_iommu *iommu)
 {
u32 ndomains, nlongs;
@@ -1616,8 +1614,6 @@ static int iommu_init_domains(struct intel_iommu *iommu)
return -ENOMEM;
}
 
-
-
/*
 * If Caching mode is set, then invalid translations are tagged
 * with domain-id 0, hence we need to pre-allocate it. We also
@@ -2649,29 +2645,6 @@ static struct dmar_domain *set_domain_for_dev(struct 
device *dev,
return domain;
 }
 
-static struct dmar_domain *get_domain_for_dev(struct device *dev, int gaw)
-{
-   struct dmar_domain *domain, *tmp;
-
-   domain = find_domain(dev);
-   if (domain)
-   goto out;
-
-   domain = find_or_alloc_domain(dev, gaw);
-   if (!domain)
-   goto out;
-
-   tmp = set_domain_for_dev(dev, domain);
-   if (!tmp || domain != tmp) {
-   domain_exit(domain);
-   domain = tmp;
-   }
-
-out:
-
-   return domain;
-}
-
 static int iommu_domain_identity_map(struct dmar_domain *domain,
 unsigned long long start,
 unsigned long long end)
@@ -2736,33 +2709,6 @@ static int domain_prepare_identity_map(struct device 
*dev,
return iommu_domain_identity_map(domain, start, end);
 }
 
-static int iommu_prepare_identity_map(struct device *dev,
- unsigned long long start,
- unsigned long long end)
-{
-   struct dmar_domain *domain;
-   int ret;
-
-   domain = get_domain_for_dev(dev, DEFAULT_DOMAIN_ADDRESS_WIDTH);
-   if (!domain)
-   return -ENOMEM;
-
-   ret = domain_prepare_identity_map(dev, domain, start, end);
-   if (ret)
-   domain_exit(domain);
-
-   return ret;
-}
-
-static inline int iommu_prepare_rmrr_dev(struct dmar_rmrr_unit *rmrr,
-struct device *dev)
-{
-   if (dev->archdata.iommu == DUMMY_DEVICE_DOMAIN_INFO)
-   return 0;
-   return iommu_prepare_identity_map(dev, rmrr->base_address,
- rmrr->end_address);
-}
-
 static int md_domain_init(struct dmar_domain *domain, int guest_width);
 
 static int __init si_domain_init(int hw)
@@ -4058,7 +4004,6 @@ static void __init init_iommu_pm_ops(void)
 static inline void init_iommu_pm_ops(void) {}
 #endif /* CONFIG_PM */
 
-
 int __init dmar_parse_one_rmrr(struct acpi_dmar_header *header, void *arg)
 {
struct acpi_dmar_reserved_memory *rmrr;
-- 
2.17.1



[PATCH v2 7/7] iommu/vt-d: Consolidate domain_init() to avoid duplication

2019-06-11 Thread Lu Baolu
The domain_init() and md_domain_init() do almost the same job.
Consolidate them to avoid duplication.

Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 123 +++-
 1 file changed, 36 insertions(+), 87 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5215dcd535a1..b8c6cf1d5f90 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -1825,63 +1825,6 @@ static inline int guestwidth_to_adjustwidth(int gaw)
return agaw;
 }
 
-static int domain_init(struct dmar_domain *domain, struct intel_iommu *iommu,
-  int guest_width)
-{
-   int adjust_width, agaw;
-   unsigned long sagaw;
-   int err;
-
-   init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN);
-
-   err = init_iova_flush_queue(&domain->iovad,
-   iommu_flush_iova, iova_entry_free);
-   if (err)
-   return err;
-
-   domain_reserve_special_ranges(domain);
-
-   /* calculate AGAW */
-   if (guest_width > cap_mgaw(iommu->cap))
-   guest_width = cap_mgaw(iommu->cap);
-   domain->gaw = guest_width;
-   adjust_width = guestwidth_to_adjustwidth(guest_width);
-   agaw = width_to_agaw(adjust_width);
-   sagaw = cap_sagaw(iommu->cap);
-   if (!test_bit(agaw, &sagaw)) {
-   /* hardware doesn't support it, choose a bigger one */
-   pr_debug("Hardware doesn't support agaw %d\n", agaw);
-   agaw = find_next_bit(&sagaw, 5, agaw);
-   if (agaw >= 5)
-   return -ENODEV;
-   }
-   domain->agaw = agaw;
-
-   if (ecap_coherent(iommu->ecap))
-   domain->iommu_coherency = 1;
-   else
-   domain->iommu_coherency = 0;
-
-   if (ecap_sc_support(iommu->ecap))
-   domain->iommu_snooping = 1;
-   else
-   domain->iommu_snooping = 0;
-
-   if (intel_iommu_superpage)
-   domain->iommu_superpage = fls(cap_super_page_val(iommu->cap));
-   else
-   domain->iommu_superpage = 0;
-
-   domain->nid = iommu->node;
-
-   /* always allocate the top pgd */
-   domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
-   if (!domain->pgd)
-   return -ENOMEM;
-   __iommu_flush_cache(iommu, domain->pgd, PAGE_SIZE);
-   return 0;
-}
-
 static void domain_exit(struct dmar_domain *domain)
 {
struct page *freelist;
@@ -2563,6 +2506,31 @@ static int get_last_alias(struct pci_dev *pdev, u16 
alias, void *opaque)
return 0;
 }
 
+static int domain_init(struct dmar_domain *domain, int guest_width)
+{
+   int adjust_width;
+
+   init_iova_domain(&domain->iovad, VTD_PAGE_SIZE, IOVA_START_PFN);
+   domain_reserve_special_ranges(domain);
+
+   /* calculate AGAW */
+   domain->gaw = guest_width;
+   adjust_width = guestwidth_to_adjustwidth(guest_width);
+   domain->agaw = width_to_agaw(adjust_width);
+
+   domain->iommu_coherency = 0;
+   domain->iommu_snooping = 0;
+   domain->iommu_superpage = 0;
+   domain->max_addr = 0;
+
+   /* always allocate the top pgd */
+   domain->pgd = (struct dma_pte *)alloc_pgtable_page(domain->nid);
+   if (!domain->pgd)
+   return -ENOMEM;
+   domain_flush_cache(domain, domain->pgd, PAGE_SIZE);
+   return 0;
+}
+
 static struct dmar_domain *find_or_alloc_domain(struct device *dev, int gaw)
 {
struct device_domain_info *info;
@@ -2600,11 +2568,19 @@ static struct dmar_domain *find_or_alloc_domain(struct 
device *dev, int gaw)
domain = alloc_domain(0);
if (!domain)
return NULL;
-   if (domain_init(domain, iommu, gaw)) {
+
+   if (domain_init(domain, gaw)) {
domain_exit(domain);
return NULL;
}
 
+   if (init_iova_flush_queue(&domain->iovad,
+ iommu_flush_iova,
+ iova_entry_free)) {
+   pr_warn("iova flush queue initialization failed\n");
+   intel_iommu_strict = 1;
+   }
+
 out:
return domain;
 }
@@ -2709,8 +2685,6 @@ static int domain_prepare_identity_map(struct device *dev,
return iommu_domain_identity_map(domain, start, end);
 }
 
-static int md_domain_init(struct dmar_domain *domain, int guest_width);
-
 static int __init si_domain_init(int hw)
 {
struct dmar_rmrr_unit *rmrr;
@@ -2721,7 +2695,7 @@ static int __init si_domain_init(int hw)
if (!si_domain)
return -EFAULT;
 
-   if (md_domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
+   if (domain_init(si_domain, DEFAULT_DOMAIN_ADDRESS_WIDTH)) {
domain_exit(si_domain);
return -EFAULT;
}
@@ -4837,31 +4811,6 @@ static void dmar_remove_one_dev_info(struct device *dev)
spin_unlock_irqrestore(&device_domain_lock, flags);
 }
 
-static int 

[PATCH v2 4/7] iommu/vt-d: Allow DMA domain attaching to rmrr locked device

2019-06-11 Thread Lu Baolu
We don't allow a device to be assigned to user level when it is locked
by any RMRR's. Hence, intel_iommu_attach_device() will return error if
a domain of type IOMMU_DOMAIN_UNMANAGED is about to attach to a device
locked by rmrr. But this doesn't apply to a domain of type other than
IOMMU_DOMAIN_UNMANAGED. This adds a check to fix this.

Fixes: fa954e6831789 ("iommu/vt-d: Delegate the dma domain to upper layer")
Signed-off-by: Lu Baolu 
Reported-and-tested-by: Qian Cai 
---
 drivers/iommu/intel-iommu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index 5251533a18a4..19c4c387a3f6 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5139,7 +5139,8 @@ static int intel_iommu_attach_device(struct iommu_domain 
*domain,
 {
int ret;
 
-   if (device_is_rmrr_locked(dev)) {
+   if (domain->type == IOMMU_DOMAIN_UNMANAGED &&
+   device_is_rmrr_locked(dev)) {
dev_warn(dev, "Device is ineligible for IOMMU domain attach due 
to platform RMRR requirement.  Contact your platform vendor.\n");
return -EPERM;
}
-- 
2.17.1



[PATCH v2 2/7] iommu/vt-d: Set domain type for a private domain

2019-06-11 Thread Lu Baolu
Otherwise, domain_get_iommu() will be broken.

Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with 
private")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index d5a6c8064c56..d1a82039e835 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3458,6 +3458,8 @@ static struct dmar_domain 
*get_private_domain_for_dev(struct device *dev)
 out:
if (!domain)
dev_err(dev, "Allocating domain failed\n");
+   else
+   domain->domain.type = IOMMU_DOMAIN_DMA;
 
return domain;
 }
-- 
2.17.1



[PATCH v2 0/7] iommu/vt-d: Fixes and cleanups for linux-next

2019-06-11 Thread Lu Baolu
Hi Joerg,

This series includes several fixes and cleanups after delegating
DMA domain to generic iommu. Please review and consider them for
linux-next.

Best regards,
Baolu

Change log:
  v1->v2:
  - Refine "iommu/vt-d: Cleanup after delegating DMA domain to
generic iommu" by removing an unnecessary cleanup.
  - Add "Allow DMA domain attaching to rmrr locked device" which
fix a driver load issue.

Lu Baolu (6):
  iommu/vt-d: Don't return error when device gets right domain
  iommu/vt-d: Set domain type for a private domain
  iommu/vt-d: Don't enable iommu's which have been ignored
  iommu/vt-d: Allow DMA domain attaching to rmrr locked device
  iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
  iommu/vt-d: Consolidate domain_init() to avoid duplication

Sai Praneeth Prakhya (1):
  iommu/vt-d: Cleanup after delegating DMA domain to generic iommu

 drivers/iommu/intel-iommu.c | 200 +---
 1 file changed, 49 insertions(+), 151 deletions(-)

-- 
2.17.1



[PATCH v2 1/7] iommu/vt-d: Don't return error when device gets right domain

2019-06-11 Thread Lu Baolu
If a device gets the right domain in the add_device ops, it shouldn't
return an error.

Fixes: 942067f1b6b97 ("iommu/vt-d: Identify default domains replaced with 
private")
Signed-off-by: Lu Baolu 
---
 drivers/iommu/intel-iommu.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
index b431cc6f6ba4..d5a6c8064c56 100644
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -5360,10 +5360,7 @@ static int intel_iommu_add_device(struct device *dev)
domain_add_dev_info(si_domain, dev);
dev_info(dev,
 "Device uses a private identity 
domain.\n");
-   return 0;
}
-
-   return -ENODEV;
}
} else {
if (device_def_domain_type(dev) == IOMMU_DOMAIN_DMA) {
@@ -5378,10 +5375,7 @@ static int intel_iommu_add_device(struct device *dev)
 
dev_info(dev,
 "Device uses a private dma domain.\n");
-   return 0;
}
-
-   return -ENODEV;
}
}
 
-- 
2.17.1



Re: [PATCH 0/6] iommu/vt-d: Fixes and cleanups for linux-next

2019-06-11 Thread Lu Baolu

Hi,

This is supposed to be fixed by this patch

https://lkml.org/lkml/2019/6/3/115

which is part of several RMRR related fixes and enhancements.

Best regards,
Baolu

On 6/12/19 12:55 AM, Qian Cai wrote:

On Sun, 2019-06-09 at 10:37 +0800, Lu Baolu wrote:

Hi Joerg,

This series includes several fixes and cleanups after delegating
DMA domain to generic iommu. Please review and consider them for
linux-next.

Best regards,
Baolu

Lu Baolu (5):
   iommu/vt-d: Don't return error when device gets right domain
   iommu/vt-d: Set domain type for a private domain
   iommu/vt-d: Don't enable iommu's which have been ignored
   iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
   iommu/vt-d: Consolidate domain_init() to avoid duplication

Sai Praneeth Prakhya (1):
   iommu/vt-d: Cleanup after delegating DMA domain to generic iommu

  drivers/iommu/intel-iommu.c | 210 +---
  1 file changed, 53 insertions(+), 157 deletions(-)



BTW, the linux-next commit "iommu/vt-d: Expose ISA direct mapping region via
iommu_get_resv_regions" [1] also introduced a memory leak below, as it forgets
to ask intel_iommu_put_resv_regions() to call kfree() when
CONFIG_INTEL_IOMMU_FLOPPY_WA=y.

[1] https://lore.kernel.org/patchwork/patch/1078963/

unreferenced object 0x88912ef789c8 (size 64):
   comm "swapper/0", pid 1, jiffies 4294946232 (age 5399.530s)
   hex dump (first 32 bytes):
 48 83 f7 2e 91 88 ff ff 30 fa e3 00 82 88 ff ff  H...0...
 00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00  
   backtrace:
 [] kmem_cache_alloc_trace+0x266/0x380
 [] iommu_alloc_resv_region+0x40/0xb0
 [] intel_iommu_get_resv_regions+0x25e/0x2d0
 [<21fbc6c3>] iommu_group_create_direct_mappings+0x159/0x3d0
 [<22259268>] iommu_group_add_device+0x17b/0x4f0
 [<28b91093>] iommu_group_get_for_dev+0x153/0x460
 [<577c33b4>] intel_iommu_add_device+0xc4/0x210
 [<587b7492>] iommu_probe_device+0x63/0x80
 [<4aa997d1>] add_iommu_group+0xe/0x20
 [] bus_for_each_dev+0xf0/0x150
 [] bus_set_iommu+0xc6/0x100
 [] intel_iommu_init+0x682/0xb0a
 [<226f7444>] pci_iommu_init+0x26/0x62
 [<2d8694f5>] do_one_initcall+0xe5/0x3ea
 [<4bc60101>] kernel_init_freeable+0x5ad/0x640
 [<91b0bad6>] kernel_init+0x11/0x138
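(For context, the leak described above would presumably be plugged by freeing
every region in the put path, including the IOMMU_RESV_DIRECT entry added for
the floppy workaround. A rough sketch; the exact shape of the eventual fix may
differ:

	static void intel_iommu_put_resv_regions(struct device *dev,
						 struct list_head *head)
	{
		struct iommu_resv_region *entry, *next;

		/* Free all regions handed out by get_resv_regions(). */
		list_for_each_entry_safe(entry, next, head, list)
			kfree(entry);
	}
)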




Re: "iommu/vt-d: Delegate DMA domain to generic iommu" series breaks megaraid_sas

2019-06-11 Thread Lu Baolu

Hi,

On 6/11/19 10:00 PM, Qian Cai wrote:



On Jun 10, 2019, at 9:41 PM, Lu Baolu  wrote:

Ah, good catch!

The device failed to be attached to a DMA domain. Can you please try the
attached fix patch?

It works fine.



Thanks a lot for the report and verification.

Best regards,
Baolu


Re: [RFC PATCH v4 12/21] watchdog/hardlockup/hpet: Adjust timer expiration on the number of monitored CPUs

2019-06-11 Thread Thomas Gleixner
On Thu, 23 May 2019, Ricardo Neri wrote:
> @@ -52,10 +59,10 @@ static void kick_timer(struct hpet_hld_data *hdata, bool 
> force)
>   return;
>  
>   if (hdata->has_periodic)
> - period = watchdog_thresh * hdata->ticks_per_second;
> + period = watchdog_thresh * hdata->ticks_per_cpu;
>  
>   count = hpet_readl(HPET_COUNTER);
> - new_compare = count + watchdog_thresh * hdata->ticks_per_second;
> + new_compare = count + watchdog_thresh * hdata->ticks_per_cpu;
>   hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);

So with this you might get close to the point where you trip over the SMI
induced madness where CPUs vanish for several milliseconds in some value
add code. You really want to do a read back of the hpet to detect that. See
the comment in the hpet code. RHEL 7/8 allow up to 768 logical CPUs
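A sketch of the read-back being asked for, in the spirit of the comment above
hpet_next_event() in arch/x86/kernel/hpet.c. hdata, new_compare, period,
ticks_per_cpu and hpet_set_comparator() come from the quoted patch; the
reprogramming policy here is only an assumption:

	hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);

	/*
	 * If the main counter already passed the comparator, we were
	 * delayed (e.g. by an SMI) and would otherwise wait for a full
	 * counter wrap; read back and kick the comparator again.
	 */
	if ((s32)((u32)new_compare - hpet_readl(HPET_COUNTER)) <= 0) {
		new_compare = hpet_readl(HPET_COUNTER) +
			      watchdog_thresh * hdata->ticks_per_cpu;
		hpet_set_comparator(hdata->num, (u32)new_compare, (u32)period);
	}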

Thanks,

tglx


Re: [RFC PATCH v4 05/21] x86/hpet: Reserve timer for the HPET hardlockup detector

2019-06-11 Thread Thomas Gleixner
On Thu, 23 May 2019, Ricardo Neri wrote:

> HPET timer 2 will be used to drive the HPET-based hardlockup detector.
> Reserve such timer to ensure it cannot be used by user space programs or
> for clock events.
> 
> When looking for MSI-capable timers for clock events, skip timer 2 if
> the HPET hardlockup detector is selected.

Why? Both the changelog and the code change lack an explanation why this
timer is actually touched after it got reserved for the platform. The
reservation should make it inaccessible for other things.

Thanks,

tglx


Re: [PATCH 0/2] swiotlb: Cleanup and consistency fix

2019-06-11 Thread Konrad Rzeszutek Wilk
On Tue, Jun 11, 2019 at 10:58:23AM -0700, Florian Fainelli wrote:
> Hi Christoph,

I pulled the patches in my tree. 
> 
> Still with my contrived memory layout where there is no physical memory
> the kernel can use below 4GB, it was possible to fail swiotlb_init(),
> but still not hit swiotlb_map_single() since all peripherals have a
> DMA_BIT_MASK() that is within the remaining addressable physical memory.
> 
> The second path could be backported to stable, but for the same reasons
> as the one we had just discussed before, this requires a very contrived
> test case that is not necessarily realistic or would warrant a stable
> backport IMHO.
> 
> Thanks!
> 
> Florian Fainelli (2):
>   swiotlb: Group identical cleanup in swiotlb_cleanup()
>   swiotlb: Return consistent SWIOTLB segments/nr_tbl
> 
>  kernel/dma/swiotlb.c | 26 ++
>  1 file changed, 14 insertions(+), 12 deletions(-)
> 
> -- 
> 2.17.1
> 


Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-11 Thread Jacob Pan
On Tue, 11 Jun 2019 15:35:22 +0100
Jean-Philippe Brucker  wrote:

> On 11/06/2019 10:36, Jonathan Cameron wrote:
> >> +/**
> >> + * ioasid_alloc - Allocate an IOASID
> >> + * @set: the IOASID set
> >> + * @min: the minimum ID (inclusive)
> >> + * @max: the maximum ID (inclusive)
> >> + * @private: data private to the caller
> >> + *
> >> + * Allocate an ID between @min and @max. The @private pointer is
> >> stored
> >> + * internally and can be retrieved with ioasid_find().
> >> + *
> >> + * Return: the allocated ID on success, or %INVALID_IOASID on
> >> failure.
> >> + */
> >> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min,
> >> ioasid_t max,
> >> +void *private)
> >> +{
> >> +  u32 id = INVALID_IOASID;
> >> +  struct ioasid_data *data;
> >> +
> >> +  data = kzalloc(sizeof(*data), GFP_KERNEL);
> >> +  if (!data)
> >> +  return INVALID_IOASID;
> >> +
> >> +  data->set = set;
> >> +  data->private = private;
> >> +
> >> +  if (xa_alloc(_xa, , data, XA_LIMIT(min, max),
> >> GFP_KERNEL)) {
> >> +  pr_err("Failed to alloc ioasid from %d to %d\n",
> >> min, max);
> >> +  goto exit_free;
> >> +  }
> >> +  data->id = id;
> >> +
> >> +exit_free:  
> > 
> > This error flow is perhaps a little more confusing than it needs to
> > be?
> > 
> > My assumption (perhaps wrong) is that we only have an id ==
> > INVALID_IOASID if the xa_alloc fails, and that we will always have
> > such an id value if it does (I'm not totally sure this second
> > element is true in __xa_alloc).
> > 
> > If I'm missing something perhaps a comment on how else we'd get
> > here.  
> 
> Yes we can simplify this:
> 
>   return id;
>   exit_free:
>   kfree(data)
>   return INVALID_IOASID;
>   }
> 
> The XA API doesn't say that @id passed to xa_alloc() won't be modified
> in case of error. It's true in the current implementation, but won't
> necessarily stay that way. On the other hand I think it's safe to
> expect @id to always be set when xa_alloc() succeeds.
> 
the flow with custom allocator is slightly different, but you are right
I can simplify it as you suggested.
Jonathan, I will add you to the cc list in next version. If you could
also review the current version, it would be greatly appreciated.

https://lore.kernel.org/lkml/1560087862-57608-13-git-send-email-jacob.jun@linux.intel.com/
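For reference, the simplified error flow being suggested would make
ioasid_alloc() look roughly like this. It is reconstructed from the quoted
fragment; the xarray name ioasid_xa is an assumption, since the archive
stripped some tokens, and a direct kfree() is used instead of the label:

	ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
			      void *private)
	{
		u32 id;
		struct ioasid_data *data;

		data = kzalloc(sizeof(*data), GFP_KERNEL);
		if (!data)
			return INVALID_IOASID;

		data->set = set;
		data->private = private;

		if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
			pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
			kfree(data);
			return INVALID_IOASID;
		}
		data->id = id;

		return id;
	}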

> >> +/**
> >> + * ioasid_find - Find IOASID data
> >> + * @set: the IOASID set
> >> + * @ioasid: the IOASID to find
> >> + * @getter: function to call on the found object
> >> + *
> >> + * The optional getter function allows to take a reference to the
> >> found object
> >> + * under the rcu lock. The function can also check if the object
> >> is still valid:
> >> + * if @getter returns false, then the object is invalid and NULL
> >> is returned.
> >> + *
> >> + * If the IOASID has been allocated for this set, return the
> >> private pointer
> >> + * passed to ioasid_alloc. Private data can be NULL if not set.
> >> Return an error
> >> + * if the IOASID is not found or does not belong to the set.  
> > 
> > Perhaps should make it clear that @set can be null.  
> 
> Indeed. But I'm not sure allowing @set to be NULL is such a good idea,
> because the data type associated to an ioasid depends on its set. For
> example SVA will put an mm_struct in there, and auxiliary domains use
> some structure private to the IOMMU domain.
> 
I am not sure we need to count on @set to decipher the data type. Whoever
does the allocation and owns the IOASID should know its own data type.
My thought was that @set is only used to group IDs, permission check
etc.

> Jacob, could me make @set mandatory, or do you see a use for a global
> search? If @set is NULL, then callers can check if the return pointer
> is NULL, but will run into trouble if they try to dereference it.
> 
A global search use case can be for PRQ. The IOMMU driver gets an IOASID
(first an interrupt, then retrieves it from a queue), and it has no idea which
@set it belongs to. But the data types are the same for all IOASIDs
used by the IOMMU.
If @set is NULL, the search does not check for a set match. It is separate
from the return pointer. Sorry, I am not seeing the problem here.

> >   
> >> + */
> >> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
> >> +bool (*getter)(void *))
> >> +{
> >> +  void *priv = NULL;  
> > 
> > Set in all paths, so doesn't need to be set here.  
> 
> Right
> 
> Thanks,
> Jean

[Jacob Pan]


[PATCH 1/2] swiotlb: Group identical cleanup in swiotlb_cleanup()

2019-06-11 Thread Florian Fainelli
Avoid repeating the zeroing of global swiotlb variables in two locations
and introduce swiotlb_cleanup() to do that.

Signed-off-by: Florian Fainelli 
---
 kernel/dma/swiotlb.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 13f0cb080a4d..b2b5c5df273c 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -317,6 +317,14 @@ swiotlb_late_init_with_default_size(size_t default_size)
return rc;
 }
 
+static void swiotlb_cleanup(void)
+{
+   io_tlb_end = 0;
+   io_tlb_start = 0;
+   io_tlb_nslabs = 0;
+   max_segment = 0;
+}
+
 int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
@@ -367,10 +375,7 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 sizeof(int)));
io_tlb_list = NULL;
 cleanup3:
-   io_tlb_end = 0;
-   io_tlb_start = 0;
-   io_tlb_nslabs = 0;
-   max_segment = 0;
+   swiotlb_cleanup();
return -ENOMEM;
 }
 
@@ -394,10 +399,7 @@ void __init swiotlb_exit(void)
memblock_free_late(io_tlb_start,
   PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
}
-   io_tlb_start = 0;
-   io_tlb_end = 0;
-   io_tlb_nslabs = 0;
-   max_segment = 0;
+   swiotlb_cleanup();
 }
 
 /*
-- 
2.17.1



[PATCH 2/2] swiotlb: Return consistent SWIOTLB segments/nr_tbl

2019-06-11 Thread Florian Fainelli
With a specifically contrived memory layout where there is no physical
memory available to the kernel below the 4GB boundary, we will fail to
perform the initial swiotlb_init() call and set no_iotlb_memory to true.

There are drivers out there that call into swiotlb_nr_tbl() to determine
whether they can use the SWIOTLB. With the right DMA_BIT_MASK() value
for these drivers (say 64-bit), they won't ever need to hit
swiotlb_tbl_map_single(), so this can go unnoticed and we could be
lying to those drivers.

Signed-off-by: Florian Fainelli 
---
 kernel/dma/swiotlb.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index b2b5c5df273c..e906ef2e6315 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -129,15 +129,17 @@ setup_io_tlb_npages(char *str)
 }
 early_param("swiotlb", setup_io_tlb_npages);
 
+static bool no_iotlb_memory;
+
 unsigned long swiotlb_nr_tbl(void)
 {
-   return io_tlb_nslabs;
+   return unlikely(no_iotlb_memory) ? 0 : io_tlb_nslabs;
 }
 EXPORT_SYMBOL_GPL(swiotlb_nr_tbl);
 
 unsigned int swiotlb_max_segment(void)
 {
-   return max_segment;
+   return unlikely(no_iotlb_memory) ? 0 : max_segment;
 }
 EXPORT_SYMBOL_GPL(swiotlb_max_segment);
 
@@ -160,8 +162,6 @@ unsigned long swiotlb_size_or_default(void)
return size ? size : (IO_TLB_DEFAULT_SIZE);
 }
 
-static bool no_iotlb_memory;
-
 void swiotlb_print_info(void)
 {
unsigned long bytes = io_tlb_nslabs << IO_TLB_SHIFT;
-- 
2.17.1



[PATCH 0/2] swiotlb: Cleanup and consistency fix

2019-06-11 Thread Florian Fainelli
Hi Christoph,

Still with my contrived memory layout where there is no physical memory
the kernel can use below 4GB, it was possible to fail swiotlb_init(),
but still not hit swiotlb_map_single() since all peripherals have a
DMA_BIT_MASK() that is within the remaining addressable physical memory.

The second path could be backported to stable, but for the same reasons
as the one we had just discussed before, this requires a very contrived
test case that is not necessarily realistic or would warrant a stable
backport IMHO.

Thanks!

Florian Fainelli (2):
  swiotlb: Group identical cleanup in swiotlb_cleanup()
  swiotlb: Return consistent SWIOTLB segments/nr_tbl

 kernel/dma/swiotlb.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

-- 
2.17.1



RE: Device specific pass through in host systems - discuss user interface

2019-06-11 Thread Prakhya, Sai Praneeth
> > > > Sure! Makes sense.. per-group default domain type sounds good.
> >
> > I am planning to implement an RFC (supporting only runtime case for
> > now) which works as below
> >
> > 1. User unbinds the driver by writing to sysfs 2. User puts a group in
> > pass through mode by writing "1" to
> > /sys/kernel/iommu_groups//pt
> 
> might be better to read current value of default domain for that group..
> /sys/kernel/iommu_groups//default_domain

Presently, we already have a file that gives out "type" of default_domain and 
the file is
/sys/kernel/iommu_groups//type

> reading the above value shows current setting.
> provide a different file next_def_domain, and you can echo "pt" or
> "dma_domain"
> to switch to pass-through, or normal dma isolation mode.

We have couple of options here:

1. Since we already have the "type" file, which is read-only, we could make it
R/W.

The present value shows the existing type of the default domain.
If the user wants to change it (e.g. from DMA to IDENTITY or vice versa), he
attempts to write the new value.
The kernel performs checks to make sure that the driver is unbound and it's safe
to change the default domain type.
After successfully changing the default_domain type internally, the kernel reflects
the new value in the file.
Any errors in the process will be reported in dmesg. (A rough sketch of this
approach is included below.)

2. As you have suggested, we could have a *new* file named 
"next_def_domain_type", which takes string as an input.

Please let me know if there is any preference among these approaches and why :)
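
For what it's worth, a very rough sketch of option 1 (making the existing
"type" attribute writable) could look like the lines below. Note that
iommu_group_change_type() is a hypothetical helper that would perform the
unbind/safety checks and swap the default domain; sysfs_streq(), the
IOMMU_DOMAIN_* types and the IOMMU_GROUP_ATTR() pattern are existing kernel
pieces, and the show helper name is assumed to match the current read-only
attribute:

static ssize_t iommu_group_store_type(struct iommu_group *group,
                                      const char *buf, size_t count)
{
        int ret;

        /*
         * iommu_group_change_type() (hypothetical) verifies that no
         * driver is bound to the group's devices and then swaps the
         * default domain to the requested type.
         */
        if (sysfs_streq(buf, "identity"))
                ret = iommu_group_change_type(group, IOMMU_DOMAIN_IDENTITY);
        else if (sysfs_streq(buf, "DMA"))
                ret = iommu_group_change_type(group, IOMMU_DOMAIN_DMA);
        else
                return -EINVAL;

        return ret ?: count;
}

static IOMMU_GROUP_ATTR(type, 0644, iommu_group_show_type,
                        iommu_group_store_type);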

Regards,
Sai


Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-11 Thread Jacob Pan
On Tue, 11 Jun 2019 15:37:42 +0100
Jean-Philippe Brucker  wrote:

> On 11/06/2019 13:26, Jacob Pan wrote:
> >> +/**
> >> + * ioasid_set_data - Set private data for an allocated ioasid
> >> + * @ioasid: the ID to set data
> >> + * @data:   the private data
> >> + *
> >> + * For IOASID that is already allocated, private data can be set
> >> + * via this API. Future lookup can be done via ioasid_find.
> >> + */
> >> +int ioasid_set_data(ioasid_t ioasid, void *data)
> >> +{
> >> +  struct ioasid_data *ioasid_data;
> >> +  int ret = 0;
> >> +
> >> +  xa_lock(_xa);  
> > Just wondering if this is necessary, since xa_load is under
> > rcu_read_lock and we are not changing anything internal to xa. For
> > custom allocator I still need to have the mutex against allocator
> > removal.  
> 
> I think we do need this because of a possible race with ioasid_free():
> 
>  CPU1  CPU2
>   ioasid_free(ioasid) ioasid_set_data(ioasid, foo)
> data = xa_load(...)
> xa_erase(...)
> kfree_rcu(data)   (no RCU lock held)
> ...free(data)
> data->private = foo;
> 
make sense, thanks for explaining.

> The issue is theoretical at the moment because no users do this, but
> I'd be more comfortable taking the xa_lock, which prevents a
> concurrent xa_erase()+free(). (I commented on your v3 but you might
> have missed it)
> 
Did you reply to my v3? I did not see it. I only saw your comments about
v3 in your commit message.

> >> +  ioasid_data = xa_load(_xa, ioasid);
> >> +  if (ioasid_data)
> >> +  rcu_assign_pointer(ioasid_data->private, data);  
> > it is good to publish and have barrier here. But I just wonder even
> > for weakly ordered machine, this pointer update is quite far away
> > from its data update.  
> 
> I don't know, it could be right before calling ioasid_set_data():
> 
>   mydata = kzalloc(sizeof(*mydata));
>   mydata->ops = _ops;  (1)
>   ioasid_set_data(ioasid, mydata);
>   ... /* no write barrier here */
>   data->private = mydata; (2)
> 
> And then another thread calls ioasid_find():
> 
>   mydata = ioasid_find(ioasid);
>   if (mydata)
>   mydata->ops->do_something();
> 
> On a weakly ordered machine, this thread could observe the pointer
> assignment (2) before the ops assignment (1), and dereference NULL.
> Using rcu_assign_pointer() should fix that
> 
I agree it is better to have the barrier. Just thought there is already
a rcu_read_lock() in xa_load() in between. rcu_read_lock() may have
barrier in some case but better not count on it. No issues here. I will
integrate this in the next version.

> Thanks,
> Jean

[Jacob Pan]
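
For readers following along, the pairing being discussed boils down to the
sketch below (struct my_data/my_ops are illustrative, not from the series):
rcu_assign_pointer() orders the object's initialisation before its
publication, and the lookup side pairs with it via rcu_dereference() under
rcu_read_lock():

struct my_ops {
        void (*do_something)(void);
};

struct my_data {
        const struct my_ops *ops;
};

/* Publisher: fully initialise the object, then publish the pointer. */
static int my_publish(struct ioasid_data *data, const struct my_ops *ops)
{
        struct my_data *mydata = kzalloc(sizeof(*mydata), GFP_KERNEL);

        if (!mydata)
                return -ENOMEM;
        mydata->ops = ops;                         /* (1) */
        rcu_assign_pointer(data->private, mydata); /* (2) orders (1) first */
        return 0;
}

/* Reader: pairs with the publisher, never sees a half-initialised object. */
static void my_consume(struct ioasid_data *data)
{
        struct my_data *mydata;

        rcu_read_lock();
        mydata = rcu_dereference(data->private);
        if (mydata)
                mydata->ops->do_something();
        rcu_read_unlock();
}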


Re: [PATCH 0/6] iommu/vt-d: Fixes and cleanups for linux-next

2019-06-11 Thread Qian Cai
On Sun, 2019-06-09 at 10:37 +0800, Lu Baolu wrote:
> Hi Joerg,
> 
> This series includes several fixes and cleanups after delegating
> DMA domain to generic iommu. Please review and consider them for
> linux-next.
> 
> Best regards,
> Baolu
> 
> Lu Baolu (5):
>   iommu/vt-d: Don't return error when device gets right domain
>   iommu/vt-d: Set domain type for a private domain
>   iommu/vt-d: Don't enable iommu's which have been ignored
>   iommu/vt-d: Fix suspicious RCU usage in probe_acpi_namespace_devices()
>   iommu/vt-d: Consolidate domain_init() to avoid duplication
> 
> Sai Praneeth Prakhya (1):
>   iommu/vt-d: Cleanup after delegating DMA domain to generic iommu
> 
>  drivers/iommu/intel-iommu.c | 210 +---
>  1 file changed, 53 insertions(+), 157 deletions(-)
> 

BTW, the linux-next commit "iommu/vt-d: Expose ISA direct mapping region via
iommu_get_resv_regions" [1] also introduced a memory leak below, as it forgets
to ask intel_iommu_put_resv_regions() to call kfree() when
CONFIG_INTEL_IOMMU_FLOPPY_WA=y.

[1] https://lore.kernel.org/patchwork/patch/1078963/

unreferenced object 0x88912ef789c8 (size 64):
  comm "swapper/0", pid 1, jiffies 4294946232 (age 5399.530s)
  hex dump (first 32 bytes):
48 83 f7 2e 91 88 ff ff 30 fa e3 00 82 88 ff ff  H...0...
00 00 00 00 00 00 00 00 00 00 00 01 00 00 00 00  
  backtrace:
[] kmem_cache_alloc_trace+0x266/0x380
[] iommu_alloc_resv_region+0x40/0xb0
[] intel_iommu_get_resv_regions+0x25e/0x2d0
[<21fbc6c3>] iommu_group_create_direct_mappings+0x159/0x3d0
[<22259268>] iommu_group_add_device+0x17b/0x4f0
[<28b91093>] iommu_group_get_for_dev+0x153/0x460
[<577c33b4>] intel_iommu_add_device+0xc4/0x210
[<587b7492>] iommu_probe_device+0x63/0x80
[<4aa997d1>] add_iommu_group+0xe/0x20
[] bus_for_each_dev+0xf0/0x150
[] bus_set_iommu+0xc6/0x100
[] intel_iommu_init+0x682/0xb0a
[<226f7444>] pci_iommu_init+0x26/0x62
[<2d8694f5>] do_one_initcall+0xe5/0x3ea
[<4bc60101>] kernel_init_freeable+0x5ad/0x640
[<91b0bad6>] kernel_init+0x11/0x138



Re: [PATCH] dma-remap: Avoid de-referencing NULL atomic_pool

2019-06-11 Thread Florian Fainelli
On 6/10/19 10:26 PM, Christoph Hellwig wrote:
> Looks good to me.  When did this start to show up?  Do we need
> to push it to Linus this cycle and cc stable?

You need a really contrived memory map layout to reach that situation,
so I don't think it warrants a stable backport.

For ARM64, this seems to date back to 3.18 with
d4932f9e81ae7a7bf3c3967e48373909b9c98ee5 ("arm64: add atomic pool for
non-coherent and CMA allocations") and for ARM 32-bit, I don't think you
could hit that condition since it would mean no lowmem, which would not
lead to a working kernel.
-- 
Florian


Re: [PATCH v1 0/3] PCIe and AR8151 on APQ8098/MSM8998

2019-06-11 Thread Marc Gonzalez
[ Trimming recipients list ]

On 11/06/2019 17:55, Lorenzo Pieralisi wrote:

> On Thu, Mar 28, 2019 at 05:59:48PM +0100, Marc Gonzalez wrote:
> 
>> After a lot of poking, I am finally able to use the AR8151 ethernet on the 
>> APQ8098 board.
>> The magic bits are the iommu-map prop and the PCIE20_PARF_BDF_TRANSLATE_N 
>> setup.
>>
>> The WIP thread is archived here:
>> https://marc.info/?t=15505953924=1=2
>>
>>
>> Marc Gonzalez (3):
>>   PCI: qcom: Setup PCIE20_PARF_BDF_TRANSLATE_N
>>   arm64: dts: qcom: msm8998: Add PCIe SMMU node
>>   arm64: dts: qcom: msm8998: Add PCIe PHY and RC nodes
>>
>>  arch/arm64/boot/dts/qcom/msm8998.dtsi  | 93 ++
>>  drivers/pci/controller/dwc/pcie-qcom.c |  4 ++
>>  2 files changed, 97 insertions(+)
> 
> Marc,
> 
> what's the plan with this series ? Please let me know so that
> I can handle it correctly in the PCI patch queue, I am not
> sure by reading comments it has evolved much since posting.

Hello Lorenzo,

You can ignore/drop this series, it has been superseded; FWIW, the latest
patches no longer touch drivers/pci

The hold-up was finding an acceptable work-around for the 

Re: [PATCH v1 0/3] PCIe and AR8151 on APQ8098/MSM8998

2019-06-11 Thread Lorenzo Pieralisi
On Thu, Mar 28, 2019 at 05:59:48PM +0100, Marc Gonzalez wrote:
> Hello everyone,
> 
> After a lot of poking, I am finally able to use the AR8151 ethernet on the 
> APQ8098 board.
> The magic bits are the iommu-map prop and the PCIE20_PARF_BDF_TRANSLATE_N 
> setup.
> 
> The WIP thread is archived here:
> https://marc.info/?t=15505953924=1=2
> 
> 
> Marc Gonzalez (3):
>   PCI: qcom: Setup PCIE20_PARF_BDF_TRANSLATE_N
>   arm64: dts: qcom: msm8998: Add PCIe SMMU node
>   arm64: dts: qcom: msm8998: Add PCIe PHY and RC nodes
> 
>  arch/arm64/boot/dts/qcom/msm8998.dtsi  | 93 ++
>  drivers/pci/controller/dwc/pcie-qcom.c |  4 ++
>  2 files changed, 97 insertions(+)

Marc,

what's the plan with this series ? Please let me know so that
I can handle it correctly in the PCI patch queue, I am not
sure by reading comments it has evolved much since posting.

Thanks,
Lorenzo


Re: [bug report] vfio: Can't find phys by iova in vfio_unmap_unpin()

2019-06-11 Thread Alex Williamson
[cc +iommu]

On Tue, 11 Jun 2019 20:40:41 +0800
Jiangyiwen  wrote:

> Hi Alex,
> 
> I found this problem is not very easy to solve, for
> now, in arm64 platform, the "0" physical address
> is a valid system memory address, so in function
> arm_smmu_iova_to_phys() I think it should not use
> "0" as abnormal return value.
> 
> Do you have any idea?

I think you're going to need to redefine iommu_iova_to_phys() and fix
all the IOMMU implementations of it to comply.  Currently the AMD and Intel
IOMMU drivers return zero if a mapping is not found.  You could make the
function return 0/errno and return the physical address via a pointer
arg.  You could also keep the existing definition, but introduce a test
for a valid result that might use an architecture specific value (akin
to IS_ERR()).  You could also just reserve the zero page from userspace
allocation.  I really don't want #ifdef in the vfio iommu driver trying
to discern the correct invalid value though.  Thanks,

Alex
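
As an illustration of the first option (purely a sketch: iommu_iova_to_phys2()
is a made-up name, and the per-driver ->iova_to_phys callbacks would need the
same conversion so that "not mapped" is reported explicitly rather than as a
zero address):

int iommu_iova_to_phys2(struct iommu_domain *domain, dma_addr_t iova,
                        phys_addr_t *phys)
{
        if (unlikely(!domain->ops->iova_to_phys))
                return -ENODEV;

        /*
         * The sketch still calls the existing callback; a real
         * conversion would let each driver return -errno for "not
         * mapped" instead of overloading a zero physical address.
         */
        *phys = domain->ops->iova_to_phys(domain, iova);
        return 0;
}

A caller such as vfio_unmap_unpin() (inside its existing unmap loop) would
then do:

        phys_addr_t phys;

        if (iommu_iova_to_phys2(domain, iova, &phys))
                continue;       /* nothing mapped at this iova */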

> On 2019/6/11 11:21, jiangyiwen wrote:
> > On 2019/5/21 3:28, Alex Williamson wrote:  
> >> On Mon, 20 May 2019 15:50:11 +0800
> >> jiangyiwen  wrote:
> >>  
> >>> Hello alex,
> >>>
> >>> We test a call trace as follows use ARM64 architecture,
> >>> it prints a WARN_ON() when find not physical address by
> >>> iova in vfio_unmap_unpin(), I can't find the cause of
> >>> problem now, do you have any ideas?  
> >> Is it reproducible?  Can you explain how to reproduce it?  The stack
> >> trace indicates a KVM VM is being shutdown and we're trying to clean
> >> out the IOMMU mappings from the domain and find a page that we think
> >> should be mapped that the IOMMU doesn't have mapped.  What device(s) was
> >> assigned to the VM?  This could be an IOMMU driver bug or a
> >> vfio_iommu_type1 bug.  Have you been able to reproduce this on other
> >> platforms?
> >>  
> > Hello Alex,
> >
> > Sorry to reply you so late because of some things,
> > this problem's reason is in some platform (like ARM64),
> > the "0" physical address is valid and can be used for
> > system memory, so in this case it should not print a
> > WARN_ON() and continue, we should unmap and unpin this
> > "0" physical address in these platform.
> >
> > So I want to return L instead of "0" as invalid
> > physical address in function iommu_iova_to_phys(). Do you think
> > it's appropriate?
> >
> > Thanks,
> > Yiwen.
> >  
> >>> In addition, I want to know why there is a WARN_ON() instead
> >>> of BUG_ON()? Does it affect the follow-up process?  
> >> We're removing an IOMMU page mapping entry and find that it's not
> >> present, so ultimately the effect at the IOMMU is the same, there's no
> >> mapping at that address, but I can't say without further analysis
> >> whether that means a page remains pinned or if that inconsistency was
> >> resolved previously elsewhere.  We WARN_ON because this is not what we
> >> expect, but potentially leaking a page of memory doesn't seem worthy of
> >> crashing the host, nor would a crash dump at that point necessarily aid
> >> in resolving the missing page as it potentially occurred well in the
> >> past.  Thanks,
> >>
> >> Alex
> >>
> >> .
> >>  
> >
> >
> > .
> >  
> 



Re: How to resolve an issue in swiotlb environment?

2019-06-11 Thread Alan Stern
On Tue, 11 Jun 2019, Christoph Hellwig wrote:

> Hi Alan,
> 
> thanks for the explanation.  It seems like what usb wants is to:
> 
>  - set sg_tablesize to 1 for devices that can't handle scatterlist at all

Hmmm.  usb-storage (and possibly other drivers too) currently handles
such controllers by setting up an SG transfer as a series of separate
URBs, one for each scatterlist entry.  But this is not the same thing,
for two reasons:

It has less I/O overhead than setting sg_tablesize to 1 because 
it sets up the whole transfer as a single SCSI command, which 
requires much less time and traffic on the USB bus than sending 
multiple commands.

It has that requirement about each scatterlist element except
the last being a multiple of the maximum packet size in length.
(This is because the USB protocol says that a transfer ends
whenever a less-than-maximum-size packet is encountered.)

We would like to avoid the extra I/O overhead for host controllers that
can't handle SG.  In fact, switching to sg_tablesize = 1 would probably
be considered a regression.

>  - set the virt boundary as-is for devices supporting "basic" scatterlist,
>although that still assumes they can rejiggle them because for example
>you could still get a smaller than expected first segment ala (assuming
>a 1024 byte packet size and thus 1023 virt_boundary_mask):
> 
> | 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |
> 
>as the virt_bondary does not guarantee that the first segment is
>the same size as all the mid segments.

But that is exactly the problem we need to solve.

The issue which prompted the commit this thread is about arose in a
situation where the block layer set up a scatterlist containing buffer
sizes something like:

4096 4096 1536 1024

and the maximum packet size was 1024.  The situation was a little 
unusual, because it involved vhci-hcd (a virtual HCD).  This doesn't 
matter much in normal practice because:

Block devices normally have a block size of 512 bytes or more.
Smaller values are very uncommon.  So scatterlist element sizes
are always divisible by 512.

xHCI is the only USB host controller type with a maximum packet 
size larger than 512, and xHCI hardware can do full 
scatter-gather so it doesn't care what the buffer sizes are.

So another approach would be to fix vhci-hcd and then trust that the
problem won't arise again, for the reasons above.  We would be okay so
long as nobody tried to use a USB-SCSI device with a block size of 256
bytes or less.

>  - do not set any limit on xhci
> 
> But that just goes back to the original problem, and that is that with
> swiotlb we are limited in the total dma mapping size, and recent block
> layer changes in the way we handle the virt_boundary mean we now build
> much larger requests by default.  For SCSI ULDs to take that into
> account I need to call dma_max_mapping_size() and use that as the
> upper bound for the request size.  My plan is to do that in scsi_lib.c,
> but for that we need to expose the actual struct device that the dma
> mapping is perfomed on to the scsi layer.  If that device is different
> from the sysfs hierchary struct device, which it is for usb the ULDD
> needs to scsi_add_host_with_dma and pass the dma device as well.  How
> do I get at the dma device (aka the HCDs pci_dev or similar) from
> usb-storage/uas?

From usb_stor_probe2(): us->pusb_dev->bus->sysdev.
From uas_probe(): udev->bus->sysdev.

The ->sysdev field points to the device used for DMA mapping.  It is
often the same as ->controller, but sometimes it is
->controller->parent because of the peculiarities of some platforms.
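
Concretely, a sketch of what the usb-storage side might do with that pointer
(only indicative, not a tested change; scsi_add_host_with_dma() takes the
DMA-mapping device as its third argument):

        /* In usb_stor_probe2(), roughly: */
        struct device *dma_dev = us->pusb_dev->bus->sysdev;

        /*
         * Let the SCSI core evaluate dma_max_mapping_size() against the
         * device that actually does the DMA mapping, not the sysfs parent.
         */
        result = scsi_add_host_with_dma(us_to_host(us), &us->pusb_intf->dev,
                                        dma_dev);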

Alan Stern



Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-11 Thread Jean-Philippe Brucker
On 11/06/2019 13:26, Jacob Pan wrote:
>> +/**
>> + * ioasid_set_data - Set private data for an allocated ioasid
>> + * @ioasid: the ID to set data
>> + * @data:   the private data
>> + *
>> + * For IOASID that is already allocated, private data can be set
>> + * via this API. Future lookup can be done via ioasid_find.
>> + */
>> +int ioasid_set_data(ioasid_t ioasid, void *data)
>> +{
>> +struct ioasid_data *ioasid_data;
>> +int ret = 0;
>> +
>> +xa_lock(_xa);
> Just wondering if this is necessary, since xa_load is under
> rcu_read_lock and we are not changing anything internal to xa. For
> custom allocator I still need to have the mutex against allocator
> removal.

I think we do need this because of a possible race with ioasid_free():

 CPU1  CPU2
  ioasid_free(ioasid) ioasid_set_data(ioasid, foo)
data = xa_load(...)
xa_erase(...)
kfree_rcu(data)   (no RCU lock held)
...free(data)
data->private = foo;

The issue is theoretical at the moment because no users do this, but I'd
be more comfortable taking the xa_lock, which prevents a concurrent
xa_erase()+free(). (I commented on your v3 but you might have missed it)

>> +ioasid_data = xa_load(_xa, ioasid);
>> +if (ioasid_data)
>> +rcu_assign_pointer(ioasid_data->private, data);
> it is good to publish and have barrier here. But I just wonder even for
> weakly ordered machine, this pointer update is quite far away from its
> data update.

I don't know, it could be right before calling ioasid_set_data():

mydata = kzalloc(sizeof(*mydata));
mydata->ops = _ops;  (1)
ioasid_set_data(ioasid, mydata);
... /* no write barrier here */
data->private = mydata; (2)

And then another thread calls ioasid_find():

mydata = ioasid_find(ioasid);
if (mydata)
mydata->ops->do_something();

On a weakly ordered machine, this thread could observe the pointer
assignment (2) before the ops assignment (1), and dereference NULL.
Using rcu_assign_pointer() should fix that

Thanks,
Jean
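
In other words the protection comes from both paths serialising on the xarray
lock. ioasid_free() isn't quoted in this thread, but the relevant part
presumably looks roughly like the sketch below, with ioasid_set_data() holding
xa_lock around its xa_load() so that it either sees a live entry or none at
all:

void ioasid_free(ioasid_t ioasid)
{
        struct ioasid_data *data;

        xa_lock(&ioasid_xa);
        data = __xa_erase(&ioasid_xa, ioasid);
        xa_unlock(&ioasid_xa);

        /* Readers still inside rcu_read_lock() keep the object alive. */
        if (data)
                kfree_rcu(data, rcu);
}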


Re: [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID

2019-06-11 Thread Jean-Philippe Brucker
On 11/06/2019 11:45, Jonathan Cameron wrote:
>> +pci_disable_pasid(pdev);
>> +master->ssid_bits = 0;
> 
> If we are being really fussy about ordering, why have this set of
> ssid_bits after pci_disable_pasid rather than before (to reverse order
> of .._enable_pasid)?

Sure, I'll change that

Thanks,
Jean
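
For readers without the full patch at hand, the requested change simply
mirrors the enable order on teardown, roughly:

        /* enable:  pci_enable_pasid(pdev, ...); then master->ssid_bits = ...; */

        /* disable, in reverse: */
        master->ssid_bits = 0;
        pci_disable_pasid(pdev);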


Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs

2019-06-11 Thread Jean-Philippe Brucker
On 11/06/2019 11:19, Jonathan Cameron wrote:
>> +static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
>> +struct arm_smmu_master *master)
>> +{
>> +struct arm_smmu_device *smmu = smmu_domain->smmu;
>> +struct arm_smmu_s1_cfg *cfg = _domain->s1_cfg;
>>  
>> -cfg->cdptr[0] = cpu_to_le64(val);
>> +cfg->s1fmt = STRTAB_STE_0_S1FMT_LINEAR;
>> +cfg->s1cdmax = master->ssid_bits;
>> +return arm_smmu_alloc_cd_leaf_table(smmu, >table, 1 << 
>> cfg->s1cdmax);
>> +}
>>  
>> -val = cfg->cd.ttbr & CTXDESC_CD_1_TTB0_MASK;
>> -cfg->cdptr[1] = cpu_to_le64(val);
> 
> Hmm. Diff was having a field day in trying to make the patch as unreadable as 
> possible..

Ugh, yes. This part is a bit more readable with --patience, but I'll
also try to split the patch as you suggest

Thanks,
Jean



Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-11 Thread Jean-Philippe Brucker
On 11/06/2019 10:36, Jonathan Cameron wrote:
>> +/**
>> + * ioasid_alloc - Allocate an IOASID
>> + * @set: the IOASID set
>> + * @min: the minimum ID (inclusive)
>> + * @max: the maximum ID (inclusive)
>> + * @private: data private to the caller
>> + *
>> + * Allocate an ID between @min and @max. The @private pointer is stored
>> + * internally and can be retrieved with ioasid_find().
>> + *
>> + * Return: the allocated ID on success, or %INVALID_IOASID on failure.
>> + */
>> +ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
>> +  void *private)
>> +{
>> +u32 id = INVALID_IOASID;
>> +struct ioasid_data *data;
>> +
>> +data = kzalloc(sizeof(*data), GFP_KERNEL);
>> +if (!data)
>> +return INVALID_IOASID;
>> +
>> +data->set = set;
>> +data->private = private;
>> +
>> +if (xa_alloc(_xa, , data, XA_LIMIT(min, max), GFP_KERNEL)) {
>> +pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
>> +goto exit_free;
>> +}
>> +data->id = id;
>> +
>> +exit_free:
> 
> This error flow is perhaps a little more confusing than it needs to be?
> 
> My assumption (perhaps wrong) is that we only have an id == INVALID_IOASID
> if the xa_alloc fails, and that we will always have such an id value if
> it does (I'm not totally sure this second element is true in __xa_alloc).
> 
> If I'm missing something perhaps a comment on how else we'd get here.

Yes we can simplify this:

return id;
exit_free:
kfree(data)
return INVALID_IOASID;
}

The XA API doesn't say that @id passed to xa_alloc() won't be modified
in case of error. It's true in the current implementation, but won't
necessarily stay that way. On the other hand I think it's safe to expect
@id to always be set when xa_alloc() succeeds.
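
Put together, the simplified allocation path would read roughly as below (a
sketch assembled from the snippets in this thread, not necessarily the final
patch):

ioasid_t ioasid_alloc(struct ioasid_set *set, ioasid_t min, ioasid_t max,
                      void *private)
{
        u32 id;
        struct ioasid_data *data;

        data = kzalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
                return INVALID_IOASID;

        data->set = set;
        data->private = private;

        if (xa_alloc(&ioasid_xa, &id, data, XA_LIMIT(min, max), GFP_KERNEL)) {
                pr_err("Failed to alloc ioasid from %d to %d\n", min, max);
                kfree(data);
                return INVALID_IOASID;
        }
        data->id = id;

        return id;
}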

>> +/**
>> + * ioasid_find - Find IOASID data
>> + * @set: the IOASID set
>> + * @ioasid: the IOASID to find
>> + * @getter: function to call on the found object
>> + *
>> + * The optional getter function allows to take a reference to the found 
>> object
>> + * under the rcu lock. The function can also check if the object is still 
>> valid:
>> + * if @getter returns false, then the object is invalid and NULL is 
>> returned.
>> + *
>> + * If the IOASID has been allocated for this set, return the private pointer
>> + * passed to ioasid_alloc. Private data can be NULL if not set. Return an 
>> error
>> + * if the IOASID is not found or does not belong to the set.
> 
> Perhaps should make it clear that @set can be null.

Indeed. But I'm not sure allowing @set to be NULL is such a good idea,
because the data type associated to an ioasid depends on its set. For
example SVA will put an mm_struct in there, and auxiliary domains use
some structure private to the IOMMU domain.

Jacob, could we make @set mandatory, or do you see a use for a global
search? If @set is NULL, then callers can check if the return pointer is
NULL, but will run into trouble if they try to dereference it.

> 
>> + */
>> +void *ioasid_find(struct ioasid_set *set, ioasid_t ioasid,
>> +  bool (*getter)(void *))
>> +{
>> +void *priv = NULL;
> 
> Set in all paths, so does need to be set here.

Right

Thanks,
Jean


Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID

2019-06-11 Thread Jean-Philippe Brucker
On 11/06/2019 10:42, Jonathan Cameron wrote:
>> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
>> index 519e40fb23ce..b91df613385f 100644
>> --- a/include/linux/iommu.h
>> +++ b/include/linux/iommu.h
>> @@ -536,6 +536,7 @@ struct iommu_fwspec {
>>  struct fwnode_handle*iommu_fwnode;
>>  void*iommu_priv;
>>  u32 flags;
>> +u32 num_pasid_bits;
> 
> This structure has kernel doc so you need to add something for this.

Good catch

Thanks,
Jean
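
For completeness, the fix is just a matter of adding a line along these lines
to the existing struct iommu_fwspec kernel-doc block (wording is only a
suggestion):

 * @num_pasid_bits: number of PASID bits supported by this device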


Re: "iommu/vt-d: Delegate DMA domain to generic iommu" series breaks megaraid_sas

2019-06-11 Thread Qian Cai



> On Jun 10, 2019, at 9:41 PM, Lu Baolu  wrote:
> 
> Ah, good catch!
> 
> The device failed to be attached by a DMA domain. Can you please try the
> attached fix patch?

It works fine.

> 
> [  101.885468] pci :06:00.0: DMAR: Device is ineligible for IOMMU
> domain attach due to platform RMRR requirement.  Contact your platform
> vendor.
> [  101.900801] pci :06:00.0: Failed to add to iommu group 23: -1
> 
> Best regards,
> Baolu
> 
> On 6/10/19 10:54 PM, Qian Cai wrote:
>> On Mon, 2019-06-10 at 09:44 -0400, Qian Cai wrote:
>>> On Sun, 2019-06-09 at 10:43 +0800, Lu Baolu wrote:
 Hi Qian,
 
 I just posted some fix patches. I cc'ed them in your email inbox as
 well. Can you please check whether they happen to fix your issue?
 If not, do you mind posting more debug messages?
>>> 
>>> Unfortunately, it does not work. Here is the dmesg.
>>> 
>>> https://raw.githubusercontent.com/cailca/tmp/master/dmesg?token=AMC35QKPIZBYUM
>>> FUQKLW4ZC47ZPIK
>> This one should be good to view.
>> https://cailca.github.io/files/dmesg.txt
> <0001-iommu-vt-d-Allow-DMA-domain-attaching-to-rmrr-locked.patch>



[RFC CFT 6/6] iommu/arm-smmu-v3: Reduce contention during command-queue insertion

2019-06-11 Thread Will Deacon
The SMMU command queue is a bottleneck in large systems, thanks to the
spin_lock which serialises accesses from all CPUs to the single queue
supported by the hardware.

Attempt to improve this situation by moving to a new algorithm for
inserting commands into the queue, which is lock-free on the fast-path.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 579 +---
 1 file changed, 441 insertions(+), 138 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 85535400a365..1e9fa83b502a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -183,7 +183,7 @@
 
 #define Q_IDX(llq, p)  ((p) & ((1 << (llq)->max_n_shift) - 1))
 #define Q_WRP(llq, p)  ((p) & (1 << (llq)->max_n_shift))
-#define Q_OVERFLOW_FLAG(1 << 31)
+#define Q_OVERFLOW_FLAG(1U << 31)
 #define Q_OVF(p)   ((p) & Q_OVERFLOW_FLAG)
 #define Q_ENT(q, p)((q)->base +\
 Q_IDX(&((q)->llq), p) *\
@@ -301,6 +301,8 @@
 #define CMDQ_ERR_CERROR_ABT_IDX2
 #define CMDQ_ERR_CERROR_ATC_INV_IDX3
 
+#define CMDQ_PROD_OWNED_FLAG   Q_OVERFLOW_FLAG
+
 #define CMDQ_0_OP  GENMASK_ULL(7, 0)
 #define CMDQ_0_SSV (1UL << 11)
 
@@ -363,9 +365,8 @@
 #define PRIQ_1_ADDR_MASK   GENMASK_ULL(63, 12)
 
 /* High-level queue structures */
-#define ARM_SMMU_POLL_TIMEOUT_US   100
-#define ARM_SMMU_CMDQ_SYNC_TIMEOUT_US  100 /* 1s! */
-#define ARM_SMMU_CMDQ_SYNC_SPIN_COUNT  10
+#define ARM_SMMU_POLL_TIMEOUT_US   100 /* 1s! */
+#define ARM_SMMU_POLL_SPIN_COUNT   10
 
 #define MSI_IOVA_BASE  0x800
 #define MSI_IOVA_LENGTH0x10
@@ -467,15 +468,24 @@ struct arm_smmu_cmdq_ent {
 
#define CMDQ_OP_CMD_SYNC0x46
struct {
-   u32 msidata;
-   u64 msiaddr;
+   boolmsi;
} sync;
};
 };
 
 struct arm_smmu_ll_queue {
-   u32 prod;
-   u32 cons;
+   union {
+   u64 val;
+   struct {
+   u32 prod;
+   u32 cons;
+   };
+   struct {
+   atomic_tprod;
+   atomic_tcons;
+   } atomic;
+   u8  __pad[SMP_CACHE_BYTES];
+   } cacheline_aligned_in_smp;
u32 max_n_shift;
 };
 
@@ -493,9 +503,18 @@ struct arm_smmu_queue {
u32 __iomem *cons_reg;
 };
 
+struct arm_smmu_queue_poll {
+   ktime_t timeout;
+   unsigned intdelay;
+   unsigned intspin_cnt;
+   boolwfe;
+};
+
 struct arm_smmu_cmdq {
struct arm_smmu_queue   q;
-   spinlock_t  lock;
+   unsigned long   *valid_map;
+   atomic_towner_prod;
+   atomic_tlock;
 };
 
 struct arm_smmu_evtq {
@@ -595,12 +614,6 @@ struct arm_smmu_device {
 
struct arm_smmu_strtab_cfg  strtab_cfg;
 
-   /* Hi16xx adds an extra 32 bits of goodness to its MSI payload */
-   union {
-   u32 sync_count;
-   u64 padding;
-   };
-
/* IOMMU core code handle */
struct iommu_device iommu;
 };
@@ -696,9 +709,12 @@ static bool queue_empty(struct arm_smmu_ll_queue *q)
   Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
 }
 
-static void queue_sync_cons_in(struct arm_smmu_queue *q)
+static bool queue_consumed(struct arm_smmu_ll_queue *q, u32 prod)
 {
-   q->llq.cons = readl_relaxed(q->cons_reg);
+   return ((Q_WRP(q, q->cons) == Q_WRP(q, prod)) &&
+   (Q_IDX(q, q->cons) > Q_IDX(q, prod))) ||
+  ((Q_WRP(q, q->cons) != Q_WRP(q, prod)) &&
+   (Q_IDX(q, q->cons) <= Q_IDX(q, prod)));
 }
 
 static void queue_sync_cons_out(struct arm_smmu_queue *q)
@@ -729,46 +745,39 @@ static int queue_sync_prod_in(struct arm_smmu_queue *q)
return ret;
 }
 
-static void queue_sync_prod_out(struct arm_smmu_queue *q)
+static u32 queue_inc_prod_n(struct arm_smmu_ll_queue *q, int n)
 {
-   writel(q->llq.prod, q->prod_reg);
+   u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + n;
+   return Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
 }
 
 static void queue_inc_prod(struct arm_smmu_ll_queue *q)
 {
-   u32 prod = (Q_WRP(q, q->prod) 

[RFC CFT 5/6] iommu/arm-smmu-v3: Operate directly on low-level queue where possible

2019-06-11 Thread Will Deacon
In preparation for rewriting the command queue insertion code to use a
new algorithm, rework many of our queue macro accessors and manipulation
functions so that they operate on the arm_smmu_ll_queue structure where
possible. This will allow us to call these helpers on local variables
without having to construct a full-blown arm_smmu_queue on the stack.

No functional change.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 58 -
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index d72da799bd0a..85535400a365 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -181,12 +181,13 @@
 #define ARM_SMMU_MEMATTR_DEVICE_nGnRE  0x1
 #define ARM_SMMU_MEMATTR_OIWB  0xf
 
-#define Q_IDX(q, p)((p) & ((1 << (q)->llq.max_n_shift) - 
1))
-#define Q_WRP(q, p)((p) & (1 << (q)->llq.max_n_shift))
+#define Q_IDX(llq, p)  ((p) & ((1 << (llq)->max_n_shift) - 1))
+#define Q_WRP(llq, p)  ((p) & (1 << (llq)->max_n_shift))
 #define Q_OVERFLOW_FLAG(1 << 31)
 #define Q_OVF(p)   ((p) & Q_OVERFLOW_FLAG)
 #define Q_ENT(q, p)((q)->base +\
-Q_IDX(q, p) * (q)->ent_dwords)
+Q_IDX(&((q)->llq), p) *\
+(q)->ent_dwords)
 
 #define Q_BASE_RWA (1UL << 62)
 #define Q_BASE_ADDR_MASK   GENMASK_ULL(51, 5)
@@ -683,16 +684,16 @@ static void parse_driver_options(struct arm_smmu_device 
*smmu)
 }
 
 /* Low-level queue manipulation functions */
-static bool queue_full(struct arm_smmu_queue *q)
+static bool queue_full(struct arm_smmu_ll_queue *q)
 {
-   return Q_IDX(q, q->llq.prod) == Q_IDX(q, q->llq.cons) &&
-  Q_WRP(q, q->llq.prod) != Q_WRP(q, q->llq.cons);
+   return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
+  Q_WRP(q, q->prod) != Q_WRP(q, q->cons);
 }
 
-static bool queue_empty(struct arm_smmu_queue *q)
+static bool queue_empty(struct arm_smmu_ll_queue *q)
 {
-   return Q_IDX(q, q->llq.prod) == Q_IDX(q, q->llq.cons) &&
-  Q_WRP(q, q->llq.prod) == Q_WRP(q, q->llq.cons);
+   return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
+  Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
 }
 
 static void queue_sync_cons_in(struct arm_smmu_queue *q)
@@ -710,10 +711,10 @@ static void queue_sync_cons_out(struct arm_smmu_queue *q)
writel_relaxed(q->llq.cons, q->cons_reg);
 }
 
-static void queue_inc_cons(struct arm_smmu_queue *q)
+static void queue_inc_cons(struct arm_smmu_ll_queue *q)
 {
-   u32 cons = (Q_WRP(q, q->llq.cons) | Q_IDX(q, q->llq.cons)) + 1;
-   q->llq.cons = Q_OVF(q->llq.cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
+   u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
+   q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
 }
 
 static int queue_sync_prod_in(struct arm_smmu_queue *q)
@@ -733,10 +734,10 @@ static void queue_sync_prod_out(struct arm_smmu_queue *q)
writel(q->llq.prod, q->prod_reg);
 }
 
-static void queue_inc_prod(struct arm_smmu_queue *q)
+static void queue_inc_prod(struct arm_smmu_ll_queue *q)
 {
-   u32 prod = (Q_WRP(q, q->llq.prod) | Q_IDX(q, q->llq.prod)) + 1;
-   q->llq.prod = Q_OVF(q->llq.prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
+   u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
+   q->prod = Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
 }
 
 /*
@@ -753,7 +754,8 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool 
sync, bool wfe)
ARM_SMMU_CMDQ_SYNC_TIMEOUT_US :
ARM_SMMU_POLL_TIMEOUT_US);
 
-   while (queue_sync_cons_in(q), (sync ? !queue_empty(q) : queue_full(q))) 
{
+   while (queue_sync_cons_in(q),
+ (sync ? !queue_empty(>llq) : queue_full(>llq))) {
if (ktime_compare(ktime_get(), timeout) > 0)
return -ETIMEDOUT;
 
@@ -782,11 +784,11 @@ static void queue_write(__le64 *dst, u64 *src, size_t 
n_dwords)
 
 static int queue_insert_raw(struct arm_smmu_queue *q, u64 *ent)
 {
-   if (queue_full(q))
+   if (queue_full(>llq))
return -ENOSPC;
 
queue_write(Q_ENT(q, q->llq.prod), ent, q->ent_dwords);
-   queue_inc_prod(q);
+   queue_inc_prod(>llq);
queue_sync_prod_out(q);
return 0;
 }
@@ -801,11 +803,11 @@ static void queue_read(__le64 *dst, u64 *src, size_t 
n_dwords)
 
 static int queue_remove_raw(struct arm_smmu_queue *q, u64 *ent)
 {
-   if (queue_empty(q))
+   if (queue_empty(>llq))
return -EAGAIN;
 
queue_read(ent, Q_ENT(q, q->llq.cons), q->ent_dwords);
-   queue_inc_cons(q);
+   

[RFC CFT 1/6] iommu/arm-smmu-v3: Increase maximum size of queues

2019-06-11 Thread Will Deacon
We've been artificially limiting the size of our queues to 4k so that we
don't end up allocating huge amounts of physically-contiguous memory at
probe time. However, 4k is only enough for 256 commands in the command
queue, so instead let's try to allocate the largest queue that the SMMU
supports, retrying with a smaller size if the allocation fails.

The caveat here is that we have to limit our upper bound based on
CONFIG_CMA_ALIGNMENT to ensure that our queue allocations remain
naturally aligned, which is required by the SMMU architecture.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 54 +++--
 1 file changed, 38 insertions(+), 16 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 4d5a694f02c2..65de2458999f 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -191,6 +191,7 @@
 #define Q_BASE_RWA (1UL << 62)
 #define Q_BASE_ADDR_MASK   GENMASK_ULL(51, 5)
 #define Q_BASE_LOG2SIZEGENMASK(4, 0)
+#define Q_MAX_SZ_SHIFT (PAGE_SHIFT + CONFIG_CMA_ALIGNMENT)
 
 /*
  * Stream table.
@@ -289,8 +290,9 @@
FIELD_GET(ARM64_TCR_##fld, tcr))
 
 /* Command queue */
-#define CMDQ_ENT_DWORDS2
-#define CMDQ_MAX_SZ_SHIFT  8
+#define CMDQ_ENT_SZ_SHIFT  4
+#define CMDQ_ENT_DWORDS((1 << CMDQ_ENT_SZ_SHIFT) >> 3)
+#define CMDQ_MAX_SZ_SHIFT  (Q_MAX_SZ_SHIFT - CMDQ_ENT_SZ_SHIFT)
 
 #define CMDQ_CONS_ERR  GENMASK(30, 24)
 #define CMDQ_ERR_CERROR_NONE_IDX   0
@@ -336,14 +338,16 @@
 #define CMDQ_SYNC_1_MSIADDR_MASK   GENMASK_ULL(51, 2)
 
 /* Event queue */
-#define EVTQ_ENT_DWORDS4
-#define EVTQ_MAX_SZ_SHIFT  7
+#define EVTQ_ENT_SZ_SHIFT  5
+#define EVTQ_ENT_DWORDS((1 << EVTQ_ENT_SZ_SHIFT) >> 3)
+#define EVTQ_MAX_SZ_SHIFT  (Q_MAX_SZ_SHIFT - EVTQ_ENT_SZ_SHIFT)
 
 #define EVTQ_0_ID  GENMASK_ULL(7, 0)
 
 /* PRI queue */
-#define PRIQ_ENT_DWORDS2
-#define PRIQ_MAX_SZ_SHIFT  8
+#define PRIQ_ENT_SZ_SHIFT  4
+#define PRIQ_ENT_DWORDS((1 << PRIQ_ENT_SZ_SHIFT) >> 3)
+#define PRIQ_MAX_SZ_SHIFT  (Q_MAX_SZ_SHIFT - PRIQ_ENT_SZ_SHIFT)
 
 #define PRIQ_0_SID GENMASK_ULL(31, 0)
 #define PRIQ_0_SSIDGENMASK_ULL(51, 32)
@@ -798,7 +802,7 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 
*ent)
 /* High-level queue accessors */
 static int arm_smmu_cmdq_build_cmd(u64 *cmd, struct arm_smmu_cmdq_ent *ent)
 {
-   memset(cmd, 0, CMDQ_ENT_DWORDS << 3);
+   memset(cmd, 0, 1 << CMDQ_ENT_SZ_SHIFT);
cmd[0] |= FIELD_PREP(CMDQ_0_OP, ent->opcode);
 
switch (ent->opcode) {
@@ -2270,17 +2274,32 @@ static int arm_smmu_init_one_queue(struct 
arm_smmu_device *smmu,
   struct arm_smmu_queue *q,
   unsigned long prod_off,
   unsigned long cons_off,
-  size_t dwords)
+  size_t dwords, const char *name)
 {
-   size_t qsz = ((1 << q->max_n_shift) * dwords) << 3;
+   size_t qsz;
+
+   do {
+   qsz = ((1 << q->max_n_shift) * dwords) << 3;
+   q->base = dmam_alloc_coherent(smmu->dev, qsz, >base_dma,
+ GFP_KERNEL);
+   if (q->base || qsz < PAGE_SIZE)
+   break;
+
+   q->max_n_shift--;
+   } while (1);
 
-   q->base = dmam_alloc_coherent(smmu->dev, qsz, >base_dma, GFP_KERNEL);
if (!q->base) {
-   dev_err(smmu->dev, "failed to allocate queue (0x%zx bytes)\n",
-   qsz);
+   dev_err(smmu->dev,
+   "failed to allocate queue (0x%zx bytes) for %s\n",
+   qsz, name);
return -ENOMEM;
}
 
+   if (!WARN_ON(q->base_dma & (qsz - 1))) {
+   dev_info(smmu->dev, "allocated %u entries for %s\n",
+1 << q->max_n_shift, name);
+   }
+
q->prod_reg = arm_smmu_page1_fixup(prod_off, smmu);
q->cons_reg = arm_smmu_page1_fixup(cons_off, smmu);
q->ent_dwords   = dwords;
@@ -2300,13 +2319,15 @@ static int arm_smmu_init_queues(struct arm_smmu_device 
*smmu)
/* cmdq */
spin_lock_init(>cmdq.lock);
ret = arm_smmu_init_one_queue(smmu, >cmdq.q, ARM_SMMU_CMDQ_PROD,
- ARM_SMMU_CMDQ_CONS, CMDQ_ENT_DWORDS);
+ ARM_SMMU_CMDQ_CONS, CMDQ_ENT_DWORDS,
+ "cmdq");
if (ret)
return ret;
 
/* evtq */

[RFC CFT 4/6] iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue

2019-06-11 Thread Will Deacon
In preparation for rewriting the command queue insertion code to use a
new algorithm, introduce a new arm_smmu_ll_queue structure which contains
only the information necessary to perform queue arithmetic for a queue
and will later be extended so that we can perform complex atomic
manipulation on some of the fields.

No functional change.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 88 -
 1 file changed, 47 insertions(+), 41 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 1b7e8fe26c41..d72da799bd0a 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -181,8 +181,8 @@
 #define ARM_SMMU_MEMATTR_DEVICE_nGnRE  0x1
 #define ARM_SMMU_MEMATTR_OIWB  0xf
 
-#define Q_IDX(q, p)((p) & ((1 << (q)->max_n_shift) - 1))
-#define Q_WRP(q, p)((p) & (1 << (q)->max_n_shift))
+#define Q_IDX(q, p)((p) & ((1 << (q)->llq.max_n_shift) - 
1))
+#define Q_WRP(q, p)((p) & (1 << (q)->llq.max_n_shift))
 #define Q_OVERFLOW_FLAG(1 << 31)
 #define Q_OVF(p)   ((p) & Q_OVERFLOW_FLAG)
 #define Q_ENT(q, p)((q)->base +\
@@ -472,7 +472,14 @@ struct arm_smmu_cmdq_ent {
};
 };
 
+struct arm_smmu_ll_queue {
+   u32 prod;
+   u32 cons;
+   u32 max_n_shift;
+};
+
 struct arm_smmu_queue {
+   struct arm_smmu_ll_queuellq;
int irq; /* Wired interrupt */
 
__le64  *base;
@@ -480,9 +487,6 @@ struct arm_smmu_queue {
u64 q_base;
 
size_t  ent_dwords;
-   u32 max_n_shift;
-   u32 prod;
-   u32 cons;
 
u32 __iomem *prod_reg;
u32 __iomem *cons_reg;
@@ -681,19 +685,19 @@ static void parse_driver_options(struct arm_smmu_device 
*smmu)
 /* Low-level queue manipulation functions */
 static bool queue_full(struct arm_smmu_queue *q)
 {
-   return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
-  Q_WRP(q, q->prod) != Q_WRP(q, q->cons);
+   return Q_IDX(q, q->llq.prod) == Q_IDX(q, q->llq.cons) &&
+  Q_WRP(q, q->llq.prod) != Q_WRP(q, q->llq.cons);
 }
 
 static bool queue_empty(struct arm_smmu_queue *q)
 {
-   return Q_IDX(q, q->prod) == Q_IDX(q, q->cons) &&
-  Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
+   return Q_IDX(q, q->llq.prod) == Q_IDX(q, q->llq.cons) &&
+  Q_WRP(q, q->llq.prod) == Q_WRP(q, q->llq.cons);
 }
 
 static void queue_sync_cons_in(struct arm_smmu_queue *q)
 {
-   q->cons = readl_relaxed(q->cons_reg);
+   q->llq.cons = readl_relaxed(q->cons_reg);
 }
 
 static void queue_sync_cons_out(struct arm_smmu_queue *q)
@@ -703,13 +707,13 @@ static void queue_sync_cons_out(struct arm_smmu_queue *q)
 * are complete before we update the cons pointer.
 */
mb();
-   writel_relaxed(q->cons, q->cons_reg);
+   writel_relaxed(q->llq.cons, q->cons_reg);
 }
 
 static void queue_inc_cons(struct arm_smmu_queue *q)
 {
-   u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
-   q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
+   u32 cons = (Q_WRP(q, q->llq.cons) | Q_IDX(q, q->llq.cons)) + 1;
+   q->llq.cons = Q_OVF(q->llq.cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
 }
 
 static int queue_sync_prod_in(struct arm_smmu_queue *q)
@@ -717,22 +721,22 @@ static int queue_sync_prod_in(struct arm_smmu_queue *q)
int ret = 0;
u32 prod = readl_relaxed(q->prod_reg);
 
-   if (Q_OVF(prod) != Q_OVF(q->prod))
+   if (Q_OVF(prod) != Q_OVF(q->llq.prod))
ret = -EOVERFLOW;
 
-   q->prod = prod;
+   q->llq.prod = prod;
return ret;
 }
 
 static void queue_sync_prod_out(struct arm_smmu_queue *q)
 {
-   writel(q->prod, q->prod_reg);
+   writel(q->llq.prod, q->prod_reg);
 }
 
 static void queue_inc_prod(struct arm_smmu_queue *q)
 {
-   u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
-   q->prod = Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
+   u32 prod = (Q_WRP(q, q->llq.prod) | Q_IDX(q, q->llq.prod)) + 1;
+   q->llq.prod = Q_OVF(q->llq.prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
 }
 
 /*
@@ -781,7 +785,7 @@ static int queue_insert_raw(struct arm_smmu_queue *q, u64 
*ent)
if (queue_full(q))
return -ENOSPC;
 
-   queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
+   queue_write(Q_ENT(q, q->llq.prod), ent, q->ent_dwords);
queue_inc_prod(q);
queue_sync_prod_out(q);
return 0;
@@ -800,7 +804,7 @@ static int queue_remove_raw(struct 

[RFC CFT 3/6] iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro

2019-06-11 Thread Will Deacon
The Q_OVF macro doesn't need to access the arm_smmu_queue structure, so
drop the unused macro argument.

No functional change.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 2d756e63865b..1b7e8fe26c41 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -184,7 +184,7 @@
 #define Q_IDX(q, p)((p) & ((1 << (q)->max_n_shift) - 1))
 #define Q_WRP(q, p)((p) & (1 << (q)->max_n_shift))
 #define Q_OVERFLOW_FLAG(1 << 31)
-#define Q_OVF(q, p)((p) & Q_OVERFLOW_FLAG)
+#define Q_OVF(p)   ((p) & Q_OVERFLOW_FLAG)
 #define Q_ENT(q, p)((q)->base +\
 Q_IDX(q, p) * (q)->ent_dwords)
 
@@ -709,7 +709,7 @@ static void queue_sync_cons_out(struct arm_smmu_queue *q)
 static void queue_inc_cons(struct arm_smmu_queue *q)
 {
u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
-   q->cons = Q_OVF(q, q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
+   q->cons = Q_OVF(q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
 }
 
 static int queue_sync_prod_in(struct arm_smmu_queue *q)
@@ -717,7 +717,7 @@ static int queue_sync_prod_in(struct arm_smmu_queue *q)
int ret = 0;
u32 prod = readl_relaxed(q->prod_reg);
 
-   if (Q_OVF(q, prod) != Q_OVF(q, q->prod))
+   if (Q_OVF(prod) != Q_OVF(q->prod))
ret = -EOVERFLOW;
 
q->prod = prod;
@@ -732,7 +732,7 @@ static void queue_sync_prod_out(struct arm_smmu_queue *q)
 static void queue_inc_prod(struct arm_smmu_queue *q)
 {
u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
-   q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
+   q->prod = Q_OVF(q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
 }
 
 /*
@@ -1328,7 +1328,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void 
*dev)
} while (!queue_empty(q));
 
/* Sync our overflow flag, as we believe we're up to speed */
-   q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+   q->cons = Q_OVF(q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
return IRQ_HANDLED;
 }
 
@@ -1385,7 +1385,7 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void 
*dev)
} while (!queue_empty(q));
 
/* Sync our overflow flag, as we believe we're up to speed */
-   q->cons = Q_OVF(q, q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
+   q->cons = Q_OVF(q->prod) | Q_WRP(q, q->cons) | Q_IDX(q, q->cons);
writel(q->cons, q->cons_reg);
return IRQ_HANDLED;
 }
-- 
2.11.0



[RFC CFT 2/6] iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes

2019-06-11 Thread Will Deacon
In preparation for rewriting the command queue insertion code to use a
new algorithm, separate the software and hardware views of the prod and
cons indexes so that manipulating the software state doesn't
automatically update the hardware state at the same time.

No functional change.

Signed-off-by: Will Deacon 
---
 drivers/iommu/arm-smmu-v3.c | 36 ++--
 1 file changed, 22 insertions(+), 14 deletions(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 65de2458999f..2d756e63865b 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -691,17 +691,13 @@ static bool queue_empty(struct arm_smmu_queue *q)
   Q_WRP(q, q->prod) == Q_WRP(q, q->cons);
 }
 
-static void queue_sync_cons(struct arm_smmu_queue *q)
+static void queue_sync_cons_in(struct arm_smmu_queue *q)
 {
q->cons = readl_relaxed(q->cons_reg);
 }
 
-static void queue_inc_cons(struct arm_smmu_queue *q)
+static void queue_sync_cons_out(struct arm_smmu_queue *q)
 {
-   u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
-
-   q->cons = Q_OVF(q, q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
-
/*
 * Ensure that all CPU accesses (reads and writes) to the queue
 * are complete before we update the cons pointer.
@@ -710,7 +706,13 @@ static void queue_inc_cons(struct arm_smmu_queue *q)
writel_relaxed(q->cons, q->cons_reg);
 }
 
-static int queue_sync_prod(struct arm_smmu_queue *q)
+static void queue_inc_cons(struct arm_smmu_queue *q)
+{
+   u32 cons = (Q_WRP(q, q->cons) | Q_IDX(q, q->cons)) + 1;
+   q->cons = Q_OVF(q, q->cons) | Q_WRP(q, cons) | Q_IDX(q, cons);
+}
+
+static int queue_sync_prod_in(struct arm_smmu_queue *q)
 {
int ret = 0;
u32 prod = readl_relaxed(q->prod_reg);
@@ -722,12 +724,15 @@ static int queue_sync_prod(struct arm_smmu_queue *q)
return ret;
 }
 
+static void queue_sync_prod_out(struct arm_smmu_queue *q)
+{
+   writel(q->prod, q->prod_reg);
+}
+
 static void queue_inc_prod(struct arm_smmu_queue *q)
 {
u32 prod = (Q_WRP(q, q->prod) | Q_IDX(q, q->prod)) + 1;
-
q->prod = Q_OVF(q, q->prod) | Q_WRP(q, prod) | Q_IDX(q, prod);
-   writel(q->prod, q->prod_reg);
 }
 
 /*
@@ -744,7 +749,7 @@ static int queue_poll_cons(struct arm_smmu_queue *q, bool 
sync, bool wfe)
ARM_SMMU_CMDQ_SYNC_TIMEOUT_US :
ARM_SMMU_POLL_TIMEOUT_US);
 
-   while (queue_sync_cons(q), (sync ? !queue_empty(q) : queue_full(q))) {
+   while (queue_sync_cons_in(q), (sync ? !queue_empty(q) : queue_full(q))) 
{
if (ktime_compare(ktime_get(), timeout) > 0)
return -ETIMEDOUT;
 
@@ -778,6 +783,7 @@ static int queue_insert_raw(struct arm_smmu_queue *q, u64 
*ent)
 
queue_write(Q_ENT(q, q->prod), ent, q->ent_dwords);
queue_inc_prod(q);
+   queue_sync_prod_out(q);
return 0;
 }
 
@@ -796,6 +802,7 @@ static int queue_remove_raw(struct arm_smmu_queue *q, u64 
*ent)
 
queue_read(ent, Q_ENT(q, q->cons), q->ent_dwords);
queue_inc_cons(q);
+   queue_sync_cons_out(q);
return 0;
 }
 
@@ -1316,7 +1323,7 @@ static irqreturn_t arm_smmu_evtq_thread(int irq, void 
*dev)
 * Not much we can do on overflow, so scream and pretend we're
 * trying harder.
 */
-   if (queue_sync_prod(q) == -EOVERFLOW)
+   if (queue_sync_prod_in(q) == -EOVERFLOW)
dev_err(smmu->dev, "EVTQ overflow detected -- events 
lost\n");
} while (!queue_empty(q));
 
@@ -1373,7 +1380,7 @@ static irqreturn_t arm_smmu_priq_thread(int irq, void 
*dev)
while (!queue_remove_raw(q, evt))
arm_smmu_handle_ppr(smmu, evt);
 
-   if (queue_sync_prod(q) == -EOVERFLOW)
+   if (queue_sync_prod_in(q) == -EOVERFLOW)
dev_err(smmu->dev, "PRIQ overflow detected -- requests 
lost\n");
} while (!queue_empty(q));
 
@@ -1564,8 +1571,9 @@ static void arm_smmu_tlb_inv_context(void *cookie)
/*
 * NOTE: when io-pgtable is in non-strict mode, we may get here with
 * PTEs previously cleared by unmaps on the current CPU not yet visible
-* to the SMMU. We are relying on the DSB implicit in queue_inc_prod()
-* to guarantee those are observed before the TLBI. Do be careful, 007.
+* to the SMMU. We are relying on the DSB implicit in
+* queue_sync_prod_out() to guarantee those are observed before the
+* TLBI. Do be careful, 007.
 */
arm_smmu_cmdq_issue_cmd(smmu, );
arm_smmu_cmdq_issue_sync(smmu);
-- 
2.11.0



[RFC CFT 0/6] Try to reduce lock contention on the SMMUv3 command queue

2019-06-11 Thread Will Deacon
Hi all,

This patch series is an attempt to reduce lock contention when inserting
commands into the Arm SMMUv3 command queue. Unfortunately, our initial
benchmarking has shown mixed results across the board and the changes in
the last patch don't appear to justify their complexity. Based on that,
I only plan to queue the first patch for the time being.

Anyway, before I park this series, I thought it was probably worth
sharing it in case it's useful to somebody. If you have a system where
you believe I/O performance to be limited by the SMMUv3 command queue
then please try these patches and let me know what happens, even if it's
just more bad news.

Patches based on 5.2-rc3. I've also pushed them out to my iommu/devel
branch for the moment:

  
https://git.kernel.org/pub/scm/linux/kernel/git/will/linux.git/log/?h=iommu/devel

Thanks,

Will

--->8

Cc: Jean-Philippe Brucker 
Cc: Robin Murphy 
Cc: Jayachandran Chandrasekharan Nair 
Cc: Jan Glauber 
Cc: Jon Masters 
Cc: Eric Auger 
Cc: Zhen Lei 
Cc: Jonathan Cameron 
Cc: Vijay Kilary 
Cc: Joerg Roedel 

Will Deacon (6):
  iommu/arm-smmu-v3: Increase maximum size of queues
  iommu/arm-smmu-v3: Separate s/w and h/w views of prod and cons indexes
  iommu/arm-smmu-v3: Drop unused 'q' argument from Q_OVF macro
  iommu/arm-smmu-v3: Move low-level queue fields out of arm_smmu_queue
  iommu/arm-smmu-v3: Operate directly on low-level queue where possible
  iommu/arm-smmu-v3: Reduce contention during command-queue insertion

 drivers/iommu/arm-smmu-v3.c | 725 
 1 file changed, 534 insertions(+), 191 deletions(-)

-- 
2.11.0



Re: [PATCH v8 26/29] vfio-pci: Register an iommu fault handler

2019-06-11 Thread Jean-Philippe Brucker
On 10/06/2019 22:31, Jacob Pan wrote:
> On Mon, 10 Jun 2019 13:45:02 +0100
> Jean-Philippe Brucker  wrote:
> 
>> On 07/06/2019 18:43, Jacob Pan wrote:
> So it seems we agree on the following:
> - iommu_unregister_device_fault_handler() will never fail
> - iommu driver cleans up all pending faults when handler is
> unregistered
> - assume device driver or guest not sending more page response
> _after_ handler is unregistered.
> - system will tolerate rare spurious response
>
> Sounds right?

 Yes, I'll add that to the fault series  
>>> Hold on a second please, I think we need more clarifications. Ashok
>>> pointed out to me that the spurious response can be harmful to other
>>> devices when it comes to mdev, where PRQ group id is not per PASID,
>>> device may reuse the group number and receiving spurious page
>>> response can confuse the entire PF.   
>>
>> I don't understand how mdev differs from the non-mdev situation (but I
>> also still don't fully get how mdev+PASID will be implemented). Is the
>> following the case you're worried about?
>>
>>   M#: mdev #
>>
>> # Dev Hostmdev drv   VFIO/QEMUGuest
>> 
>> 1 <- reg(handler)
>> 2 PR1 G1 P1-> M1 PR1 G1inject -> M1 PR1 G1
>> 3 <- unreg(handler)
>> 4   <- PS1 G1 P1 (F)  |
>> 5unreg(handler)
>> 6 <- reg(handler)
>> 7 PR2 G1 P1-> M2 PR2 G1inject -> M2 PR2 G1
>> 8 <- M1 PS1 G1
>> 9 accept ??<- PS1 G1 P1
>> 10<- M2 PS2 G1
>> 11accept   <- PS2 G1 P1
>>
> Not really. I am not worried about PASID reuse or unbind. Just within
> the same PASID bind lifetime of a single mdev, back to back
> register/unregister fault handler.
> After Step 4, device will think G1 is done. Device could reuse G1 for
> the next PR, if we accept PS1 in step 9, device will terminate G1 before
> the real G1 PS arrives in Step 11. The real G1 PS might have a
> different response code. Then we just drop the PS in Step 11?

Yes, I think we do. Two possibilities:

* G1 is reused at step 7 for the same PASID context, which means that it
is for the same mdev. The problem is then identical to the non-mdev
case, new page faults and old page response may cross:

# Dev Hostmdev drv   VFIO/QEMUGuest

7 PR2 G1 P1  --.
8   \ .- M1 PS1 G1
9'->  PR2 G1 P1  ->  /   inject  --> M1 PR2 G1
10   accept <---  PS1 G1 P1  <--'
11   reject <---  PS2 G1 P1  <-- M1 PS2 G1

And the incorrect page response is returned to the guest. However, it
affects a single mdev/guest context; it doesn't affect other mdevs.

* Or G1 is reused at step 7 for a different PASID. At step 10 the fault
handler rejects the page response because the PASID is different, and
step 11 is accepted.


>>> Having spurious page response is also not
>>> abiding the PCIe spec. exactly.  
>>
>> We are following the PCI spec though, in that we don't send page
>> responses for PRGIs that aren't in flight.
>>
> You are right, the worst case of the spurious PS is to terminate the
> group prematurely. Need to know the scope of the HW damage in case of mdev
> where group IDs can be shared among mdevs belong to the same PF.

But from the IOMMU fault API point of view, the full page request is
identified by both PRGI and PASID. Given that each mdev has its own set
of PASIDs, it should be easy to isolate page responses per mdev.

Thanks,
Jean
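
A sketch of the kind of host-side filtering this implies (the structure and
names are illustrative, not from the series): a page response is only
forwarded to the device if a pending fault with the same PASID and page
request group index is still outstanding.

struct pending_fault {
        struct list_head list;
        u32 pasid;
        u16 prgi;
};

static bool response_matches(struct list_head *pending, u32 pasid, u16 prgi)
{
        struct pending_fault *f;

        list_for_each_entry(f, pending, list) {
                if (f->pasid == pasid && f->prgi == prgi)
                        return true;
        }

        /* No outstanding fault: treat the response as spurious and drop it. */
        return false;
}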


Re: [PATCH 1/8] iommu: Add I/O ASID allocator

2019-06-11 Thread Jacob Pan
On Mon, 10 Jun 2019 19:47:07 +0100
Jean-Philippe Brucker  wrote:

> Some devices might support multiple DMA address spaces, in particular
> those that have the PCI PASID feature. PASID (Process Address Space
> ID) allows to share process address spaces with devices (SVA),
> partition a device into VM-assignable entities (VFIO mdev) or simply
> provide multiple DMA address space to kernel drivers. Add a global
> PASID allocator usable by different drivers at the same time. Name it
> I/O ASID to avoid confusion with ASIDs allocated by arch code, which
> are usually a separate ID space.
> 
> The IOASID space is global. Each device can have its own PASID space,
> but by convention the IOMMU ended up having a global PASID space, so
> that with SVA, each mm_struct is associated to a single PASID.
> 
> The allocator is primarily used by IOMMU subsystem but in rare
> occasions drivers would like to allocate PASIDs for devices that
> aren't managed by an IOMMU, using the same ID space as IOMMU.
> 
> Signed-off-by: Jean-Philippe Brucker 
> Signed-off-by: Jacob Pan 
> ---
> The most recent discussion on this patch was at:
> https://lkml.kernel.org/lkml/1556922737-76313-4-git-send-email-jacob.jun@linux.intel.com/
> I fixed it up a bit following comments in that series, and removed the
> definitions for the custom allocator for now.
> 
> There also is a new version that includes the custom allocator into
> this patch, but is currently missing the RCU fixes, at:
> https://lore.kernel.org/lkml/1560087862-57608-13-git-send-email-jacob.jun@linux.intel.com/
> ---
>  drivers/iommu/Kconfig  |   4 ++
>  drivers/iommu/Makefile |   1 +
>  drivers/iommu/ioasid.c | 150
> + include/linux/ioasid.h |
> 49 ++ 4 files changed, 204 insertions(+)
>  create mode 100644 drivers/iommu/ioasid.c
>  create mode 100644 include/linux/ioasid.h
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 83664db5221d..9b45f70549a7 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -3,6 +3,10 @@
>  config IOMMU_IOVA
>   tristate
>  
> +# The IOASID library may also be used by non-IOMMU_API users
> +config IOASID
> + tristate
> +
>  # IOMMU_API always gets selected by whoever wants it.
>  config IOMMU_API
>   bool
> diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile
> index 8c71a15e986b..0efac6f1ec73 100644
> --- a/drivers/iommu/Makefile
> +++ b/drivers/iommu/Makefile
> @@ -7,6 +7,7 @@ obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
>  obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o
> +obj-$(CONFIG_IOASID) += ioasid.o
>  obj-$(CONFIG_IOMMU_IOVA) += iova.o
>  obj-$(CONFIG_OF_IOMMU)   += of_iommu.o
>  obj-$(CONFIG_MSM_IOMMU) += msm_iommu.o
> diff --git a/drivers/iommu/ioasid.c b/drivers/iommu/ioasid.c
> new file mode 100644
> index ..bbb771214fa9
> --- /dev/null
> +++ b/drivers/iommu/ioasid.c
> @@ -0,0 +1,150 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * I/O Address Space ID allocator. There is one global IOASID space,
> split into
> + * subsets. Users create a subset with DECLARE_IOASID_SET, then
> allocate and
> + * free IOASIDs with ioasid_alloc and ioasid_free.
> + */
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +struct ioasid_data {
> + ioasid_t id;
> + struct ioasid_set *set;
> + void *private;
> + struct rcu_head rcu;
> +};
> +
> +static DEFINE_XARRAY_ALLOC(ioasid_xa);
> +
> +/**
> + * ioasid_set_data - Set private data for an allocated ioasid
> + * @ioasid: the ID to set data
> + * @data:   the private data
> + *
> + * For IOASID that is already allocated, private data can be set
> + * via this API. Future lookup can be done via ioasid_find.
> + */
> +int ioasid_set_data(ioasid_t ioasid, void *data)
> +{
> + struct ioasid_data *ioasid_data;
> + int ret = 0;
> +
> + xa_lock(_xa);
Just wondering if this is necessary, since xa_load is under
rcu_read_lock and we are not changing anything internal to xa. For
custom allocator I still need to have the mutex against allocator
removal.
> + ioasid_data = xa_load(_xa, ioasid);
> + if (ioasid_data)
> + rcu_assign_pointer(ioasid_data->private, data);
it is good to publish and have barrier here. But I just wonder even for
weakly ordered machine, this pointer update is quite far away from its
data update.
> + else
> + ret = -ENOENT;
> + xa_unlock(_xa);
> +
> + /*
> +  * Wait for readers to stop accessing the old private data,
> so the
> +  * caller can free it.
> +  */
> + if (!ret)
> + synchronize_rcu();
> +
I will add that to my next version to check ret value.

Thanks,

Jacob
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(ioasid_set_data);
> +
> +/**
> + * ioasid_alloc - Allocate an IOASID
> + * @set: the IOASID set
> + * @min: the 
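
For reference, a minimal sketch (not part of the patch) of the reader side that
the synchronize_rcu() above pairs with, along the lines of the ioasid_find()
lookup mentioned in the kernel-doc; it builds on the definitions quoted above:

/*
 * Illustration only: the private pointer is dereferenced inside an RCU
 * read-side critical section, so ioasid_set_data() must wait for a grace
 * period before the caller frees the old private data.
 */
static void *example_get_private(ioasid_t ioasid)
{
	struct ioasid_data *entry;
	void *data = NULL;

	rcu_read_lock();
	entry = xa_load(&ioasid_xa, ioasid);
	if (entry)
		data = rcu_dereference(entry->private);
	rcu_read_unlock();

	/*
	 * 'data' may become stale once rcu_read_unlock() is called; a real
	 * caller would take a reference before leaving the critical section.
	 */
	return data;
}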

Re: [PATCH v4 4/9] iommu: Add bounce page APIs

2019-06-11 Thread Pavel Begunkov


On 03/06/2019 04:16, Lu Baolu wrote:
> IOMMU hardware always uses paging for DMA remapping.  The
> minimum mapped window is a page size. The device drivers
> may map buffers that do not fill a whole IOMMU window. This
> allows the device to access possibly unrelated memory, and
> various malicious devices can exploit this to perform a DMA attack.
> 
> This introduces the bounce buffer mechanism for DMA buffers
> which don't fill a minimal IOMMU page. It could be used
> by various vendor specific IOMMU drivers as long as the
> DMA domain is managed by the generic IOMMU layer. Below
> APIs are added:
> 
> * iommu_bounce_map(dev, addr, paddr, size, dir, attrs)
>   - Map a buffer starting at DMA address @addr in bounce page
> manner. For buffer parts that don't cover a whole
> minimal IOMMU page, the bounce page policy is applied.
> A bounce page mapped by swiotlb will be used as the DMA
> target in the IOMMU page table. Otherwise, the physical
> address @paddr is mapped instead.
> 
> * iommu_bounce_unmap(dev, addr, size, dir, attrs)
>   - Unmap the buffer mapped with iommu_bounce_map(). The bounce
> page will be torn down after the bounced data get synced.
> 
> * iommu_bounce_sync(dev, addr, size, dir, target)
>   - Sync the bounced data in case the bounce mapped buffer is
> reused.
> 
> The whole set of APIs is guarded by a kernel option, IOMMU_BOUNCE_PAGE.
> It's useful for cases where bounce pages aren't needed, for example
> embedded systems.
> 
> Cc: Ashok Raj 
> Cc: Jacob Pan 
> Cc: Kevin Tian 
> Cc: Alan Cox 
> Cc: Mika Westerberg 
> Signed-off-by: Lu Baolu 
> ---
>  drivers/iommu/Kconfig |  14 +
>  drivers/iommu/iommu.c | 119 ++
>  include/linux/iommu.h |  35 +
>  3 files changed, 168 insertions(+)
> 
> diff --git a/drivers/iommu/Kconfig b/drivers/iommu/Kconfig
> index 83664db5221d..d837ec3f359b 100644
> --- a/drivers/iommu/Kconfig
> +++ b/drivers/iommu/Kconfig
> @@ -86,6 +86,20 @@ config IOMMU_DEFAULT_PASSTHROUGH
>  
> If unsure, say N here.
>  
> +config IOMMU_BOUNCE_PAGE
> + bool "Use bounce page for untrusted devices"
> + depends on IOMMU_API
> + select SWIOTLB
> + help
> +   IOMMU hardware always uses paging for DMA remapping. The minimum
> +   mapped window is a page size. The device drivers may map buffers
> +   not filling a whole IOMMU window. This allows the device to access
> +   possibly unrelated memory and a malicious device can exploit this
> +   to perform a DMA attack. Select this to use a bounce page for the
> +   buffer which doesn't fill a whole IOMMU page.
> +
> +   If unsure, say N here.
> +
>  config OF_IOMMU
> def_bool y
> depends on OF && IOMMU_API
> diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
> index 2a906386bb8e..fa44f681a82b 100644
> --- a/drivers/iommu/iommu.c
> +++ b/drivers/iommu/iommu.c
> @@ -2246,3 +2246,122 @@ int iommu_sva_get_pasid(struct iommu_sva *handle)
>   return ops->sva_get_pasid(handle);
>  }
>  EXPORT_SYMBOL_GPL(iommu_sva_get_pasid);
> +
> +#ifdef CONFIG_IOMMU_BOUNCE_PAGE
> +
> +/*
> + * Bounce buffer support for external devices:
> + *
> + * IOMMU hardware always uses paging for DMA remapping. The minimum mapped
> + * window is a page size. The device drivers may map buffers not filling a
> + * whole IOMMU window. This allows the device to access possibly unrelated
> + * memory and a malicious device can exploit this to perform a DMA attack.
> + * Use bounce pages for buffers which don't fill whole IOMMU pages.
> + */
> +
> +static inline size_t
> +get_aligned_size(struct iommu_domain *domain, dma_addr_t addr, size_t size)
> +{
> + unsigned long page_size = 1 << __ffs(domain->pgsize_bitmap);
> + unsigned long offset = page_size - 1;
> +
> + return ALIGN((addr & offset) + size, page_size);
> +}
> +
> +dma_addr_t iommu_bounce_map(struct device *dev, dma_addr_t iova,
> + phys_addr_t paddr, size_t size,
> + enum dma_data_direction dir,
> + unsigned long attrs)
> +{
> + struct iommu_domain *domain;
> + unsigned int min_pagesz;
> + phys_addr_t tlb_addr;
> + size_t aligned_size;
> + int prot = 0;
> + int ret;
> +
> + domain = iommu_get_dma_domain(dev);
> + if (!domain)
> + return DMA_MAPPING_ERROR;
> +
> + if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
> + prot |= IOMMU_READ;
> + if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
> + prot |= IOMMU_WRITE;
> +
> + aligned_size = get_aligned_size(domain, paddr, size);
> + min_pagesz = 1 << __ffs(domain->pgsize_bitmap);
> +
> + /*
> +  * If both the physical buffer start address and size are
> +  * page aligned, we don't need to use a bounce page.
> +  */
> + if (!IS_ALIGNED(paddr | size, min_pagesz)) {
> + tlb_addr = swiotlb_tbl_map_single(dev,
> +  
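
As a rough usage sketch (not part of the series), a vendor IOMMU driver might
drive the three helpers like this for an untrusted device; the sync 'target'
argument is assumed to follow swiotlb's SYNC_FOR_CPU/SYNC_FOR_DEVICE
convention:

/*
 * Illustration only: map, reuse and unmap a possibly unaligned buffer.
 * Partial head/tail pages are redirected to swiotlb bounce pages by
 * iommu_bounce_map(); page-aligned buffers are mapped directly.
 */
static int example_dma_cycle(struct device *dev, dma_addr_t iova,
			     phys_addr_t paddr, size_t size,
			     unsigned long attrs)
{
	dma_addr_t addr;

	addr = iommu_bounce_map(dev, iova, paddr, size, DMA_TO_DEVICE, attrs);
	if (addr == DMA_MAPPING_ERROR)
		return -ENOMEM;

	/* ... device performs DMA ... */

	/* Re-sync the bounced data if the mapping is reused. */
	iommu_bounce_sync(dev, addr, size, DMA_TO_DEVICE, SYNC_FOR_DEVICE);

	/* Syncs bounced data back and tears the bounce page down. */
	iommu_bounce_unmap(dev, addr, size, DMA_TO_DEVICE, attrs);
	return 0;
}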

Re: [PATCH 8/8] iommu/arm-smmu-v3: Add support for PCI PASID

2019-06-11 Thread Jonathan Cameron
On Mon, 10 Jun 2019 19:47:14 +0100
Jean-Philippe Brucker  wrote:

> Enable PASID for PCI devices that support it. Since the SSID tables are
> allocated by arm_smmu_attach_dev(), PASID has to be enabled early enough.
> arm_smmu_dev_feature_enable() would be too late, since by that time the
> main DMA domain has already been attached. Do it in add_device() instead.
> 
> Signed-off-by: Jean-Philippe Brucker 
Nitpick in line.

Thanks,

Jonathan
> ---
>  drivers/iommu/arm-smmu-v3.c | 51 -
>  1 file changed, 50 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 972bfb80f964..a8a516d9ff10 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -2197,6 +2197,49 @@ static void arm_smmu_disable_ats(struct 
> arm_smmu_master *master)
>   master->ats_enabled = false;
>  }
>  
> +static int arm_smmu_enable_pasid(struct arm_smmu_master *master)
> +{
> + int ret;
> + int features;
> + int num_pasids;
> + struct pci_dev *pdev;
> +
> + if (!dev_is_pci(master->dev))
> + return -ENOSYS;
> +
> + pdev = to_pci_dev(master->dev);
> +
> + features = pci_pasid_features(pdev);
> + if (features < 0)
> + return -ENOSYS;
> +
> + num_pasids = pci_max_pasids(pdev);
> + if (num_pasids <= 0)
> + return -ENOSYS;
> +
> + ret = pci_enable_pasid(pdev, features);
> + if (!ret)
> + master->ssid_bits = min_t(u8, ilog2(num_pasids),
> +   master->smmu->ssid_bits);
> + return ret;
> +}
> +
> +static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
> +{
> + struct pci_dev *pdev;
> +
> + if (!dev_is_pci(master->dev))
> + return;
> +
> + pdev = to_pci_dev(master->dev);
> +
> + if (!pdev->pasid_enabled)
> + return;
> +
> + pci_disable_pasid(pdev);
> + master->ssid_bits = 0;

If we are being really fussy about ordering, why have this set of
ssid_bits after pci_disable_pasid rather than before (to reverse order
of .._enable_pasid)?

> +}
> +
>  static void arm_smmu_detach_dev(struct arm_smmu_master *master)
>  {
>   unsigned long flags;
> @@ -2413,6 +2456,9 @@ static int arm_smmu_add_device(struct device *dev)
>  
>   master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
>  
> + /* Note that PASID must be enabled before, and disabled after ATS */
> + arm_smmu_enable_pasid(master);
> +
>   /*
>* If the SMMU doesn't support 2-stage CD, limit the linear
>* tables to a reasonable number of contexts, let's say
> @@ -2423,7 +2469,7 @@ static int arm_smmu_add_device(struct device *dev)
>  
>   ret = iommu_device_link(&smmu->iommu, dev);
>   if (ret)
> - goto err_free_master;
> + goto err_disable_pasid;
>  
>   group = iommu_group_get_for_dev(dev);
>   if (IS_ERR(group)) {
> @@ -2436,6 +2482,8 @@ static int arm_smmu_add_device(struct device *dev)
>  
>  err_unlink:
>   iommu_device_unlink(&smmu->iommu, dev);
> +err_disable_pasid:
> + arm_smmu_disable_pasid(master);
>  err_free_master:
>   kfree(master);
>   fwspec->iommu_priv = NULL;
> @@ -2456,6 +2504,7 @@ static void arm_smmu_remove_device(struct device *dev)
>   arm_smmu_detach_dev(master);
>   iommu_group_remove_device(dev);
>   iommu_device_unlink(&smmu->iommu, dev);
> + arm_smmu_disable_pasid(master);
>   kfree(master);
>   iommu_fwspec_free(dev);
>  }




Re: [PATCH 5/8] iommu/arm-smmu-v3: Add second level of context descriptor table

2019-06-11 Thread Jonathan Cameron
On Mon, 10 Jun 2019 19:47:11 +0100
Jean-Philippe Brucker  wrote:

> The SMMU can support up to 20 bits of SSID. Add a second level of page
> tables to accommodate this. Devices that support more than 1024 SSIDs now
> have a table of 1024 L1 entries (8kB), pointing to tables of 1024 context
> descriptors (64kB), allocated on demand.
> 
> Signed-off-by: Jean-Philippe Brucker 
One trivial typo.

Thanks,

Jonathan
> ---
>  drivers/iommu/arm-smmu-v3.c | 136 +---
>  1 file changed, 128 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index d90eb604b65d..326b71793336 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -216,6 +216,8 @@
>  
>  #define STRTAB_STE_0_S1FMT   GENMASK_ULL(5, 4)
>  #define STRTAB_STE_0_S1FMT_LINEAR 0
> +#define STRTAB_STE_0_S1FMT_4K_L2 1
> +#define STRTAB_STE_0_S1FMT_64K_L2 2
>  #define STRTAB_STE_0_S1CTXPTR_MASK   GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX GENMASK_ULL(63, 59)
>  
> @@ -255,6 +257,18 @@
>  
>  #define STRTAB_STE_3_S2TTB_MASK  GENMASK_ULL(51, 4)
>  
> +/*
> + * Linear: when less than 1024 SSIDs are supported
> + * 2lvl: at most 1024 L1 entrie,

entries?

> + *  1024 lazy entries per table.
> + */
> +#define CTXDESC_SPLIT 10
> +#define CTXDESC_NUM_L2_ENTRIES   (1 << CTXDESC_SPLIT)
> +
> +#define CTXDESC_L1_DESC_DWORD 1
> +#define CTXDESC_L1_DESC_VALID 1
> +#define CTXDESC_L1_DESC_L2PTR_MASK   GENMASK_ULL(51, 12)
> +
>  /* Context descriptor (stage-1 only) */
>  #define CTXDESC_CD_DWORDS 8
>  #define CTXDESC_CD_0_TCR_T0SZ GENMASK_ULL(5, 0)
> @@ -530,7 +544,10 @@ struct arm_smmu_ctx_desc {
>  struct arm_smmu_s1_cfg {
>   u8  s1fmt;
>   u8  s1cdmax;
> - struct arm_smmu_cd_table table;
> + struct arm_smmu_cd_table *tables;
> + size_t  num_tables;
> + __le64  *l1ptr;
> + dma_addr_t  l1ptr_dma;
>  
>   /* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
>   struct arm_smmu_ctx_desc cd;
> @@ -1118,12 +1135,51 @@ static void arm_smmu_free_cd_leaf_table(struct 
> arm_smmu_device *smmu,
>  {
>   size_t size = num_entries * (CTXDESC_CD_DWORDS << 3);
>  
> + if (!table->ptr)
> + return;
>   dmam_free_coherent(smmu->dev, size, table->ptr, table->ptr_dma);
>  }
>  
> -static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_s1_cfg *cfg, u32 ssid)
> +static void arm_smmu_write_cd_l1_desc(__le64 *dst,
> +   struct arm_smmu_cd_table *table)
>  {
> - return cfg->table.ptr + ssid * CTXDESC_CD_DWORDS;
> + u64 val = (table->ptr_dma & CTXDESC_L1_DESC_L2PTR_MASK) |
> +   CTXDESC_L1_DESC_VALID;
> +
> + *dst = cpu_to_le64(val);
> +}
> +
> +static __le64 *arm_smmu_get_cd_ptr(struct arm_smmu_domain *smmu_domain,
> +u32 ssid)
> +{
> + unsigned int idx;
> + struct arm_smmu_cd_table *table;
> + struct arm_smmu_device *smmu = smmu_domain->smmu;
> + struct arm_smmu_s1_cfg *cfg = &smmu_domain->s1_cfg;
> +
> + if (cfg->s1fmt == STRTAB_STE_0_S1FMT_LINEAR) {
> + table = &cfg->tables[0];
> + idx = ssid;
> + } else {
> + idx = ssid >> CTXDESC_SPLIT;
> + if (idx >= cfg->num_tables)
> + return NULL;
> +
> + table = &cfg->tables[idx];
> + if (!table->ptr) {
> + __le64 *l1ptr = cfg->l1ptr + idx * 
> CTXDESC_L1_DESC_DWORD;
> +
> + if (arm_smmu_alloc_cd_leaf_table(smmu, table,
> +  
> CTXDESC_NUM_L2_ENTRIES))
> + return NULL;
> +
> + arm_smmu_write_cd_l1_desc(l1ptr, table);
> + /* An invalid L1 entry is allowed to be cached */
> + arm_smmu_sync_cd(smmu_domain, ssid, false);
> + }
> + idx = ssid & (CTXDESC_NUM_L2_ENTRIES - 1);
> + }
> + return table->ptr + idx * CTXDESC_CD_DWORDS;
>  }
>  
>  static u64 arm_smmu_cpu_tcr_to_cd(u64 tcr)
> @@ -1149,7 +1205,7 @@ static int arm_smmu_write_ctx_desc(struct 
> arm_smmu_domain *smmu_domain,
>   u64 val;
>   bool cd_live;
>   struct arm_smmu_device *smmu = smmu_domain->smmu;
> - __le64 *cdptr = arm_smmu_get_cd_ptr(&smmu_domain->s1_cfg, ssid);
> + __le64 *cdptr = arm_smmu_get_cd_ptr(smmu_domain, ssid);
>  
>   /*
>* This function handles the following cases:
> @@ -1213,20 +1269,81 @@ static int arm_smmu_write_ctx_desc(struct 
> arm_smmu_domain *smmu_domain,
>  static int arm_smmu_alloc_cd_tables(struct arm_smmu_domain *smmu_domain,
>   struct 
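
To make the arithmetic in the commit message explicit, a small sketch (not
from the patch) of the SSID split performed by arm_smmu_get_cd_ptr() above:

/*
 * Illustration only: with CTXDESC_SPLIT = 10, up to 1024 L1 descriptors of
 * 8 bytes give the 8kB L1 table, and each leaf holds 1024 context
 * descriptors of 64 bytes (CTXDESC_CD_DWORDS = 8), i.e. 64kB per leaf.
 */
static unsigned int example_l1_index(u32 ssid)
{
	return ssid >> CTXDESC_SPLIT;			/* which leaf table */
}

static unsigned int example_l2_index(u32 ssid)
{
	return ssid & (CTXDESC_NUM_L2_ENTRIES - 1);	/* CD within the leaf */
}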

Re: [PATCH 4/8] iommu/arm-smmu-v3: Add support for Substream IDs

2019-06-11 Thread Jonathan Cameron
On Mon, 10 Jun 2019 19:47:10 +0100
Jean-Philippe Brucker  wrote:

> At the moment, the SMMUv3 driver implements only one stage-1 or stage-2
> page directory per device. However SMMUv3 allows more than one address
> space for some devices, by providing multiple stage-1 page directories. In
> addition to the Stream ID (SID), that identifies a device, we can now have
> Substream IDs (SSID) identifying an address space. In PCIe, SID is called
> Requester ID (RID) and SSID is called Process Address-Space ID (PASID).
> 
> Prepare the driver for SSID support, by adding context descriptor tables
> in STEs (previously a single static context descriptor). A complete
> stage-1 walk is now performed like this by the SMMU:
> 
>       Stream tables       Ctx. tables        Page tables
>        +--------+   ,------>+-------+   ,------>+-------+
>        :        :   |       :       :   |       :       :
>        +--------+   |       +-------+   |       +-------+
>   SID->|  STE   |---'  SSID->|  CD  |---'  IOVA->|  PTE  |--> IPA
>        +--------+            +------+            +-------+
>        :        :            :      :            :       :
>        +--------+            +------+            +-------+
> 
> Implement a single level of context descriptor table for now, but as with
> stream and page tables, an SSID can be split to index multiple levels of
> tables.
> 
> In all stream table entries, we set S1DSS=SSID0 mode, making translations
> without an SSID use context descriptor 0. Although it would be possible by
> setting S1DSS=BYPASS, we don't currently support SSID when user selects
> iommu.passthrough.
> 
> Signed-off-by: Jean-Philippe Brucker 

Hi Jean-Phillipe,

A few trivial comments inline, mostly around wondering if a few bits
of refactoring can get pulled out before this and hopefully stop diff
making such a mess of this patch from a readability point of view!

Thanks,

Jonathan

> ---
>  drivers/iommu/arm-smmu-v3.c | 238 +---
>  1 file changed, 192 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 3254f473e681..d90eb604b65d 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -219,6 +219,11 @@
>  #define STRTAB_STE_0_S1CTXPTR_MASK   GENMASK_ULL(51, 6)
>  #define STRTAB_STE_0_S1CDMAX GENMASK_ULL(63, 59)
>  
> +#define STRTAB_STE_1_S1DSS   GENMASK_ULL(1, 0)
> +#define STRTAB_STE_1_S1DSS_TERMINATE 0x0
> +#define STRTAB_STE_1_S1DSS_BYPASS0x1
> +#define STRTAB_STE_1_S1DSS_SSID0 0x2
> +
>  #define STRTAB_STE_1_S1C_CACHE_NC 0UL
>  #define STRTAB_STE_1_S1C_CACHE_WBRA 1UL
>  #define STRTAB_STE_1_S1C_CACHE_WT 2UL
> @@ -305,6 +310,7 @@
>  #define CMDQ_PREFETCH_1_SIZE GENMASK_ULL(4, 0)
>  #define CMDQ_PREFETCH_1_ADDR_MASK GENMASK_ULL(63, 12)
>  
> +#define CMDQ_CFGI_0_SSID GENMASK_ULL(31, 12)
>  #define CMDQ_CFGI_0_SID  GENMASK_ULL(63, 32)
>  #define CMDQ_CFGI_1_LEAF (1UL << 0)
>  #define CMDQ_CFGI_1_RANGE GENMASK_ULL(4, 0)
> @@ -421,8 +427,11 @@ struct arm_smmu_cmdq_ent {
>  
>   #define CMDQ_OP_CFGI_STE 0x3
>   #define CMDQ_OP_CFGI_ALL 0x4
> + #define CMDQ_OP_CFGI_CD 0x5
> + #define CMDQ_OP_CFGI_CD_ALL 0x6
>   struct {
>   u32 sid;
> + u32 ssid;
>   union {
>   bool leaf;
>   u8  span;
> @@ -506,16 +515,25 @@ struct arm_smmu_strtab_l1_desc {
>   dma_addr_t  l2ptr_dma;
>  };
>  
> +struct arm_smmu_cd_table {
> + __le64  *ptr;
> + dma_addr_t  ptr_dma;
> +};
> +
> +struct arm_smmu_ctx_desc {
> + u16 asid;
> + u64 ttbr;
> + u64 tcr;
> + u64 mair;
> +};
> +
>  struct arm_smmu_s1_cfg {
> - __le64  *cdptr;
> - dma_addr_t  cdptr_dma;
> -
> - struct arm_smmu_ctx_desc {
> - u16 asid;
> - u64 ttbr;
> - u64 tcr;
> - u64 mair;
> - }   cd;
> + u8  s1fmt;
> + u8  s1cdmax;
> + struct arm_smmu_cd_table table;

This new structure is a sensible addition and makes the code more readable,
but it's not directly tied to the main flow of this patch, perhaps pull it out?

> +
> + /* Context descriptor 0, when substreams are disabled or s1dss = 0b10 */
> + struct arm_smmu_ctx_desc cd;
I've not checked in detail but feels like you could pull this refactor out as 
well
as a trivial precursor and 

Re: [RFC v2 4/4] media: platform: mtk-mdp3: Add Mediatek MDP3 driver

2019-06-11 Thread Tomasz Figa
Hi Daoyuan,

On Tue, Jun 11, 2019 at 6:20 PM Daoyuan Huang
 wrote:
>
> hi Tomasz:
>
> Thanks for your review comments, the corresponding modification
> & explanation is under preparation, will update soon.
>
> Thanks.

Thanks.

Note that Alexandre may already be reviewing the rest of this patch,
so I'd check with him on whether sending a next revision or waiting for his
review is preferred.

Best regards,
Tomasz


Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-06-11 Thread Tomasz Figa
On Tue, Jun 11, 2019 at 7:07 PM Frederic Chen
 wrote:
>
> Hi Tomasz,
>
>
> On Tue, 2019-06-11 at 17:59 +0900, Tomasz Figa wrote:
> > On Tue, Jun 11, 2019 at 5:48 PM Frederic Chen
> >  wrote:
> > >
> > > Dear Tomasz,
> > >
> > > I'd like to elaborate more about the tuning_data.va.
> > > Would you like to give us some advice about our improvement proposal 
> > > inline?
> > >
> > > Thank you very much.
> > >
> > >
> > > On Wed, 2019-05-22 at 03:14 +0800, Frederic Chen wrote:
> > > > Dear Tomasz,
> > > >
> > > > I appreciate your comment. It is very helpful for us.
> > > >
> > > >
> > > > > > diff --git 
> > > > > > a/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c 
> > > > > > b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > > > new file mode 100644
> > > > > > index ..54d2b5f5b802
> > > > > > --- /dev/null
> > > > > > +++ b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > > > @@ -0,0 +1,1384 @@
> > >
> > > [snip]
> > >
> > > > > > +static void dip_submit_worker(struct work_struct *work)
> > > > > > +{
> > > > > > +   struct mtk_dip_hw_submit_work *dip_submit_work =
> > > > > > +   container_of(work, struct mtk_dip_hw_submit_work, 
> > > > > > frame_work);
> > > > > > +   struct mtk_dip_hw *dip_hw = dip_submit_work->dip_hw;
> > > > > > +   struct mtk_dip_dev *dip_dev = mtk_dip_hw_to_dev(dip_hw);
> > > > > > +   struct mtk_dip_hw_work *dip_work;
> > > > > > +   struct mtk_dip_hw_subframe *buf;
> > > > > > +   u32 len, num;
> > > > > > +   int ret;
> > > > > > +
> > > > > > +   num  = atomic_read(&dip_hw->num_composing);
> > > > > > +
> > > > > > +   mutex_lock(&dip_hw->dip_worklist.queuelock);
> > > > > > +   dip_work = list_first_entry(&dip_hw->dip_worklist.queue,
> > >
> > > [snip]
> > >
> > > > > > +
> > > > > > +   if (dip_work->frameparams.tuning_data.pa == 0) {
> > > > > > +   dev_dbg(&dip_dev->pdev->dev,
> > > > > > +   "%s: frame_no(%d) has no tuning_data\n",
> > > > > > +   __func__, dip_work->frameparams.frame_no);
> > > > > > +
> > > > > > +   memcpy(&dip_work->frameparams.tuning_data,
> > > > > > +  &buf->tuning_buf, sizeof(buf->tuning_buf));
> > > > >
> > > > > Ditto.
> > > > >
> > > >
> > > > I got it.
> > > >
> > > > > > +   memset((char *)buf->tuning_buf.va, 0, 
> > > > > > DIP_TUNING_SZ);
> > > > >
> > > > > Ditto.
> > > >
> > > > I got it.
> > > >
> > > > >
> > > > > > +   /*
> > > > > > +* When user enqueued without tuning buffer,
> > > > > > +* it would use driver internal buffer.
> > > > > > +* So, tuning_data.va should be 0
> > > > > > +*/
> > > > > > +   dip_work->frameparams.tuning_data.va = 0;
> > > > >
> > > > > I don't understand this. We just zeroed the buffer via this kernel VA 
> > > > > few
> > > > > lines above, so why would it have to be set to 0?
> > > > >
> > > >
> > > > I will remove this unnecessary line.
> > > >
> > > > > > +   }
> > >
> > > After confirming the firmware part, I found that we use this field
> > > (tuning_data.va) to notify firmware if there is no tuning data from
> > > user.
> > >
> > > - frameparams.tuning_data.va is 0: use the default tuning data in
> > >SCP, but we still need to pass
> > >frameparams.tuning_data.pa because
> > >the buffer contains some working
> > >buffer required.
> > > - frameparams.tuning_data.va is not 0: the tuning data was passed from
> > >the user
> > >
> > > Since we should not pass cpu address to SCP, could I rename tuning_data.va
> > > as tuning_data.cookie, and write a constant value to indicate if SCP
> > > should use its internal default setting or not here?
> > >
> > > For example,
> > > /* SCP uses tuning data passed from userspace*/
> > > dip_work->frameparams.tuning_data.cookie = MTK_DIP_USER_TUNING_DATA;
> > >
> > > /* SCP uses internal tuning data */
> > > dip_work->frameparams.tuning_data.cookie = MTK_DIP_DEFAULT_TUNING_DATA;
> >
> > Perhaps we could just call it "present" and set to true or false?
> >
>
> Yes. I would like to use "present" here.
>
> Original:
>   struct img_addr {
>   u64 va; /* Used by Linux OS access */
>   u32 pa; /* Used by CM4 access */
>   u32 iova; /* Used by IOMMU HW access */
>   } __attribute__ ((__packed__));
>
>   struct img_ipi_frameparam {
>   u32 index;
>   u32 frame_no;
>   u64 timestamp;
>   u8 type;  /* enum mdp_stream_type */
>   u8 state;
>   u8 num_inputs;
>   u8 num_outputs;
>   u64 drv_data;
>   struct img_input inputs[IMG_MAX_HW_INPUTS];
>   struct img_output outputs[IMG_MAX_HW_OUTPUTS];
>   struct img_addr tuning_data;
>   struct img_addr subfrm_data;
>   struct 

Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-06-11 Thread Frederic Chen
Hi Tomasz,


On Tue, 2019-06-11 at 17:59 +0900, Tomasz Figa wrote:
> On Tue, Jun 11, 2019 at 5:48 PM Frederic Chen
>  wrote:
> >
> > Dear Tomasz,
> >
> > I'd like to elaborate more about the tuning_data.va.
> > Would you like to give us some advice about our improvement proposal inline?
> >
> > Thank you very much.
> >
> >
> > On Wed, 2019-05-22 at 03:14 +0800, Frederic Chen wrote:
> > > Dear Tomasz,
> > >
> > > I appreciate your comment. It is very helpful for us.
> > >
> > >
> > > > > diff --git a/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c 
> > > > > b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > > new file mode 100644
> > > > > index ..54d2b5f5b802
> > > > > --- /dev/null
> > > > > +++ b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > > @@ -0,0 +1,1384 @@
> >
> > [snip]
> >
> > > > > +static void dip_submit_worker(struct work_struct *work)
> > > > > +{
> > > > > +   struct mtk_dip_hw_submit_work *dip_submit_work =
> > > > > +   container_of(work, struct mtk_dip_hw_submit_work, 
> > > > > frame_work);
> > > > > +   struct mtk_dip_hw *dip_hw = dip_submit_work->dip_hw;
> > > > > +   struct mtk_dip_dev *dip_dev = mtk_dip_hw_to_dev(dip_hw);
> > > > > +   struct mtk_dip_hw_work *dip_work;
> > > > > +   struct mtk_dip_hw_subframe *buf;
> > > > > +   u32 len, num;
> > > > > +   int ret;
> > > > > +
> > > > > +   num  = atomic_read(&dip_hw->num_composing);
> > > > > +
> > > > > +   mutex_lock(&dip_hw->dip_worklist.queuelock);
> > > > > +   dip_work = list_first_entry(&dip_hw->dip_worklist.queue,
> >
> > [snip]
> >
> > > > > +
> > > > > +   if (dip_work->frameparams.tuning_data.pa == 0) {
> > > > > +   dev_dbg(&dip_dev->pdev->dev,
> > > > > +   "%s: frame_no(%d) has no tuning_data\n",
> > > > > +   __func__, dip_work->frameparams.frame_no);
> > > > > +
> > > > > +   memcpy(&dip_work->frameparams.tuning_data,
> > > > > +  &buf->tuning_buf, sizeof(buf->tuning_buf));
> > > >
> > > > Ditto.
> > > >
> > >
> > > I got it.
> > >
> > > > > +   memset((char *)buf->tuning_buf.va, 0, DIP_TUNING_SZ);
> > > >
> > > > Ditto.
> > >
> > > I got it.
> > >
> > > >
> > > > > +   /*
> > > > > +* When user enqueued without tuning buffer,
> > > > > +* it would use driver internal buffer.
> > > > > +* So, tuning_data.va should be 0
> > > > > +*/
> > > > > +   dip_work->frameparams.tuning_data.va = 0;
> > > >
> > > > I don't understand this. We just zeroed the buffer via this kernel VA 
> > > > few
> > > > lines above, so why would it have to be set to 0?
> > > >
> > >
> > > I will remove this unnecessary line.
> > >
> > > > > +   }
> >
> > After confirming the firmware part, I found that we use this field
> > (tuning_data.va) to notify firmware if there is no tuning data from
> > user.
> >
> > - frameparams.tuning_data.va is 0: use the default tuning data in
> >SCP, but we still need to pass
> >frameparams.tuning_data.pa because
> >the buffer contains some working
> >buffer required.
> > - frameparams.tuning_data.va is not 0: the tuning data was passed from
> >the user
> >
> > Since we should not pass cpu address to SCP, could I rename tuning_data.va
> > as tuning_data.cookie, and write a constant value to indicate if SCP
> > should use its internal default setting or not here?
> >
> > For example,
> > /* SCP uses tuning data passed from userspace*/
> > dip_work->frameparams.tuning_data.cookie = MTK_DIP_USER_TUNING_DATA;
> >
> > /* SCP uses internal tuning data */
> > dip_work->frameparams.tuning_data.cookie = MTK_DIP_DEFAULT_TUNING_DATA;
> 
> Perhaps we could just call it "present" and set to true or false?
> 

Yes. I would like to use "present" here.

Original:
  struct img_addr {
  u64 va; /* Used by Linux OS access */
  u32 pa; /* Used by CM4 access */
  u32 iova; /* Used by IOMMU HW access */
  } __attribute__ ((__packed__));

  struct img_ipi_frameparam {
  u32 index;
  u32 frame_no;
  u64 timestamp;
  u8 type;  /* enum mdp_stream_type */
  u8 state;
  u8 num_inputs;
  u8 num_outputs;
  u64 drv_data;
  struct img_input inputs[IMG_MAX_HW_INPUTS];
  struct img_output outputs[IMG_MAX_HW_OUTPUTS];
  struct img_addr tuning_data;
  struct img_addr subfrm_data;
  struct img_sw_addr config_data;
  struct img_sw_addr self_data;
  } __attribute__ ((__packed__));


Modified:
  struct tuning_buf {
  u8 present;
  u32 pa;   /* Used by CM4 access */
  u32 iova; /* Used by IOMMU HW access */
  } __attribute__ ((__packed__));

  struct img_ipi_frameparam {
  /* ... */
  struct tuning_buf 
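
For illustration (hypothetical, not the actual driver change), the submit path
quoted earlier might then set such a flag roughly like this, using the
structures above:

/*
 * Sketch only: tell the SCP whether userspace supplied tuning data instead
 * of passing a kernel VA as an implicit cookie.
 */
static void example_set_tuning(struct img_ipi_frameparam *param,
			       struct mtk_dip_hw_subframe *buf)
{
	if (param->tuning_data.pa == 0) {
		/* No user buffer: hand over the driver-internal working
		 * buffer and let the SCP fall back to its default tuning. */
		param->tuning_data.pa = buf->tuning_buf.pa;
		param->tuning_data.iova = buf->tuning_buf.iova;
		param->tuning_data.present = false;
	} else {
		param->tuning_data.present = true;
	}
}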

Re: [PATCH 3/8] iommu/arm-smmu-v3: Support platform SSID

2019-06-11 Thread Jonathan Cameron
On Mon, 10 Jun 2019 19:47:09 +0100
Jean-Philippe Brucker  wrote:

> For platform devices that support SubstreamID (SSID), firmware provides
> the number of supported SSID bits. Restrict it to what the SMMU supports
> and cache it into master->ssid_bits.
> 
> Signed-off-by: Jean-Philippe Brucker 

Missing kernel-doc.

Thanks,

Jonathan

> ---
>  drivers/iommu/arm-smmu-v3.c | 11 +++
>  drivers/iommu/of_iommu.c|  6 +-
>  include/linux/iommu.h   |  1 +
>  3 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
> index 4d5a694f02c2..3254f473e681 100644
> --- a/drivers/iommu/arm-smmu-v3.c
> +++ b/drivers/iommu/arm-smmu-v3.c
> @@ -604,6 +604,7 @@ struct arm_smmu_master {
>   struct list_headdomain_head;
>   u32 *sids;
>   unsigned intnum_sids;
> + unsigned intssid_bits;
>   boolats_enabled :1;
>  };
>  
> @@ -2097,6 +2098,16 @@ static int arm_smmu_add_device(struct device *dev)
>   }
>   }
>  
> + master->ssid_bits = min(smmu->ssid_bits, fwspec->num_pasid_bits);
> +
> + /*
> +  * If the SMMU doesn't support 2-stage CD, limit the linear
> +  * tables to a reasonable number of contexts, let's say
> +  * 64kB / sizeof(ctx_desc) = 1024 = 2^10
> +  */
> + if (!(smmu->features & ARM_SMMU_FEAT_2_LVL_CDTAB))
> + master->ssid_bits = min(master->ssid_bits, 10U);
> +
>   group = iommu_group_get_for_dev(dev);
>   if (!IS_ERR(group)) {
>   iommu_group_put(group);
> diff --git a/drivers/iommu/of_iommu.c b/drivers/iommu/of_iommu.c
> index f04a6df65eb8..04f4f6b95d82 100644
> --- a/drivers/iommu/of_iommu.c
> +++ b/drivers/iommu/of_iommu.c
> @@ -206,8 +206,12 @@ const struct iommu_ops *of_iommu_configure(struct device 
> *dev,
>   if (err)
>   break;
>   }
> - }
>  
> + fwspec = dev_iommu_fwspec_get(dev);
> + if (!err && fwspec)
> + of_property_read_u32(master_np, "pasid-num-bits",
> +  &fwspec->num_pasid_bits);
> + }
>  
>   /*
>* Two success conditions can be represented by non-negative err here:
> diff --git a/include/linux/iommu.h b/include/linux/iommu.h
> index 519e40fb23ce..b91df613385f 100644
> --- a/include/linux/iommu.h
> +++ b/include/linux/iommu.h
> @@ -536,6 +536,7 @@ struct iommu_fwspec {
>   struct fwnode_handle *iommu_fwnode;
>   void *iommu_priv;
>   u32 flags;
> + u32 num_pasid_bits;

This structure has kernel doc so you need to add something for this.

>   unsigned int num_ids;
>   u32 ids[1];
>  };




Re: [RFC PATCH V1 6/6] platform: mtk-isp: Add Mediatek DIP driver

2019-06-11 Thread Tomasz Figa
On Tue, Jun 11, 2019 at 5:48 PM Frederic Chen
 wrote:
>
> Dear Tomasz,
>
> I'd like to elaborate more about the tuning_data.va.
> Would you like to give us some advice about our improvement proposal inline?
>
> Thank you very much.
>
>
> On Wed, 2019-05-22 at 03:14 +0800, Frederic Chen wrote:
> > Dear Tomasz,
> >
> > I appreciate your comment. It is very helpful for us.
> >
> >
> > > > diff --git a/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c 
> > > > b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > new file mode 100644
> > > > index ..54d2b5f5b802
> > > > --- /dev/null
> > > > +++ b/drivers/media/platform/mtk-isp/isp_50/dip/mtk_dip-sys.c
> > > > @@ -0,0 +1,1384 @@
>
> [snip]
>
> > > > +static void dip_submit_worker(struct work_struct *work)
> > > > +{
> > > > +   struct mtk_dip_hw_submit_work *dip_submit_work =
> > > > +   container_of(work, struct mtk_dip_hw_submit_work, 
> > > > frame_work);
> > > > +   struct mtk_dip_hw *dip_hw = dip_submit_work->dip_hw;
> > > > +   struct mtk_dip_dev *dip_dev = mtk_dip_hw_to_dev(dip_hw);
> > > > +   struct mtk_dip_hw_work *dip_work;
> > > > +   struct mtk_dip_hw_subframe *buf;
> > > > +   u32 len, num;
> > > > +   int ret;
> > > > +
> > > > +   num  = atomic_read(&dip_hw->num_composing);
> > > > +
> > > > +   mutex_lock(&dip_hw->dip_worklist.queuelock);
> > > > +   dip_work = list_first_entry(&dip_hw->dip_worklist.queue,
>
> [snip]
>
> > > > +
> > > > +   if (dip_work->frameparams.tuning_data.pa == 0) {
> > > > +   dev_dbg(&dip_dev->pdev->dev,
> > > > +   "%s: frame_no(%d) has no tuning_data\n",
> > > > +   __func__, dip_work->frameparams.frame_no);
> > > > +
> > > > +   memcpy(&dip_work->frameparams.tuning_data,
> > > > +  &buf->tuning_buf, sizeof(buf->tuning_buf));
> > >
> > > Ditto.
> > >
> >
> > I got it.
> >
> > > > +   memset((char *)buf->tuning_buf.va, 0, DIP_TUNING_SZ);
> > >
> > > Ditto.
> >
> > I got it.
> >
> > >
> > > > +   /*
> > > > +* When user enqueued without tuning buffer,
> > > > +* it would use driver internal buffer.
> > > > +* So, tuning_data.va should be 0
> > > > +*/
> > > > +   dip_work->frameparams.tuning_data.va = 0;
> > >
> > > I don't understand this. We just zeroed the buffer via this kernel VA few
> > > lines above, so why would it have to be set to 0?
> > >
> >
> > I will remove this unnecessary line.
> >
> > > > +   }
>
> After confirming the firmware part, I found that we use this field
> (tuning_data.va) to notify firmware if there is no tuning data from
> user.
>
> - frameparams.tuning_data.va is 0: use the default tuning data in
>SCP, but we still need to pass
>frameparams.tuning_data.pa because
>the buffer contains some working
>buffer required.
> - frameparams.tuning_data.va is not 0: the tuning data was passed from
>the user
>
> Since we should not pass cpu address to SCP, could I rename tuning_data.va
> as tuning_data.cookie, and write a constant value to indicate if SCP
> should use its internal default setting or not here?
>
> For example,
> /* SCP uses tuning data passed from userspace*/
> dip_work->frameparams.tuning_data.cookie = MTK_DIP_USER_TUNING_DATA;
>
> /* SCP uses internal tuning data */
> dip_work->frameparams.tuning_data.cookie = MTK_DIP_DEFAULT_TUNING_DATA;

Perhaps we could just call it "present" and set to true or false?

Best regards,
Tomasz


RE: How to resolve an issue in swiotlb environment?

2019-06-11 Thread Yoshihiro Shimoda
Hi Christoph, Alan,

> From: Alan Stern, Sent: Tuesday, June 11, 2019 3:46 AM
> 
> On Mon, 10 Jun 2019, Christoph Hellwig wrote:
> 
> > Hi Yoshihiro,
> >
> > sorry for not taking care of this earlier, today is a public holiday
> > here and thus I'm not working much over the long weekend.

To Christoph:

No worries.

> > On Mon, Jun 10, 2019 at 11:13:07AM +, Yoshihiro Shimoda wrote:
> > > I have another way to avoid the issue. But it doesn't seem that a good 
> > > way though...
> > > According to the commit that adding blk_queue_virt_boundary() [3],
> > > this is needed for vhci_hcd as a workaround so that if we avoid to call it
> > > on xhci-hcd driver, the issue disappeared. What do you think?
> > > JFYI, I pasted a tentative patch in the end of email [4].
> >
> > Oh, I hadn't even looked at why USB uses blk_queue_virt_boundary, and it
> > seems like the usage is wrong, as it doesn't follow the same rules as
> > all the others.  I think your patch goes in the right direction,
> > but instead of comparing an hcd name it needs to be keyed off a flag
> > set by the driver (I suspect there is one indicating native SG support,
> > but I can't quickly find it), and we need an alternative solution
> > for drivers that don't see like vhci.  I suspect just limiting the
> > entire transfer size to something that works for a single packet
> > for them would be fine.
> 
> Christoph:
> 
> In most of the different kinds of USB host controllers, the hardware is
> not capable of assembling a packet out of multiple buffers at arbitrary
> addresses.  As a matter of fact, xHCI is the only kind that _can_ do
> this.
> 
> In some cases, the hardware can assemble packets provided each buffer
> other than the last ends at a page boundary and each buffer other than
> the first starts at a page boundary (Intel would say the buffers are
> "virtually contiguous"), but this is a rather complex rule and we don't
> want to rely on it.  Plus, in other cases the hardware _can't_ do this.
> 
> Instead, we want the SG buffers to be set up so that each one (except
> the last) is an exact multiple of the maximum packet size.  That way,
> each packet can be assembled from the contents of a single buffer and
> there's no problem.

This is a bit off topic, but if we prepare such an exact multiple
of the maximum packet size (1024, 512 or 64), could it cause
trouble in an IOMMU environment? IIUC, dma_map_sg() maps SG buffers as
a single segment and then the segment buffer is not contiguous.

> The maximum packet size depends on the type of USB connection.
> Typical values are 1024, 512, or 64.  It's always a power of two and
> it's smaller than 4096.  Therefore we simplify the problem even further
> by requiring that each SG buffer in a scatterlist (except the last one)
> be a multiple of the page size.  (It doesn't need to be aligned on a
> page boundary, as far as I remember.)
> 
> That's why the blk_queue_virt_boundary usage was added to the USB code.
> Perhaps it's not the right way of doing this; I'm not an expert on the
> inner workings of the block layer.  If you can suggest a better way to
> express our requirement, that would be great.

Since I'm also not familiar with the block layer, I could not find a better
way...

Best regards,
Yoshihiro Shimoda

> Alan Stern
> 
> PS: There _is_ a flag saying whether an HCD supports SG.  But what it
> means is that the driver can handle an SG list that meets the
> requirement above; it doesn't mean that the driver can reassemble the
> data from an SG list into a series of bounce buffers in order to meet
> the requirement.  We very much want not to do that, especially since
> the block layer should already be capable of doing it for us.
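
As an illustration of the constraint Alan describes (not existing USB core
code), a check that every scatterlist element except the last is a multiple of
the endpoint's maximum packet size might look like:

#include <linux/scatterlist.h>

/*
 * Sketch only: returns true if each SG entry except the last is a whole
 * number of max-packet-size units, so every USB packet can be built from a
 * single buffer.
 */
static bool example_sg_fits_packets(struct scatterlist *sgl, int nents,
				    unsigned int maxpacket)
{
	struct scatterlist *sg;
	int i;

	for_each_sg(sgl, sg, nents, i) {
		if (i < nents - 1 && (sg->length % maxpacket))
			return false;
	}
	return true;
}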



Re: How to resolve an issue in swiotlb environment?

2019-06-11 Thread Christoph Hellwig
Hi Alan,

thanks for the explanation.  It seems like what usb wants is to:

 - set sg_tablesize to 1 for devices that can't handle scatterlist at all
 - set the virt boundary as-is for devices supporting "basic" scatterlist,
   although that still assumes they can rejiggle them because for example
   you could still get a smaller than expected first segment ala (assuming
   a 1024 byte packet size and thus 1023 virt_boundary_mask):

| 0 .. 511 | 512 .. 1023 | 1024 .. 1535 |

   as the virt_boundary does not guarantee that the first segment is
   the same size as all the mid segments.
 - do not set any limit on xhci

But that just goes back to the original problem, and that is that with
swiotlb we are limited in the total dma mapping size, and recent block
layer changes in the way we handle the virt_boundary mean we now build
much larger requests by default.  For SCSI ULDs to take that into
account I need to call dma_max_mapping_size() and use that as the
upper bound for the request size.  My plan is to do that in scsi_lib.c,
but for that we need to expose the actual struct device that the dma
mapping is perfomed on to the scsi layer.  If that device is different
from the sysfs hierchary struct device, which it is for usb the ULDD
needs to scsi_add_host_with_dma and pass the dma device as well.  How
do I get at the dma device (aka the HCDs pci_dev or similar) from
usb-storage/uas?