Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christian Zigotzky
Good to know. Sorry about the email.

Sent from my iPhone

> On 6. Dec 2018, at 20:36, Christoph Hellwig  wrote:
> 
>> On Thu, Dec 06, 2018 at 06:10:54PM +0100, Christian Zigotzky wrote:
>> Please don’t merge this code. We are still testing and trying to figure out 
>> where the problems are in the code.
> 
> The ones I sent pings for were either tested successfully by you
> (the zone change) or are trivial cleanups that don't affect your setup.

Re: [PATCH v5 1/3] mm: Add support for kmem caches in DMA32 zone

2018-12-06 Thread Vlastimil Babka
On 12/7/18 7:16 AM, Nicolas Boichat wrote:
> IOMMUs using ARMv7 short-descriptor format require page tables
> to be allocated within the first 4GB of RAM, even on 64-bit systems.
> On arm64, this is done by passing GFP_DMA32 flag to memory allocation
> functions.
> 
> For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
> a full page using get_free_pages, so we considered 3 approaches:
>  1. This patch, adding support for GFP_DMA32 slab caches.
>  2. genalloc, which requires pre-allocating the maximum number of L2
> page tables (4096, so 4MB of memory).
>  3. page_frag, which is not very memory-efficient as it is unable
> to reuse freed fragments until the whole page is freed.
> 
> This change makes it possible to create a custom cache in DMA32 zone
> using kmem_cache_create, then allocate memory using kmem_cache_alloc.
> 
> We do not create a DMA32 kmalloc cache array, as there are currently
> no users of kmalloc(..., GFP_DMA32). These calls will continue to
> trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.
> 
> This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
> kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
> unnecessary).
> 
> Signed-off-by: Nicolas Boichat 

Acked-by: Vlastimil Babka 



Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-06 Thread Joe Jin
On 12/6/18 9:49 PM, Dongli Zhang wrote:
> 
> 
> On 12/07/2018 12:12 AM, Joe Jin wrote:
>> Hi Dongli,
>>
>> Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs():
> 
> I assume the call of swiotlb_tbl_map_single() might be frequent in some
> situations, e.g., when 'swiotlb=force'.
> 
> That's why I declare d_swiotlb_usage outside of any function and use "if
> (unlikely(!d_swiotlb_usage))".

This is reasonable.

Thanks,
Joe

> 
> I think "if (unlikely(!d_swiotlb_usage))" incur less performance overhead than
> calling swiotlb_create_debugfs() every time to confirm if debugfs is created. 
> I
> would declare d_swiotlb_usage statically inside swiotlb_create_debugfs() if 
> the
> performance overhead is acceptable (it is trivial indeed).
> 
> 
> That is the reason I tagged the patch as RFC: I am not sure whether the
> on-demand creation of debugfs is fine with maintainers/reviewers. If swiotlb
> pages are never allocated, we would not be able to see the debugfs entry.
> 
> I would prefer to limit the modification to swiotlb and not touch any
> other files.
> 
> The drawback is that there is no place to create or delete the debugfs entry,
> because the swiotlb buffer can be initialized and uninitialized at a very
> early stage.
> 
>>
>> void swiotlb_create_debugfs(void)
>> {
>> #ifdef CONFIG_DEBUG_FS
>>  static struct dentry *d_swiotlb_usage = NULL;
>>
>>  if (d_swiotlb_usage)
>>  return;
>>
>>  d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
>>
>>  if (!d_swiotlb_usage)
>>  return;
>>
>>  debugfs_create_file("usage", 0600, d_swiotlb_usage,
>>  NULL, &swiotlb_usage_fops);
>> #endif
>> }
>>
>> And for io_tlb_used, possibly add a check at the beginning of
>> swiotlb_tbl_map_single(): if there are no free slots, or not enough slots,
>> return failure directly?
> 
> This would optimize the slot allocation path. I will follow this in the next
> version after I get more suggestions and confirmations from maintainers.
> 
> 
> Thank you very much!
> 
> Dongli Zhang
> 
>>
>> Thanks,
>> Joe
>> On 12/5/18 7:59 PM, Dongli Zhang wrote:
>>> The device driver will not be able to do DMA operations once the swiotlb
>>> buffer is full, either because the driver is using too many IO TLB blocks
>>> in flight, or because there is a memory leak in the device driver.
>>> Exporting the swiotlb buffer usage via debugfs would help the user estimate
>>> the size of the swiotlb buffer to pre-allocate, or analyze a device driver
>>> memory leak.
>>>
>>> As the swiotlb can be initialized at a very early stage, when debugfs
>>> cannot yet register successfully, this patch creates the debugfs entry on
>>> demand.
>>>
>>> Signed-off-by: Dongli Zhang 
>>> ---
>>>  kernel/dma/swiotlb.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 57 insertions(+)
>>>
>>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>>> index 045930e..d3c8aa4 100644
>>> --- a/kernel/dma/swiotlb.c
>>> +++ b/kernel/dma/swiotlb.c
>>> @@ -35,6 +35,9 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#ifdef CONFIG_DEBUG_FS
>>> +#include <linux/debugfs.h>
>>> +#endif
>>>  
>>>  #include 
>>>  #include 
>>> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>>>   */
>>>  static unsigned long io_tlb_nslabs;
>>>  
>>> +#ifdef CONFIG_DEBUG_FS
>>> +/*
>>> + * The number of used IO TLB block
>>> + */
>>> +static unsigned long io_tlb_used;
>>> +#endif
>>> +
>>>  /*
>>>   * This is a free list describing the number of free entries available from
>>>   * each index
>>> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>>>  
>>>  static int late_alloc;
>>>  
>>> +#ifdef CONFIG_DEBUG_FS
>>> +
>>> +static struct dentry *d_swiotlb_usage;
>>> +
>>> +static int swiotlb_usage_show(struct seq_file *m, void *v)
>>> +{
>>> +   seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
>>> +   return 0;
>>> +}
>>> +
>>> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
>>> +{
>>> +   return single_open(filp, swiotlb_usage_show, NULL);
>>> +}
>>> +
>>> +static const struct file_operations swiotlb_usage_fops = {
>>> +   .open   = swiotlb_usage_open,
>>> +   .read   = seq_read,
>>> +   .llseek = seq_lseek,
>>> +   .release= single_release,
>>> +};
>>> +
>>> +void swiotlb_create_debugfs(void)
>>> +{
>>> +   d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
>>> +
>>> +   if (!d_swiotlb_usage)
>>> +   return;
>>> +
>>> +   debugfs_create_file("usage", 0600, d_swiotlb_usage,
>>> +   NULL, &swiotlb_usage_fops);
>>> +}
>>> +
>>> +#endif
>>> +
>>>  static int __init
>>>  setup_io_tlb_npages(char *str)
>>>  {
>>> @@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>> pr_warn_once("%s is active and system is using DMA bounce buffers\n",
>>>  sme_active() ? "SME" : "SEV");
>>>  
>>> +#ifdef CONFIG_DEBUG_FS
>>> +   if (unlikely(!d_swiotlb_usage))
>>> +   swiotlb_create_debugfs();
>>> +#endif

[PATCH v5 0/3] iommu/io-pgtable-arm-v7s: Use DMA32 zone for page tables

2018-12-06 Thread Nicolas Boichat
This is a follow-up to the discussion in [1], [2].

IOMMUs using ARMv7 short-descriptor format require page tables
(level 1 and 2) to be allocated within the first 4GB of RAM, even
on 64-bit systems.

For L1 tables that are bigger than a page, we can just use __get_free_pages
with GFP_DMA32 (on arm64 systems only, arm would still use GFP_DMA).

For L2 tables that only take 1KB, it would be a waste to allocate a full
page, so we considered 3 approaches:
 1. This series, adding support for GFP_DMA32 slab caches.
 2. genalloc, which requires pre-allocating the maximum number of L2 page
tables (4096, so 4MB of memory).
 3. page_frag, which is not very memory-efficient as it is unable to reuse
freed fragments until the whole page is freed. [3]

This series is the most memory-efficient approach.

[1] https://lists.linuxfoundation.org/pipermail/iommu/2018-November/030876.html
[2] https://lists.linuxfoundation.org/pipermail/iommu/2018-December/031696.html
[3] https://patchwork.codeaurora.org/patch/671639/

Changes since v1:
 - Add support for SLAB_CACHE_DMA32 in slab and slub (patches 1/2)
 - iommu/io-pgtable-arm-v7s (patch 3):
   - Changed approach to use SLAB_CACHE_DMA32 added by the previous
 commit.
   - Use DMA or DMA32 depending on the architecture (DMA for arm,
 DMA32 for arm64).

Changes since v2:
 - Reworded and expanded commit messages
 - Added cache_dma32 documentation in PATCH 2/3.

v3 used the page_frag approach, see [3].

Changes since v4:
 - Dropped change that removed GFP_DMA32 from GFP_SLAB_BUG_MASK:
   instead we can just call kmem_cache_*alloc without GFP_DMA32
   parameter. This also means that we can drop PATCH v4 1/3, as we
   do not make any changes in GFP flag verification.
 - Dropped hunks that added cache_dma32 sysfs file, and moved
   the hunks to PATCH v5 3/3, so that maintainer can decide whether
   to pick the change independently.

Nicolas Boichat (3):
  mm: Add support for kmem caches in DMA32 zone
  iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging
  mm: Add /sys/kernel/slab/cache/cache_dma32

 Documentation/ABI/testing/sysfs-kernel-slab |  9 +
 drivers/iommu/io-pgtable-arm-v7s.c  | 19 +++
 include/linux/slab.h|  2 ++
 mm/slab.c   |  2 ++
 mm/slab.h   |  3 ++-
 mm/slab_common.c|  2 +-
 mm/slub.c   | 16 
 tools/vm/slabinfo.c |  7 ++-
 8 files changed, 53 insertions(+), 7 deletions(-)

-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH v5 2/3] iommu/io-pgtable-arm-v7s: Request DMA32 memory, and improve debugging

2018-12-06 Thread Nicolas Boichat
IOMMUs using ARMv7 short-descriptor format require page tables
(level 1 and 2) to be allocated within the first 4GB of RAM, even
on 64-bit systems.

For level 1/2 pages, ensure GFP_DMA32 is used if CONFIG_ZONE_DMA32
is defined (e.g. on arm64 platforms).

For level 2 pages, allocate a slab cache in SLAB_CACHE_DMA32. Note
that we do not explicitly pass GFP_DMA[32] to kmem_cache_zalloc,
as this is not strictly necessary, and would cause a warning
in mm/sl*b.c, as we did not update GFP_SLAB_BUG_MASK.

Also, print an error when the physical address does not fit in
32-bit, to make debugging easier in the future.

Fixes: ad67f5a6545f ("arm64: replace ZONE_DMA with ZONE_DMA32")
Signed-off-by: Nicolas Boichat 
---

Changes since v2:
 - Commit message

(v3 used the page_frag approach)

Changes since v4:
 - Do not pass ARM_V7S_TABLE_GFP_DMA to kmem_cache_zalloc, as this
   is unnecessary, and would trigger a warning.

 drivers/iommu/io-pgtable-arm-v7s.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/iommu/io-pgtable-arm-v7s.c b/drivers/iommu/io-pgtable-arm-v7s.c
index 445c3bde04800c..f311fe1dfa47b7 100644
--- a/drivers/iommu/io-pgtable-arm-v7s.c
+++ b/drivers/iommu/io-pgtable-arm-v7s.c
@@ -161,6 +161,14 @@
 
 #define ARM_V7S_TCR_PD1		BIT(5)
 
+#ifdef CONFIG_ZONE_DMA32
+#define ARM_V7S_TABLE_GFP_DMA GFP_DMA32
+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA32
+#else
+#define ARM_V7S_TABLE_GFP_DMA GFP_DMA
+#define ARM_V7S_TABLE_SLAB_CACHE SLAB_CACHE_DMA
+#endif
+
 typedef u32 arm_v7s_iopte;
 
 static bool selftest_running;
@@ -198,13 +206,16 @@ static void *__arm_v7s_alloc_table(int lvl, gfp_t gfp,
void *table = NULL;
 
if (lvl == 1)
-   table = (void *)__get_dma_pages(__GFP_ZERO, get_order(size));
+   table = (void *)__get_free_pages(
+   __GFP_ZERO | ARM_V7S_TABLE_GFP_DMA, get_order(size));
else if (lvl == 2)
-   table = kmem_cache_zalloc(data->l2_tables, gfp | GFP_DMA);
+   table = kmem_cache_zalloc(data->l2_tables, gfp);
phys = virt_to_phys(table);
-   if (phys != (arm_v7s_iopte)phys)
+   if (phys != (arm_v7s_iopte)phys) {
/* Doesn't fit in PTE */
+   dev_err(dev, "Page table does not fit in PTE: %pa", &phys);
goto out_free;
+   }
if (table && !(cfg->quirks & IO_PGTABLE_QUIRK_NO_DMA)) {
dma = dma_map_single(dev, table, size, DMA_TO_DEVICE);
if (dma_mapping_error(dev, dma))
@@ -737,7 +748,7 @@ static struct io_pgtable *arm_v7s_alloc_pgtable(struct io_pgtable_cfg *cfg,
data->l2_tables = kmem_cache_create("io-pgtable_armv7s_l2",
ARM_V7S_TABLE_SIZE(2),
ARM_V7S_TABLE_SIZE(2),
-   SLAB_CACHE_DMA, NULL);
+   ARM_V7S_TABLE_SLAB_CACHE, NULL);
if (!data->l2_tables)
goto out_free_data;
 
-- 
2.20.0.rc2.403.gdbc3b29805-goog



[PATCH v5 1/3] mm: Add support for kmem caches in DMA32 zone

2018-12-06 Thread Nicolas Boichat
IOMMUs using ARMv7 short-descriptor format require page tables
to be allocated within the first 4GB of RAM, even on 64-bit systems.
On arm64, this is done by passing GFP_DMA32 flag to memory allocation
functions.

For IOMMU L2 tables that only take 1KB, it would be a waste to allocate
a full page using get_free_pages, so we considered 3 approaches:
 1. This patch, adding support for GFP_DMA32 slab caches.
 2. genalloc, which requires pre-allocating the maximum number of L2
page tables (4096, so 4MB of memory).
 3. page_frag, which is not very memory-efficient as it is unable
to reuse freed fragments until the whole page is freed.

This change makes it possible to create a custom cache in DMA32 zone
using kmem_cache_create, then allocate memory using kmem_cache_alloc.

We do not create a DMA32 kmalloc cache array, as there are currently
no users of kmalloc(..., GFP_DMA32). These calls will continue to
trigger a warning, as we keep GFP_DMA32 in GFP_SLAB_BUG_MASK.

This implies that calls to kmem_cache_*alloc on a SLAB_CACHE_DMA32
kmem_cache must _not_ use GFP_DMA32 (it is anyway redundant and
unnecessary).
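
A minimal usage sketch of the new flag, for reference; the cache name and
helper function below are hypothetical, not part of this patch:

#include <linux/slab.h>

/*
 * A 1 KB cache whose objects come from ZONE_DMA32. Note that the
 * allocation call itself must NOT pass GFP_DMA32: the zone comes from
 * the cache flag, and GFP_DMA32 would still trip GFP_SLAB_BUG_MASK.
 * (Lazy creation shown for brevity; locking omitted.)
 */
static struct kmem_cache *example_dma32_cache;

static void *example_alloc_dma32_table(void)
{
	if (!example_dma32_cache)
		example_dma32_cache = kmem_cache_create("example-dma32",
							1024, 1024,
							SLAB_CACHE_DMA32,
							NULL);
	if (!example_dma32_cache)
		return NULL;
	return kmem_cache_zalloc(example_dma32_cache, GFP_KERNEL);
}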

Signed-off-by: Nicolas Boichat 
---

Changes since v2:
 - Clarified commit message
 - Add entry in sysfs-kernel-slab to document the new sysfs file

(v3 used the page_frag approach)

Changes since v4:
 - Added details to commit message
 - Dropped change that removed GFP_DMA32 from GFP_SLAB_BUG_MASK:
   instead we can just call kmem_cache_*alloc without GFP_DMA32
   parameter. This also means that we can drop PATCH 1/3, as we
   do not make any changes in GFP flag verification.
 - Dropped hunks that added cache_dma32 sysfs file, and moved
   the hunks to PATCH 3/3, so that maintainer can decide whether
   to pick the change independently.

 include/linux/slab.h | 2 ++
 mm/slab.c| 2 ++
 mm/slab.h| 3 ++-
 mm/slab_common.c | 2 +-
 mm/slub.c| 5 +
 5 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 11b45f7ae4057c..9449b19c5f107a 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -32,6 +32,8 @@
 #define SLAB_HWCACHE_ALIGN	((slab_flags_t __force)0x00002000U)
 /* Use GFP_DMA memory */
 #define SLAB_CACHE_DMA		((slab_flags_t __force)0x00004000U)
+/* Use GFP_DMA32 memory */
+#define SLAB_CACHE_DMA32	((slab_flags_t __force)0x00008000U)
 /* DEBUG: Store the last owner for bug hunting */
 #define SLAB_STORE_USER		((slab_flags_t __force)0x00010000U)
 /* Panic if kmem_cache_create() fails */
diff --git a/mm/slab.c b/mm/slab.c
index 73fe23e649c91a..124f8c556d27fb 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -2109,6 +2109,8 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags)
cachep->allocflags = __GFP_COMP;
if (flags & SLAB_CACHE_DMA)
cachep->allocflags |= GFP_DMA;
+   if (flags & SLAB_CACHE_DMA32)
+   cachep->allocflags |= GFP_DMA32;
if (flags & SLAB_RECLAIM_ACCOUNT)
cachep->allocflags |= __GFP_RECLAIMABLE;
cachep->size = size;
diff --git a/mm/slab.h b/mm/slab.h
index 4190c24ef0e9df..fcf717e12f0a86 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -127,7 +127,8 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size,
 
 
 /* Legal flag mask for kmem_cache_create(), for various configurations */
-#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | SLAB_PANIC | \
+#define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \
+SLAB_CACHE_DMA32 | SLAB_PANIC | \
 SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS )
 
 #if defined(CONFIG_DEBUG_SLAB)
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 70b0cc85db67f8..18b7b809c8d064 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -53,7 +53,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work,
SLAB_FAILSLAB | SLAB_KASAN)
 
 #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \
-SLAB_ACCOUNT)
+SLAB_CACHE_DMA32 | SLAB_ACCOUNT)
 
 /*
  * Merge control. If this is set then no merging of slab caches will occur.
diff --git a/mm/slub.c b/mm/slub.c
index c229a9b7dd5448..4caadb926838ef 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3583,6 +3583,9 @@ static int calculate_sizes(struct kmem_cache *s, int forced_order)
if (s->flags & SLAB_CACHE_DMA)
s->allocflags |= GFP_DMA;
 
+   if (s->flags & SLAB_CACHE_DMA32)
+   s->allocflags |= GFP_DMA32;
+
if (s->flags & SLAB_RECLAIM_ACCOUNT)
s->allocflags |= __GFP_RECLAIMABLE;
 
@@ -5671,6 +5674,8 @@ static char *create_unique_id(struct kmem_cache *s)
 */
if (s->flags & SLAB_CACHE_DMA)
*p++ = 'd';
+   if (s->flags & SLAB_CACHE_DMA32)
+   *p++ = 'D';
if (s->flags & SLAB_RECLAIM_ACCOUNT)
*p++ = 'a';
if (s->flags & SLAB_CONSISTENCY_CHECKS)

[PATCH v5 3/3] mm: Add /sys/kernel/slab/cache/cache_dma32

2018-12-06 Thread Nicolas Boichat
A previous patch in this series adds support for SLAB_CACHE_DMA32
kmem caches. This adds the corresponding
/sys/kernel/slab/cache/cache_dma32 entries, and fixes slabinfo
tool.
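
As a rough illustration of the new file, userspace could read it as below;
this assumes the cache was not merged into an alias, and the cache name
"io-pgtable_armv7s_l2" comes from patch 2/3 of this series:

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/kernel/slab/io-pgtable_armv7s_l2/cache_dma32", "r");
	char buf[8];

	if (!f)
		return 1;
	if (fgets(buf, sizeof(buf), f))
		/* prints "cache_dma32 = 1" when SLAB_CACHE_DMA32 is set */
		printf("cache_dma32 = %s", buf);
	fclose(f);
	return 0;
}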

Signed-off-by: Nicolas Boichat 
---

There were different opinions on whether this sysfs entry should
be added, so I'll leave it up to the mm/slub maintainers to decide
whether they want to pick this up, or drop it.

 Documentation/ABI/testing/sysfs-kernel-slab |  9 +
 mm/slub.c   | 11 +++
 tools/vm/slabinfo.c |  7 ++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/Documentation/ABI/testing/sysfs-kernel-slab b/Documentation/ABI/testing/sysfs-kernel-slab
index 29601d93a1c2ea..d742c6cfdffbe9 100644
--- a/Documentation/ABI/testing/sysfs-kernel-slab
+++ b/Documentation/ABI/testing/sysfs-kernel-slab
@@ -106,6 +106,15 @@ Description:
are from ZONE_DMA.
Available when CONFIG_ZONE_DMA is enabled.
 
+What:  /sys/kernel/slab/cache/cache_dma32
+Date:  December 2018
+KernelVersion: 4.21
+Contact:   Nicolas Boichat 
+Description:
+   The cache_dma32 file is read-only and specifies whether objects
+   are from ZONE_DMA32.
+   Available when CONFIG_ZONE_DMA32 is enabled.
+
 What:  /sys/kernel/slab/cache/cpu_slabs
 Date:  May 2007
 KernelVersion: 2.6.22
diff --git a/mm/slub.c b/mm/slub.c
index 4caadb926838ef..840f3719d9d543 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -5104,6 +5104,14 @@ static ssize_t cache_dma_show(struct kmem_cache *s, char *buf)
 SLAB_ATTR_RO(cache_dma);
 #endif
 
+#ifdef CONFIG_ZONE_DMA32
+static ssize_t cache_dma32_show(struct kmem_cache *s, char *buf)
+{
+   return sprintf(buf, "%d\n", !!(s->flags & SLAB_CACHE_DMA32));
+}
+SLAB_ATTR_RO(cache_dma32);
+#endif
+
 static ssize_t usersize_show(struct kmem_cache *s, char *buf)
 {
return sprintf(buf, "%u\n", s->usersize);
@@ -5444,6 +5452,9 @@ static struct attribute *slab_attrs[] = {
 #ifdef CONFIG_ZONE_DMA
&cache_dma_attr.attr,
 #endif
+#ifdef CONFIG_ZONE_DMA32
+   &cache_dma32_attr.attr,
+#endif
 #ifdef CONFIG_NUMA
&remote_node_defrag_ratio_attr.attr,
 #endif
diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index 334b16db0ebbe9..4ee1bf6e498dfa 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -29,7 +29,7 @@ struct slabinfo {
char *name;
int alias;
int refs;
-   int aliases, align, cache_dma, cpu_slabs, destroy_by_rcu;
+   int aliases, align, cache_dma, cache_dma32, cpu_slabs, destroy_by_rcu;
unsigned int hwcache_align, object_size, objs_per_slab;
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
@@ -531,6 +531,8 @@ static void report(struct slabinfo *s)
printf("** Hardware cacheline aligned\n");
if (s->cache_dma)
printf("** Memory is allocated in a special DMA zone\n");
+   if (s->cache_dma32)
+   printf("** Memory is allocated in a special DMA32 zone\n");
if (s->destroy_by_rcu)
printf("** Slabs are destroyed via RCU\n");
if (s->reclaim_account)
@@ -599,6 +601,8 @@ static void slabcache(struct slabinfo *s)
*p++ = '*';
if (s->cache_dma)
*p++ = 'd';
+   if (s->cache_dma32)
+   *p++ = 'D';
if (s->hwcache_align)
*p++ = 'A';
if (s->poison)
@@ -1205,6 +1209,7 @@ static void read_slab_dir(void)
slab->aliases = get_obj("aliases");
slab->align = get_obj("align");
slab->cache_dma = get_obj("cache_dma");
+   slab->cache_dma32 = get_obj("cache_dma32");
slab->cpu_slabs = get_obj("cpu_slabs");
slab->destroy_by_rcu = get_obj("destroy_by_rcu");
slab->hwcache_align = get_obj("hwcache_align");
-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-06 Thread Dongli Zhang



On 12/07/2018 12:12 AM, Joe Jin wrote:
> Hi Dongli,
> 
> Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs():

I assume the call of swiotlb_tbl_map_single() might be frequent in some
situations, e.g., when 'swiotlb=force'.

That's why I declare d_swiotlb_usage outside of any function and use "if
(unlikely(!d_swiotlb_usage))".

I think "if (unlikely(!d_swiotlb_usage))" incur less performance overhead than
calling swiotlb_create_debugfs() every time to confirm if debugfs is created. I
would declare d_swiotlb_usage statically inside swiotlb_create_debugfs() if the
performance overhead is acceptable (it is trivial indeed).


That is the reason I tagged the patch as RFC: I am not sure whether the
on-demand creation of debugfs is fine with maintainers/reviewers. If swiotlb
pages are never allocated, we would not be able to see the debugfs entry.

I would prefer to limit the modification to swiotlb and not touch any
other files.

The drawback is that there is no place to create or delete the debugfs entry,
because the swiotlb buffer can be initialized and uninitialized at a very
early stage.

> 
> void swiotlb_create_debugfs(void)
> {
> #ifdef CONFIG_DEBUG_FS
>   static struct dentry *d_swiotlb_usage = NULL;
> 
>   if (d_swiotlb_usage)
>   return;
> 
>   d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> 
>   if (!d_swiotlb_usage)
>   return;
> 
>   debugfs_create_file("usage", 0600, d_swiotlb_usage,
>   NULL, &swiotlb_usage_fops);
> #endif
> }
> 
> And for io_tlb_used, possibly add a check at the beginning of
> swiotlb_tbl_map_single(): if there are no free slots, or not enough slots,
> return failure directly?

This would optimize the slot allocation path. I will follow this in the next
version after I get more suggestions and confirmations from maintainers.
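
A minimal sketch of such a check, assuming io_tlb_used is maintained
unconditionally as in this RFC (the helper name is hypothetical):

/*
 * Called under io_tlb_lock before walking the free list; returns true
 * when the request cannot possibly be satisfied, so the caller can
 * fail fast instead of scanning every index.
 */
static bool swiotlb_no_free_slots(unsigned long nslots)
{
	return io_tlb_used + nslots > io_tlb_nslabs;
}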


Thank you very much!

Dongli Zhang

> 
> Thanks,
> Joe
> On 12/5/18 7:59 PM, Dongli Zhang wrote:
>> The device driver will not be able to do DMA operations once the swiotlb
>> buffer is full, either because the driver is using too many IO TLB blocks
>> in flight, or because there is a memory leak in the device driver. Exporting
>> the swiotlb buffer usage via debugfs would help the user estimate the size
>> of the swiotlb buffer to pre-allocate, or analyze a device driver memory
>> leak.
>>
>> As the swiotlb can be initialized at a very early stage, when debugfs cannot
>> yet register successfully, this patch creates the debugfs entry on demand.
>>
>> Signed-off-by: Dongli Zhang 
>> ---
>>  kernel/dma/swiotlb.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 57 insertions(+)
>>
>> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
>> index 045930e..d3c8aa4 100644
>> --- a/kernel/dma/swiotlb.c
>> +++ b/kernel/dma/swiotlb.c
>> @@ -35,6 +35,9 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_DEBUG_FS
>> +#include <linux/debugfs.h>
>> +#endif
>>  
>>  #include 
>>  #include 
>> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>>   */
>>  static unsigned long io_tlb_nslabs;
>>  
>> +#ifdef CONFIG_DEBUG_FS
>> +/*
>> + * The number of used IO TLB block
>> + */
>> +static unsigned long io_tlb_used;
>> +#endif
>> +
>>  /*
>>   * This is a free list describing the number of free entries available from
>>   * each index
>> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>>  
>>  static int late_alloc;
>>  
>> +#ifdef CONFIG_DEBUG_FS
>> +
>> +static struct dentry *d_swiotlb_usage;
>> +
>> +static int swiotlb_usage_show(struct seq_file *m, void *v)
>> +{
>> +seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
>> +return 0;
>> +}
>> +
>> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
>> +{
>> +return single_open(filp, swiotlb_usage_show, NULL);
>> +}
>> +
>> +static const struct file_operations swiotlb_usage_fops = {
>> +.open   = swiotlb_usage_open,
>> +.read   = seq_read,
>> +.llseek = seq_lseek,
>> +.release= single_release,
>> +};
>> +
>> +void swiotlb_create_debugfs(void)
>> +{
>> +d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
>> +
>> +if (!d_swiotlb_usage)
>> +return;
>> +
>> +debugfs_create_file("usage", 0600, d_swiotlb_usage,
>> +NULL, &swiotlb_usage_fops);
>> +}
>> +
>> +#endif
>> +
>>  static int __init
>>  setup_io_tlb_npages(char *str)
>>  {
>> @@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>  pr_warn_once("%s is active and system is using DMA bounce buffers\n",
>>   sme_active() ? "SME" : "SEV");
>>  
>> +#ifdef CONFIG_DEBUG_FS
>> +if (unlikely(!d_swiotlb_usage))
>> +swiotlb_create_debugfs();
>> +#endif
>> +
>>  mask = dma_get_seg_boundary(hwdev);
>>  
>>  tbl_dma_addr &= mask;
>> @@ -528,6 +578,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>>  dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", size);

Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 08:24:38PM +, Robin Murphy wrote:
> On 06/12/2018 20:00, Christoph Hellwig wrote:
>> On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:
>>> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at
>>> the point we detected the ACPI properties are wrong - that shouldn't be too
>>> much of a headache to go back to.
>>
>> Ok.  I've cooked up a patch to use NULL as the go direct marker.
>> This cleans up a few things nicely, but also means we now need to
>> do the bypass scheme for all ops, not just the fast path.  But we
>> probably should just move the slow path ops out of line anyway,
>> so I'm not worried about it.  This has survived some very basic
>> testing on x86, and really needs to be cleaned up and split into
>> multiple patches..
>
> I've also just finished hacking something up to keep the arm64 status quo - 
> I'll need to actually test it tomorrow, but the overall diff looks like the 
> below.

Nice.  I created a branch that picked up your bits and also the ideas
from Linus, and the result looks really nice.  I'll still need a signoff
for your bits, though.

Jesper, can you give this a spin and see if it changes the numbers even further?

  git://git.infradead.org/users/hch/misc.git dma-direct-calls.2

  
http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls.2


[PATCH v2] iommu: fix amd_iommu=force_isolation

2018-12-06 Thread Yu Zhao via iommu
The parameter is still there but it's ignored. We need to check its
value before deciding to go into passthrough mode for an AMD IOMMU v2
capable device.

We occasionally use this parameter to force a v2 capable device into
translation mode, to debug memory corruption that we suspect is
caused by DMA writes.

This addresses the following comment from Joerg Roedel on the first
version, in which the v2 capability of the device was completely ignored:
> This breaks the iommu_v2 use-case, as it needs a direct mapping for the
> devices that support it.

And from Documentation/admin-guide/kernel-parameters.txt:
  This option does not override iommu=pt

Fixes: aafd8ba0ca74 ("iommu/amd: Implement add_device and remove_device")

Signed-off-by: Yu Zhao 
---
 drivers/iommu/amd_iommu.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/amd_iommu.c b/drivers/iommu/amd_iommu.c
index 1167ff0416cf..325f3bad118b 100644
--- a/drivers/iommu/amd_iommu.c
+++ b/drivers/iommu/amd_iommu.c
@@ -438,7 +438,14 @@ static int iommu_init_device(struct device *dev)
 
dev_data->alias = get_alias(dev);
 
-   if (dev_is_pci(dev) && pci_iommuv2_capable(to_pci_dev(dev))) {
+   /*
+* By default we use passthrough mode for IOMMUv2 capable device.
+* But if amd_iommu=force_isolation is set (e.g. to debug DMA to
+* invalid address), we ignore the capability for the device so
+* it'll be forced to go into translation mode.
+*/
+   if ((iommu_pass_through || !amd_iommu_force_isolation) &&
+   dev_is_pci(dev) && pci_iommuv2_capable(to_pci_dev(dev))) {
struct amd_iommu *iommu;
 
iommu = amd_iommu_rlookup_table[dev_data->devid];
-- 
2.20.0.rc2.403.gdbc3b29805-goog



Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Robin Murphy

On 06/12/2018 20:00, Christoph Hellwig wrote:

On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:

I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at
the point we detected the ACPI properties are wrong - that shouldn't be too
much of a headache to go back to.


Ok.  I've cooked up a patch to use NULL as the go direct marker.
This cleans up a few things nicely, but also means we now need to
do the bypass scheme for all ops, not just the fast path.  But we
probably should just move the slow path ops out of line anyway,
so I'm not worried about it.  This has survived some very basic
testing on x86, and really needs to be cleaned up and split into
multiple patches..


I've also just finished hacking something up to keep the arm64 status 
quo - I'll need to actually test it tomorrow, but the overall diff looks 
like the below.


Robin.

->8-
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index 0626c3a93730..fe6287f68adb 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -19,15 +19,13 @@
 #include 
 #include 

-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
/*
 * We expect no ISA devices, and all other DMA masters are expected to
 * have someone call arch_setup_dma_ops at device creation time.
 */
-   return &dummy_dma_ops;
+   return &dma_dummy_ops;
 }

 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c
index 17f8fc784de2..574aee2d04e7 100644
--- a/arch/arm64/mm/dma-mapping.c
+++ b/arch/arm64/mm/dma-mapping.c
@@ -255,98 +255,6 @@ static int __init atomic_pool_init(void)
return -ENOMEM;
 }

-/*
- * The following APIs are for dummy DMA ops *
- */
-
-static void *__dummy_alloc(struct device *dev, size_t size,
-  dma_addr_t *dma_handle, gfp_t flags,
-  unsigned long attrs)
-{
-   return NULL;
-}
-
-static void __dummy_free(struct device *dev, size_t size,
-void *vaddr, dma_addr_t dma_handle,
-unsigned long attrs)
-{
-}
-
-static int __dummy_mmap(struct device *dev,
-   struct vm_area_struct *vma,
-   void *cpu_addr, dma_addr_t dma_addr, size_t size,
-   unsigned long attrs)
-{
-   return -ENXIO;
-}
-
-static dma_addr_t __dummy_map_page(struct device *dev, struct page *page,
-  unsigned long offset, size_t size,
-  enum dma_data_direction dir,
-  unsigned long attrs)
-{
-   return 0;
-}
-
-static void __dummy_unmap_page(struct device *dev, dma_addr_t dev_addr,
-  size_t size, enum dma_data_direction dir,
-  unsigned long attrs)
-{
-}
-
-static int __dummy_map_sg(struct device *dev, struct scatterlist *sgl,
- int nelems, enum dma_data_direction dir,
- unsigned long attrs)
-{
-   return 0;
-}
-
-static void __dummy_unmap_sg(struct device *dev,
-struct scatterlist *sgl, int nelems,
-enum dma_data_direction dir,
-unsigned long attrs)
-{
-}
-
-static void __dummy_sync_single(struct device *dev,
-   dma_addr_t dev_addr, size_t size,
-   enum dma_data_direction dir)
-{
-}
-
-static void __dummy_sync_sg(struct device *dev,
-   struct scatterlist *sgl, int nelems,
-   enum dma_data_direction dir)
-{
-}
-
-static int __dummy_mapping_error(struct device *hwdev, dma_addr_t dma_addr)
-{
-   return 1;
-}
-
-static int __dummy_dma_supported(struct device *hwdev, u64 mask)
-{
-   return 0;
-}
-
-const struct dma_map_ops dummy_dma_ops = {
-   .alloc  = __dummy_alloc,
-   .free   = __dummy_free,
-   .mmap   = __dummy_mmap,
-   .map_page   = __dummy_map_page,
-   .unmap_page = __dummy_unmap_page,
-   .map_sg = __dummy_map_sg,
-   .unmap_sg   = __dummy_unmap_sg,
-   .sync_single_for_cpu= __dummy_sync_single,
-   .sync_single_for_device = __dummy_sync_single,
-   .sync_sg_for_cpu= __dummy_sync_sg,
-   .sync_sg_for_device = __dummy_sync_sg,
-   .mapping_error  = __dummy_mapping_error,
-   .dma_supported  = __dummy_dma_supported,
-};
-EXPORT_SYMBOL(dummy_dma_ops);
-
 static int __init arm64_dma_init(void)
 {
WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(),

Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 06:54:17PM +, Robin Murphy wrote:
> I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at 
> the point we detected the ACPI properties are wrong - that shouldn't be too 
> much of a headache to go back to.

Ok.  I've cooked up a patch to use NULL as the go direct marker.
This cleans up a few things nicely, but also means we now need to
do the bypass scheme for all ops, not just the fast path.  But we
probably should just move the slow path ops out of line anyway,
so I'm not worried about it.  This has survived some very basic
testing on x86, and really needs to be cleaned up and split into
multiple patches..

The other nice thing this would allow is removing dma_direct_ops
entirely, which means we can simplify a few things even further.

This patch is relative to what I sent out before and the git tree,
and survives some very basic x86 testing.

---
From 9318145675791833f752cba22484a0b4b4387f1b Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Thu, 6 Dec 2018 10:52:35 -0800
Subject: dma-direct: use NULL as the bypass indicator

Signed-off-by: Christoph Hellwig 
---
 arch/alpha/include/asm/dma-mapping.h |   2 +-
 arch/arm/include/asm/dma-mapping.h   |   2 +-
 arch/arm/mm/dma-mapping-nommu.c  |  12 +--
 arch/arm64/include/asm/dma-mapping.h |   8 +-
 arch/arm64/mm/dma-mapping.c  |  89 ---
 arch/ia64/hp/common/hwsw_iommu.c |   2 +-
 arch/ia64/hp/common/sba_iommu.c  |   4 +-
 arch/ia64/kernel/dma-mapping.c   |   1 -
 arch/ia64/kernel/pci-dma.c   |   2 +-
 arch/mips/include/asm/dma-mapping.h  |   2 +-
 arch/parisc/kernel/setup.c   |   4 -
 arch/sparc/include/asm/dma-mapping.h |   4 +-
 arch/x86/kernel/pci-dma.c|   2 +-
 drivers/base/platform.c  |   2 +
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c  |   2 +-
 drivers/iommu/amd_iommu.c|  13 +--
 drivers/pci/controller/vmd.c |   4 +-
 include/asm-generic/dma-mapping.h|   2 +-
 include/linux/dma-direct.h   |   6 --
 include/linux/dma-mapping.h  | 125 ++-
 include/linux/dma-noncoherent.h  |   7 --
 kernel/dma/Kconfig   |   2 +-
 kernel/dma/direct.c  |   3 +
 23 files changed, 92 insertions(+), 208 deletions(-)

diff --git a/arch/alpha/include/asm/dma-mapping.h b/arch/alpha/include/asm/dma-mapping.h
index 8beeafd4f68e..c9203468d4fe 100644
--- a/arch/alpha/include/asm/dma-mapping.h
+++ b/arch/alpha/include/asm/dma-mapping.h
@@ -7,7 +7,7 @@ extern const struct dma_map_ops alpha_pci_ops;
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
 #ifdef CONFIG_ALPHA_JENSEN
-   return &dma_direct_ops;
+   return NULL; /* use direct mapping */
 #else
return &alpha_pci_ops;
 #endif
diff --git a/arch/arm/include/asm/dma-mapping.h b/arch/arm/include/asm/dma-mapping.h
index 965b7c846ecb..31d3b96f0f4b 100644
--- a/arch/arm/include/asm/dma-mapping.h
+++ b/arch/arm/include/asm/dma-mapping.h
@@ -18,7 +18,7 @@ extern const struct dma_map_ops arm_coherent_dma_ops;
 
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-   return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : &dma_direct_ops;
+   return IS_ENABLED(CONFIG_MMU) ? &arm_dma_ops : NULL;
 }
 
 #ifdef __arch_page_to_dma
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 712416ecd8e6..378763003079 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -209,16 +209,9 @@ const struct dma_map_ops arm_nommu_dma_ops = {
 };
 EXPORT_SYMBOL(arm_nommu_dma_ops);
 
-static const struct dma_map_ops *arm_nommu_get_dma_map_ops(bool coherent)
-{
-   return coherent ? &dma_direct_ops : &arm_nommu_dma_ops;
-}
-
 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
const struct iommu_ops *iommu, bool coherent)
 {
-   const struct dma_map_ops *dma_ops;
-
if (IS_ENABLED(CONFIG_CPU_V7M)) {
/*
 * Cache support for v7m is optional, so can be treated as
@@ -234,7 +227,6 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
	dev->archdata.dma_coherent = (get_cr() & CR_M) ? coherent : true;
}
 
-   dma_ops = arm_nommu_get_dma_map_ops(dev->archdata.dma_coherent);
-
-   set_dma_ops(dev, dma_ops);
+   if (!dev->archdata.dma_coherent)
+   set_dma_ops(dev, &arm_nommu_dma_ops);
 }
diff --git a/arch/arm64/include/asm/dma-mapping.h b/arch/arm64/include/asm/dma-mapping.h
index c41f3fb1446c..95dbf3ef735a 100644
--- a/arch/arm64/include/asm/dma-mapping.h
+++ b/arch/arm64/include/asm/dma-mapping.h
@@ -24,15 +24,9 @@
 #include 
 #include 
 
-extern const struct dma_map_ops dummy_dma_ops;
-
 static inline const struct dma_map_ops *get_arch_dma_ops(struct bus_type *bus)
 {
-   /*
-* We expect no ISA devices, and all other DMA masters are expected to
-* have someone call arch_setup_dma_ops at device creation time.
-*/

Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 06:10:54PM +0100, Christian Zigotzky wrote:
> Please don’t merge this code. We are still testing and trying to figure out 
> where the problems are in the code.

The ones I sent pings for were either tested successfully by you
(the zone change) or are trivial cleanups that don't affect your setup.

Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Robin Murphy

On 06/12/2018 18:43, Christoph Hellwig wrote:

On Thu, Dec 06, 2018 at 10:28:22AM -0800, Linus Torvalds wrote:

which is counterproductive because it checks for the fast case *after*
you've done a load (and a check) that is entirely pointless for the
fast case.

Put another way, you made the fast case unnecessarily slow.

If this fast case matters (and the whole point of the series is that
it *does* matter), then you should make sure that

  (a) the dma_is_direct() function/macro uses "likely()" for the test


I'm a little worried about that.  Yes, for the common case it is
likely.  But if you run a setup where you say always have an iommu
it is not, in fact it is never called in that case, but we only know
that at runtime.


  (b) the code is organized like this instead:

+   if (dma_is_direct(ops))
+   dma_direct_unmap_page(dev, addr, size, dir, attrs);
+   else if (ops->unmap_page)
+   ops->unmap_page(dev, addr, size, dir, attrs);

because now you avoid that unnecessary conditional for the common
case, and you hopefully never even access the pointer (because you
know what's behind it: the direct ops cases that you're
short-circuiting).


Agreed on that, I've fixed it up now (current patch attached below).


In fact, as a further micro-optimization it might be a good idea to
just specify that the dma_is_direct() ops is a special pointer
(perhaps even just say that "NULL means it's direct"), because that
then makes the fast-case test much simpler (avoids a whole nasty
constant load, and testing for NULL in particular is often much
better).


So I wanted to do the NULL case, and it would be much nicer.  But the
arm folks went to great length to make sure they don't have a default
set of dma ops and require it to be explicitly set on every device
to catch cases where people don't set things up properly and I didn't
want to piss them off..  But maybe I should just go for it and see who
screams, as the benefit is pretty obvious.


I'm pretty sure we used to assign dummy_dma_ops explicitly to devices at 
the point we detected the ACPI properties are wrong - that shouldn't be 
too much of a headache to go back to.


Robin.



---
From 89cc8ab03798177339ad916efabd320e792ee034 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Mon, 3 Dec 2018 16:49:29 +0100
Subject: dma-mapping: bypass indirect calls for dma-direct

Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.

Signed-off-by: Christoph Hellwig 
---
  include/linux/dma-direct.h  |  17 --
  include/linux/dma-mapping.h | 110 
  kernel/dma/direct.c |  13 +++--
  3 files changed, 106 insertions(+), 34 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 3b0a3ea3876d..b7338702592a 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
  struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
  void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-   enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction dir);
-void dma_direct_sync_single_for_cpu(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_cpu(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction dir);
  int dma_direct_supported(struct device *dev, u64 mask);
  #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8916499d2805..98e4dd67c906 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -221,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
  }
  #endif
  
+static inline bool dma_is_direct(const struct dma_map_ops *ops)
+{
+   return IS_ENABLED(CONFIG_DMA_DIRECT_OPS) && ops == &dma_direct_ops;
+}

Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 10:43 AM Christoph Hellwig  wrote:
>
> >
> >  (a) the dma_is_direct() function/macro uses "likely()" for the test
>
> I'm a little worried about that.  Yes, for the common case it is
> likely.  But if you run a setup where you say always have an iommu
> it is not, in fact it is never called in that case, but we only know
> that at runtime.

Note that "likely()" doesn't have any really huge overhead - it just
makes the compiler move the unlikely case out-of-line.

Compared to the overhead of the indirect branch, it's simply not a
huge deal, it's more a mispredict and cache layout issue.

So marking something "likely()" when it isn't doesn't really penalize
things too much. It's not like an exception or anything like that,
it's really just a marker for better code layout.
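
For illustration, combining both suggestions on the unmap_page wrapper from
the patch in this thread (a simplified sketch as it would sit in
dma-mapping.h; debug hooks omitted):

static inline void dma_unmap_page_attrs(struct device *dev, dma_addr_t addr,
		size_t size, enum dma_data_direction dir, unsigned long attrs)
{
	const struct dma_map_ops *ops = get_dma_ops(dev);

	/*
	 * likely() keeps the direct-mapping path inline; the indirect-call
	 * path is laid out out-of-line, so a mispredict mainly costs the
	 * IOMMU case.
	 */
	if (likely(dma_is_direct(ops)))
		dma_direct_unmap_page(dev, addr, size, dir, attrs);
	else if (ops->unmap_page)
		ops->unmap_page(dev, addr, size, dir, attrs);
}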

  Linus


Re: [PATCH v2 3/8] dma-debug: Refactor dma_debug_entry allocation

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 06:10:47PM +, Robin Murphy wrote:
> AFAICS the tmp list wasn't about locking as much as meaning that if 
> kzalloc() failed at any point, we can free the partial allocation and back 
> out without disturbing free_entries at all - that still makes sense to me 
> up until patch #8 where we embrace the "never free anything" paradigm and 
> rip out the final traces.
>
> That said, maybe I should just drop the refactoring of 
> dma_debug_resize_entries() now that I'm deleting it as part of the same 
> series anyway - then I guess I squash what's left of this patch into #4 and 
> bring forward some of the simplification from #8 to start with. Would that 
> be more agreeable?

Yes, I just noticed all this goes away toward the end anyway.  We can
either keep it as is, or just drop the intermediate step if that is
easy enough for you.


Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 10:29:35AM -0800, Nadav Amit wrote:
> Did you happen to see my RFC for "automatic" indirect call promotion? 
> 
>   https://lkml.org/lkml/2018/10/18/175
> 
> I hope to get v1 based on Josh responses next week.

I'll take a look when you repost it.  Although I'm always a little wary
of under-the-hood magic like this..


Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 10:28:22AM -0800, Linus Torvalds wrote:
> which is counterproductive because it checks for the fast case *after*
> you've done a load (and a check) that is entirely pointless for the
> fast case.
> 
> Put another way, you made the fast case unnecessarily slow.
> 
> If this fast case matters (and the whole point of the series is that
> it *does* matter), then you should make sure that
> 
>  (a) the dma_is_direct() function/macro uses "likely()" for the test

I'm a little worried about that.  Yes, for the common case it is
likely.  But if you run a setup where you say always have an iommu
it is not, in fact it is never called in that case, but we only know
that at runtime.

>  (b) the code is organized like this instead:
> 
> +   if (dma_is_direct(ops))
> +   dma_direct_unmap_page(dev, addr, size, dir, attrs);
> +   else if (ops->unmap_page)
> +   ops->unmap_page(dev, addr, size, dir, attrs);
> 
> because now you avoid that unnecessary conditional for the common
> case, and you hopefully never even access the pointer (because you
> know what's behind it: the direct ops cases that you're
> short-circuiting).

Agreed on that, I've fixed it up now (current patch attached below).

> In fact, as a further micro-optimization it might be a good idea to
> just specify that the dma_is_direct() ops is a special pointer
> (perhaps even just say that "NULL means it's direct"), because that
> then makes the fast-case test much simpler (avoids a whole nasty
> constant load, and testing for NULL in particular is often much
> better).

So I wanted to do the NULL case, and it would be much nicer.  But the
arm folks went to great length to make sure they don't have a default
set of dma ops and require it to be explicitly set on every device
to catch cases where people don't set things up properly and I didn't
want to piss them off..  But maybe I should just go for it and see who
screams, as the benefit is pretty obvious.

---
From 89cc8ab03798177339ad916efabd320e792ee034 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig 
Date: Mon, 3 Dec 2018 16:49:29 +0100
Subject: dma-mapping: bypass indirect calls for dma-direct

Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h  |  17 --
 include/linux/dma-mapping.h | 110 
 kernel/dma/direct.c |  13 +++--
 3 files changed, 106 insertions(+), 34 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 3b0a3ea3876d..b7338702592a 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
 void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-   enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction dir);
-void dma_direct_sync_single_for_cpu(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_cpu(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction dir);
 int dma_direct_supported(struct device *dev, u64 mask);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8916499d2805..98e4dd67c906 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -221,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct device *dev)
 }
 #endif
 
+static inline bool dma_is_direct(const struct dma_map_ops *ops)
+{
+   return IS_ENABLED(CONFIG_DMA_DIRECT_OPS) && ops == &dma_direct_ops;
+}
+
+/*
+ * All the dma_direct_* declarations are here just for the indirect call bypass,
+ * and must not be used directly by drivers!
+ */
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   unsigned long attrs);

[PATCH v3 6/9] iommu/dma-iommu.c: Convert to use vm_insert_range

2018-12-06 Thread Souptick Joarder
Convert to use vm_insert_range() to map range of kernel
memory to user vma.

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
---
 drivers/iommu/dma-iommu.c | 13 +++--
 1 file changed, 3 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index d1b0475..a2c65e2 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -622,17 +622,10 @@ struct page **iommu_dma_alloc(struct device *dev, size_t size, gfp_t gfp,
 
 int iommu_dma_mmap(struct page **pages, size_t size, struct vm_area_struct *vma)
 {
-   unsigned long uaddr = vma->vm_start;
-   unsigned int i, count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-   int ret = -ENXIO;
+   unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT;
 
-   for (i = vma->vm_pgoff; i < count && uaddr < vma->vm_end; i++) {
-   ret = vm_insert_page(vma, uaddr, pages[i]);
-   if (ret)
-   break;
-   uaddr += PAGE_SIZE;
-   }
-   return ret;
+   return vm_insert_range(vma, vma->vm_start,
+   pages + vma->vm_pgoff, count);
 }
 
 static dma_addr_t __iommu_dma_map(struct device *dev, phys_addr_t phys,
-- 
1.9.1



[PATCH v3 1/9] mm: Introduce new vm_insert_range API

2018-12-06 Thread Souptick Joarder
Previously, drivers had their own way of mapping a range of
kernel pages/memory into a user vma, done by invoking
vm_insert_page() within a loop.

As this pattern is common across different drivers, it can
be generalized by creating a new function and using it across
the drivers.

vm_insert_range is the new API which will be used to map a
range of kernel memory/pages to a user vma.

This API was tested by Heiko for the Rockchip drm driver, on rk3188,
rk3288, rk3328 and rk3399 with graphics.
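
For illustration, a driver's mmap handler then reduces to a single call;
the buffer structure and names here are hypothetical:

#include <linux/fs.h>
#include <linux/mm.h>

struct example_buffer {
	struct page **pages;		/* allocated earlier by the driver */
	unsigned long page_count;
};

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct example_buffer *buf = file->private_data;
	unsigned long count = vma_pages(vma);

	if (count > buf->page_count)
		return -EINVAL;

	/* replaces the old open-coded vm_insert_page() loop */
	return vm_insert_range(vma, vma->vm_start, buf->pages, count);
}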

Signed-off-by: Souptick Joarder 
Reviewed-by: Matthew Wilcox 
Reviewed-by: Mike Rapoport 
Tested-by: Heiko Stuebner 
---
 include/linux/mm.h |  2 ++
 mm/memory.c| 38 ++
 mm/nommu.c |  7 +++
 3 files changed, 47 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index fcf9cc9..2bc399f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2506,6 +2506,8 @@ unsigned long change_prot_numa(struct vm_area_struct *vma,
 int remap_pfn_range(struct vm_area_struct *, unsigned long addr,
unsigned long pfn, unsigned long size, pgprot_t);
 int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *);
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+   struct page **pages, unsigned long page_count);
 vm_fault_t vmf_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
 vm_fault_t vmf_insert_pfn_prot(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 15c417e..84ea46c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1478,6 +1478,44 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 
 /**
+ * vm_insert_range - insert range of kernel pages into user vma
+ * @vma: user vma to map to
+ * @addr: target user address of this page
+ * @pages: pointer to array of source kernel pages
+ * @page_count: number of pages need to insert into user vma
+ *
+ * This allows drivers to insert range of kernel pages they've allocated
+ * into a user vma. This is a generic function which drivers can use
+ * rather than using their own way of mapping range of kernel pages into
+ * user vma.
+ *
+ * If we fail to insert any page into the vma, the function will return
+ * immediately leaving any previously-inserted pages present.  Callers
+ * from the mmap handler may immediately return the error as their caller
+ * will destroy the vma, removing any successfully-inserted pages. Other
+ * callers should make their own arrangements for calling unmap_region().
+ *
+ * Context: Process context. Called by mmap handlers.
+ * Return: 0 on success and error code otherwise
+ */
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+   struct page **pages, unsigned long page_count)
+{
+   unsigned long uaddr = addr;
+   int ret = 0, i;
+
+   for (i = 0; i < page_count; i++) {
+   ret = vm_insert_page(vma, uaddr, pages[i]);
+   if (ret < 0)
+   return ret;
+   uaddr += PAGE_SIZE;
+   }
+
+   return ret;
+}
+EXPORT_SYMBOL(vm_insert_range);
+
+/**
  * vm_insert_page - insert single page into user vma
  * @vma: user vma to map to
  * @addr: target user address of this page
diff --git a/mm/nommu.c b/mm/nommu.c
index 749276b..d6ef5c7 100644
--- a/mm/nommu.c
+++ b/mm/nommu.c
@@ -473,6 +473,13 @@ int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
 }
 EXPORT_SYMBOL(vm_insert_page);
 
+int vm_insert_range(struct vm_area_struct *vma, unsigned long addr,
+   struct page **pages, unsigned long page_count)
+{
+   return -EINVAL;
+}
+EXPORT_SYMBOL(vm_insert_range);
+
 /*
  *  sys_brk() for the most part doesn't need the global kernel
  *  lock, except when an application is doing something nasty
-- 
1.9.1



Re: [PATCH] dma-mapping: bypass indirect calls for dma-direct

2018-12-06 Thread Christoph Hellwig
On Thu, Dec 06, 2018 at 06:40:27PM +0100, Jesper Dangaard Brouer wrote:
> Is this patch to be applied on top of the other (41) patches I just
> tested with my XDP forward workload?

This is just the last patch of the git tree you tested.  All the
others were sent out separately already.


[PATCH v3 0/9] Use vm_insert_range

2018-12-06 Thread Souptick Joarder
Previously, drivers had their own ways of mapping a range of
kernel pages/memory into a user vma, typically by invoking
vm_insert_page() within a loop.

As this pattern is common across different drivers, it can
be generalized by creating a new function and using it across
the drivers.

vm_insert_range is the new API which will be used to map a
range of kernel memory/pages to a user vma.

All the applicable places are converted to use new vm_insert_range
in this patch series.
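
For illustration, a typical driver mmap handler then reduces to something
like this (a minimal sketch; example_buf and its fields are made up here,
only vm_insert_range() itself is from this series):

static int example_mmap(struct file *file, struct vm_area_struct *vma)
{
        struct example_buf *buf = file->private_data;
        unsigned long vm_pages = vma_pages(vma);

        /* refuse mappings beyond what was allocated */
        if (vma->vm_pgoff + vm_pages > buf->page_count)
                return -ENXIO;

        /* one call replaces the old vm_insert_page() loop */
        return vm_insert_range(vma, vma->vm_start,
                               buf->pages + vma->vm_pgoff, vm_pages);
}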

v1 -> v2:
Address review comment on mm/memory.c. Add EXPORT_SYMBOL
for vm_insert_range and corrected the documentation part
for this API.

In drivers/gpu/drm/xen/xen_drm_front_gem.c, replace err
with ret as suggested.

In drivers/iommu/dma-iommu.c, handle the scenario of partial
mmap() of large buffer by passing *pages + vma->vm_pgoff* to
vm_insert_range().

v2 -> v3:
Declaration of vm_insert_range() moved to include/linux/mm.h

Souptick Joarder (9):
  mm: Introduce new vm_insert_range API
  arch/arm/mm/dma-mapping.c: Convert to use vm_insert_range
  drivers/firewire/core-iso.c: Convert to use vm_insert_range
  drm/rockchip/rockchip_drm_gem.c: Convert to use vm_insert_range
  drm/xen/xen_drm_front_gem.c: Convert to use vm_insert_range
  iommu/dma-iommu.c: Convert to use vm_insert_range
  videobuf2/videobuf2-dma-sg.c: Convert to use vm_insert_range
  xen/gntdev.c: Convert to use vm_insert_range
  xen/privcmd-buf.c: Convert to use vm_insert_range

 arch/arm/mm/dma-mapping.c | 21 +
 drivers/firewire/core-iso.c   | 15 ++---
 drivers/gpu/drm/rockchip/rockchip_drm_gem.c   | 20 ++--
 drivers/gpu/drm/xen/xen_drm_front_gem.c   | 20 
 drivers/iommu/dma-iommu.c | 13 ++--
 drivers/media/common/videobuf2/videobuf2-dma-sg.c | 23 +-
 drivers/xen/gntdev.c  | 11 +++
 drivers/xen/privcmd-buf.c |  8 ++---
 include/linux/mm.h|  2 ++
 mm/memory.c   | 38 +++
 mm/nommu.c|  7 +
 11 files changed, 80 insertions(+), 98 deletions(-)

-- 
1.9.1



Re: [PATCH v2 4/8] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-06 Thread Robin Murphy

On 06/12/2018 14:24, Christoph Hellwig wrote:

@@ -47,6 +47,8 @@
  #ifndef PREALLOC_DMA_DEBUG_ENTRIES
  #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
  #endif


FYI, I think we should drop the potential arch hook with the ifndef
here once we support the dynamic adjustment.


Sure - I have a vague feeling that thought did cross my mind at the 
time, but it didn't stick.





+   if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES, GFP_ATOMIC)) {


Overly long line.


Bah, I deliberately didn't bother wrapping this since it gets shorter 
again later, but now I realise it also slipped through the s/create/add/ 
rebase so it's just busted, and due to another last-minute fix I 
actually ended up with an even longer line elsewhere.


I'll tidy all of that up for a v3.

Robin.


Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 10:28 AM Linus Torvalds
 wrote:
>
> Put another way, you made the fast case unnecessarily slow.

Side note: the code seems to be a bit confused about it, because
*some* cases test the fast case first, and some do it after they've
already accessed the pointer for the slow case.

So even aside from the performance and code generation issue (and
possible future "use a special bit pattern for the fast case"), it
would be good for _consistency_ to just always do the fast-case test
first.

Linus


Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Nadav Amit
> On Dec 6, 2018, at 9:43 AM, Jesper Dangaard Brouer  wrote:
> 
> On Thu,  6 Dec 2018 07:37:19 -0800
> Christoph Hellwig  wrote:
> 
>> Hi all,
>> 
>> a while ago Jesper reported major performance regressions due to the
>> spectre v2 mitigations in his XDP forwarding workloads.  A large part
>> of that is due to the DMA mapping API indirect calls.
>> 
>> It turns out that the most common implementation of the DMA API is the
>> direct mapping case, and now that we have merged almost all duplicate
>> implementations of that into a single generic one, it is easily feasible
>> to use direct calls for this fast path.
>> 
>> This patch adds a check if we are using dma_direct_ops in each fast path
>> DMA operation, and just uses a direct call instead.  For the XDP workload
>> this increases the number of packets per second from 7,438,283 to
>> 9,610,088, so it provides a very significant speedup.
> 
> Full test report avail here:
> https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org
> 
> 
>> Note that the patch depends on a lot of work either queued up in the
>> DMA mapping tree, or still out on the list from review, so to actually
>> try the patch you probably want this git tree:
>> 
>> 
>>git://git.infradead.org/users/hch/misc.git dma-direct-calls
>> 
>> Gitweb:
>> 
>>
>> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls

Did you happen to see my RFC for "automatic" indirect call promotion? 

https://lkml.org/lkml/2018/10/18/175

I hope to get v1 based on Josh responses next week.
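
For reference, the promotion has roughly this shape (a hand-written sketch
of what the plugin would emit; the learned target here is made up):

        /* instead of a bare indirect call through ops->map_page ... */
        if (ops->map_page == dma_direct_map_page)       /* predicted target */
                dma_direct_map_page(dev, page, offset, size, dir, attrs);
        else
                ops->map_page(dev, page, offset, size, dir, attrs);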



Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 7:37 AM Christoph Hellwig  wrote:
>
> This patch adds a check if we are using dma_direct_ops in each fast path
> DMA operation, and just uses a direct call instead.  For the XDP workload
> this increases the number of packets per second from 7,438,283 to
> 9,610,088, so it provides a very significant speedup.

May I suggest re-doing the logic a bit?

You do things like this:

+   if (ops->unmap_page) {
+   if (dma_is_direct(ops))
+   dma_direct_unmap_page(dev, addr, size, dir, attrs);
+   else
+   ops->unmap_page(dev, addr, size, dir, attrs);
+   }

which is counterproductive because it checks for the fast case *after*
you've done a load (and a check) that is entirely pointless for the
fast case.

Put another way, you made the fast case unnecessarily slow.

If this fast case matters (and the whole point of the series is that
it *does* matter), then you should make sure that

 (a) the dma_is_direct() function/macro uses "likely()" for the test

and

 (b) the code is organized like this instead:

+   if (dma_is_direct(ops))
+   dma_direct_unmap_page(dev, addr, size, dir, attrs);
+   else if (ops->unmap_page)
+   ops->unmap_page(dev, addr, size, dir, attrs);

because now you avoid that unnecessary conditional for the common
case, and you hopefully never even access the pointer (because you
know what's behind it: the direct ops cases that you're
short-circuiting).

In fact, as a further micro-optimization it might be a good idea to
just specify that the dma_is_direct() ops is a special pointer
(perhaps even just say that "NULL means it's direct"), because that
then makes the fast-case test much simpler (avoids a whole nasty
constant load, and testing for NULL in particular is often much
better).

But that further micro-optimization absolutely *requires* that the ops
pointer test comes first. So making that ordering change is not only
"better code generation for the fast case to avoid extra cache
accesses", it also allows future optimizations.

 Linus


Re: [PATCH v2 3/8] dma-debug: Refactor dma_debug_entry allocation

2018-12-06 Thread Robin Murphy

On 06/12/2018 14:23, Christoph Hellwig wrote:

+static int dma_debug_add_entries(u32 num_entries, gfp_t gfp)
+{
+   struct dma_debug_entry *entry, *next_entry;
+   LIST_HEAD(tmp);
+   int i;
+
+   for (i = 0; i < num_entries; ++i) {
+   entry = kzalloc(sizeof(*entry), gfp);
+   if (!entry)
+   goto out_err;
+
+   list_add_tail(&entry->list, &tmp);
+   }
+
+   list_splice(&tmp, &free_entries);
+   num_free_entries += num_entries;
+   nr_total_entries += num_entries;


The adding to a local list and splicing seems a bit pointless if we
do it all under lock anyway.  Either we change the locking in
dma_debug_resize_entries and your upcoming automatic allocation that
we only do it over the splice and counter adjustment, which would
have the advantage of allowing freeing of entries in parallel to these
allocations, or we could just drop the local tmp list.


AFAICS the tmp list wasn't about locking as much as meaning that if 
kzalloc() failed at any point, we can free the partial allocation and 
back out without disturbing free_entries at all - that still makes sense 
to me up until patch #8 where we embrace the "never free anything" 
paradigm and rip out the final traces.
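
(For reference, the error path that the tmp list enables is essentially:

out_err:
        list_for_each_entry_safe(entry, next_entry, &tmp, list) {
                list_del(&entry->list);
                kfree(entry);
        }
        return -ENOMEM;

which is also what the otherwise unused-looking next_entry in the quoted
hunk is for.)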


That said, maybe I should just drop the refactoring of 
dma_debug_resize_entries() now that I'm deleting it as part of the same 
series anyway - then I guess I squash what's left of this patch into #4 
and bring forward some of the simplification from #8 to start with. 
Would that be more agreeable?


Robin.


Re: [RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Jesper Dangaard Brouer
On Thu,  6 Dec 2018 07:37:19 -0800
Christoph Hellwig  wrote:

> Hi all,
> 
> a while ago Jesper reported major performance regressions due to the
> spectre v2 mitigations in his XDP forwarding workloads.  A large part
> of that is due to the DMA mapping API indirect calls.
> 
> It turns out that the most common implementation of the DMA API is the
> direct mapping case, and now that we have merged almost all duplicate
> implementations of that into a single generic one, it is easily feasible
> to use direct calls for this fast path.
> 
> This patch adds a check if we are using dma_direct_ops in each fast path
> DMA operation, and just uses a direct call instead.  For the XDP workload
> this increases the number of packets per second from 7,438,283 to
> 9,610,088, so it provides a very significant speedup.

Full test report avail here:
 
https://github.com/xdp-project/xdp-project/blob/master/areas/dma/dma01_test_hellwig_direct_dma.org


> Note that the patch depends on a lot of work either queued up in the
> DMA mapping tree, or still out on the list from review, so to actually
> try the patch you probably want this git tree:
> 
> 
> git://git.infradead.org/users/hch/misc.git dma-direct-calls
> 
> Gitweb:
> 
> 
> http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] dma-mapping: bypass indirect calls for dma-direct

2018-12-06 Thread Jesper Dangaard Brouer
On Thu,  6 Dec 2018 07:37:20 -0800
Christoph Hellwig  wrote:

> Avoid expensive indirect calls in the fast path DMA mapping
> operations by directly calling the dma_direct_* ops if we are using
> the directly mapped DMA operations.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  include/linux/dma-direct.h  |  17 -
>  include/linux/dma-mapping.h | 138 +++-
>  kernel/dma/direct.c |  13 ++--
>  3 files changed, 127 insertions(+), 41 deletions(-)

Hi Christoph,

Is this patch to be applied on top of the other (41) patches I just
tested with my XDP forward workload?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()

2018-12-06 Thread Robin Murphy

On 06/12/2018 15:35, Joerg Roedel wrote:

Hi Robin,

On Wed, Dec 05, 2018 at 05:17:54PM +, Robin Murphy wrote:

FWIW, this check (and its ACPI equivalent in patch #3) is specifically
asking "has .add_device() already been called?", rather than the more
general "is this device managed by an IOMMU?" (to which the exact answer at
this point is "yes, provided we return successfully from here").

I have no objection to the change as-is - especially if that usage is within
the intended scope of this API - I just wanted to call it out in case you're
also planning to introduce something else which would be even more
appropriate for that.


Yes, the purpose of the device_iommu_mapped() function is to check
whether the device has been initialized by the IOMMU driver that handles
it, if any.

So it answers the question: Can I use the device in an IOMMU-API call?


OK, another way to consider the usage here would be "is the device ready 
to use in IOMMU API calls (or do I need to call add_device to finish 
setting it up)?", so it does in fact seem like a perfect fit, great!



And it is more readable than the dev->iommu_group checks everywhere :)


For sure - although I am now wondering whether "mapped" is perhaps a 
little ambiguous in the naming, since the answer to "can I use the API" 
is yes even when the device may currently be attached to an 
identity/passthrough domain or blocked completely, neither of which 
involve any "mapping". Maybe simply "device_has_iommu()" would convey 
the intent better?


Robin.


Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christian Zigotzky
Please don’t merge this code. We are still testing and trying to figure out 
where the problems are in the code.

— Christian

Sent from my iPhone

> On 6. Dec 2018, at 11:55, Christian Zigotzky  wrote:
> 
>> On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:
>> 
>> Thanks.  Can you try a few stepping points in the tree?
>> 
>> First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
>> (the first one) applied?
>> 
>> Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b
>> 
>> And if that still works with commits up to
>> c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a
>> 
> Hi Christoph,
> 
> I undid the commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 with the 
> following command:
> 
> git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
> 
> Result: PASEMI onboard ethernet works again and the P5020 board boots.
> 
> I will test the other commits in the next days.
> 
> @All
> It is really important that you also test Christoph's work on your PASEMI
> and NXP boards. Could you please help us with solving the issues?
> 
> 'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'
> 
> Thanks,
> Christian
> 

Re: [PATCH RFC 1/1] swiotlb: add debugfs to track swiotlb buffer usage

2018-12-06 Thread Joe Jin
Hi Dongli,

Maybe move d_swiotlb_usage declare into swiotlb_create_debugfs():

void swiotlb_create_debugfs(void)
{
#ifdef CONFIG_DEBUG_FS
static struct dentry *d_swiotlb_usage = NULL;

if (d_swiotlb_usage)
return;

d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);

if (!d_swiotlb_usage)
return;

debugfs_create_file("usage", 0600, d_swiotlb_usage,
NULL, &swiotlb_usage_fops);
#endif
}

And for io_tlb_used, possibly add a check at the beginning of
swiotlb_tbl_map_single(): if there are no free slots, or not enough slots,
return failure directly?
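
Something like this, as a rough sketch (assuming io_tlb_used is maintained
unconditionally and the check sits after nslots has been computed; it is
only an opportunistic early exit, the authoritative accounting still
happens under io_tlb_lock):

        /* the pool clearly cannot satisfy this request, fail fast */
        if (unlikely(io_tlb_used + nslots > io_tlb_nslabs))
                return SWIOTLB_MAP_ERROR;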

Thanks,
Joe
On 12/5/18 7:59 PM, Dongli Zhang wrote:
> The device driver will not be able to do dma operations once the swiotlb
> buffer is full, either because the driver is using too many IO TLB blocks
> in flight, or because there is a memory leak in the device driver.
> Exporting the swiotlb buffer usage via debugfs helps the user estimate the
> size of swiotlb buffer to pre-allocate, or analyze a device driver memory
> leak.
> 
> As the swiotlb can be initialized at very early stage when debugfs cannot
> register successfully, this patch creates the debugfs entry on demand.
> 
> Signed-off-by: Dongli Zhang 
> ---
>  kernel/dma/swiotlb.c | 57 
> 
>  1 file changed, 57 insertions(+)
> 
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index 045930e..d3c8aa4 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -35,6 +35,9 @@
>  #include 
>  #include 
>  #include 
> +#ifdef CONFIG_DEBUG_FS
> +#include 
> +#endif
>  
>  #include 
>  #include 
> @@ -73,6 +76,13 @@ static phys_addr_t io_tlb_start, io_tlb_end;
>   */
>  static unsigned long io_tlb_nslabs;
>  
> +#ifdef CONFIG_DEBUG_FS
> +/*
> + * The number of used IO TLB block
> + */
> +static unsigned long io_tlb_used;
> +#endif
> +
>  /*
>   * This is a free list describing the number of free entries available from
>   * each index
> @@ -100,6 +110,41 @@ static DEFINE_SPINLOCK(io_tlb_lock);
>  
>  static int late_alloc;
>  
> +#ifdef CONFIG_DEBUG_FS
> +
> +static struct dentry *d_swiotlb_usage;
> +
> +static int swiotlb_usage_show(struct seq_file *m, void *v)
> +{
> + seq_printf(m, "%lu\n%lu\n", io_tlb_used, io_tlb_nslabs);
> + return 0;
> +}
> +
> +static int swiotlb_usage_open(struct inode *inode, struct file *filp)
> +{
> + return single_open(filp, swiotlb_usage_show, NULL);
> +}
> +
> +static const struct file_operations swiotlb_usage_fops = {
> + .open   = swiotlb_usage_open,
> + .read   = seq_read,
> + .llseek = seq_lseek,
> + .release= single_release,
> +};
> +
> +void swiotlb_create_debugfs(void)
> +{
> + d_swiotlb_usage = debugfs_create_dir("swiotlb", NULL);
> +
> + if (!d_swiotlb_usage)
> + return;
> +
> + debugfs_create_file("usage", 0600, d_swiotlb_usage,
> + NULL, &swiotlb_usage_fops);
> +}
> +
> +#endif
> +
>  static int __init
>  setup_io_tlb_npages(char *str)
>  {
> @@ -449,6 +494,11 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   pr_warn_once("%s is active and system is using DMA bounce 
> buffers\n",
>sme_active() ? "SME" : "SEV");
>  
> +#ifdef CONFIG_DEBUG_FS
> + if (unlikely(!d_swiotlb_usage))
> + swiotlb_create_debugfs();
> +#endif
> +
>   mask = dma_get_seg_boundary(hwdev);
>  
>   tbl_dma_addr &= mask;
> @@ -528,6 +578,9 @@ phys_addr_t swiotlb_tbl_map_single(struct device *hwdev,
>   dev_warn(hwdev, "swiotlb buffer is full (sz: %zd bytes)\n", 
> size);
>   return SWIOTLB_MAP_ERROR;
>  found:
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used += nslots;
> +#endif
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  
>   /*
> @@ -588,6 +641,10 @@ void swiotlb_tbl_unmap_single(struct device *hwdev, 
> phys_addr_t tlb_addr,
>*/
>   for (i = index - 1; (OFFSET(i, IO_TLB_SEGSIZE) != 
> IO_TLB_SEGSIZE -1) && io_tlb_list[i]; i--)
>   io_tlb_list[i] = ++count;
> +
> +#ifdef CONFIG_DEBUG_FS
> + io_tlb_used -= nslots;
> +#endif
>   }
>   spin_unlock_irqrestore(&io_tlb_lock, flags);
>  }
> 




Re: [PATCH 3/4] iommu/of: Don't call iommu_ops->add_device directly

2018-12-06 Thread Joerg Roedel
On Wed, Dec 05, 2018 at 06:49:26PM +, Robin Murphy wrote:
>   if (ops && dev->bus && !dev->iommu_group)
> 
> What I can't quite remember just now is whether it's actually valid to get
> here with err == 0 but dev->iommu_fwspec->ops == NULL, so it *might* be OK
> to use "!err" instead of "ops" to make things slightly more obvious - I'll
> work through the flow tomorrow to double-check.

Yeah, adding an err == 0 check seems to be the best option here, so that
iommu-drivers don't get confused when we try to add a device that is not
managed by an iommu.
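
I.e., roughly (a sketch on top of the quoted line, nothing more):

        if (!err && ops && dev->bus && !dev->iommu_group)
                err = ops->add_device(dev);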


Regards,

Joerg


Re: [PATCH 2/4] iommu: Consolidate ->add/remove_device() calls

2018-12-06 Thread Joerg Roedel
On Wed, Dec 05, 2018 at 06:20:50PM +, Robin Murphy wrote:
> Is there any good reason to let .add_device/.remove_device be optional
> still? Everyone's implemented them for a while now, and on a practical level
> I don't really see how else we could expect devices to be taken in and out
> of their appropriate groups correctly.

There is probably no reason anymore to keep add_device/remove_device
option. I'll check the users and remove the pointer checks if possible.


Regards,

Joerg


Re: [PATCH 1/5] driver core: Introduce device_iommu_mapped() function

2018-12-06 Thread Joerg Roedel
On Thu, Dec 06, 2018 at 01:55:20PM +0100, Greg Kroah-Hartman wrote:
> On Tue, Dec 04, 2018 at 06:25:00PM +0100, Joerg Roedel wrote:
> > Cc: Greg Kroah-Hartman 
> > Signed-off-by: Joerg Roedel 
> > ---
> >  include/linux/device.h | 10 ++
> >  1 file changed, 10 insertions(+)
> 
> Acked-by: Greg Kroah-Hartman 

Thanks a lot, Greg. Added your Acked-by to the patch.


Regards,

Joerg


[PATCH] dma-mapping: bypass indirect calls for dma-direct

2018-12-06 Thread Christoph Hellwig
Avoid expensive indirect calls in the fast path DMA mapping
operations by directly calling the dma_direct_* ops if we are using
the directly mapped DMA operations.

Signed-off-by: Christoph Hellwig 
---
 include/linux/dma-direct.h  |  17 -
 include/linux/dma-mapping.h | 138 +++-
 kernel/dma/direct.c |  13 ++--
 3 files changed, 127 insertions(+), 41 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 3b0a3ea3876d..b7338702592a 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -60,22 +60,5 @@ void dma_direct_free_pages(struct device *dev, size_t size, 
void *cpu_addr,
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
 void __dma_direct_free_pages(struct device *dev, size_t size, struct page 
*page);
-dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
-   unsigned long offset, size_t size, enum dma_data_direction dir,
-   unsigned long attrs);
-void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
-   size_t size, enum dma_data_direction dir, unsigned long attrs);
-int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
-   enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
-   int nents, enum dma_data_direction dir, unsigned long attrs);
-void dma_direct_sync_single_for_device(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_device(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
-void dma_direct_sync_single_for_cpu(struct device *dev,
-   dma_addr_t addr, size_t size, enum dma_data_direction dir);
-void dma_direct_sync_sg_for_cpu(struct device *dev,
-   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
 int dma_direct_supported(struct device *dev, u64 mask);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 8916499d2805..648ca7da3bb3 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -221,6 +221,69 @@ static inline const struct dma_map_ops *get_dma_ops(struct 
device *dev)
 }
 #endif
 
+static inline bool dma_is_direct(const struct dma_map_ops *ops)
+{
+   return IS_ENABLED(CONFIG_DMA_DIRECT_OPS) && ops == &dma_direct_ops;
+}
+
+/*
+ * All the dma_direct_* declarations are here just for the indirect call
+ * bypass, and must not be used directly by drivers!
+ */
+dma_addr_t dma_direct_map_page(struct device *dev, struct page *page,
+   unsigned long offset, size_t size, enum dma_data_direction dir,
+   unsigned long attrs);
+int dma_direct_map_sg(struct device *dev, struct scatterlist *sgl, int nents,
+   enum dma_data_direction dir, unsigned long attrs);
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_DEVICE) || \
+defined(CONFIG_SWIOTLB)
+void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+#else
+static inline void dma_direct_sync_single_for_device(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma_direct_sync_sg_for_device(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir)
+{
+}
+#endif
+
+#if defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU) || \
+defined(CONFIG_ARCH_HAS_SYNC_DMA_FOR_CPU_ALL) || \
+defined(CONFIG_SWIOTLB)
+void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_unmap_sg(struct device *dev, struct scatterlist *sgl,
+   int nents, enum dma_data_direction dir, unsigned long attrs);
+void dma_direct_sync_single_for_cpu(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir);
+void dma_direct_sync_sg_for_cpu(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction 
dir);
+#else
+static inline void dma_direct_unmap_page(struct device *dev, dma_addr_t addr,
+   size_t size, enum dma_data_direction dir, unsigned long attrs)
+{
+}
+static inline void dma_direct_unmap_sg(struct device *dev,
+   struct scatterlist *sgl, int nents, enum dma_data_direction dir,
+   unsigned long attrs)
+{
+}
+static inline void dma_direct_sync_single_for_cpu(struct device *dev,
+   dma_addr_t addr, size_t size, enum dma_data_direction dir)
+{
+}
+static inline void dma

[RFC] avoid indirect calls for DMA direct mappings

2018-12-06 Thread Christoph Hellwig
Hi all,

a while ago Jesper reported major performance regressions due to the
spectre v2 mitigations in his XDP forwarding workloads.  A large part
of that is due to the DMA mapping API indirect calls.

It turns out that the most common implementation of the DMA API is the
direct mapping case, and now that we have merged almost all duplicate
implementations of that into a single generic one, it is easily feasible
to use direct calls for this fast path.

This patch adds a check if we are using dma_direct_ops in each fast path
DMA operation, and just uses a direct call instead.  For the XDP workload
this increases the number of packets per second from 7,438,283 to
9,610,088, so it provides a very significant speedup.
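
The per-operation pattern is simply (a hand-written sketch of the idea,
not the patch itself):

static inline dma_addr_t dma_map_page_attrs(struct device *dev,
                struct page *page, size_t offset, size_t size,
                enum dma_data_direction dir, unsigned long attrs)
{
        const struct dma_map_ops *ops = get_dma_ops(dev);

        /* bypass the retpolined indirect call for the common direct case */
        if (dma_is_direct(ops))
                return dma_direct_map_page(dev, page, offset, size, dir, attrs);
        return ops->map_page(dev, page, offset, size, dir, attrs);
}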

Note that the patch depends on a lot of work either queued up in the
DMA mapping tree, or still out on the list from review, so to actually
try the patch you probably want this git tree:


git://git.infradead.org/users/hch/misc.git dma-direct-calls

Gitweb:


http://git.infradead.org/users/hch/misc.git/shortlog/refs/heads/dma-direct-calls



Re: [PATCH 0/5] Introduce device_iommu_mapped() function

2018-12-06 Thread Joerg Roedel
On Wed, Dec 05, 2018 at 05:17:29PM +, Robin Murphy wrote:
> On 04/12/2018 17:24, Joerg Roedel wrote:

> Nice, we can also clean up a whole load of vague iommu_present() usage and
> even one or two odd iommu_get_domain_for_dev() calls once we have this.

Right, I didn't think of that yet, but it's certainly true.

> There looks to be one more here:
> 
> drivers/dma/sh/rcar-dmac.c: rcar_dmac_probe()

True, I added a patch for that too.

> Other than that and a minor comment on the OF/IORT part, though, for the
> whole series:
> 
> Acked-by: Robin Murphy 

Thanks, I added your Acked-by to the 5 patches I posted here.


Regards,

Joerg


Re: [PATCH 2/5] iommu/of: Use device_iommu_mapped()

2018-12-06 Thread Joerg Roedel
Hi Robin,

On Wed, Dec 05, 2018 at 05:17:54PM +, Robin Murphy wrote:
> FWIW, this check (and its ACPI equivalent in patch #3) is specifically
> asking "has .add_device() already been called?", rather than the more
> general "is this device managed by an IOMMU?" (to which the exact answer at
> this point is "yes, provided we return successfully from here").
> 
> I have no objection to the change as-is - especially if that usage is within
> the intended scope of this API - I just wanted to call it out in case you're
> also planning to introduce something else which would be even more
> appropriate for that.

Yes, the purpose of the device_iommu_mapped() function is to check
whether the device has been initialized by the IOMMU driver that handles
it, if any.

So it answers the question: Can I use the device in an IOMMU-API call?
And it is more readable than the dev->iommu_group checks everywhere :)
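
For illustration (a sketch; assuming the helper simply wraps the current
group check until the pointer moves into 'struct dev_iommu_data'):

static inline bool device_iommu_mapped(struct device *dev)
{
        return dev->iommu_group != NULL;
}

        /* before */
        if (dev->iommu_group)
                domain = iommu_get_domain_for_dev(dev);

        /* after */
        if (device_iommu_mapped(dev))
                domain = iommu_get_domain_for_dev(dev);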

Regards,

Joerg



Re: [PATCH 2/2] arch: switch the default on ARCH_HAS_SG_CHAIN

2018-12-06 Thread Christoph Hellwig
I've picked this up for the dma-mapping for-next tree.


Re: [PATCH 1/2] csky, h8300, riscv: remove leftovers

2018-12-06 Thread Christoph Hellwig
Looks like generic-y is mostly going away anyway, so I'm going to skip
this to avoid conflicts with the kbuild tree.


Re: remove the ->mapping_error method from dma_map_ops V3

2018-12-06 Thread Christoph Hellwig
I've pulled this into the dma-mapping for-next tree, with the suggestion
from Robin that improves bisectability, and fixes for the two unused
variables found by the build bot.


Re: [PATCH v2 8/8] dma-debug: Batch dma_debug_entry allocation

2018-12-06 Thread Christoph Hellwig
This looks very nice, thanks!

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 7/8] dma/debug: Remove dma_debug_resize_entries()

2018-12-06 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 07:56:56PM +, Robin Murphy wrote:
> With no callers left, we can clean up this part of dma-debug's exposed
> internals, making way to tweak the allocation behaviour.
> 
> Signed-off-by: Robin Murphy 

Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 5/8] dma-debug: Make leak-like behaviour apparent

2018-12-06 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 6/8] x86/dma/amd-gart: Stop resizing dma_debug_entry pool

2018-12-06 Thread Christoph Hellwig
Looks good,

Reviewed-by: Christoph Hellwig 


Re: [PATCH v2 4/8] dma-debug: Dynamically expand the dma_debug_entry pool

2018-12-06 Thread Christoph Hellwig
> @@ -47,6 +47,8 @@
>  #ifndef PREALLOC_DMA_DEBUG_ENTRIES
>  #define PREALLOC_DMA_DEBUG_ENTRIES (1 << 16)
>  #endif

FYI, I think we should drop the potential arch hook with the ifndef
here once we support the dynamic adjustment.

> + if (dma_debug_create_entries(DMA_DEBUG_DYNAMIC_ENTRIES, GFP_ATOMIC)) {

Overly long line.


Re: [PATCH v2 3/8] dma-debug: Refactor dma_debug_entry allocation

2018-12-06 Thread Christoph Hellwig
> +static int dma_debug_add_entries(u32 num_entries, gfp_t gfp)
> +{
> + struct dma_debug_entry *entry, *next_entry;
> + LIST_HEAD(tmp);
> + int i;
> +
> + for (i = 0; i < num_entries; ++i) {
> + entry = kzalloc(sizeof(*entry), gfp);
> + if (!entry)
> + goto out_err;
> +
> + list_add_tail(&entry->list, &tmp);
> + }
> +
> + list_splice(&tmp, &free_entries);
> + num_free_entries += num_entries;
> + nr_total_entries += num_entries;

The adding to a local list and splicing seems a bit pointless if we
do it all under lock anyway.  Either we change the locking in
dma_debug_resize_entries and your upcoming automatic allocation that
we only do it over the splice and counter adjustment, which would
have the advantage of allowing freeing of entries in parallel to these
allocations, or we could just drop the local tmp list.


Re: [PATCH 7/9] iommu/of: Use helper functions to access dev->iommu_fwspec

2018-12-06 Thread Joerg Roedel
On Wed, Dec 05, 2018 at 05:41:51PM +, Robin Murphy wrote:
> Nit: I think it makes sense to put this inside the "if (!err)" condition
> below rather than out here where it may or may not be relevant. The comment
> for that case is already supposed to imply that it's dealing with a fresh
> fwspec.

Right, updated that too.

Thanks,

Joerg


Re: [PATCH v2 2/8] dma-debug: Expose nr_total_entries in debugfs

2018-12-06 Thread Christoph Hellwig
On Wed, Dec 05, 2018 at 07:56:51PM +, Robin Murphy wrote:
> Expose nr_total_entries in debugfs, so that {num,min}_free_entries
> become even more meaningful to users interested in current/maximum
> utilisation. This becomes even more relevant once nr_total_entries
> may change at runtime beyond just the existing AMD GART debug code.
> 
> Suggested-by: John Garry 
> Signed-off-by: Robin Murphy 

Looks fine,

Reviewed-by: Christoph Hellwig 


Re: [PATCH 08/34] powerpc/dma: untangle vio_dma_mapping_ops from dma_iommu_ops

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:48AM +0100, Christoph Hellwig wrote:
> vio_dma_mapping_ops currently does a lot of indirect calls through
> dma_iommu_ops, which not only make the code harder to follow but are
> also expensive in the post-spectre world.  Unwind the indirect calls
> by calling the ppc_iommu_* or iommu_* APIs directly where applicable, or
> just use the dma_iommu_* methods directly where we can.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/include/asm/iommu.h |  1 +
>  arch/powerpc/kernel/dma-iommu.c  |  2 +-
>  arch/powerpc/platforms/pseries/vio.c | 87 
>  3 files changed, 38 insertions(+), 52 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/iommu.h 
> b/arch/powerpc/include/asm/iommu.h
> index 35db0cbc9222..75daa10f31a4 100644
> --- a/arch/powerpc/include/asm/iommu.h
> +++ b/arch/powerpc/include/asm/iommu.h
> @@ -242,6 +242,7 @@ static inline int __init tce_iommu_bus_notifier_init(void)
>  }
>  #endif /* !CONFIG_IOMMU_API */
>  
> +u64 dma_iommu_get_required_mask(struct device *dev);
>  int dma_iommu_mapping_error(struct device *dev, dma_addr_t dma_addr);
>  
>  #else
> diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
> index 2ca6cfaebf65..0613278abf9f 100644
> --- a/arch/powerpc/kernel/dma-iommu.c
> +++ b/arch/powerpc/kernel/dma-iommu.c
> @@ -92,7 +92,7 @@ int dma_iommu_dma_supported(struct device *dev, u64 mask)
>   return 1;
>  }
>  
> -static u64 dma_iommu_get_required_mask(struct device *dev)
> +u64 dma_iommu_get_required_mask(struct device *dev)
>  {
>   struct iommu_table *tbl = get_iommu_table_base(dev);
>   u64 mask;
> diff --git a/arch/powerpc/platforms/pseries/vio.c 
> b/arch/powerpc/platforms/pseries/vio.c
> index 88f1ad1d6309..ea3a9745c812 100644
> --- a/arch/powerpc/platforms/pseries/vio.c
> +++ b/arch/powerpc/platforms/pseries/vio.c
> @@ -492,7 +492,9 @@ static void *vio_dma_iommu_alloc_coherent(struct device 
> *dev, size_t size,
>   return NULL;
>   }
>  
> - ret = dma_iommu_ops.alloc(dev, size, dma_handle, flag, attrs);
> + ret = iommu_alloc_coherent(dev, get_iommu_table_base(dev), size,
> + dma_handle, dev->coherent_dma_mask, flag,
> + dev_to_node(dev));
>   if (unlikely(ret == NULL)) {
>   vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
>   atomic_inc(&viodev->cmo.allocs_failed);
> @@ -507,8 +509,7 @@ static void vio_dma_iommu_free_coherent(struct device 
> *dev, size_t size,
>  {
>   struct vio_dev *viodev = to_vio_dev(dev);
>  
> - dma_iommu_ops.free(dev, size, vaddr, dma_handle, attrs);
> -
> + iommu_free_coherent(get_iommu_table_base(dev), size, vaddr, dma_handle);
>   vio_cmo_dealloc(viodev, roundup(size, PAGE_SIZE));
>  }
>  
> @@ -518,22 +519,22 @@ static dma_addr_t vio_dma_iommu_map_page(struct device 
> *dev, struct page *page,
>   unsigned long attrs)
>  {
>   struct vio_dev *viodev = to_vio_dev(dev);
> - struct iommu_table *tbl;
> + struct iommu_table *tbl = get_iommu_table_base(dev);
>   dma_addr_t ret = IOMMU_MAPPING_ERROR;
>  
> - tbl = get_iommu_table_base(dev);
> - if (vio_cmo_alloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl {
> - atomic_inc(&viodev->cmo.allocs_failed);
> - return ret;
> - }
> -
> - ret = dma_iommu_ops.map_page(dev, page, offset, size, direction, attrs);
> - if (unlikely(dma_mapping_error(dev, ret))) {
> - vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
> - atomic_inc(&viodev->cmo.allocs_failed);
> - }
> -
> + if (vio_cmo_alloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl
> + goto out_fail;
> + ret = iommu_map_page(dev, tbl, page, offset, size, device_to_mask(dev),
> + direction, attrs);
> + if (unlikely(ret == IOMMU_MAPPING_ERROR))
> + goto out_deallocate;
>   return ret;
> +
> +out_deallocate:
> + vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
> +out_fail:
> + atomic_inc(&viodev->cmo.allocs_failed);
> + return IOMMU_MAPPING_ERROR;
>  }
>  
>  static void vio_dma_iommu_unmap_page(struct device *dev, dma_addr_t 
> dma_handle,
> @@ -542,11 +543,9 @@ static void vio_dma_iommu_unmap_page(struct device *dev, 
> dma_addr_t dma_handle,
>unsigned long attrs)
>  {
>   struct vio_dev *viodev = to_vio_dev(dev);
> - struct iommu_table *tbl;
> -
> - tbl = get_iommu_table_base(dev);
> - dma_iommu_ops.unmap_page(dev, dma_handle, size, direction, attrs);
> + struct iommu_table *tbl = get_iommu_table_base(dev);
>  
> + iommu_unmap_page(tbl, dma_handle, size, direction, attrs);
>   vio_cmo_dealloc(viodev, roundup(size, IOMMU_PAGE_SIZE(tbl)));
>  }
>  
> @@ -555,34 +554,32 @@ static int vio_dma_iommu_map_sg(struct device *dev,

Re: [PATCH 14/34] powerpc/dart: remove dead cleanup code in iommu_init_early_dart

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:54AM +0100, Christoph Hellwig wrote:
> If dart_init failed we didn't have a chance to setup dma or controller
> ops yet, so there is no point in resetting them.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/sysdev/dart_iommu.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/dart_iommu.c 
> b/arch/powerpc/sysdev/dart_iommu.c
> index a5b40d1460f1..283ce04c5844 100644
> --- a/arch/powerpc/sysdev/dart_iommu.c
> +++ b/arch/powerpc/sysdev/dart_iommu.c
> @@ -428,7 +428,7 @@ void __init iommu_init_early_dart(struct 
> pci_controller_ops *controller_ops)
>  
>   /* Initialize the DART HW */
>   if (dart_init(dn) != 0)
> - goto bail;
> + return;
>  
>   /* Setup bypass if supported */
>   if (dart_is_u4)
> @@ -439,15 +439,6 @@ void __init iommu_init_early_dart(struct 
> pci_controller_ops *controller_ops)
>  
>   /* Setup pci_dma ops */
>   set_pci_dma_ops(&dma_iommu_ops);
> - return;
> -
> - bail:
> - /* If init failed, use direct iommu and null setup functions */
> - controller_ops->dma_dev_setup = NULL;
> - controller_ops->dma_bus_setup = NULL;
> -
> - /* Setup pci_dma ops */
> - set_pci_dma_ops(&dma_nommu_ops);
>  }
>  
>  #ifdef CONFIG_PM
> -- 
> 2.19.1
> 


Re: [PATCH 19/34] cxl: drop the dma_set_mask callback from vphb

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:59AM +0100, Christoph Hellwig wrote:
> The CXL code never even looks at the dma mask, so there is no good
> reason for this sanity check.  Remove it because it gets in the way
> of the dma ops refactoring.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/misc/cxl/vphb.c | 12 
>  1 file changed, 12 deletions(-)
> 
> diff --git a/drivers/misc/cxl/vphb.c b/drivers/misc/cxl/vphb.c
> index 7908633d9204..49da2f744bbf 100644
> --- a/drivers/misc/cxl/vphb.c
> +++ b/drivers/misc/cxl/vphb.c
> @@ -11,17 +11,6 @@
>  #include 
>  #include "cxl.h"
>  
> -static int cxl_dma_set_mask(struct pci_dev *pdev, u64 dma_mask)
> -{
> - if (dma_mask < DMA_BIT_MASK(64)) {
> - pr_info("%s only 64bit DMA supported on CXL", __func__);
> - return -EIO;
> - }
> -
> - *(pdev->dev.dma_mask) = dma_mask;
> - return 0;
> -}
> -
>  static int cxl_pci_probe_mode(struct pci_bus *bus)
>  {
>   return PCI_PROBE_NORMAL;
> @@ -220,7 +209,6 @@ static struct pci_controller_ops cxl_pci_controller_ops =
>   .reset_secondary_bus = cxl_pci_reset_secondary_bus,
>   .setup_msi_irqs = cxl_setup_msi_irqs,
>   .teardown_msi_irqs = cxl_teardown_msi_irqs,
> - .dma_set_mask = cxl_dma_set_mask,
>  };
>  
>  int cxl_pci_vphb_add(struct cxl_afu *afu)
> -- 
> 2.19.1
> 


Re: [PATCH 06/34] powerpc/dma: split the two __dma_alloc_coherent implementations

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:46AM +0100, Christoph Hellwig wrote:
> The implementation for the CONFIG_NOT_COHERENT_CACHE case doesn't share
> any code with the one for systems with coherent caches.  Split it off
> and merge it with the helpers in dma-noncoherent.c that have no other
> callers.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/include/asm/dma-mapping.h |  5 -
>  arch/powerpc/kernel/dma.c  | 14 ++
>  arch/powerpc/mm/dma-noncoherent.c  | 15 +++
>  arch/powerpc/platforms/44x/warp.c  |  2 +-
>  4 files changed, 10 insertions(+), 26 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index f2a4a7142b1e..dacd0f93f2b2 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -39,9 +39,6 @@ extern int dma_nommu_mmap_coherent(struct device *dev,
>   * to ensure it is consistent.
>   */
>  struct device;
> -extern void *__dma_alloc_coherent(struct device *dev, size_t size,
> -   dma_addr_t *handle, gfp_t gfp);
> -extern void __dma_free_coherent(size_t size, void *vaddr);
>  extern void __dma_sync(void *vaddr, size_t size, int direction);
>  extern void __dma_sync_page(struct page *page, unsigned long offset,
>size_t size, int direction);
> @@ -52,8 +49,6 @@ extern unsigned long __dma_get_coherent_pfn(unsigned long 
> cpu_addr);
>   * Cache coherent cores.
>   */
>  
> -#define __dma_alloc_coherent(dev, gfp, size, handle) NULL
> -#define __dma_free_coherent(size, addr)  ((void)0)
>  #define __dma_sync(addr, size, rw)   ((void)0)
>  #define __dma_sync_page(pg, off, sz, rw) ((void)0)
>  
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 6551685a4ed0..d6deb458bb91 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -62,18 +62,12 @@ static int dma_nommu_dma_supported(struct device *dev, 
> u64 mask)
>  #endif
>  }
>  
> +#ifndef CONFIG_NOT_COHERENT_CACHE
>  void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
> dma_addr_t *dma_handle, gfp_t flag,
> unsigned long attrs)
>  {
>   void *ret;
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> - ret = __dma_alloc_coherent(dev, size, dma_handle, flag);
> - if (ret == NULL)
> - return NULL;
> - *dma_handle += get_dma_offset(dev);
> - return ret;
> -#else
>   struct page *page;
>   int node = dev_to_node(dev);
>  #ifdef CONFIG_FSL_SOC
> @@ -110,19 +104,15 @@ void *__dma_nommu_alloc_coherent(struct device *dev, 
> size_t size,
>   *dma_handle = __pa(ret) + get_dma_offset(dev);
>  
>   return ret;
> -#endif
>  }
>  
>  void __dma_nommu_free_coherent(struct device *dev, size_t size,
>   void *vaddr, dma_addr_t dma_handle,
>   unsigned long attrs)
>  {
> -#ifdef CONFIG_NOT_COHERENT_CACHE
> - __dma_free_coherent(size, vaddr);
> -#else
>   free_pages((unsigned long)vaddr, get_order(size));
> -#endif
>  }
> +#endif /* !CONFIG_NOT_COHERENT_CACHE */
>  
>  static void *dma_nommu_alloc_coherent(struct device *dev, size_t size,
>  dma_addr_t *dma_handle, gfp_t flag,
> diff --git a/arch/powerpc/mm/dma-noncoherent.c 
> b/arch/powerpc/mm/dma-noncoherent.c
> index b6e7b5952ab5..e955539686a4 100644
> --- a/arch/powerpc/mm/dma-noncoherent.c
> +++ b/arch/powerpc/mm/dma-noncoherent.c
> @@ -29,7 +29,7 @@
>  #include 
>  #include 
>  #include 
> -#include 
> +#include 
>  #include 
>  
>  #include 
> @@ -151,8 +151,8 @@ static struct ppc_vm_region *ppc_vm_region_find(struct 
> ppc_vm_region *head, unsi
>   * Allocate DMA-coherent memory space and return both the kernel remapped
>   * virtual and bus address for that space.
>   */
> -void *
> -__dma_alloc_coherent(struct device *dev, size_t size, dma_addr_t *handle, 
> gfp_t gfp)
> +void *__dma_nommu_alloc_coherent(struct device *dev, size_t size,
> + dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
>  {
>   struct page *page;
>   struct ppc_vm_region *c;
> @@ -223,7 +223,7 @@ __dma_alloc_coherent(struct device *dev, size_t size, 
> dma_addr_t *handle, gfp_t
>   /*
>* Set the "dma handle"
>*/
> - *handle = page_to_phys(page);
> + *dma_handle = phys_to_dma(dev, page_to_phys(page));
>  
>   do {
>   SetPageReserved(page);
> @@ -249,12 +249,12 @@ __dma_alloc_coherent(struct device *dev, size_t size, 
> dma_addr_t *handle, gfp_t
>   no_page:
>   return NULL;
>  }
> -EXPORT_SYMBOL(__dma_alloc_coherent);
>  
>  /*
>   * free a page as defined by the above mapping.
>   */
> -void __dma_free_coherent(size_t size, void *vaddr)
> +void __dma_nommu_free_cohere

Re: [PATCH 03/34] powerpc/dma: remove the unused ARCH_HAS_DMA_MMAP_COHERENT define

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:43AM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/include/asm/dma-mapping.h | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/dma-mapping.h 
> b/arch/powerpc/include/asm/dma-mapping.h
> index 8fa394520af6..f2a4a7142b1e 100644
> --- a/arch/powerpc/include/asm/dma-mapping.h
> +++ b/arch/powerpc/include/asm/dma-mapping.h
> @@ -112,7 +112,5 @@ extern int dma_set_mask(struct device *dev, u64 dma_mask);
>  
>  extern u64 __dma_get_required_mask(struct device *dev);
>  
> -#define ARCH_HAS_DMA_MMAP_COHERENT
> -
>  #endif /* __KERNEL__ */
>  #endif   /* _ASM_DMA_MAPPING_H */
> -- 
> 2.19.1
> 


Re: [PATCH 05/34] powerpc/dma: remove the unused dma_iommu_ops export

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:45AM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/dma-iommu.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma-iommu.c b/arch/powerpc/kernel/dma-iommu.c
> index f9fe2080ceb9..2ca6cfaebf65 100644
> --- a/arch/powerpc/kernel/dma-iommu.c
> +++ b/arch/powerpc/kernel/dma-iommu.c
> @@ -6,7 +6,6 @@
>   * busses using the iommu infrastructure
>   */
>  
> -#include 
>  #include 
>  
>  /*
> @@ -123,4 +122,3 @@ struct dma_map_ops dma_iommu_ops = {
>   .get_required_mask  = dma_iommu_get_required_mask,
>   .mapping_error  = dma_iommu_mapping_error,
>  };
> -EXPORT_SYMBOL(dma_iommu_ops);
> -- 
> 2.19.1
> 


Re: [PATCH 07/34] powerpc/dma: remove the no-op dma_nommu_unmap_{page, sg} routines

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:47AM +0100, Christoph Hellwig wrote:
> These methods are optional, no need to implement no-op versions.
> 
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/kernel/dma.c | 16 
>  1 file changed, 16 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index d6deb458bb91..7078d72baec2 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -197,12 +197,6 @@ static int dma_nommu_map_sg(struct device *dev, struct 
> scatterlist *sgl,
>   return nents;
>  }
>  
> -static void dma_nommu_unmap_sg(struct device *dev, struct scatterlist *sg,
> - int nents, enum dma_data_direction direction,
> - unsigned long attrs)
> -{
> -}
> -
>  static u64 dma_nommu_get_required_mask(struct device *dev)
>  {
>   u64 end, mask;
> @@ -228,14 +222,6 @@ static inline dma_addr_t dma_nommu_map_page(struct 
> device *dev,
>   return page_to_phys(page) + offset + get_dma_offset(dev);
>  }
>  
> -static inline void dma_nommu_unmap_page(struct device *dev,
> -  dma_addr_t dma_address,
> -  size_t size,
> -  enum dma_data_direction direction,
> -  unsigned long attrs)
> -{
> -}
> -
>  #ifdef CONFIG_NOT_COHERENT_CACHE
>  static inline void dma_nommu_sync_sg(struct device *dev,
>   struct scatterlist *sgl, int nents,
> @@ -261,10 +247,8 @@ const struct dma_map_ops dma_nommu_ops = {
>   .free   = dma_nommu_free_coherent,
>   .mmap   = dma_nommu_mmap_coherent,
>   .map_sg = dma_nommu_map_sg,
> - .unmap_sg   = dma_nommu_unmap_sg,
>   .dma_supported  = dma_nommu_dma_supported,
>   .map_page   = dma_nommu_map_page,
> - .unmap_page = dma_nommu_unmap_page,
>   .get_required_mask  = dma_nommu_get_required_mask,
>  #ifdef CONFIG_NOT_COHERENT_CACHE
>   .sync_single_for_cpu= dma_nommu_sync_single,
> -- 
> 2.19.1
> 


Re: [PATCH 04/34] powerpc/dma: remove the unused ISA_DMA_THRESHOLD export

2018-12-06 Thread Christoph Hellwig
ping?

On Wed, Nov 14, 2018 at 09:22:44AM +0100, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 
> Acked-by: Benjamin Herrenschmidt 
> ---
>  arch/powerpc/kernel/setup_32.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/arch/powerpc/kernel/setup_32.c b/arch/powerpc/kernel/setup_32.c
> index 81909600013a..07f7e6aaf104 100644
> --- a/arch/powerpc/kernel/setup_32.c
> +++ b/arch/powerpc/kernel/setup_32.c
> @@ -59,7 +59,6 @@ unsigned long ISA_DMA_THRESHOLD;
>  unsigned int DMA_MODE_READ;
>  unsigned int DMA_MODE_WRITE;
>  
> -EXPORT_SYMBOL(ISA_DMA_THRESHOLD);
>  EXPORT_SYMBOL(DMA_MODE_READ);
>  EXPORT_SYMBOL(DMA_MODE_WRITE);
>  
> -- 
> 2.19.1
> 


Re: [PATCH 01/34] powerpc: use mm zones more sensibly

2018-12-06 Thread Christoph Hellwig
Ben / Michael,

can we get this one queued up for 4.21 to prepare for the DMA work later
on?

On Wed, Nov 14, 2018 at 09:22:41AM +0100, Christoph Hellwig wrote:
> Powerpc has somewhat odd usage where ZONE_DMA is used for all memory on
> common 64-bit configs, and ZONE_DMA32 is used for 31-bit schemes.
> 
> Move to a scheme closer to what other architectures use (and I dare to
> say the intent of the system):
> 
>  - ZONE_DMA: optionally for memory < 31-bit (64-bit embedded only)
>  - ZONE_NORMAL: everything addressable by the kernel
>  - ZONE_HIGHMEM: memory > 32-bit for 32-bit kernels
> 
> Also provide information on how ZONE_DMA is used by defining
> ARCH_ZONE_DMA_BITS.
> 
> Contains various fixes from Benjamin Herrenschmidt.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/Kconfig  |  8 +---
>  arch/powerpc/include/asm/page.h   |  2 +
>  arch/powerpc/include/asm/pgtable.h|  1 -
>  arch/powerpc/kernel/dma-swiotlb.c |  6 +--
>  arch/powerpc/kernel/dma.c |  7 +--
>  arch/powerpc/mm/mem.c | 47 +++
>  arch/powerpc/platforms/85xx/corenet_generic.c | 10 
>  arch/powerpc/platforms/85xx/qemu_e500.c   |  9 
>  include/linux/mmzone.h|  2 +-
>  9 files changed, 25 insertions(+), 67 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 8be31261aec8..c3613bc1 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -374,9 +374,9 @@ config PPC_ADV_DEBUG_DAC_RANGE
>   depends on PPC_ADV_DEBUG_REGS && 44x
>   default y
>  
> -config ZONE_DMA32
> +config ZONE_DMA
>   bool
> - default y if PPC64
> + default y if PPC_BOOK3E_64
>  
>  config PGTABLE_LEVELS
>   int
> @@ -869,10 +869,6 @@ config ISA
> have an IBM RS/6000 or pSeries machine, say Y.  If you have an
> embedded board, consult your board documentation.
>  
> -config ZONE_DMA
> - bool
> - default y
> -
>  config GENERIC_ISA_DMA
>   bool
>   depends on ISA_DMA_API
> diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h
> index f6a1265face2..fc8c9ac0c6be 100644
> --- a/arch/powerpc/include/asm/page.h
> +++ b/arch/powerpc/include/asm/page.h
> @@ -354,4 +354,6 @@ typedef struct page *pgtable_t;
>  #endif /* __ASSEMBLY__ */
>  #include 
>  
> +#define ARCH_ZONE_DMA_BITS 31
> +
>  #endif /* _ASM_POWERPC_PAGE_H */
> diff --git a/arch/powerpc/include/asm/pgtable.h 
> b/arch/powerpc/include/asm/pgtable.h
> index 9679b7519a35..8af32ce93c7f 100644
> --- a/arch/powerpc/include/asm/pgtable.h
> +++ b/arch/powerpc/include/asm/pgtable.h
> @@ -66,7 +66,6 @@ extern unsigned long empty_zero_page[];
>  
>  extern pgd_t swapper_pg_dir[];
>  
> -void limit_zone_pfn(enum zone_type zone, unsigned long max_pfn);
>  int dma_pfn_limit_to_zone(u64 pfn_limit);
>  extern void paging_init(void);
>  
> diff --git a/arch/powerpc/kernel/dma-swiotlb.c 
> b/arch/powerpc/kernel/dma-swiotlb.c
> index 5fc335f4d9cd..678811abccfc 100644
> --- a/arch/powerpc/kernel/dma-swiotlb.c
> +++ b/arch/powerpc/kernel/dma-swiotlb.c
> @@ -108,12 +108,8 @@ int __init swiotlb_setup_bus_notifier(void)
>  
>  void __init swiotlb_detect_4g(void)
>  {
> - if ((memblock_end_of_DRAM() - 1) > 0x) {
> + if ((memblock_end_of_DRAM() - 1) > 0x)
>   ppc_swiotlb_enable = 1;
> -#ifdef CONFIG_ZONE_DMA32
> - limit_zone_pfn(ZONE_DMA32, (1ULL << 32) >> PAGE_SHIFT);
> -#endif
> - }
>  }
>  
>  static int __init check_swiotlb_enabled(void)
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index dbfc7056d7df..6551685a4ed0 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -50,7 +50,7 @@ static int dma_nommu_dma_supported(struct device *dev, u64 
> mask)
>   return 1;
>  
>  #ifdef CONFIG_FSL_SOC
> - /* Freescale gets another chance via ZONE_DMA/ZONE_DMA32, however
> + /* Freescale gets another chance via ZONE_DMA, however
>* that will have to be refined if/when they support iommus
>*/
>   return 1;
> @@ -94,13 +94,10 @@ void *__dma_nommu_alloc_coherent(struct device *dev, 
> size_t size,
>   }
>  
>   switch (zone) {
> +#ifdef CONFIG_ZONE_DMA
>   case ZONE_DMA:
>   flag |= GFP_DMA;
>   break;
> -#ifdef CONFIG_ZONE_DMA32
> - case ZONE_DMA32:
> - flag |= GFP_DMA32;
> - break;
>  #endif
>   };
>  #endif /* CONFIG_FSL_SOC */
> diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
> index 0a64fffabee1..c0b676c3a5ba 100644
> --- a/arch/powerpc/mm/mem.c
> +++ b/arch/powerpc/mm/mem.c
> @@ -246,35 +246,19 @@ static int __init mark_nonram_nosave(void)
>  }
>  #endif
>  
> -static bool zone_limits_final;
> -
>  /*
> - * The memory zones past TOP_ZONE are managed by generic mm code.
> - * These should be set to zero since that's what every other
> - * 
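
For context on the ARCH_ZONE_DMA_BITS change above: with the constant set
to 31, ZONE_DMA covers the first 2 GiB of RAM. A minimal sketch of how the
zone's pfn limit falls out of the constant (illustrative only; the helper
name is hypothetical and the exact expression in arch code may differ):

/* Hypothetical helper showing the arithmetic behind ARCH_ZONE_DMA_BITS */
static unsigned long ppc_zone_dma_limit_pfn(void)
{
	/* 2^31 bytes expressed in pages, i.e. the first 2 GiB */
	return 1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT);
}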

Re: [PATCH 02/34] powerpc: allow NOT_COHERENT_CACHE for amigaone

2018-12-06 Thread Christoph Hellwig
powerpc maintainers, can you pick this up as this is a bug fix for the
currently existing powerpc Kconfig code?

On Wed, Nov 14, 2018 at 09:22:42AM +0100, Christoph Hellwig wrote:
> AMIGAONE selects NOT_COHERENT_CACHE, so we had better allow it.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/platforms/Kconfig.cputype | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/platforms/Kconfig.cputype 
> b/arch/powerpc/platforms/Kconfig.cputype
> index f4e2c5729374..6fedbf349fce 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -412,7 +412,8 @@ config NR_CPUS
>  
>  config NOT_COHERENT_CACHE
>   bool
> - depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
> + depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || \
> + GAMECUBE_COMMON || AMIGAONE
>   default n if PPC_47x
>   default y
>  
> -- 
> 2.19.1
> 
> ___
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
---end quoted text---
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 2/9] ACPI/IORT: Use helper functions to access dev->iommu_fwspec

2018-12-06 Thread Joerg Roedel
Hi Robin,

On Wed, Dec 05, 2018 at 05:50:23PM +, Robin Murphy wrote:
> This needs to reload the new fwspec initialised by iort_iommu_xlate(), in
> the same manner as the OF code. I think the best thing to do is encapsulate
> the dev_iommu_fwspec_get() call in iort_fwspec_iommu_ops(), and have that
> take dev as its argument directly.

Yeah, that makes sense. I changed the patch this way.
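
For clarity, a minimal sketch of the reworked helper, assuming it now takes
the device directly and reloads the fwspec itself (names from the patches
under discussion; the exact body in the final commit may differ):

/* Sketch: fetch the ops from the fwspec installed by iort_iommu_xlate() */
static const struct iommu_ops *iort_fwspec_iommu_ops(struct device *dev)
{
	struct iommu_fwspec *fwspec = dev_iommu_fwspec_get(dev);

	return (fwspec && fwspec->ops) ? fwspec->ops : NULL;
}

With the lookup inside the helper, callers always see the fwspec that
iort_iommu_xlate() may have just replaced, matching the OF code path.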

Thanks,

Joerg
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 3/9] iommu/arm-smmu: Use helper functions to access dev->iommu_fwspec

2018-12-06 Thread Will Deacon
On Tue, Dec 04, 2018 at 05:29:56PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Use the new helpers dev_iommu_fwspec_get()/set() to access
> the dev->iommu_fwspec pointer. This makes it easier to move
> that pointer later into another struct.
> 
> Cc: Will Deacon 
> Cc: Robin Murphy 
> Signed-off-by: Joerg Roedel 
> ---
>  drivers/iommu/arm-smmu-v3.c | 16 +---
>  drivers/iommu/arm-smmu.c| 12 ++--
>  2 files changed, 15 insertions(+), 13 deletions(-)
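
For readers following the series, the two accessors are thin wrappers; a
sketch of their assumed form (the final definitions live in the iommu
headers and may differ in detail):

static inline struct iommu_fwspec *dev_iommu_fwspec_get(struct device *dev)
{
	return dev->iommu_fwspec;
}

static inline void dev_iommu_fwspec_set(struct device *dev,
					struct iommu_fwspec *fwspec)
{
	dev->iommu_fwspec = fwspec;
}

Funneling every access through these makes the planned move of the pointer
into another struct a localized change.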

Looks straightforward enough to me:

Acked-by: Will Deacon 

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH 1/5] driver core: Introduce device_iommu_mapped() function

2018-12-06 Thread Greg Kroah-Hartman
On Tue, Dec 04, 2018 at 06:25:00PM +0100, Joerg Roedel wrote:
> From: Joerg Roedel 
> 
> Some places in the kernel check the iommu_group pointer in
> 'struct device' in order to find out whether a device is
> mapped by an IOMMU.
> 
> This is not a good way to make this check, as the pointer will
> be moved to 'struct dev_iommu_data'. Making the check this
> way is also not very readable.
> 
> Introduce an explicit function to perform this check.
> 
> Cc: Greg Kroah-Hartman 
> Signed-off-by: Joerg Roedel 
> ---
>  include/linux/device.h | 10 ++
>  1 file changed, 10 insertions(+)
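
A sketch of the helper as described above (assumed implementation, valid
until the iommu_group pointer moves into 'struct dev_iommu_data'):

static inline bool device_iommu_mapped(struct device *dev)
{
	return dev->iommu_group != NULL;
}

Call sites can then test device_iommu_mapped(dev) instead of poking at
dev->iommu_group directly.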

Acked-by: Greg Kroah-Hartman 
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: use generic DMA mapping code in powerpc V4

2018-12-06 Thread Christian Zigotzky

On 05 December 2018 at 3:05PM, Christoph Hellwig wrote:


> Thanks.  Can you try a few stepping points in the tree?
> 
> First just with commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6
> (the first one) applied?
> 
> Second with all commits up to 5da11e49df21f21dac25a2491aa788307bdacb6b
> 
> And if that still works with commits up to
> c1bfcad4b0cf38ce5b00f7ad880d3a13484c123a


Hi Christoph,

I checked out commit 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6 (so the tree
is built with just that first commit applied) with the following command:


git checkout 7fd3bb05b73beea1f9840b505aa09beb9c75a8c6

Result: PASEMI onboard ethernet works again and the P5020 board boots.

I will test the other commits in the next days.

@All
It is really important that you also test Christoph's work on your 
PASEMI and NXP boards. Could you please help us with solving the issues?


'git clone git://git.infradead.org/users/hch/misc.git -b powerpc-dma.5 a'

Thanks,
Christian

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-06 Thread Nicolas Boichat
On Thu, Dec 6, 2018 at 5:37 PM Vlastimil Babka  wrote:
>
> On 12/6/18 4:49 AM, Nicolas Boichat wrote:
> >> So it would be fine even unchanged. The check would anyway need some
> >> more love to catch the same with __GFP_DMA to be consistent and cover
> >> all corner cases.
> > Yes, the test is not complete. If we really wanted this to be
> > accurate, we'd need to check that GFP_* exactly matches SLAB_CACHE_*.
> >
> > The only problem with dropping this test is that we should restore
> > GFP_DMA32 warnings/errors somewhere else (as Christopher pointed out
> > here: https://lkml.org/lkml/2018/11/22/430), especially for kmalloc
> > case.
>
> I meant just dropping that patch hunk, not the whole test. Then the test
> stays as it is and will keep warning anyone calling kmalloc(GFP_DMA32).
> It would also warn anyone calling kmem_cache_alloc(GFP_DMA32) on a
> SLAB_CACHE_DMA32 cache, but since the gfp can just be dropped, and you
> as the only user of this so far will do that, it's fine?

I had missed your point; this would indeed work fine.

Thanks.

> > Maybe this can be done in kmalloc_slab.
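
A hedged sketch of that idea, written as a wrapper around kmalloc_slab()
(the name kmalloc_slab_checked() and the placement are illustrative, not a
posted patch):

/* No kmalloc cache array exists for DMA32; only kmem_caches created
 * with SLAB_CACHE_DMA32 may allocate from that zone, so keep warning
 * on plain kmalloc(..., GFP_DMA32). */
static struct kmem_cache *kmalloc_slab_checked(size_t size, gfp_t flags)
{
	if (WARN_ON_ONCE(flags & __GFP_DMA32))
		return NULL;

	return kmalloc_slab(size, flags);
}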
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v4 2/3] mm: Add support for kmem caches in DMA32 zone

2018-12-06 Thread Vlastimil Babka
On 12/6/18 4:49 AM, Nicolas Boichat wrote:
>> So it would be fine even unchanged. The check would anyway need some
>> more love to catch the same with __GFP_DMA to be consistent and cover
>> all corner cases.
> Yes, the test is not complete. If we really wanted this to be
> accurate, we'd need to check that GFP_* exactly matches SLAB_CACHE_*.
> 
> The only problem with dropping this test is that we should restore
> GFP_DMA32 warnings/errors somewhere else (as Christopher pointed out
> here: https://lkml.org/lkml/2018/11/22/430), especially for kmalloc
> case.

I meant just dropping that patch hunk, not the whole test. Then the test
stays as it is and will keep warning anyone calling kmalloc(GFP_DMA32).
It would also warn anyone calling kmem_cache_alloc(GFP_DMA32) on a
SLAB_CACHE_DMA32 cache, but since the gfp can just be dropped, and you
as the only user of this so far will do that, it's fine?

> Maybe this can be done in kmalloc_slab.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu