Re: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory

2017-11-27 Thread Jaewon Kim
Hello

2017-11-24 19:35 GMT+09:00 David Laight :
> From: Jaewon Kim
>> Sent: 24 November 2017 05:59
>>
>> dma-coherent uses bitmap APIs which internally derive the alignment from the
>> requested size. If most allocations are small, on the order of KBs, the
>> alignment scheme seems good for avoiding fragmentation. But if large
>> allocations are common, an allocation can fail because of the alignment.
>> To avoid the allocation failure, we had to increase the total size.
>>
>> Here is an example: the total size is 30MB, only a little memory at the
>> front is in use, and 9MB is requested. The 9MB request is aligned to 16MB.
>> The first try at offset 0MB fails because part of that range is already in
>> use. The second try at offset 16MB fails because it runs out of bounds.
>>
>> So if the alignment is not necessary for a specific dma-coherent memory
>> region, we can set the no-align property. dma-coherent will then ignore the
>> alignment for that memory region only.
>
> ISTM that the alignment needs to be a property of the request, not of the
> device. Certainly the device driver code is most likely to know the specific
> alignment requirements of any specific allocation.
>
Sorry, but I don't fully understand 'a property of the request'. The
dma-coherent APIs do not take an alignment argument; they internally use
get_order to determine the alignment from the requested size.
Perhaps you meant that the dma-coherent APIs should keep working that way
because drivers calling them have been assuming this alignment for a long
time.
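
For readers following the thread, here is a minimal userspace sketch of the size-to-order rounding described above. It is not the kernel's get_order implementation; the names sketch_get_order and sketch_order_bytes and the 4 KiB page size are assumptions made for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages cover `size` bytes. */
static unsigned int sketch_get_order(size_t size)
{
	size_t page = (size_t)1 << SKETCH_PAGE_SHIFT;
	size_t pages = (size + page - 1) >> SKETCH_PAGE_SHIFT;
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/* Bytes reserved (and alignment used) when a request is rounded to its order. */
static size_t sketch_order_bytes(size_t size)
{
	return (size_t)1 << (sketch_get_order(size) + SKETCH_PAGE_SHIFT);
}
```

Under these assumptions a 9MB request needs 2304 pages, which rounds up to order 12 (4096 pages), so sketch_order_bytes returns 16MB. That is the rounding the 30MB example in the patch description runs into.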

I still think a few memory regions could be managed without alignment if the
author knows the region well and adds no-align to its device tree node. But
it's OK if the open source community is worried about dropping the alignment.

Thank you
> We've some hardware that would need large allocations to be 16k aligned.
> We actually use multiple 16k allocations because any large buffers are
> accessed directly from userspace (mmap and vm_iomap_memory) and the
> card has its own page tables (with 16k pages).
>
> David
>


RE: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory

2017-11-24 Thread David Laight
From: Jaewon Kim
> Sent: 24 November 2017 05:59
> 
> dma-coherent uses bitmap APIs which internally derive the alignment from the
> requested size. If most allocations are small, on the order of KBs, the
> alignment scheme seems good for avoiding fragmentation. But if large
> allocations are common, an allocation can fail because of the alignment.
> To avoid the allocation failure, we had to increase the total size.
>
> Here is an example: the total size is 30MB, only a little memory at the
> front is in use, and 9MB is requested. The 9MB request is aligned to 16MB.
> The first try at offset 0MB fails because part of that range is already in
> use. The second try at offset 16MB fails because it runs out of bounds.
>
> So if the alignment is not necessary for a specific dma-coherent memory
> region, we can set the no-align property. dma-coherent will then ignore the
> alignment for that memory region only.

ISTM that the alignment needs to be a property of the request, not of the
device. Certainly the device driver code is most likely to know the specific
alignment requirements of any specific allocation.

We've some hardware that would need large allocations to be 16k aligned.
We actually use multiple 16k allocations because any large buffers are
accessed directly from userspace (mmap and vm_iomap_memory) and the
card has its own page tables (with 16k pages).
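
To make "a property of the request" concrete, the following is a hypothetical userspace sketch in which the caller passes its own alignment with each allocation. struct sketch_request, sketch_alloc and the 64-page region are invented for illustration and are not part of the dma-coherent API. In the kernel, bitmap_find_next_zero_area does accept an align_mask argument, which the posted patch sets to 0.

```c
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NPAGES 64 /* assumption: small region of 4 KiB pages */

/* Hypothetical request descriptor: the caller states its own alignment. */
struct sketch_request {
	size_t nr_pages;    /* length in pages */
	size_t align_pages; /* required start alignment in pages, >= 1 */
};

/*
 * Find the first free run of req->nr_pages whose start is a multiple of
 * req->align_pages, mark it used, and return its page offset (-1 if none).
 */
static long sketch_alloc(bool used[SKETCH_NPAGES],
			 const struct sketch_request *req)
{
	size_t off, i;

	if (req->nr_pages == 0 || req->align_pages == 0)
		return -1;

	for (off = 0; off + req->nr_pages <= SKETCH_NPAGES;
	     off += req->align_pages) {
		for (i = 0; i < req->nr_pages; i++)
			if (used[off + i])
				break;
		if (i < req->nr_pages)
			continue; /* run is blocked, try next aligned start */

		for (i = 0; i < req->nr_pages; i++)
			used[off + i] = true;
		return (long)off;
	}
	return -1;
}
```

With page 0 already in use, a 36-page request with align_pages = 4 (16k alignment on 4 KiB pages) lands at page 4. Rounding the same request to its order would need a free, 64-page-aligned run, which this region no longer has.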

David



[RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory

2017-11-23 Thread Jaewon Kim
dma-coherent uses bitmap APIs which internally derive the alignment from the
requested size. If most allocations are small, on the order of KBs, the
alignment scheme seems good for avoiding fragmentation. But if large
allocations are common, an allocation can fail because of the alignment.
To avoid the allocation failure, we had to increase the total size.

Here is an example: the total size is 30MB, only a little memory at the
front is in use, and 9MB is requested. The 9MB request is aligned to 16MB.
The first try at offset 0MB fails because part of that range is already in
use. The second try at offset 16MB fails because it runs out of bounds.

So if the alignment is not necessary for a specific dma-coherent memory
region, we can set the no-align property. dma-coherent will then ignore the
alignment for that memory region only.
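
The 30MB example can be reproduced with a small userspace sketch. It is not the kernel bitmap code: the region is tracked in 1MB units for brevity, and sketch_region, find_aligned and find_unaligned are names assumed for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define REGION_UNITS 30 /* assumption: 30MB region tracked in 1MB units */

struct sketch_region {
	bool used[REGION_UNITS];
};

/* True if [start, start + len) is inside the region and entirely free. */
static bool range_free(const struct sketch_region *r, size_t start, size_t len)
{
	size_t i;

	if (start + len > REGION_UNITS)
		return false;
	for (i = start; i < start + len; i++)
		if (r->used[i])
			return false;
	return true;
}

/* Order-style search: size and offset are both rounded to a power of two. */
static long find_aligned(const struct sketch_region *r, size_t len)
{
	size_t span = 1, off;

	while (span < len)
		span <<= 1;
	for (off = 0; off + span <= REGION_UNITS; off += span)
		if (range_free(r, off, span))
			return (long)off;
	return -1;
}

/* First-fit search with no alignment on size or offset. */
static long find_unaligned(const struct sketch_region *r, size_t len)
{
	size_t off;

	for (off = 0; off + len <= REGION_UNITS; off++)
		if (range_free(r, off, len))
			return (long)off;
	return -1;
}
```

With only the first unit in use, find_aligned(&r, 9) returns -1: the 16-unit slot at offset 0 is partly used and a slot at offset 16 would end at 32, past the 30-unit region. find_unaligned(&r, 9) returns offset 1.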

patch changelog:

v2: use a no-align property rather than forcing no-align unconditionally

Signed-off-by: Jaewon Kim 
---
 .../bindings/reserved-memory/reserved-memory.txt   |  6 +++
 arch/arm/mm/dma-mapping-nommu.c                    |  3 +-
 drivers/base/dma-coherent.c                        | 49 --
 include/linux/dma-mapping.h                        | 12 +++---
 4 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index 16291f2a4688..b279e111a7ca 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -63,6 +63,12 @@ reusable (optional) - empty property
   able to reclaim it back. Typically that means that the operating
   system can use that region to store volatile or cached data that
   can be otherwise regenerated or migrated elsewhere.
+no-align (optional) - empty property
+- Depending on a device or its usage pattern, trying to do aligning is not
+  useful. Because of aligning, allocation can fail and that leads to
+  increasing total memory size to avoid the allocation failure. This
+  property indicates allocator will not try to do aligning on size nor
+  offset.
 
 Linux implementation note:
 - If a "linux,cma-default" property is present, then Linux will use the
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 6db5fc26d154..6512dae5d19b 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -75,8 +75,7 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
 	if (attrs & DMA_ATTR_NON_CONSISTENT) {
 		ops->free(dev, size, cpu_addr, dma_addr, attrs);
 	} else {
-		int ret = dma_release_from_global_coherent(get_order(size),
-							   cpu_addr);
+		int ret = dma_release_from_global_coherent(size, cpu_addr);
 
 		WARN_ON_ONCE(ret == 0);
 	}
diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index 1e6396bb807b..95d96bd764d9 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -17,6 +17,7 @@ struct dma_coherent_mem {
 	int		flags;
 	unsigned long	*bitmap;
 	spinlock_t	spinlock;
+	bool		no_align;
 	bool		use_dev_dma_pfn_offset;
 };
 
@@ -163,19 +164,35 @@ EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
 		ssize_t size, dma_addr_t *dma_handle)
 {
-	int order = get_order(size);
 	unsigned long flags;
 	int pageno;
 	void *ret;
 
 	spin_lock_irqsave(&mem->spinlock, flags);
 
-	if (unlikely(size > (mem->size << PAGE_SHIFT)))
+	if (unlikely(size > (mem->size << PAGE_SHIFT))) {
+		WARN_ONCE(1, "%s too big size, req-size: %zu total-size: %d\n",
+			  __func__, size, (mem->size << PAGE_SHIFT));
 		goto err;
+	}
 
-	pageno = bitmap_find_free_region(mem->bitmap, mem->size, order);
-	if (unlikely(pageno < 0))
-		goto err;
+	if (mem->no_align) {
+		int nr_page = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+		pageno = bitmap_find_next_zero_area(mem->bitmap, mem->size, 0,
+						    nr_page, 0);
+		if (unlikely(pageno >= mem->size)) {
+			pr_err("%s: alloc failed, req-size: %u pages\n", __func__, nr_page);
+			goto err;
+		}
+		bitmap_set(mem->bitmap, pageno, nr_page);
+	} else {
+		int order = get_order(size);
+
+		pageno = bitmap_find_free_region(mem->bitmap, mem->size, order);
+		if (unlikely(pageno < 0))
+			goto err;
+	}
 
 	/*
 * Memory was found in the