Re: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory
Hello

2017-11-24 19:35 GMT+09:00 David Laight:
> From: Jaewon Kim
>> Sent: 24 November 2017 05:59
>>
>> dma-coherent uses bitmap APIs which internally align allocations based on the
>> requested size. If most allocations are small, on the order of KBs, this
>> alignment scheme helps against fragmentation. But if large allocations are
>> common, an allocation can fail because of the alignment. To avoid such
>> failures, we had to increase the total size.
>>
>> For example, suppose the total size is 30MB, only a little memory at the front
>> is in use, and 9MB is requested. The 9MB request is aligned to 16MB. The
>> first try at offset 0MB fails because that area is already in use. The
>> second try at offset 16MB fails because it runs out of bounds.
>>
>> So if alignment is not necessary for a specific dma-coherent memory region,
>> we can set the no-align property. dma-coherent then ignores alignment only
>> for that memory region.
>
> ISTM that the alignment needs to be a property of the request, not of the
> device. Certainly the device driver code is most likely to know the specific
> alignment requirements of any specific allocation.
>

Sorry, but I do not fully understand what you mean by 'a property of the
request'. The dma-coherent APIs do not take an alignment argument; internally
they use get_order() to derive the alignment from the requested size.

I think you may mean that the dma-coherent APIs should keep working that way
because drivers calling them have been assuming this alignment for a long
time. I still think a few memory regions could be managed without alignment,
if the author knows the region well and adds no-align to its device tree
node. But it is OK if the open source community is worried about dropping
the alignment.

Thank you

> We've some hardware that would need large allocations to be 16k aligned.
> We actually use multiple 16k allocations because any large buffers are
> accessed directly from userspace (mmap and vm_iomap_memory) and the
> card has its own page tables (with 16k pages).
>
> David
>
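To make the size-derived alignment concrete, here is a minimal user-space sketch of the arithmetic. It is not the kernel's get_order() implementation; sketch_get_order(), sketch_aligned_size() and the 4 KiB page size are assumptions made only for this illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * User-space stand-in for get_order(): the smallest order such that
 * (1 << order) pages are enough to hold 'size' bytes.
 */
static int sketch_get_order(size_t size)
{
	size_t page_size = (size_t)1 << SKETCH_PAGE_SHIFT;
	size_t pages = (size + page_size - 1) >> SKETCH_PAGE_SHIFT;
	int order = 0;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}

/*
 * Bytes that an order-based allocator reserves for a request. The same
 * value is also the alignment of the start offset.
 */
static size_t sketch_aligned_size(size_t size)
{
	return ((size_t)1 << sketch_get_order(size)) << SKETCH_PAGE_SHIFT;
}
```

Under these assumptions a 9MB request needs 2304 pages, which rounds up to order 12 (4096 pages), so the request is treated as a 16MB block aligned to 16MB.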
RE: [RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory
From: Jaewon Kim
> Sent: 24 November 2017 05:59
>
> dma-coherent uses bitmap APIs which internally align allocations based on the
> requested size. If most allocations are small, on the order of KBs, this
> alignment scheme helps against fragmentation. But if large allocations are
> common, an allocation can fail because of the alignment. To avoid such
> failures, we had to increase the total size.
>
> For example, suppose the total size is 30MB, only a little memory at the front
> is in use, and 9MB is requested. The 9MB request is aligned to 16MB. The
> first try at offset 0MB fails because that area is already in use. The
> second try at offset 16MB fails because it runs out of bounds.
>
> So if alignment is not necessary for a specific dma-coherent memory region,
> we can set the no-align property. dma-coherent then ignores alignment only
> for that memory region.

ISTM that the alignment needs to be a property of the request, not of the
device. Certainly the device driver code is most likely to know the specific
alignment requirements of any specific allocation.

We've some hardware that would need large allocations to be 16k aligned.
We actually use multiple 16k allocations because any large buffers are
accessed directly from userspace (mmap and vm_iomap_memory) and the
card has its own page tables (with 16k pages).

	David
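As a rough illustration of alignment being a property of the request, the sketch below lets the caller pass the alignment it needs with each allocation. This is a user-space toy and not an existing dma-coherent interface; region_alloc(), region_used and REGION_PAGES are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stddef.h>

#define REGION_PAGES 64		/* hypothetical region size, in pages */

static bool region_used[REGION_PAGES];

/*
 * Allocate nr_pages contiguous pages whose first page index is a multiple
 * of align_pages, which the caller chooses per request. Returns the first
 * page index, or -1 if no suitable free run exists.
 */
static int region_alloc(size_t nr_pages, size_t align_pages)
{
	size_t start, i;

	if (nr_pages == 0 || align_pages == 0)
		return -1;

	/* Only start offsets that satisfy the requested alignment are tried. */
	for (start = 0; start + nr_pages <= REGION_PAGES; start += align_pages) {
		for (i = 0; i < nr_pages; i++)
			if (region_used[start + i])
				break;
		if (i == nr_pages) {
			for (i = 0; i < nr_pages; i++)
				region_used[start + i] = true;
			return (int)start;
		}
	}
	return -1;
}
```

With 4 KiB pages, a driver that needs 16k alignment would pass align_pages = 4, while a driver with no alignment requirement would pass 1. The size of the request no longer dictates the alignment.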
[RFC v2] dma-coherent: introduce no-align to avoid allocation failure and save memory
dma-coherent uses bitmap APIs which internally align allocations based on the
requested size. If most allocations are small, on the order of KBs, this
alignment scheme helps against fragmentation. But if large allocations are
common, an allocation can fail because of the alignment. To avoid such
failures, we had to increase the total size.

For example, suppose the total size is 30MB, only a little memory at the front
is in use, and 9MB is requested. The 9MB request is aligned to 16MB. The
first try at offset 0MB fails because that area is already in use. The
second try at offset 16MB fails because it runs out of bounds.

So if alignment is not necessary for a specific dma-coherent memory region,
we can set the no-align property. dma-coherent then ignores alignment only
for that memory region.

patch changelog:

v2: use a no-align property rather than forcing no-align

Signed-off-by: Jaewon Kim
---
 .../bindings/reserved-memory/reserved-memory.txt |  6 +++
 arch/arm/mm/dma-mapping-nommu.c                  |  3 +-
 drivers/base/dma-coherent.c                      | 49 --
 include/linux/dma-mapping.h                      | 12 +++---
 4 files changed, 50 insertions(+), 20 deletions(-)

diff --git a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
index 16291f2a4688..b279e111a7ca 100644
--- a/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
+++ b/Documentation/devicetree/bindings/reserved-memory/reserved-memory.txt
@@ -63,6 +63,12 @@ reusable (optional) - empty property
       able to reclaim it back. Typically that means that the operating
       system can use that region to store volatile or cached data that
       can be otherwise regenerated or migrated elsewhere.
+no-align (optional) - empty property
+    - Depending on a device or its usage pattern, trying to do aligning is not
+      useful. Because of aligning, allocation can be failed and that leads to
+      increasing total memory size to avoid the allocation failure. This
+      property indicates allocator will not try to do aligning on size nor
+      offset.
 
 Linux implementation note:
 - If a "linux,cma-default" property is present, then Linux will use the
diff --git a/arch/arm/mm/dma-mapping-nommu.c b/arch/arm/mm/dma-mapping-nommu.c
index 6db5fc26d154..6512dae5d19b 100644
--- a/arch/arm/mm/dma-mapping-nommu.c
+++ b/arch/arm/mm/dma-mapping-nommu.c
@@ -75,8 +75,7 @@ static void arm_nommu_dma_free(struct device *dev, size_t size,
 	if (attrs & DMA_ATTR_NON_CONSISTENT) {
 		ops->free(dev, size, cpu_addr, dma_addr, attrs);
 	} else {
-		int ret = dma_release_from_global_coherent(get_order(size),
-							   cpu_addr);
+		int ret = dma_release_from_global_coherent(size, cpu_addr);
 
 		WARN_ON_ONCE(ret == 0);
 	}
diff --git a/drivers/base/dma-coherent.c b/drivers/base/dma-coherent.c
index 1e6396bb807b..95d96bd764d9 100644
--- a/drivers/base/dma-coherent.c
+++ b/drivers/base/dma-coherent.c
@@ -17,6 +17,7 @@ struct dma_coherent_mem {
 	int		flags;
 	unsigned long	*bitmap;
 	spinlock_t	spinlock;
+	bool		no_align;
 	bool		use_dev_dma_pfn_offset;
 };
 
@@ -163,19 +164,35 @@ EXPORT_SYMBOL(dma_mark_declared_memory_occupied);
 static void *__dma_alloc_from_coherent(struct dma_coherent_mem *mem,
 		ssize_t size, dma_addr_t *dma_handle)
 {
-	int order = get_order(size);
 	unsigned long flags;
 	int pageno;
 	void *ret;
 
 	spin_lock_irqsave(&mem->spinlock, flags);
 
-	if (unlikely(size > (mem->size << PAGE_SHIFT)))
+	if (unlikely(size > (mem->size << PAGE_SHIFT))) {
+		WARN_ONCE(1, "%s too big size, req-size: %zu total-size: %d\n",
+			  __func__, size, (mem->size << PAGE_SHIFT));
 		goto err;
+	}
 
-	pageno = bitmap_find_free_region(mem->bitmap, mem->size, order);
-	if (unlikely(pageno < 0))
-		goto err;
+	if (mem->no_align) {
+		int nr_page = PAGE_ALIGN(size) >> PAGE_SHIFT;
+
+		pageno = bitmap_find_next_zero_area(mem->bitmap, mem->size, 0,
+						    nr_page, 0);
+		if (unlikely(pageno >= mem->size)) {
+			pr_err("%s: alloc failed, req-size: %u pages\n", __func__, nr_page);
+			goto err;
+		}
+		bitmap_set(mem->bitmap, pageno, nr_page);
+	} else {
+		int order = get_order(size);
+
+		pageno = bitmap_find_free_region(mem->bitmap, mem->size, order);
+		if (unlikely(pageno < 0))
+			goto err;
+	}
 
 	/*
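The 30MB / 9MB example from the description can be replayed in user space. The sketch below is only a simulation under stated assumptions: 4 KiB pages, a plain bool array instead of the kernel bitmap, and hypothetical helpers (pool_reset(), alloc_aligned(), alloc_unaligned()). It imitates the power-of-two stepping of the order-based path and the first-fit search of the no-align path, and does not call any kernel API.

```c
#include <stdbool.h>
#include <string.h>

#define POOL_PAGES 7680		/* 30MB of 4 KiB pages, as in the example */

static bool pool_used[POOL_PAGES];

/* Mark the first busy_front_pages pages as used and everything else free. */
static void pool_reset(int busy_front_pages)
{
	int i;

	memset(pool_used, 0, sizeof(pool_used));
	for (i = 0; i < busy_front_pages && i < POOL_PAGES; i++)
		pool_used[i] = true;
}

static bool run_is_free(int start, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		if (pool_used[start + i])
			return false;
	return true;
}

static void run_mark(int start, int nr)
{
	int i;

	for (i = 0; i < nr; i++)
		pool_used[start + i] = true;
}

/* Aligned path: round up to a power-of-two chunk and step by that chunk. */
static int alloc_aligned(int nr_pages)
{
	int chunk = 1;
	int pos;

	if (nr_pages <= 0)
		return -1;
	while (chunk < nr_pages)
		chunk <<= 1;
	for (pos = 0; pos + chunk <= POOL_PAGES; pos += chunk) {
		if (run_is_free(pos, chunk)) {
			run_mark(pos, chunk);
			return pos;
		}
	}
	return -1;
}

/* no-align path: first fit, page by page, reserving only nr_pages. */
static int alloc_unaligned(int nr_pages)
{
	int pos;

	if (nr_pages <= 0)
		return -1;
	for (pos = 0; pos + nr_pages <= POOL_PAGES; pos++) {
		if (run_is_free(pos, nr_pages)) {
			run_mark(pos, nr_pages);
			return pos;
		}
	}
	return -1;
}
```

Assuming 1MB (256 pages) is busy at the front, a 9MB request (2304 pages) fails on the aligned path because the chunk at page 0 is busy and the chunk at page 4096 would end past page 7680, while the first-fit path places it directly after the busy pages.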