On 16.03.2026 13:08, Maxime Ripard wrote:
> On Wed, Mar 11, 2026 at 08:18:28AM -0500, Andrew Davis wrote:
>> On 3/11/26 5:19 AM, Albert Esteve wrote:
>>> On Tue, Mar 10, 2026 at 4:34 PM Andrew Davis <[email protected]> wrote:
>>>> On 3/6/26 4:36 AM, Albert Esteve wrote:
>>>>> Expose DT coherent reserved-memory pools ("shared-dma-pool"
>>>>> without "reusable") as dma-buf heaps, creating one heap per
>>>>> region so userspace can allocate from the exact device-local
>>>>> pool intended for coherent DMA.
>>>>>
>>>>> This is a missing backend in the long-term effort to steer
>>>>> userspace buffer allocations (DRM, v4l2, dma-buf heaps)
>>>>> through heaps for clearer cgroup accounting. CMA and system
>>>>> heaps already exist; non-reusable coherent reserved memory
>>>>> did not.
>>>>>
>>>>> The heap binds the heap device to each memory region so
>>>>> coherent allocations use the correct dev->dma_mem, and
>>>>> it defers registration until module_init when normal
>>>>> allocators are available.
>>>>>
>>>>> Signed-off-by: Albert Esteve <[email protected]>
>>>>> ---
>>>>>  drivers/dma-buf/heaps/Kconfig         |   9 +
>>>>>  drivers/dma-buf/heaps/Makefile        |   1 +
>>>>>  drivers/dma-buf/heaps/coherent_heap.c | 414 ++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 424 insertions(+)
>>>>>
>>>>> (...)
>>>> You are doing this DMA allocation using a non-DMA pseudo-device (heap_dev).
>>>> This is why you need to do that dma_coerce_mask_and_coherent(64) nonsense,
>>>> you are doing a DMA alloc for the CPU itself. This might still work, but
>>>> only if dma_map_sgtable() can handle swiotlb/iommu for all attaching
>>>> devices at map time.
>>> The concern is valid. We're allocating via a synthetic device, which
>>> ties the allocation to that device's DMA domain. I looked deeper into
>>> this trying to address the concern.
>>>
>>> The approach works because dma_map_sgtable() handles both
>>> dma_map_direct and use_dma_iommu cases in __dma_map_sg_attrs().
>>> For each physical address in the sg_table (extracted via sg_phys()), it
>>> creates device-specific DMA mappings:
>>> - For direct mapping: it checks if the address is directly accessible
>>>   (dma_capable()), and if not, it falls back to swiotlb.
>>> - For IOMMU: it creates mappings that allow the device to access
>>>   physical addresses.
>>>
>>> This means every attached device gets its own device-specific DMA
>>> mapping, properly handling cases where the physical addresses are
>>> inaccessible or have DMA constraints.
>>>
>> While this means it might still "work" it won't always be ideal. Take
>> the case where the consuming device(s) have a 32bit address restriction,
>> if the allocation was done using the real devices then the backing buffer
>> itself would be allocated in <32bit mem. Whereas here the allocation
>> could end up in >32bit mem, as the CPU/synthetic device supports that.
>> Then each mapping device would instead get a bounce buffer.
>>
>> (this example might not be great as we usually know the address of
>> carveout/reserved memory regions, but substitute in whatever restriction
>> makes more sense)
>>
>> These non-reusable carveouts tend to be made for some specific device, and
>> they are made specifically because that device has some memory restriction.
>> So we might run into the situation above more than one would expect.
>>
>> Not a blocker here, but just something worth thinking on.
> As I detailed in the previous version [1] the main idea behind that work
> is to allow to get rid of dma_alloc_attrs for framework and drivers to
> allocate from the heaps instead.
>
> Robin was saying he wasn't comfortable with exposing this heap to
> userspace, and we're saying here that maybe this might not always work
> anyway (or at least that we couldn't test it fully).
>
> Maybe the best thing is to defer this series until we are at a point
> where we can start enabling the "heap allocations" in frameworks then?
> Hopefully we will have hardware to test it with by then, and we might
> not even need to expose it to userspace at all but only to the kernel.
>
> What do you think?

IMHO a good idea. Maybe in-kernel heap for the coherent allocations will
be just enough.

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland
