Re: [PATCH v2] drm/xe/tests: Fix the shrinker test compiler warnings.
On 04/10/2024 15:11, Thomas Hellström wrote: The xe_bo_shrink_kunit test has an uninitialized value and illegal integer size conversions on 32-bit. Fix. v2: - Use div64_u64 to ensure the u64 division compiles everywhere. (Matt Auld) Reported-by: Nathan Chancellor Closes: https://lore.kernel.org/20240913195649.GA61514@thelio-3990X/ Fixes: 5a90b60db5e6 ("drm/xe: Add a xe_bo subtest for shrinking / swapping") Cc: dri-devel@lists.freedesktop.org Cc: Matthew Auld Reviewed-by: Matthew Auld #v1 Signed-off-by: Thomas Hellström Could probably use div_u64(). Anyway, Reviewed-by: Matthew Auld --- drivers/gpu/drm/xe/tests/xe_bo.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c index 7d3fd720478b..cd811aa2b227 100644 --- a/drivers/gpu/drm/xe/tests/xe_bo.c +++ b/drivers/gpu/drm/xe/tests/xe_bo.c @@ -7,6 +7,7 @@ #include #include +#include #include #include @@ -440,7 +441,7 @@ static int shrink_test_run_device(struct xe_device *xe) LIST_HEAD(bos); struct xe_bo_link *link, *next; struct sysinfo si; - size_t ram, ram_and_swap, purgeable, alloced, to_alloc, limit; + u64 ram, ram_and_swap, purgeable = 0, alloced, to_alloc, limit; unsigned int interrupted = 0, successful = 0, count = 0; struct rnd_state prng; u64 rand_seed; @@ -469,7 +470,7 @@ static int shrink_test_run_device(struct xe_device *xe) ram_and_swap = ram + get_nr_swap_pages() * PAGE_SIZE; if (to_alloc > ram_and_swap) purgeable = to_alloc - ram_and_swap; - purgeable += purgeable / 5; + purgeable += div64_u64(purgeable, 5); kunit_info(test, "Free ram is %lu bytes. Will allocate twice of that.\n", (unsigned long)ram);
Re: [PATCH v2 2/2] drm/xe: Invalidate media_gt TLBs in PT code
On 26/08/2024 18:01, Matthew Brost wrote: Testing on LNL has shown media GT's TLBs need to be invalidated via the GuC, update PT code appropriately. v2: - Do dma_fence_get before first call of invalidation_fence_init (Himal) - No need to check for valid chain fence (Himal) v3: - Use dma-fence-array Fixes: 3330361543fc ("drm/xe/lnl: Add LNL platform definition") Signed-off-by: Matthew Brost Reviewed-by: Matthew Auld
Re: [PATCH v6 2/2] drm/xe/lnl: Offload system clear page activity to GPU
On 16/08/2024 14:51, Nirmoy Das wrote: On LNL because of flat CCS, driver creates migrates job to clear CCS meta data. Extend that to also clear system pages using GPU. Inform TTM to allocate pages without __GFP_ZERO to avoid double page clearing by clearing out TTM_TT_FLAG_ZERO_ALLOC flag and set TTM_TT_FLAG_CLEARED_ON_FREE while freeing to skip ttm pool's clear on free as XE now takes care of clearing pages. If a bo is in system placement such as BO created with DRM_XE_GEM_CREATE_FLAG_DEFER_BACKING and there is a cpu map then for such BO gpu clear will be avoided as there is no dma mapping for such BO at that moment to create migration jobs. Tested this patch api_overhead_benchmark_l0 from https://github.com/intel/compute-benchmarks Without the patch: api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation: UsmMemoryAllocation(api=l0 type=Host size=4KB) 84.206 us UsmMemoryAllocation(api=l0 type=Host size=1GB) 105775.56 us erf tool top 5 entries: 71.44% api_overhead_be [kernel.kallsyms] [k] clear_page_erms 6.34% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page 2.24% api_overhead_be [kernel.kallsyms] [k] cpa_flush 2.15% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable 1.94% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res With the patch: api_overhead_benchmark_l0 --testFilter=UsmMemoryAllocation: UsmMemoryAllocation(api=l0 type=Host size=4KB) 79.439 us UsmMemoryAllocation(api=l0 type=Host size=1GB) 98677.75 us Perf tool top 5 entries: 11.16% api_overhead_be [kernel.kallsyms] [k] __pageblock_pfn_to_page 7.85% api_overhead_be [kernel.kallsyms] [k] cpa_flush 7.59% api_overhead_be [kernel.kallsyms] [k] find_next_iomem_res 7.24% api_overhead_be [kernel.kallsyms] [k] pages_are_mergeable 5.53% api_overhead_be [kernel.kallsyms] [k] lookup_address_in_pgd_attr Without this patch clear_page_erms() dominates execution time which is also not pipelined with migration jobs. With this patch page clearing will get pipelined with migration job and will free CPU for more work. v2: Handle regression on dgfx(Himal) Update commit message as no ttm API changes needed. v3: Fix Kunit test. v4: handle data leak on cpu mmap(Thomas) v5: s/gpu_page_clear/gpu_page_clear_sys and move setting it to xe_ttm_sys_mgr_init() and other nits (Matt Auld) v6: Disable it when init_on_alloc and/or init_on_free is active(Matt) Use compute-benchmarks as reporter used it to report this allocation latency issue also a proper test application than mime. In v5, the test showed significant reduction in alloc latency but that is not the case any more, I think this was mostly because previous test was done on IFWI which had low mem BW from CPU. Cc: Himal Prasad Ghimiray Cc: Matthew Auld Cc: Matthew Brost Cc: "Thomas Hellström" Signed-off-by: Nirmoy Das Reviewed-by: Matthew Auld
Re: [PATCH] drm/buddy: fix issue that force_merge cannot free all roots
On 13/08/2024 10:44, Lin.Cao wrote: If buddy manager have more than one roots and each root have sub-block need to be free. When drm_buddy_fini called, the first loop of force_merge will merge and free all of the sub block of first root, which offset is 0x0 and size is biggest(more than have of the mm size). In subsequent force_merge rounds, if we use 0 as start and use remaining mm size as end, the block of other roots will be skipped in __force_merge function. It will cause the other roots can not be freed. Solution: use roots' offset as the start could fix this issue. Were you able to take a look at the test side for this? See previous reply: https://lore.kernel.org/dri-devel/3fdd9175-832a-4113-8aaa-6039925c5...@intel.com/ Patch itself is good, but would be good to understand why the test for this is not failing, and try to improve that also. Signed-off-by: Lin.Cao --- drivers/gpu/drm/drm_buddy.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 94f8c34fc293..b3f0dd652088 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -324,7 +324,7 @@ EXPORT_SYMBOL(drm_buddy_init); */ void drm_buddy_fini(struct drm_buddy *mm) { - u64 root_size, size; + u64 root_size, size, start; unsigned int order; int i; @@ -332,7 +332,8 @@ void drm_buddy_fini(struct drm_buddy *mm) for (i = 0; i < mm->n_roots; ++i) { order = ilog2(size) - ilog2(mm->chunk_size); - __force_merge(mm, 0, size, order); + start = drm_buddy_block_offset(mm->roots[i]); + __force_merge(mm, start, start + size, order); WARN_ON(!drm_buddy_block_is_free(mm->roots[i])); drm_block_free(mm, mm->roots[i]);
Re: [PATCH v6 11/12] drm/ttm, drm/xe: Add a shrinker for xe bos
Hi, On 03/07/2024 16:38, Thomas Hellström wrote: Rather than relying on the TTM watermark accounting add a shrinker for xe_bos in TT or system memory. Leverage the newly added TTM per-page shrinking and shmem backup support. Although xe doesn't fully support WONTNEED (purgeable) bos yet, introduce and add shrinker support for purgeable ttm_tts. v2: - Cleanups bugfixes and a KUNIT shrinker test. - Add writeback support, and activate if kswapd. v3: - Move the try_shrink() helper to core TTM. - Minor cleanups. v4: - Add runtime pm for the shrinker. Shrinking may require an active device for CCS metadata copying. v5: - Separately purge ghost- and zombie objects in the shrinker. - Fix a format specifier - type inconsistency. (Kernel test robot). Cc: Christian König Cc: Somalapuram Amaranath Cc: Matthew Brost Cc: Signed-off-by: Thomas Hellström --- drivers/gpu/drm/ttm/ttm_bo_util.c | 67 ++ drivers/gpu/drm/xe/Makefile | 1 + drivers/gpu/drm/xe/tests/xe_bo.c | 118 +++ drivers/gpu/drm/xe/tests/xe_bo_test.c | 1 + drivers/gpu/drm/xe/tests/xe_bo_test.h | 1 + drivers/gpu/drm/xe/xe_bo.c| 155 -- drivers/gpu/drm/xe/xe_bo.h| 26 +++ drivers/gpu/drm/xe/xe_device.c| 8 + drivers/gpu/drm/xe/xe_device_types.h | 2 + drivers/gpu/drm/xe/xe_shrinker.c | 287 ++ drivers/gpu/drm/xe/xe_shrinker.h | 18 ++ include/drm/ttm/ttm_bo.h | 3 + 12 files changed, 671 insertions(+), 16 deletions(-) create mode 100644 drivers/gpu/drm/xe/xe_shrinker.c create mode 100644 drivers/gpu/drm/xe/xe_shrinker.h diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index c4f678f30fc2..563e96a4cf06 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -924,3 +924,70 @@ long ttm_lru_walk_for_evict(struct ttm_lru_walk *walk, struct ttm_device *bdev, return progress; } +EXPORT_SYMBOL(ttm_lru_walk_for_evict); + +/** + * ttm_bo_try_shrink - LRU walk helper to shrink a ttm buffer object. + * @walk: The struct xe_ttm_lru_walk that describes the walk. + * @bo: The buffer object. + * @purge: Whether to attempt to purge the bo content since it's no + * longer needed. + * @writeback: If !@purge, attempt to write out to persistent storage. + * + * The function uses the ttm_tt_back_up functionality to back up or + * purge a struct ttm_tt. If the bo is not in system, it's first + * moved there. + * + * Return: The number of pages shrunken or purged, or + * negative error code on failure. + */ +long ttm_bo_try_shrink(struct ttm_lru_walk *walk, struct ttm_buffer_object *bo, + bool purge, bool writeback) +{ + static const struct ttm_place sys_placement_flags = { + .fpfn = 0, + .lpfn = 0, + .mem_type = TTM_PL_SYSTEM, + .flags = 0, + }; + static struct ttm_placement sys_placement = { + .num_placement = 1, + .placement = &sys_placement_flags, + }; + struct ttm_operation_ctx *ctx = walk->ctx; + struct ttm_tt *tt = bo->ttm; + long lret; + + dma_resv_assert_held(bo->base.resv); + + if (!tt || !ttm_tt_is_populated(tt)) + return 0; + + if (bo->resource->mem_type != TTM_PL_SYSTEM) { + int ret = ttm_bo_validate(bo, &sys_placement, ctx); + + if (ret) { + if (ret == -EINTR || ret == -EDEADLK || + ret == -ERESTARTSYS) + return ret; + return 0; + } + } + + lret = ttm_bo_wait_ctx(bo, ctx); + if (lret < 0) { + if (lret == -ERESTARTSYS) + return lret; + return 0; + } + + if (bo->deleted) + lret = ttm_tt_backup(bo->bdev, tt, true, writeback); + else + lret = ttm_tt_backup(bo->bdev, tt, purge, writeback); + if (lret < 0 && lret != -EINTR) + return 0; + + return lret; +} +EXPORT_SYMBOL(ttm_bo_try_shrink); diff --git a/drivers/gpu/drm/xe/Makefile b/drivers/gpu/drm/xe/Makefile index b1e03bfe4a68..1eba51bdd172 100644 --- a/drivers/gpu/drm/xe/Makefile +++ b/drivers/gpu/drm/xe/Makefile @@ -112,6 +112,7 @@ xe-y += xe_bb.o \ xe_ring_ops.o \ xe_sa.o \ xe_sched_job.o \ + xe_shrinker.o \ xe_step.o \ xe_sync.o \ xe_tile.o \ diff --git a/drivers/gpu/drm/xe/tests/xe_bo.c b/drivers/gpu/drm/xe/tests/xe_bo.c index 9f3c02826464..49617f16dc76 100644 --- a/drivers/gpu/drm/xe/tests/xe_bo.c +++ b/drivers/gpu/drm/xe/tests/xe_bo.c @@ -6,6 +6,8 @@ #include #include +#include + #include "tests/xe_bo_test.h" #include "tests/xe_pci_test.h" #include "tests/xe_test.h" @@ -350,3 +352,119 @@ void xe_bo_evict_kunit(struct kunit *test) xe_call_fo
Re: [PATCH v6 2/3] drm/xe/migrate: Parameterize ccs and bo data clear in xe_migrate_clear()
On 19/07/2024 10:55, Nirmoy Das wrote: Parameterize clearing ccs and bo data in xe_migrate_clear() which higher layers can utilize. This patch will be used later on when doing bo data clear for igfx as well. v2: Replace multiple params with flags in xe_migrate_clear (Matt B) v3: s/CLEAR_BO_DATA_FLAG_*/XE_MIGRATE_CLEAR_FLAG_* and move to xe_migrate.h. other nits(Matt B) Cc: Himal Prasad Ghimiray Cc: Matthew Auld Cc: Matthew Brost Cc: "Thomas Hellström" Signed-off-by: Nirmoy Das Signed-off-by: Akshata Jahagirdar Seems reasonable to me, Reviewed-by: Matthew Auld
Re: [PATCH] drm/buddy: fix issue that force_merge cannot free all roots
Hi, On 08/08/2024 07:38, Lin.Cao wrote: If buddy manager have more than one roots and each root have sub-block need to be free. When drm_buddy_fini called, the first loop of force_merge will merge and free all of the sub block of first root, which offset is 0x0 and size is biggest(more than have of the mm size). In subsequent force_merge rounds, if we use 0 as start and use remaining mm size as end, the block of other roots will be skipped in __force_merge function. It will cause the other roots can not be freed. Solution: use roots' offset as the start could fix this issue. Signed-off-by: Lin.Cao Nice catch. --- drivers/gpu/drm/drm_buddy.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 94f8c34fc293..5379687552bc 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -327,12 +327,14 @@ void drm_buddy_fini(struct drm_buddy *mm) u64 root_size, size; unsigned int order; int i; + u64 start = 0; Nit: We could maybe move this into root_size, size or even into the loop body below? Also no need to init. size = mm->size; for (i = 0; i < mm->n_roots; ++i) { order = ilog2(size) - ilog2(mm->chunk_size); - __force_merge(mm, 0, size, order); + start = drm_buddy_block_offset(mm->roots[i]); + __force_merge(mm, start, start + size, order); WARN_ON(!drm_buddy_block_is_free(mm->roots[i])); We do seem to have a testcase for this at the bottom of drm_test_buddy_alloc_clear(), so either it is not triggering the WARN_ON() here in which case we should maybe improve that. Or it is, but kunit doesn't treat that as a test failure? Maybe we can call something like kunit_fail_current_test() here if that WARN_ON is triggered? For reference our CI is just running all drm selftests with: /kernel/tools/testing/kunit/kunit.py run --kunitconfig /kernel/drivers/gpu/drm/tests/.kunitconfig drm_block_free(mm, mm->roots[i]);
Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function
On 23/07/2024 14:25, Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. v1:(Matthew) - check new_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König Given the comment from Marek that this is about non power-of-two alignment, this makes sense (although might be good to mention that somewhere in the commit message to make that clearer), Reviewed-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 25 +++-- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 6a8e45e9d0ec..103c185bb1c8 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block as input to be trimmed. @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * 0 on success, error code on failure. */ int drm_buddy_block_trim(struct drm_buddy *mm, +u64 *start, u64 new_size, struct list_head *blocks) { struct drm_buddy_block *parent; struct drm_buddy_block *block; + u64 block_start, block_end; LIST_HEAD(dfs); u64 new_start; int err; @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, struct drm_buddy_block, link); + block_start = drm_buddy_block_offset(block); + block_end = block_start + drm_buddy_block_size(mm, block); + if (WARN_ON(!drm_buddy_block_is_allocated(block))) return -EINVAL; @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, if (new_size == drm_buddy_block_size(mm, block)) return 0; + new_start = block_start; + if (start) { + new_start = *start; + + if (new_start < block_start) + return -EINVAL; + + if (!IS_ALIGNED(new_start, mm->chunk_size)) + return -EINVAL; + + if (range_overflows(new_start, new_size, block_end)) + return -EINVAL; + } + list_del(&block->link); mark_free(mm, block); mm->avail += drm_buddy_block_size(mm, block); @@ -904,7 +924,6 @@ int drm_buddy_block_trim(struct drm_buddy *mm, parent = block->parent; block->parent = NULL; - new_start = drm_buddy_block_offset(block); list_add(&block->tmp_link, &dfs); err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL); if (err) { @@ -1066,7 +1085,8 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } while (1); /* Trim the allocated block to the required size */ - if (original_size != size) { + if (!(flags & DRM_BUDDY_TRIM_DISABLE) && + original_size != size) { struct list_head *trim_list; LIST_HEAD(temp); u64 trim_size; @@ -1083,6 +1103,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } drm_buddy_block_trim(mm, +NULL, trim_size, trim_list); diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c index fe3779fdba2c..423b261ea743 100644 --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c @@ -150,7 +150,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, } while (remaining_size); if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { - if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks)) + if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks)) size = vres->base.size; } diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index 2a74fa9d0ce5..9689a7c5dd36 100644 --- a/in
Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function
On 24/07/2024 02:35, Marek Olšák wrote: The reason is that our DCC requires 768K alignment in some cases. I haven't read this patch series, but one way to do that is to align to 256K, overallocate by 512K, and then not use either 0, 256K, or 512K at the beginning to get to 768K alignment. Ah, so we need a non power-of-two alignment. That makes sense, thanks. Marek On Tue, Jul 23, 2024, 11:04 Matthew Auld <mailto:matthew.a...@intel.com>> wrote: On 23/07/2024 14:43, Paneer Selvam, Arunpravin wrote: > Hi Matthew, > > Can we push this version for now as we need to mainline the DCC changes > ASAP, > while we continue our discussion and proceed to implement the permanent > solution > for address alignment? Yeah, we can always merge now and circle back around later, if this for sure helps your usecase and is needed asap. I just didn't fully get the idea for needing this interface, but likely I am missing something. > > Thanks, > Arun. > > On 7/23/2024 6:55 PM, Arunpravin Paneer Selvam wrote: >> - Add a new start parameter in trim function to specify exact >> address from where to start the trimming. This would help us >> in situations like if drivers would like to do address alignment >> for specific requirements. >> >> - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this >> flag to disable the allocator trimming part. This patch enables >> the drivers control trimming and they can do it themselves >> based on the application requirements. >> >> v1:(Matthew) >> - check new_start alignment with min chunk_size >> - use range_overflows() >> >> Signed-off-by: Arunpravin Paneer Selvam mailto:arunpravin.paneersel...@amd.com>> >> Acked-by: Alex Deucher mailto:alexander.deuc...@amd.com>> >> Acked-by: Christian König mailto:christian.koe...@amd.com>> >> --- >> drivers/gpu/drm/drm_buddy.c | 25 +++-- >> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- >> include/drm/drm_buddy.h | 2 ++ >> 3 files changed, 26 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >> index 6a8e45e9d0ec..103c185bb1c8 100644 >> --- a/drivers/gpu/drm/drm_buddy.c >> +++ b/drivers/gpu/drm/drm_buddy.c >> @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct >> drm_buddy *mm, >> * drm_buddy_block_trim - free unused pages >> * >> * @mm: DRM buddy manager >> + * @start: start address to begin the trimming. >> * @new_size: original size requested >> * @blocks: Input and output list of allocated blocks. >> * MUST contain single block as input to be trimmed. >> @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct >> drm_buddy *mm, >> * 0 on success, error code on failure. >> */ >> int drm_buddy_block_trim(struct drm_buddy *mm, >> + u64 *start, >> u64 new_size, >> struct list_head *blocks) >> { >> struct drm_buddy_block *parent; >> struct drm_buddy_block *block; >> + u64 block_start, block_end; >> LIST_HEAD(dfs); >> u64 new_start; >> int err; >> @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, >> struct drm_buddy_block, >> link); >> + block_start = drm_buddy_block_offset(block); >> + block_end = block_start + drm_buddy_block_size(mm, block); >> + >> if (WARN_ON(!drm_buddy_block_is_allocated(block))) >> return -EINVAL; >> @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, >> if (new_size == drm_buddy_block_size(mm, block)) >> return 0; >> + new_start = block_start; >> + if (start) { >> + new_start = *start; >> + >> + if (new_start < block_start) >> + return -EINVAL; >> + >> + if (!IS_ALIGNED(new_start, mm->chunk_size)) >> + return -EINVAL; >> + >> + if (range_overflows(new_start, new_size, block_end)) >> + return -EINVAL; >> +
Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function
On 23/07/2024 14:43, Paneer Selvam, Arunpravin wrote: Hi Matthew, Can we push this version for now as we need to mainline the DCC changes ASAP, while we continue our discussion and proceed to implement the permanent solution for address alignment? Yeah, we can always merge now and circle back around later, if this for sure helps your usecase and is needed asap. I just didn't fully get the idea for needing this interface, but likely I am missing something. Thanks, Arun. On 7/23/2024 6:55 PM, Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. v1:(Matthew) - check new_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König --- drivers/gpu/drm/drm_buddy.c | 25 +++-- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 6a8e45e9d0ec..103c185bb1c8 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block as input to be trimmed. @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * 0 on success, error code on failure. */ int drm_buddy_block_trim(struct drm_buddy *mm, + u64 *start, u64 new_size, struct list_head *blocks) { struct drm_buddy_block *parent; struct drm_buddy_block *block; + u64 block_start, block_end; LIST_HEAD(dfs); u64 new_start; int err; @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, struct drm_buddy_block, link); + block_start = drm_buddy_block_offset(block); + block_end = block_start + drm_buddy_block_size(mm, block); + if (WARN_ON(!drm_buddy_block_is_allocated(block))) return -EINVAL; @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, if (new_size == drm_buddy_block_size(mm, block)) return 0; + new_start = block_start; + if (start) { + new_start = *start; + + if (new_start < block_start) + return -EINVAL; + + if (!IS_ALIGNED(new_start, mm->chunk_size)) + return -EINVAL; + + if (range_overflows(new_start, new_size, block_end)) + return -EINVAL; + } + list_del(&block->link); mark_free(mm, block); mm->avail += drm_buddy_block_size(mm, block); @@ -904,7 +924,6 @@ int drm_buddy_block_trim(struct drm_buddy *mm, parent = block->parent; block->parent = NULL; - new_start = drm_buddy_block_offset(block); list_add(&block->tmp_link, &dfs); err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL); if (err) { @@ -1066,7 +1085,8 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } while (1); /* Trim the allocated block to the required size */ - if (original_size != size) { + if (!(flags & DRM_BUDDY_TRIM_DISABLE) && + original_size != size) { struct list_head *trim_list; LIST_HEAD(temp); u64 trim_size; @@ -1083,6 +1103,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } drm_buddy_block_trim(mm, + NULL, trim_size, trim_list); diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c index fe3779fdba2c..423b261ea743 100644 --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c @@ -150,7 +150,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, } while (remaining_size); if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { - if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks)) + if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks)) size = vres->base.size; } diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index 2a74fa9d0ce5..9689a7c5dd36 100644 --- a/include/drm/drm_buddy.h +++ b/include/drm/drm_buddy.h @@ -27,6 +27,7 @@ #define DRM_BUDDY_CONTIGUOUS_ALLOCATION
Re: [PATCH] drm/buddy: Add start address support to trim function
Hi, On 22/07/2024 12:41, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/19/2024 4:01 PM, Matthew Auld wrote: On 17/07/2024 16:02, Paneer Selvam, Arunpravin wrote: On 7/16/2024 3:34 PM, Matthew Auld wrote: On 16/07/2024 10:50, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/10/2024 6:20 PM, Matthew Auld wrote: On 10/07/2024 07:03, Paneer Selvam, Arunpravin wrote: Thanks Alex. Hi Matthew, Any comments? Do we not pass the required address alignment when allocating the pages in the first place? If address alignment is really useful, we can add that in the drm_buddy_alloc_blocks() function. I mean don't we already pass the min page size, which should give us matching physical address alignment? I think we don't need to align the address to the passed min_block_size value for all the contiguous buffers, so I thought that decision we can leave it to the drivers and they can achieve that through trim function in this kind of a specific request. I would have assumed it would be simpler to use min_block_size and then trim the size, if it's too big? That would then also take care of the try_harder case? For example, if the required contiguous size is 1MiB and min_block_size is 256KiB, to perform the address alignment of 256KiB, we might need to over-allocate at least to the min_block_size (say 256KiB). Now the size becomes 1280KiB and If we have 1M contig request then it should already be aligned to 256K and every other power-of-two < 1M. VRAM should start at offset zero, so 1M block will have 1M address alignment, and so should also be aligned to 256K, right? Or does "address alignment of 256KiB" mean something else here? To me it just means IS_ALIGNED(block_start, 256K). since the contiguous flag is enabled, we will round up the size to the next power of two and the size value becomes 2MiB. Next, in trimming we should round up the block start address to the min_block_size. May be we can keep the above mentioned operations under the flag combination DRM_BUDDY_CONTIGUOUS_ALLOCATION && DRM_BUDDY_ADDRESS_ALIGNMENT?. At the moment, we cannot support address alignment for try_harder allocations since in case of try_harder allocations we first traverse RHS to allocate the maximum possible and traverse LHS (here we align the LHS size to min_block_size) to allocate the remaining size. May be in case of DRM_BUDDY_ADDRESS_ALIGNMENT, we should first allocate LHS satisfying the address alignment requirement and then traverse RHS to allocate the remaining size if required? Also how are we dealing with the multi-block try_harder case? AFAICT we only allow trimming single block atm, or is it not possible to trigger that path here? Or are we handling that somehow? not possible to trigger that path here. only when we either over-allocate the LHS size and pass the multiple blocks to the trim function or implement the above mentioned method. https://patchwork.freedesktop.org/series/136150/ We are getting this sparse error from the Intel CI. Do you think these errors are introduced with this patches? I think it's safe to ignore, there seem to be other series with the same thing. Thanks. Thanks, Arun. Thanks, Arun. Thanks, Arun. On 7/9/2024 1:42 AM, Alex Deucher wrote: On Thu, Jul 4, 2024 at 4:40 AM Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. v1:(Matthew) - check new_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam Series is: Acked-by: Alex Deucher I'd like to take this series through the amdgpu tree if there are no objections as it's required for display buffers on some chips and I'd like to make sure it lands in 6.11. Thanks, Alex --- drivers/gpu/drm/drm_buddy.c | 25 +++-- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 94f8c34fc293..8cebe1fa4e9d 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block
Re: [PATCH] drm/buddy: Add start address support to trim function
On 17/07/2024 16:02, Paneer Selvam, Arunpravin wrote: On 7/16/2024 3:34 PM, Matthew Auld wrote: On 16/07/2024 10:50, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/10/2024 6:20 PM, Matthew Auld wrote: On 10/07/2024 07:03, Paneer Selvam, Arunpravin wrote: Thanks Alex. Hi Matthew, Any comments? Do we not pass the required address alignment when allocating the pages in the first place? If address alignment is really useful, we can add that in the drm_buddy_alloc_blocks() function. I mean don't we already pass the min page size, which should give us matching physical address alignment? I think we don't need to align the address to the passed min_block_size value for all the contiguous buffers, so I thought that decision we can leave it to the drivers and they can achieve that through trim function in this kind of a specific request. I would have assumed it would be simpler to use min_block_size and then trim the size, if it's too big? That would then also take care of the try_harder case? Also how are we dealing with the multi-block try_harder case? AFAICT we only allow trimming single block atm, or is it not possible to trigger that path here? Or are we handling that somehow? https://patchwork.freedesktop.org/series/136150/ We are getting this sparse error from the Intel CI. Do you think these errors are introduced with this patches? I think it's safe to ignore, there seem to be other series with the same thing. Thanks, Arun. Thanks, Arun. Thanks, Arun. On 7/9/2024 1:42 AM, Alex Deucher wrote: On Thu, Jul 4, 2024 at 4:40 AM Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. v1:(Matthew) - check new_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam Series is: Acked-by: Alex Deucher I'd like to take this series through the amdgpu tree if there are no objections as it's required for display buffers on some chips and I'd like to make sure it lands in 6.11. Thanks, Alex --- drivers/gpu/drm/drm_buddy.c | 25 +++-- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 94f8c34fc293..8cebe1fa4e9d 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block as input to be trimmed. @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * 0 on success, error code on failure. */ int drm_buddy_block_trim(struct drm_buddy *mm, + u64 *start, u64 new_size, struct list_head *blocks) { struct drm_buddy_block *parent; struct drm_buddy_block *block; + u64 block_start, block_end; LIST_HEAD(dfs); u64 new_start; int err; @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, struct drm_buddy_block, link); + block_start = drm_buddy_block_offset(block); + block_end = block_start + drm_buddy_block_size(mm, block); + if (WARN_ON(!drm_buddy_block_is_allocated(block))) return -EINVAL; @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, if (new_size == drm_buddy_block_size(mm, block)) return 0; + new_start = block_start; + if (start) { + new_start = *start; + + if (new_start < block_start) + return -EINVAL; + + if (!IS_ALIGNED(new_start, mm->chunk_size)) + return -EINVAL; + + if (range_overflows(new_start, new_size, block_end)) + return -EINVAL; + } + list_del(&block->link); mark_free(mm, block); mm->avail += drm_buddy_block_size(mm, block); @@ -904,7 +924,6 @@ int drm_buddy_block_trim(struct drm_buddy *mm, parent = block->parent; block->parent = NULL; - new_start = drm_budd
Re: [PATCH v5 3/4] drm/xe/migrate: Clear CCS when clearing bo on xe2
On 04/07/2024 09:18, Nirmoy Das wrote: Clearing bo with uncompress PTE will trigger a CCS clearing as well for XE2, so skip emit_copy_ccs() when on xe2 when clearing bo. v2: When clearing BO, CCS clear happens with all command as long as PTEs are uncompress. Cc: Himal Prasad Ghimiray Cc: Matthew Auld Cc: "Thomas Hellström" Signed-off-by: Nirmoy Das --- drivers/gpu/drm/xe/xe_migrate.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c index e0a3f6921572..cc8beed2bf8e 100644 --- a/drivers/gpu/drm/xe/xe_migrate.c +++ b/drivers/gpu/drm/xe/xe_migrate.c @@ -1061,7 +1061,8 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, if (clear_vram && xe_migrate_allow_identity(clear_L0, &src_it)) xe_res_next(&src_it, clear_L0); else - emit_pte(m, bb, clear_L0_pt, clear_vram, clear_ccs, + /* Use uncompressed pte so clear happens in the real memory. */ + emit_pte(m, bb, clear_L0_pt, clear_vram, false, &src_it, clear_L0, dst); I assume this uses coherency 1way+ mode for that pat index? We could potentially use coh_none instead, for the case where bo.cpu_caching != wb. In theory that should be faster, but could be ignored for now. bb->cs[bb->len++] = MI_BATCH_BUFFER_END; @@ -1070,7 +1071,9 @@ struct dma_fence *xe_migrate_clear(struct xe_migrate *m, if (clear_bo_data) emit_clear(gt, bb, clear_L0_ofs, clear_L0, XE_PAGE_SIZE, clear_vram); - if (xe_device_has_flat_ccs(xe)) { + /* Clearing BO with uncompress PTE will clear CCS metadata as well on XE2 */ + if (xe_device_has_flat_ccs(xe) && clear_ccs && + !(clear_bo_data && GRAPHICS_VERx100(gt_to_xe(gt)) >= 2000)) { emit_copy_ccs(gt, bb, clear_L0_ofs, true, m->cleared_mem_ofs, false, clear_L0); flush_flags = MI_FLUSH_DW_CCS;
Re: [PATCH v5 4/4] drm/xe/lnl: Offload system clear page activity to GPU
On 04/07/2024 09:18, Nirmoy Das wrote: On LNL because of flat CCS, driver creates a migrate job to clear CCS meta data. Extend that to also clear system pages using GPU. Inform TTM to allocate pages without __GFP_ZERO to avoid double page clearing by clearing out TTM_TT_FLAG_ZERO_ALLOC flag and set TTM_TT_FLAG_CLEARED_ON_FREE while freeing to skip ttm pool's clearn-on-free as XE now takes care of clearing pages. If a bo is in system placement and there is a cpu map then for such BO gpu clear will be avoided as there is no dma mapping for such BO. To test the patch, created a small test that tries to submit a job after binding various sizes of buffer which shows good gains for larger buffer. For lower buffer sizes, the result is not very reliable as the results vary a lot. With the patch sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.10.0-rc2-xe+ x86_64) Using IGT_SRANDOM=1719237905 for randomisation Opened device: /dev/dri/card0 Starting subtest: basic-store-benchmark Starting dynamic subtest: WC Dynamic subtest WC: SUCCESS (0.000s) Time taken for size SZ_4K: 9493 us Time taken for size SZ_2M: 5503 us Time taken for size SZ_64M: 13016 us Time taken for size SZ_128M: 29464 us Time taken for size SZ_256M: 38408 us Time taken for size SZ_1G: 148758 us Starting dynamic subtest: WB Dynamic subtest WB: SUCCESS (0.000s) Time taken for size SZ_4K: 3889 us Time taken for size SZ_2M: 6091 us Time taken for size SZ_64M: 20920 us Time taken for size SZ_128M: 32394 us Time taken for size SZ_256M: 61710 us Time taken for size SZ_1G: 215437 us Subtest basic-store-benchmark: SUCCESS (0.589s) With the patch: sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.10.0-rc2-xe+ x86_64) Using IGT_SRANDOM=1719238062 for randomisation Opened device: /dev/dri/card0 Starting subtest: basic-store-benchmark Starting dynamic subtest: WC Dynamic subtest WC: SUCCESS (0.000s) Time taken for size SZ_4K: 11803 us Time taken for size SZ_2M: 4237 us Time taken for size SZ_64M: 8649 us Time taken for size SZ_128M: 14682 us Time taken for size SZ_256M: 22156 us Time taken for size SZ_1G: 74457 us Starting dynamic subtest: WB Dynamic subtest WB: SUCCESS (0.000s) Time taken for size SZ_4K: 5129 us Time taken for size SZ_2M: 12563 us Time taken for size SZ_64M: 14860 us Time taken for size SZ_128M: 26064 us Time taken for size SZ_256M: 47167 us Time taken for size SZ_1G: 170304 us Subtest basic-store-benchmark: SUCCESS (0.417s) With the patch and init_on_alloc=0 sudo ~/igt-gpu-tools/build/tests/xe_exec_store --run basic-store-benchmark IGT-Version: 1.28-g2ed908c0b (x86_64) (Linux: 6.10.0-rc2-xe+ x86_64) Using IGT_SRANDOM=1719238219 for randomisation Opened device: /dev/dri/card0 Starting subtest: basic-store-benchmark Starting dynamic subtest: WC Dynamic subtest WC: SUCCESS (0.000s) Time taken for size SZ_4K: 4803 us Time taken for size SZ_2M: 9212 us Time taken for size SZ_64M: 9643 us Time taken for size SZ_128M: 13479 us Time taken for size SZ_256M: 22429 us Time taken for size SZ_1G: 83110 us Starting dynamic subtest: WB Dynamic subtest WB: SUCCESS (0.000s) Time taken for size SZ_4K: 4003 us Time taken for size SZ_2M: 4443 us Time taken for size SZ_64M: 12960 us Time taken for size SZ_128M: 13741 us Time taken for size SZ_256M: 26841 us Time taken for size SZ_1G: 84746 us Subtest basic-store-benchmark: SUCCESS (0.290s) v2: Handle regression on dgfx(Himal) Update commit message as no ttm API changes needed. v3: Fix Kunit test. v4: handle data leak on cpu mmap(Thomas) Cc: Himal Prasad Ghimiray Cc: Matthew Auld Cc: "Thomas Hellström" Signed-off-by: Nirmoy Das --- drivers/gpu/drm/xe/xe_bo.c | 25 - drivers/gpu/drm/xe/xe_device.c | 7 +++ drivers/gpu/drm/xe/xe_device_types.h | 2 ++ 3 files changed, 33 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 4d6315d2ae9a..b76a44fcf3b1 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -387,6 +387,13 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo, caching = ttm_uncached; } + /* If the device can support gpu clear pages then set proper ttm Nit: for multi-line block comment style is usually: /* * If the device +* flag. Zeroed pages are only required for ttm_bo_type_device so +* unwanted data is leaked to userspace. not leaked? +*/ + if (ttm_bo->type == ttm_bo_type_device && xe->mem.gpu_page_clear) + page_flags |= TTM_TT_FLAG_CLEARED_ON_FREE; + err = ttm_tt_init(&tt->ttm, &bo->ttm, page_flags, caching, extra_pages); if (err) { kfree(tt); @@ -408,6 +415,10 @@ static int xe_ttm_tt_populate(struct ttm_device *ttm_dev, st
Re: [PATCH] drm/buddy: Add start address support to trim function
On 16/07/2024 10:50, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/10/2024 6:20 PM, Matthew Auld wrote: On 10/07/2024 07:03, Paneer Selvam, Arunpravin wrote: Thanks Alex. Hi Matthew, Any comments? Do we not pass the required address alignment when allocating the pages in the first place? If address alignment is really useful, we can add that in the drm_buddy_alloc_blocks() function. I mean don't we already pass the min page size, which should give us matching physical address alignment? Thanks, Arun. Thanks, Arun. On 7/9/2024 1:42 AM, Alex Deucher wrote: On Thu, Jul 4, 2024 at 4:40 AM Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. v1:(Matthew) - check new_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam Series is: Acked-by: Alex Deucher I'd like to take this series through the amdgpu tree if there are no objections as it's required for display buffers on some chips and I'd like to make sure it lands in 6.11. Thanks, Alex --- drivers/gpu/drm/drm_buddy.c | 25 +++-- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 94f8c34fc293..8cebe1fa4e9d 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block as input to be trimmed. @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * 0 on success, error code on failure. */ int drm_buddy_block_trim(struct drm_buddy *mm, + u64 *start, u64 new_size, struct list_head *blocks) { struct drm_buddy_block *parent; struct drm_buddy_block *block; + u64 block_start, block_end; LIST_HEAD(dfs); u64 new_start; int err; @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, struct drm_buddy_block, link); + block_start = drm_buddy_block_offset(block); + block_end = block_start + drm_buddy_block_size(mm, block); + if (WARN_ON(!drm_buddy_block_is_allocated(block))) return -EINVAL; @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, if (new_size == drm_buddy_block_size(mm, block)) return 0; + new_start = block_start; + if (start) { + new_start = *start; + + if (new_start < block_start) + return -EINVAL; + + if (!IS_ALIGNED(new_start, mm->chunk_size)) + return -EINVAL; + + if (range_overflows(new_start, new_size, block_end)) + return -EINVAL; + } + list_del(&block->link); mark_free(mm, block); mm->avail += drm_buddy_block_size(mm, block); @@ -904,7 +924,6 @@ int drm_buddy_block_trim(struct drm_buddy *mm, parent = block->parent; block->parent = NULL; - new_start = drm_buddy_block_offset(block); list_add(&block->tmp_link, &dfs); err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL); if (err) { @@ -1066,7 +1085,8 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } while (1); /* Trim the allocated block to the required size */ - if (original_size != size) { + if (!(flags & DRM_BUDDY_TRIM_DISABLE) && + original_size != size) { struct list_head *trim_list; LIST_HEAD(temp); u64 trim_size; @@ -1083,6 +1103,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } drm_buddy_block_trim(mm, + NULL, trim_size, trim_list); diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c index fe3779fdba2c..423b261ea743 100644 --- a/drivers/gp
Re: [PATCH] drm/buddy: Add start address support to trim function
On 10/07/2024 07:03, Paneer Selvam, Arunpravin wrote: Thanks Alex. Hi Matthew, Any comments? Do we not pass the required address alignment when allocating the pages in the first place? Thanks, Arun. On 7/9/2024 1:42 AM, Alex Deucher wrote: On Thu, Jul 4, 2024 at 4:40 AM Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. v1:(Matthew) - check new_start alignment with min chunk_size - use range_overflows() Signed-off-by: Arunpravin Paneer Selvam Series is: Acked-by: Alex Deucher I'd like to take this series through the amdgpu tree if there are no objections as it's required for display buffers on some chips and I'd like to make sure it lands in 6.11. Thanks, Alex --- drivers/gpu/drm/drm_buddy.c | 25 +++-- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 26 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 94f8c34fc293..8cebe1fa4e9d 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block as input to be trimmed. @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * 0 on success, error code on failure. */ int drm_buddy_block_trim(struct drm_buddy *mm, + u64 *start, u64 new_size, struct list_head *blocks) { struct drm_buddy_block *parent; struct drm_buddy_block *block; + u64 block_start, block_end; LIST_HEAD(dfs); u64 new_start; int err; @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, struct drm_buddy_block, link); + block_start = drm_buddy_block_offset(block); + block_end = block_start + drm_buddy_block_size(mm, block); + if (WARN_ON(!drm_buddy_block_is_allocated(block))) return -EINVAL; @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, if (new_size == drm_buddy_block_size(mm, block)) return 0; + new_start = block_start; + if (start) { + new_start = *start; + + if (new_start < block_start) + return -EINVAL; + + if (!IS_ALIGNED(new_start, mm->chunk_size)) + return -EINVAL; + + if (range_overflows(new_start, new_size, block_end)) + return -EINVAL; + } + list_del(&block->link); mark_free(mm, block); mm->avail += drm_buddy_block_size(mm, block); @@ -904,7 +924,6 @@ int drm_buddy_block_trim(struct drm_buddy *mm, parent = block->parent; block->parent = NULL; - new_start = drm_buddy_block_offset(block); list_add(&block->tmp_link, &dfs); err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL); if (err) { @@ -1066,7 +1085,8 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } while (1); /* Trim the allocated block to the required size */ - if (original_size != size) { + if (!(flags & DRM_BUDDY_TRIM_DISABLE) && + original_size != size) { struct list_head *trim_list; LIST_HEAD(temp); u64 trim_size; @@ -1083,6 +1103,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } drm_buddy_block_trim(mm, + NULL, trim_size, trim_list); diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c index fe3779fdba2c..423b261ea743 100644 --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c @@ -150,7 +150,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, } while (remaining_size); if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { - if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks)) + if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blo
Re: [PATCH] drm/buddy: Add start address support to trim function
Hi, On 21/06/2024 06:29, Arunpravin Paneer Selvam wrote: - Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disable the allocator trimming part. This patch enables the drivers control trimming and they can do it themselves based on the application requirements. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/drm_buddy.c | 22 -- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- include/drm/drm_buddy.h | 2 ++ 3 files changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 6a8e45e9d0ec..287b6acb1637 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * drm_buddy_block_trim - free unused pages * * @mm: DRM buddy manager + * @start: start address to begin the trimming. * @new_size: original size requested * @blocks: Input and output list of allocated blocks. * MUST contain single block as input to be trimmed. @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct drm_buddy *mm, * 0 on success, error code on failure. */ int drm_buddy_block_trim(struct drm_buddy *mm, +u64 *start, I guess just wondering if this should be offset within or address. If it offset then zero be the valid default giving the existing behaviour. But hard to say without seeing the user for this. Are there some more patches to give some context for this usecase? u64 new_size, struct list_head *blocks) { struct drm_buddy_block *parent; struct drm_buddy_block *block; + u64 block_start, block_end; LIST_HEAD(dfs); u64 new_start; int err; @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, struct drm_buddy_block, link); + block_start = drm_buddy_block_offset(block); + block_end = block_start + drm_buddy_block_size(mm, block) - 1; + if (WARN_ON(!drm_buddy_block_is_allocated(block))) return -EINVAL; @@ -894,6 +900,17 @@ int drm_buddy_block_trim(struct drm_buddy *mm, if (new_size == drm_buddy_block_size(mm, block)) return 0; + new_start = block_start; + if (start) { + new_start = *start; + + if (new_start < block_start) + return -EINVAL; In addition should check that the alignment of new_start is at least compatible with the min chunk_size. Otherwise I think bad stuff can happen. + + if ((new_start + new_size) > block_end) range_overflows() ? + return -EINVAL; + } + list_del(&block->link); mark_free(mm, block); mm->avail += drm_buddy_block_size(mm, block); @@ -904,7 +921,6 @@ int drm_buddy_block_trim(struct drm_buddy *mm, parent = block->parent; block->parent = NULL; - new_start = drm_buddy_block_offset(block); list_add(&block->tmp_link, &dfs); err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL); if (err) { @@ -1066,7 +1082,8 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } while (1); /* Trim the allocated block to the required size */ - if (original_size != size) { + if (!(flags & DRM_BUDDY_TRIM_DISABLE) && + original_size != size) { struct list_head *trim_list; LIST_HEAD(temp); u64 trim_size; @@ -1083,6 +1100,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, } drm_buddy_block_trim(mm, +NULL, trim_size, trim_list); diff --git a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c index fe3779fdba2c..423b261ea743 100644 --- a/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_vram_mgr.c @@ -150,7 +150,7 @@ static int xe_ttm_vram_mgr_new(struct ttm_resource_manager *man, } while (remaining_size); if (place->flags & TTM_PL_FLAG_CONTIGUOUS) { - if (!drm_buddy_block_trim(mm, vres->base.size, &vres->blocks)) + if (!drm_buddy_block_trim(mm, NULL, vres->base.size, &vres->blocks)) size = vres->base.size; } diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index 2a74fa9d0ce5..9689a7c5dd36 100644 --- a/include/drm/drm_buddy.h +++ b/include/drm/drm_buddy.h @@ -27,6 +27,7 @@ #define DRM_BUDDY_CONTIGUOUS_ALLOCATION BIT(
Re: [PATCH v1 1/1] drm/xe/bo: Fix fixed placement ggtt pinning code
On 21/06/2024 08:15, Alan Previn wrote: When calling xe_bo_create_pin_map_at, use the correct starting offset provided by caller at xe_ggtt_insert_bo_at. Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Signed-off-by: Alan Previn --- drivers/gpu/drm/xe/xe_bo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 74294f1b05bc..cc6101b49c08 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -1436,7 +1436,7 @@ __xe_bo_create_locked(struct xe_device *xe, if (flags & XE_BO_FLAG_FIXED_PLACEMENT) { err = xe_ggtt_insert_bo_at(tile->mem.ggtt, bo, - start + bo->size, U64_MAX); + start, end); I think this is just about controlling the ggtt bias range. i.e please allocate bo->size somewhere within [start + bo->size, U64_MAX]. There was a reason for the + bo->size, I think something odd like initial fb using low ggtt address (pre-programmed by bios), and wanting to use that for guc or something instead so we effectively move it with the above. I think we also only need such an interface for funky stuff like initial fb, which is either in stolen or vram. The actual allocate backing pages part of the api should respect start, like for vram and stolen, which is the main point of FIXED_PLACEMENT. Trying to do that for system memory is not really possible AFAIK. If we just want something with specific ggtt alignment we could plumb that through to the mm interface. For example, put this anywhere within ggtt but ensure address is aligned to 256M. We then only pass FLAG_GGTT and not FIXED_PLACEMENT. } else { err = xe_ggtt_insert_bo(tile->mem.ggtt, bo); } base-commit: cffd77865f476994680892601e09bc2164179907
Re: [PATCH] drm/xe: Use write-back caching mode for system memory on DGFX
On 19/06/2024 17:39, Thomas Hellström wrote: The caching mode for buffer objects with VRAM as a possible placement was forced to write-combined, regardless of placement. However, write-combined system memory is expensive to allocate and even though it is pooled, the pool is expensive to shrink, since it involves global CPU TLB flushes. Moreover write-combined system memory from TTM is only reliably available on x86 and DGFX doesn't have an x86 restriction. So regardless of the cpu caching mode selected for a bo, internally use write-back caching mode for system memory on DGFX. Coherency is maintained, but user-space clients may perceive a difference in cpu access speeds. Signed-off-by: Thomas Hellström Fixes: 622f709ca629 ("drm/xe/uapi: Add support for CPU caching mode") Cc: Pallavi Mishra Cc: Matthew Auld Cc: dri-devel@lists.freedesktop.org Cc: Joonas Lahtinen Cc: Effie Yu Cc: Matthew Brost Cc: Maarten Lankhorst Cc: Jose Souza Cc: Michal Mrozek Cc: # v6.8+ Acked-by: Matthew Auld --- drivers/gpu/drm/xe/xe_bo.c | 47 +++- drivers/gpu/drm/xe/xe_bo_types.h | 3 +- include/uapi/drm/xe_drm.h| 8 +- 3 files changed, 37 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index 65c696966e96..31192d983d9e 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -343,7 +343,7 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo, struct xe_device *xe = xe_bo_device(bo); struct xe_ttm_tt *tt; unsigned long extra_pages; - enum ttm_caching caching; + enum ttm_caching caching = ttm_cached; int err; tt = kzalloc(sizeof(*tt), GFP_KERNEL); @@ -357,26 +357,35 @@ static struct ttm_tt *xe_ttm_tt_create(struct ttm_buffer_object *ttm_bo, extra_pages = DIV_ROUND_UP(xe_device_ccs_bytes(xe, bo->size), PAGE_SIZE); - switch (bo->cpu_caching) { - case DRM_XE_GEM_CPU_CACHING_WC: - caching = ttm_write_combined; - break; - default: - caching = ttm_cached; - break; - } - - WARN_ON((bo->flags & XE_BO_FLAG_USER) && !bo->cpu_caching); - /* -* Display scanout is always non-coherent with the CPU cache. -* -* For Xe_LPG and beyond, PPGTT PTE lookups are also non-coherent and -* require a CPU:WC mapping. +* DGFX system memory is always WB / ttm_cached, since +* other caching modes are only supported on x86. DGFX +* GPU system memory accesses are always coherent with the +* CPU. */ - if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) || - (xe->info.graphics_verx100 >= 1270 && bo->flags & XE_BO_FLAG_PAGETABLE)) - caching = ttm_write_combined; + if (!IS_DGFX(xe)) { + switch (bo->cpu_caching) { + case DRM_XE_GEM_CPU_CACHING_WC: + caching = ttm_write_combined; + break; + default: + caching = ttm_cached; + break; + } + + WARN_ON((bo->flags & XE_BO_FLAG_USER) && !bo->cpu_caching); + + /* +* Display scanout is always non-coherent with the CPU cache. +* +* For Xe_LPG and beyond, PPGTT PTE lookups are also +* non-coherent and require a CPU:WC mapping. +*/ + if ((!bo->cpu_caching && bo->flags & XE_BO_FLAG_SCANOUT) || + (xe->info.graphics_verx100 >= 1270 && +bo->flags & XE_BO_FLAG_PAGETABLE)) + caching = ttm_write_combined; + } if (bo->flags & XE_BO_FLAG_NEEDS_UC) { /* diff --git a/drivers/gpu/drm/xe/xe_bo_types.h b/drivers/gpu/drm/xe/xe_bo_types.h index 86422e113d39..10450f1fbbde 100644 --- a/drivers/gpu/drm/xe/xe_bo_types.h +++ b/drivers/gpu/drm/xe/xe_bo_types.h @@ -66,7 +66,8 @@ struct xe_bo { /** * @cpu_caching: CPU caching mode. Currently only used for userspace -* objects. +* objects. Exceptions are system memory on DGFX, which is always +* WB. */ u16 cpu_caching; diff --git a/include/uapi/drm/xe_drm.h b/include/uapi/drm/xe_drm.h index 93e00be44b2d..1189b3044723 100644 --- a/include/uapi/drm/xe_drm.h +++ b/include/uapi/drm/xe_drm.h @@ -783,7 +783,13 @@ struct drm_xe_gem_create { #define DRM_XE_GEM_CPU_CACHING_WC 2 /** * @cpu_caching: The CPU caching mode to select for this object. If -* mmaping the object the mode selected here will also be used. +* mmaping the object
Re: [PATCH] dma-buf: add a warning when drv try to reserve 0 fence slots
On 29/05/2024 09:43, Christian König wrote: When dma_resv_reserve_fences() is called with num_fences=0 it usually means that a driver or other component messed up its calculation how many fences are needed. Warn in that situation. When no fence are needed the function shouldn't be called in the first place. Signed-off-by: Christian König Reviewed-by: Matthew Auld
Re: [PATCH v2] drm/buddy: Fix the warn on's during force merge
On 17/05/2024 14:50, Arunpravin Paneer Selvam wrote: Move the fallback and block incompatible checks above, so that we dont unnecessarily split the blocks and leaving the unmerged. This resolves the unnecessary warn on's thrown during force_merge call. v2:(Matthew) - Move the fallback and block incompatible checks above the contains check. Signed-off-by: Arunpravin Paneer Selvam Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature") Reviewed-by: Matthew Auld A follow up unit test to catch this edge case would be lovely. --- drivers/gpu/drm/drm_buddy.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 1daf778cf6fa..94f8c34fc293 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -524,11 +524,11 @@ __alloc_range_bias(struct drm_buddy *mm, continue; } + if (!fallback && block_incompatible(block, flags)) + continue; + if (contains(start, end, block_start, block_end) && order == drm_buddy_block_order(block)) { - if (!fallback && block_incompatible(block, flags)) - continue; - /* * Find the free block within the range. */
Re: [PATCH] drm/buddy: Merge back blocks in bias range function
On 17/05/2024 13:38, Arunpravin Paneer Selvam wrote: In bias range allocation, when we don't find the required blocks (i.e) on returning the -ENOSPC, we should merge back the split blocks. Otherwise, during force_merge we are flooded with warn on's due to block and its buddy are in same clear state (dirty or clear). Hence, renamed the force_merge with merge_blocks and passed a force_merge as a bool function parameter. Based on the requirement, say, in any normal situation we can call the merge_blocks passing the force_merge variable as false. And, in any memory cruch situation, we can call the merge_blocks passing the force_merge as true. This resolves the unnecessary warn on's thrown during force_merge call. Signed-off-by: Arunpravin Paneer Selvam Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature") --- drivers/gpu/drm/drm_buddy.c | 32 ++-- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 1daf778cf6fa..111f602f1359 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -161,10 +161,11 @@ static unsigned int __drm_buddy_free(struct drm_buddy *mm, return order; } -static int __force_merge(struct drm_buddy *mm, -u64 start, -u64 end, -unsigned int min_order) +static int __merge_blocks(struct drm_buddy *mm, + u64 start, + u64 end, + unsigned int min_order, + bool force_merge) { unsigned int order; int i; @@ -195,8 +196,9 @@ static int __force_merge(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) continue; - WARN_ON(drm_buddy_block_is_clear(block) == - drm_buddy_block_is_clear(buddy)); + if (force_merge) + WARN_ON(drm_buddy_block_is_clear(block) == + drm_buddy_block_is_clear(buddy)); /* * If the prev block is same as buddy, don't access the @@ -210,7 +212,7 @@ static int __force_merge(struct drm_buddy *mm, if (drm_buddy_block_is_clear(block)) mm->clear_avail -= drm_buddy_block_size(mm, block); - order = __drm_buddy_free(mm, block, true); + order = __drm_buddy_free(mm, block, force_merge); if (order >= min_order) return 0; } @@ -332,7 +334,7 @@ void drm_buddy_fini(struct drm_buddy *mm) for (i = 0; i < mm->n_roots; ++i) { order = ilog2(size) - ilog2(mm->chunk_size); - __force_merge(mm, 0, size, order); + __merge_blocks(mm, 0, size, order, true); WARN_ON(!drm_buddy_block_is_free(mm->roots[i])); drm_block_free(mm, mm->roots[i]); @@ -479,7 +481,7 @@ __alloc_range_bias(struct drm_buddy *mm, unsigned long flags, bool fallback) { - u64 req_size = mm->chunk_size << order; + u64 size, root_size, req_size = mm->chunk_size << order; struct drm_buddy_block *block; struct drm_buddy_block *buddy; LIST_HEAD(dfs); @@ -487,6 +489,7 @@ __alloc_range_bias(struct drm_buddy *mm, int i; end = end - 1; + size = mm->size; for (i = 0; i < mm->n_roots; ++i) list_add_tail(&mm->roots[i]->tmp_link, &dfs); @@ -548,6 +551,15 @@ __alloc_range_bias(struct drm_buddy *mm, list_add(&block->left->tmp_link, &dfs); } while (1); + /* Merge back the split blocks */ + for (i = 0; i < mm->n_roots; ++i) { + order = ilog2(size) - ilog2(mm->chunk_size); + __merge_blocks(mm, start, end, order, false); + + root_size = mm->chunk_size << order; + size -= root_size; + } Hmm, can't we just not split a given block if it is incompatible? Like say we are looking for cleared, there is not much point in splitting blocks that are dirty on this pass, right? What about moving the incompatible check earlier like: if (!fallback && block_incompatible(block) continue; Would that not fix the issue? + return ERR_PTR(-ENOSPC); err_undo: @@ -1026,7 +1038,7 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, if (order-- == min_order) { /* Try allocation through force merge method */ if (mm->clear_avail && - !__force_merge(mm, start, end, min_order)) { + !__merge_blocks(mm, start, end, min_order, true)) { block = __drm_buddy_alloc_blocks(mm, star
Re: [PATCH 1/2] drm/buddy: stop using PAGE_SIZE
On 17/05/2024 12:00, Christian König wrote: Am 17.05.24 um 10:53 schrieb Matthew Auld: On 17/05/2024 02:11, Dave Airlie wrote: On Thu, 29 Feb 2024 at 23:48, Arnd Bergmann wrote: On Thu, Feb 29, 2024, at 11:51, Matthew Auld wrote: The drm_buddy minimum page-size requirements should be distinct from the CPU PAGE_SIZE. Only restriction is that the minimum page-size is at least 4K. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Arnd Bergmann Acked-by: Arnd Bergmann Has this landed anywhere yet? Looks like it fell through the cracks. I think it still applies, so just needs someone with commit rights to push it. Pushed to drm-misc-fixes. Thanks. Regards, Christian. I'm been testing 6.9 on 64K pages and the buddy tests are exploding so I wanted to pull this in. Dave.
Re: [PATCH 1/2] drm/buddy: stop using PAGE_SIZE
On 17/05/2024 02:11, Dave Airlie wrote: On Thu, 29 Feb 2024 at 23:48, Arnd Bergmann wrote: On Thu, Feb 29, 2024, at 11:51, Matthew Auld wrote: The drm_buddy minimum page-size requirements should be distinct from the CPU PAGE_SIZE. Only restriction is that the minimum page-size is at least 4K. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Arnd Bergmann Acked-by: Arnd Bergmann Has this landed anywhere yet? Looks like it fell through the cracks. I think it still applies, so just needs someone with commit rights to push it. I'm been testing 6.9 on 64K pages and the buddy tests are exploding so I wanted to pull this in. Dave.
Re: [PATCH v2 2/2] drm/tests: Add a unit test for range bias allocation
On 13/05/2024 16:11, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 5/13/2024 1:49 PM, Matthew Auld wrote: On 12/05/2024 08:59, Arunpravin Paneer Selvam wrote: Allocate cleared blocks in the bias range when the DRM buddy's clear avail is zero. This will validate the bias range allocation in scenarios like system boot when no cleared blocks are available and exercise the fallback path too. The resulting blocks should always be dirty. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 35 ++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index e3b50e240d36..a194f271bc55 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -26,6 +26,8 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; DRM_RND_STATE(prng, random_seed); unsigned int i, count, *order; + struct drm_buddy_block *block; + unsigned long flags; struct drm_buddy mm; LIST_HEAD(allocated); @@ -222,6 +224,39 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) drm_buddy_free_list(&mm, &allocated, 0); drm_buddy_fini(&mm); + + /* + * Allocate cleared blocks in the bias range when the DRM buddy's clear avail is + * zero. This will validate the bias range allocation in scenarios like system boot + * when no cleared blocks are available and exercise the fallback path too. The resulting + * blocks should always be dirty. + */ + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps), + "buddy_init failed\n"); + mm.clear_avail = 0; Should already be zero, right? Maybe make this an assert instead? No, since the mm declared as a local variable in the test case, mm.clear_avail is not zero. That sounds like a bug IMO. The init() should initialize it, like it does for mm.avail and everything else. + + bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps); + bias_end = round_up(bias_start + prandom_u32_state(&prng) % (mm_size - bias_start), ps); + bias_end = max(bias_end, bias_start + ps); + bias_rem = bias_end - bias_start; + + flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION; + u32 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps); u32 declaration should be moved to above? Sure. Thanks, Arun. Otherwise, Reviewed-by: Matthew Auld + + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, + bias_end, size, ps, + &allocated, + flags), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, size, ps); + + list_for_each_entry(block, &allocated, link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(&mm, &allocated, 0); + drm_buddy_fini(&mm); } static void drm_test_buddy_alloc_clear(struct kunit *test)
Re: [PATCH v2 2/2] drm/tests: Add a unit test for range bias allocation
On 12/05/2024 08:59, Arunpravin Paneer Selvam wrote: Allocate cleared blocks in the bias range when the DRM buddy's clear avail is zero. This will validate the bias range allocation in scenarios like system boot when no cleared blocks are available and exercise the fallback path too. The resulting blocks should always be dirty. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 35 ++ 1 file changed, 35 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index e3b50e240d36..a194f271bc55 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -26,6 +26,8 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; DRM_RND_STATE(prng, random_seed); unsigned int i, count, *order; + struct drm_buddy_block *block; + unsigned long flags; struct drm_buddy mm; LIST_HEAD(allocated); @@ -222,6 +224,39 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) drm_buddy_free_list(&mm, &allocated, 0); drm_buddy_fini(&mm); + + /* +* Allocate cleared blocks in the bias range when the DRM buddy's clear avail is +* zero. This will validate the bias range allocation in scenarios like system boot +* when no cleared blocks are available and exercise the fallback path too. The resulting +* blocks should always be dirty. +*/ + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps), + "buddy_init failed\n"); + mm.clear_avail = 0; Should already be zero, right? Maybe make this an assert instead? + + bias_start = round_up(prandom_u32_state(&prng) % (mm_size - ps), ps); + bias_end = round_up(bias_start + prandom_u32_state(&prng) % (mm_size - bias_start), ps); + bias_end = max(bias_end, bias_start + ps); + bias_rem = bias_end - bias_start; + + flags = DRM_BUDDY_CLEAR_ALLOCATION | DRM_BUDDY_RANGE_ALLOCATION; + u32 size = max(round_up(prandom_u32_state(&prng) % bias_rem, ps), ps); u32 declaration should be moved to above? Otherwise, Reviewed-by: Matthew Auld + + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, + bias_end, size, ps, + &allocated, + flags), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, size, ps); + + list_for_each_entry(block, &allocated, link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(&mm, &allocated, 0); + drm_buddy_fini(&mm); } static void drm_test_buddy_alloc_clear(struct kunit *test)
[PATCH 01/20] drm/drm_managed: try to improve the drmm DOC
Hopefully make it clearer when to use devm vs drmm. Signed-off-by: Matthew Auld Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org --- drivers/gpu/drm/drm_managed.c | 42 +++ 1 file changed, 42 insertions(+) diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c index 7646f67bda4e..20d705bbc0a3 100644 --- a/drivers/gpu/drm/drm_managed.c +++ b/drivers/gpu/drm/drm_managed.c @@ -34,6 +34,48 @@ * during the lifetime of the driver, all the functions are fully concurrent * safe. But it is recommended to use managed resources only for resources that * change rarely, if ever, during the lifetime of the &drm_device instance. + * + * Note that the distinction between devm and drmm is important to get right. + * Consider some hotunplug scenarios, where it is valid for there to be multiple + * unplugged struct &drm_device instances each being kept alive by an open + * driver fd. The driver needs a clean separation between what needs to happen + * when the struct &device is removed and what needs to happen when a given + * struct &drm_device instance is released, as well as in some cases a more + * finer grained marking of critical sections that require hardware interaction. + * See below. + * + * devm + * + * In general use devm for cleaning up anything hardware related. So removing + * pci mmaps, releasing interrupt handlers, basically anything hw related. The + * devm release actions are called when the struct &device is removed, shortly + * after calling into the drivers struct &pci_driver.remove() callback, if this + * is a pci device. + * + * devm can be thought of as an alternative to putting all the hw related + * cleanup directly in the struct &pci_driver.remove() callback, where the + * correct ordering of the unwind steps needs to be manually done in the error + * path of the struct &pci_driver.probe() and again on the remove side. With + * devm this is all done automatically. + * + * drmm + * + * In general use this for cleaning up anything software related. So data + * structures and the like which are tied to the lifetime of a particular struct + * &drm_device instance. + * + * drmm can be thought of as an alternative to putting all the software related + * cleanup directly in the struct &drm_driver.release() callback, where again + * the correct ordering of the unwind steps needs to be done manually. As with + * devm this is instead done automatically. + * + * Sometimes there is no clean separation between software and hardware, which + * is where drm_dev_enter() comes in. For example, a driver might have some + * state tied to a struct &drm_device instance, for which the same cleanup path + * is called for both a plugged and unplugged device, and the cleanup itself + * might require talking to the device if it's still attached to this particular + * struct &drm_device. For that we instead mark the device sections. See + * drm_dev_enter(), drm_dev_exit() and drm_dev_unplug(). */ struct drmres_node { -- 2.45.0
Re: [PATCH] drm/buddy: Fix the range bias clear memory allocation issue
On 06/05/2024 14:38, Arunpravin Paneer Selvam wrote: Problem statement: During the system boot time, an application request for the bulk volume of cleared range bias memory when the clear_avail is zero, we dont fallback into normal allocation method as we had an unnecessary clear_avail check which prevents the fallback method leads to fb allocation failure following system goes into unresponsive state. Solution: Remove the unnecessary clear_avail check in the range bias allocation function. Signed-off-by: Arunpravin Paneer Selvam Fixes: 96950929eb23 ("drm/buddy: Implement tracking clear page feature") Reviewed-by: Matthew Auld
Re: [PATCH v10 4/9] drm/ttm/tests: Add tests with mock resource managers
On 22/03/2024 14:29, Karolina Stolarek wrote: Add mock resource manager to test ttm_bo_validate() with non-system placements. Update KConfig entry to enable DRM Buddy allocator, used by the mock manager. Update move function to do more than just assign a resource. Signed-off-by: Karolina Stolarek --- drivers/gpu/drm/Kconfig | 1 + drivers/gpu/drm/ttm/tests/.kunitconfig| 1 + drivers/gpu/drm/ttm/tests/Makefile| 1 + .../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 276 ++ drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 39 ++- drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h | 2 + drivers/gpu/drm/ttm/tests/ttm_mock_manager.c | 207 + drivers/gpu/drm/ttm/tests/ttm_mock_manager.h | 31 ++ 8 files changed, 556 insertions(+), 2 deletions(-) create mode 100644 drivers/gpu/drm/ttm/tests/ttm_mock_manager.c create mode 100644 drivers/gpu/drm/ttm/tests/ttm_mock_manager.h diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig index 91776996ada4..9fb6eb785bf9 100644 --- a/drivers/gpu/drm/Kconfig +++ b/drivers/gpu/drm/Kconfig @@ -200,6 +200,7 @@ config DRM_TTM_KUNIT_TEST default n depends on DRM && KUNIT && MMU && (UML || COMPILE_TEST) select DRM_TTM +select DRM_BUDDY select DRM_EXPORT_FOR_TESTS if m select DRM_KUNIT_TEST_HELPERS default KUNIT_ALL_TESTS diff --git a/drivers/gpu/drm/ttm/tests/.kunitconfig b/drivers/gpu/drm/ttm/tests/.kunitconfig index 75fdce0cd98e..9228ce9b913c 100644 --- a/drivers/gpu/drm/ttm/tests/.kunitconfig +++ b/drivers/gpu/drm/ttm/tests/.kunitconfig @@ -2,3 +2,4 @@ CONFIG_KUNIT=y CONFIG_DRM=y CONFIG_DRM_KUNIT_TEST_HELPERS=y CONFIG_DRM_TTM_KUNIT_TEST=y +CONFIG_DRM_BUDDY=y diff --git a/drivers/gpu/drm/ttm/tests/Makefile b/drivers/gpu/drm/ttm/tests/Makefile index 2e5ed63fb414..f3149de77541 100644 --- a/drivers/gpu/drm/ttm/tests/Makefile +++ b/drivers/gpu/drm/ttm/tests/Makefile @@ -7,4 +7,5 @@ obj-$(CONFIG_DRM_TTM_KUNIT_TEST) += \ ttm_tt_test.o \ ttm_bo_test.o \ ttm_bo_validate_test.o \ +ttm_mock_manager.o \ ttm_kunit_helpers.o diff --git a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c index 8229bb31d747..7070b5d16c10 100644 --- a/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c +++ b/drivers/gpu/drm/ttm/tests/ttm_bo_validate_test.c @@ -8,12 +8,15 @@ #include #include "ttm_kunit_helpers.h" +#include "ttm_mock_manager.h" #define BO_SIZE SZ_4K +#define MANAGER_SIZE SZ_1M struct ttm_bo_validate_test_case { const char *description; enum ttm_bo_type bo_type; + uint32_t mem_type; bool with_ttm; }; @@ -102,6 +105,49 @@ static void ttm_bo_init_reserved_sys_man(struct kunit *test) ttm_bo_put(bo); } +static void ttm_bo_init_reserved_mock_man(struct kunit *test) +{ + const struct ttm_bo_validate_test_case *params = test->param_value; + enum ttm_bo_type bo_type = params->bo_type; + struct ttm_test_devices *priv = test->priv; + uint32_t size = ALIGN(BO_SIZE, PAGE_SIZE); + struct ttm_operation_ctx ctx = { }; + struct ttm_placement *placement; + uint32_t mem_type = TTM_PL_VRAM; + struct ttm_buffer_object *bo; + struct ttm_place *place; + int err; + + ttm_mock_manager_init(priv->ttm_dev, mem_type, MANAGER_SIZE); + + bo = kunit_kzalloc(test, sizeof(*bo), GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, bo); + + place = ttm_place_kunit_init(test, mem_type, 0); + placement = ttm_placement_kunit_init(test, place, 1); + + drm_gem_private_object_init(priv->drm, &bo->base, size); + + err = ttm_bo_init_reserved(priv->ttm_dev, bo, bo_type, placement, + PAGE_SIZE, &ctx, NULL, NULL, + &dummy_ttm_bo_destroy); + dma_resv_unlock(bo->base.resv); + + KUNIT_EXPECT_EQ(test, err, 0); + KUNIT_EXPECT_EQ(test, kref_read(&bo->kref), 1); + KUNIT_EXPECT_PTR_EQ(test, bo->bdev, priv->ttm_dev); + KUNIT_EXPECT_EQ(test, bo->type, bo_type); + KUNIT_EXPECT_EQ(test, ctx.bytes_moved, size); + + if (bo_type != ttm_bo_type_kernel) + KUNIT_EXPECT_TRUE(test, + drm_mm_node_allocated(&bo->base.vma_node.vm_node)); + + ttm_resource_free(bo, &bo->resource); + ttm_bo_put(bo); + ttm_mock_manager_fini(priv->ttm_dev, mem_type); +} + static void ttm_bo_init_reserved_resv(struct kunit *test) { enum ttm_bo_type bo_type = ttm_bo_type_device; @@ -136,6 +182,51 @@ static void ttm_bo_init_reserved_resv(struct kunit *test) ttm_bo_put(bo); } +static void ttm_bo_validate_basic(struct kunit *test) +{ + const struct ttm_bo_validate_test_case *params = test->param_value; + uint32_t fst_mem = TTM_PL_SYSTEM, snd_mem = TTM_P
Re: [PATCH v10 3/9] drm/ttm/tests: Test simple BO creation and validation
o->resource); + ttm_bo_put(bo); +} + +static void ttm_bo_init_reserved_resv(struct kunit *test) +{ + enum ttm_bo_type bo_type = ttm_bo_type_device; + struct ttm_test_devices *priv = test->priv; + uint32_t size = ALIGN(BO_SIZE, PAGE_SIZE); + struct ttm_operation_ctx ctx = { }; + struct ttm_placement *placement; + struct ttm_buffer_object *bo; + struct ttm_place *place; + struct dma_resv resv; + int err; + + bo = kunit_kzalloc(test, sizeof(*bo), GFP_KERNEL); + KUNIT_ASSERT_NOT_NULL(test, bo); + + place = ttm_place_kunit_init(test, TTM_PL_SYSTEM, 0); + placement = ttm_placement_kunit_init(test, place, 1); + + drm_gem_private_object_init(priv->drm, &bo->base, size); + dma_resv_init(&resv); + dma_resv_lock(&resv, NULL); + + err = ttm_bo_init_reserved(priv->ttm_dev, bo, bo_type, placement, + PAGE_SIZE, &ctx, NULL, &resv, + &dummy_ttm_bo_destroy); + dma_resv_unlock(bo->base.resv); + + KUNIT_EXPECT_EQ(test, err, 0); + KUNIT_EXPECT_PTR_EQ(test, bo->base.resv, &resv); + + ttm_resource_free(bo, &bo->resource); + ttm_bo_put(bo); +} + +static void ttm_bo_validate_invalid_placement(struct kunit *test) +{ + enum ttm_bo_type bo_type = ttm_bo_type_device; + uint32_t unknown_mem_type = TTM_PL_PRIV + 1; + uint32_t size = ALIGN(BO_SIZE, PAGE_SIZE); + struct ttm_operation_ctx ctx = { }; + struct ttm_placement *placement; + struct ttm_buffer_object *bo; + struct ttm_place *place; + int err; + + place = ttm_place_kunit_init(test, unknown_mem_type, 0); + placement = ttm_placement_kunit_init(test, place, 1); + + bo = ttm_bo_kunit_init(test, test->priv, size); + bo->type = bo_type; + + ttm_bo_reserve(bo, false, false, NULL); + err = ttm_bo_validate(bo, placement, &ctx); + dma_resv_unlock(bo->base.resv); + + KUNIT_EXPECT_EQ(test, err, -ENOMEM); + + ttm_bo_put(bo); +} + +static void ttm_bo_validate_pinned(struct kunit *test) +{ + enum ttm_bo_type bo_type = ttm_bo_type_device; + uint32_t size = ALIGN(BO_SIZE, PAGE_SIZE); + struct ttm_operation_ctx ctx = { }; + uint32_t mem_type = TTM_PL_SYSTEM; + struct ttm_placement *placement; + struct ttm_buffer_object *bo; + struct ttm_place *place; + int err; + + place = ttm_place_kunit_init(test, mem_type, 0); + placement = ttm_placement_kunit_init(test, place, 1); + + bo = ttm_bo_kunit_init(test, test->priv, size); + bo->type = bo_type; + + ttm_bo_reserve(bo, false, false, NULL); + ttm_bo_pin(bo); + err = ttm_bo_validate(bo, placement, &ctx); + dma_resv_unlock(bo->base.resv); + + KUNIT_EXPECT_EQ(test, err, -EINVAL); ttm_bo_put(bo) ? Reviewed-by: Matthew Auld
Re: [PATCH v10 2/9] drm/ttm/tests: Use an init function from the helpers lib
On 22/03/2024 14:29, Karolina Stolarek wrote: Add a new helper function that also initializes the device. Use it in ttm_tt test suite and delete the local definition. Signed-off-by: Karolina Stolarek Reviewed-by: Matthew Auld
Re: [PATCH v10 1/3] drm/buddy: Implement tracking clear page feature
On 08/04/2024 16:16, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) v4: - rename the function drm_buddy_defrag with __force_merge. - Include __force_merge directly in drm buddy file and remove the defrag use in amdgpu driver. - Remove list_empty() check(Matthew) - Remove unnecessary space, headers and placement of new variables(Matthew) - Add a unit test case(Matthew) v5: - remove force merge support to actual range allocation and not to bail out when contains && split(Matthew) - add range support to force merge function. Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 430 ++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c| 28 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 16 +- 6 files changed, 368 insertions(+), 122 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 5ebdd6f8f36e..83dbe252f727 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -38,8 +38,8 @@ static void drm_block_free(struct drm_buddy *mm, kmem_cache_free(slab_blocks, block); } -static void list_insert_sorted(struct drm_buddy *mm, - struct drm_buddy_block *block) +static void list_insert(struct drm_buddy *mm, + struct drm_buddy_block *block)
Re: [PATCH v10 3/3] drm/tests: Add a test case for drm buddy clear allocation
On 08/04/2024 16:16, Arunpravin Paneer Selvam wrote: Add a new test case for the drm buddy clear and dirty allocation. v2:(Matthew) - make size as u32 - rename PAGE_SIZE with SZ_4K - dont fragment the address space for all the order allocation iterations. we can do it once and just increment and allocate the size. - create new mm with non power-of-two size to ensure the multi-root force_merge during fini. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/tests/drm_buddy_test.c | 141 + 1 file changed, 141 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index 4621a860cb05..b07f132f2835 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -224,6 +224,146 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) drm_buddy_fini(&mm); } +static void drm_test_buddy_alloc_clear(struct kunit *test) +{ + unsigned long n_pages, total, i = 0; + const unsigned long ps = SZ_4K; + struct drm_buddy_block *block; + const int max_order = 12; + LIST_HEAD(allocated); + struct drm_buddy mm; + unsigned int order; + u32 mm_size, size; + LIST_HEAD(dirty); + LIST_HEAD(clean); + + mm_size = SZ_4K << max_order; + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + KUNIT_EXPECT_EQ(test, mm.max_order, max_order); + + /* +* Idea is to allocate and free some random portion of the address space, +* returning those pages as non-dirty and randomly alternate between +* requesting dirty and non-dirty pages (not going over the limit +* we freed as non-dirty), putting that into two separate lists. +* Loop over both lists at the end checking that the dirty list +* is indeed all dirty pages and vice versa. Free it all again, +* keeping the dirty/clear status. +*/ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 5 * ps, ps, &allocated, + DRM_BUDDY_TOPDOWN_ALLOCATION), + "buddy_alloc hit an error size=%lu\n", 5 * ps); + drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED); + + n_pages = 10; + do { + unsigned long flags; + struct list_head *list; + int slot = i % 2; + + if (slot == 0) { + list = &dirty; + flags = 0; + } else { + list = &clean; + flags = DRM_BUDDY_CLEAR_ALLOCATION; + } + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + ps, ps, list, + flags), + "buddy_alloc hit an error size=%lu\n", ps); + } while (++i < n_pages); + + list_for_each_entry(block, &clean, link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), true); + + list_for_each_entry(block, &dirty, link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED); + + /* +* Trying to go over the clear limit for some allocation. +* The allocation should never fail with reasonable page-size. +*/ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 10 * ps, ps, &clean, + DRM_BUDDY_CLEAR_ALLOCATION), + "buddy_alloc hit an error size=%lu\n", 10 * ps); + + drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED); + drm_buddy_free_list(&mm, &dirty, 0); + drm_buddy_fini(&mm); + + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + /* +* Create a new mm. Intentionally fragment the address space by creating +* two alternating lists. Free both lists, one as dirty the other as clean. +* Try to allocate double the previous size with matching min_page_size. The +* allocation should never fail as it calls the force_merge. Also check that +* the page is always dirty after force_merge. Free the page as dirty, then +* repeat the whole thing, increment the order until we hit the max_order. +*/ + + i = 0; + n_pages = mm_size / ps; + do { + struct list_head *list; +
Re: [PATCH v9 1/3] drm/buddy: Implement tracking clear page feature
On 01/04/2024 12:07, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 3/28/2024 10:18 PM, Matthew Auld wrote: On 28/03/2024 16:07, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 3/26/2024 11:39 PM, Matthew Auld wrote: On 18/03/2024 21:40, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) v4: - rename the function drm_buddy_defrag with __force_merge. - Include __force_merge directly in drm buddy file and remove the defrag use in amdgpu driver. - Remove list_empty() check(Matthew) - Remove unnecessary space, headers and placement of new variables(Matthew) - Add a unit test case(Matthew) Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 427 ++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c | 18 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 16 +- 6 files changed, 360 insertions(+), 117 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c4222b886db7..625a30a6b855 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -38,8 +38,8 @@ static void drm_block_free(struct drm_buddy *mm, kmem_cache_free(slab_blocks, block); } -static void list_insert_sorted(struct drm_buddy *mm, - struct drm_buddy_block *block) +static void list_insert(struct drm_buddy *mm, + struct drm_buddy_block *block)
Re: [PATCH v9 1/3] drm/buddy: Implement tracking clear page feature
On 28/03/2024 16:07, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 3/26/2024 11:39 PM, Matthew Auld wrote: On 18/03/2024 21:40, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) v4: - rename the function drm_buddy_defrag with __force_merge. - Include __force_merge directly in drm buddy file and remove the defrag use in amdgpu driver. - Remove list_empty() check(Matthew) - Remove unnecessary space, headers and placement of new variables(Matthew) - Add a unit test case(Matthew) Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 427 ++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c | 18 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 16 +- 6 files changed, 360 insertions(+), 117 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c4222b886db7..625a30a6b855 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -38,8 +38,8 @@ static void drm_block_free(struct drm_buddy *mm, kmem_cache_free(slab_blocks, block); } -static void list_insert_sorted(struct drm_buddy *mm, - struct drm_buddy_block *block) +static void list_insert(struct drm_buddy *mm, + struct drm_buddy_block *block) { struct drm_buddy_block *node; struct list_head *head; @@ -57,6 +57,16 @@ static void list_insert_sorte
Re: [PATCH v9 1/3] drm/buddy: Implement tracking clear page feature
On 18/03/2024 21:40, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) v4: - rename the function drm_buddy_defrag with __force_merge. - Include __force_merge directly in drm buddy file and remove the defrag use in amdgpu driver. - Remove list_empty() check(Matthew) - Remove unnecessary space, headers and placement of new variables(Matthew) - Add a unit test case(Matthew) Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 427 ++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c| 18 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 16 +- 6 files changed, 360 insertions(+), 117 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c4222b886db7..625a30a6b855 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -38,8 +38,8 @@ static void drm_block_free(struct drm_buddy *mm, kmem_cache_free(slab_blocks, block); } -static void list_insert_sorted(struct drm_buddy *mm, - struct drm_buddy_block *block) +static void list_insert(struct drm_buddy *mm, + struct drm_buddy_block *block) { struct drm_buddy_block *node; struct list_head *head; @@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm, __list_add(&bl
Re: [PATCH v9 3/3] drm/tests: Add a test case for drm buddy clear allocation
On 18/03/2024 21:40, Arunpravin Paneer Selvam wrote: Add a new test case for the drm buddy clear and dirty allocation. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/tests/drm_buddy_test.c | 127 + 1 file changed, 127 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index 454ad9952f56..d355a6e61893 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -19,6 +19,132 @@ static inline u64 get_size(int order, u64 chunk_size) return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_clear(struct kunit *test) +{ + unsigned long n_pages, total, i = 0; + const unsigned long ps = SZ_4K; + struct drm_buddy_block *block; + const int max_order = 12; + LIST_HEAD(allocated); + struct drm_buddy mm; + unsigned int order; + u64 mm_size, size; Maybe just make these two u32 or unsigned long. That should be big enough, plus avoids any kind of 32b compilation bugs below. + LIST_HEAD(dirty); + LIST_HEAD(clean); + + mm_size = PAGE_SIZE << max_order; s/PAGE_SIZE/SZ_4K/ below also. + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + KUNIT_EXPECT_EQ(test, mm.max_order, max_order); + + /** Drop the extra *, since is not actual kernel-doc. Below also. +* Idea is to allocate and free some random portion of the address space, +* returning those pages as non-dirty and randomly alternate between +* requesting dirty and non-dirty pages (not going over the limit +* we freed as non-dirty), putting that into two separate lists. +* Loop over both lists at the end checking that the dirty list +* is indeed all dirty pages and vice versa. Free it all again, +* keeping the dirty/clear status. +*/ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 5 * ps, ps, &allocated, + DRM_BUDDY_TOPDOWN_ALLOCATION), + "buddy_alloc hit an error size=%u\n", 5 * ps); + drm_buddy_free_list(&mm, &allocated, DRM_BUDDY_CLEARED); + + n_pages = 10; + do { + unsigned long flags; + struct list_head *list; + int slot = i % 2; + + if (slot == 0) { + list = &dirty; + flags = 0; + } else if (slot == 1) { Could just be else { + list = &clean; + flags = DRM_BUDDY_CLEAR_ALLOCATION; + } + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + ps, ps, list, + flags), + "buddy_alloc hit an error size=%u\n", ps); + } while (++i < n_pages); + + list_for_each_entry(block, &clean, link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), true); + + list_for_each_entry(block, &dirty, link) + KUNIT_EXPECT_EQ(test, drm_buddy_block_is_clear(block), false); + + drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED); + + /** +* Trying to go over the clear limit for some allocation. +* The allocation should never fail with reasonable page-size. +*/ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 10 * ps, ps, &clean, + DRM_BUDDY_CLEAR_ALLOCATION), + "buddy_alloc hit an error size=%u\n", 10 * ps); + + drm_buddy_free_list(&mm, &clean, DRM_BUDDY_CLEARED); + drm_buddy_free_list(&mm, &dirty, 0); + drm_buddy_fini(&mm); + + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + /** +* Create a new mm. Intentionally fragment the address space by creating +* two alternating lists. Free both lists, one as dirty the other as clean. +* Try to allocate double the previous size with matching min_page_size. The +* allocation should never fail as it calls the force_merge. Also check that +* the page is always dirty after force_merge. Free the page as dirty, then +* repeat the whole thing, increment the order until we hit the max_order. +*/ + + order = 1; + do { + size = PAGE_SIZE << order; + i = 0; + n_pages = mm_size / ps; +
Re: [PATCH v8 1/3] drm/buddy: Implement tracking clear page feature
On 07/03/2024 12:25, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 3/6/2024 11:19 PM, Matthew Auld wrote: On 04/03/2024 16:32, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Suggested-by: Matthew Auld Is there a unit test for this? What about maybe something roughly like: - Pick small random mm_size which is not always power-of-two. - Allocate and free some random portion of the address space, returning those pages as non-dirty. Then do another cycle and randomly alternate between requesting dirty and non-dirty pages (not going over the limit you freed as non-dirty), putting that into two separate lists. Loop over both lists at the end checking that the dirty list is indeed all dirty pages and vice versa. Free it all again, keeping the dirty/clear status. - Also try to go over the clear limit for some allocation. The allocation should never fail with reasonable page-size. - Test the defrag/force_merge interface. Clean the mm or create new one. Intentionally fragment the address space, by creating two alternating lists. Free both lists, one as dirty the other as clean. Try to allocate double the previous size with matching min_page_size. Should fail. Call force_merge. Should now succeed. Also check that the page is always dirty after force_merge. Free the page as dirty, then repeat the whole thing, doubling the page-size until you hit max_order. - Make sure we also call fini() with some part of the address space left as non-dirty. Should not trigger any warnings. I think would be good to consider, but not a blocker or anything. Some comments below, otherwise I think looks good. Yes. It is good to have a unit test for this feature. I will send the patch. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 294 +++--- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c | 18 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 22 +- 6 files changed, 290 insertions(+), 60 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_
Re: [PATCH] drm/tests/buddy: fix print format
On 07/03/2024 08:30, Maxime Ripard wrote: On Thu, 29 Feb 2024 09:52:26 +, Matthew Auld wrote: This will report a build warning once we have: 806cb2270237 ("kunit: Annotate _MSG assertion variants with gnu printf specifiers"). Applied to drm/drm-misc (drm-misc-fixes). Thanks. Thanks! Maxime
Re: [PATCH v8 1/3] drm/buddy: Implement tracking clear page feature
On 04/03/2024 16:32, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. - Add a function to support defragmentation. v1: - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list(Christian) - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy)(Christian) - Defragment the memory beginning from min_order till the required memory space is available. v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. v3: - fix Gitlab user reported lockup issue. - Keep DRM_BUDDY_HEADER_CLEAR define sorted(Matthew) - modify to pass the root order instead max_order in fini() function(Matthew) - change bool 1 to true(Matthew) - add check if min_block_size is power of 2(Matthew) - modify the min_block_size datatype to u64(Matthew) Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Suggested-by: Matthew Auld Is there a unit test for this? What about maybe something roughly like: - Pick small random mm_size which is not always power-of-two. - Allocate and free some random portion of the address space, returning those pages as non-dirty. Then do another cycle and randomly alternate between requesting dirty and non-dirty pages (not going over the limit you freed as non-dirty), putting that into two separate lists. Loop over both lists at the end checking that the dirty list is indeed all dirty pages and vice versa. Free it all again, keeping the dirty/clear status. - Also try to go over the clear limit for some allocation. The allocation should never fail with reasonable page-size. - Test the defrag/force_merge interface. Clean the mm or create new one. Intentionally fragment the address space, by creating two alternating lists. Free both lists, one as dirty the other as clean. Try to allocate double the previous size with matching min_page_size. Should fail. Call force_merge. Should now succeed. Also check that the page is always dirty after force_merge. Free the page as dirty, then repeat the whole thing, doubling the page-size until you hit max_order. - Make sure we also call fini() with some part of the address space left as non-dirty. Should not trigger any warnings. I think would be good to consider, but not a blocker or anything. Some comments below, otherwise I think looks good. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 294 +++--- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c| 18 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 22 +- 6 files changed, 290 insertions(+), 60 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +
Re: [PATCH v7 3/3] drm/buddy: Add defragmentation support
On 04/03/2024 12:22, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 2/22/2024 12:12 AM, Matthew Auld wrote: On 21/02/2024 12:18, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: - Defragment the memory beginning from min_order till the required memory space is available. v2(Matthew): - add amdgpu user for defragmentation - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++- drivers/gpu/drm/drm_buddy.c | 93 +--- include/drm/drm_buddy.h | 3 + 3 files changed, 97 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index e494f5bf136a..cff8a526c622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size, &vres->blocks, vres->flags); - if (unlikely(r)) - goto error_free_blocks; + if (unlikely(r)) { + if (r == -ENOSPC) { + drm_buddy_defrag(mm, min_block_size); + r = drm_buddy_alloc_blocks(mm, fpfn, + lpfn, + size, + min_block_size, + &vres->blocks, + vres->flags); + if (unlikely(r)) + goto error_free_blocks; + } else { + goto error_free_blocks; + } + } if (size > remaining_size) remaining_size = 0; diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 18e004fa39d3..56bd1560fbcd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -203,6 +203,8 @@ void drm_buddy_fini(struct drm_buddy *mm) drm_block_free(mm, mm->roots[i]); } + drm_buddy_defrag(mm, mm->chunk_size << mm->max_order); I think this needs to be called higher up, otherwise we blow up with the WARN, plus we just freed the root(s). There is also the case with non-power-of-two VRAM size, in which case you get multiple roots and max_order is just the largest root and not entire address space. I guess do this in the loop above and use the root order instead? Also this should be done as part of the first patch and then in this patch it is just a case of exporting it. Every commit should ideally be functional by itself. You mean we move the above change in drm_buddy_fini function and drm_buddy_defrag function as part of first patch. And just we add export function and add amdgpu user in this patch. Is my understanding correct? Yeah, I think that makes sense. Thanks, Arun. + WARN_ON(mm->avail != mm->size); kfree(mm->roots); @@ -276,25 +278,39 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, - struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, + struct drm_buddy_block *block, + bool defrag) { + unsigned int order, block_order; struct drm_buddy_block *parent; + block_order = drm_buddy_block_order(block); + while ((parent = block->parent)) { - struct drm_buddy_block *buddy; + struct drm_buddy_block *buddy = NULL; buddy = __get_buddy(block); if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + /* + * Check the block and its buddy clear state and exit + * the loop if they both have the dissimilar state. + */ + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } + + WARN_ON(defrag && + (drm_buddy_block_is_clear(block) == + drm_buddy_block_is_clear(buddy))); list_del(&buddy->link); @@ -304,8 +320,57 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent;
Re: [PULL] drm-misc-fixes
On 29/02/2024 13:37, Maxime Ripard wrote: Hi, Here's this week drm-misc fixes PR. There's two commits for files unders drivers/soc/qcom that don't have a maintainer Acked-by. Bjorn's Acked-by was provided on IRC, and Konrad provided it by mail after the facts so we're covered. Maxime drm-misc-fixes-2024-02-29: A reset fix for host1x, a resource leak fix and a probe fix for aux-hpd, a use-after-free fix and a boot fix for a pmic_glink qcom driver in drivers/soc, a fix for the simpledrm/tegra transition, a kunit fix for the TTM tests, a font handling fix for fbcon, two allocation fixes and a kunit test to cover them for drm/buddy The following changes since commit 72fa02fdf83306c52bc1eede28359e3fa32a151a: nouveau: add an ioctl to report vram usage (2024-02-23 10:20:07 +1000) are available in the Git repository at: https://anongit.freedesktop.org/git/drm/drm-misc tags/drm-misc-fixes-2024-02-29 for you to fetch changes up to c70703320e557ff30847915e6a7631a9abdda16b: drm/tests/drm_buddy: add alloc_range_bias test (2024-02-28 08:03:29 +0100) A reset fix for host1x, a resource leak fix and a probe fix for aux-hpd, a use-after-free fix and a boot fix for a pmic_glink qcom driver in drivers/soc, a fix for the simpledrm/tegra transition, a kunit fix for the TTM tests, a font handling fix for fbcon, two allocation fixes and a kunit test to cover them for drm/buddy Christian König (1): drm/ttm/tests: depend on UML || COMPILE_TEST Jiri Slaby (SUSE) (1): fbcon: always restore the old font data in fbcon_do_set_font() Johan Hovold (3): drm/bridge: aux-hpd: fix OF node leaks drm/bridge: aux-hpd: separate allocation and registration soc: qcom: pmic_glink_altmode: fix drm bridge use-after-free Matthew Auld (3): drm/buddy: fix range bias drm/buddy: check range allocation matches alignment drm/tests/drm_buddy: add alloc_range_bias test Note that there is a build fix needed for this one: https://patchwork.freedesktop.org/patch/580568/?series=130552&rev=1 Maxime Ripard (1): Merge drm/drm-fixes into drm-misc-fixes Mikko Perttunen (1): gpu: host1x: Skip reset assert on Tegra186 Rob Clark (1): soc: qcom: pmic_glink: Fix boot when QRTR=m Thierry Reding (1): drm/tegra: Remove existing framebuffer only if we support display drivers/gpu/drm/Kconfig | 5 +- drivers/gpu/drm/bridge/aux-hpd-bridge.c | 70 +++--- drivers/gpu/drm/drm_buddy.c | 16 ++- drivers/gpu/drm/tegra/drm.c | 23 +++- drivers/gpu/drm/tests/drm_buddy_test.c | 218 drivers/gpu/host1x/dev.c| 15 ++- drivers/gpu/host1x/dev.h| 6 + drivers/soc/qcom/pmic_glink.c | 21 +-- drivers/soc/qcom/pmic_glink_altmode.c | 16 ++- drivers/video/fbdev/core/fbcon.c| 8 +- include/drm/bridge/aux-bridge.h | 15 +++ 11 files changed, 368 insertions(+), 45 deletions(-)
[PATCH 2/2] drm/tests/buddy: stop using PAGE_SIZE
Gives the wrong impression that min page-size has to be tied to the CPU PAGE_SIZE. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Arnd Bergmann --- drivers/gpu/drm/tests/drm_buddy_test.c | 42 +- 1 file changed, 21 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index be2d9d7764be..8528f39a84e6 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -329,8 +329,8 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test) * Eventually we will have a fully 50% fragmented mm. */ - mm_size = PAGE_SIZE << max_order; - KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, PAGE_SIZE), + mm_size = SZ_4K << max_order; + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K), "buddy_init failed\n"); KUNIT_EXPECT_EQ(test, mm.max_order, max_order); @@ -344,7 +344,7 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test) } for (order = top; order--;) { - size = get_size(order, PAGE_SIZE); + size = get_size(order, mm.chunk_size); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags), @@ -358,7 +358,7 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test) } /* There should be one final page for this sub-allocation */ - size = get_size(0, PAGE_SIZE); + size = get_size(0, mm.chunk_size); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags), "buddy_alloc hit -ENOMEM for hole\n"); @@ -368,7 +368,7 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test) list_move_tail(&block->link, &holes); - size = get_size(top, PAGE_SIZE); + size = get_size(top, mm.chunk_size); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags), "buddy_alloc unexpectedly succeeded at top-order %d/%d, it should be full!", @@ -379,7 +379,7 @@ static void drm_test_buddy_alloc_pathological(struct kunit *test) /* Nothing larger than blocks of chunk_size now available */ for (order = 1; order <= max_order; order++) { - size = get_size(order, PAGE_SIZE); + size = get_size(order, mm.chunk_size); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags), "buddy_alloc unexpectedly succeeded at order %d, it should be full!", @@ -408,14 +408,14 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test) * page left. */ - mm_size = PAGE_SIZE << max_order; - KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, PAGE_SIZE), + mm_size = SZ_4K << max_order; + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, SZ_4K), "buddy_init failed\n"); KUNIT_EXPECT_EQ(test, mm.max_order, max_order); for (order = 0; order < max_order; order++) { - size = get_size(order, PAGE_SIZE); + size = get_size(order, mm.chunk_size); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags), "buddy_alloc hit -ENOMEM with order=%d\n", @@ -428,7 +428,7 @@ static void drm_test_buddy_alloc_pessimistic(struct kunit *test) } /* And now the last remaining block available */ - size = get_size(0, PAGE_SIZE); + size = get_size(0, mm.chunk_size); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, start, mm_size, size, size, &tmp, flags), "buddy_alloc hit -ENOMEM on final alloc\n"); @@ -440,7 +440,7 @@ static void drm
[PATCH 1/2] drm/buddy: stop using PAGE_SIZE
The drm_buddy minimum page-size requirements should be distinct from the CPU PAGE_SIZE. Only restriction is that the minimum page-size is at least 4K. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Arnd Bergmann --- drivers/gpu/drm/drm_buddy.c | 2 +- include/drm/drm_buddy.h | 6 +++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 5ebdd6f8f36e..f999568d69c1 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -102,7 +102,7 @@ int drm_buddy_init(struct drm_buddy *mm, u64 size, u64 chunk_size) if (size < chunk_size) return -EINVAL; - if (chunk_size < PAGE_SIZE) + if (chunk_size < SZ_4K) return -EINVAL; if (!is_power_of_2(chunk_size)) diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index a5b39fc01003..19ed661a32f3 100644 --- a/include/drm/drm_buddy.h +++ b/include/drm/drm_buddy.h @@ -53,8 +53,8 @@ struct drm_buddy_block { struct list_head tmp_link; }; -/* Order-zero must be at least PAGE_SIZE */ -#define DRM_BUDDY_MAX_ORDER (63 - PAGE_SHIFT) +/* Order-zero must be at least SZ_4K */ +#define DRM_BUDDY_MAX_ORDER (63 - 12) /* * Binary Buddy System. @@ -82,7 +82,7 @@ struct drm_buddy { unsigned int n_roots; unsigned int max_order; - /* Must be at least PAGE_SIZE */ + /* Must be at least SZ_4K */ u64 chunk_size; u64 size; u64 avail; -- 2.43.2
[PATCH] drm/tests/buddy: fix print format
This will report a build warning once we have: 806cb2270237 ("kunit: Annotate _MSG assertion variants with gnu printf specifiers"). Reported-by: Stephen Rothwell Fixes: c70703320e55 ("drm/tests/drm_buddy: add alloc_range_bias test") Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König --- drivers/gpu/drm/tests/drm_buddy_test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index be2d9d7764be..484360c7e1f6 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -189,7 +189,7 @@ static void drm_test_buddy_alloc_range_bias(struct kunit *test) &allocated, DRM_BUDDY_RANGE_ALLOCATION), "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", - bias_start, bias_end, size); + bias_start, bias_end, size, ps); bias_rem -= size; /* -- 2.43.2
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
On 28/02/2024 07:20, Christian König wrote: Am 26.02.24 um 10:58 schrieb Matthew Auld: On 19/02/2024 12:24, Matthew Auld wrote: On 19/02/2024 10:48, Matthew Auld wrote: On 19/02/2024 10:30, Christian König wrote: Am 19.02.24 um 11:28 schrieb Matthew Auld: On 19/02/2024 09:53, Christian König wrote: Am 19.02.24 um 10:42 schrieb Matthew Auld: On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. Already working on this. Just give me a few more minutes. Thanks. No, problem. I would have pushed this earlier, but my build server doesn't want to work any more. Looks like the SSD has passed its warranty :( Should I push the other three patches to drm-misc-fixes as well? I currently can't even build test them. Need to send a v2 for that. One minor change in the test just to be consistent with using u32. Thanks. Sent v2. If you could push that when you get a chance. Thanks. https://patchwork.freedesktop.org/series/130075/ Gentle ping on merging v2. Pushed all three to drm-misc-fixes. Thanks. Regards, Christian. Thanks, Christian. Thanks, Christian. --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link)
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4
On 26/02/2024 20:21, Thomas Hellström wrote: Hi, Christian On Fri, 2024-02-23 at 15:30 +0100, Christian König wrote: Am 06.02.24 um 13:56 schrieb Christian König: Am 06.02.24 um 13:53 schrieb Thomas Hellström: Hi, Christian, On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote: Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well v4: keep the ttm_bo_mem_space functionality as it is for now, only add new handling for ttm_bo_validate as suggested by Thomas Signed-off-by: Christian König Reviewed-by: Zack Rusin v3 Sending this through xe CI, will try to review asap. Take your time. At the moment people are bombarding me with work and I have only two hands and one head as well :( So I've digged myself out of that hole and would rather like to get this new feature into 6.9. Any time to review it? I can also plan some time to review your LRU changes next week. Thanks, Christian. Sorry for the late response. Was planning to review but saw that there was still an xe CI failure. https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-129579v1/bat-atsm-2/igt@xe_evict_...@evict-overcommit-parallel-nofree-samefd.html I haven't really had time to look into what might be causing this, though. Maybe in ttm_bo_alloc_resource(): @@ -772,7 +772,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, do { ret = ttm_resource_alloc(bo, place, res); - if (unlikely(ret != -ENOSPC)) + if (unlikely(ret && ret != -ENOSPC)) return ret; if (likely(!ret) || !force_space) break; Otherwise we allocate VRAM but never correctly synchronise against the move fence, since we missed adding it to the BO. When we trigger async evictions that would explain the above test failure where we detect VRAM corruption, since someone else is still using the VRAM we allocated. What do you think? /Thomas Christian. /Thomas --- drivers/gpu/drm/ttm/ttm_bo.c | 231 +--- --- -- drivers/gpu/drm/ttm/ttm_resource.c | 16 +- include/drm/ttm/ttm_resource.h | 3 +- 3 files changed, 121 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ba3f09e2d7e6..b12f435542a9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx- no_wait_gpu); -} - /** - * ttm_bo_mem_space + * ttm_bo_alloc_resource - Allocate backing store for a BO * - * @bo: Pointer to a struct ttm_buffer_object. the data of which - * we want to allocate space for. - * @placement: Proposed new placement for the buffer object. - * @mem: A struct ttm_resource. + * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for + * @placement: Proposed new placement for the buffer object * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space + * @res: The resulting struct ttm_resource. * - * Allocate memory space for the buffer object pointed to by @bo, using - * the placement flags in @placement, potentially evicting other idle buffer objects. - * This function may sleep while waiting for space to become available. + * Allocates a resource for the buffer object pointed to by @bo, using
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
Hi, On 26/02/2024 10:38, Geert Uytterhoeven wrote: Hi Matthew, On Mon, Feb 26, 2024 at 10:58 AM Matthew Auld wrote: On 19/02/2024 12:24, Matthew Auld wrote: On 19/02/2024 10:48, Matthew Auld wrote: On 19/02/2024 10:30, Christian König wrote: Am 19.02.24 um 11:28 schrieb Matthew Auld: On 19/02/2024 09:53, Christian König wrote: Am 19.02.24 um 10:42 schrieb Matthew Auld: On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. Already working on this. Just give me a few more minutes. Thanks. No, problem. I would have pushed this earlier, but my build server doesn't want to work any more. Looks like the SSD has passed its warranty :( Should I push the other three patches to drm-misc-fixes as well? I currently can't even build test them. Need to send a v2 for that. One minor change in the test just to be consistent with using u32. Thanks. Sent v2. If you could push that when you get a chance. Thanks. https://patchwork.freedesktop.org/series/130075/ Gentle ping on merging v2. Your v1 and a fix from Linus already made it upstream: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/drivers/gpu/drm/tests?h=v6.8-rc6 Ah, right. I meant v2 for the remaining drm_buddy patches in this series: https://patchwork.freedesktop.org/series/130075/ Gr{oetje,eeting}s, Geert
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
On 19/02/2024 12:24, Matthew Auld wrote: On 19/02/2024 10:48, Matthew Auld wrote: On 19/02/2024 10:30, Christian König wrote: Am 19.02.24 um 11:28 schrieb Matthew Auld: On 19/02/2024 09:53, Christian König wrote: Am 19.02.24 um 10:42 schrieb Matthew Auld: On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. Already working on this. Just give me a few more minutes. Thanks. No, problem. I would have pushed this earlier, but my build server doesn't want to work any more. Looks like the SSD has passed its warranty :( Should I push the other three patches to drm-misc-fixes as well? I currently can't even build test them. Need to send a v2 for that. One minor change in the test just to be consistent with using u32. Thanks. Sent v2. If you could push that when you get a chance. Thanks. https://patchwork.freedesktop.org/series/130075/ Gentle ping on merging v2. Thanks, Christian. Thanks, Christian. --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link)
Re: [PATCH 1/2] devm-helpers: Add resource managed version of mutex init
On 22/02/2024 14:58, Marek Behún wrote: A few drivers are doing resource-managed mutex initialization by implementing ad-hoc one-liner mutex dropping functions and using them with devm_add_action_or_reset(). Help drivers avoid these repeated one-liners by adding managed version of mutex initialization. Use the new function devm_mutex_init() in the following drivers: drivers/gpio/gpio-pisosr.c drivers/gpio/gpio-sim.c drivers/gpu/drm/xe/xe_hwmon.c drivers/hwmon/nzxt-smart2.c drivers/leds/leds-is31fl319x.c drivers/power/supply/mt6370-charger.c drivers/power/supply/rt9467-charger.c Signed-off-by: Marek Behún --- drivers/gpio/gpio-pisosr.c| 9 ++- drivers/gpio/gpio-sim.c | 12 ++ drivers/gpu/drm/xe/xe_hwmon.c | 11 ++--- drivers/hwmon/nzxt-smart2.c | 9 ++- drivers/leds/leds-is31fl319x.c| 9 ++- drivers/power/supply/mt6370-charger.c | 11 + drivers/power/supply/rt9467-charger.c | 34 --- include/linux/devm-helpers.h | 32 + 8 files changed, 47 insertions(+), 80 deletions(-) diff --git a/drivers/gpio/gpio-pisosr.c b/drivers/gpio/gpio-pisosr.c index e3013e778e15..dddbf37e855f 100644 --- a/drivers/gpio/gpio-pisosr.c +++ b/drivers/gpio/gpio-pisosr.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -116,11 +117,6 @@ static const struct gpio_chip template_chip = { .can_sleep = true, }; -static void pisosr_mutex_destroy(void *lock) -{ - mutex_destroy(lock); -} - static int pisosr_gpio_probe(struct spi_device *spi) { struct device *dev = &spi->dev; @@ -147,8 +143,7 @@ static int pisosr_gpio_probe(struct spi_device *spi) return dev_err_probe(dev, PTR_ERR(gpio->load_gpio), "Unable to allocate load GPIO\n"); - mutex_init(&gpio->lock); - ret = devm_add_action_or_reset(dev, pisosr_mutex_destroy, &gpio->lock); + ret = devm_mutex_init(dev, &gpio->lock); if (ret) return ret; diff --git a/drivers/gpio/gpio-sim.c b/drivers/gpio/gpio-sim.c index c4106e37e6db..fcfcaa4efe70 100644 --- a/drivers/gpio/gpio-sim.c +++ b/drivers/gpio/gpio-sim.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include #include @@ -307,13 +308,6 @@ static ssize_t gpio_sim_sysfs_pull_store(struct device *dev, return len; } -static void gpio_sim_mutex_destroy(void *data) -{ - struct mutex *lock = data; - - mutex_destroy(lock); -} - static void gpio_sim_put_device(void *data) { struct device *dev = data; @@ -457,9 +451,7 @@ static int gpio_sim_add_bank(struct fwnode_handle *swnode, struct device *dev) if (ret) return ret; - mutex_init(&chip->lock); - ret = devm_add_action_or_reset(dev, gpio_sim_mutex_destroy, - &chip->lock); + ret = devm_mutex_init(dev, &chip->lock); if (ret) return ret; diff --git a/drivers/gpu/drm/xe/xe_hwmon.c b/drivers/gpu/drm/xe/xe_hwmon.c index 174ed2185481..bb88ae1c196c 100644 --- a/drivers/gpu/drm/xe/xe_hwmon.c +++ b/drivers/gpu/drm/xe/xe_hwmon.c @@ -3,6 +3,7 @@ * Copyright © 2023 Intel Corporation */ +#include #include #include #include @@ -729,13 +730,6 @@ xe_hwmon_get_preregistration_info(struct xe_device *xe) xe_hwmon_energy_get(hwmon, &energy); } -static void xe_hwmon_mutex_destroy(void *arg) -{ - struct xe_hwmon *hwmon = arg; - - mutex_destroy(&hwmon->hwmon_lock); -} - void xe_hwmon_register(struct xe_device *xe) { struct device *dev = xe->drm.dev; @@ -751,8 +745,7 @@ void xe_hwmon_register(struct xe_device *xe) xe->hwmon = hwmon; - mutex_init(&hwmon->hwmon_lock); - if (devm_add_action_or_reset(dev, xe_hwmon_mutex_destroy, hwmon)) + if (devm_mutex_init(dev, &hwmon->hwmon_lock)) return; /* primary GT to access device level properties */ diff --git a/drivers/hwmon/nzxt-smart2.c b/drivers/hwmon/nzxt-smart2.c index 7aa586eb74be..00bc89607673 100644 --- a/drivers/hwmon/nzxt-smart2.c +++ b/drivers/hwmon/nzxt-smart2.c @@ -5,6 +5,7 @@ * Copyright (c) 2021 Aleksandr Mezin */ +#include #include #include #include @@ -721,11 +722,6 @@ static int __maybe_unused nzxt_smart2_hid_reset_resume(struct hid_device *hdev) return init_device(drvdata, drvdata->update_interval); } -static void mutex_fini(void *lock) -{ - mutex_destroy(lock); -} - static int nzxt_smart2_hid_probe(struct hid_device *hdev, const struct hid_device_id *id) { @@ -741,8 +737,7 @@ static int nzxt_smart2_hid_probe(struct hid_device *hdev, init_waitqueue_head(&drvdata->wq); - mutex_init(&drvdata->mutex); - ret = devm_add_action_or_
Re: [PATCH v7 3/3] drm/buddy: Add defragmentation support
On 21/02/2024 12:18, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: - Defragment the memory beginning from min_order till the required memory space is available. v2(Matthew): - add amdgpu user for defragmentation - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++- drivers/gpu/drm/drm_buddy.c | 93 +--- include/drm/drm_buddy.h | 3 + 3 files changed, 97 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index e494f5bf136a..cff8a526c622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size, &vres->blocks, vres->flags); - if (unlikely(r)) - goto error_free_blocks; + if (unlikely(r)) { + if (r == -ENOSPC) { + drm_buddy_defrag(mm, min_block_size); + r = drm_buddy_alloc_blocks(mm, fpfn, + lpfn, + size, + min_block_size, + &vres->blocks, + vres->flags); + if (unlikely(r)) + goto error_free_blocks; + } else { + goto error_free_blocks; + } + } if (size > remaining_size) remaining_size = 0; diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 18e004fa39d3..56bd1560fbcd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -203,6 +203,8 @@ void drm_buddy_fini(struct drm_buddy *mm) drm_block_free(mm, mm->roots[i]); } + drm_buddy_defrag(mm, mm->chunk_size << mm->max_order); I think this needs to be called higher up, otherwise we blow up with the WARN, plus we just freed the root(s). There is also the case with non-power-of-two VRAM size, in which case you get multiple roots and max_order is just the largest root and not entire address space. I guess do this in the loop above and use the root order instead? Also this should be done as part of the first patch and then in this patch it is just a case of exporting it. Every commit should ideally be functional by itself. + WARN_ON(mm->avail != mm->size); kfree(mm->roots); @@ -276,25 +278,39 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, -struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, +struct drm_buddy_block *block, +bool defrag) { + unsigned int order, block_order; struct drm_buddy_block *parent; + block_order = drm_buddy_block_order(block); + while ((parent = block->parent)) { - struct drm_buddy_block *buddy; + struct drm_buddy_block *buddy = NULL; buddy = __get_buddy(block); if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + /* +* Check the block and its buddy clear state and exit +* the loop if they both have the dissimilar state. +*/ + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } + + WARN_ON(defrag && + (drm_buddy_block_is_clear(block)
Re: [PATCH v6 1/3] drm/buddy: Implement tracking clear page feature
On 21/02/2024 12:40, Paneer Selvam, Arunpravin wrote: On 2/16/2024 5:33 PM, Matthew Auld wrote: On 08/02/2024 15:49, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. v1: (Christian) - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list. - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy) v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Probably needs a new unit test. I think we are missing something to forcefully re-merge everything at fini()? In theory we can just call the defrag routine. Otherwise we might trigger various warnings since the root(s) might still be split. Also one nit below. Otherwise I think looks good. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 192 ++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c | 10 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 18 +- 6 files changed, 187 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..33ad0cfbd54c 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm, __list_add(&block->link, node->link.prev, &node->link); } +static void clear_reset(struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_CLEAR; +} + +static void mark_cleared(struct drm_buddy_block *block) +{ + block->header |= DRM_BUDDY_HEADER_CLEAR; +} + static void mark_allocated(struct drm_buddy_block *block) { block->header &= ~DRM_BUDDY_HEADER_STATE; @@ -223,6 +233,12 @@ static int split_block(struct drm_buddy *mm, mark_free(mm, block->left); mark_free(mm, block->right); + if (drm_buddy_block_is_clear(block)) { + mark_cleared(block->left); + mark_cleared(block->right); + clear_reset(block); + } + mark_split(block); return 0; @@ -273,6 +289,13 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; + + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + list_del(&buddy->link); drm_block_free(m
Re: [PATCH 7/9] drm: tests: Fix invalid printf format specifiers in KUnit tests
On 21/02/2024 09:27, David Gow wrote: The drm_buddy_test's alloc_contiguous test used a u64 for the page size, which was then updated to be an 'unsigned long' to avoid 64-bit multiplication division helpers. However, the variable is logged by some KUNIT_ASSERT_EQ_MSG() using the '%d' or '%llu' format specifiers, the former of which is always wrong, and the latter is no longer correct now that ps is no longer a u64. Fix these to all use '%lu'. Also, drm_mm_test calls KUNIT_FAIL() with an empty string as the message. gcc warns if a printf format string is empty (apparently), so give these some more detailed error messages, which should be more useful anyway. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Fixes: fca7526b7d89 ("drm/tests/drm_buddy: fix build failure on 32-bit targets") Fixes: fc8d29e298cf ("drm: selftest: convert drm_mm selftest to KUnit") Signed-off-by: David Gow --- drivers/gpu/drm/tests/drm_buddy_test.c | 14 +++--- drivers/gpu/drm/tests/drm_mm_test.c| 6 +++--- 2 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index 8a464f7f4c61..3dbfa3078449 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -55,30 +55,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%lu\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%lu\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%lu\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%lu\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%lu\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -87,13 +87,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%lu\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit a
Re: [PATCH] drm/ttm: Fix an invalid freeing on already freed page in error path
On 21/02/2024 07:33, Thomas Hellström wrote: If caching mode change fails due to, for example, OOM we free the allocated pages in a two-step process. First the pages for which the caching change has already succeeded. Secondly the pages for which a caching change did not succeed. However the second step was incorrectly freeing the pages already freed in the first step. Fix. Signed-off-by: Thomas Hellström Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path") Cc: Christian König Cc: Dave Airlie Cc: Christian Koenig Cc: Huang Rui Cc: dri-devel@lists.freedesktop.org Cc: # v6.4+ Reviewed-by: Matthew Auld
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
On 19/02/2024 10:48, Matthew Auld wrote: On 19/02/2024 10:30, Christian König wrote: Am 19.02.24 um 11:28 schrieb Matthew Auld: On 19/02/2024 09:53, Christian König wrote: Am 19.02.24 um 10:42 schrieb Matthew Auld: On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. Already working on this. Just give me a few more minutes. Thanks. No, problem. I would have pushed this earlier, but my build server doesn't want to work any more. Looks like the SSD has passed its warranty :( Should I push the other three patches to drm-misc-fixes as well? I currently can't even build test them. Need to send a v2 for that. One minor change in the test just to be consistent with using u32. Thanks. Sent v2. If you could push that when you get a chance. Thanks. https://patchwork.freedesktop.org/series/130075/ Thanks, Christian. Thanks, Christian. --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link)
[PATCH v2 3/3] drm/tests/drm_buddy: add alloc_range_bias test
Sanity check range bias with DRM_BUDDY_RANGE_ALLOCATION. v2: - Be consistent with u32 here. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Reviewed-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 218 + 1 file changed, 218 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index edacc1adb28f..1008d5b9d61e 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -14,11 +14,216 @@ #include "../lib/drm_random.h" +static unsigned int random_seed; + static inline u64 get_size(int order, u64 chunk_size) { return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_range_bias(struct kunit *test) +{ + u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; + DRM_RND_STATE(prng, random_seed); + unsigned int i, count, *order; + struct drm_buddy mm; + LIST_HEAD(allocated); + + bias_size = SZ_1M; + ps = roundup_pow_of_two(prandom_u32_state(&prng) % bias_size); + ps = max(SZ_4K, ps); + mm_size = (SZ_8M-1) & ~(ps-1); /* Multiple roots */ + + kunit_info(test, "mm_size=%u, ps=%u\n", mm_size, ps); + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps), + "buddy_init failed\n"); + + count = mm_size / bias_size; + order = drm_random_order(count, &prng); + KUNIT_EXPECT_TRUE(test, order); + + /* +* Idea is to split the address space into uniform bias ranges, and then +* in some random order allocate within each bias, using various +* patterns within. This should detect if allocations leak out from a +* given bias, for example. +*/ + + for (i = 0; i < count; i++) { + LIST_HEAD(tmp); + u32 size; + + bias_start = order[i] * bias_size; + bias_end = bias_start + bias_size; + bias_rem = bias_size; + + /* internal round_up too big */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, +bias_end, bias_size + ps, bias_size, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, bias_size, bias_size); + + /* size too big */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, +bias_end, bias_size + ps, ps, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, bias_size + ps, ps); + + /* bias range too small for size */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start + ps, +bias_end, bias_size, ps, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n", + bias_start + ps, bias_end, bias_size, ps); + + /* bias misaligned */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start + ps, +bias_end - ps, +bias_size >> 1, bias_size >> 1, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc h didn't fail with bias(%x-%x), size=%u, ps=%u\n", + bias_start + ps, bias_end - ps, bias_size >> 1, bias_size >> 1); + + /* single big page */ + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, bia
[PATCH v2 2/3] drm/buddy: check range allocation matches alignment
Likely not a big deal for real users, but for consistency we should respect the min_page_size here. Main issue is that bias allocations turns into normal range allocation if the range and size matches exactly, and in the next patch we want to add some unit tests for this part of the api. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Reviewed-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/drm_buddy.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f3a6ac908f81..5ebdd6f8f36e 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -771,8 +771,12 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, return -EINVAL; /* Actual range allocation */ - if (start + size == end) + if (start + size == end) { + if (!IS_ALIGNED(start | end, min_block_size)) + return -EINVAL; + return __drm_buddy_alloc_range(mm, start, size, NULL, blocks); + } original_size = size; original_min_size = min_block_size; -- 2.43.0
[PATCH v2 1/3] drm/buddy: fix range bias
There is a corner case here where start/end is after/before the block range we are currently checking. If so we need to be sure that splitting the block will eventually give use the block size we need. To do that we should adjust the block range to account for the start/end, and only continue with the split if the size/alignment will fit the requested size. Not doing so can result in leaving split blocks unmerged when it eventually fails. Fixes: afea229fe102 ("drm: improve drm_buddy_alloc function") Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: # v5.18+ Reviewed-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/drm_buddy.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c4222b886db7..f3a6ac908f81 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -332,6 +332,7 @@ alloc_range_bias(struct drm_buddy *mm, u64 start, u64 end, unsigned int order) { + u64 req_size = mm->chunk_size << order; struct drm_buddy_block *block; struct drm_buddy_block *buddy; LIST_HEAD(dfs); @@ -367,6 +368,15 @@ alloc_range_bias(struct drm_buddy *mm, if (drm_buddy_block_is_allocated(block)) continue; + if (block_start < start || block_end > end) { + u64 adjusted_start = max(block_start, start); + u64 adjusted_end = min(block_end, end); + + if (round_down(adjusted_end + 1, req_size) <= + round_up(adjusted_start, req_size)) + continue; + } + if (contains(start, end, block_start, block_end) && order == drm_buddy_block_order(block)) { /* -- 2.43.0
Re: [PATCH] drm/tests/drm_buddy: avoid 64-bit calculation
On 19/02/2024 11:41, Christian König wrote: Am 19.02.24 um 12:29 schrieb Arnd Bergmann: On Mon, Feb 19, 2024, at 12:22, Christian König wrote: Am 17.02.24 um 02:31 schrieb Randy Dunlap: On 2/16/24 12:24, Arnd Bergmann wrote: From: Arnd Bergmann The newly added drm_test_buddy_alloc_contiguous() test fails to link on 32-bit targets because of inadvertent 64-bit calculations: ERROR: modpost: "__aeabi_uldivmod" [drivers/gpu/drm/tests/drm_buddy_test.ko] undefined! ERROR: modpost: "__aeabi_ldivmod" [drivers/gpu/drm/tests/drm_buddy_test.ko] undefined! >From what I can tell, the numbers cannot possibly overflow a 32-bit size, so use different types for these. I noticed that the function has another possible flaw in that is mixes what it calls pages with 4KB units. This is a big confusing at best, or possibly broken when built on machines with larger pages. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Signed-off-by: Arnd Bergmann Tested-by: Randy Dunlap I've just pushed a similar patch Mathew came up a bit earlier to drm-misc-fixes. Sorry for the noise, I have to catch up on picking up patches for misc-fixes and misc-next. Ok, thanks. Have you looked at how this code works for larger values of PAGE_SIZE? Is there any need to change other things or will this work with the hardcoded 4KB chunks? I haven't looked into the details, but I've pointed out before that using PAGE_SIZE in the buddy or its test cases would be incorrect. Background is that the buddy allocator is for devices and those work independent of the CPU PAGE_SIZE. So it can be that on a CPU with 64k pages the buddy still needs to work with 4k. Could be that this is work, but could as well be that this is completely broken. Arun and Mathew needs to answer this, I haven't tested it nor reviewed the code. Yeah, we should not be using PAGE_SIZE or PAGE_SHIFT in drm_buddy.[ch] and tests/drm_buddy_test.c. The smallest default page size is SZ_4K for drm_buddy. A patch to fix that would be very welcome. If no takers I can send something. Regards, Christian. Arnd
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
On 19/02/2024 10:30, Christian König wrote: Am 19.02.24 um 11:28 schrieb Matthew Auld: On 19/02/2024 09:53, Christian König wrote: Am 19.02.24 um 10:42 schrieb Matthew Auld: On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. Already working on this. Just give me a few more minutes. Thanks. No, problem. I would have pushed this earlier, but my build server doesn't want to work any more. Looks like the SSD has passed its warranty :( Should I push the other three patches to drm-misc-fixes as well? I currently can't even build test them. Need to send a v2 for that. One minor change in the test just to be consistent with using u32. Thanks. Thanks, Christian. Thanks, Christian. --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link)
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
On 19/02/2024 09:53, Christian König wrote: Am 19.02.24 um 10:42 schrieb Matthew Auld: On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. Already working on this. Just give me a few more minutes. Thanks. Thanks, Christian. --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link)
Re: [PATCH 1/6] drm/tests/drm_buddy: fix 32b build
On 15/02/2024 17:44, Matthew Auld wrote: Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard Any chance someone can push just this single patch here, since it fixes 32b build? It already has an r-b from Arun. --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link)
Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support
On 16/02/2024 14:02, Christian König wrote: Am 16.02.24 um 14:21 schrieb Matthew Auld: On 16/02/2024 12:33, Christian König wrote: Am 16.02.24 um 13:23 schrieb Matthew Auld: On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: Defragment the memory beginning from min_order till the required memory space is available. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 67 +++-- include/drm/drm_buddy.h | 3 ++ No users? Other question is how can a buddy allocator fragment in the first place? The fragmentation is due to pages now being tracked as dirty/clear. Should the allocator merge together a page that is dirty with a page that is cleared? When should it do that? User wants to be able to keep the two separate if possible. For example, freeing one single dirty page can dirty a huge swathe of your already cleared pages if they are merged together. Or do you have some some other ideas here? Sorry, that was not what I meant. I should probably have been clearer. That dirty and clean pages are now kept separated is obvious, but why do you need to de-fragment them at some point? Ah, right. At the very least we need to do something similar to this at fini(), just to ensure we properly merge everything back together so we can correctly tear down the mm. Outside of that the thinking was that it might be useful to call when allocating larger min page-sizes. You might now be failing the allocation due to fragmentation, and so in some cases might be better off running some kind of defrag step first, instead of failing the allocation and trying to evict stuff. Anyway, if that is not a concern for amdgpu, then we just need to handle the fini() case and can keep this internal. Christian. Christian. 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 33ad0cfbd54c..fac423d2cb73 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, - struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, + struct drm_buddy_block *block, + bool defrag) { struct drm_buddy_block *parent; + unsigned int order; while ((parent = block->parent)) { struct drm_buddy_block *buddy; @@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } Maybe check if the two blocks are incompatible and chuck a warn if they are not? Main thing is not to hide issues with split blocks that should have been merged before. list_del(&buddy->link); @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent; } + order = drm_buddy_block_order(block); mark_free(mm, block); + + return order; +} + +/** + * drm_buddy_defrag - Defragmentation routine + * + * @mm: DRM buddy manager + * @min_order: minimum order in the freelist to begin + * the defragmentation process + * + * Driver calls the defragmentation function when the + * requested memory allocation returns -ENOSPC. + */ +void drm_buddy_defrag(struct drm_buddy *mm, + unsigned int min_order) Just wondering if we need "full defag" also? We would probably need to call this at fini() anyway. +{ + struct drm_buddy_block *block; + struct list_head *list; + unsigned int order; + int i; + + if (min_order > mm->max_order) + return; + + for (i = min_order - 1; i >= 0; i--) { Need to be careful with min_order = 0 ? + list = &mm->free_list[i]; + if (list_empty(list)) + continue; + + list_for_each_entry_reverse(block, list, link) { Don't we need the safe_reverse() variant here, since this is removing from the list? + if (!block->parent) + continue; + + order = __drm_buddy_free(mm, block, 1); + if (order >= min_order) + return; + } + } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block @@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_budd
Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support
On 16/02/2024 12:33, Christian König wrote: Am 16.02.24 um 13:23 schrieb Matthew Auld: On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: Defragment the memory beginning from min_order till the required memory space is available. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 67 +++-- include/drm/drm_buddy.h | 3 ++ No users? Other question is how can a buddy allocator fragment in the first place? The fragmentation is due to pages now being tracked as dirty/clear. Should the allocator merge together a page that is dirty with a page that is cleared? When should it do that? User wants to be able to keep the two separate if possible. For example, freeing one single dirty page can dirty a huge swathe of your already cleared pages if they are merged together. Or do you have some some other ideas here? Christian. 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 33ad0cfbd54c..fac423d2cb73 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, - struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, + struct drm_buddy_block *block, + bool defrag) { struct drm_buddy_block *parent; + unsigned int order; while ((parent = block->parent)) { struct drm_buddy_block *buddy; @@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } Maybe check if the two blocks are incompatible and chuck a warn if they are not? Main thing is not to hide issues with split blocks that should have been merged before. list_del(&buddy->link); @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent; } + order = drm_buddy_block_order(block); mark_free(mm, block); + + return order; +} + +/** + * drm_buddy_defrag - Defragmentation routine + * + * @mm: DRM buddy manager + * @min_order: minimum order in the freelist to begin + * the defragmentation process + * + * Driver calls the defragmentation function when the + * requested memory allocation returns -ENOSPC. + */ +void drm_buddy_defrag(struct drm_buddy *mm, + unsigned int min_order) Just wondering if we need "full defag" also? We would probably need to call this at fini() anyway. +{ + struct drm_buddy_block *block; + struct list_head *list; + unsigned int order; + int i; + + if (min_order > mm->max_order) + return; + + for (i = min_order - 1; i >= 0; i--) { Need to be careful with min_order = 0 ? + list = &mm->free_list[i]; + if (list_empty(list)) + continue; + + list_for_each_entry_reverse(block, list, link) { Don't we need the safe_reverse() variant here, since this is removing from the list? + if (!block->parent) + continue; + + order = __drm_buddy_free(mm, block, 1); + if (order >= min_order) + return; + } + } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block @@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_buddy *mm, if (drm_buddy_block_is_clear(block)) mm->clear_avail += drm_buddy_block_size(mm, block); - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); } EXPORT_SYMBOL(drm_buddy_free_block); @@ -470,7 +515,7 @@ __alloc_range_bias(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) && drm_buddy_block_is_free(buddy))) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -588,7 +633,7 @@ alloc_from_freelist(struct drm_buddy *mm, err_undo: if (tmp != order) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -668,7 +713,7 @@ static int __alloc_range(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) &&am
Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support
On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: Defragment the memory beginning from min_order till the required memory space is available. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 67 +++-- include/drm/drm_buddy.h | 3 ++ No users? 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 33ad0cfbd54c..fac423d2cb73 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, -struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, +struct drm_buddy_block *block, +bool defrag) { struct drm_buddy_block *parent; + unsigned int order; while ((parent = block->parent)) { struct drm_buddy_block *buddy; @@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } Maybe check if the two blocks are incompatible and chuck a warn if they are not? Main thing is not to hide issues with split blocks that should have been merged before. list_del(&buddy->link); @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent; } + order = drm_buddy_block_order(block); mark_free(mm, block); + + return order; +} + +/** + * drm_buddy_defrag - Defragmentation routine + * + * @mm: DRM buddy manager + * @min_order: minimum order in the freelist to begin + * the defragmentation process + * + * Driver calls the defragmentation function when the + * requested memory allocation returns -ENOSPC. + */ +void drm_buddy_defrag(struct drm_buddy *mm, + unsigned int min_order) Just wondering if we need "full defag" also? We would probably need to call this at fini() anyway. +{ + struct drm_buddy_block *block; + struct list_head *list; + unsigned int order; + int i; + + if (min_order > mm->max_order) + return; + + for (i = min_order - 1; i >= 0; i--) { Need to be careful with min_order = 0 ? + list = &mm->free_list[i]; + if (list_empty(list)) + continue; + + list_for_each_entry_reverse(block, list, link) { Don't we need the safe_reverse() variant here, since this is removing from the list? + if (!block->parent) + continue; + + order = __drm_buddy_free(mm, block, 1); + if (order >= min_order) + return; + } + } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block @@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_buddy *mm, if (drm_buddy_block_is_clear(block)) mm->clear_avail += drm_buddy_block_size(mm, block); - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); } EXPORT_SYMBOL(drm_buddy_free_block); @@ -470,7 +515,7 @@ __alloc_range_bias(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) && drm_buddy_block_is_free(buddy))) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -588,7 +633,7 @@ alloc_from_freelist(struct drm_buddy *mm, err_undo: if (tmp != order) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -668,7 +713,7 @@ static int __alloc_range(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) && drm_buddy_block_is_free(buddy))) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); err_free: if (err == -ENOSPC && total_allocated_on_err) { diff --git a
Re: [PATCH v6 1/3] drm/buddy: Implement tracking clear page feature
On 08/02/2024 15:49, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. v1: (Christian) - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list. - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy) v2: (Matthew) - Add a wrapper drm_buddy_free_list_internal for the freeing of blocks operation within drm buddy. - Write a macro block_incompatible() to allocate the required blocks. - Update the xe driver for the drm_buddy_free_list change in arguments. Signed-off-by: Arunpravin Paneer Selvam Signed-off-by: Matthew Auld Suggested-by: Christian König Probably needs a new unit test. I think we are missing something to forcefully re-merge everything at fini()? In theory we can just call the defrag routine. Otherwise we might trigger various warnings since the root(s) might still be split. Also one nit below. Otherwise I think looks good. --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 192 ++ drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c| 10 +- drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 4 +- include/drm/drm_buddy.h | 18 +- 6 files changed, 187 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 8db880244324..c0c851409241 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -571,7 +571,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -604,7 +604,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -912,7 +912,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..33ad0cfbd54c 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm, __list_add(&block->link, node->link.prev, &node->link); } +static void clear_reset(struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_CLEAR; +} + +static void mark_cleared(struct drm_buddy_block *block) +{ + block->header |= DRM_BUDDY_HEADER_CLEAR; +} + static void mark_allocated(struct drm_buddy_block *block) { block->header &= ~DRM_BUDDY_HEADER_STATE; @@ -223,6 +233,12 @@ static int split_block(struct drm_buddy *mm, mark_free(mm, block->left); mark_free(mm, block->right); + if (drm_buddy_block_is_clear(block)) { + mark_cleared(block->left); + mark_cleared(block->right); + clear_reset(block); + } + mark_split(block); return 0; @@ -273,6 +289,13 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; + + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + list_del(&buddy->link); drm_block_free
Re: [PATCH] drm/buddy: Modify duplicate list_splice_tail call
On 16/02/2024 10:00, Arunpravin Paneer Selvam wrote: Remove the duplicate list_splice_tail call when the total_allocated < size condition is true. Cc: # 6.7+ Fixes: 8746c6c9dfa3 ("drm/buddy: Fix alloc_range() error handling code") Reported-by: Bert Karwatzki Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/drm_buddy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c1a99bf4dffd..c4222b886db7 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -538,13 +538,13 @@ static int __alloc_range(struct drm_buddy *mm, list_add(&block->left->tmp_link, dfs); } while (1); - list_splice_tail(&allocated, blocks); - if (total_allocated < size) { err = -ENOSPC; goto err_free; } + list_splice_tail(&allocated, blocks); Sigh. Can we extend the unit test(s) to catch this? Reviewed-by: Matthew Auld + return 0; err_undo: base-commit: a64056bb5a3215bd31c8ce17d609ba0f4d5c55ea
[PATCH 6/6] drm/xe/stolen: ignore first page for FBC
Seems like we can potentially hit underruns if the CFB offset is within the first page of stolen. Just like i915 skip the first page. BSpec: 50214 Reported-by: Matt Roper Signed-off-by: Matthew Auld --- drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h b/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h index bd233007c1b7..003474cfdf31 100644 --- a/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h +++ b/drivers/gpu/drm/xe/compat-i915-headers/i915_gem_stolen.h @@ -19,6 +19,9 @@ static inline int i915_gem_stolen_insert_node_in_range(struct xe_device *xe, int err; u32 flags = XE_BO_CREATE_PINNED_BIT | XE_BO_CREATE_STOLEN_BIT; + if (start < SZ_4K) + start = SZ_4K; + if (align) size = ALIGN(size, align); -- 2.43.0
[PATCH 4/6] drm/tests/drm_buddy: add alloc_range_bias test
Sanity check range bias with DRM_BUDDY_RANGE_ALLOCATION. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König --- drivers/gpu/drm/tests/drm_buddy_test.c | 218 + 1 file changed, 218 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index edacc1adb28f..3d4b29686132 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -14,11 +14,216 @@ #include "../lib/drm_random.h" +static unsigned int random_seed; + static inline u64 get_size(int order, u64 chunk_size) { return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_range_bias(struct kunit *test) +{ + u32 mm_size, ps, bias_size, bias_start, bias_end, bias_rem; + DRM_RND_STATE(prng, random_seed); + unsigned int i, count, *order; + struct drm_buddy mm; + LIST_HEAD(allocated); + + bias_size = SZ_1M; + ps = roundup_pow_of_two(prandom_u32_state(&prng) % bias_size); + ps = max(SZ_4K, ps); + mm_size = (SZ_8M-1) & ~(ps-1); /* Multiple roots */ + + kunit_info(test, "mm_size=%u, ps=%u\n", mm_size, ps); + + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_init(&mm, mm_size, ps), + "buddy_init failed\n"); + + count = mm_size / bias_size; + order = drm_random_order(count, &prng); + KUNIT_EXPECT_TRUE(test, order); + + /* +* Idea is to split the address space into uniform bias ranges, and then +* in some random order allocate within each bias, using various +* patterns within. This should detect if allocations leak out from a +* given bias, for example. +*/ + + for (i = 0; i < count; i++) { + LIST_HEAD(tmp); + u64 size; + + bias_start = order[i] * bias_size; + bias_end = bias_start + bias_size; + bias_rem = bias_size; + + /* internal round_up too big */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, +bias_end, bias_size + ps, bias_size, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc failed with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, bias_size, bias_size); + + /* size too big */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, +bias_end, bias_size + ps, ps, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n", + bias_start, bias_end, bias_size + ps, ps); + + /* bias range too small for size */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start + ps, +bias_end, bias_size, ps, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc didn't fail with bias(%x-%x), size=%u, ps=%u\n", + bias_start + ps, bias_end, bias_size, ps); + + /* bias misaligned */ + KUNIT_ASSERT_TRUE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start + ps, +bias_end - ps, +bias_size >> 1, bias_size >> 1, +&allocated, + DRM_BUDDY_RANGE_ALLOCATION), + "buddy_alloc h didn't fail with bias(%x-%x), size=%u, ps=%u\n", + bias_start + ps, bias_end - ps, bias_size >> 1, bias_size >> 1); + + /* single big page */ + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, bias_start, + bias_end, bias_size, bias_size, +
[PATCH 5/6] drm/xe/stolen: lower the default alignment
No need to be so aggressive here. The upper layers will already apply the needed alignment, plus some allocations might wish to skip it. Main issue is that we might want to have start/end bias range which doesn't match the default alignment which is rejected by the allocator. Signed-off-by: Matthew Auld Cc: Matt Roper --- drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c index 662f1e9bfc65..2e94f90e1018 100644 --- a/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c +++ b/drivers/gpu/drm/xe/xe_ttm_stolen_mgr.c @@ -203,7 +203,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe) { struct xe_ttm_stolen_mgr *mgr = drmm_kzalloc(&xe->drm, sizeof(*mgr), GFP_KERNEL); struct pci_dev *pdev = to_pci_dev(xe->drm.dev); - u64 stolen_size, io_size, pgsize; + u64 stolen_size, io_size; int err; if (IS_SRIOV_VF(xe)) @@ -220,10 +220,6 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe) return; } - pgsize = xe->info.vram_flags & XE_VRAM_FLAGS_NEED64K ? SZ_64K : SZ_4K; - if (pgsize < PAGE_SIZE) - pgsize = PAGE_SIZE; - /* * We don't try to attempt partial visible support for stolen vram, * since stolen is always at the end of vram, and the BAR size is pretty @@ -234,7 +230,7 @@ void xe_ttm_stolen_mgr_init(struct xe_device *xe) io_size = stolen_size; err = __xe_ttm_vram_mgr_init(xe, &mgr->base, XE_PL_STOLEN, stolen_size, -io_size, pgsize); +io_size, SZ_4K); if (err) { drm_dbg_kms(&xe->drm, "Stolen mgr init failed: %i\n", err); return; -- 2.43.0
[PATCH 3/6] drm/buddy: check range allocation matches alignment
Likely not a big deal for real users, but for consistency we should respect the min_page_size here. Main issue is that bias allocations turns into normal range allocation if the range and size matches exactly, and in the next patch we want to add some unit tests for this part of the api. Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König --- drivers/gpu/drm/drm_buddy.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index d09540d4065b..ee9913016626 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -771,8 +771,12 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, return -EINVAL; /* Actual range allocation */ - if (start + size == end) + if (start + size == end) { + if (!IS_ALIGNED(start | end, min_block_size)) + return -EINVAL; + return __drm_buddy_alloc_range(mm, start, size, NULL, blocks); + } original_size = size; original_min_size = min_block_size; -- 2.43.0
[PATCH 1/6] drm/tests/drm_buddy: fix 32b build
Doesn't seem to compile on 32b, presumably due to u64 mod/division. Simplest is to just switch over to u32 here. Also make print modifiers consistent with that. Fixes: a64056bb5a32 ("drm/tests/drm_buddy: add alloc_contiguous test") Reported-by: Geert Uytterhoeven Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: Maxime Ripard --- drivers/gpu/drm/tests/drm_buddy_test.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index fee6bec757d1..edacc1adb28f 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -21,7 +21,7 @@ static inline u64 get_size(int order, u64 chunk_size) static void drm_test_buddy_alloc_contiguous(struct kunit *test) { - u64 mm_size, ps = SZ_4K, i, n_pages, total; + u32 mm_size, ps = SZ_4K, i, n_pages, total; struct drm_buddy_block *block; struct drm_buddy mm; LIST_HEAD(left); @@ -56,30 +56,30 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, ps, ps, list, 0), - "buddy_alloc hit an error size=%d\n", + "buddy_alloc hit an error size=%u\n", ps); } while (++i < n_pages); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%d\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); drm_buddy_free_list(&mm, &middle); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 2 * ps); + "buddy_alloc didn't error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &right); KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc didn't error size=%llu\n", 3 * ps); + "buddy_alloc didn't error size=%u\n", 3 * ps); /* * At this point we should have enough contiguous space for 2 blocks, * however they are never buddies (since we freed middle and right) so @@ -88,13 +88,13 @@ static void drm_test_buddy_alloc_contiguous(struct kunit *test) KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 2 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 2 * ps); + "buddy_alloc hit an error size=%u\n", 2 * ps); drm_buddy_free_list(&mm, &left); KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, 3 * ps, ps, &allocated, DRM_BUDDY_CONTIGUOUS_ALLOCATION), - "buddy_alloc hit an error size=%d\n", 3 * ps); + "buddy_alloc hit an error size=%u\n", 3 * ps); total = 0; list_for_each_entry(block, &allocated, link) -- 2.43.0
[PATCH 2/6] drm/buddy: fix range bias
There is a corner case here where start/end is after/before the block range we are currently checking. If so we need to be sure that splitting the block will eventually give use the block size we need. To do that we should adjust the block range to account for the start/end, and only continue with the split if the size/alignment will fit the requested size. Not doing so can result in leaving split blocks unmerged when it eventually fails. Fixes: afea229fe102 ("drm: improve drm_buddy_alloc function") Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Christian König Cc: # v5.18+ --- drivers/gpu/drm/drm_buddy.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c1a99bf4dffd..d09540d4065b 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -332,6 +332,7 @@ alloc_range_bias(struct drm_buddy *mm, u64 start, u64 end, unsigned int order) { + u64 req_size = mm->chunk_size << order; struct drm_buddy_block *block; struct drm_buddy_block *buddy; LIST_HEAD(dfs); @@ -367,6 +368,15 @@ alloc_range_bias(struct drm_buddy *mm, if (drm_buddy_block_is_allocated(block)) continue; + if (block_start < start || block_end > end) { + u64 adjusted_start = max(block_start, start); + u64 adjusted_end = min(block_end, end); + + if (round_down(adjusted_end + 1, req_size) <= + round_up(adjusted_start, req_size)) + continue; + } + if (contains(start, end, block_start, block_end) && order == drm_buddy_block_order(block)) { /* -- 2.43.0
Re: [PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test
On 13/02/2024 13:52, Arunpravin Paneer Selvam wrote: Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION. References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Signed-off-by: Matthew Auld Reviewed-by: Arunpravin Paneer Selvam It looks like you changed the patch authorship here. Cc: Arunpravin Paneer Selvam Cc: Limonciello Cc: Christian König Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++ 1 file changed, 89 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index ea2af6bd9abe..fee6bec757d1 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -8,6 +8,7 @@ #include #include +#include #include @@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size) return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_contiguous(struct kunit *test) +{ + u64 mm_size, ps = SZ_4K, i, n_pages, total; + struct drm_buddy_block *block; + struct drm_buddy mm; + LIST_HEAD(left); + LIST_HEAD(middle); + LIST_HEAD(right); + LIST_HEAD(allocated); + + mm_size = 16 * 3 * SZ_4K; + + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + /* +* Idea is to fragment the address space by alternating block +* allocations between three different lists; one for left, middle and +* right. We can then free a list to simulate fragmentation. In +* particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION, +* including the try_harder path. +*/ + + i = 0; + n_pages = mm_size / ps; + do { + struct list_head *list; + int slot = i % 3; + + if (slot == 0) + list = &left; + else if (slot == 1) + list = &middle; + else + list = &right; + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, 0, mm_size, + ps, ps, list, 0), + "buddy_alloc hit an error size=%d\n", + ps); + } while (++i < n_pages); + + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%d\n", 3 * ps); + + drm_buddy_free_list(&mm, &middle); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 2 * ps); + + drm_buddy_free_list(&mm, &right); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + /* +* At this point we should have enough contiguous space for 2 blocks, +* however they are never buddies (since we freed middle and right) so +* will require the try_harder logic to find them. +*/ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 2 * ps); + + drm_buddy_free_list(&mm, &left); + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 3 * p
Re: [PATCH] drm/tests/drm_buddy: add alloc_contiguous test
On 12/02/2024 08:23, Arunpravin Paneer Selvam wrote: Hi Matthew, Can I push this test case along with the bug fix patch. Sure. Please go ahead. Thanks, Arun. On 2/8/2024 8:06 PM, Matthew Auld wrote: Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION. References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Limonciello Cc: Christian König --- drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++ 1 file changed, 89 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index ea2af6bd9abe..4215d8b5fcf0 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -8,6 +8,7 @@ #include #include +#include #include @@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size) return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_contiguous(struct kunit *test) +{ + u64 mm_size, ps = SZ_4K, i, n_pages, total; + struct drm_buddy_block *block; + struct drm_buddy mm; + LIST_HEAD(left); + LIST_HEAD(middle); + LIST_HEAD(right); + LIST_HEAD(allocated); + + mm_size = 16 * 3 * SZ_4K; + + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + /* + * Idea is to fragment the address space by alternating block + * allocations between three different lists; one for left, middle and + * right. We can then free a list to simulate fragmentation. In + * particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION, + * including the try_harder path. + */ + + i = 0; + n_pages = mm_size / ps; + do { + struct list_head *list; + int slot = i % 3; + + if (slot == 0) + list = &left; + else if (slot == 1) + list = &middle; + else + list = &right; + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, 0, mm_size, + ps, ps, list, 0), + "buddy_alloc hit an error size=%d\n", + ps); + } while (++i < n_pages); + + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%d\n", 3 * ps); + + drm_buddy_free_list(&mm, &middle); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 2 * ps); + + drm_buddy_free_list(&mm, &right); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + /* + * At this point we should have enough contiguous space for 2 blocks, + * however they are never buddies (since we freed middle and right) so + * will require the try_harder logic to find them. + */ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 2 * ps); + + drm_buddy_free_list(&mm, &left); + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 3 * ps); + + total = 0; + list_for_each_entry(block, &allocated, link) + total += drm_buddy_block_size(&mm, block); + + KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3); + + drm_buddy_free_list(&mm, &allocated); + drm_buddy_fini(&mm); +} + static void drm_test_buddy_alloc_pathological(struct kunit *test) { u64 mm_size, size, start = 0; @@ -280,6 +368,7 @@ static struct kunit_case drm_buddy_tests[] = { KUNIT_CASE(drm_test_buddy_alloc_optimistic), KUNIT_CASE(drm_test_buddy_alloc_pessimistic), KUNIT_CASE(drm_test_buddy_alloc_pathological), + KUNIT_CASE(drm_test_buddy_alloc_contiguous), {} };
Re: [PATCH] drm/buddy: Fix alloc_range() error handling code
On 08/02/2024 14:17, Matthew Auld wrote: On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote: Hi Matthew, On 2/8/2024 7:00 PM, Matthew Auld wrote: On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: Few users have observed display corruption when they boot the machine to KDE Plasma or playing games. We have root caused the problem that whenever alloc_range() couldn't find the required memory blocks the function was returning SUCCESS in some of the corner cases. Can you please give an example here? In the try hard contiguous allocation, for example the requested memory is 1024 pages, it might go and pick the highest and last block (of size 512 pages) in the freelist where there are no more space exist in the total address range. In this kind of corner case, alloc_range was returning success though the allocated size is less than the requested size. Hence in try_hard_contiguous_allocation, we will not proceed to the LHS allocation and we return only with the RHS allocation having only the 512 pages of allocation. This leads to display corruption in many use cases (I think mainly when requested for contiguous huge buffer) mainly on APU platforms. Ok, I guess other thing is doing: lhs_offset = drm_buddy_block_offset(block) - lhs_size; I presume it's possible for block_offset < lhs_size here, which might be funny? I think would also be good to add some basic unit test here: https://patchwork.freedesktop.org/patch/577497/?series=129671&rev=1 Test passes with your patch, and ofc fails without it. Just the question of the lhs_offset above, Reviewed-by: Matthew Auld Thanks, Arun. The right approach would be if the total allocated size is less than the required size, the function should return -ENOSPC. Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") Signed-off-by: Arunpravin Paneer Selvam Tested-by: Mario Limonciello --- drivers/gpu/drm/drm_buddy.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..c1a99bf4dffd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, } while (1); list_splice_tail(&allocated, blocks); + + if (total_allocated < size) { + err = -ENOSPC; + goto err_free; + } + return 0; err_undo:
[PATCH] drm/tests/drm_buddy: add alloc_contiguous test
Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION. References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Signed-off-by: Matthew Auld Cc: Arunpravin Paneer Selvam Cc: Limonciello Cc: Christian König --- drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++ 1 file changed, 89 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index ea2af6bd9abe..4215d8b5fcf0 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -8,6 +8,7 @@ #include #include +#include #include @@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size) return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_contiguous(struct kunit *test) +{ + u64 mm_size, ps = SZ_4K, i, n_pages, total; + struct drm_buddy_block *block; + struct drm_buddy mm; + LIST_HEAD(left); + LIST_HEAD(middle); + LIST_HEAD(right); + LIST_HEAD(allocated); + + mm_size = 16 * 3 * SZ_4K; + + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + /* +* Idea is to fragment the address space by alternating block +* allocations between three different lists; one for left, middle and +* right. We can then free a list to simulate fragmentation. In +* particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION, +* including the try_harder path. +*/ + + i = 0; + n_pages = mm_size / ps; + do { + struct list_head *list; + int slot = i % 3; + + if (slot == 0) + list = &left; + else if (slot == 1) + list = &middle; + else + list = &right; + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, 0, mm_size, + ps, ps, list, 0), + "buddy_alloc hit an error size=%d\n", + ps); + } while (++i < n_pages); + + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%d\n", 3 * ps); + + drm_buddy_free_list(&mm, &middle); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 2 * ps); + + drm_buddy_free_list(&mm, &right); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + /* +* At this point we should have enough contiguous space for 2 blocks, +* however they are never buddies (since we freed middle and right) so +* will require the try_harder logic to find them. +*/ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 2 * ps); + + drm_buddy_free_list(&mm, &left); + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 3 * ps); + + total = 0; + list_for_each_entry(block, &allocated, link) + total += drm_buddy_block_size(&mm, block); + + KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3); + + drm_buddy_free_list(&
Re: [PATCH] drm/buddy: Fix alloc_range() error handling code
On 08/02/2024 13:47, Arunpravin Paneer Selvam wrote: Hi Matthew, On 2/8/2024 7:00 PM, Matthew Auld wrote: On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: Few users have observed display corruption when they boot the machine to KDE Plasma or playing games. We have root caused the problem that whenever alloc_range() couldn't find the required memory blocks the function was returning SUCCESS in some of the corner cases. Can you please give an example here? In the try hard contiguous allocation, for example the requested memory is 1024 pages, it might go and pick the highest and last block (of size 512 pages) in the freelist where there are no more space exist in the total address range. In this kind of corner case, alloc_range was returning success though the allocated size is less than the requested size. Hence in try_hard_contiguous_allocation, we will not proceed to the LHS allocation and we return only with the RHS allocation having only the 512 pages of allocation. This leads to display corruption in many use cases (I think mainly when requested for contiguous huge buffer) mainly on APU platforms. Ok, I guess other thing is doing: lhs_offset = drm_buddy_block_offset(block) - lhs_size; I presume it's possible for block_offset < lhs_size here, which might be funny? Thanks, Arun. The right approach would be if the total allocated size is less than the required size, the function should return -ENOSPC. Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") Signed-off-by: Arunpravin Paneer Selvam Tested-by: Mario Limonciello --- drivers/gpu/drm/drm_buddy.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..c1a99bf4dffd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, } while (1); list_splice_tail(&allocated, blocks); + + if (total_allocated < size) { + err = -ENOSPC; + goto err_free; + } + return 0; err_undo:
Re: [PATCH] drm/buddy: Fix alloc_range() error handling code
On 07/02/2024 17:44, Arunpravin Paneer Selvam wrote: Few users have observed display corruption when they boot the machine to KDE Plasma or playing games. We have root caused the problem that whenever alloc_range() couldn't find the required memory blocks the function was returning SUCCESS in some of the corner cases. Can you please give an example here? The right approach would be if the total allocated size is less than the required size, the function should return -ENOSPC. Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") Signed-off-by: Arunpravin Paneer Selvam Tested-by: Mario Limonciello --- drivers/gpu/drm/drm_buddy.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..c1a99bf4dffd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, } while (1); list_splice_tail(&allocated, blocks); + + if (total_allocated < size) { + err = -ENOSPC; + goto err_free; + } + return 0; err_undo:
Re: [PATCH v3 1/2] drm/buddy: Implement tracking clear page feature
On 30/01/2024 20:30, Arunpravin Paneer Selvam wrote: Hi Matthew, On 12/21/2023 12:51 AM, Matthew Auld wrote: Hi, On 14/12/2023 13:42, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. I was not involved, but it looks like we have also tried enabling the clear-on-free idea for VRAM in i915 and then also tracking that in the allocator, however that work unfortunately is not upstream. The code is open source though: https://github.com/intel-gpu/intel-gpu-i915-backports/blob/backport/main/drivers/gpu/drm/i915/i915_buddy.c#L300 It looks like some of the design differences there are having two separate free lists, so mm->clean and mm->dirty (sounds reasonable to me). And also the inclusion of a de-fragmentation routine, since buddy blocks are now not always merged back, we might choose to run the defrag in some cases, which also sounds reasonable. IIRC in amdgpu userspace can control the page-size for an allocation, so perhaps you would want to run it first if the allocation fails, before trying to evict stuff? I checked the clear-on-free idea implemented in i915. In amdgpu version, we are clearing all the blocks in amdgpu free routine and DRM buddy expects only the DRM_BUDDY_CLEARED flag. Basically, we are keeping the cleared blocks ready to be allocated when the user request for the cleared memory. We observed that this improves the performance on games and resolves the stutter issues as well. I see i915 active fences part does the same job for i915. Could we move this part into i915 free routine and set the DRM_BUDDY_CLEARED flag. On de-fragmentation , I have included a function which can be called at places where we get -ENOSPC. This routine will merge back the clear and dirty blocks together to form a larger block of requested size. I am wondering where we could use this routine as for the non-contiguous memory we have the fallback method and for the contiguous memory we have the try harder method which searches through the tree. Don't you also want to call it from your vram manager when the requested page size is something large, before trying to evict stuff? That could now fail due to fragmention IIUC. Or am I misreading mdgpu_vram_mgr_new()? I agree we can have 2 lists (clear list and dirty list) and this would reduce the search iterations. But we need to handle the 2 lists design in all the functions which might require more time for testing on all platforms. Could we just go ahead with 1 list (free list) for now and I am going to take up this work as my next task. Sounds good. Thanks, Arun. v1: (Christian) - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list. - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy) Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 169 +++--- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c | 10 +- include/drm/drm_buddy.h | 18 +- 5 files changed, 168 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 08916538a615..d0e199cc8f17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -556,7 +556,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -589,7 +589,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -897,7 +897,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
Re: [PATCH v5 1/3] drm/buddy: Implement tracking clear page feature
On 30/01/2024 19:48, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. v1: (Christian) - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list. - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy) Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 169 +++--- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c| 10 +- include/drm/drm_buddy.h | 18 +- 5 files changed, 168 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 08916538a615..d0e199cc8f17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -556,7 +556,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -589,7 +589,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -897,7 +897,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..d44172f23f05 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm, __list_add(&block->link, node->link.prev, &node->link); } +static void clear_reset(struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_CLEAR; +} + +static void mark_cleared(struct drm_buddy_block *block) +{ + block->header |= DRM_BUDDY_HEADER_CLEAR; +} + static void mark_allocated(struct drm_buddy_block *block) { block->header &= ~DRM_BUDDY_HEADER_STATE; @@ -223,6 +233,12 @@ static int split_block(struct drm_buddy *mm, mark_free(mm, block->left); mark_free(mm, block->right); + if (drm_buddy_block_is_clear(block)) { + mark_cleared(block->left); + mark_cleared(block->right); + clear_reset(block); + } + mark_split(block); return 0; @@ -273,6 +289,13 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; + + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + list_del(&buddy->link); drm_block_free(mm, block); @@ -295,6 +318,9 @@ void drm_buddy_free_block(struct drm_buddy *mm, { BUG_ON(!drm_buddy_block_is_allocated(block)); mm->avail += drm_buddy_block_size(mm, block); + if (drm_buddy_block_is_clear(block)) + mm->clear_avail += drm_buddy_block_size(mm, block); + __drm_buddy_free(mm, block); } EXPORT_SYMBOL(drm_buddy_free_block); @@ -305,10 +331,20 @@ EXPORT_SYMBOL(drm_buddy_free_block); * @mm: DRM buddy manager * @objects: input list head to free blocks */ -void drm_buddy_free_list(struct drm_buddy *mm, struct list_head *objects) +void drm_buddy_free_list(struct drm_buddy *mm, +struct list_head *objects, +unsigned long flags) { struct drm_buddy_block *block, *on; + if (flags &
Re: [PATCH] drm/doc/rfc: Removing missing reference to xe.rst
On 19/01/2024 16:25, Rodrigo Vivi wrote: On Tue, Jan 16, 2024 at 05:03:31PM -0500, Rodrigo Vivi wrote: The file has already been deleted as the tasks were completed. However the index reference was missed behind. Gentle ping on this one. I should have mentioned here that this fixes a doc build warning: Documentation/gpu/rfc/index.rst:35: WARNING: toctree contains reference to nonexisting document 'gpu/rfc/xe' Fixes: d11dc7aa98e5 ("drm/doc/rfc: Remove Xe's pre-merge plan") Cc: Lucas De Marchi Signed-off-by: Rodrigo Vivi Reviewed-by: Matthew Auld
Re: [PATCH v3 1/2] drm/buddy: Implement tracking clear page feature
Hi, On 14/12/2023 13:42, Arunpravin Paneer Selvam wrote: - Add tracking clear page feature. - Driver should enable the DRM_BUDDY_CLEARED flag if it successfully clears the blocks in the free path. On the otherhand, DRM buddy marks each block as cleared. - Track the available cleared pages size - If driver requests cleared memory we prefer cleared memory but fallback to uncleared if we can't find the cleared blocks. when driver requests uncleared memory we try to use uncleared but fallback to cleared memory if necessary. - When a block gets freed we clear it and mark the freed block as cleared, when there are buddies which are cleared as well we can merge them. Otherwise, we prefer to keep the blocks as separated. I was not involved, but it looks like we have also tried enabling the clear-on-free idea for VRAM in i915 and then also tracking that in the allocator, however that work unfortunately is not upstream. The code is open source though: https://github.com/intel-gpu/intel-gpu-i915-backports/blob/backport/main/drivers/gpu/drm/i915/i915_buddy.c#L300 It looks like some of the design differences there are having two separate free lists, so mm->clean and mm->dirty (sounds reasonable to me). And also the inclusion of a de-fragmentation routine, since buddy blocks are now not always merged back, we might choose to run the defrag in some cases, which also sounds reasonable. IIRC in amdgpu userspace can control the page-size for an allocation, so perhaps you would want to run it first if the allocation fails, before trying to evict stuff? v1: (Christian) - Depends on the flag check DRM_BUDDY_CLEARED, enable the block as cleared. Else, reset the clear flag for each block in the list. - For merging the 2 cleared blocks compare as below, drm_buddy_is_clear(block) != drm_buddy_is_clear(buddy) Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 6 +- drivers/gpu/drm/drm_buddy.c | 169 +++--- drivers/gpu/drm/i915/i915_ttm_buddy_manager.c | 6 +- drivers/gpu/drm/tests/drm_buddy_test.c| 10 +- include/drm/drm_buddy.h | 18 +- 5 files changed, 168 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index 08916538a615..d0e199cc8f17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -556,7 +556,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, return 0; error_free_blocks: - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); error_fini: ttm_resource_fini(man, &vres->base); @@ -589,7 +589,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager *man, amdgpu_vram_mgr_do_reserve(man); - drm_buddy_free_list(mm, &vres->blocks); + drm_buddy_free_list(mm, &vres->blocks, 0); mutex_unlock(&mgr->lock); atomic64_sub(vis_usage, &mgr->vis_usage); @@ -897,7 +897,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev) kfree(rsv); list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) { - drm_buddy_free_list(&mgr->mm, &rsv->allocated); + drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0); kfree(rsv); } if (!adev->gmc.is_app_apu) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..d44172f23f05 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm, __list_add(&block->link, node->link.prev, &node->link); } +static void clear_reset(struct drm_buddy_block *block) +{ + block->header &= ~DRM_BUDDY_HEADER_CLEAR; +} + +static void mark_cleared(struct drm_buddy_block *block) +{ + block->header |= DRM_BUDDY_HEADER_CLEAR; +} + static void mark_allocated(struct drm_buddy_block *block) { block->header &= ~DRM_BUDDY_HEADER_STATE; @@ -223,6 +233,12 @@ static int split_block(struct drm_buddy *mm, mark_free(mm, block->left); mark_free(mm, block->right); + if (drm_buddy_block_is_clear(block)) { + mark_cleared(block->left); + mark_cleared(block->right); + clear_reset(block); + } + mark_split(block); return 0; @@ -273,6 +289,13 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; + + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); +
Re: [PATCH v2 1/3] drm/buddy: Improve contiguous memory allocation
s, blocks); + return 0; I guess we could attempt to trim this also (could tweak the trim to work on multiple nodes)? But I guess in practice should be pretty meh, given that the extra rhs is hopefully not too big in the corner case where the alignment doesn't fit the min_block_size? Anyway, for patches 1-3, Reviewed-by: Matthew Auld + } else if (err != -ENOSPC) { + drm_buddy_free_list(mm, blocks); + return err; + } + /* Free blocks for the next iteration */ + drm_buddy_free_list(mm, blocks); + } + + return -ENOSPC; } /** @@ -626,7 +691,7 @@ int drm_buddy_block_trim(struct drm_buddy *mm, new_start = drm_buddy_block_offset(block); list_add(&block->tmp_link, &dfs); - err = __alloc_range(mm, &dfs, new_start, new_size, blocks); + err = __alloc_range(mm, &dfs, new_start, new_size, blocks, NULL); if (err) { mark_allocated(block); mm->avail -= drm_buddy_block_size(mm, block); @@ -645,7 +710,7 @@ EXPORT_SYMBOL(drm_buddy_block_trim); * @start: start of the allowed range for this block * @end: end of the allowed range for this block * @size: size of the allocation - * @min_page_size: alignment of the allocation + * @min_block_size: alignment of the allocation * @blocks: output list head to add allocated blocks * @flags: DRM_BUDDY_*_ALLOCATION flags * @@ -660,23 +725,24 @@ EXPORT_SYMBOL(drm_buddy_block_trim); */ int drm_buddy_alloc_blocks(struct drm_buddy *mm, u64 start, u64 end, u64 size, - u64 min_page_size, + u64 min_block_size, struct list_head *blocks, unsigned long flags) { struct drm_buddy_block *block = NULL; + u64 original_size, original_min_size; unsigned int min_order, order; - unsigned long pages; LIST_HEAD(allocated); + unsigned long pages; int err; if (size < mm->chunk_size) return -EINVAL; - if (min_page_size < mm->chunk_size) + if (min_block_size < mm->chunk_size) return -EINVAL; - if (!is_power_of_2(min_page_size)) + if (!is_power_of_2(min_block_size)) return -EINVAL; if (!IS_ALIGNED(start | end | size, mm->chunk_size)) @@ -690,14 +756,23 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, /* Actual range allocation */ if (start + size == end) - return __drm_buddy_alloc_range(mm, start, size, blocks); - - if (!IS_ALIGNED(size, min_page_size)) - return -EINVAL; + return __drm_buddy_alloc_range(mm, start, size, NULL, blocks); + + original_size = size; + original_min_size = min_block_size; + + /* Roundup the size to power of 2 */ + if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION) { + size = roundup_pow_of_two(size); + min_block_size = size; + /* Align size value to min_block_size */ + } else if (!IS_ALIGNED(size, min_block_size)) { + size = round_up(size, min_block_size); + } pages = size >> ilog2(mm->chunk_size); order = fls(pages) - 1; - min_order = ilog2(min_page_size) - ilog2(mm->chunk_size); + min_order = ilog2(min_block_size) - ilog2(mm->chunk_size); do { order = min(order, (unsigned int)fls(pages) - 1); @@ -716,6 +791,16 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, break; if (order-- == min_order) { + if (flags & DRM_BUDDY_CONTIGUOUS_ALLOCATION && + !(flags & DRM_BUDDY_RANGE_ALLOCATION)) + /* +* Try contiguous block allocation through +* try harder method +*/ + return __alloc_contig_try_harder(mm, + original_size, + original_min_size, + blocks); err = -ENOSPC; goto err_free; } @@ -732,6 +817,31 @@ int drm_buddy_alloc_blocks(struct drm_buddy *mm, break; } while (1); + /* Trim the allocated block to the required size */ + if (original_size != size) { + struct list_head *trim_list; + LIST_HEAD(temp); + u64 trim_size; + + trim_list = &am
Re: [PATCH 1/3] drm/buddy: Fix contiguous memory allocation issues
Hi, On 21/08/2023 11:14, Arunpravin Paneer Selvam wrote: The way now contiguous requests are implemented such that the size rounded up to power of 2 and the corresponding order block picked from the freelist. In addition to the older method, the new method will rounddown the size to power of 2 and the corresponding order block picked from the freelist. And for the remaining size we traverse the tree and try to allocate either from the freelist block's buddy or from the peer block. If the remaining size from peer/buddy block is not free, we pick the next freelist block and repeat the same method. Moved contiguous/alignment size computation part and trim function to the drm buddy manager. I think we should also mention somewhere what issue this is trying to solve. IIUC the roundup_power_of_two() might in some cases trigger -ENOSPC even though there might be enough free space, and so to help with that we introduce a try harder mechanism. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/drm_buddy.c | 253 ++-- include/drm/drm_buddy.h | 6 +- 2 files changed, 248 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 7098f125b54a..220f60c08a03 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -569,6 +569,197 @@ static int __drm_buddy_alloc_range(struct drm_buddy *mm, return __alloc_range(mm, &dfs, start, size, blocks); } +static int __alloc_contiguous_block_from_buddy(struct drm_buddy *mm, + u64 size, + u64 min_block_size, + struct drm_buddy_block *block, + struct list_head *blocks) +{ + struct drm_buddy_block *buddy, *parent = NULL; + u64 start, offset = 0; + LIST_HEAD(dfs); + int err; + + if (!block) + return -EINVAL; + + buddy = __get_buddy(block); + if (!buddy) + return -ENOSPC; + + if (drm_buddy_block_is_allocated(buddy)) + return -ENOSPC; + + parent = block->parent; + if (!parent) + return -ENOSPC; + + if (block->parent->right == block) { + u64 remaining; + + /* Compute the leftover size for allocation */ + remaining = max((size - drm_buddy_block_size(mm, buddy)), + min_block_size); + if (!IS_ALIGNED(remaining, min_block_size)) + remaining = round_up(remaining, min_block_size); + + /* Check if remaining size is greater than buddy block size */ + if (drm_buddy_block_size(mm, buddy) < remaining) + return -ENOSPC; + + offset = drm_buddy_block_size(mm, buddy) - remaining; + } + + list_add(&parent->tmp_link, &dfs); + start = drm_buddy_block_offset(parent) + offset; + + err = __alloc_range(mm, &dfs, start, size, blocks); + if (err) + return -ENOSPC; + + return 0; +} + +static int __alloc_contiguous_block_from_peer(struct drm_buddy *mm, + u64 size, + u64 min_block_size, + struct drm_buddy_block *block, + struct list_head *blocks) +{ + struct drm_buddy_block *first, *peer, *tmp; + struct drm_buddy_block *parent = NULL; + u64 start, offset = 0; + unsigned int order; + LIST_HEAD(dfs); + int err; + + if (!block) + return -EINVAL; + + order = drm_buddy_block_order(block); + /* Add freelist block to dfs list */ + list_add(&block->tmp_link, &dfs); + + tmp = block; + parent = block->parent; + while (parent) { + if (block->parent->left == block) { + if (parent->left != tmp) { + peer = parent->left; + break; + } + } else { + if (parent->right != tmp) { + peer = parent->right; + break; + } + } + + tmp = parent; + parent = tmp->parent; + } + + if (!parent) + return -ENOSPC; + + do { + if (drm_buddy_block_is_allocated(peer)) + return -ENOSPC; + /* Exit loop if peer block order is equal to block order */ + if (drm_buddy_block_order(peer) == order) + break; + + if (drm_buddy_block_is_split(peer)) { + /* Traverse down to the block order level */ +
Re: [PATCH v2] drm/ttm: fix one use-after-free
On Wed, 5 Jul 2023 at 11:08, Lang Yu wrote: > > bo->kref is increased once(kref_init()) in ttm_bo_release, > but decreased twice(ttm_bo_put()) respectively in > ttm_bo_delayed_delete and ttm_bo_cleanup_refs, > which is unbalanced. > > Just clean up bo resource in one place for a delayed deleted bo. > > Fixes: 9bff18d13473 ("drm/ttm: use per BO cleanup workers") > > [ 67.399887] refcount_t: underflow; use-after-free. > [ 67.399901] WARNING: CPU: 0 PID: 3172 at lib/refcount.c:28 > refcount_warn_saturate+0xc2/0x110 > [ 67.400124] RIP: 0010:refcount_warn_saturate+0xc2/0x110 > [ 67.400173] Call Trace: > [ 67.400176] > [ 67.400181] ttm_mem_evict_first+0x4fe/0x5b0 [ttm] > [ 67.400216] ttm_bo_mem_space+0x1e3/0x240 [ttm] > [ 67.400239] ttm_bo_validate+0xc7/0x190 [ttm] > [ 67.400253] ? ww_mutex_trylock+0x1b1/0x390 > [ 67.400266] ttm_bo_init_reserved+0x183/0x1c0 [ttm] > [ 67.400280] ? __rwlock_init+0x3d/0x70 > [ 67.400292] amdgpu_bo_create+0x1cd/0x4f0 [amdgpu] > [ 67.400607] ? __pfx_amdgpu_bo_user_destroy+0x10/0x10 [amdgpu] > [ 67.400980] amdgpu_bo_create_user+0x38/0x70 [amdgpu] > [ 67.401291] amdgpu_gem_object_create+0x77/0xb0 [amdgpu] > [ 67.401641] ? __pfx_amdgpu_bo_user_destroy+0x10/0x10 [amdgpu] > [ 67.401958] amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x228/0xa30 [amdgpu] > [ 67.402433] kfd_ioctl_alloc_memory_of_gpu+0x14e/0x390 [amdgpu] > [ 67.402824] ? lock_release+0x13f/0x290 > [ 67.402838] kfd_ioctl+0x1e0/0x640 [amdgpu] > [ 67.403205] ? __pfx_kfd_ioctl_alloc_memory_of_gpu+0x10/0x10 [amdgpu] > [ 67.403579] ? tomoyo_file_ioctl+0x19/0x20 > [ 67.403590] __x64_sys_ioctl+0x95/0xd0 > [ 67.403601] do_syscall_64+0x3b/0x90 > [ 67.403609] entry_SYSCALL_64_after_hwframe+0x72/0xdc > > Signed-off-by: Lang Yu > --- > drivers/gpu/drm/ttm/ttm_bo.c | 89 > 1 file changed, 10 insertions(+), 79 deletions(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c > index 326a3d13a829..1e073dfb1332 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -224,82 +224,6 @@ static void ttm_bo_flush_all_fences(struct > ttm_buffer_object *bo) > dma_resv_iter_end(&cursor); > } > > -/** > - * ttm_bo_cleanup_refs > - * If bo idle, remove from lru lists, and unref. > - * If not idle, block if possible. > - * > - * Must be called with lru_lock and reservation held, this function > - * will drop the lru lock and optionally the reservation lock before > returning. > - * > - * @bo:The buffer object to clean-up > - * @interruptible: Any sleeps should occur interruptibly. > - * @no_wait_gpu: Never wait for gpu. Return -EBUSY instead. > - * @unlock_resv: Unlock the reservation lock as well. > - */ > - > -static int ttm_bo_cleanup_refs(struct ttm_buffer_object *bo, > - bool interruptible, bool no_wait_gpu, > - bool unlock_resv) > -{ > - struct dma_resv *resv = &bo->base._resv; > - int ret; > - > - if (dma_resv_test_signaled(resv, DMA_RESV_USAGE_BOOKKEEP)) > - ret = 0; > - else > - ret = -EBUSY; > - > - if (ret && !no_wait_gpu) { > - long lret; > - > - if (unlock_resv) > - dma_resv_unlock(bo->base.resv); > - spin_unlock(&bo->bdev->lru_lock); > - > - lret = dma_resv_wait_timeout(resv, DMA_RESV_USAGE_BOOKKEEP, > -interruptible, > -30 * HZ); > - > - if (lret < 0) > - return lret; > - else if (lret == 0) > - return -EBUSY; > - > - spin_lock(&bo->bdev->lru_lock); > - if (unlock_resv && !dma_resv_trylock(bo->base.resv)) { > - /* > -* We raced, and lost, someone else holds the > reservation now, > -* and is probably busy in ttm_bo_cleanup_memtype_use. > -* > -* Even if it's not the case, because we finished > waiting any > -* delayed destruction would succeed, so just return > success > -* here. > -*/ > - spin_unlock(&bo->bdev->lru_lock); > - return 0; > - } > - ret = 0; > - } > - > - if (ret) { > - if (unlock_resv) > - dma_resv_unlock(bo->base.resv); > - spin_unlock(&bo->bdev->lru_lock); > - return ret; > - } > - > - spin_unlock(&bo->bdev->lru_lock); > - ttm_bo_cleanup_memtype_use(bo); > - > - if (unlock_resv) > - dma_resv_unlock(bo->base.resv); > - > - ttm_bo_put(bo); The put() here
Re: [PATCH v2] drm: fix drmm_mutex_init()
On 22/05/2023 10:43, Thomas Zimmermann wrote: Hi Am 19.05.23 um 11:07 schrieb Matthew Auld: In mutex_init() lockdep identifies a lock by defining a special static key for each lock class. However if we wrap the macro in a function, like in drmm_mutex_init(), we end up generating: int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) { static struct lock_class_key __key; __mutex_init((lock), "lock", &__key); } The static __key here is what lockdep uses to identify the lock class, however since this is just a normal function the key here will be created once, where all callers then use the same key. In effect the mutex->depmap.key will be the same pointer for different drmm_mutex_init() callers. This then results in impossible lockdep splats since lockdep thinks completely unrelated locks are the same lock class. To fix this turn drmm_mutex_init() into a macro such that it generates a different "static struct lock_class_key __key" for each invocation, which looks to be inline with what mutex_init() wants. v2: - Revamp the commit message with clearer explanation of the issue. - Rather export __drmm_mutex_release() than static inline. Reported-by: Thomas Hellström Reported-by: Sarah Walker Fixes: e13f13e039dc ("drm: Add DRM-managed mutex_init()") Cc: Stanislaw Gruszka Cc: Boris Brezillon Cc: Thomas Zimmermann Cc: Jocelyn Falempe Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Signed-off-by: Matthew Auld Acked-by: Thomas Zimmermann Shall I add the patch to drm-misc-fixes? Yes, please do. Thanks. Best regards Thomas --- drivers/gpu/drm/drm_managed.c | 22 ++ include/drm/drm_managed.h | 18 +- 2 files changed, 19 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c index 4cf214de50c4..c21c3f623033 100644 --- a/drivers/gpu/drm/drm_managed.c +++ b/drivers/gpu/drm/drm_managed.c @@ -264,28 +264,10 @@ void drmm_kfree(struct drm_device *dev, void *data) } EXPORT_SYMBOL(drmm_kfree); -static void drmm_mutex_release(struct drm_device *dev, void *res) +void __drmm_mutex_release(struct drm_device *dev, void *res) { struct mutex *lock = res; mutex_destroy(lock); } - -/** - * drmm_mutex_init - &drm_device-managed mutex_init() - * @dev: DRM device - * @lock: lock to be initialized - * - * Returns: - * 0 on success, or a negative errno code otherwise. - * - * This is a &drm_device-managed version of mutex_init(). The initialized - * lock is automatically destroyed on the final drm_dev_put(). - */ -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) -{ - mutex_init(lock); - - return drmm_add_action_or_reset(dev, drmm_mutex_release, lock); -} -EXPORT_SYMBOL(drmm_mutex_init); +EXPORT_SYMBOL(__drmm_mutex_release); diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h index 359883942612..ad08f834af40 100644 --- a/include/drm/drm_managed.h +++ b/include/drm/drm_managed.h @@ -105,6 +105,22 @@ char *drmm_kstrdup(struct drm_device *dev, const char *s, gfp_t gfp); void drmm_kfree(struct drm_device *dev, void *data); -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock); +void __drmm_mutex_release(struct drm_device *dev, void *res); + +/** + * drmm_mutex_init - &drm_device-managed mutex_init() + * @dev: DRM device + * @lock: lock to be initialized + * + * Returns: + * 0 on success, or a negative errno code otherwise. + * + * This is a &drm_device-managed version of mutex_init(). The initialized + * lock is automatically destroyed on the final drm_dev_put(). + */ +#define drmm_mutex_init(dev, lock) ({ \ + mutex_init(lock); \ + drmm_add_action_or_reset(dev, __drmm_mutex_release, lock); \ +}) \ #endif
[PATCH v2] drm: fix drmm_mutex_init()
In mutex_init() lockdep identifies a lock by defining a special static key for each lock class. However if we wrap the macro in a function, like in drmm_mutex_init(), we end up generating: int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) { static struct lock_class_key __key; __mutex_init((lock), "lock", &__key); } The static __key here is what lockdep uses to identify the lock class, however since this is just a normal function the key here will be created once, where all callers then use the same key. In effect the mutex->depmap.key will be the same pointer for different drmm_mutex_init() callers. This then results in impossible lockdep splats since lockdep thinks completely unrelated locks are the same lock class. To fix this turn drmm_mutex_init() into a macro such that it generates a different "static struct lock_class_key __key" for each invocation, which looks to be inline with what mutex_init() wants. v2: - Revamp the commit message with clearer explanation of the issue. - Rather export __drmm_mutex_release() than static inline. Reported-by: Thomas Hellström Reported-by: Sarah Walker Fixes: e13f13e039dc ("drm: Add DRM-managed mutex_init()") Cc: Stanislaw Gruszka Cc: Boris Brezillon Cc: Thomas Zimmermann Cc: Jocelyn Falempe Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Signed-off-by: Matthew Auld --- drivers/gpu/drm/drm_managed.c | 22 ++ include/drm/drm_managed.h | 18 +- 2 files changed, 19 insertions(+), 21 deletions(-) diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c index 4cf214de50c4..c21c3f623033 100644 --- a/drivers/gpu/drm/drm_managed.c +++ b/drivers/gpu/drm/drm_managed.c @@ -264,28 +264,10 @@ void drmm_kfree(struct drm_device *dev, void *data) } EXPORT_SYMBOL(drmm_kfree); -static void drmm_mutex_release(struct drm_device *dev, void *res) +void __drmm_mutex_release(struct drm_device *dev, void *res) { struct mutex *lock = res; mutex_destroy(lock); } - -/** - * drmm_mutex_init - &drm_device-managed mutex_init() - * @dev: DRM device - * @lock: lock to be initialized - * - * Returns: - * 0 on success, or a negative errno code otherwise. - * - * This is a &drm_device-managed version of mutex_init(). The initialized - * lock is automatically destroyed on the final drm_dev_put(). - */ -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) -{ - mutex_init(lock); - - return drmm_add_action_or_reset(dev, drmm_mutex_release, lock); -} -EXPORT_SYMBOL(drmm_mutex_init); +EXPORT_SYMBOL(__drmm_mutex_release); diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h index 359883942612..ad08f834af40 100644 --- a/include/drm/drm_managed.h +++ b/include/drm/drm_managed.h @@ -105,6 +105,22 @@ char *drmm_kstrdup(struct drm_device *dev, const char *s, gfp_t gfp); void drmm_kfree(struct drm_device *dev, void *data); -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock); +void __drmm_mutex_release(struct drm_device *dev, void *res); + +/** + * drmm_mutex_init - &drm_device-managed mutex_init() + * @dev: DRM device + * @lock: lock to be initialized + * + * Returns: + * 0 on success, or a negative errno code otherwise. + * + * This is a &drm_device-managed version of mutex_init(). The initialized + * lock is automatically destroyed on the final drm_dev_put(). + */ +#define drmm_mutex_init(dev, lock) ({ \ + mutex_init(lock);\ + drmm_add_action_or_reset(dev, __drmm_mutex_release, lock); \ +}) \ #endif -- 2.40.1
Re: [PATCH] drm/managed: Define drmm_mutex_init() as a macro to fix lockdep
On Fri, 19 May 2023 at 09:55, Boris Brezillon wrote: > > drmm_mutex_init() needs to be defined as a macro if we want > lockdep to classify locks properly. If we don't do that, all locks > will be considered as belonging to the same lock class, leading to > false positive deadlock reports. > > Signed-off-by: Boris Brezillon > Reported-by: Sarah Walker Yeah, we also encountered the same issue. Patch is here: https://patchwork.freedesktop.org/patch/537605/?series=117891&rev=2 > --- > drivers/gpu/drm/drm_managed.c | 26 -- > include/drm/drm_managed.h | 30 +- > 2 files changed, 29 insertions(+), 27 deletions(-) > > diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c > index 4cf214de50c4..71c49819a7a2 100644 > --- a/drivers/gpu/drm/drm_managed.c > +++ b/drivers/gpu/drm/drm_managed.c > @@ -263,29 +263,3 @@ void drmm_kfree(struct drm_device *dev, void *data) > free_dr(dr_match); > } > EXPORT_SYMBOL(drmm_kfree); > - > -static void drmm_mutex_release(struct drm_device *dev, void *res) > -{ > - struct mutex *lock = res; > - > - mutex_destroy(lock); > -} > - > -/** > - * drmm_mutex_init - &drm_device-managed mutex_init() > - * @dev: DRM device > - * @lock: lock to be initialized > - * > - * Returns: > - * 0 on success, or a negative errno code otherwise. > - * > - * This is a &drm_device-managed version of mutex_init(). The initialized > - * lock is automatically destroyed on the final drm_dev_put(). > - */ > -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) > -{ > - mutex_init(lock); > - > - return drmm_add_action_or_reset(dev, drmm_mutex_release, lock); > -} > -EXPORT_SYMBOL(drmm_mutex_init); > diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h > index 359883942612..87ffb92a16ba 100644 > --- a/include/drm/drm_managed.h > +++ b/include/drm/drm_managed.h > @@ -105,6 +105,34 @@ char *drmm_kstrdup(struct drm_device *dev, const char > *s, gfp_t gfp); > > void drmm_kfree(struct drm_device *dev, void *data); > > -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock); > +/* Private function, don't use. */ > +static inline void __drmm_mutex_release(struct drm_device *dev, void *res) > +{ > + struct mutex *lock = res; > + > + mutex_destroy(lock); > +} > + > +/** > + * drmm_mutex_init - &drm_device-managed mutex_init() > + * @dev: DRM device > + * @lock: lock to be initialized > + * > + * Returns: > + * 0 on success, or a negative errno code otherwise. > + * > + * This is a &drm_device-managed version of mutex_init(). The initialized > + * lock is automatically destroyed on the final drm_dev_put(). > + * > + * This needs to be defined as a macro to let lockdep classify locks > + * properly. If we don't do that, all locks will be considered as > + * belonging to the same lock class, leading to false positive lockdep > + * reports. > + */ > +#define drmm_mutex_init(dev, lock) \ > + ({\ > + mutex_init(lock); \ > + drmm_add_action_or_reset(dev, __drmm_mutex_release, lock); \ > + }) > > #endif > -- > 2.40.1 >
Re: [PATCH v5 1/7] drm: fix drmm_mutex_init()
On 17/05/2023 17:21, Thomas Zimmermann wrote: Hi Am 17.05.23 um 17:22 schrieb Matthew Auld: In mutex_init() lockdep seems to identify a lock by defining a static key for each lock class. However if we wrap the whole thing in a function the static key will be the same for everything calling that function, which looks to be the case for drmm_mutex_init(). This then results in impossible lockdep splats since lockdep thinks completely unrelated locks are the same lock class. The other issue is that when looking at splats we lose the actual lock name, where instead of seeing something like xe->mem_access.lock for the name, we just see something generic like lock#8. Attempt to fix this by converting drmm_mutex_init() into a macro, which should ensure that mutex_init() behaves as expected. If that's what is required, then OK. But even with your commit mesage, I find it entirely non-obvious what the problem is. Isn't there a way to annotate drmm_mutex_init() so that lockdep treats it like a regular mutex_init()? AFAICT the issue is that with the existing drmm_mutex_init() we basically end up generating: int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) { static struct lock_class_key __key; __mutex_init((lock), "lock", &__key); } I think the special static __key is what lockdep uses to identify a lock class, so every time we call drmm_mutex_init() we should expect a different key. However since this is just a normal function the key will be created once and then all callers use the same key. For example, if you print mutex->depmap.key you will get the same pointer underneath for different drmm_mutex_init callers. And then ofc lockdep gets really confused. Turning it into a macro ensures that each drmm_mutex_init() generates a different "static struct lock_class_key __key" for each invocation, which looks to be inline with what mutex_init() wants. I'm not sure if there a better way to solve this... Best regards Thomas Reported-by: Thomas Hellström Fixes: e13f13e039dc ("drm: Add DRM-managed mutex_init()") Cc: Thomas Zimmermann Cc: Jocelyn Falempe Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Signed-off-by: Matthew Auld --- drivers/gpu/drm/drm_managed.c | 26 -- include/drm/drm_managed.h | 23 ++- 2 files changed, 22 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c index 4cf214de50c4..71c49819a7a2 100644 --- a/drivers/gpu/drm/drm_managed.c +++ b/drivers/gpu/drm/drm_managed.c @@ -263,29 +263,3 @@ void drmm_kfree(struct drm_device *dev, void *data) free_dr(dr_match); } EXPORT_SYMBOL(drmm_kfree); - -static void drmm_mutex_release(struct drm_device *dev, void *res) -{ - struct mutex *lock = res; - - mutex_destroy(lock); -} - -/** - * drmm_mutex_init - &drm_device-managed mutex_init() - * @dev: DRM device - * @lock: lock to be initialized - * - * Returns: - * 0 on success, or a negative errno code otherwise. - * - * This is a &drm_device-managed version of mutex_init(). The initialized - * lock is automatically destroyed on the final drm_dev_put(). - */ -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) -{ - mutex_init(lock); - - return drmm_add_action_or_reset(dev, drmm_mutex_release, lock); -} -EXPORT_SYMBOL(drmm_mutex_init); diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h index 359883942612..01f977e91933 100644 --- a/include/drm/drm_managed.h +++ b/include/drm/drm_managed.h @@ -105,6 +105,27 @@ char *drmm_kstrdup(struct drm_device *dev, const char *s, gfp_t gfp); void drmm_kfree(struct drm_device *dev, void *data); -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock); +static inline void __drmm_mutex_release(struct drm_device *dev, void *res) +{ + struct mutex *lock = res; + + mutex_destroy(lock); +} + +/** + * drmm_mutex_init - &drm_device-managed mutex_init() + * @dev: DRM device + * @lock: lock to be initialized + * + * Returns: + * 0 on success, or a negative errno code otherwise. + * + * This is a &drm_device-managed version of mutex_init(). The initialized + * lock is automatically destroyed on the final drm_dev_put(). + */ +#define drmm_mutex_init(dev, lock) ({ \ + mutex_init(lock); \ + drmm_add_action_or_reset(dev, __drmm_mutex_release, lock); \ +}) \ #endif
Re: [PATCH v5 1/7] drm: fix drmm_mutex_init()
On 17/05/2023 17:05, Stanislaw Gruszka wrote: On Wed, May 17, 2023 at 04:22:38PM +0100, Matthew Auld wrote: In mutex_init() lockdep seems to identify a lock by defining a static key for each lock class. However if we wrap the whole thing in a function the static key will be the same for everything calling that function, which looks to be the case for drmm_mutex_init(). This then results in impossible lockdep splats since lockdep thinks completely unrelated locks are the same lock class. The other issue is that when looking at splats we lose the actual lock name, where instead of seeing something like xe->mem_access.lock for the name, we just see something generic like lock#8. Attempt to fix this by converting drmm_mutex_init() into a macro, which should ensure that mutex_init() behaves as expected. Nice catch :-) we observed lockdep deadlock false alarms too, but I could not spot it out and were adding lockdep_set_class(key) to avoid those. +static inline void __drmm_mutex_release(struct drm_device *dev, void *res) Can this be inline if used in drmm_add_action_or_reset() ? I think so. Did I missing something here? It at least builds for me. +{ + struct mutex *lock = res; + + mutex_destroy(lock); +} + +/** + * drmm_mutex_init - &drm_device-managed mutex_init() + * @dev: DRM device + * @lock: lock to be initialized + * + * Returns: + * 0 on success, or a negative errno code otherwise. + * + * This is a &drm_device-managed version of mutex_init(). The initialized + * lock is automatically destroyed on the final drm_dev_put(). + */ +#define drmm_mutex_init(dev, lock) ({ \ + mutex_init(lock);\ + drmm_add_action_or_reset(dev, __drmm_mutex_release, lock); \ +}) \ Regards Stanislaw
[PATCH v5 1/7] drm: fix drmm_mutex_init()
In mutex_init() lockdep seems to identify a lock by defining a static key for each lock class. However if we wrap the whole thing in a function the static key will be the same for everything calling that function, which looks to be the case for drmm_mutex_init(). This then results in impossible lockdep splats since lockdep thinks completely unrelated locks are the same lock class. The other issue is that when looking at splats we lose the actual lock name, where instead of seeing something like xe->mem_access.lock for the name, we just see something generic like lock#8. Attempt to fix this by converting drmm_mutex_init() into a macro, which should ensure that mutex_init() behaves as expected. Reported-by: Thomas Hellström Fixes: e13f13e039dc ("drm: Add DRM-managed mutex_init()") Cc: Thomas Zimmermann Cc: Jocelyn Falempe Cc: Daniel Vetter Cc: dri-devel@lists.freedesktop.org Signed-off-by: Matthew Auld --- drivers/gpu/drm/drm_managed.c | 26 -- include/drm/drm_managed.h | 23 ++- 2 files changed, 22 insertions(+), 27 deletions(-) diff --git a/drivers/gpu/drm/drm_managed.c b/drivers/gpu/drm/drm_managed.c index 4cf214de50c4..71c49819a7a2 100644 --- a/drivers/gpu/drm/drm_managed.c +++ b/drivers/gpu/drm/drm_managed.c @@ -263,29 +263,3 @@ void drmm_kfree(struct drm_device *dev, void *data) free_dr(dr_match); } EXPORT_SYMBOL(drmm_kfree); - -static void drmm_mutex_release(struct drm_device *dev, void *res) -{ - struct mutex *lock = res; - - mutex_destroy(lock); -} - -/** - * drmm_mutex_init - &drm_device-managed mutex_init() - * @dev: DRM device - * @lock: lock to be initialized - * - * Returns: - * 0 on success, or a negative errno code otherwise. - * - * This is a &drm_device-managed version of mutex_init(). The initialized - * lock is automatically destroyed on the final drm_dev_put(). - */ -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock) -{ - mutex_init(lock); - - return drmm_add_action_or_reset(dev, drmm_mutex_release, lock); -} -EXPORT_SYMBOL(drmm_mutex_init); diff --git a/include/drm/drm_managed.h b/include/drm/drm_managed.h index 359883942612..01f977e91933 100644 --- a/include/drm/drm_managed.h +++ b/include/drm/drm_managed.h @@ -105,6 +105,27 @@ char *drmm_kstrdup(struct drm_device *dev, const char *s, gfp_t gfp); void drmm_kfree(struct drm_device *dev, void *data); -int drmm_mutex_init(struct drm_device *dev, struct mutex *lock); +static inline void __drmm_mutex_release(struct drm_device *dev, void *res) +{ + struct mutex *lock = res; + + mutex_destroy(lock); +} + +/** + * drmm_mutex_init - &drm_device-managed mutex_init() + * @dev: DRM device + * @lock: lock to be initialized + * + * Returns: + * 0 on success, or a negative errno code otherwise. + * + * This is a &drm_device-managed version of mutex_init(). The initialized + * lock is automatically destroyed on the final drm_dev_put(). + */ +#define drmm_mutex_init(dev, lock) ({ \ + mutex_init(lock);\ + drmm_add_action_or_reset(dev, __drmm_mutex_release, lock); \ +}) \ #endif -- 2.40.1
Re: [PATCH 7/7] drm/i915: Allow user to set cache at BO creation
On Sat, 1 Apr 2023 at 07:37, wrote: > > From: Fei Yang > > To comply with the design that buffer objects shall have immutable > cache setting through out its life cycle, {set, get}_caching ioctl's > are no longer supported from MTL onward. With that change caching > policy can only be set at object creation time. The current code > applies a default (platform dependent) cache setting for all objects. > However this is not optimal for performance tuning. The patch extends > the existing gem_create uAPI to let user set PAT index for the object > at creation time. > The new extension is platform independent, so UMD's can switch to using > this extension for older platforms as well, while {set, get}_caching are > still supported on these legacy paltforms for compatibility reason. Do we forbid {set, get}_caching, when combined with this new extension on the same BO? There is some documentation in @cache_dirty. The concern is being able to subvert the flush-on-acquire for non-LLC. > > Cc: Chris Wilson > Cc: Matt Roper > Signed-off-by: Fei Yang > Reviewed-by: Andi Shyti > --- > drivers/gpu/drm/i915/gem/i915_gem_create.c | 33 > include/uapi/drm/i915_drm.h| 36 ++ > tools/include/uapi/drm/i915_drm.h | 36 ++ > 3 files changed, 105 insertions(+) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_create.c > b/drivers/gpu/drm/i915/gem/i915_gem_create.c > index e76c9703680e..1c6e2034d28e 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_create.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_create.c > @@ -244,6 +244,7 @@ struct create_ext { > unsigned int n_placements; > unsigned int placement_mask; > unsigned long flags; > + unsigned int pat_index; > }; > > static void repr_placements(char *buf, size_t size, > @@ -393,11 +394,39 @@ static int ext_set_protected(struct i915_user_extension > __user *base, void *data > return 0; > } > > +static int ext_set_pat(struct i915_user_extension __user *base, void *data) > +{ > + struct create_ext *ext_data = data; > + struct drm_i915_private *i915 = ext_data->i915; > + struct drm_i915_gem_create_ext_set_pat ext; > + unsigned int max_pat_index; > + > + BUILD_BUG_ON(sizeof(struct drm_i915_gem_create_ext_set_pat) != > +offsetofend(struct drm_i915_gem_create_ext_set_pat, > rsvd)); > + > + if (copy_from_user(&ext, base, sizeof(ext))) > + return -EFAULT; > + > + max_pat_index = INTEL_INFO(i915)->max_pat_index; > + > + if (ext.pat_index > max_pat_index) { > + drm_dbg(&i915->drm, "PAT index is invalid: %u\n", > + ext.pat_index); > + return -EINVAL; > + } > + > + ext_data->pat_index = ext.pat_index; > + > + return 0; > +} > + > static const i915_user_extension_fn create_extensions[] = { > [I915_GEM_CREATE_EXT_MEMORY_REGIONS] = ext_set_placements, > [I915_GEM_CREATE_EXT_PROTECTED_CONTENT] = ext_set_protected, > + [I915_GEM_CREATE_EXT_SET_PAT] = ext_set_pat, > }; > > +#define PAT_INDEX_NOT_SET 0x > /** > * Creates a new mm object and returns a handle to it. > * @dev: drm device pointer > @@ -417,6 +446,7 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void > *data, > if (args->flags & ~I915_GEM_CREATE_EXT_FLAG_NEEDS_CPU_ACCESS) > return -EINVAL; > > + ext_data.pat_index = PAT_INDEX_NOT_SET; > ret = i915_user_extensions(u64_to_user_ptr(args->extensions), >create_extensions, >ARRAY_SIZE(create_extensions), > @@ -453,5 +483,8 @@ i915_gem_create_ext_ioctl(struct drm_device *dev, void > *data, > if (IS_ERR(obj)) > return PTR_ERR(obj); > > + if (ext_data.pat_index != PAT_INDEX_NOT_SET) > + i915_gem_object_set_pat_index(obj, ext_data.pat_index); > + > return i915_gem_publish(obj, file, &args->size, &args->handle); > } > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h > index dba7c5a5b25e..03c5c314846e 100644 > --- a/include/uapi/drm/i915_drm.h > +++ b/include/uapi/drm/i915_drm.h > @@ -3630,9 +3630,13 @@ struct drm_i915_gem_create_ext { > * > * For I915_GEM_CREATE_EXT_PROTECTED_CONTENT usage see > * struct drm_i915_gem_create_ext_protected_content. > +* > +* For I915_GEM_CREATE_EXT_SET_PAT usage see > +* struct drm_i915_gem_create_ext_set_pat. > */ > #define I915_GEM_CREATE_EXT_MEMORY_REGIONS 0 > #define I915_GEM_CREATE_EXT_PROTECTED_CONTENT 1 > +#define I915_GEM_CREATE_EXT_SET_PAT 2 > __u64 extensions; > }; > > @@ -3747,6 +3751,38 @@ struct drm_i915_gem_create_ext_protected_content { > __u32 flags; > }; > > +/** > + * struct drm_i915_gem_create_ext_set_pat - The > + * I915_GEM_CREATE_EXT_SET_PAT extension. > + *
Re: [PATCH] drm/i915: Fix context runtime accounting
On Mon, 20 Mar 2023 at 15:14, Tvrtko Ursulin wrote: > > From: Tvrtko Ursulin > > When considering whether to mark one context as stopped and another as > started we need to look at whether the previous and new _contexts_ are > different and not just requests. Otherwise the software tracked context > start time was incorrectly updated to the most recent lite-restore time- > stamp, which was in some cases resulting in active time going backward, > until the context switch (typically the hearbeat pulse) would synchronise > with the hardware tracked context runtime. Easiest use case to observe > this behaviour was with a full screen clients with close to 100% engine > load. > > Signed-off-by: Tvrtko Ursulin > Fixes: bb6287cb1886 ("drm/i915: Track context current active time") > Cc: # v5.19+ Seems reasonable to me, fwiw, Reviewed-by: Matthew Auld
Re: [PATCH] drm/ttm: drop extra ttm_bo_put in ttm_bo_cleanup_refs
On Thu, 16 Mar 2023 at 07:26, Christian König wrote: > > That was accidentially left over when we switched to the delayed delete > worker. > > Suggested-by: Matthew Auld > Signed-off-by: Christian König > Fixes: ("9bff18d13473") drm/ttm: use per BO cleanup workers > Reported-by: Steven Rostedt (Google) > Tested-by: Steven Rostedt (Google) Reviewed-by: Matthew Auld
Re: [Intel-gfx] [BUG 6.3-rc1] Bad lock in ttm_bo_delayed_delete()
On Wed, 15 Mar 2023 at 18:41, Christian König wrote: > > Am 08.03.23 um 13:43 schrieb Steven Rostedt: > > On Wed, 8 Mar 2023 07:17:38 +0100 > > Christian König wrote: > > > >> What test case/environment do you run to trigger this? > > I'm running a 32bit x86 qemu instance. Attached is the config. > > > > The libvirt xml file is here: https://rostedt.org/vm-images/tracetest-32.xml > > and the VM image itself is here: > > https://rostedt.org/vm-images/tracetest-32.qcow2.bz2 > > I've started to download that, but it will take about an hour. So I > tried to avoid that for now. > > But looks like there isn't any other way to reproduce this, the code > seems to work with both amdgpu and radeon. > > My suspicion is that we just have a reference count issue in qxl or ttm > which was never noticed because it didn't caused any problems (except > for a minor memory corruption). Why does ttm_bo_cleanup_refs() do a bo_put() at the end? It doesn't make sense to me. Say if the BO is in the process of being delay freed (bo->deleted = true), and we just did the kref_init() in ttm_bo_release(), it might drop that ref hitting ttm_bo_release() yet again, this time doing the actual bo->destroy(), which frees the object. The worker then fires at some later point calling ttm_bo_delayed_delete(), but the BO has already been freed. > > Now you get a rain of warnings because we try to grab the lock in the > delete worker. > > Christian. > > > > > It happened again in another test (it's not 100% reproducible). > > > > [ 23.234838] [ cut here ] > > [ 23.236391] DEBUG_LOCKS_WARN_ON(lock->magic != lock) > > [ 23.236429] WARNING: CPU: 0 PID: 61 at kernel/locking/mutex.c:582 > > __ww_mutex_lock.constprop.0+0x566/0xfec > > [ 23.240990] Modules linked in: > > [ 23.242368] CPU: 0 PID: 61 Comm: kworker/0:1H Not tainted > > 6.3.0-rc1-test-1-ga98bd42762ed-dirty #972 > > [ 23.245106] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > 1.16.0-debian-1.16.0-5 04/01/2014 > > [ 23.247900] Workqueue: ttm ttm_bo_delayed_delete > > [ 23.249642] EIP: __ww_mutex_lock.constprop.0+0x566/0xfec > > [ 23.251563] Code: e8 2b 5a 95 ff 85 c0 0f 84 25 fb ff ff 8b 0d 18 71 3b > > c8 85 c9 0f 85 17 fb ff ff 68 c0 58 07 c8 68 07 77 05 c8 e8 e6 0a 40 ff > > <0f> 0b 58 5a e9 ff fa ff ff e8 f8 59 95 ff 85 c0 74 0e 8b 0d 18 71 > > [ 23.256901] EAX: 0028 EBX: ECX: c1847dd8 EDX: 0002 > > [ 23.258849] ESI: EDI: c12958bc EBP: c1847f00 ESP: c1847eac > > [ 23.260786] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00010286 > > [ 23.262840] CR0: 80050033 CR2: ffbff000 CR3: 0850e000 CR4: 00150ef0 > > [ 23.264781] Call Trace: > > [ 23.265899] ? lock_is_held_type+0xbe/0x10c > > [ 23.267434] ? ttm_bo_delayed_delete+0x30/0x94 > > [ 23.268971] ww_mutex_lock+0x32/0x94 > > [ 23.270327] ttm_bo_delayed_delete+0x30/0x94 > > [ 23.271818] process_one_work+0x21a/0x538 > > [ 23.273242] worker_thread+0x146/0x398 > > [ 23.274616] kthread+0xea/0x10c > > [ 23.275859] ? process_one_work+0x538/0x538 > > [ 23.277312] ? kthread_complete_and_exit+0x1c/0x1c > > [ 23.278899] ret_from_fork+0x1c/0x28 > > [ 23.280223] irq event stamp: 33 > > [ 23.281440] hardirqs last enabled at (33): [] > > _raw_spin_unlock_irqrestore+0x2d/0x58 > > [ 23.283860] hardirqs last disabled at (32): [] > > kvfree_call_rcu+0x155/0x2ec > > [ 23.286066] softirqs last enabled at (0): [] > > copy_process+0x989/0x2368 > > [ 23.288220] softirqs last disabled at (0): [<>] 0x0 > > [ 23.289952] ---[ end trace ]--- > > [ 23.291501] [ cut here ] > > [ 23.293027] refcount_t: underflow; use-after-free. > > [ 23.294644] WARNING: CPU: 0 PID: 61 at lib/refcount.c:28 > > refcount_warn_saturate+0xb6/0xfc > > [ 23.296959] Modules linked in: > > [ 23.298168] CPU: 0 PID: 61 Comm: kworker/0:1H Tainted: GW > > 6.3.0-rc1-test-1-ga98bd42762ed-dirty #972 > > [ 23.301073] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS > > 1.16.0-debian-1.16.0-5 04/01/2014 > > [ 23.303642] Workqueue: ttm ttm_bo_delayed_delete > > [ 23.305190] EIP: refcount_warn_saturate+0xb6/0xfc > > [ 23.306767] Code: 68 70 e1 0c c8 e8 f6 d6 a9 ff 0f 0b 58 c9 c3 90 80 3d > > 8a 78 38 c8 00 75 8a c6 05 8a 78 38 c8 01 68 9c e1 0c c8 e8 d6 d6 a9 ff > > <0f> 0b 59 c9 c3 80 3d 88 78 38 c8 00 0f 85 67 ff ff ff c6 05 88 78 > > [ 23.311935] EAX: 0026 EBX: c1295950 ECX: c1847e40 EDX: 0002 > > [ 23.313884] ESI: c12958bc EDI: f7591100 EBP: c1847f18 ESP: c1847f14 > > [ 23.315840] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 EFLAGS: 00010246 > > [ 23.317887] CR0: 80050033 CR2: ffbff000 CR3: 0850e000 CR4: 00150ef0 > > [ 23.319859] Call Trace: > > [ 23.320978] ttm_bo_delayed_delete+0x8c/0x94 > > [ 23.322492] process_one_work+0x21a/0x538 > > [ 23.323959] worker_thread+0x146/0x398 > > [ 23.325353] kthread+0xea/0x10c > > [
Re: [PATCH v4 5/5] drm/i915/gt: Make sure that errors are propagated through request chains
On 08/03/2023 09:41, Andi Shyti wrote: Currently, when we perform operations such as clearing or copying large blocks of memory, we generate multiple requests that are executed in a chain. However, if one of these requests fails, we may not realize it unless it happens to be the last request in the chain. This is because errors are not properly propagated. For this we need to keep propagating the chain of fence notification in order to always reach the final fence associated to the final request. To address this issue, we need to ensure that the chain of fence notifications is always propagated so that we can reach the final fence associated with the last request. By doing so, we will be able to detect any memory operation failures and determine whether the memory is still invalid. On copy and clear migration signal fences upon completion. On copy and clear migration, signal fences upon request completion to ensure that we have a reliable perpetuation of the operation outcome. Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration") Reported-by: Matthew Auld Suggested-by: Chris Wilson Signed-off-by: Andi Shyti Cc: sta...@vger.kernel.org Reviewed-by: Matthew Auld --- drivers/gpu/drm/i915/gt/intel_migrate.c | 41 ++--- 1 file changed, 30 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 3f638f1987968..0031e7b1b4704 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -742,13 +742,19 @@ intel_context_migrate_copy(struct intel_context *ce, dst_offset = 2 * CHUNK_SZ; } + /* +* While building the chain of requests, we need to ensure +* that no one can sneak into the timeline unnoticed. +*/ + mutex_lock(&ce->timeline->mutex); + Hmm, this looks different/new from the previous version. Why do we only do this for the copy and not the clear btw? Both should be conceptually the same. Sorry if I'm misunderstanding something here. do { int len; - rq = i915_request_create(ce); + rq = i915_request_create_locked(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto out_ce; + break; } if (deps) { @@ -878,10 +884,14 @@ intel_context_migrate_copy(struct intel_context *ce, /* Arbitration is re-enabled between requests. */ out_rq: - if (*out) + i915_sw_fence_await(&rq->submit); + i915_request_get(rq); + i915_request_add_locked(rq); + if (*out) { + i915_sw_fence_complete(&(*out)->submit); i915_request_put(*out); - *out = i915_request_get(rq); - i915_request_add(rq); + } + *out = rq; if (err) break; @@ -905,7 +915,10 @@ intel_context_migrate_copy(struct intel_context *ce, cond_resched(); } while (1); -out_ce: + mutex_unlock(&ce->timeline->mutex); + + if (*out) + i915_sw_fence_complete(&(*out)->submit); return err; } @@ -1005,7 +1018,7 @@ intel_context_migrate_clear(struct intel_context *ce, rq = i915_request_create(ce); if (IS_ERR(rq)) { err = PTR_ERR(rq); - goto out_ce; + break; } if (deps) { @@ -1056,17 +1069,23 @@ intel_context_migrate_clear(struct intel_context *ce, /* Arbitration is re-enabled between requests. */ out_rq: - if (*out) - i915_request_put(*out); - *out = i915_request_get(rq); + i915_sw_fence_await(&rq->submit); + i915_request_get(rq); i915_request_add(rq); + if (*out) { + i915_sw_fence_complete(&(*out)->submit); + i915_request_put(*out); + } + *out = rq; + if (err || !it.sg || !sg_dma_len(it.sg)) break; cond_resched(); } while (1); -out_ce: + if (*out) + i915_sw_fence_complete(&(*out)->submit); return err; }
Re: [Intel-gfx] [PATCH 1/3] drm/i915: Set I915_BO_ALLOC_USER for framebuffer
On 06/03/2023 13:31, Das, Nirmoy wrote: Hi Matt, On 3/6/2023 1:25 PM, Matthew Auld wrote: On 06/03/2023 12:07, Nirmoy Das wrote: Framebuffer is exposed to userspace so set I915_BO_ALLOC_USER flag for it. This also make sure that ttm allocates offset for lmem objects. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/i915/display/intel_dpt.c | 4 +++- drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++- drivers/gpu/drm/i915/display/intel_plane_initial.c | 3 ++- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index ad1a37b515fb..2e6238881860 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -254,7 +254,9 @@ intel_dpt_create(struct intel_framebuffer *fb) size = round_up(size * sizeof(gen8_pte_t), I915_GTT_PAGE_SIZE); - dpt_obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_CONTIGUOUS); + dpt_obj = i915_gem_object_create_lmem(i915, size, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_USER); AFAICT this is just some driver internal stuff for display page-table, which gets mapped through GGTT or something, and is not the actual fb. Is it really exposed to the user? I misunderstood this for something else. I will remove this. if (IS_ERR(dpt_obj) && i915_ggtt_has_aperture(to_gt(i915)->ggtt)) dpt_obj = i915_gem_object_create_stolen(i915, size); if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) { diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 3659350061a7..98ae3a3a986a 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper, obj = ERR_PTR(-ENODEV); if (HAS_LMEM(dev_priv)) { obj = i915_gem_object_create_lmem(dev_priv, size, - I915_BO_ALLOC_CONTIGUOUS); + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_USER); } else { /* * If the FB is too big, just don't use it since fbdev is not very diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c index bb6ea7de5c61..4a3680f6a3f5 100644 --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c @@ -110,7 +110,8 @@ initial_plane_vma(struct drm_i915_private *i915, size * 2 > i915->dsm.usable_size) return NULL; - obj = i915_gem_object_create_region_at(mem, phys_base, size, 0); + obj = i915_gem_object_create_region_at(mem, phys_base, size, + I915_BO_ALLOC_USER); ALLOC_USER has the side effect of also zeroing the memory underneath, IIRC. However this here is the pre-allocated fb (will have some boot logo stuff), so we shouldn't ever clear it. This was my concern. I wonder if there is any other better way than to use a temp buffer to copy the pre-allocated content and put it back after getting i915_gem_object_create_region_at(). If we need ALLOC_USER for this buffer then maybe just a new flag like BO_PREALLOCATED which skips all the clearing? Regards, Nirmoy if (IS_ERR(obj)) return NULL;
Re: [PATCH 1/3] drm/i915: Set I915_BO_ALLOC_USER for framebuffer
On 06/03/2023 12:07, Nirmoy Das wrote: Framebuffer is exposed to userspace so set I915_BO_ALLOC_USER flag for it. This also make sure that ttm allocates offset for lmem objects. Signed-off-by: Nirmoy Das --- drivers/gpu/drm/i915/display/intel_dpt.c | 4 +++- drivers/gpu/drm/i915/display/intel_fbdev.c | 3 ++- drivers/gpu/drm/i915/display/intel_plane_initial.c | 3 ++- 3 files changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c index ad1a37b515fb..2e6238881860 100644 --- a/drivers/gpu/drm/i915/display/intel_dpt.c +++ b/drivers/gpu/drm/i915/display/intel_dpt.c @@ -254,7 +254,9 @@ intel_dpt_create(struct intel_framebuffer *fb) size = round_up(size * sizeof(gen8_pte_t), I915_GTT_PAGE_SIZE); - dpt_obj = i915_gem_object_create_lmem(i915, size, I915_BO_ALLOC_CONTIGUOUS); + dpt_obj = i915_gem_object_create_lmem(i915, size, + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_USER); AFAICT this is just some driver internal stuff for display page-table, which gets mapped through GGTT or something, and is not the actual fb. Is it really exposed to the user? if (IS_ERR(dpt_obj) && i915_ggtt_has_aperture(to_gt(i915)->ggtt)) dpt_obj = i915_gem_object_create_stolen(i915, size); if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) { diff --git a/drivers/gpu/drm/i915/display/intel_fbdev.c b/drivers/gpu/drm/i915/display/intel_fbdev.c index 3659350061a7..98ae3a3a986a 100644 --- a/drivers/gpu/drm/i915/display/intel_fbdev.c +++ b/drivers/gpu/drm/i915/display/intel_fbdev.c @@ -163,7 +163,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper, obj = ERR_PTR(-ENODEV); if (HAS_LMEM(dev_priv)) { obj = i915_gem_object_create_lmem(dev_priv, size, - I915_BO_ALLOC_CONTIGUOUS); + I915_BO_ALLOC_CONTIGUOUS | + I915_BO_ALLOC_USER); } else { /* * If the FB is too big, just don't use it since fbdev is not very diff --git a/drivers/gpu/drm/i915/display/intel_plane_initial.c b/drivers/gpu/drm/i915/display/intel_plane_initial.c index bb6ea7de5c61..4a3680f6a3f5 100644 --- a/drivers/gpu/drm/i915/display/intel_plane_initial.c +++ b/drivers/gpu/drm/i915/display/intel_plane_initial.c @@ -110,7 +110,8 @@ initial_plane_vma(struct drm_i915_private *i915, size * 2 > i915->dsm.usable_size) return NULL; - obj = i915_gem_object_create_region_at(mem, phys_base, size, 0); + obj = i915_gem_object_create_region_at(mem, phys_base, size, + I915_BO_ALLOC_USER); ALLOC_USER has the side effect of also zeroing the memory underneath, IIRC. However this here is the pre-allocated fb (will have some boot logo stuff), so we shouldn't ever clear it. if (IS_ERR(obj)) return NULL;
Re: [Intel-gfx] [PATCH] drm/i915/gt: Make sure that errors are propagated through request chains
On Fri, 10 Feb 2023 at 14:06, Andi Shyti wrote: > > Currently, for operations like memory clear or copy for big > chunks of memory, we generate multiple requests executed in a > chain. > > But if one of the requests generated fails we would not know it > to unless it happens to the last request, because errors are not > properly propagated. > > For this we need to keep propagating the chain of fence > notification in order to always reach the final fence associated > to the final request. > > This way we would know that the memory operation has failed and > whether the memory is still invalid. > > On copy and clear migration signal fences upon completion. > > Fixes: cf586021642d80 ("drm/i915/gt: Pipelined page migration") > Reported-by: Matthew Auld > Suggested-by: Chris Wilson > Signed-off-by: Andi Shyti > Cc: sta...@vger.kernel.org Reviewed-by: Matthew Auld