Re: [PATCH v2 2/2] drm/ttm: Add a device flag to propagate -ENOSPC on OOM
Am 02.10.24 um 14:24 schrieb Thomas Hellström: Some graphics APIs differentiate between out-of-graphics-memory and out-of-host-memory (system memory). Add a device init flag to have -ENOSPC propagated from the resource managers instead of being converted to -ENOMEM, to aid driver stacks in determining what error code to return or whether corrective action can be taken at the driver level. Cc: Christian König Cc: Matthew Brost Signed-off-by: Thomas Hellström Independent of how we communicate flags to the TTM device init function this looks like the right approach to me. So feel free to add Reviewed-by: Christian König . Regards, Christian. --- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- drivers/gpu/drm/ttm/ttm_device.c | 1 + include/drm/ttm/ttm_device.h | 13 + 3 files changed, 15 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 320592435252..c4bec2ad301b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -835,7 +835,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, /* For backward compatibility with userspace */ if (ret == -ENOSPC) - return -ENOMEM; + return bo->bdev->propagate_enospc ? ret : -ENOMEM; /* * We might need to add a TTM. diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c index 0c85d10e5e0b..aee9d52d745b 100644 --- a/drivers/gpu/drm/ttm/ttm_device.c +++ b/drivers/gpu/drm/ttm/ttm_device.c @@ -203,6 +203,7 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func } bdev->funcs = funcs; + bdev->propagate_enospc = flags.propagate_enospc; ttm_sys_man_init(bdev); diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h index 1534bd946c78..f9da78bbd925 100644 --- a/include/drm/ttm/ttm_device.h +++ b/include/drm/ttm/ttm_device.h @@ -266,6 +266,13 @@ struct ttm_device { * @wq: Work queue structure for the delayed delete workqueue. */ struct workqueue_struct *wq; + + /** +* @propagate_enospc: Whether -ENOSPC should be propagated to the caller after +* graphics memory allocation failure. If false, this will be converted to +* -ENOMEM, which is the default behaviour. +*/ + bool propagate_enospc; }; int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags); @@ -295,6 +302,12 @@ struct ttm_device_init_flags { u32 use_dma_alloc : 1; /** @use_dma32: If we should use GFP_DMA32 for device memory allocations. */ u32 use_dma32 : 1; + /** +* @propagate_enospc: Whether -ENOSPC should be propagated to the caller after +* graphics memory allocation failure. If false, this will be converted to +* -ENOMEM, which is the default behaviour. +*/ + u32 propagate_enospc : 1; }; int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
Re: [PATCH v2 1/2] drm/ttm: Change ttm_device_init to use a struct instead of multiple bools
Am 02.10.24 um 14:24 schrieb Thomas Hellström: The ttm_device_init funcition uses multiple bool arguments. That means readability in the caller becomes poor, and all callers need to change if yet another bool is added. Instead use a struct with multiple single-bit flags. This addresses both problems. Prefer it over using defines or enums with explicit bit shifts, since converting to and from these bit values uses logical operations or tests which are implicit with the struct usage, and ofc type-checking. Ok, that style is new to me. I've mostly seen defined parameter flags in the kernel. It obviously has some advantages, but do we have any precedence in the kernel for using that approach? Regards, Christian. This is in preparation of adding yet another bool flag parameter to the function. Cc: Christian König Cc: amd-...@lists.freedesktop.org Cc: intel-gfx@lists.freedesktop.org Cc: nouv...@lists.freedesktop.org Cc: spice-de...@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Cc: Zack Rusin Cc: Cc: Sui Jingfeng Cc: Signed-off-by: Thomas Hellström --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 -- drivers/gpu/drm/drm_gem_vram_helper.c | 7 --- drivers/gpu/drm/i915/intel_region_ttm.c | 3 ++- drivers/gpu/drm/loongson/lsdc_ttm.c | 5 - drivers/gpu/drm/nouveau/nouveau_ttm.c | 7 +-- drivers/gpu/drm/qxl/qxl_ttm.c | 2 +- drivers/gpu/drm/radeon/radeon_ttm.c | 6 -- drivers/gpu/drm/ttm/tests/ttm_bo_test.c | 16 +++ .../gpu/drm/ttm/tests/ttm_bo_validate_test.c | 3 ++- drivers/gpu/drm/ttm/tests/ttm_device_test.c | 16 --- drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 20 --- drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h | 6 ++ drivers/gpu/drm/ttm/ttm_device.c | 7 +++ drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++-- drivers/gpu/drm/xe/xe_device.c| 3 ++- include/drm/ttm/ttm_device.h | 12 ++- 16 files changed, 71 insertions(+), 52 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 74adb983ab03..e43635ac54fd 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -1853,8 +1853,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev) r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev, adev_to_drm(adev)->anon_inode->i_mapping, adev_to_drm(adev)->vma_offset_manager, - adev->need_swiotlb, - dma_addressing_limited(adev->dev)); + (struct ttm_device_init_flags){ + .use_dma_alloc = adev->need_swiotlb, + .use_dma32 = dma_addressing_limited(adev->dev) + }); if (r) { DRM_ERROR("failed initializing buffer object driver(%d).\n", r); return r; diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c index 22b1fe9c03b8..7c3165b00378 100644 --- a/drivers/gpu/drm/drm_gem_vram_helper.c +++ b/drivers/gpu/drm/drm_gem_vram_helper.c @@ -931,9 +931,10 @@ static int drm_vram_mm_init(struct drm_vram_mm *vmm, struct drm_device *dev, vmm->vram_size = vram_size; ret = ttm_device_init(&vmm->bdev, &bo_driver, dev->dev, -dev->anon_inode->i_mapping, -dev->vma_offset_manager, -false, true); + dev->anon_inode->i_mapping, + dev->vma_offset_manager, + (struct ttm_device_init_flags) + {.use_dma32 = true}); if (ret) return ret; diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c index 04525d92bec5..db34da63814c 100644 --- a/drivers/gpu/drm/i915/intel_region_ttm.c +++ b/drivers/gpu/drm/i915/intel_region_ttm.c @@ -34,7 +34,8 @@ int intel_region_ttm_device_init(struct drm_i915_private *dev_priv) return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(), drm->dev, drm->anon_inode->i_mapping, - drm->vma_offset_manager, false, false); + drm->vma_offset_manager, + (struct ttm_device_init_flags){}); } /** diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c index 2e42c6970c9f..c684f1636f3f 100644 --- a/drivers/gpu/drm/loongson/lsdc_ttm.c +++ b/drivers/gpu/drm/loongson/lsdc_ttm
Re: RESEND Re: [RFC PATCH] dma-buf/dma-fence: Use a successful read_trylock() annotation for dma_fence_begin_signalling()
Sorry, somehow completely missed that. Feel free to push it to drm-misc-next. Christian. Am 18.09.24 um 14:34 schrieb Thomas Hellström: Christian, Ping? On Wed, 2024-08-14 at 10:37 +0200, Thomas Hellström wrote: Christian, Ack to merge this through drm-misc-next, or do you want to pick it up for dma-buf? Thanks, Thomas On Wed, 2024-08-14 at 09:10 +0200, Daniel Vetter wrote: On Fri, May 26, 2023 at 01:11:28PM +0200, Thomas Hellström wrote: Daniel, On 4/28/23 14:52, Thomas Hellström wrote: Condsider the following call sequence: /* Upper layer */ dma_fence_begin_signalling(); lock(tainted_shared_lock); /* Driver callback */ dma_fence_begin_signalling(); ... The driver might here use a utility that is annotated as intended for the dma-fence signalling critical path. Now if the upper layer isn't correctly annotated yet for whatever reason, resulting in /* Upper layer */ lock(tainted_shared_lock); /* Driver callback */ dma_fence_begin_signalling(); We will receive a false lockdep locking order violation notification from dma_fence_begin_signalling(). However entering a dma-fence signalling critical section itself doesn't block and could not cause a deadlock. So use a successful read_trylock() annotation instead for dma_fence_begin_signalling(). That will make sure that the locking order is correctly registered in the first case, and doesn't register any locking order in the second case. The alternative is of course to make sure that the "Upper layer" is always correctly annotated. But experience shows that's not easily achievable in all cases. Signed-off-by: Thomas Hellström Resurrecting the discussion on this one. I can't see a situation where we would miss *relevant* locking order violation warnings with this patch. Ofc if we have a scheduler annotation patch that would work fine as well, but the lack of annotation in the scheduler callbacks is really starting to hurt us. Yeah this is just a bit too brain-melting to review, but I concur now. Reviewed-by: Daniel Vetter I think what would help is some lockdep selftests to check that we both catch the stuff we want to, and don't incur false positives. Maybe with a plea that lockdep should have some native form of cross-release annotations ... But definitely seperate patch set, since it might take a few rounds of review by lockdep folks. -Sima Thanks, Thomas --- drivers/dma-buf/dma-fence.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma- fence.c index f177c56269bb..17f632768ef9 100644 --- a/drivers/dma-buf/dma-fence.c +++ b/drivers/dma-buf/dma-fence.c @@ -308,8 +308,8 @@ bool dma_fence_begin_signalling(void) if (in_atomic()) return true; - /* ... and non-recursive readlock */ - lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _RET_IP_); + /* ... and non-recursive successful read_trylock */ + lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _RET_IP_); return false; } @@ -340,7 +340,7 @@ void __dma_fence_might_wait(void) lock_map_acquire(&dma_fence_lockdep_map); lock_map_release(&dma_fence_lockdep_map); if (tmp) - lock_acquire(&dma_fence_lockdep_map, 0, 0, 1, 1, NULL, _THIS_IP_); + lock_acquire(&dma_fence_lockdep_map, 0, 1, 1, 1, NULL, _THIS_IP_); } #endif
Re: [PATCH v7 0/3] drm: Use full allocated minor range for DRM
Am 23.08.24 um 18:30 schrieb Michał Winiarski: 64 DRM device nodes is not enough for everyone. Upgrade it to ~512K (which definitely is more than enough). Additionally, convert minors to use XArray instead of IDR to simplify the locking. Corresponding libdrm changes were merged in: https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/305 Added my Acked-by and pushed the result to drm-misc-next. Regards, Christian. v1 -> v2: Don't touch DRM_MINOR_CONTROL and its range (Simon Ser) v2 -> v3: Don't use legacy scheme for >=192 minor range (Dave Airlie) Add modparam for testing (Dave Airlie) Add lockdep annotation for IDR (Daniel Vetter) v3 -> v4: Convert from IDR to XArray (Matthew Wilcox) v4 -> v5: Fixup IDR to XArray conversion (Matthew Wilcox) v5 -> v6: Also convert Accel to XArray Rename skip_legacy_minors to force_extended_minors v6 -> v7: Drop the force_extended_minors patch intended for debug Rebase on latest drm-tip Update the cover letter, pointing out libdrm changes Michał Winiarski (3): drm: Use XArray instead of IDR for minors accel: Use XArray instead of IDR for minors drm: Expand max DRM device number to full MINORBITS drivers/accel/drm_accel.c | 110 +++-- drivers/gpu/drm/drm_drv.c | 97 ++--- drivers/gpu/drm/drm_file.c | 2 +- drivers/gpu/drm/drm_internal.h | 4 -- include/drm/drm_accel.h| 18 +- include/drm/drm_file.h | 5 ++ 6 files changed, 62 insertions(+), 174 deletions(-)
Re: [PATCH 1/2] drm/ttm: fix kernel-doc typo for @trylock_only
Am 23.08.24 um 16:11 schrieb Jani Nikula: s/tryock_only/trylock_only/ Fixes: da966b82bf3d ("drm/ttm: Provide a generic LRU walker helper") Cc: Thomas Hellström Cc: Christian König Signed-off-by: Jani Nikula Reviewed-by: Christian König --- include/drm/ttm/ttm_bo.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index d1a732d56259..7294dde240fb 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -222,7 +222,7 @@ struct ttm_lru_walk { struct ttm_operation_ctx *ctx; /** @ticket: The struct ww_acquire_ctx if any. */ struct ww_acquire_ctx *ticket; - /** @tryock_only: Only use trylock for locking. */ + /** @trylock_only: Only use trylock for locking. */ bool trylock_only; };
Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function
Am 24.07.24 um 11:37 schrieb Matthew Auld: On 24/07/2024 02:35, Marek Olšák wrote: The reason is that our DCC requires 768K alignment in some cases. I haven't read this patch series, but one way to do that is to align to 256K, overallocate by 512K, and then not use either 0, 256K, or 512K at the beginning to get to 768K alignment. Ah, so we need a non power-of-two alignment. That makes sense, thanks. Well actually the requirement is that memory reads for scanout needs to be distributed over the memory channels in a certain way. Our hw guys just expressed that as a rather strange non-power of two alignment :) Christian. Marek On Tue, Jul 23, 2024, 11:04 Matthew Auld <mailto:matthew.a...@intel.com>> wrote: On 23/07/2024 14:43, Paneer Selvam, Arunpravin wrote: > Hi Matthew, > > Can we push this version for now as we need to mainline the DCC changes > ASAP, > while we continue our discussion and proceed to implement the permanent > solution > for address alignment? Yeah, we can always merge now and circle back around later, if this for sure helps your usecase and is needed asap. I just didn't fully get the idea for needing this interface, but likely I am missing something. > > Thanks, > Arun. > > On 7/23/2024 6:55 PM, Arunpravin Paneer Selvam wrote: >> - Add a new start parameter in trim function to specify exact >> address from where to start the trimming. This would help us >> in situations like if drivers would like to do address alignment >> for specific requirements. >> >> - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this >> flag to disable the allocator trimming part. This patch enables >> the drivers control trimming and they can do it themselves >> based on the application requirements. >> >> v1:(Matthew) >> - check new_start alignment with min chunk_size >> - use range_overflows() >> >> Signed-off-by: Arunpravin Paneer Selvam mailto:arunpravin.paneersel...@amd.com>> >> Acked-by: Alex Deucher mailto:alexander.deuc...@amd.com>> >> Acked-by: Christian König mailto:christian.koe...@amd.com>> >> --- >> drivers/gpu/drm/drm_buddy.c | 25 +++-- >> drivers/gpu/drm/xe/xe_ttm_vram_mgr.c | 2 +- >> include/drm/drm_buddy.h | 2 ++ >> 3 files changed, 26 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c >> index 6a8e45e9d0ec..103c185bb1c8 100644 >> --- a/drivers/gpu/drm/drm_buddy.c >> +++ b/drivers/gpu/drm/drm_buddy.c >> @@ -851,6 +851,7 @@ static int __alloc_contig_try_harder(struct >> drm_buddy *mm, >> * drm_buddy_block_trim - free unused pages >> * >> * @mm: DRM buddy manager >> + * @start: start address to begin the trimming. >> * @new_size: original size requested >> * @blocks: Input and output list of allocated blocks. >> * MUST contain single block as input to be trimmed. >> @@ -866,11 +867,13 @@ static int __alloc_contig_try_harder(struct >> drm_buddy *mm, >> * 0 on success, error code on failure. >> */ >> int drm_buddy_block_trim(struct drm_buddy *mm, >> + u64 *start, >> u64 new_size, >> struct list_head *blocks) >> { >> struct drm_buddy_block *parent; >> struct drm_buddy_block *block; >> + u64 block_start, block_end; >> LIST_HEAD(dfs); >> u64 new_start; >> int err; >> @@ -882,6 +885,9 @@ int drm_buddy_block_trim(struct drm_buddy *mm, >> struct drm_buddy_block, >> link); >> + block_start = drm_buddy_block_offset(block); >> + block_end = block_start + drm_buddy_block_size(mm, block); >> + >> if (WARN_ON(!drm_buddy_block_is_allocated(block))) >> return -EINVAL; >> @@ -894,6 +900,20 @@ int drm_buddy_block_trim(struct drm_buddy *mm, >> if (new_size == drm_buddy_block_size(mm, block)) >> return 0; >> + new_start = block_start; >> + if (start) { >> + new_start = *start; >> + >> + if (new_start < block_start) >> + return -EINVA
Re: [PATCH v6 2/2] drm/amdgpu: Add address alignment support to DCC buffers
Am 18.07.24 um 12:32 schrieb Arunpravin Paneer Selvam: Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nearest power of two number as the buddy allocator does not support non power of two alignments. This applies only to the contiguous DCC buffers. v3:(Alex) - rewrite the max texture channel caches comparison code in an algorithmic way to determine the alignment size. v4:(Alex) - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c and add a new gmc func callback for dcc alignment. If the callback is non-NULL, call it to get the alignment, otherwise, use the default. v5:(Alex) - Set the Alignment to a default value if the callback doesn't exist. - Add the callback to amdgpu_gmc_funcs. v6: - Fix checkpatch warning reported by Intel CI. v7:(Christian) - remove the AMDGPU_GEM_CREATE_GFX12_DCC flag and keep a flag that checks the BO pinning and for a specific hw generation. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König Reviewed-by: Frank Min --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 39 +++- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 15 3 files changed, 58 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index febca3130497..654d0548a3f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -156,6 +156,8 @@ struct amdgpu_gmc_funcs { uint64_t addr, uint64_t *flags); /* get the amount of memory used by the vbios for pre-OS console */ unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev); + /* get the DCC buffer alignment */ + u64 (*get_dcc_alignment)(struct amdgpu_device *adev); enum amdgpu_memory_partition (*query_mem_partition_mode)( struct amdgpu_device *adev); @@ -363,6 +365,10 @@ struct amdgpu_gmc { (adev)->gmc.gmc_funcs->override_vm_pte_flags \ ((adev), (vm), (addr), (pte_flags)) #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev)) +#define amdgpu_gmc_get_dcc_alignment(_adev) ({ \ + typeof(_adev) (adev) = (_adev); \ + ((adev)->gmc.gmc_funcs->get_dcc_alignment((adev))); \ +}) /** * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the BAR diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index f91cc149d06c..ace9d61fc512 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -512,6 +512,17 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, vres->flags |= DRM_BUDDY_RANGE_ALLOCATION; remaining_size = (u64)vres->base.size; + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && + (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) || +amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1))) { I think you should move this check into gmc_v12_0_get_dcc_alignment. E.g. here you just check if adev->gmc.gmc_funcs->get_dcc_alignment is not NULL. Then call the function and if it returns a non zero value apply it. Regards, Christian. + u64 adjust_size; + + if (adev->gmc.gmc_funcs->get_dcc_alignment) { + adjust_size = amdgpu_gmc_get_dcc_alignment(adev); + remaining_size = roundup_pow_of_two(remaining_size + adjust_size); + vres->flags |= DRM_BUDDY_TRIM_DISABLE; + } + } mutex_lock(&mgr->lock); while (remaining_size) { @@ -521,8 +532,13 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size = mgr->default_page_size; size = remaining_size; - if ((size >= (u64)pages_per_block << PAGE_SHIFT) && - !(size & (((u64)pages_per_block << PAGE_SHIFT) - 1))) + + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && + (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) || +amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1))) + min_block_size = size; + else if ((size >= (u64)pages_per_block << PAGE_SHIFT) && +
Re: [PATCH v5 2/2] drm/amdgpu: Add address alignment support to DCC buffers
As far as I know, yes. Regards, Christian. Am 17.07.24 um 16:38 schrieb Paneer Selvam, Arunpravin: Hi Christian, Can we use the below combination flags to kick in hardware workaround while pinning BO's for this specific hw generation. if (place->flags & TTM_PL_FLAG_CONTIGUOUS) && (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 0) || amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(12, 0, 1))) { } Regards, Arun. On 7/17/2024 2:38 PM, Christian König wrote: Well that approach was discussed before and seemed to be to complicated. But I totally agree that the AMDGPU_GEM_CREATE_GFX12_DCC flag is a bad idea. This isn't anything userspace should need to specify in the first place. All we need is a hardware workaround which kicks in all the time while pinning BOs for this specific hw generation and texture channel configuration. Please remove the AMDGPU_GEM_CREATE_GFX12_DCC flag again if possible or specify why it is actually necessary? Regards, Christian. Am 17.07.24 um 05:44 schrieb Marek Olšák: AMDGPU_GEM_CREATE_GFX12_DCC is set on 90% of all memory allocations, and almost all of them are not displayable. Shouldn't we use a different way to indicate that we need a non-power-of-two alignment, such as looking at the alignment field directly? Marek On Tue, Jul 16, 2024, 11:45 Arunpravin Paneer Selvam wrote: Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nearest power of two number as the buddy allocator does not support non power of two alignments. This applies only to the contiguous DCC buffers. v3:(Alex) - rewrite the max texture channel caches comparison code in an algorithmic way to determine the alignment size. v4:(Alex) - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c and add a new gmc func callback for dcc alignment. If the callback is non-NULL, call it to get the alignment, otherwise, use the default. v5:(Alex) - Set the Alignment to a default value if the callback doesn't exist. - Add the callback to amdgpu_gmc_funcs. v6: - Fix checkpatch error reported by Intel CI. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König Reviewed-by: Frank Min --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 36 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 15 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index febca3130497..654d0548a3f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -156,6 +156,8 @@ struct amdgpu_gmc_funcs { uint64_t addr, uint64_t *flags); /* get the amount of memory used by the vbios for pre-OS console */ unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev); + /* get the DCC buffer alignment */ + u64 (*get_dcc_alignment)(struct amdgpu_device *adev); enum amdgpu_memory_partition (*query_mem_partition_mode)( struct amdgpu_device *adev); @@ -363,6 +365,10 @@ struct amdgpu_gmc { (adev)->gmc.gmc_funcs->override_vm_pte_flags \ ((adev), (vm), (addr), (pte_flags)) #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev)) +#define amdgpu_gmc_get_dcc_alignment(_adev) ({ \ + typeof(_adev) (adev) = (_adev); \ + ((adev)->gmc.gmc_funcs->get_dcc_alignment((adev))); \ +}) /** * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the BAR diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index f91cc149d06c..aa9dca12371c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -512,6 +512,16 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, vres->flags |= DRM_BUDDY_RANGE_ALLOCATION; remaining_size = (u64)vres->base.size; + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && + bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC) { + u64 adjust_size; + + if (adev->gmc.gmc_funcs->get_dcc_align
Re: [PATCH v5 2/2] drm/amdgpu: Add address alignment support to DCC buffers
Well that approach was discussed before and seemed to be to complicated. But I totally agree that the AMDGPU_GEM_CREATE_GFX12_DCC flag is a bad idea. This isn't anything userspace should need to specify in the first place. All we need is a hardware workaround which kicks in all the time while pinning BOs for this specific hw generation and texture channel configuration. Please remove the AMDGPU_GEM_CREATE_GFX12_DCC flag again if possible or specify why it is actually necessary? Regards, Christian. Am 17.07.24 um 05:44 schrieb Marek Olšák: AMDGPU_GEM_CREATE_GFX12_DCC is set on 90% of all memory allocations, and almost all of them are not displayable. Shouldn't we use a different way to indicate that we need a non-power-of-two alignment, such as looking at the alignment field directly? Marek On Tue, Jul 16, 2024, 11:45 Arunpravin Paneer Selvam wrote: Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nearest power of two number as the buddy allocator does not support non power of two alignments. This applies only to the contiguous DCC buffers. v3:(Alex) - rewrite the max texture channel caches comparison code in an algorithmic way to determine the alignment size. v4:(Alex) - Move the logic from amdgpu_vram_mgr_dcc_alignment() to gmc_v12_0.c and add a new gmc func callback for dcc alignment. If the callback is non-NULL, call it to get the alignment, otherwise, use the default. v5:(Alex) - Set the Alignment to a default value if the callback doesn't exist. - Add the callback to amdgpu_gmc_funcs. v6: - Fix checkpatch error reported by Intel CI. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Alex Deucher Acked-by: Christian König Reviewed-by: Frank Min --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 36 ++-- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 15 3 files changed, 55 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h index febca3130497..654d0548a3f8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h @@ -156,6 +156,8 @@ struct amdgpu_gmc_funcs { uint64_t addr, uint64_t *flags); /* get the amount of memory used by the vbios for pre-OS console */ unsigned int (*get_vbios_fb_size)(struct amdgpu_device *adev); + /* get the DCC buffer alignment */ + u64 (*get_dcc_alignment)(struct amdgpu_device *adev); enum amdgpu_memory_partition (*query_mem_partition_mode)( struct amdgpu_device *adev); @@ -363,6 +365,10 @@ struct amdgpu_gmc { (adev)->gmc.gmc_funcs->override_vm_pte_flags \ ((adev), (vm), (addr), (pte_flags)) #define amdgpu_gmc_get_vbios_fb_size(adev) (adev)->gmc.gmc_funcs->get_vbios_fb_size((adev)) +#define amdgpu_gmc_get_dcc_alignment(_adev) ({ \ + typeof(_adev) (adev) = (_adev); \ + ((adev)->gmc.gmc_funcs->get_dcc_alignment((adev))); \ +}) /** * amdgpu_gmc_vram_full_visible - Check if full VRAM is visible through the BAR diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index f91cc149d06c..aa9dca12371c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -512,6 +512,16 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, vres->flags |= DRM_BUDDY_RANGE_ALLOCATION; remaining_size = (u64)vres->base.size; + if (bo->flags & AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS && + bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC) { + u64 adjust_size; + + if (adev->gmc.gmc_funcs->get_dcc_alignment) { + adjust_size = amdgpu_gmc_get_dcc_alignment(adev); + remaining_size = roundup_pow_of_two(remaining_size + adjust_size); + vres->flags |= DRM_BUDDY_TRIM_DISABLE; + } + } mutex_lock(&mgr->lock); while (remaining_size) { @@ -521,8 +531,12 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size = mgr->default_page_size;
Re: [PATCH 1/2] drm: Add might_fault to drm_modeset_lock priming
Am 10.07.24 um 13:58 schrieb Daniel Vetter: On Wed, 10 Jul 2024 at 13:39, Christian König wrote: Am 10.07.24 um 11:31 schrieb Daniel Vetter: We already teach lockdep that dma_resv nests within drm_modeset_lock, but there's a lot more: All drm kms ioctl rely on being able to put/get_user while holding modeset locks, so we really need a might_fault in there too to complete the picture. Add it. Mhm, lockdep should be able to deduce that when there might be faults under the dma_resv lock there might also be faults under the drm_modeset_lock. You're not allowed to take a fault under dma_resv, because drivers might need to take that lock to handle faults. So unfortunately in our combined lockdep priming, there really seems to be no chain yet that teaches about faults possibly happening while holding drm_modeset_lock. Ah, of course! You are right, it was just the other way around. Thanks, Christian. -Sima Motivated by a syzbot report that blew up on bcachefs doing an unconditional console_lock way deep in the locking hierarchy, and lockdep only noticing the depency loop in a drm ioctl instead of much earlier. This annotation will make sure such issues have a much harder time escaping. References: https://lore.kernel.org/dri-devel/73db8b061cd43...@google.com/ Signed-off-by: Daniel Vetter Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: Sumit Semwal Cc: "Christian König" Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org On the other hand pointing it out explicitly doesn't hurts us at all, so Reviewed-by: Christian König . Regards, Christian. --- drivers/gpu/drm/drm_mode_config.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c index 568972258222..37d2e0a4ef4b 100644 --- a/drivers/gpu/drm/drm_mode_config.c +++ b/drivers/gpu/drm/drm_mode_config.c @@ -456,6 +456,8 @@ int drmm_mode_config_init(struct drm_device *dev) if (ret == -EDEADLK) ret = drm_modeset_backoff(&modeset_ctx); + might_fault(); + ww_acquire_init(&resv_ctx, &reservation_ww_class); ret = dma_resv_lock(&resv, &resv_ctx); if (ret == -EDEADLK)
Re: [PATCH 1/2] drm: Add might_fault to drm_modeset_lock priming
Am 10.07.24 um 11:31 schrieb Daniel Vetter: We already teach lockdep that dma_resv nests within drm_modeset_lock, but there's a lot more: All drm kms ioctl rely on being able to put/get_user while holding modeset locks, so we really need a might_fault in there too to complete the picture. Add it. Mhm, lockdep should be able to deduce that when there might be faults under the dma_resv lock there might also be faults under the drm_modeset_lock. Motivated by a syzbot report that blew up on bcachefs doing an unconditional console_lock way deep in the locking hierarchy, and lockdep only noticing the depency loop in a drm ioctl instead of much earlier. This annotation will make sure such issues have a much harder time escaping. References: https://lore.kernel.org/dri-devel/73db8b061cd43...@google.com/ Signed-off-by: Daniel Vetter Cc: Maarten Lankhorst Cc: Maxime Ripard Cc: Thomas Zimmermann Cc: Sumit Semwal Cc: "Christian König" Cc: linux-me...@vger.kernel.org Cc: linaro-mm-...@lists.linaro.org On the other hand pointing it out explicitly doesn't hurts us at all, so Reviewed-by: Christian König . Regards, Christian. --- drivers/gpu/drm/drm_mode_config.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/drm_mode_config.c b/drivers/gpu/drm/drm_mode_config.c index 568972258222..37d2e0a4ef4b 100644 --- a/drivers/gpu/drm/drm_mode_config.c +++ b/drivers/gpu/drm/drm_mode_config.c @@ -456,6 +456,8 @@ int drmm_mode_config_init(struct drm_device *dev) if (ret == -EDEADLK) ret = drm_modeset_backoff(&modeset_ctx); + might_fault(); + ww_acquire_init(&resv_ctx, &reservation_ww_class); ret = dma_resv_lock(&resv, &resv_ctx); if (ret == -EDEADLK)
Re: [PATCH 2/6] drm/ttm: Store the bo_kmap_type in struct iosys_map
Hi, Am 17.06.24 um 14:32 schrieb Thomas Zimmermann: Hi Am 14.06.24 um 16:31 schrieb Christian König: Am 14.06.24 um 15:21 schrieb Thomas Zimmermann: For each instances of struct iosys_map set up by ttm_bo_vmap(), store the type of allocation in the instance. Use this information to unmap the memory in ttm_bo_vunmap(). This change simplifies the unmap code and puts the complicated logic entirely into the map code. I'm not sure that's a good idea. The mapping information should already be available in the resource and storing it in the iosys_map structures duplicates that information. So we might run into the issue that the resource has changed and so we need a different approach now, but the iosys_map will say that we should unmap things for example. Patches 1 and 2 are only here to make patch 4 (add the kmap special case) work. How can I distinguish between vmap'ed and kmap'ed memory? It's not clear to me, whether there is a benefit from patch 4. The xe driver makes it sound like that, but the rest of the kernel appears to disagree. Yeah, exactly that's the point. The last time we talked about that we came to the conclusion that the kmap approach of mapping only a single page or range of pages isn't that useful in general. The only use case where you actually need this is the ttm_bo_vm_access_kmap() helper and that is static and internal to TTM. So what exactly is the use case xe tries to handle here? Regards, Christian. Best regards Thomas Regards, Christian. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/ttm/ttm_bo_util.c | 46 +-- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 0b3f4267130c4..a9df0deff2deb 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -36,6 +36,7 @@ #include #include +#include struct ttm_transfer_obj { struct ttm_buffer_object base; @@ -479,24 +480,29 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) if (mem->bus.is_iomem) { void __iomem *vaddr_iomem; + u16 alloc_flags; - if (mem->bus.addr) + if (mem->bus.addr) { vaddr_iomem = (void __iomem *)mem->bus.addr; - else if (mem->bus.caching == ttm_write_combined) - vaddr_iomem = ioremap_wc(mem->bus.offset, - bo->base.size); + alloc_flags = ttm_bo_map_premapped; + } else if (mem->bus.caching == ttm_write_combined) { + vaddr_iomem = ioremap_wc(mem->bus.offset, bo->base.size); + alloc_flags = ttm_bo_map_iomap; #ifdef CONFIG_X86 - else if (mem->bus.caching == ttm_cached) - vaddr_iomem = ioremap_cache(mem->bus.offset, - bo->base.size); + } else if (mem->bus.caching == ttm_cached) { + vaddr_iomem = ioremap_cache(mem->bus.offset, bo->base.size); + alloc_flags = ttm_bo_map_iomap; #endif - else + } else { vaddr_iomem = ioremap(mem->bus.offset, bo->base.size); + alloc_flags = ttm_bo_map_iomap; + } if (!vaddr_iomem) return -ENOMEM; iosys_map_set_vaddr_iomem(map, vaddr_iomem); + map->alloc_flags = alloc_flags; } else { struct ttm_operation_ctx ctx = { @@ -506,6 +512,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) struct ttm_tt *ttm = bo->ttm; pgprot_t prot; void *vaddr; + u16 alloc_flags; ret = ttm_tt_populate(bo->bdev, ttm, &ctx); if (ret) @@ -519,8 +526,10 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) vaddr = vmap(ttm->pages, ttm->num_pages, 0, prot); if (!vaddr) return -ENOMEM; + alloc_flags = ttm_bo_map_vmap; iosys_map_set_vaddr(map, vaddr); + map->alloc_flags = alloc_flags; } return 0; @@ -537,20 +546,27 @@ EXPORT_SYMBOL(ttm_bo_vmap); */ void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map *map) { - struct ttm_resource *mem = bo->resource; - dma_resv_assert_held(bo->base.resv); if (iosys_map_is_null(map)) return; - if (!map->is_iomem) - vunmap(map->vaddr); - else if (!mem->bus.addr) + switch (map->alloc_flags) { + case ttm_bo_map_iomap: iounmap(map->vaddr_iomem); - iosys_map_clear(map); - + break; + case ttm_bo_map_vmap: + vunmap(map->vaddr); + break; + case ttm_bo_map_premapped: + break; + default: + drm_err(bo->base.dev, "Unsupported alloc_flags 0x%x\n", map->alloc_flags); + return; + } ttm_mem_io_free(bo->bdev, bo->resource); + + iosys_map_clear(map); } EXPORT_SYMBOL(ttm_bo_vunmap);
Re: [PATCH 3/6] drm/ttm: Support partial buffer mappings for ttm_bo_vmap()
Am 14.06.24 um 15:21 schrieb Thomas Zimmermann: Add offset and size parameters to ttm_bo_vmap() to allow for partial mappings of a buffer object. This brings the functionality on par with ttm_bo_kmap(). Well the long term plan was to remove this functionality from ttm_bo_kmap() and nuke that function sooner or later. What exactly is the use case for partial mappings? Regards, Christian. Callers pass the byte offset and size within the buffer object and receive a page-aligned mapping of the buffer object's memory for the specified area. Also update all callers of ttm_bo_vmap() for the new parameters. As before, existing callers map the buffer object's complete memory. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/drm_gem_ttm_helper.c | 2 +- drivers/gpu/drm/drm_gem_vram_helper.c | 2 +- drivers/gpu/drm/loongson/lsdc_gem.c | 2 +- drivers/gpu/drm/qxl/qxl_object.c | 2 +- drivers/gpu/drm/ttm/ttm_bo_util.c | 21 +++-- drivers/gpu/drm/xe/xe_lrc.c | 2 +- drivers/gpu/drm/xe/xe_vm.c| 2 +- include/drm/ttm/ttm_bo.h | 4 +++- 8 files changed, 24 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/drm_gem_ttm_helper.c b/drivers/gpu/drm/drm_gem_ttm_helper.c index 3734aa2d1c5b5..f26b7c9077a68 100644 --- a/drivers/gpu/drm/drm_gem_ttm_helper.c +++ b/drivers/gpu/drm/drm_gem_ttm_helper.c @@ -67,7 +67,7 @@ int drm_gem_ttm_vmap(struct drm_gem_object *gem, { struct ttm_buffer_object *bo = drm_gem_ttm_of_gem(gem); - return ttm_bo_vmap(bo, map); + return ttm_bo_vmap(bo, 0, gem->size, map); } EXPORT_SYMBOL(drm_gem_ttm_vmap); diff --git a/drivers/gpu/drm/drm_gem_vram_helper.c b/drivers/gpu/drm/drm_gem_vram_helper.c index 6027584406af6..1670f9a459a9d 100644 --- a/drivers/gpu/drm/drm_gem_vram_helper.c +++ b/drivers/gpu/drm/drm_gem_vram_helper.c @@ -398,7 +398,7 @@ int drm_gem_vram_vmap(struct drm_gem_vram_object *gbo, struct iosys_map *map) * no mapping present. */ if (iosys_map_is_null(&gbo->map)) { - ret = ttm_bo_vmap(&gbo->bo, &gbo->map); + ret = ttm_bo_vmap(&gbo->bo, 0, gbo->bo.base.size, &gbo->map); if (ret) return ret; } diff --git a/drivers/gpu/drm/loongson/lsdc_gem.c b/drivers/gpu/drm/loongson/lsdc_gem.c index a720d8f532093..f709960c781b9 100644 --- a/drivers/gpu/drm/loongson/lsdc_gem.c +++ b/drivers/gpu/drm/loongson/lsdc_gem.c @@ -77,7 +77,7 @@ static int lsdc_gem_object_vmap(struct drm_gem_object *obj, struct iosys_map *ma return ret; } - ret = ttm_bo_vmap(tbo, &lbo->map); + ret = ttm_bo_vmap(tbo, 0, tbo->base.size, &lbo->map); if (ret) { drm_err(obj->dev, "ttm bo vmap failed\n"); lsdc_bo_unpin(lbo); diff --git a/drivers/gpu/drm/qxl/qxl_object.c b/drivers/gpu/drm/qxl/qxl_object.c index 5893e27a7ae50..9f06d5e26a32c 100644 --- a/drivers/gpu/drm/qxl/qxl_object.c +++ b/drivers/gpu/drm/qxl/qxl_object.c @@ -164,7 +164,7 @@ int qxl_bo_vmap_locked(struct qxl_bo *bo, struct iosys_map *map) goto out; } - r = ttm_bo_vmap(&bo->tbo, &bo->map); + r = ttm_bo_vmap(&bo->tbo, 0, bo->tbo.base.size, &bo->map); if (r) { qxl_bo_unpin_locked(bo); return r; diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index a9df0deff2deb..31f9772f05dac 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -457,17 +457,23 @@ EXPORT_SYMBOL(ttm_bo_kunmap); * ttm_bo_vmap * * @bo: The buffer object. + * @offset: Byte offset into the buffer. + * @size: Number of bytes to map. * @map: pointer to a struct iosys_map representing the map. * * Sets up a kernel virtual mapping, using ioremap or vmap to the * data in the buffer object. The parameter @map returns the virtual * address as struct iosys_map. Unmap the buffer with ttm_bo_vunmap(). + * The address stored in @map will be aligned to the next lower page + * boundaries. * * Returns * -ENOMEM: Out of memory. * -EINVAL: Invalid range. */ -int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) +int ttm_bo_vmap(struct ttm_buffer_object *bo, + unsigned long offset, unsigned long size, + struct iosys_map *map) { struct ttm_resource *mem = bo->resource; int ret; @@ -483,18 +489,18 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) u16 alloc_flags; if (mem->bus.addr) { - vaddr_iomem = (void __iomem *)mem->bus.addr; + vaddr_iomem = (u8 __iomem *)mem->bus.addr + offset; alloc_flags = ttm_bo_map_premapped; } else if (mem->bus.caching == ttm_write_combined) { - vaddr_iomem = ioremap_wc(mem->bus.offset, bo->base.size); +
Re: [PATCH 2/6] drm/ttm: Store the bo_kmap_type in struct iosys_map
Am 14.06.24 um 15:21 schrieb Thomas Zimmermann: For each instances of struct iosys_map set up by ttm_bo_vmap(), store the type of allocation in the instance. Use this information to unmap the memory in ttm_bo_vunmap(). This change simplifies the unmap code and puts the complicated logic entirely into the map code. I'm not sure that's a good idea. The mapping information should already be available in the resource and storing it in the iosys_map structures duplicates that information. So we might run into the issue that the resource has changed and so we need a different approach now, but the iosys_map will say that we should unmap things for example. Regards, Christian. Signed-off-by: Thomas Zimmermann --- drivers/gpu/drm/ttm/ttm_bo_util.c | 46 +-- 1 file changed, 31 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c b/drivers/gpu/drm/ttm/ttm_bo_util.c index 0b3f4267130c4..a9df0deff2deb 100644 --- a/drivers/gpu/drm/ttm/ttm_bo_util.c +++ b/drivers/gpu/drm/ttm/ttm_bo_util.c @@ -36,6 +36,7 @@ #include #include +#include struct ttm_transfer_obj { struct ttm_buffer_object base; @@ -479,24 +480,29 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) if (mem->bus.is_iomem) { void __iomem *vaddr_iomem; + u16 alloc_flags; - if (mem->bus.addr) + if (mem->bus.addr) { vaddr_iomem = (void __iomem *)mem->bus.addr; - else if (mem->bus.caching == ttm_write_combined) - vaddr_iomem = ioremap_wc(mem->bus.offset, -bo->base.size); + alloc_flags = ttm_bo_map_premapped; + } else if (mem->bus.caching == ttm_write_combined) { + vaddr_iomem = ioremap_wc(mem->bus.offset, bo->base.size); + alloc_flags = ttm_bo_map_iomap; #ifdef CONFIG_X86 - else if (mem->bus.caching == ttm_cached) - vaddr_iomem = ioremap_cache(mem->bus.offset, - bo->base.size); + } else if (mem->bus.caching == ttm_cached) { + vaddr_iomem = ioremap_cache(mem->bus.offset, bo->base.size); + alloc_flags = ttm_bo_map_iomap; #endif - else + } else { vaddr_iomem = ioremap(mem->bus.offset, bo->base.size); + alloc_flags = ttm_bo_map_iomap; + } if (!vaddr_iomem) return -ENOMEM; iosys_map_set_vaddr_iomem(map, vaddr_iomem); + map->alloc_flags = alloc_flags; } else { struct ttm_operation_ctx ctx = { @@ -506,6 +512,7 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) struct ttm_tt *ttm = bo->ttm; pgprot_t prot; void *vaddr; + u16 alloc_flags; ret = ttm_tt_populate(bo->bdev, ttm, &ctx); if (ret) @@ -519,8 +526,10 @@ int ttm_bo_vmap(struct ttm_buffer_object *bo, struct iosys_map *map) vaddr = vmap(ttm->pages, ttm->num_pages, 0, prot); if (!vaddr) return -ENOMEM; + alloc_flags = ttm_bo_map_vmap; iosys_map_set_vaddr(map, vaddr); + map->alloc_flags = alloc_flags; } return 0; @@ -537,20 +546,27 @@ EXPORT_SYMBOL(ttm_bo_vmap); */ void ttm_bo_vunmap(struct ttm_buffer_object *bo, struct iosys_map *map) { - struct ttm_resource *mem = bo->resource; - dma_resv_assert_held(bo->base.resv); if (iosys_map_is_null(map)) return; - if (!map->is_iomem) - vunmap(map->vaddr); - else if (!mem->bus.addr) + switch (map->alloc_flags) { + case ttm_bo_map_iomap: iounmap(map->vaddr_iomem); - iosys_map_clear(map); - + break; + case ttm_bo_map_vmap: + vunmap(map->vaddr); + break; + case ttm_bo_map_premapped: + break; + default: + drm_err(bo->base.dev, "Unsupported alloc_flags 0x%x\n", map->alloc_flags); + return; + } ttm_mem_io_free(bo->bdev, bo->resource); + + iosys_map_clear(map); } EXPORT_SYMBOL(ttm_bo_vunmap);
Re: [PATCH V2] drm/ttm: remove unused paramter
Am 01.04.24 um 05:04 schrieb jesse.zh...@amd.com: From: Jesse Zhang remove the unsed the paramter in the function ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence. V2:rebase the patch on top of drm-misc-next (Christian) And pushed to drm-misc-next. Thanks, Christian. Signed-off-by: Jesse Zhang Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 8 +++- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index e059b1e1b13b..6396dece0db1 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo) EXPORT_SYMBOL(ttm_bo_put); static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, -struct ttm_resource **mem, struct ttm_operation_ctx *ctx, struct ttm_place *hop) { @@ -469,7 +468,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, if (ret != -EMULTIHOP) break; - ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); } while (!ret); if (ret) { @@ -698,7 +697,6 @@ EXPORT_SYMBOL(ttm_bo_unpin); */ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, struct ttm_resource_manager *man, -struct ttm_resource *mem, bool no_wait_gpu) { struct dma_fence *fence; @@ -787,7 +785,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, if (ret) continue; - ret = ttm_bo_add_move_fence(bo, man, *res, ctx->no_wait_gpu); + ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu); if (unlikely(ret)) { ttm_resource_free(bo, res); if (ret == -EBUSY) @@ -894,7 +892,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, bounce: ret = ttm_bo_handle_move_mem(bo, res, false, ctx, &hop); if (ret == -EMULTIHOP) { - ret = ttm_bo_bounce_temp_buffer(bo, &res, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); /* try and move to final place now. */ if (!ret) goto bounce;
Re: [PATCH] drm/ttm: remove unused paramter
Am 29.03.24 um 12:10 schrieb Christian König: Am 25.03.24 um 08:45 schrieb Jesse Zhang: remove the unsed the paramter in the function ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence. Signed-off-by: Jesse Zhang Good catch, Reviewed-by: Christian König Please rebase that patch on top of drm-misc-next. Regards, Christian. Regards, Christian. --- drivers/gpu/drm/ttm/ttm_bo.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..7f08787687a7 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo) EXPORT_SYMBOL(ttm_bo_put); static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, - struct ttm_resource **mem, struct ttm_operation_ctx *ctx, struct ttm_place *hop) { @@ -470,7 +469,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, if (ret != -EMULTIHOP) break; - ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); } while (!ret); if (ret) { @@ -699,7 +698,6 @@ EXPORT_SYMBOL(ttm_bo_unpin); */ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, struct ttm_resource_manager *man, - struct ttm_resource *mem, bool no_wait_gpu) { struct dma_fence *fence; @@ -753,7 +751,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, return ret; } while (1); - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); + return ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu); } /** @@ -802,7 +800,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (unlikely(ret)) goto error; - ret = ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); + ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu); if (unlikely(ret)) { ttm_resource_free(bo, mem); if (ret == -EBUSY) @@ -866,7 +864,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo, bounce: ret = ttm_bo_handle_move_mem(bo, mem, false, ctx, &hop); if (ret == -EMULTIHOP) { - ret = ttm_bo_bounce_temp_buffer(bo, &mem, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); if (ret) goto out; /* try and move to final place now. */
Re: [PATCH] drm/ttm: remove unused paramter
Am 25.03.24 um 08:45 schrieb Jesse Zhang: remove the unsed the paramter in the function ttm_bo_bounce_temp_buffer and ttm_bo_add_move_fence. Signed-off-by: Jesse Zhang Good catch, Reviewed-by: Christian König Regards, Christian. --- drivers/gpu/drm/ttm/ttm_bo.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..7f08787687a7 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -402,7 +402,6 @@ void ttm_bo_put(struct ttm_buffer_object *bo) EXPORT_SYMBOL(ttm_bo_put); static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, -struct ttm_resource **mem, struct ttm_operation_ctx *ctx, struct ttm_place *hop) { @@ -470,7 +469,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, if (ret != -EMULTIHOP) break; - ret = ttm_bo_bounce_temp_buffer(bo, &evict_mem, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); } while (!ret); if (ret) { @@ -699,7 +698,6 @@ EXPORT_SYMBOL(ttm_bo_unpin); */ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, struct ttm_resource_manager *man, -struct ttm_resource *mem, bool no_wait_gpu) { struct dma_fence *fence; @@ -753,7 +751,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, return ret; } while (1); - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); + return ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu); } /** @@ -802,7 +800,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, if (unlikely(ret)) goto error; - ret = ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); + ret = ttm_bo_add_move_fence(bo, man, ctx->no_wait_gpu); if (unlikely(ret)) { ttm_resource_free(bo, mem); if (ret == -EBUSY) @@ -866,7 +864,7 @@ static int ttm_bo_move_buffer(struct ttm_buffer_object *bo, bounce: ret = ttm_bo_handle_move_mem(bo, mem, false, ctx, &hop); if (ret == -EMULTIHOP) { - ret = ttm_bo_bounce_temp_buffer(bo, &mem, ctx, &hop); + ret = ttm_bo_bounce_temp_buffer(bo, ctx, &hop); if (ret) goto out; /* try and move to final place now. */
Re: [PATCH v2 09/16] drm/ttm: fix ttm_bo.h kernel-doc warnings
Am 08.03.24 um 12:55 schrieb Jani Nikula: Some renames, some formatting fixes, add some missing documentation. Cc: Christian Koenig Cc: Huang Rui Acked-by: Thomas Zimmermann Signed-off-by: Jani Nikula --- include/drm/ttm/ttm_bo.h | 18 -- 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h index 0223a41a64b2..59151ecb2db3 100644 --- a/include/drm/ttm/ttm_bo.h +++ b/include/drm/ttm/ttm_bo.h @@ -83,6 +83,9 @@ enum ttm_bo_type { * @resource: structure describing current placement. * @ttm: TTM structure holding system pages. * @deleted: True if the object is only a zombie and already deleted. + * @bulk_move: The bulk move object. + * @priority: Priority for LRU, BOs with lower priority are evicted first. + * @pin_count: Pin count. * * Base class for TTM buffer object, that deals with data placement and CPU * mappings. GPU mappings are really up to the driver, but for simpler GPUs @@ -128,26 +131,28 @@ struct ttm_buffer_object { struct work_struct delayed_delete; /** -* Special members that are protected by the reserve lock -* and the bo::lock when written to. Can be read with -* either of these locks held. +* @sg: Special members that are protected by the reserve lock and the +* bo::lock when written to. Can be read with either of these locks +* held. Actually that is completely outdated since the bo::lock was removed years ago. I would just write that as "@sg: external source of pages and DMA addresses, protected by the reservation lock." (or something like this). With that fixed feel free to add Reviewed-by: Christian König to this patch and the other TTM cleanup patches in this series. Regards, Christian. */ struct sg_table *sg; }; +#define TTM_BO_MAP_IOMEM_MASK 0x80 + /** * struct ttm_bo_kmap_obj * * @virtual: The current kernel virtual address. * @page: The page when kmap'ing a single page. * @bo_kmap_type: Type of bo_kmap. + * @bo: The TTM BO. * * Object describing a kernel mapping. Since a TTM bo may be located * in various memory types with various caching policies, the * mapping can either be an ioremap, a vmap, a kmap or part of a * premapped region. */ -#define TTM_BO_MAP_IOMEM_MASK 0x80 struct ttm_bo_kmap_obj { void *virtual; struct page *page; @@ -171,6 +176,7 @@ struct ttm_bo_kmap_obj { * @force_alloc: Don't check the memory account during suspend or CPU page * faults. Should only be used by TTM internally. * @resv: Reservation object to allow reserved evictions with. + * @bytes_moved: Statistics on how many bytes have been moved. * * Context for TTM operations like changing buffer placement or general memory * allocation. @@ -264,7 +270,7 @@ static inline int ttm_bo_reserve(struct ttm_buffer_object *bo, * ttm_bo_reserve_slowpath: * @bo: A pointer to a struct ttm_buffer_object. * @interruptible: Sleep interruptible if waiting. - * @sequence: Set (@bo)->sequence to this value after lock + * @ticket: Ticket used to acquire the ww_mutex. * * This is called after ttm_bo_reserve returns -EAGAIN and we backed off * from all our other reservations. Because there are no other reservations @@ -303,7 +309,7 @@ static inline void ttm_bo_assign_mem(struct ttm_buffer_object *bo, } /** - * ttm_bo_move_null = assign memory for a buffer object. + * ttm_bo_move_null - assign memory for a buffer object. * @bo: The bo to assign the memory to * @new_mem: The memory to be assigned. *
Re: [PATCH v8 3/3] drm/buddy: Add user for defragmentation
Am 05.03.24 um 12:14 schrieb Paneer Selvam, Arunpravin: On 3/5/2024 4:33 PM, Paneer Selvam, Arunpravin wrote: Hi Christian, On 3/4/2024 10:09 PM, Christian König wrote: Am 04.03.24 um 17:32 schrieb Arunpravin Paneer Selvam: Add amdgpu driver as user for the drm buddy defragmentation. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++-- drivers/gpu/drm/drm_buddy.c | 1 + 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index e494f5bf136a..cff8a526c622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size, &vres->blocks, vres->flags); - if (unlikely(r)) - goto error_free_blocks; + if (unlikely(r)) { + if (r == -ENOSPC) { + drm_buddy_defrag(mm, min_block_size); + r = drm_buddy_alloc_blocks(mm, fpfn, + lpfn, + size, + min_block_size, + &vres->blocks, + vres->flags); That doesn't looks like something we should do. We might fallback when contiguous memory is requested, but certainly not on normal allocation failure. yes, defrag here not useful for normal allocations. But worried about the bigger min_block_size normal allocations. In such cases, I think we should move this drm_buddy_defrag() call into buddy allocator file. For example if the required size is 1024KiB and if min_block_size is 256KiB, the allocator first tries to find the 1024KiB block, when there is no single 1024KiB block, the allocator goes one level below in freelist and tries to search for two 512KiB blocks and goes on. At one point of time if we have less space, we might go further levels below to search four 256KiB blocks to satisfy the request. Assuming if the allocator cannot find the first 256KiB block, that time I think we might need to merge the two 128KiB blocks through defragmentation function. And again for the second 256KiB block, we might need to call the defragmentation again to merge two 128KiB blocks or four 64KiB blocks to form minimum alignment size of 256KiB. This goes on for the third and fourth 256KiB blocks to complete the required size allocation of 1024KiB. Please let me know if my understanding is not correct. I don't think we should do that. We essentially have to support two different use cases: 1. Non contiguous allocation with 2MiB min_block_size for everything larger than 2MiB. Using a block size as large as possible is desirable, but not something we enforce. 2. Contiguous allocations for display, firmware etc.. Here we need to enforce a large block size and can live with the additional overhead caused by force merging. As you have suggested we can also rename this as force merge or some other names. Yeah, but just an suggestion. You are way deeper in the code and handling than I'm, so feel free to name it whatever you think fits best. Regards, Christian. Thanks, Arun. Thanks, Arun. Regards, Christian. + if (unlikely(r)) + goto error_free_blocks; + } else { + goto error_free_blocks; + } + } if (size > remaining_size) remaining_size = 0; diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 40131ed9b0cd..19440f8caec0 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -396,6 +396,7 @@ void drm_buddy_defrag(struct drm_buddy *mm, } } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block
Re: [PATCH v8 3/3] drm/buddy: Add user for defragmentation
Am 04.03.24 um 17:32 schrieb Arunpravin Paneer Selvam: Add amdgpu driver as user for the drm buddy defragmentation. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++-- drivers/gpu/drm/drm_buddy.c | 1 + 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index e494f5bf136a..cff8a526c622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size, &vres->blocks, vres->flags); - if (unlikely(r)) - goto error_free_blocks; + if (unlikely(r)) { + if (r == -ENOSPC) { + drm_buddy_defrag(mm, min_block_size); + r = drm_buddy_alloc_blocks(mm, fpfn, + lpfn, + size, + min_block_size, + &vres->blocks, + vres->flags); That doesn't looks like something we should do. We might fallback when contiguous memory is requested, but certainly not on normal allocation failure. Regards, Christian. + if (unlikely(r)) + goto error_free_blocks; + } else { + goto error_free_blocks; + } + } if (size > remaining_size) remaining_size = 0; diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 40131ed9b0cd..19440f8caec0 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -396,6 +396,7 @@ void drm_buddy_defrag(struct drm_buddy *mm, } } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block
Re: [PATCH] drm/i915: fix applying placement flag
Am 01.03.24 um 17:04 schrieb Lucas De Marchi: On Thu, Feb 29, 2024 at 02:01:05PM +0100, Christian König wrote: Gentle ping. Can I get an rb for that? Thanks, Christian. Reviewed-by: Lucas De Marchi Thanks! For some reason CI failed, but can't be related with this change. I re-triggered it to see if we can get a green run before merging. Do you want to pick it into a i915 branch or should I push it to drm-misc-next(-fixes) then? Christian. thanks Lucas De Marchi Am 26.02.24 um 15:27 schrieb Christian König: Switching from a separate list to flags introduced a bug here. We were accidentially ORing the flag before initailizing the placement and not after. So this code didn't do nothing except producing a warning. Signed-off-by: Christian König Reported-by: Stephen Rothwell Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index a6b0aaf30cbe..7264fb08eee8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, unsigned int flags = obj->flags; unsigned int i; - places[0].flags |= TTM_PL_FLAG_DESIRED; i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] : obj->mm.region, &places[0], obj->bo_offset, obj->base.size, flags); + places[0].flags |= TTM_PL_FLAG_DESIRED; /* Cache this on object? */ for (i = 0; i < num_allowed; ++i) {
[PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König Reviewed-by: Zack Rusin --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 010b0cb7693c..8bc79924d171 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_FALLBACK; c++; } -- 2.34.1
[PATCH 1/2] drm/ttm: improve idle/busy handling v5
Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well v4: keep the ttm_bo_mem_space functionality as it is for now, only add new handling for ttm_bo_validate as suggested by Thomas v5: fix bug pointed out by Matthew Signed-off-by: Christian König Reviewed-by: Zack Rusin v3 --- drivers/gpu/drm/ttm/ttm_bo.c | 231 + drivers/gpu/drm/ttm/ttm_resource.c | 16 +- include/drm/ttm/ttm_resource.h | 3 +- 3 files changed, 121 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 96a724e8f3ff..e059b1e1b13b 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); -} - /** - * ttm_bo_mem_space + * ttm_bo_alloc_resource - Allocate backing store for a BO * - * @bo: Pointer to a struct ttm_buffer_object. the data of which - * we want to allocate space for. - * @placement: Proposed new placement for the buffer object. - * @mem: A struct ttm_resource. + * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for + * @placement: Proposed new placement for the buffer object * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space + * @res: The resulting struct ttm_resource. * - * Allocate memory space for the buffer object pointed to by @bo, using - * the placement flags in @placement, potentially evicting other idle buffer objects. - * This function may sleep while waiting for space to become available. + * Allocates a resource for the buffer object pointed to by @bo, using the + * placement flags in @placement, potentially evicting other buffer objects when + * @force_space is true. + * This function may sleep while waiting for resources to become available. * Returns: - * -EBUSY: No space available (only if no_wait == 1). + * -EBUSY: No space available (only if no_wait == true). * -ENOSPC: Could not allocate space for the buffer object, either due to * fragmentation or concurrent allocators. * -ERESTARTSYS: An interruptible sleep was interrupted by a signal. */ -int ttm_bo_mem_space(struct ttm_buffer_object *bo, - struct ttm_placement *placement, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) +static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, +struct ttm_placement *placement, +struct ttm_operation_ctx *ctx, +bool force_space, +struct ttm_resource **res) { struct ttm_device *bdev = bo->bdev; - bool type_found = false; + struct ww_acquire_ctx *ticket; int i, ret; + ticket = dma_resv_locking_ctx(bo->base.resv); ret = dma_resv_reserve_fences(bo->base.resv, 1); if (unlikely(ret)) return ret; @@ -790,98 +762,73 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, const struct ttm_place *place = &placement->placement[i]; struct ttm_resource_manager *man; -
Re: [PATCH] drm/i915: fix applying placement flag
Gentle ping. Can I get an rb for that? Thanks, Christian. Am 26.02.24 um 15:27 schrieb Christian König: Switching from a separate list to flags introduced a bug here. We were accidentially ORing the flag before initailizing the placement and not after. So this code didn't do nothing except producing a warning. Signed-off-by: Christian König Reported-by: Stephen Rothwell Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index a6b0aaf30cbe..7264fb08eee8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, unsigned int flags = obj->flags; unsigned int i; - places[0].flags |= TTM_PL_FLAG_DESIRED; i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] : obj->mm.region, &places[0], obj->bo_offset, obj->base.size, flags); + places[0].flags |= TTM_PL_FLAG_DESIRED; /* Cache this on object? */ for (i = 0; i < num_allowed; ++i) {
Re: [PATCH] drm/i915/ttm: Fix TTM_PL_FLAG_DESIRED
Am 27.02.24 um 21:26 schrieb Ville Syrjala: From: Ville Syrjälä inlined from ‘i915_ttm_get_pages’ at ../drivers/gpu/drm/i915/gem/i915_gem_ttm.c:847:2: ../drivers/gpu/drm/i915/gem/i915_gem_ttm.c:165:18: warning: ‘places[0].flags’ is used uninitialized [-Wuninitialized] 165 | places[0].flags |= TTM_PL_FLAG_DESIRED; | ~^~ ../drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function ‘i915_ttm_get_pages’: ../drivers/gpu/drm/i915/gem/i915_gem_ttm.c:837:26: note: ‘places’ declared here 837 | struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1]; | ^~ Furhermore we then proceed to call i915_ttm_place_from_region() which memset()s the whole thing back to zero anyway. So in the end we lose the TTM_PL_FLAG_DESIRED flag (and fortunately also whatever else stack garbage happened to be in the flags at this point). No idea what functional changes this will result in... I've already send out the same patch yesterday. Please review that one. Sorry for the noise, didn't realized that i915_ttm_place_from_region() was initializing the flags and not the caller while converting this. Thanks, Christian. Cc: Somalapuram Amaranath Cc: Christian König Cc: Zack Rusin Cc: Thomas Zimmermann Cc: Thomas Hellström Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Signed-off-by: Ville Syrjälä --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 27dcfd8a34bb..e6f177183c0f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, unsigned int flags = obj->flags; unsigned int i; - places[0].flags |= TTM_PL_FLAG_DESIRED; i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] : obj->mm.region, &places[0], obj->bo_offset, obj->base.size, flags); + places[0].flags |= TTM_PL_FLAG_DESIRED; /* Cache this on object? */ for (i = 0; i < num_allowed; ++i) {
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4
Am 27.02.24 um 09:12 schrieb Matthew Auld: On 26/02/2024 20:21, Thomas Hellström wrote: Hi, Christian On Fri, 2024-02-23 at 15:30 +0100, Christian König wrote: Am 06.02.24 um 13:56 schrieb Christian König: Am 06.02.24 um 13:53 schrieb Thomas Hellström: Hi, Christian, On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote: Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well v4: keep the ttm_bo_mem_space functionality as it is for now, only add new handling for ttm_bo_validate as suggested by Thomas Signed-off-by: Christian König Reviewed-by: Zack Rusin v3 Sending this through xe CI, will try to review asap. Take your time. At the moment people are bombarding me with work and I have only two hands and one head as well :( So I've digged myself out of that hole and would rather like to get this new feature into 6.9. Any time to review it? I can also plan some time to review your LRU changes next week. Thanks, Christian. Sorry for the late response. Was planning to review but saw that there was still an xe CI failure. https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-129579v1/bat-atsm-2/igt@xe_evict_...@evict-overcommit-parallel-nofree-samefd.html I haven't really had time to look into what might be causing this, though. Maybe in ttm_bo_alloc_resource(): @@ -772,7 +772,7 @@ static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, do { ret = ttm_resource_alloc(bo, place, res); - if (unlikely(ret != -ENOSPC)) + if (unlikely(ret && ret != -ENOSPC)) return ret; if (likely(!ret) || !force_space) break; Otherwise we allocate VRAM but never correctly synchronise against the move fence, since we missed adding it to the BO. When we trigger async evictions that would explain the above test failure where we detect VRAM corruption, since someone else is still using the VRAM we allocated. What do you think? Yup, that looks like the right thing to do. Thanks. Give me a moment to fix that. Christian. /Thomas Christian. /Thomas --- drivers/gpu/drm/ttm/ttm_bo.c | 231 +--- --- -- drivers/gpu/drm/ttm/ttm_resource.c | 16 +- include/drm/ttm/ttm_resource.h | 3 +- 3 files changed, 121 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ba3f09e2d7e6..b12f435542a9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx- no_wait_gpu); -} - /** - * ttm_bo_mem_space + * ttm_bo_alloc_resource - Allocate backing store for a BO * - * @bo: Pointer to a struct ttm_buffer_object. the data of which - * we want to allocate space for. - * @placement: Proposed new placement for the buffer object. - * @mem: A struct ttm_resource. + * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for + * @placement: Proposed new placement for the buffer object * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space + * @res: The resulting struct ttm_resource. * - * Allocate memory space for the buffer object pointed to by @bo, using - * the placement flags in @placement, pote
[PATCH] drm/i915: fix applying placement flag
Switching from a separate list to flags introduced a bug here. We were accidentially ORing the flag before initailizing the placement and not after. So this code didn't do nothing except producing a warning. Signed-off-by: Christian König Reported-by: Stephen Rothwell Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index a6b0aaf30cbe..7264fb08eee8 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -162,10 +162,10 @@ i915_ttm_placement_from_obj(const struct drm_i915_gem_object *obj, unsigned int flags = obj->flags; unsigned int i; - places[0].flags |= TTM_PL_FLAG_DESIRED; i915_ttm_place_from_region(num_allowed ? obj->mm.placements[0] : obj->mm.region, &places[0], obj->bo_offset, obj->base.size, flags); + places[0].flags |= TTM_PL_FLAG_DESIRED; /* Cache this on object? */ for (i = 0; i < num_allowed; ++i) { -- 2.34.1
Re: linux-next: build failure after merge of the drm-misc tree
Am 25.02.24 um 22:47 schrieb Stephen Rothwell: Hi all, On Mon, 26 Feb 2024 08:41:16 +1100 Stephen Rothwell wrote: On Tue, 20 Feb 2024 08:48:21 +1100 Stephen Rothwell wrote: On Mon, 12 Feb 2024 15:15:54 +0200 Jani Nikula wrote: On Tue, 06 Feb 2024, Stephen Rothwell wrote: After merging the drm-misc tree, today's linux-next build (i386 defconfig) failed like this: In function 'i915_ttm_placement_from_obj', inlined from 'i915_ttm_get_pages' at drivers/gpu/drm/i915/gem/i915_gem_ttm.c:847:2: drivers/gpu/drm/i915/gem/i915_gem_ttm.c:165:18: error: 'places[0].flags' is used uninitialized [-Werror=uninitialized] 165 | places[0].flags |= TTM_PL_FLAG_DESIRED; | ~^~ drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function 'i915_ttm_get_pages': drivers/gpu/drm/i915/gem/i915_gem_ttm.c:837:26: note: 'places' declared here 837 | struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1]; | ^~ Caused by commit a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Cc: more people. I applied the following hack for today: From: Stephen Rothwell Date: Tue, 6 Feb 2024 15:17:54 +1100 Subject: [PATCH] drm/ttm: initialise places Signed-off-by: Stephen Rothwell --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 80c6cafc8887..34e699e67c25 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -834,7 +834,7 @@ static int __i915_ttm_get_pages(struct drm_i915_gem_object *obj, static int i915_ttm_get_pages(struct drm_i915_gem_object *obj) { - struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1]; + struct ttm_place places[I915_TTM_MAX_PLACEMENTS + 1] = {}; struct ttm_placement placement; /* restricted by sg_alloc_table */ -- 2.43.0 I am still applying the above patch ... Any progress? And this commit is now in the drm tree. Sorry for the delay. Oring in the flag needs to come after the call and not before it. Going to fix this. Thanks, Christian.
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4
Am 06.02.24 um 13:56 schrieb Christian König: Am 06.02.24 um 13:53 schrieb Thomas Hellström: Hi, Christian, On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote: Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well v4: keep the ttm_bo_mem_space functionality as it is for now, only add new handling for ttm_bo_validate as suggested by Thomas Signed-off-by: Christian König Reviewed-by: Zack Rusin v3 Sending this through xe CI, will try to review asap. Take your time. At the moment people are bombarding me with work and I have only two hands and one head as well :( So I've digged myself out of that hole and would rather like to get this new feature into 6.9. Any time to review it? I can also plan some time to review your LRU changes next week. Thanks, Christian. Christian. /Thomas --- drivers/gpu/drm/ttm/ttm_bo.c | 231 +-- -- drivers/gpu/drm/ttm/ttm_resource.c | 16 +- include/drm/ttm/ttm_resource.h | 3 +- 3 files changed, 121 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ba3f09e2d7e6..b12f435542a9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx- no_wait_gpu); -} - /** - * ttm_bo_mem_space + * ttm_bo_alloc_resource - Allocate backing store for a BO * - * @bo: Pointer to a struct ttm_buffer_object. the data of which - * we want to allocate space for. - * @placement: Proposed new placement for the buffer object. - * @mem: A struct ttm_resource. + * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for + * @placement: Proposed new placement for the buffer object * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space + * @res: The resulting struct ttm_resource. * - * Allocate memory space for the buffer object pointed to by @bo, using - * the placement flags in @placement, potentially evicting other idle buffer objects. - * This function may sleep while waiting for space to become available. + * Allocates a resource for the buffer object pointed to by @bo, using the + * placement flags in @placement, potentially evicting other buffer objects when + * @force_space is true. + * This function may sleep while waiting for resources to become available. * Returns: - * -EBUSY: No space available (only if no_wait == 1). + * -EBUSY: No space available (only if no_wait == true). * -ENOSPC: Could not allocate space for the buffer object, either due to * fragmentation or concurrent allocators. * -ERESTARTSYS: An interruptible sleep was interrupted by a signal. */ -int ttm_bo_mem_space(struct ttm_buffer_object *bo, - struct ttm_placement *placement, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) +static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, + struct ttm_placement *placement, + struct ttm_operation_ctx *ctx, + bool force_space, + struct ttm_resource **res) { struct ttm_device *bdev = bo->bdev; - bool type_found = false; + struct ww_acquire_ctx *ticket; int i, ret; + ticket = dma_resv_locking_ctx(bo->base.resv); ret = dma_resv_reserve_fences(bo->base.resv, 1);
Re: [PATCH] drm/ttm: Fix an invalid freeing on already freed page in error path
Am 22.02.24 um 08:34 schrieb Thomas Hellström: On Wed, 2024-02-21 at 11:26 +0100, Christian König wrote: Am 21.02.24 um 08:33 schrieb Thomas Hellström: If caching mode change fails due to, for example, OOM we free the allocated pages in a two-step process. First the pages for which the caching change has already succeeded. Secondly the pages for which a caching change did not succeed. However the second step was incorrectly freeing the pages already freed in the first step. Fix. Signed-off-by: Thomas Hellström Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path") Cc: Christian König Cc: Dave Airlie Cc: Christian Koenig Cc: Huang Rui Cc: dri-de...@lists.freedesktop.org Cc: # v6.4+ You don't know how much time I've spend staring at this line to find the bug in it and haven't seen it. Got bug reports about that for month as well. Yeah, sorry about that. We should probably have Kunit tests exercising OOM in the pool code involving WC pages. I'll push this to drm-misc-next. drm-misc-fixes please! That needs to be backported ASAP. Need to dig up the bug report for this again. Thanks, Christian. /Thomas Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index b62f420a9f96..112438d965ff 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt, enum ttm_caching caching, pgoff_t start_page, pgoff_t end_page) { - struct page **pages = tt->pages; + struct page **pages = &tt->pages[start_page]; unsigned int order; pgoff_t i, nr;
Re: [PATCH v7 3/3] drm/buddy: Add defragmentation support
Am 21.02.24 um 13:18 schrieb Arunpravin Paneer Selvam: Add a function to support defragmentation. Thinking more about it maybe you want to call this function differently. Essentially we are force merging pages even if their cleared flag doesn't match, that is something different than defragmentation I think. Maybe rename it for force_merge or something similar. Not mandatory, just an idea how to improve the readability of the code. Apart from that just let me know when I can push it to drm-misc-next. Christian. v1: - Defragment the memory beginning from min_order till the required memory space is available. v2(Matthew): - add amdgpu user for defragmentation - add a warning if the two blocks are incompatible on defragmentation - call full defragmentation in the fini() function - place a condition to test if min_order is equal to 0 - replace the list with safe_reverse() variant as we might remove the block from the list. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 17 +++- drivers/gpu/drm/drm_buddy.c | 93 +--- include/drm/drm_buddy.h | 3 + 3 files changed, 97 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c index e494f5bf136a..cff8a526c622 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c @@ -533,8 +533,21 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager *man, min_block_size, &vres->blocks, vres->flags); - if (unlikely(r)) - goto error_free_blocks; + if (unlikely(r)) { + if (r == -ENOSPC) { + drm_buddy_defrag(mm, min_block_size); + r = drm_buddy_alloc_blocks(mm, fpfn, + lpfn, + size, + min_block_size, + &vres->blocks, + vres->flags); + if (unlikely(r)) + goto error_free_blocks; + } else { + goto error_free_blocks; + } + } if (size > remaining_size) remaining_size = 0; diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 18e004fa39d3..56bd1560fbcd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -203,6 +203,8 @@ void drm_buddy_fini(struct drm_buddy *mm) drm_block_free(mm, mm->roots[i]); } + drm_buddy_defrag(mm, mm->chunk_size << mm->max_order); + WARN_ON(mm->avail != mm->size); kfree(mm->roots); @@ -276,25 +278,39 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, -struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, +struct drm_buddy_block *block, +bool defrag) { + unsigned int order, block_order; struct drm_buddy_block *parent; + block_order = drm_buddy_block_order(block); + while ((parent = block->parent)) { - struct drm_buddy_block *buddy; + struct drm_buddy_block *buddy = NULL; buddy = __get_buddy(block); if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + /* +* Check the block and its buddy clear state and exit +* the loop if they both have the dissimilar state. +*/ + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } + + WARN_ON(defrag && + (drm_buddy_block_is_clear(block) == +drm_buddy_block_is_clear(buddy))); list_del(&buddy->link); @@ -304,8 +320,57 @@ static void __drm_buddy_free(struct
Re: [PATCH] drm/ttm: Fix an invalid freeing on already freed page in error path
Am 21.02.24 um 08:33 schrieb Thomas Hellström: If caching mode change fails due to, for example, OOM we free the allocated pages in a two-step process. First the pages for which the caching change has already succeeded. Secondly the pages for which a caching change did not succeed. However the second step was incorrectly freeing the pages already freed in the first step. Fix. Signed-off-by: Thomas Hellström Fixes: 379989e7cbdc ("drm/ttm/pool: Fix ttm_pool_alloc error path") Cc: Christian König Cc: Dave Airlie Cc: Christian Koenig Cc: Huang Rui Cc: dri-de...@lists.freedesktop.org Cc: # v6.4+ You don't know how much time I've spend staring at this line to find the bug in it and haven't seen it. Got bug reports about that for month as well. Reviewed-by: Christian König --- drivers/gpu/drm/ttm/ttm_pool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_pool.c b/drivers/gpu/drm/ttm/ttm_pool.c index b62f420a9f96..112438d965ff 100644 --- a/drivers/gpu/drm/ttm/ttm_pool.c +++ b/drivers/gpu/drm/ttm/ttm_pool.c @@ -387,7 +387,7 @@ static void ttm_pool_free_range(struct ttm_pool *pool, struct ttm_tt *tt, enum ttm_caching caching, pgoff_t start_page, pgoff_t end_page) { - struct page **pages = tt->pages; + struct page **pages = &tt->pages[start_page]; unsigned int order; pgoff_t i, nr;
Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support
Am 16.02.24 um 15:47 schrieb Matthew Auld: On 16/02/2024 14:02, Christian König wrote: Am 16.02.24 um 14:21 schrieb Matthew Auld: On 16/02/2024 12:33, Christian König wrote: Am 16.02.24 um 13:23 schrieb Matthew Auld: On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: Defragment the memory beginning from min_order till the required memory space is available. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 67 +++-- include/drm/drm_buddy.h | 3 ++ No users? Other question is how can a buddy allocator fragment in the first place? The fragmentation is due to pages now being tracked as dirty/clear. Should the allocator merge together a page that is dirty with a page that is cleared? When should it do that? User wants to be able to keep the two separate if possible. For example, freeing one single dirty page can dirty a huge swathe of your already cleared pages if they are merged together. Or do you have some some other ideas here? Sorry, that was not what I meant. I should probably have been clearer. That dirty and clean pages are now kept separated is obvious, but why do you need to de-fragment them at some point? Ah, right. At the very least we need to do something similar to this at fini(), just to ensure we properly merge everything back together so we can correctly tear down the mm. Outside of that the thinking was that it might be useful to call when allocating larger min page-sizes. You might now be failing the allocation due to fragmentation, and so in some cases might be better off running some kind of defrag step first, instead of failing the allocation and trying to evict stuff. Anyway, if that is not a concern for amdgpu, then we just need to handle the fini() case and can keep this internal. Ah, yes that makes more sense. So you basically force merge the pages before fini to avoid warnings that the buddy isn't empty. Thanks, that answers my curiosity. But I unfortunately still don't have time to dig deep enough into this for a review. Thanks, Christian. Christian. Christian. 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 33ad0cfbd54c..fac423d2cb73 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, - struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, + struct drm_buddy_block *block, + bool defrag) { struct drm_buddy_block *parent; + unsigned int order; while ((parent = block->parent)) { struct drm_buddy_block *buddy; @@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } Maybe check if the two blocks are incompatible and chuck a warn if they are not? Main thing is not to hide issues with split blocks that should have been merged before. list_del(&buddy->link); @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent; } + order = drm_buddy_block_order(block); mark_free(mm, block); + + return order; +} + +/** + * drm_buddy_defrag - Defragmentation routine + * + * @mm: DRM buddy manager + * @min_order: minimum order in the freelist to begin + * the defragmentation process + * + * Driver calls the defragmentation function when the + * requested memory allocation returns -ENOSPC. + */ +void drm_buddy_defrag(struct drm_buddy *mm, + unsigned int min_order) Just wondering if we need "full defag" also? We would probably need to call this at fini() anyway. +{ + struct drm_buddy_block *block; + struct list_head *list; + unsigned int order; + int i; + + if (min_order > mm->max_order) + return; + + for (i = min_order - 1; i >= 0; i--) { Need to be careful with min_order = 0 ? + list = &mm->free_list[i]; + if (list_empty(list)) + continue; + + list_for_each_entry_reverse(block, list, link) { Don't we need the safe_reverse() variant here, since this is removing from the list? +
Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support
Am 16.02.24 um 14:21 schrieb Matthew Auld: On 16/02/2024 12:33, Christian König wrote: Am 16.02.24 um 13:23 schrieb Matthew Auld: On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: Defragment the memory beginning from min_order till the required memory space is available. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 67 +++-- include/drm/drm_buddy.h | 3 ++ No users? Other question is how can a buddy allocator fragment in the first place? The fragmentation is due to pages now being tracked as dirty/clear. Should the allocator merge together a page that is dirty with a page that is cleared? When should it do that? User wants to be able to keep the two separate if possible. For example, freeing one single dirty page can dirty a huge swathe of your already cleared pages if they are merged together. Or do you have some some other ideas here? Sorry, that was not what I meant. I should probably have been clearer. That dirty and clean pages are now kept separated is obvious, but why do you need to de-fragment them at some point? Christian. Christian. 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 33ad0cfbd54c..fac423d2cb73 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, - struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, + struct drm_buddy_block *block, + bool defrag) { struct drm_buddy_block *parent; + unsigned int order; while ((parent = block->parent)) { struct drm_buddy_block *buddy; @@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } Maybe check if the two blocks are incompatible and chuck a warn if they are not? Main thing is not to hide issues with split blocks that should have been merged before. list_del(&buddy->link); @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent; } + order = drm_buddy_block_order(block); mark_free(mm, block); + + return order; +} + +/** + * drm_buddy_defrag - Defragmentation routine + * + * @mm: DRM buddy manager + * @min_order: minimum order in the freelist to begin + * the defragmentation process + * + * Driver calls the defragmentation function when the + * requested memory allocation returns -ENOSPC. + */ +void drm_buddy_defrag(struct drm_buddy *mm, + unsigned int min_order) Just wondering if we need "full defag" also? We would probably need to call this at fini() anyway. +{ + struct drm_buddy_block *block; + struct list_head *list; + unsigned int order; + int i; + + if (min_order > mm->max_order) + return; + + for (i = min_order - 1; i >= 0; i--) { Need to be careful with min_order = 0 ? + list = &mm->free_list[i]; + if (list_empty(list)) + continue; + + list_for_each_entry_reverse(block, list, link) { Don't we need the safe_reverse() variant here, since this is removing from the list? + if (!block->parent) + continue; + + order = __drm_buddy_free(mm, block, 1); + if (order >= min_order) + return; + } + } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block @@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_buddy *mm, if (drm_buddy_block_is_clear(block)) mm->clear_avail += drm_buddy_block_size(mm, block); - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); } EXPORT_SYMBOL(drm_buddy_free_block); @@ -470,7 +515,7 @@ __alloc_range_bias(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) && drm_buddy_block_is_free(buddy))) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -588,7 +633,7 @@ alloc_from_freelist(struct drm_buddy *mm, err_undo: if (tmp != order) - __drm_buddy_free
Re: [PATCH 0/6 V4] fdinfo shared stats
Am 15.02.24 um 15:20 schrieb Alex Deucher: On Thu, Feb 15, 2024 at 9:18 AM Christian König wrote: Am 12.02.24 um 22:04 schrieb Alex Deucher: We had a request to add shared buffer stats to fdinfo for amdgpu and while implementing that, Christian mentioned that just looking at the GEM handle count doesn't take into account buffers shared with other subsystems like V4L or RDMA. Those subsystems don't use GEM, so it doesn't really matter from a GPU top perspective, but it's more correct if you actually want to see shared buffers. After further discussions, add a helper and update all fdinfo implementations to use that helper for consistency. v4: switch drm_gem_object_is_shared_for_memory_stats() to an inline function I'm still not sure if looking at the actual handle count is the right approach, but it's certainly better than before. Well, it's consistent across drivers. Yeah, which makes it easy to change if we find something better. So Reviewed-by: Christian König for the entire series. Should I take this through drm-misc-next? Yes, please. Done. Regards, Christian. Thanks, Alex Regards, Christian. Alex Deucher (6): Documentation/gpu: Update documentation on drm-shared-* drm: add drm_gem_object_is_shared_for_memory_stats() helper drm: update drm_show_memory_stats() for dma-bufs drm/amdgpu: add shared fdinfo stats drm/i915: Update shared stats to use the new gem helper drm/xe: Update shared stats to use the new gem helper Documentation/gpu/drm-usage-stats.rst | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 6 ++ drivers/gpu/drm/drm_file.c | 2 +- drivers/gpu/drm/i915/i915_drm_client.c | 2 +- drivers/gpu/drm/xe/xe_drm_client.c | 2 +- include/drm/drm_gem.h | 13 + 8 files changed, 38 insertions(+), 4 deletions(-)
Re: [PATCH v6 3/3] drm/buddy: Add defragmentation support
Am 16.02.24 um 13:23 schrieb Matthew Auld: On 08/02/2024 15:50, Arunpravin Paneer Selvam wrote: Add a function to support defragmentation. v1: Defragment the memory beginning from min_order till the required memory space is available. Signed-off-by: Arunpravin Paneer Selvam Suggested-by: Matthew Auld --- drivers/gpu/drm/drm_buddy.c | 67 +++-- include/drm/drm_buddy.h | 3 ++ No users? Other question is how can a buddy allocator fragment in the first place? Christian. 2 files changed, 59 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index 33ad0cfbd54c..fac423d2cb73 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -276,10 +276,12 @@ drm_get_buddy(struct drm_buddy_block *block) } EXPORT_SYMBOL(drm_get_buddy); -static void __drm_buddy_free(struct drm_buddy *mm, - struct drm_buddy_block *block) +static unsigned int __drm_buddy_free(struct drm_buddy *mm, + struct drm_buddy_block *block, + bool defrag) { struct drm_buddy_block *parent; + unsigned int order; while ((parent = block->parent)) { struct drm_buddy_block *buddy; @@ -289,12 +291,14 @@ static void __drm_buddy_free(struct drm_buddy *mm, if (!drm_buddy_block_is_free(buddy)) break; - if (drm_buddy_block_is_clear(block) != - drm_buddy_block_is_clear(buddy)) - break; + if (!defrag) { + if (drm_buddy_block_is_clear(block) != + drm_buddy_block_is_clear(buddy)) + break; - if (drm_buddy_block_is_clear(block)) - mark_cleared(parent); + if (drm_buddy_block_is_clear(block)) + mark_cleared(parent); + } Maybe check if the two blocks are incompatible and chuck a warn if they are not? Main thing is not to hide issues with split blocks that should have been merged before. list_del(&buddy->link); @@ -304,8 +308,49 @@ static void __drm_buddy_free(struct drm_buddy *mm, block = parent; } + order = drm_buddy_block_order(block); mark_free(mm, block); + + return order; +} + +/** + * drm_buddy_defrag - Defragmentation routine + * + * @mm: DRM buddy manager + * @min_order: minimum order in the freelist to begin + * the defragmentation process + * + * Driver calls the defragmentation function when the + * requested memory allocation returns -ENOSPC. + */ +void drm_buddy_defrag(struct drm_buddy *mm, + unsigned int min_order) Just wondering if we need "full defag" also? We would probably need to call this at fini() anyway. +{ + struct drm_buddy_block *block; + struct list_head *list; + unsigned int order; + int i; + + if (min_order > mm->max_order) + return; + + for (i = min_order - 1; i >= 0; i--) { Need to be careful with min_order = 0 ? + list = &mm->free_list[i]; + if (list_empty(list)) + continue; + + list_for_each_entry_reverse(block, list, link) { Don't we need the safe_reverse() variant here, since this is removing from the list? + if (!block->parent) + continue; + + order = __drm_buddy_free(mm, block, 1); + if (order >= min_order) + return; + } + } } +EXPORT_SYMBOL(drm_buddy_defrag); /** * drm_buddy_free_block - free a block @@ -321,7 +366,7 @@ void drm_buddy_free_block(struct drm_buddy *mm, if (drm_buddy_block_is_clear(block)) mm->clear_avail += drm_buddy_block_size(mm, block); - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); } EXPORT_SYMBOL(drm_buddy_free_block); @@ -470,7 +515,7 @@ __alloc_range_bias(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) && drm_buddy_block_is_free(buddy))) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -588,7 +633,7 @@ alloc_from_freelist(struct drm_buddy *mm, err_undo: if (tmp != order) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); return ERR_PTR(err); } @@ -668,7 +713,7 @@ static int __alloc_range(struct drm_buddy *mm, if (buddy && (drm_buddy_block_is_free(block) && drm_buddy_block_is_free(buddy))) - __drm_buddy_free(mm, block); + __drm_buddy_free(mm, block, 0); err_free: if (err == -ENOSPC && total_allocated_on_err) { diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h index d81c596dfa38..d0f63e7b5915 100644 --- a/include/drm/drm_buddy.h +++ b/include/drm/drm_buddy.h @@ -166,6 +166,9 @@ void drm_buddy_free_list(struct drm_buddy *mm, struct list_head *objects, unsigned int flags); +void drm_buddy_defrag(struct drm_buddy *mm, +
Re: [PATCH] drm/buddy: Modify duplicate list_splice_tail call
Am 16.02.24 um 12:46 schrieb Arunpravin Paneer Selvam: On 2/16/2024 4:41 PM, Matthew Auld wrote: On 16/02/2024 10:00, Arunpravin Paneer Selvam wrote: Remove the duplicate list_splice_tail call when the total_allocated < size condition is true. Cc: # 6.7+ Fixes: 8746c6c9dfa3 ("drm/buddy: Fix alloc_range() error handling code") Reported-by: Bert Karwatzki Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/drm_buddy.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index c1a99bf4dffd..c4222b886db7 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -538,13 +538,13 @@ static int __alloc_range(struct drm_buddy *mm, list_add(&block->left->tmp_link, dfs); } while (1); - list_splice_tail(&allocated, blocks); - if (total_allocated < size) { err = -ENOSPC; goto err_free; } + list_splice_tail(&allocated, blocks); Sigh. Can we extend the unit test(s) to catch this? Sure, Let me check. In the meantime I'm going to push this one to drm-misc-fixes. Regards, Christian. Regards, Arun. Reviewed-by: Matthew Auld + return 0; err_undo: base-commit: a64056bb5a3215bd31c8ce17d609ba0f4d5c55ea
Re: [PATCH 0/6 V4] fdinfo shared stats
Am 12.02.24 um 22:04 schrieb Alex Deucher: We had a request to add shared buffer stats to fdinfo for amdgpu and while implementing that, Christian mentioned that just looking at the GEM handle count doesn't take into account buffers shared with other subsystems like V4L or RDMA. Those subsystems don't use GEM, so it doesn't really matter from a GPU top perspective, but it's more correct if you actually want to see shared buffers. After further discussions, add a helper and update all fdinfo implementations to use that helper for consistency. v4: switch drm_gem_object_is_shared_for_memory_stats() to an inline function I'm still not sure if looking at the actual handle count is the right approach, but it's certainly better than before. So Reviewed-by: Christian König for the entire series. Should I take this through drm-misc-next? Regards, Christian. Alex Deucher (6): Documentation/gpu: Update documentation on drm-shared-* drm: add drm_gem_object_is_shared_for_memory_stats() helper drm: update drm_show_memory_stats() for dma-bufs drm/amdgpu: add shared fdinfo stats drm/i915: Update shared stats to use the new gem helper drm/xe: Update shared stats to use the new gem helper Documentation/gpu/drm-usage-stats.rst | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 6 ++ drivers/gpu/drm/drm_file.c | 2 +- drivers/gpu/drm/i915/i915_drm_client.c | 2 +- drivers/gpu/drm/xe/xe_drm_client.c | 2 +- include/drm/drm_gem.h | 13 + 8 files changed, 38 insertions(+), 4 deletions(-)
Re: [PATCH 2/2] drm/tests/drm_buddy: add alloc_contiguous test
Am 13.02.24 um 15:28 schrieb Matthew Auld: On 13/02/2024 13:52, Arunpravin Paneer Selvam wrote: Sanity check DRM_BUDDY_CONTIGUOUS_ALLOCATION. References: https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Signed-off-by: Matthew Auld Reviewed-by: Arunpravin Paneer Selvam It looks like you changed the patch authorship here. Going to fix this if I get tasked with pushing this to drm-misc-fixes. But I still have hope that Arun will figure out how to do this himself. Christian. Cc: Arunpravin Paneer Selvam Cc: Limonciello Cc: Christian König Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/tests/drm_buddy_test.c | 89 ++ 1 file changed, 89 insertions(+) diff --git a/drivers/gpu/drm/tests/drm_buddy_test.c b/drivers/gpu/drm/tests/drm_buddy_test.c index ea2af6bd9abe..fee6bec757d1 100644 --- a/drivers/gpu/drm/tests/drm_buddy_test.c +++ b/drivers/gpu/drm/tests/drm_buddy_test.c @@ -8,6 +8,7 @@ #include #include +#include #include @@ -18,6 +19,93 @@ static inline u64 get_size(int order, u64 chunk_size) return (1 << order) * chunk_size; } +static void drm_test_buddy_alloc_contiguous(struct kunit *test) +{ + u64 mm_size, ps = SZ_4K, i, n_pages, total; + struct drm_buddy_block *block; + struct drm_buddy mm; + LIST_HEAD(left); + LIST_HEAD(middle); + LIST_HEAD(right); + LIST_HEAD(allocated); + + mm_size = 16 * 3 * SZ_4K; + + KUNIT_EXPECT_FALSE(test, drm_buddy_init(&mm, mm_size, ps)); + + /* + * Idea is to fragment the address space by alternating block + * allocations between three different lists; one for left, middle and + * right. We can then free a list to simulate fragmentation. In + * particular we want to exercise the DRM_BUDDY_CONTIGUOUS_ALLOCATION, + * including the try_harder path. + */ + + i = 0; + n_pages = mm_size / ps; + do { + struct list_head *list; + int slot = i % 3; + + if (slot == 0) + list = &left; + else if (slot == 1) + list = &middle; + else + list = &right; + KUNIT_ASSERT_FALSE_MSG(test, + drm_buddy_alloc_blocks(&mm, 0, mm_size, + ps, ps, list, 0), + "buddy_alloc hit an error size=%d\n", + ps); + } while (++i < n_pages); + + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%d\n", 3 * ps); + + drm_buddy_free_list(&mm, &middle); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 2 * ps); + + drm_buddy_free_list(&mm, &right); + KUNIT_ASSERT_TRUE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc didn't error size=%llu\n", 3 * ps); + /* + * At this point we should have enough contiguous space for 2 blocks, + * however they are never buddies (since we freed middle and right) so + * will require the try_harder logic to find them. + */ + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 2 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 2 * ps); + + drm_buddy_free_list(&mm, &left); + KUNIT_ASSERT_FALSE_MSG(test, drm_buddy_alloc_blocks(&mm, 0, mm_size, + 3 * ps, ps, &allocated, + DRM_BUDDY_CONTIGUOUS_ALLOCATION), + "buddy_alloc hit an error size=%d\n", 3 * ps); + + total = 0; + list_for_each_entry(block, &allocated, link) + total += drm_buddy_block_size(&mm, block); + + KUNIT_ASSERT_EQ(test, total, ps * 2 + ps * 3); + + drm_buddy_free_list(&mm, &allocated); + drm_buddy_fini(&mm); +} + static void drm_test_buddy_alloc_pathological(struct kunit *test) { u64 mm_size, size, start = 0; @@ -280,6 +368,7 @@ static struct kunit_case drm_buddy_tests[] = { KUNIT_CASE(drm_test_buddy_alloc_optimistic), KUNIT_CASE(drm_test_buddy_alloc_pessimistic), KUNIT_CASE(drm_test_buddy_alloc_pathological), + KUNIT_CASE(drm_test_buddy_alloc_contiguous), {} };
Re: [PATCH] drm/buddy: Fix alloc_range() error handling code
Am 07.02.24 um 18:44 schrieb Arunpravin Paneer Selvam: Few users have observed display corruption when they boot the machine to KDE Plasma or playing games. We have root caused the problem that whenever alloc_range() couldn't find the required memory blocks the function was returning SUCCESS in some of the corner cases. The right approach would be if the total allocated size is less than the required size, the function should return -ENOSPC. Gitlab ticket link - https://gitlab.freedesktop.org/drm/amd/-/issues/3097 Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation") Signed-off-by: Arunpravin Paneer Selvam Tested-by: Mario Limonciello Acked-by: Christian König CC: stable.. ? --- drivers/gpu/drm/drm_buddy.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c index f57e6d74fb0e..c1a99bf4dffd 100644 --- a/drivers/gpu/drm/drm_buddy.c +++ b/drivers/gpu/drm/drm_buddy.c @@ -539,6 +539,12 @@ static int __alloc_range(struct drm_buddy *mm, } while (1); list_splice_tail(&allocated, blocks); + + if (total_allocated < size) { + err = -ENOSPC; + goto err_free; + } + return 0; err_undo:
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4
Am 06.02.24 um 13:53 schrieb Thomas Hellström: Hi, Christian, On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote: Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well v4: keep the ttm_bo_mem_space functionality as it is for now, only add new handling for ttm_bo_validate as suggested by Thomas Signed-off-by: Christian König Reviewed-by: Zack Rusin v3 Sending this through xe CI, will try to review asap. Take your time. At the moment people are bombarding me with work and I have only two hands and one head as well :( Christian. /Thomas --- drivers/gpu/drm/ttm/ttm_bo.c | 231 +-- -- drivers/gpu/drm/ttm/ttm_resource.c | 16 +- include/drm/ttm/ttm_resource.h | 3 +- 3 files changed, 121 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ba3f09e2d7e6..b12f435542a9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx- no_wait_gpu); -} - /** - * ttm_bo_mem_space + * ttm_bo_alloc_resource - Allocate backing store for a BO * - * @bo: Pointer to a struct ttm_buffer_object. the data of which - * we want to allocate space for. - * @placement: Proposed new placement for the buffer object. - * @mem: A struct ttm_resource. + * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for + * @placement: Proposed new placement for the buffer object * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space + * @res: The resulting struct ttm_resource. * - * Allocate memory space for the buffer object pointed to by @bo, using - * the placement flags in @placement, potentially evicting other idle buffer objects. - * This function may sleep while waiting for space to become available. + * Allocates a resource for the buffer object pointed to by @bo, using the + * placement flags in @placement, potentially evicting other buffer objects when + * @force_space is true. + * This function may sleep while waiting for resources to become available. * Returns: - * -EBUSY: No space available (only if no_wait == 1). + * -EBUSY: No space available (only if no_wait == true). * -ENOSPC: Could not allocate space for the buffer object, either due to * fragmentation or concurrent allocators. * -ERESTARTSYS: An interruptible sleep was interrupted by a signal. */ -int ttm_bo_mem_space(struct ttm_buffer_object *bo, - struct ttm_placement *placement, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) +static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, +struct ttm_placement *placement, +struct ttm_operation_ctx *ctx, +bool force_space, +struct ttm_resource **res) { struct ttm_device *bdev = bo->bdev; - bool type_found = false; + struct ww_acquire_ctx *ticket; int i, ret; + ticket = dma_resv_locking_ctx(bo->base.resv); ret = dma_resv_reserve_fences(bo->base.r
[PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König Reviewed-by: Zack Rusin --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index b671b0665492..0eac179a387c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_FALLBACK; c++; } -- 2.34.1
[PATCH 1/2] drm/ttm: improve idle/busy handling v4
Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well v4: keep the ttm_bo_mem_space functionality as it is for now, only add new handling for ttm_bo_validate as suggested by Thomas Signed-off-by: Christian König Reviewed-by: Zack Rusin v3 --- drivers/gpu/drm/ttm/ttm_bo.c | 231 + drivers/gpu/drm/ttm/ttm_resource.c | 16 +- include/drm/ttm/ttm_resource.h | 3 +- 3 files changed, 121 insertions(+), 129 deletions(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index ba3f09e2d7e6..b12f435542a9 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); -} - /** - * ttm_bo_mem_space + * ttm_bo_alloc_resource - Allocate backing store for a BO * - * @bo: Pointer to a struct ttm_buffer_object. the data of which - * we want to allocate space for. - * @placement: Proposed new placement for the buffer object. - * @mem: A struct ttm_resource. + * @bo: Pointer to a struct ttm_buffer_object of which we want a resource for + * @placement: Proposed new placement for the buffer object * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space + * @res: The resulting struct ttm_resource. * - * Allocate memory space for the buffer object pointed to by @bo, using - * the placement flags in @placement, potentially evicting other idle buffer objects. - * This function may sleep while waiting for space to become available. + * Allocates a resource for the buffer object pointed to by @bo, using the + * placement flags in @placement, potentially evicting other buffer objects when + * @force_space is true. + * This function may sleep while waiting for resources to become available. * Returns: - * -EBUSY: No space available (only if no_wait == 1). + * -EBUSY: No space available (only if no_wait == true). * -ENOSPC: Could not allocate space for the buffer object, either due to * fragmentation or concurrent allocators. * -ERESTARTSYS: An interruptible sleep was interrupted by a signal. */ -int ttm_bo_mem_space(struct ttm_buffer_object *bo, - struct ttm_placement *placement, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) +static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo, +struct ttm_placement *placement, +struct ttm_operation_ctx *ctx, +bool force_space, +struct ttm_resource **res) { struct ttm_device *bdev = bo->bdev; - bool type_found = false; + struct ww_acquire_ctx *ticket; int i, ret; + ticket = dma_resv_locking_ctx(bo->base.resv); ret = dma_resv_reserve_fences(bo->base.resv, 1); if (unlikely(ret)) return ret; @@ -790,98 +762,73 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, const struct ttm_place *place = &placement->placement[i]; struct ttm_resource_manager *man; - if (place->flags & TTM_PL_FLAG_FALLBACK) -
Rework TTMs busy handling
Hi guys, so pushed the first few patches from this series. I hope that I correctly managed to resolve the silent Xe merge conflict in drm-tip, but would be nice if somebody could double check. Then for the two remaining patches I've implemented most of what Thomas suggest, e.g. the existing functionality sticks around for eviction and hobs, but ttm_bo_validate will now try to always move things into the non-fallback placements on validation first. What I haven't done yet is to split up the preferred placement since I couldn't immediately see an use case for this, but it's really something we might do in the future as well. Please review and comment, Christian.
Re: [PATCH] mm: Remove double faults once write a device pfn
Am 24.01.24 um 12:04 schrieb Alistair Popple: "Zhou, Xianrong" writes: [AMD Official Use Only - General] The vmf_insert_pfn_prot could cause unnecessary double faults on a device pfn. Because currently the vmf_insert_pfn_prot does not make the pfn writable so the pte entry is normally read-only or dirty catching. What? How do you got to this conclusion? Sorry. I did not mention that this problem only exists on arm64 platform. Ok, that makes at least a little bit more sense. Because on arm64 platform the PTE_RDONLY is automatically attached to the userspace pte entries even through VM_WRITE + VM_SHARE. The PTE_RDONLY needs to be cleared in vmf_insert_pfn_prot. However vmf_insert_pfn_prot do not make the pte writable passing false @mkwrite to insert_pfn. Question is why is arm64 doing this? As far as I can see they must have some hardware reason for that. The mkwrite parameter to insert_pfn() was added by commit b2770da642540 to make insert_pfn() look more like insert_pfn_pmd() so that the DAX code can insert PTEs which are writable and dirty at the same time. This is one scenario to do so. In fact on arm64 there are many scenarios could be to do so. So we can let vmf_insert_pfn_prot supporting @mkwrite for drivers at core layer and let drivers to decide whether or not to make writable and dirty at one time. The patch did this. Otherwise double faults on arm64 when call vmf_insert_pfn_prot. Well, that doesn't answer my question why arm64 is double faulting in the first place,. Eh. On arm64 When userspace mmap() with PROT_WRITE and MAP_SHARED the vma->vm_page_prot has the PTE_RDONLY and PTE_WRITE within PAGE_SHARED_EXEC. (seeing arm64 protection_map) Well that's your observation, but not the explanation why arm64 is doing this. See this would have quite some negative impact on performance, not only for gfx drivers but in general. So either the observation is incorrect or there is a *really* good reason why arm64 is taking this performance penalty. When write the userspace virtual address the first fault happen and call into driver's .fault->ttm_bo_vm_fault_reserved->vmf_insert_pfn_prot->insert_pfn. The insert_pfn will establish the pte entry. However the vmf_insert_pfn_prot pass false @mkwrite to insert_pfn by default and so insert_pfn could not make the pfn writable and it do not call maybe_mkwrite(pte_mkdirty(entry), vma) to clear the PTE_RDONLY bit. So the pte entry is actually write protection for mmu. So when the first fault return and re-execute the store instruction the second fault happen again. And in second fault it just only do pte_mkdirty(entry) which clear the PTE_RDONLY. It depends if the ARM64 CPU in question supports hardware dirty bit management (DBM). If that is the case and PTE_DBM (ie. PTE_WRITE) is set HW will automatically clear PTE_RDONLY bit to mark the entry dirty instead of raising a write fault. So you shouldn't see a double fault if PTE_DBM/WRITE is set. On ARM64 you can kind of think of PTE_RDONLY as the HW dirty bit and PTE_DBM as the read/write permission bit with SW being responsible for updating PTE_RDONLY via the fault handler if DBM is not supported by HW. At least that's my understanding from having hacked on this in the past. You can see all this weirdness happening in the definitions of pte_dirty() and pte_write() for ARM64. +1 Thanks a lot for that, this was exactly the information I was looking for. In this light it makes this patch here look unnecessary and questionable at best. Xianrong if you have an arm64 platform which really double faults (confirmed through a debugger for example) then you have to ask why this platform shows this behavior and not try to work around it. Behaviors like those usually have a very very good reason and without a confirmed explanation I'm not allowing any patch in which would disable stuff like that. Regards, Christian. I think so and hope no wrong. So as long as this isn't sorted out I'm going to reject this patch. Regards, Christian. This is a completely different use case to what you try to use it here for and that looks extremely fishy to me. Regards, Christian. The first fault only sets up the pte entry which actually is dirty catching. And the second immediate fault to the pfn due to first dirty catching when the cpu re-execute the store instruction. It could be that this is done to work around some hw behavior, but not because of dirty catching. Normally if the drivers call vmf_insert_pfn_prot and also supply 'pfn_mkwrite' callback within vm_operations_struct which requires the pte to be dirty catching then the vmf_insert_pfn_prot and the double fault are reasonable. It is not a problem. Well, as far as I can see that behavior absolutely doesn't make sense. When pfn_mkwrite is requested then the driver should use PAGE_COPY, which is exactly what VMWGFX (the only driver using dirty tracking) is doing. Everybody else uses PAGE_SHARED which should make the pte w
Re: [PATCH] mm: Remove double faults once write a device pfn
Am 24.01.24 um 03:43 schrieb Zhou, Xianrong: [AMD Official Use Only - General] The vmf_insert_pfn_prot could cause unnecessary double faults on a device pfn. Because currently the vmf_insert_pfn_prot does not make the pfn writable so the pte entry is normally read-only or dirty catching. What? How do you got to this conclusion? Sorry. I did not mention that this problem only exists on arm64 platform. Ok, that makes at least a little bit more sense. Because on arm64 platform the PTE_RDONLY is automatically attached to the userspace pte entries even through VM_WRITE + VM_SHARE. The PTE_RDONLY needs to be cleared in vmf_insert_pfn_prot. However vmf_insert_pfn_prot do not make the pte writable passing false @mkwrite to insert_pfn. Question is why is arm64 doing this? As far as I can see they must have some hardware reason for that. The mkwrite parameter to insert_pfn() was added by commit b2770da642540 to make insert_pfn() look more like insert_pfn_pmd() so that the DAX code can insert PTEs which are writable and dirty at the same time. This is one scenario to do so. In fact on arm64 there are many scenarios could be to do so. So we can let vmf_insert_pfn_prot supporting @mkwrite for drivers at core layer and let drivers to decide whether or not to make writable and dirty at one time. The patch did this. Otherwise double faults on arm64 when call vmf_insert_pfn_prot. Well, that doesn't answer my question why arm64 is double faulting in the first place,. So as long as this isn't sorted out I'm going to reject this patch. Regards, Christian. This is a completely different use case to what you try to use it here for and that looks extremely fishy to me. Regards, Christian. The first fault only sets up the pte entry which actually is dirty catching. And the second immediate fault to the pfn due to first dirty catching when the cpu re-execute the store instruction. It could be that this is done to work around some hw behavior, but not because of dirty catching. Normally if the drivers call vmf_insert_pfn_prot and also supply 'pfn_mkwrite' callback within vm_operations_struct which requires the pte to be dirty catching then the vmf_insert_pfn_prot and the double fault are reasonable. It is not a problem. Well, as far as I can see that behavior absolutely doesn't make sense. When pfn_mkwrite is requested then the driver should use PAGE_COPY, which is exactly what VMWGFX (the only driver using dirty tracking) is doing. Everybody else uses PAGE_SHARED which should make the pte writeable immediately. Regards, Christian. However the most of drivers calling vmf_insert_pfn_prot do not supply the 'pfn_mkwrite' callback so that the second fault is unnecessary. So just like vmf_insert_mixed and vmf_insert_mixed_mkwrite pair, we should also supply vmf_insert_pfn_mkwrite for drivers as well. Signed-off-by: Xianrong Zhou --- arch/x86/entry/vdso/vma.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 2 +- drivers/gpu/drm/nouveau/nouveau_gem.c | 2 +- drivers/gpu/drm/radeon/radeon_gem.c| 2 +- drivers/gpu/drm/ttm/ttm_bo_vm.c| 8 +--- drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 8 +--- include/drm/ttm/ttm_bo.h | 3 ++- include/linux/mm.h | 2 +- mm/memory.c| 14 +++--- 10 files changed, 30 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 7645730dc228..dd2431c2975f 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -185,7 +185,8 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, if (pvti && vclock_was_used(VDSO_CLOCKMODE_PVCLOCK)) { return vmf_insert_pfn_prot(vma, vmf->address, __pa(pvti) >> PAGE_SHIFT, - pgprot_decrypted(vma- vm_page_prot)); + pgprot_decrypted(vma- vm_page_prot), + true); } } else if (sym_offset == image->sym_hvclock_page) { pfn = hv_get_tsc_pfn(); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 49a5f1c73b3e..adcb20d9e624 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -64,7 +64,7 @@ static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf) } ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma- vm_page_prot, - TTM_BO_VM_NUM_PREFAULT); + TTM_BO_VM_NUM_PREFAULT, true); drm_dev_exit(idx); } else { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 9227f8146a58..c6f1
Re: [PATCH] mm: Remove double faults once write a device pfn
Am 23.01.24 um 09:33 schrieb Zhou, Xianrong: [AMD Official Use Only - General] The vmf_insert_pfn_prot could cause unnecessary double faults on a device pfn. Because currently the vmf_insert_pfn_prot does not make the pfn writable so the pte entry is normally read-only or dirty catching. What? How do you got to this conclusion? Sorry. I did not mention that this problem only exists on arm64 platform. Ok, that makes at least a little bit more sense. Because on arm64 platform the PTE_RDONLY is automatically attached to the userspace pte entries even through VM_WRITE + VM_SHARE. The PTE_RDONLY needs to be cleared in vmf_insert_pfn_prot. However vmf_insert_pfn_prot do not make the pte writable passing false @mkwrite to insert_pfn. Question is why is arm64 doing this? As far as I can see they must have some hardware reason for that. The mkwrite parameter to insert_pfn() was added by commit b2770da642540 to make insert_pfn() look more like insert_pfn_pmd() so that the DAX code can insert PTEs which are writable and dirty at the same time. This is a completely different use case to what you try to use it here for and that looks extremely fishy to me. Regards, Christian. The first fault only sets up the pte entry which actually is dirty catching. And the second immediate fault to the pfn due to first dirty catching when the cpu re-execute the store instruction. It could be that this is done to work around some hw behavior, but not because of dirty catching. Normally if the drivers call vmf_insert_pfn_prot and also supply 'pfn_mkwrite' callback within vm_operations_struct which requires the pte to be dirty catching then the vmf_insert_pfn_prot and the double fault are reasonable. It is not a problem. Well, as far as I can see that behavior absolutely doesn't make sense. When pfn_mkwrite is requested then the driver should use PAGE_COPY, which is exactly what VMWGFX (the only driver using dirty tracking) is doing. Everybody else uses PAGE_SHARED which should make the pte writeable immediately. Regards, Christian. However the most of drivers calling vmf_insert_pfn_prot do not supply the 'pfn_mkwrite' callback so that the second fault is unnecessary. So just like vmf_insert_mixed and vmf_insert_mixed_mkwrite pair, we should also supply vmf_insert_pfn_mkwrite for drivers as well. Signed-off-by: Xianrong Zhou --- arch/x86/entry/vdso/vma.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 2 +- drivers/gpu/drm/nouveau/nouveau_gem.c | 2 +- drivers/gpu/drm/radeon/radeon_gem.c| 2 +- drivers/gpu/drm/ttm/ttm_bo_vm.c| 8 +--- drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 8 +--- include/drm/ttm/ttm_bo.h | 3 ++- include/linux/mm.h | 2 +- mm/memory.c| 14 +++--- 10 files changed, 30 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 7645730dc228..dd2431c2975f 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -185,7 +185,8 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, if (pvti && vclock_was_used(VDSO_CLOCKMODE_PVCLOCK)) { return vmf_insert_pfn_prot(vma, vmf->address, __pa(pvti) >> PAGE_SHIFT, - pgprot_decrypted(vma- vm_page_prot)); + pgprot_decrypted(vma- vm_page_prot), + true); } } else if (sym_offset == image->sym_hvclock_page) { pfn = hv_get_tsc_pfn(); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 49a5f1c73b3e..adcb20d9e624 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -64,7 +64,7 @@ static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf) } ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma- vm_page_prot, - TTM_BO_VM_NUM_PREFAULT); + TTM_BO_VM_NUM_PREFAULT, true); drm_dev_exit(idx); } else { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 9227f8146a58..c6f13ae6c308 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1114,7 +1114,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) if (drm_dev_enter(dev, &idx)) { ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma- vm_page_prot, - TTM_BO_VM_NUM_PREFAULT); + TTM_BO_VM_NUM_PREFAULT, true); drm_dev_exit(idx); } else { ret = ttm_bo_vm_dummy_page(vmf, vmf->vma
Re: [PATCH] mm: Remove double faults once write a device pfn
Am 22.01.24 um 04:32 schrieb Xianrong Zhou: The vmf_insert_pfn_prot could cause unnecessary double faults on a device pfn. Because currently the vmf_insert_pfn_prot does not make the pfn writable so the pte entry is normally read-only or dirty catching. What? How do you got to this conclusion? The first fault only sets up the pte entry which actually is dirty catching. And the second immediate fault to the pfn due to first dirty catching when the cpu re-execute the store instruction. It could be that this is done to work around some hw behavior, but not because of dirty catching. Normally if the drivers call vmf_insert_pfn_prot and also supply 'pfn_mkwrite' callback within vm_operations_struct which requires the pte to be dirty catching then the vmf_insert_pfn_prot and the double fault are reasonable. It is not a problem. Well, as far as I can see that behavior absolutely doesn't make sense. When pfn_mkwrite is requested then the driver should use PAGE_COPY, which is exactly what VMWGFX (the only driver using dirty tracking) is doing. Everybody else uses PAGE_SHARED which should make the pte writeable immediately. Regards, Christian. However the most of drivers calling vmf_insert_pfn_prot do not supply the 'pfn_mkwrite' callback so that the second fault is unnecessary. So just like vmf_insert_mixed and vmf_insert_mixed_mkwrite pair, we should also supply vmf_insert_pfn_mkwrite for drivers as well. Signed-off-by: Xianrong Zhou --- arch/x86/entry/vdso/vma.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 2 +- drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 2 +- drivers/gpu/drm/nouveau/nouveau_gem.c | 2 +- drivers/gpu/drm/radeon/radeon_gem.c| 2 +- drivers/gpu/drm/ttm/ttm_bo_vm.c| 8 +--- drivers/gpu/drm/vmwgfx/vmwgfx_page_dirty.c | 8 +--- include/drm/ttm/ttm_bo.h | 3 ++- include/linux/mm.h | 2 +- mm/memory.c| 14 +++--- 10 files changed, 30 insertions(+), 16 deletions(-) diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c index 7645730dc228..dd2431c2975f 100644 --- a/arch/x86/entry/vdso/vma.c +++ b/arch/x86/entry/vdso/vma.c @@ -185,7 +185,8 @@ static vm_fault_t vvar_fault(const struct vm_special_mapping *sm, if (pvti && vclock_was_used(VDSO_CLOCKMODE_PVCLOCK)) { return vmf_insert_pfn_prot(vma, vmf->address, __pa(pvti) >> PAGE_SHIFT, - pgprot_decrypted(vma->vm_page_prot)); + pgprot_decrypted(vma->vm_page_prot), + true); } } else if (sym_offset == image->sym_hvclock_page) { pfn = hv_get_tsc_pfn(); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 49a5f1c73b3e..adcb20d9e624 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -64,7 +64,7 @@ static vm_fault_t amdgpu_gem_fault(struct vm_fault *vmf) } ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot, - TTM_BO_VM_NUM_PREFAULT); + TTM_BO_VM_NUM_PREFAULT, true); drm_dev_exit(idx); } else { diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index 9227f8146a58..c6f13ae6c308 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1114,7 +1114,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) if (drm_dev_enter(dev, &idx)) { ret = ttm_bo_vm_fault_reserved(vmf, vmf->vma->vm_page_prot, - TTM_BO_VM_NUM_PREFAULT); + TTM_BO_VM_NUM_PREFAULT, true); drm_dev_exit(idx); } else { ret = ttm_bo_vm_dummy_page(vmf, vmf->vma->vm_page_prot); diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c index 49c2bcbef129..7e1453762ec9 100644 --- a/drivers/gpu/drm/nouveau/nouveau_gem.c +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c @@ -56,7 +56,7 @@ static vm_fault_t nouveau_ttm_fault(struct vm_fault *vmf) nouveau_bo_del_io_reserve_lru(bo); prot = vm_get_page_prot(vma->vm_flags); - ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT); + ret = ttm_bo_vm_fault_reserved(vmf, prot, TTM_BO_VM_NUM_PREFAULT, true); nouveau_bo_add_io_reserve_lru(bo); if (ret == VM_FAULT_RETRY && !(vmf->flags & FAULT_FLAG_RETRY_NOWAIT)) return ret; diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c index 3fec3acdaf28..b21cf00ae162 100644 --- a/drivers/gpu/drm/rade
Re: Rework TTMs busy handling
Am 12.01.24 um 13:51 schrieb Christian König: Hi guys, just a gentle ping on this. Zack any more comments for the VMWGFX parts? Thanks, Christian. same as the last time. Things I've changed: Implemented the requirements from Zack to correctly fill in the busy placements for VMWGFX. Renamed the placement flags to desired and fallback as suggested by Michel. Rebased on drm-tip instead of drm-misc-next and fixed XE as well. Please review and comment, Christian.
[PATCH 4/5] drm/ttm: improve idle/busy handling v3
Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 131 - drivers/gpu/drm/ttm/ttm_resource.c | 16 ++- drivers/gpu/drm/xe/xe_bo.c | 4 +- include/drm/ttm/ttm_bo.h | 3 +- include/drm/ttm/ttm_resource.h | 3 +- 7 files changed, 68 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index b671b0665492..06fb3fc47eaa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, -&(*bo_ptr)->tbo.resource, &ctx); +&(*bo_ptr)->tbo.resource, &ctx, false); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 8722beba494e..f23cdc7c5b08 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -966,7 +966,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) placements.mem_type = TTM_PL_TT; placements.flags = bo->resource->placement; - r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx); + r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true); if (unlikely(r)) return r; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index a5e11a92e0b9..3783be24d832 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, hop_placement.placement = hop; /* find space in the bounce domain */ - ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx); + ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true); if (ret) return ret; /* move to the bounce domain */ @@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, return ttm_bo_pipeline_gutting(bo); } - ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx); + ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true); if (ret) { if (ret != -ERESTARTSYS) { pr_err("Failed to find memory space for buffer 0x%p eviction\n", @@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); -} - /** * ttm_bo_mem_space * @@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, * @placement: Proposed new placement for the buffer object. * @mem: A struct ttm_resource. * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space
[PATCH 5/5] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 06fb3fc47eaa..2752f2a67a44 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_FALLBACK; c++; } -- 2.34.1
[PATCH 3/5] drm/ttm: replace busy placement with flags v6
From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch v5: cleanup some rebase problems with VMWGFX v6: implement some missing VMWGFX functionality pointed out by Zack, rename the flags as suggested by Michel, rebase on drm-tip and adjust XE as well Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +--- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 +-- drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +++-- drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +-- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 --- drivers/gpu/drm/ttm/ttm_resource.c | 73 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 33 +++--- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 -- drivers/gpu/drm/xe/xe_bo.c | 33 +- include/drm/ttm/ttm_placement.h| 10 +-- include/drm/ttm/ttm_resource.h | 8 +-- 19 files changed, 118 insertions(+), 197 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 425cebcc5cbf..b671b0665492 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1397,8 +1394,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_DESIRED; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 75c9fd2c6c2a..8722beba494e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpf
[PATCH 2/5] drm/ttm: return ENOSPC from ttm_bo_mem_space
Only convert it to ENOMEM in ttm_bo_validate. This allows ttm_bo_validate to distinct between an out of memory situation and just out of space in a placement domain. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret; -- 2.34.1
Rework TTMs busy handling
Hi guys, same as the last time. Things I've changed: Implemented the requirements from Zack to correctly fill in the busy placements for VMWGFX. Renamed the placement flags to desired and fallback as suggested by Michel. Rebased on drm-tip instead of drm-misc-next and fixed XE as well. Please review and comment, Christian.
[PATCH 1/5] drm/vmwgfx: remove vmw_vram_gmr_placement
Seems to be unused. Signed-off-by: Christian König --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h| 1 - drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 28 -- 2 files changed, 29 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 3cd5090dedfc..12efecc17df6 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -942,7 +942,6 @@ vmw_is_cursor_bypass3_enabled(const struct vmw_private *dev_priv) extern const size_t vmw_tt_size; extern struct ttm_placement vmw_vram_placement; -extern struct ttm_placement vmw_vram_gmr_placement; extern struct ttm_placement vmw_sys_placement; extern struct ttm_device_funcs vmw_bo_driver; extern const struct vmw_sg_table * diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c index af8562c95cc3..a84fffcef8e1 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c @@ -43,13 +43,6 @@ static const struct ttm_place sys_placement_flags = { .flags = 0 }; -static const struct ttm_place gmr_placement_flags = { - .fpfn = 0, - .lpfn = 0, - .mem_type = VMW_PL_GMR, - .flags = 0 -}; - struct ttm_placement vmw_vram_placement = { .num_placement = 1, .placement = &vram_placement_flags, @@ -57,27 +50,6 @@ struct ttm_placement vmw_vram_placement = { .busy_placement = &vram_placement_flags }; -static const struct ttm_place vram_gmr_placement_flags[] = { - { - .fpfn = 0, - .lpfn = 0, - .mem_type = TTM_PL_VRAM, - .flags = 0 - }, { - .fpfn = 0, - .lpfn = 0, - .mem_type = VMW_PL_GMR, - .flags = 0 - } -}; - -struct ttm_placement vmw_vram_gmr_placement = { - .num_placement = 2, - .placement = vram_gmr_placement_flags, - .num_busy_placement = 1, - .busy_placement = &gmr_placement_flags -}; - struct ttm_placement vmw_sys_placement = { .num_placement = 1, .placement = &sys_placement_flags, -- 2.34.1
Re: Rework TTMs busy handling
Am 09.01.24 um 09:14 schrieb Thomas Hellström: Hi, Christian On Tue, 2024-01-09 at 08:47 +0100, Christian König wrote: Hi guys, I'm trying to make this functionality a bit more useful for years now since we multiple reports that behavior of drivers can be suboptimal when multiple placements be given. So basically instead of hacking around the TTM behavior in the driver once more I've gone ahead and changed the idle/busy placement list into idle/busy placement flags. This not only saves a bunch of code, but also allows setting some placements as fallback which are used if allocating from the preferred ones didn't worked. Zack pointed out that some removed VMWGFX code was brought back because of rebasing, fixed in this version. Intel CI seems to be happy with those patches, so any more comments? Looks like Xe changes are missing? (xe is now in drm-tip). I also have some doubts about the naming "idle" vs "busy", since an elaborate eviction mechanism would probably at some point want to check for gpu idle vs gpu busy, and this might create some confusion moving forward for people confusing busy as in memory overcommit with busy as in gpu activity. I can't immediately think of something better, though. Yeah, I was wondering about that as well. Especially since I wanted to add some more flags in the future when for example a bandwidth quota how much memory can be moved in/out is exceeded. Something like phase1, phase2, phase3 etc..., but that's also not very descriptive either. Going to take a look at XE as well, thanks for the notice. Regards, Christian. /Thomas Regards, Christian.
[PATCH 2/5] drm/ttm: return ENOSPC from ttm_bo_mem_space
Only convert it to ENOMEM in ttm_bo_validate. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret; -- 2.34.1
[PATCH 5/5] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index f110dfdc4feb..979cecf18f17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_BUSY; c++; } -- 2.34.1
[PATCH 3/5] drm/ttm: replace busy placement with flags v5
From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch v5: cleanup some rebase problems with VMWGFX Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +--- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 +-- drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +++-- drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +-- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 --- drivers/gpu/drm/ttm/ttm_resource.c | 73 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 2 - drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 -- include/drm/ttm/ttm_placement.h| 10 +-- include/drm/ttm/ttm_resource.h | 8 +-- 18 files changed, 79 insertions(+), 172 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cef920a93924..aa0dd6dad068 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 05991c5c8ddb..9a6a00b1af40 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpfn = 0; - abo->placement.busy_placement = &abo->placements[1]; - abo->placement.num_busy_placement = 1; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE;
[PATCH 4/5] drm/ttm: improve idle/busy handling v2
Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 131 - drivers/gpu/drm/ttm/ttm_resource.c | 15 ++- include/drm/ttm/ttm_bo.h | 3 +- include/drm/ttm/ttm_resource.h | 3 +- 6 files changed, 65 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index aa0dd6dad068..f110dfdc4feb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, -&(*bo_ptr)->tbo.resource, &ctx); +&(*bo_ptr)->tbo.resource, &ctx, false); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 9a6a00b1af40..00da9a81cf6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -967,7 +967,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) placements.mem_type = TTM_PL_TT; placements.flags = bo->resource->placement; - r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx); + r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true); if (unlikely(r)) return r; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index aa12bd5cfd17..17bfc252f76d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, hop_placement.placement = hop; /* find space in the bounce domain */ - ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx); + ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true); if (ret) return ret; /* move to the bounce domain */ @@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, return ttm_bo_pipeline_gutting(bo); } - ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx); + ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true); if (ret) { if (ret != -ERESTARTSYS) { pr_err("Failed to find memory space for buffer 0x%p eviction\n", @@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); -} - /** * ttm_bo_mem_space * @@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, * @placement: Proposed new placement for the buffer object. * @mem: A struct ttm_resource. * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space * * Allocate memory space for the b
[PATCH 1/5] drm/vmwgfx: remove vmw_vram_gmr_placement
Seems to be unused. Signed-off-by: Christian König --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h| 1 - drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 28 -- 2 files changed, 29 deletions(-) diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index 3cd5090dedfc..12efecc17df6 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -942,7 +942,6 @@ vmw_is_cursor_bypass3_enabled(const struct vmw_private *dev_priv) extern const size_t vmw_tt_size; extern struct ttm_placement vmw_vram_placement; -extern struct ttm_placement vmw_vram_gmr_placement; extern struct ttm_placement vmw_sys_placement; extern struct ttm_device_funcs vmw_bo_driver; extern const struct vmw_sg_table * diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c index af8562c95cc3..a84fffcef8e1 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c @@ -43,13 +43,6 @@ static const struct ttm_place sys_placement_flags = { .flags = 0 }; -static const struct ttm_place gmr_placement_flags = { - .fpfn = 0, - .lpfn = 0, - .mem_type = VMW_PL_GMR, - .flags = 0 -}; - struct ttm_placement vmw_vram_placement = { .num_placement = 1, .placement = &vram_placement_flags, @@ -57,27 +50,6 @@ struct ttm_placement vmw_vram_placement = { .busy_placement = &vram_placement_flags }; -static const struct ttm_place vram_gmr_placement_flags[] = { - { - .fpfn = 0, - .lpfn = 0, - .mem_type = TTM_PL_VRAM, - .flags = 0 - }, { - .fpfn = 0, - .lpfn = 0, - .mem_type = VMW_PL_GMR, - .flags = 0 - } -}; - -struct ttm_placement vmw_vram_gmr_placement = { - .num_placement = 2, - .placement = vram_gmr_placement_flags, - .num_busy_placement = 1, - .busy_placement = &gmr_placement_flags -}; - struct ttm_placement vmw_sys_placement = { .num_placement = 1, .placement = &sys_placement_flags, -- 2.34.1
Rework TTMs busy handling
Hi guys, I'm trying to make this functionality a bit more useful for years now since we multiple reports that behavior of drivers can be suboptimal when multiple placements be given. So basically instead of hacking around the TTM behavior in the driver once more I've gone ahead and changed the idle/busy placement list into idle/busy placement flags. This not only saves a bunch of code, but also allows setting some placements as fallback which are used if allocating from the preferred ones didn't worked. Zack pointed out that some removed VMWGFX code was brought back because of rebasing, fixed in this version. Intel CI seems to be happy with those patches, so any more comments? Regards, Christian.
Re: [PATCH 2/4] drm/ttm: replace busy placement with flags v4
Am 04.01.24 um 21:02 schrieb Zack Rusin: On Thu, Jan 4, 2024 at 10:05 AM Christian König wrote: From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 + drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 +++-- drivers/gpu/drm/ttm/ttm_resource.c | 73 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 2 - drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +- include/drm/ttm/ttm_placement.h| 10 ++- include/drm/ttm/ttm_resource.h | 8 +- 18 files changed, 159 insertions(+), 187 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cef920a93924..aa0dd6dad068 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 05991c5c8ddb..9a6a00b1af40 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpfn = 0; - abo->placement.busy_placement = &abo->placements[1]; - abo->placement.n
[PATCH 2/4] drm/ttm: replace busy placement with flags v4
From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 + drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 +++-- drivers/gpu/drm/ttm/ttm_resource.c | 73 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 2 - drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +- include/drm/ttm/ttm_placement.h| 10 ++- include/drm/ttm/ttm_resource.h | 8 +- 18 files changed, 159 insertions(+), 187 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cef920a93924..aa0dd6dad068 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 05991c5c8ddb..9a6a00b1af40 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpfn = 0; - abo->placement.busy_placement = &abo->placements[1]; - abo->placement.num_busy_placement = 1; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; } else { /*
[PATCH 4/4] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index f110dfdc4feb..979cecf18f17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_BUSY; c++; } -- 2.34.1
[PATCH 3/4] drm/ttm: improve idle/busy handling v2
Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 131 - drivers/gpu/drm/ttm/ttm_resource.c | 15 ++- include/drm/ttm/ttm_bo.h | 3 +- include/drm/ttm/ttm_resource.h | 3 +- 6 files changed, 65 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index aa0dd6dad068..f110dfdc4feb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, -&(*bo_ptr)->tbo.resource, &ctx); +&(*bo_ptr)->tbo.resource, &ctx, false); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 9a6a00b1af40..00da9a81cf6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -967,7 +967,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) placements.mem_type = TTM_PL_TT; placements.flags = bo->resource->placement; - r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx); + r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true); if (unlikely(r)) return r; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index aa12bd5cfd17..17bfc252f76d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, hop_placement.placement = hop; /* find space in the bounce domain */ - ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx); + ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true); if (ret) return ret; /* move to the bounce domain */ @@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, return ttm_bo_pipeline_gutting(bo); } - ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx); + ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true); if (ret) { if (ret != -ERESTARTSYS) { pr_err("Failed to find memory space for buffer 0x%p eviction\n", @@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); -} - /** * ttm_bo_mem_space * @@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, * @placement: Proposed new placement for the buffer object. * @mem: A struct ttm_resource. * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space * * Allocate memory space for the b
[PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space
Only convert it to ENOMEM in ttm_bo_validate. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret; -- 2.34.1
Rework TTMs busy handling
Hi guys, I'm trying to make this functionality a bit more useful for years now since we multiple reports that behavior of drivers can be suboptimal when multiple placements be given. So basically instead of hacking around the TTM behavior in the driver once more I've gone ahead and changed the idle/busy placement list into idle/busy placement flags. This not only saves a bunch of code, but also allows setting some placements as fallback which are used if allocating from the preferred ones didn't worked. Intel CI seems to be happy with those patches, so any more comments? Regards, Christian.
[PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space
Only convert it to ENOMEM in ttm_bo_validate. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret; -- 2.34.1
Re: [PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space
Before anybody wonders why no additional people are on CC: I just send that out to get feedback from the CI systems. Regards, Christian. Am 13.12.23 um 15:42 schrieb Christian König: Only convert it to ENOMEM in ttm_bo_validate. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret;
[PATCH 4/4] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index f110dfdc4feb..979cecf18f17 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_BUSY; c++; } -- 2.34.1
[PATCH 1/4] drm/ttm: return ENOSPC from ttm_bo_mem_space
Only convert it to ENOMEM in ttm_bo_validate. Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret; -- 2.34.1
[PATCH 3/4] drm/ttm: improve idle/busy handling
Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 131 - drivers/gpu/drm/ttm/ttm_resource.c | 14 ++- include/drm/ttm/ttm_bo.h | 3 +- include/drm/ttm/ttm_resource.h | 3 +- 6 files changed, 64 insertions(+), 91 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index aa0dd6dad068..f110dfdc4feb 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, -&(*bo_ptr)->tbo.resource, &ctx); +&(*bo_ptr)->tbo.resource, &ctx, false); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 9a6a00b1af40..00da9a81cf6c 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -967,7 +967,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) placements.mem_type = TTM_PL_TT; placements.flags = bo->resource->placement; - r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx); + r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true); if (unlikely(r)) return r; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index aa12bd5cfd17..17bfc252f76d 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, hop_placement.placement = hop; /* find space in the bounce domain */ - ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx); + ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true); if (ret) return ret; /* move to the bounce domain */ @@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, return ttm_bo_pipeline_gutting(bo); } - ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx); + ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true); if (ret) { if (ret != -ERESTARTSYS) { pr_err("Failed to find memory space for buffer 0x%p eviction\n", @@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct ttm_buffer_object *bo, return ret; } -/* - * Repeatedly evict memory from the LRU for @mem_type until we create enough - * space, or we've evicted everything and there isn't enough space. - */ -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, - const struct ttm_place *place, - struct ttm_resource **mem, - struct ttm_operation_ctx *ctx) -{ - struct ttm_device *bdev = bo->bdev; - struct ttm_resource_manager *man; - struct ww_acquire_ctx *ticket; - int ret; - - man = ttm_manager_type(bdev, place->mem_type); - ticket = dma_resv_locking_ctx(bo->base.resv); - do { - ret = ttm_resource_alloc(bo, place, mem); - if (likely(!ret)) - break; - if (unlikely(ret != -ENOSPC)) - return ret; - ret = ttm_mem_evict_first(bdev, man, place, ctx, - ticket); - if (unlikely(ret != 0)) - return ret; - } while (1); - - return ttm_bo_add_move_fence(bo, man, *mem, ctx->no_wait_gpu); -} - /** * ttm_bo_mem_space * @@ -763,6 +732,7 @@ static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo, * @placement: Proposed new placement for the buffer object. * @mem: A struct ttm_resource. * @ctx: if and how to sleep, lock buffers and alloc memory + * @force_space: If we should evict buffers to force space * * Allocate memory space for the buffer object pointed to by @bo, using * th
[PATCH 2/4] drm/ttm: replace busy placement with flags v3
From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 + drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 +++-- drivers/gpu/drm/ttm/ttm_resource.c | 73 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +- include/drm/ttm/ttm_placement.h| 10 ++- include/drm/ttm/ttm_resource.h | 8 +- 18 files changed, 160 insertions(+), 187 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cef920a93924..aa0dd6dad068 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 05991c5c8ddb..9a6a00b1af40 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpfn = 0; - abo->placement.busy_placement = &abo->placements[1]; - abo->placement.num_busy_placement = 1; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; } else { /* Move to GTT memory */ a
Re: [RFC] drm/i915: Allow dmabuf mmap forwarding
Am 13.12.23 um 12:46 schrieb Tvrtko Ursulin: Hi, On 12/12/2023 14:10, Christian König wrote: Hi Tvrtko, Thanks for pointing this mail out once more, I've totally missed it. That's okay, if it was really urgent I would have re-raised the thread earlier. :) As it stands so far it is only about acceptance test suites failing and no known real use cases affected. Am 12.12.23 um 11:37 schrieb Tvrtko Ursulin: On 25/09/2023 14:16, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Allow mmap forwarding for imported buffers in order to allow minigbm mmap to work on aperture-less platforms such as Meteorlake. So far i915 did not allow mmap on imported buffers but from minigbm perspective that worked because of the DRM_IOCTL_I915_GEM_MMAP_GTT fall- back would then be attempted, and would be successful. This stops working on Meteorlake since there is no aperture. Allow i915 to mmap imported buffers using forwarding via dma_buf_mmap(), which allows the primary minigbm path of DRM_IOCTL_I915_GEM_MMAP_OFFSET / I915_MMAP_OFFSET_WB to work. Signed-off-by: Tvrtko Ursulin Cc: Daniel Vetter Cc: Christian König Cc: Matthew Auld Cc: Nirmoy Das --- 1) It is unclear to me if any real userspace depends on this, but there are certainly compliance suites which fail. Well that is actually intentional, but see below. 2) It is also a bit unclear to me if dma_buf_mmap() is exactly intended for this kind of use. It seems that it is, but I also found some old mailing list discussions suggesting there might be some unresolved questions around VMA revocation. I actually solved those a few years back by introducing the vma_set_file() function which standardized the dance necessary for the dma_buf_mmap() function. 1 + 2 = RFC for now. Daniel and Christian were involved in 2) in the past so comments would be appreciated. Any comments on this one? I don't have all the historical knowledge of when this was maybe attempted before and what problems were hit, or something. So would there be downsides or it is fine to forward it. It works technically inside the kernel and Thomas Zimmerman suggested a patch set which made it possible to use for all DRM drivers. But IIRC this patch set was rejected with the rational that while doing an mmap() on an imported DMA-buf works when userspace actually does this then there is a bug in userspace. The UMD doesn't seems to be aware of the fact that the buffer is imported and so for example needs to call dma_buf_begin_cpu_access() and dma_buf_end_cpu_access(). UMDs can trivially work around this by doing the mmap() on the DMA-buf file descriptor instead (potentially after re-exporting it), but the kernel really shouldn't help hide userspace bugs. Hm right, however why does drm_gem_shmem_mmap: if (obj->import_attach) { ret = dma_buf_mmap(obj->dma_buf, vma, 0); Honestly I have absolutely no idea. Isn't that allowing drivers which use the helper to to forward to dma_buf_mmap? Yes, Daniel mentioned that some drivers did this before we found that it's actually not a good idea. It could be that this code piece was meant with that and we only allow it to avoid breaking UAPI. Never the less I think we should add documentation for this. Maybe I am getting lost in the forest of callbacks in this area.. Because it is supposed to be about shmem objects, but drivers which use the helper and rely on common prime import look and also use drm_gem_shmem_prime_import_sg_table can get there. I don't fully understand it either of hand. Regards, Christian. Regards, Tvrtko Test-with: 20230925131539.32743-1-tvrtko.ursu...@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 78 +++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 + 2 files changed, 65 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index aa4d842d4c5a..78c84c0a8b08 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include @@ -664,6 +665,7 @@ insert_mmo(struct drm_i915_gem_object *obj, struct i915_mmap_offset *mmo) static struct i915_mmap_offset * mmap_offset_attach(struct drm_i915_gem_object *obj, enum i915_mmap_type mmap_type, + bool forward_mmap, struct drm_file *file) { struct drm_i915_private *i915 = to_i915(obj->base.dev); @@ -682,6 +684,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, mmo->obj = obj; mmo->mmap_type = mmap_type; + mmo->forward_mmap = forward_mmap; drm_vma_node_reset(&mmo->vma_node); err = drm_vma_offset_add(obj->base.dev->vma_offset_manager, @@ -714,12 +717,25 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, return ERR_PTR(err);
Re: [RFC] drm/i915: Allow dmabuf mmap forwarding
Hi Tvrtko, Thanks for pointing this mail out once more, I've totally missed it. Am 12.12.23 um 11:37 schrieb Tvrtko Ursulin: On 25/09/2023 14:16, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Allow mmap forwarding for imported buffers in order to allow minigbm mmap to work on aperture-less platforms such as Meteorlake. So far i915 did not allow mmap on imported buffers but from minigbm perspective that worked because of the DRM_IOCTL_I915_GEM_MMAP_GTT fall- back would then be attempted, and would be successful. This stops working on Meteorlake since there is no aperture. Allow i915 to mmap imported buffers using forwarding via dma_buf_mmap(), which allows the primary minigbm path of DRM_IOCTL_I915_GEM_MMAP_OFFSET / I915_MMAP_OFFSET_WB to work. Signed-off-by: Tvrtko Ursulin Cc: Daniel Vetter Cc: Christian König Cc: Matthew Auld Cc: Nirmoy Das --- 1) It is unclear to me if any real userspace depends on this, but there are certainly compliance suites which fail. Well that is actually intentional, but see below. 2) It is also a bit unclear to me if dma_buf_mmap() is exactly intended for this kind of use. It seems that it is, but I also found some old mailing list discussions suggesting there might be some unresolved questions around VMA revocation. I actually solved those a few years back by introducing the vma_set_file() function which standardized the dance necessary for the dma_buf_mmap() function. 1 + 2 = RFC for now. Daniel and Christian were involved in 2) in the past so comments would be appreciated. Any comments on this one? I don't have all the historical knowledge of when this was maybe attempted before and what problems were hit, or something. So would there be downsides or it is fine to forward it. It works technically inside the kernel and Thomas Zimmerman suggested a patch set which made it possible to use for all DRM drivers. But IIRC this patch set was rejected with the rational that while doing an mmap() on an imported DMA-buf works when userspace actually does this then there is a bug in userspace. The UMD doesn't seems to be aware of the fact that the buffer is imported and so for example needs to call dma_buf_begin_cpu_access() and dma_buf_end_cpu_access(). UMDs can trivially work around this by doing the mmap() on the DMA-buf file descriptor instead (potentially after re-exporting it), but the kernel really shouldn't help hide userspace bugs. Regards, Christian. Regards, Tvrtko Test-with: 20230925131539.32743-1-tvrtko.ursu...@linux.intel.com --- drivers/gpu/drm/i915/gem/i915_gem_mman.c | 78 +++ .../gpu/drm/i915/gem/i915_gem_object_types.h | 1 + 2 files changed, 65 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c index aa4d842d4c5a..78c84c0a8b08 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c @@ -5,6 +5,7 @@ */ #include +#include #include #include #include @@ -664,6 +665,7 @@ insert_mmo(struct drm_i915_gem_object *obj, struct i915_mmap_offset *mmo) static struct i915_mmap_offset * mmap_offset_attach(struct drm_i915_gem_object *obj, enum i915_mmap_type mmap_type, + bool forward_mmap, struct drm_file *file) { struct drm_i915_private *i915 = to_i915(obj->base.dev); @@ -682,6 +684,7 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, mmo->obj = obj; mmo->mmap_type = mmap_type; + mmo->forward_mmap = forward_mmap; drm_vma_node_reset(&mmo->vma_node); err = drm_vma_offset_add(obj->base.dev->vma_offset_manager, @@ -714,12 +717,25 @@ mmap_offset_attach(struct drm_i915_gem_object *obj, return ERR_PTR(err); } +static bool +should_forward_mmap(struct drm_i915_gem_object *obj, + enum i915_mmap_type mmap_type) +{ + if (!obj->base.import_attach) + return false; + + return mmap_type == I915_MMAP_TYPE_WB || + mmap_type == I915_MMAP_TYPE_WC || + mmap_type == I915_MMAP_TYPE_UC; +} + static int __assign_mmap_offset(struct drm_i915_gem_object *obj, enum i915_mmap_type mmap_type, u64 *offset, struct drm_file *file) { struct i915_mmap_offset *mmo; + bool should_forward; if (i915_gem_object_never_mmap(obj)) return -ENODEV; @@ -735,12 +751,15 @@ __assign_mmap_offset(struct drm_i915_gem_object *obj, if (mmap_type == I915_MMAP_TYPE_FIXED) return -ENODEV; + should_forward = should_forward_mmap(obj, mmap_type); + if (mmap_type != I915_MMAP_TYPE_GTT && !i915_gem_object_has_struct_page(obj) && - !i915_gem_object_has_iomem(obj)) + !i915_gem_object_has_iomem(obj) && + !should_forward) return -ENODEV; - mmo = mmap_offset_attach(obj, mmap_t
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Am 04.12.23 um 00:32 schrieb Alistair Popple: Christian König writes: Am 01.12.23 um 06:48 schrieb Zeng, Oak: [SNIP] Besides memory eviction/oversubscription, there are a few other pain points when I use hmm: 1) hmm doesn't support file-back memory, so it is hard to share memory b/t process in a gpu environment. You mentioned you have a plan... How hard is it to support file-backed in your approach? As hard as it is to support it through HMM. That's what I meant that this approach doesn't integrate well, as far as I know the problem isn't inside HMM or any other solution but rather in the file system layer. In what way does HMM not support file-backed memory? I was under the impression that at least hmm_range_fault() does. Oh, well file-backed memory is indeed supported by HMM. IIRC KFD actually allows this for the SVM implementation. It's just that the way the file system layer (for example) does writeback absolutely doesn't fit well with how GPUs and other acceleration devices work. The general assumption in the kernel seems to be that page faults and preemption are extremely cheap. So things like copy on write is used quite extensively. For a CPU this basically means you just need to context change into the kernel once to get the new address of a page into your PTEs on write, while for acceleration devices this always require a complete CPU round trip for each initial write access for a 4k page. The performance impact is just horrible. Regards, Christian. - Alistair Regards, Christian. 2)virtual address range based memory attribute/hint: with hmadvise, where do you save the memory attribute of a virtual address range? Do you need to extend vm_area_struct to save it? With hmm, we have to maintain such information at driver. This ends up with pretty complicated logic to split/merge those address range. I know core mm has similar logic to split/merge vma... Oak -Weixi -----Original Message- From: Christian König Sent: Thursday, November 30, 2023 4:28 PM To: Zeng, Oak; Christian König ; zhuweixi; linux- m...@kvack.org;linux-ker...@vger.kernel.org;a...@linux-foundation.org; Danilo Krummrich; Dave Airlie; Daniel Vetter Cc:intel-gvt-...@lists.freedesktop.org;rcampb...@nvidia.com; mhairgr...@nvidia.com;j...@nvidia.com;weixi@openeuler.sh; jhubb...@nvidia.com;intel-gfx@lists.freedesktop.org;apop...@nvidia.com; xinhui@amd.com;amd-...@lists.freedesktop.org; tvrtko.ursu...@linux.intel.com;ogab...@kernel.org;jgli...@redhat.com; dri- de...@lists.freedesktop.org;z...@nvidia.com; Vivi, Rodrigo ;alexander.deuc...@amd.com;leo...@nvidia.com; felix.kuehl...@amd.com; Wang, Zhi A; mgor...@suse.de Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices Hi Oak, yeah, #4 is indeed a really good point and I think Felix will agree to that as well. HMM is basically still missing a way to advise device attributes for the CPU address space. Both migration strategy as well as device specific information (like cache preferences) fall into this category. Since there is a device specific component in those attributes as well I think device specific IOCTLs still make sense to update them, but HMM should offer the functionality to manage and store those information. Split and merge of VMAs only become a problem if you attach those information to VMAs, if you keep them completely separate than that doesn't become an issue either. The down side of this approach is that you don't get automatically extending attribute ranges for growing VMAs for example. Regards, Christian. Am 29.11.23 um 23:23 schrieb Zeng, Oak: Hi Weixi, Even though Christian has listed reasons rejecting this proposal (yes they are very reasonable to me), I would open my mind and further explore the possibility here. Since the current GPU driver uses a hmm based implementation (AMD and NV has done this; At Intel we are catching up), I want to explore how much we can benefit from the proposed approach and how your approach can solve some pain points of our development. So basically what I am questioning here is: what is the advantage of your approach against hmm. To implement a UVM (unified virtual address space b/t cpu and gpu device), with hmm, driver essentially need to implement below functions: 1. device page table update. Your approach requires the same because this is device specific codes 2. Some migration functions to migrate memory b/t system memory and GPU local memory. My understanding is, even though you generalized this a bit, such as modified cpu page fault path, provided "general" gm_dev_fault handler... but device driver still need to provide migration functions because migration functions have to be device specific (i.e., using device dma/copy engine for performance purpose). Right? 3. GPU physical memory management, this part is now in drm/buddy, shared by all drivers. I thin
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Am 01.12.23 um 06:48 schrieb Zeng, Oak: [SNIP] 3. MMU notifiers register hooks at certain core MM events, while GMEM declares basic functions and internally invokes them. GMEM requires less from the driver side -- no need to understand what core MM behaves at certain MMU events. GMEM also expects fewer bugs than MMU notifiers: implementing basic operations with standard declarations vs. implementing whatever random device MM logic in MMU notifiers. This seems true to me. I feel the mmu notifier thing, especially the synchronization/lock design (those sequence numbers, interacting with driver lock, and the mmap lock) are very complicated. I indeed spent time to understand the specification documented in hmm.rst... Your approach seems better. I have to agree on that as well. HMM/MMU notifiers are developed with exposing MM functionality in mind instead of trying to fulfill driver requirements. But this originated not in HMM/MMU notifiers, rather it was a requirement to not change the CPU side of the MM code to much. So when you can get the acknowledgement to make changes to the CPU side of the MM code to better handle device driver requirements then I'm totally in favor of this. It's just that I don't think the approach of starting with a new framework/idea will help with that. Instead rather try to improve the existing functionality. 5. GMEM has been demonstrated to allow device memory oversubscription (a GMEM-based 32GB NPU card can run a GPT model oversubscribing 500GB host DDR), while drivers using HMM/MMU notifier must implement this logic one by one. I will submit this part in a future RFC patch. When device memory is oversubscribed, do you call a driver callback function to evict device memory to system memory? Or just cpu copy? Copy with device's fast copy engine is faster. I can see even though with both approach we need to implement a driver copy function, with your approach, the driver logic can be simplified. With today's drm/ttm, I do see the logic in the memory eviction area is very complicated. Those eviction fence (some call it suspend fence), dma-fence enable signallingvery complicated to me. Essentially evict device memory to system memory is nothing different from evict system memory to disk... so if your approach can leverage some linux core mm eviction logic, I do see it can simplify things here... We actually already do this in TTM as well through the MM shrinkers. It's just that it's an intentional design decision to make the whole thing asynchronously using dma_fence etc... That's why you have this complexity in there. I want to reiterate that GMEM's shared address space support is a bonus result, not a main contribution... It was done because it was not difficult to implement internal CPU-device coordination mechanism when core MM is extended by GMEM to support devices. Besides memory eviction/oversubscription, there are a few other pain points when I use hmm: 1) hmm doesn't support file-back memory, so it is hard to share memory b/t process in a gpu environment. You mentioned you have a plan... How hard is it to support file-backed in your approach? As hard as it is to support it through HMM. That's what I meant that this approach doesn't integrate well, as far as I know the problem isn't inside HMM or any other solution but rather in the file system layer. Regards, Christian. 2)virtual address range based memory attribute/hint: with hmadvise, where do you save the memory attribute of a virtual address range? Do you need to extend vm_area_struct to save it? With hmm, we have to maintain such information at driver. This ends up with pretty complicated logic to split/merge those address range. I know core mm has similar logic to split/merge vma... Oak -Weixi -Original Message- From: Christian König Sent: Thursday, November 30, 2023 4:28 PM To: Zeng, Oak; Christian König ; zhuweixi; linux- m...@kvack.org;linux-ker...@vger.kernel.org;a...@linux-foundation.org; Danilo Krummrich; Dave Airlie; Daniel Vetter Cc:intel-gvt-...@lists.freedesktop.org;rcampb...@nvidia.com; mhairgr...@nvidia.com;j...@nvidia.com;weixi@openeuler.sh; jhubb...@nvidia.com;intel-gfx@lists.freedesktop.org;apop...@nvidia.com; xinhui@amd.com;amd-...@lists.freedesktop.org; tvrtko.ursu...@linux.intel.com;ogab...@kernel.org;jgli...@redhat.com; dri- de...@lists.freedesktop.org;z...@nvidia.com; Vivi, Rodrigo ;alexander.deuc...@amd.com;leo...@nvidia.com; felix.kuehl...@amd.com; Wang, Zhi A; mgor...@suse.de Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices Hi Oak, yeah, #4 is indeed a really good point and I think Felix will agree to that as well. HMM is basically still missing a way to advise device attributes for the CPU address space. Both migration strategy as well as device specific information (like cac
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
in Linux..." To be honest, not using a logical page table for anonymous memory is why Linux THP fails compared with FreeBSD's superpage, but I am not going to elaborate it here. But yes, and I am looking for merging struct vm_object->logical_page_table with struct address_space->i_pages. This will make a natural support for devices oversubscribing both host DRAM and disks. As explained in my cover letter, struct vm_object borrows FreeBSD's VM design -- it provides a unified abstraction layer for anonymous, file-backed memory and etc. I'm not that deep into this stuff, so leaving this to the experts on FreeBSD. 3. "Requirements to CPU address space management and device address space management are just massively different. For example huge and giant pages are a must have for modern devices..." I think you are asking two questions. First, is VA space a problem? No, this is about something completely different. GMEM assumes that device VA space should be covered by CPU VA space (sorry i386), ... [SNIP] I'm removing this because you were talking about something different than what I meant. I will try to explain the background on an example outside of machine learning and compute since this framework should be applicable to every use case and not be limited to those. Otherwise Linux would sooner or later just be applicable to only those use cases. So let's take a look at how modern games use a GPU for example. On startup a rather large part of the GPU address space is allocated, for example 64GiB. Then the necessary resources (images, texture, vertices, shaders etc..) are loaded into separate buffer objects. Those resources are then mapped into the allocated address on a page by page basis. So you basically don't have large VMAs which cover one resource, but rather the page tables are used as a remapping table into the available resources. This increases the number of virtual mappings drastically, it's kind of comparable how an anon_vma works inside a VMA on Linux. Those mappings also are not setup at start and then used throughout the whole lifetime of the process, but rather done very dynamically sometimes resulting in thousands of mapping operations per second. Additional to that devices have page table feature which CPUs don't have. This ranges from support for partial resident texture over flags how caching and dynamic color space compression is made. So the mappings contain tons of device specific information and it's most likely not even possible to handle all of this with a device independent mmap() call. 4. "The argument that a shared memory management leads to less bugs has also absolutely not be proven true. Instead we literally spend month if not years hunting down bugs which resulted from interaction between CPU and devices." This is another case supporting GMEM. Don't developers want to let GMEM handle the CPU-device interaction so that they can waive months of debugging cost? No, we already have HMM for that. Regards, Christian. PS, hmadvise() is based on the idea of Nvidia's cudaMemAdvise() which provides abundant and useful memory policies. HMM extended mbind() instead. -Weixi -Original Message- From: Christian König Sent: Wednesday, November 29, 2023 11:22 PM To: zhuweixi ; Dave Airlie Cc: linux...@kvack.org; linux-ker...@vger.kernel.org; a...@linux-foundation.org; weixi@openeuler.sh; mgor...@suse.de; jgli...@redhat.com; rcampb...@nvidia.com; jhubb...@nvidia.com; apop...@nvidia.com; mhairgr...@nvidia.com; z...@nvidia.com; alexander.deuc...@amd.com; xinhui@amd.com; amd-...@lists.freedesktop.org; felix.kuehl...@amd.com; ogab...@kernel.org; dri-de...@lists.freedesktop.org; j...@nvidia.com; leo...@nvidia.com; zhen...@linux.intel.com; zhi.a.w...@intel.com; intel-gvt-...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; jani.nik...@linux.intel.com; joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com; tvrtko.ursu...@linux.intel.com Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices Am 29.11.23 um 09:27 schrieb zhuweixi: Glad to hear that more sharable code is desirable. IMHO, for a common MM subsystem, it is more beneficial for GMEM to extend core MM instead of building a separate one. As stated in the beginning of my RFC letter, MM systems are large and similar. Even a sophisticated one like Linux MM that has evolved over decades still suffers from an increasing number of bugs[1]. So, directly extending core MM to support devices not only avoids opening a new box of bugs, but also allows the community to concentrate on maintaining one single MM system. On the other side, GMEM does no hurt to core MM If a CPU process is not attached with device contexts. @Christian, could you provide more information on what AMD proposed with KFD and why it was r
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Hi Oak, yeah, #4 is indeed a really good point and I think Felix will agree to that as well. HMM is basically still missing a way to advise device attributes for the CPU address space. Both migration strategy as well as device specific information (like cache preferences) fall into this category. Since there is a device specific component in those attributes as well I think device specific IOCTLs still make sense to update them, but HMM should offer the functionality to manage and store those information. Split and merge of VMAs only become a problem if you attach those information to VMAs, if you keep them completely separate than that doesn't become an issue either. The down side of this approach is that you don't get automatically extending attribute ranges for growing VMAs for example. Regards, Christian. Am 29.11.23 um 23:23 schrieb Zeng, Oak: Hi Weixi, Even though Christian has listed reasons rejecting this proposal (yes they are very reasonable to me), I would open my mind and further explore the possibility here. Since the current GPU driver uses a hmm based implementation (AMD and NV has done this; At Intel we are catching up), I want to explore how much we can benefit from the proposed approach and how your approach can solve some pain points of our development. So basically what I am questioning here is: what is the advantage of your approach against hmm. To implement a UVM (unified virtual address space b/t cpu and gpu device), with hmm, driver essentially need to implement below functions: 1. device page table update. Your approach requires the same because this is device specific codes 2. Some migration functions to migrate memory b/t system memory and GPU local memory. My understanding is, even though you generalized this a bit, such as modified cpu page fault path, provided "general" gm_dev_fault handler... but device driver still need to provide migration functions because migration functions have to be device specific (i.e., using device dma/copy engine for performance purpose). Right? 3. GPU physical memory management, this part is now in drm/buddy, shared by all drivers. I think with your approach, driver still need to provide callback functions to allocate/free physical pages. Right? Or do you let linux core mm buddy manage device memory directly? 4. madvise/hints/virtual address range management. This has been pain point for us. Right now device driver has to maintain certain virtual address range data structure to maintain hints and other virtual address range based memory attributes. Driver need to sync with linux vma. Driver need to explicitly deal with range split/merging... HMM doesn't provide support in this area. Your approach seems cleaner/simpler to me... So in above, I have examined the some key factors of a gpu UVM memory manager. I think for #1 and #2, hmm has provide pretty good abstraction/tools for address space mirroring and migration helpers. For #3, since we have a common drm/buddy layer, I don't think it is a big problem for driver writer now. I do see #4 is something you solved more beautifully, requires new system call though. Oak -Original Message- From: dri-devel On Behalf Of Christian König Sent: Tuesday, November 28, 2023 8:09 AM To: Weixi Zhu ; linux...@kvack.org; linux- ker...@vger.kernel.org; a...@linux-foundation.org; Danilo Krummrich ; Dave Airlie ; Daniel Vetter Cc: dri-de...@lists.freedesktop.org; leo...@nvidia.com; apop...@nvidia.com; amd-...@lists.freedesktop.org; mgor...@suse.de; z...@nvidia.com; Wang, Zhi A ; rcampb...@nvidia.com; j...@nvidia.com; weixi@openeuler.sh; jhubb...@nvidia.com; intel-gfx@lists.freedesktop.org; mhairgr...@nvidia.com; jgli...@redhat.com; Vivi, Rodrigo ; intel-gvt-...@lists.freedesktop.org; tvrtko.ursu...@linux.intel.com; felix.kuehl...@amd.com; xinhui@amd.com; alexander.deuc...@amd.com; ogab...@kernel.org Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices Adding a few missing important people to the explicit to list. Am 28.11.23 um 13:50 schrieb Weixi Zhu: The problem: Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently backup the device local memory. These external MM systems share similar
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
tly to the CPU address space. Regards, Christian. [1] Huang, Jian, Moinuddin K. Qureshi, and Karsten Schwan. "An evolutionary study of linux memory management for fun and profit." 2016 USENIX Annual Technical Conference (USENIX ATC 16). 2016. Thanks, Weixi -Original Message- From: Dave Airlie Sent: Wednesday, November 29, 2023 1:15 PM To: Christian König Cc: zhuweixi ; linux...@kvack.org; linux-ker...@vger.kernel.org; a...@linux-foundation.org; weixi@openeuler.sh; mgor...@suse.de; jgli...@redhat.com; rcampb...@nvidia.com; jhubb...@nvidia.com; apop...@nvidia.com; mhairgr...@nvidia.com; z...@nvidia.com; alexander.deuc...@amd.com; xinhui@amd.com; amd-...@lists.freedesktop.org; felix.kuehl...@amd.com; ogab...@kernel.org; dri-de...@lists.freedesktop.org; j...@nvidia.com; leo...@nvidia.com; zhen...@linux.intel.com; zhi.a.w...@intel.com; intel-gvt-...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; jani.nik...@linux.intel.com; joonas.lahti...@linux.intel.com; rodrigo.v...@intel.com; tvrtko.ursu...@linux.intel.com Subject: Re: [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices On Tue, 28 Nov 2023 at 23:07, Christian König wrote: Am 28.11.23 um 13:50 schrieb Weixi Zhu: The problem: Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently backup the device local memory. These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs. Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workload with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems. What GMEM (Generalized Memory Management [1]) does: GMEM extends Linux MM to share its machine-independent MM code. Only high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interface include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Here briefly describe how a device driver utilizes them: 1. At boot time, call gm_dev_create() and registers the implementation of hardware-dependent functions as declared in struct gm_mmu. - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses. 2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it. 3. Call gm_as_attach() to attach the device context to a gmem address space. 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens. GMEM has changed the following assumptions in Linux MM: 1. An mm_struct not only handle a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management). Well that is pretty much exactly what AMD has already proposed with KFD and was rejected for rather good reasons. MMU functions The MMU functions peer_map() and peer_unmap() overlap other functions, leav
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Adding a few missing important people to the explicit to list. Am 28.11.23 um 13:50 schrieb Weixi Zhu: The problem: Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently backup the device local memory. These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs. Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workload with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems. What GMEM (Generalized Memory Management [1]) does: GMEM extends Linux MM to share its machine-independent MM code. Only high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interface include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Here briefly describe how a device driver utilizes them: 1. At boot time, call gm_dev_create() and registers the implementation of hardware-dependent functions as declared in struct gm_mmu. - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses. 2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it. 3. Call gm_as_attach() to attach the device context to a gmem address space. 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens. GMEM has changed the following assumptions in Linux MM: 1. An mm_struct not only handle a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management). 2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device. 3. Unmapping a page may include sending device TLB invalidation (even if its MMU shares CPU page table) and manipulating device PTEs. Semantics of new syscalls: 1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED) Allocate virtual address that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either CPU or any attached device. If the device does not support page fault, then device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM is can be used as a swap backup for the device local memory. 2. hmadvise(NUMA_id, va_start, size, memory_hint) Issuing memory hint for a given VMA. This extends traditional madvise() syscall with an extra argument so that programmers have better control with heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is helpful to increase device memory utilization. It
Re: [Intel-gfx] [RFC PATCH 0/6] Supporting GMEM (generalized memory management) for external memory devices
Am 28.11.23 um 13:50 schrieb Weixi Zhu: The problem: Accelerator driver developers are forced to reinvent external MM subsystems case by case, because Linux core MM only considers host memory resources. These reinvented MM subsystems have similar orders of magnitude of LoC as Linux MM (80K), e.g. Nvidia-UVM has 70K, AMD GPU has 14K and Huawei NPU has 30K. Meanwhile, more and more vendors are implementing their own accelerators, e.g. Microsoft's Maia 100. At the same time, application-level developers suffer from poor programmability -- they must consider parallel address spaces and be careful about the limited device DRAM capacity. This can be alleviated if a malloc()-ed virtual address can be shared by the accelerator, or the abundant host DRAM can further transparently backup the device local memory. These external MM systems share similar mechanisms except for the hardware-dependent part, so reinventing them is effectively introducing redundant code (14K~70K for each case). Such developing/maintaining is not cheap. Furthermore, to share a malloc()-ed virtual address, device drivers need to deeply interact with Linux MM via low-level MM APIs, e.g. MMU notifiers/HMM. This raises the bar for driver development, since developers must understand how Linux MM works. Further, it creates code maintenance problems -- any changes to Linux MM potentially require coordinated changes to accelerator drivers using low-level MM APIs. Putting a cache-coherent bus between host and device will not make these external MM subsystems disappear. For example, a throughput-oriented accelerator will not tolerate executing heavy memory access workload with a host MMU/IOMMU via a remote bus. Therefore, devices will still have their own MMU and pick a simpler page table format for lower address translation overhead, requiring external MM subsystems. What GMEM (Generalized Memory Management [1]) does: GMEM extends Linux MM to share its machine-independent MM code. Only high-level interface is provided for device drivers. This prevents accelerator drivers from reinventing the wheel, but relies on drivers to implement their hardware-dependent functions declared by GMEM. GMEM's key interface include gm_dev_create(), gm_as_create(), gm_as_attach() and gm_dev_register_physmem(). Here briefly describe how a device driver utilizes them: 1. At boot time, call gm_dev_create() and registers the implementation of hardware-dependent functions as declared in struct gm_mmu. - If the device has local DRAM, call gm_dev_register_physmem() to register available physical addresses. 2. When a device context is initialized (e.g. triggered by ioctl), check if the current CPU process has been attached to a gmem address space (struct gm_as). If not, call gm_as_create() and point current->mm->gm_as to it. 3. Call gm_as_attach() to attach the device context to a gmem address space. 4. Invoke gm_dev_fault() to resolve a page fault or prepare data before device computation happens. GMEM has changed the following assumptions in Linux MM: 1. An mm_struct not only handle a single CPU context, but may also handle external memory contexts encapsulated as gm_context listed in mm->gm_as. An external memory context can include a few or all of the following parts: an external MMU (that requires TLB invalidation), an external page table (that requires PTE manipulation) and external DRAM (that requires physical memory management). Well that is pretty much exactly what AMD has already proposed with KFD and was rejected for rather good reasons. 2. Faulting a MAP_PRIVATE VMA with no CPU PTE found does not necessarily mean that a zero-filled physical page should be mapped. The virtual page may have been mapped to an external memory device. 3. Unmapping a page may include sending device TLB invalidation (even if its MMU shares CPU page table) and manipulating device PTEs. Semantics of new syscalls: 1. mmap(..., MAP_PRIVATE | MAP_PEER_SHARED) Allocate virtual address that is shared between the CPU and all attached devices. Data is guaranteed to be coherent whenever the address is accessed by either CPU or any attached device. If the device does not support page fault, then device driver is responsible for faulting memory before data gets accessed. By default, the CPU DRAM is can be used as a swap backup for the device local memory. 2. hmadvise(NUMA_id, va_start, size, memory_hint) Issuing memory hint for a given VMA. This extends traditional madvise() syscall with an extra argument so that programmers have better control with heterogeneous devices registered as NUMA nodes. One useful memory hint could be MADV_PREFETCH, which guarantees that the physical data of the given VMA [VA, VA+size) is migrated to NUMA node #id. Another useful memory hint is MADV_DONTNEED. This is
Re: [Intel-gfx] [PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Am 27.11.23 um 17:47 schrieb Bhardwaj, Rajneesh: [AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Hamza Mahfooz Sent: Monday, November 27, 2023 10:53 AM To: Christian König ; jani.nik...@linux.intel.com; kher...@redhat.com; d...@redhat.com; za...@vmware.com; Olsak, Marek ; linux-graphics-maintai...@vmware.com; amd-...@lists.freedesktop.org; nouv...@lists.freedesktop.org; intel-gfx@lists.freedesktop.org; virtualizat...@lists.linux.dev; spice-de...@lists.freedesktop.org; dri-de...@lists.freedesktop.org Subject: Re: [PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT On 11/27/23 09:54, Christian König wrote: Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. I found the subject description a bit misleading. Maybe use a Fixes tag describing it is a fix for suspend resume regression other than that, looks good to me. Well exactly that's the problem, this isn't really a fix and we also don't want to backport it. Basically the previous behavior was working as design, it's just that it was never intended to be used like this. Acked-by: Rajneesh Bhardwaj Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2893 Thanks, Christian. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index aa0dd6dad068..ddc8fb4db678 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* + * When GTT is just an alternative to VRAM make sure that we + * only use it as fallback and still try to fill up VRAM first. + */ + if (domain & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_BUSY; c++; } -- Hamza
[Intel-gfx] [PATCH 2/2] drm/amdgpu: use GTT only as fallback for VRAM|GTT
Try to fill up VRAM as well by setting the busy flag on GTT allocations. This fixes the issue that when VRAM was evacuated for suspend it's never filled up again unless the application is restarted. Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index aa0dd6dad068..ddc8fb4db678 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -173,6 +173,12 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) abo->flags & AMDGPU_GEM_CREATE_PREEMPTIBLE ? AMDGPU_PL_PREEMPT : TTM_PL_TT; places[c].flags = 0; + /* +* When GTT is just an alternative to VRAM make sure that we +* only use it as fallback and still try to fill up VRAM first. +*/ + if (domain & AMDGPU_GEM_DOMAIN_VRAM) + places[c].flags |= TTM_PL_FLAG_BUSY; c++; } -- 2.34.1
[Intel-gfx] [PATCH 1/2] drm/ttm: replace busy placement with flags v3
From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +-- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 + drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 +++-- drivers/gpu/drm/ttm/ttm_resource.c | 73 drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 3 +- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 99 +- include/drm/ttm/ttm_placement.h| 10 ++- include/drm/ttm/ttm_resource.h | 8 +- 18 files changed, 160 insertions(+), 187 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index cef920a93924..aa0dd6dad068 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1406,8 +1403,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 05991c5c8ddb..9a6a00b1af40 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpfn = 0; - abo->placement.busy_placement = &abo->placements[1]; - abo->placement.num_busy_placement = 1; + abo->placements[0].flags |= TTM_PL_FLAG_IDLE; } else { /* Move to GTT memory */ a
[Intel-gfx] TTM improvement and amdgpu fix
Hi guys, TTM has a feature which allows to specify placements for normal operation as well as when all domains are "busy" and don't have free space. Not very widely used since it was a bit inflexible and required making multiple placement lists. Replace the multiple lists with flags and start to use this in amdgpu as well. As future improvement we should probably re-work was "busy" means for a domain as well. Please comment and/or test. Thanks, Christian.
Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by
Am 02.10.23 um 20:22 schrieb Kees Cook: On Mon, Oct 02, 2023 at 08:11:41PM +0200, Christian König wrote: Am 02.10.23 um 20:08 schrieb Kees Cook: On Mon, Oct 02, 2023 at 08:01:57PM +0200, Christian König wrote: Am 02.10.23 um 18:53 schrieb Kees Cook: On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote: On Mon, Oct 2, 2023 at 5:20 AM Christian König wrote: Am 29.09.23 um 21:33 schrieb Kees Cook: On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote: This is a batch of patches touching drm for preparing for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by to structs that would benefit from the annotation. [...] Since this got Acks, I figure I should carry it in my tree. Let me know if this should go via drm instead. Applied to for-next/hardening, thanks! [1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by https://git.kernel.org/kees/c/a6046ac659d6 STOP! In a follow up discussion Alex and I figured out that this won't work. I'm so confused; from the discussion I saw that Alex said both instances were false positives? The value in the structure is byte swapped based on some firmware endianness which not necessary matches the CPU endianness. SMU10 is APU only so the endianess of the SMU firmware and the CPU will always match. Which I think is what is being said here? Please revert that one from going upstream if it's already on it's way. And because of those reasons I strongly think that patches like this should go through the DRM tree :) Sure, that's fine -- please let me know. It was others Acked/etc. Who should carry these patches? Probably best if the relevant maintainer pick them up individually. Some of those structures are filled in by firmware/hardware and only the maintainers can judge if that value actually matches what the compiler needs. We have cases where individual bits are used as flags or when the size is byte swapped etc... Even Alex and I didn't immediately say how and where that field is actually used and had to dig that up. That's where the confusion came from. Okay, I've dropped them all from my tree. Several had Acks/Reviews, so hopefully those can get picked up for the DRM tree? I will pick those up to go through drm-misc-next. Going to ping maintainers once more when I'm not sure if stuff is correct or not. Sounds great; thanks! I wasn't 100% sure for the VC4 patch, but pushed the whole set to drm-misc-next anyway. This also means that the patches are now auto merged into the drm-tip integration branch and should any build or unit test go boom we should notice immediately and can revert it pretty easily. Thanks, Christian. -Kees
Re: [Intel-gfx] [PATCH] dma-buf: Deny copy-on-writes mmaps
Am 04.10.23 um 01:03 schrieb Andi Shyti: From: Chris Wilson Enforce that an mmap of a dmabuf is always using MAP_SHARED so that all access (both read and writes) using the device memory and not a local copy-on-write page in system memory. As much as I would like to do this I fear that this won't work. First of all interesting approach to do this in .get_unmapped_area. The standard handling is to have the check like "if (is_cow_mapping(vma->vm_flags)) return -EINVAL;", see TTM for example. Then IIRC we already tried this and had to revert it because it breaks the UAPI. Some broken applications actually use shared mappings (but not really cow) and we would like to keep them working. Regards, Christian. Signed-off-by: Chris Wilson Signed-off-by: Andi Shyti --- drivers/dma-buf/dma-buf.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c index 21916bba77d5..1ec297241842 100644 --- a/drivers/dma-buf/dma-buf.c +++ b/drivers/dma-buf/dma-buf.c @@ -25,6 +25,7 @@ #include #include #include +#include #include #include @@ -128,6 +129,19 @@ static struct file_system_type dma_buf_fs_type = { .kill_sb = kill_anon_super, }; +static unsigned long +dma_buf_get_unmapped_area(struct file *file, + unsigned long addr, + unsigned long len, + unsigned long pgoff, + unsigned long flags) +{ + if ((flags & MAP_TYPE) == MAP_PRIVATE) + return -EINVAL; + + return current->mm->get_unmapped_area(file, addr, len, pgoff, flags); +} + static int dma_buf_mmap_internal(struct file *file, struct vm_area_struct *vma) { struct dma_buf *dmabuf; @@ -508,6 +522,7 @@ static void dma_buf_show_fdinfo(struct seq_file *m, struct file *file) static const struct file_operations dma_buf_fops = { .release= dma_buf_file_release, + .get_unmapped_area = dma_buf_get_unmapped_area, .mmap = dma_buf_mmap_internal, .llseek = dma_buf_llseek, .poll = dma_buf_poll,
Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by
Am 02.10.23 um 20:08 schrieb Kees Cook: On Mon, Oct 02, 2023 at 08:01:57PM +0200, Christian König wrote: Am 02.10.23 um 18:53 schrieb Kees Cook: On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote: On Mon, Oct 2, 2023 at 5:20 AM Christian König wrote: Am 29.09.23 um 21:33 schrieb Kees Cook: On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote: This is a batch of patches touching drm for preparing for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by to structs that would benefit from the annotation. [...] Since this got Acks, I figure I should carry it in my tree. Let me know if this should go via drm instead. Applied to for-next/hardening, thanks! [1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by https://git.kernel.org/kees/c/a6046ac659d6 STOP! In a follow up discussion Alex and I figured out that this won't work. I'm so confused; from the discussion I saw that Alex said both instances were false positives? The value in the structure is byte swapped based on some firmware endianness which not necessary matches the CPU endianness. SMU10 is APU only so the endianess of the SMU firmware and the CPU will always match. Which I think is what is being said here? Please revert that one from going upstream if it's already on it's way. And because of those reasons I strongly think that patches like this should go through the DRM tree :) Sure, that's fine -- please let me know. It was others Acked/etc. Who should carry these patches? Probably best if the relevant maintainer pick them up individually. Some of those structures are filled in by firmware/hardware and only the maintainers can judge if that value actually matches what the compiler needs. We have cases where individual bits are used as flags or when the size is byte swapped etc... Even Alex and I didn't immediately say how and where that field is actually used and had to dig that up. That's where the confusion came from. Okay, I've dropped them all from my tree. Several had Acks/Reviews, so hopefully those can get picked up for the DRM tree? I will pick those up to go through drm-misc-next. Going to ping maintainers once more when I'm not sure if stuff is correct or not. Christian. Thanks! -Kees
Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by
Am 02.10.23 um 18:53 schrieb Kees Cook: On Mon, Oct 02, 2023 at 11:06:19AM -0400, Alex Deucher wrote: On Mon, Oct 2, 2023 at 5:20 AM Christian König wrote: Am 29.09.23 um 21:33 schrieb Kees Cook: On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote: This is a batch of patches touching drm for preparing for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by to structs that would benefit from the annotation. [...] Since this got Acks, I figure I should carry it in my tree. Let me know if this should go via drm instead. Applied to for-next/hardening, thanks! [1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by https://git.kernel.org/kees/c/a6046ac659d6 STOP! In a follow up discussion Alex and I figured out that this won't work. I'm so confused; from the discussion I saw that Alex said both instances were false positives? The value in the structure is byte swapped based on some firmware endianness which not necessary matches the CPU endianness. SMU10 is APU only so the endianess of the SMU firmware and the CPU will always match. Which I think is what is being said here? Please revert that one from going upstream if it's already on it's way. And because of those reasons I strongly think that patches like this should go through the DRM tree :) Sure, that's fine -- please let me know. It was others Acked/etc. Who should carry these patches? Probably best if the relevant maintainer pick them up individually. Some of those structures are filled in by firmware/hardware and only the maintainers can judge if that value actually matches what the compiler needs. We have cases where individual bits are used as flags or when the size is byte swapped etc... Even Alex and I didn't immediately say how and where that field is actually used and had to dig that up. That's where the confusion came from. Regards, Christian. Thanks! -Kees Regards, Christian. [2/9] drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by https://git.kernel.org/kees/c/4df33089b46f [3/9] drm/i915/selftests: Annotate struct perf_series with __counted_by https://git.kernel.org/kees/c/ffd3f823bdf6 [4/9] drm/msm/dpu: Annotate struct dpu_hw_intr with __counted_by https://git.kernel.org/kees/c/2de35a989b76 [5/9] drm/nouveau/pm: Annotate struct nvkm_perfdom with __counted_by https://git.kernel.org/kees/c/188aeb08bfaa [6/9] drm/vc4: Annotate struct vc4_perfmon with __counted_by https://git.kernel.org/kees/c/59a54dc896c3 [7/9] drm/virtio: Annotate struct virtio_gpu_object_array with __counted_by https://git.kernel.org/kees/c/5cd476de33af [8/9] drm/vmwgfx: Annotate struct vmw_surface_dirty with __counted_by https://git.kernel.org/kees/c/b426f2e5356a [9/9] drm/v3d: Annotate struct v3d_perfmon with __counted_by https://git.kernel.org/kees/c/dc662fa1b0e4 Take care,
Re: [Intel-gfx] [PATCH 0/9] drm: Annotate structs with __counted_by
Am 29.09.23 um 21:33 schrieb Kees Cook: On Fri, 22 Sep 2023 10:32:05 -0700, Kees Cook wrote: This is a batch of patches touching drm for preparing for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by to structs that would benefit from the annotation. [...] Since this got Acks, I figure I should carry it in my tree. Let me know if this should go via drm instead. Applied to for-next/hardening, thanks! [1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by https://git.kernel.org/kees/c/a6046ac659d6 STOP! In a follow up discussion Alex and I figured out that this won't work. The value in the structure is byte swapped based on some firmware endianness which not necessary matches the CPU endianness. Please revert that one from going upstream if it's already on it's way. And because of those reasons I strongly think that patches like this should go through the DRM tree :) Regards, Christian. [2/9] drm/amdgpu/discovery: Annotate struct ip_hw_instance with __counted_by https://git.kernel.org/kees/c/4df33089b46f [3/9] drm/i915/selftests: Annotate struct perf_series with __counted_by https://git.kernel.org/kees/c/ffd3f823bdf6 [4/9] drm/msm/dpu: Annotate struct dpu_hw_intr with __counted_by https://git.kernel.org/kees/c/2de35a989b76 [5/9] drm/nouveau/pm: Annotate struct nvkm_perfdom with __counted_by https://git.kernel.org/kees/c/188aeb08bfaa [6/9] drm/vc4: Annotate struct vc4_perfmon with __counted_by https://git.kernel.org/kees/c/59a54dc896c3 [7/9] drm/virtio: Annotate struct virtio_gpu_object_array with __counted_by https://git.kernel.org/kees/c/5cd476de33af [8/9] drm/vmwgfx: Annotate struct vmw_surface_dirty with __counted_by https://git.kernel.org/kees/c/b426f2e5356a [9/9] drm/v3d: Annotate struct v3d_perfmon with __counted_by https://git.kernel.org/kees/c/dc662fa1b0e4 Take care,
Re: [Intel-gfx] [PATCH 1/9] drm/amd/pm: Annotate struct smu10_voltage_dependency_table with __counted_by
Am 22.09.23 um 19:41 schrieb Alex Deucher: On Fri, Sep 22, 2023 at 1:32 PM Kees Cook wrote: Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time checking via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). As found with Coccinelle[1], add __counted_by for struct smu10_voltage_dependency_table. [1] https://github.com/kees/kernel-tools/blob/trunk/coccinelle/examples/counted_by.cocci Cc: Evan Quan Cc: Alex Deucher Cc: "Christian König" Cc: "Pan, Xinhui" Cc: David Airlie Cc: Daniel Vetter Cc: Xiaojian Du Cc: Huang Rui Cc: Kevin Wang Cc: amd-...@lists.freedesktop.org Cc: dri-de...@lists.freedesktop.org Signed-off-by: Kees Cook Acked-by: Alex Deucher Mhm, I'm not sure if this is a good idea. That is a structure filled in by the firmware, isn't it? That would imply that we might need to byte swap count before it is checkable. Regards, Christian. --- drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h index 808e0ecbe1f0..42adc2a3dcbc 100644 --- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h +++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu10_hwmgr.h @@ -192,7 +192,7 @@ struct smu10_clock_voltage_dependency_record { struct smu10_voltage_dependency_table { uint32_t count; - struct smu10_clock_voltage_dependency_record entries[]; + struct smu10_clock_voltage_dependency_record entries[] __counted_by(count); }; struct smu10_clock_voltage_information { -- 2.34.1
Re: [Intel-gfx] [PATCH v2] drm: Update file owner during use
Am 20.09.23 um 15:21 schrieb Tvrtko Ursulin: On 28/08/2023 20:58, Rob Clark wrote: On Wed, Jun 21, 2023 at 2:48 AM Tvrtko Ursulin wrote: From: Tvrtko Ursulin With the typical model where the display server opens the file descriptor and then hands it over to the client(*), we were showing stale data in debugfs. Fix it by updating the drm_file->pid on ioctl access from a different process. The field is also made RCU protected to allow for lockless readers. Update side is protected with dev->filelist_mutex. Before: $ cat /sys/kernel/debug/dri/0/clients command pid dev master a uid magic Xorg 2344 0 y y 0 0 Xorg 2344 0 n y 0 2 Xorg 2344 0 n y 0 3 Xorg 2344 0 n y 0 4 After: $ cat /sys/kernel/debug/dri/0/clients command tgid dev master a uid magic Xorg 830 0 y y 0 0 xfce4-session 880 0 n y 0 1 xfwm4 943 0 n y 0 2 neverball 1095 0 n y 0 3 *) More detailed and historically accurate description of various handover implementation kindly provided by Emil Velikov: """ The traditional model, the server was the orchestrator managing the primary device node. From the fd, to the master status and authentication. But looking at the fd alone, this has varied across the years. IIRC in the DRI1 days, Xorg (libdrm really) would have a list of open fd(s) and reuse those whenever needed, DRI2 the client was responsible for open() themselves and with DRI3 the fd was passed to the client. Around the inception of DRI3 and systemd-logind, the latter became another possible orchestrator. Whereby Xorg and Wayland compositors could ask it for the fd. For various reasons (hysterical and genuine ones) Xorg has a fallback path going the open(), whereas Wayland compositors are moving to solely relying on logind... some never had fallback even. Over the past few years, more projects have emerged which provide functionality similar (be that on API level, Dbus, or otherwise) to systemd-logind. """ v2: * Fixed typo in commit text and added a fine historical explanation from Emil. Signed-off-by: Tvrtko Ursulin Cc: "Christian König" Cc: Daniel Vetter Acked-by: Christian König Reviewed-by: Emil Velikov Reviewed-by: Rob Clark Tested-by: Rob Clark Thanks. If everyone else is happy with this approach I don't have the commit rights for drm-misc. Going to take care of pushing this. Regards, Christian. Regards, Tvrtko --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++-- drivers/gpu/drm/drm_auth.c | 3 +- drivers/gpu/drm/drm_debugfs.c | 10 --- drivers/gpu/drm/drm_file.c | 40 +++-- drivers/gpu/drm/drm_ioctl.c | 3 ++ drivers/gpu/drm/nouveau/nouveau_drm.c | 5 +++- drivers/gpu/drm/vmwgfx/vmwgfx_gem.c | 6 ++-- include/drm/drm_file.h | 13 ++-- 8 files changed, 71 insertions(+), 15 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c index 74055cba3dc9..849097dff02b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c @@ -963,6 +963,7 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused) list_for_each_entry(file, &dev->filelist, lhead) { struct task_struct *task; struct drm_gem_object *gobj; + struct pid *pid; int id; /* @@ -972,8 +973,9 @@ static int amdgpu_debugfs_gem_info_show(struct seq_file *m, void *unused) * Therefore, we need to protect this ->comm access using RCU. */ rcu_read_lock(); - task = pid_task(file->pid, PIDTYPE_TGID); - seq_printf(m, "pid %8d command %s:\n", pid_nr(file->pid), + pid = rcu_dereference(file->pid); + task = pid_task(pid, PIDTYPE_TGID); + seq_printf(m, "pid %8d command %s:\n", pid_nr(pid), task ? task->comm : ""); rcu_read_unlock(); diff --git a/drivers/gpu/drm/drm_auth.c b/drivers/gpu/drm/drm_auth.c index cf92a9ae8034..2ed2585ded37 100644 --- a/drivers/gpu/drm/drm_auth.c +++ b/drivers/gpu/drm/drm_auth.c @@ -235,7 +235,8 @@ static int drm_new_set_master(struct drm_device *dev, struct drm_file *fpriv) static int drm_master_check_perm(struct drm_device *dev, struct drm_file *file_priv) { - if (file_priv->pid == task_pid(current) && file_priv->was_master) + if (file_priv->was_master &
Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time
Am 07.09.23 um 18:33 schrieb suijingfeng: Hi, On 2023/9/7 17:08, Christian König wrote: I strongly suggest that you just completely drop this here Drop this is OK, no problem. Then I will go to develop something else. This version is not intended to merge originally, as it's a RFC. Also, the core mechanism already finished, it is the first patch in this series. Things left are just policy (how to specify one and parse the kernel CMD line) and nothing interesting left. It is actually to fulfill my promise at V3 which is to give some examples as usage cases. and go into the AST driver and try to fix it. Well, someone tell me that this is well defined behavior yesterday, which imply that it is not a bug. I'm not going to fix a non-bug. Sorry for that, I wasn't realizing what you are actually trying to do. But if thomas ask me to fix it, then I probably have to try to fix. But I suggest if things not broken, don't fix it. Otherwise this may incur more big trouble. For server's single display use case, it is good enough. Yeah, exactly that's the reason why you shouldn't mess with this. In theory you could try to re-program the necessary north bridge blocks to make integrated graphics work even if you installed a dedicated VGA adapter, but you will most likely be missing something. The only real fix is to tell the BIOS that you want to use the integrated VGA device even if a dedicated one is detected. If you want to learn more about the background AMD has a bunch of documentation around this on their website: https://www.amd.com/en/search/documentation/hub.html The most interesting document for you is probably the BIOS programming manual, but don't ask me what exactly the title of that one. @Alex do you remember what that was called? IIRC Intel had similar documentations public, but I don't know where to find those of hand. Regards, Christian. Thanks.
Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time
Am 07.09.23 um 17:26 schrieb suijingfeng: [SNIP] Then, I'll give you another example, see below for elaborate description. I have one AMD BC160 GPU, see[1] to get what it looks like. The GPU don't has a display connector interface exported. It actually can be seen as a render-only GPU or compute class GPU for bitcoin. But the firmware of it still acclaim this GPU as VGA compatible. When mount this GPU onto motherboard, the system always select this GPU as primary. But this GPU can't be able to connect with a monitor. Under such a situation, modprobe.blacklist=amdgpu don't works either, because vgaarb always select this GPU as primary, this is a device-level decision. It's not VGAARB which makes this selection, it's the BIOS. VGAARB just detects what the BIOS has decided. $ dmesg | grep vgaarb: [ 3.541405] pci :0c:00.0: vgaarb: BAR 0: [mem 0xa000-0xafff 64bit pref] contains firmware FB [0xa000-0xa02f] [ 3.901448] pci :05:00.0: vgaarb: setting as boot VGA device [ 3.905375] pci :05:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none [ 3.905382] pci :0c:00.0: vgaarb: setting as boot VGA device (overriding previous) [ 3.909375] pci :0c:00.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none [ 3.913375] pci :0d:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none [ 3.913377] vgaarb: loaded [ 13.513760] amdgpu :0c:00.0: vgaarb: deactivate vga console [ 19.020992] amdgpu :0c:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem I'm using ubuntu 22.04 system, with ast.modeset=10 passed on the cmd line, I still be able to enter the graphics system. And views this GPU as a render-only GPU. Probably continue to examine what's wrong, except this, drm/amdgpu report " *ERROR* IB test failed on sdma0 (-110)" to me. Does this count as problem? No, again that is perfectly expected behavior. Some BIOSes (or maybe most by modern standard) allows to override this, but if you later override this by the OS you run the hardware outside what's validated. When you put a VGA device into a board with an integrated VGA device the integrated one gets disabled. This is even part of some PCIe specification IIRC. So the problems you run into here are perfectly expected. Regards, Christian. Before I could find solution, I have keep this de-fact render only GPU mounted. Because I need recompile kennel module, install the kernel module and testing. All I need is a 2D video card to display something, ast drm is OK, despite simple. It suit the need for my daily usage with VIM, that's enough for me. Now, the real questions that I want ask is: 1) Does the fact that when the kernel driver module got blocked (by modprobe.blacklist=amdgpu), while the vgaarb still select it as primary which leave the X server crash there (because no kennel space driver loaded) count as a problem? 2) Does my approach that mounting another GPU as the primary display adapter, while its real purpose is to solving bugs and development for another GPU, count as a use case? $ cat demsg.txt | grep drm [ 10.099888] ACPI: bus type drm_connector registered [ 11.083920] etnaviv :0d:00.0: [drm] bind etnaviv-display, master name: :0d:00.0 [ 11.084106] [drm] Initialized etnaviv 1.3.0 20151214 for :0d:00.0 on minor 0 [ 13.301702] [drm] amdgpu kernel modesetting enabled. [ 13.359820] [drm] initializing kernel modesetting (NAVI12 0x1002:0x7360 0x1002:0x0A34 0xC7). [ 13.368246] [drm] register mmio base: 0xEB10 [ 13.372861] [drm] register mmio size: 524288 [ 13.380788] [drm] add ip block number 0 [ 13.385661] [drm] add ip block number 1 [ 13.390531] [drm] add ip block number 2 [ 13.395405] [drm] add ip block number 3 [ 13.399760] [drm] add ip block number 4 [ 13.404111] [drm] add ip block number 5 [ 13.408378] [drm] add ip block number 6 [ 13.413249] [drm] add ip block number 7 [ 13.433546] [drm] add ip block number 8 [ 13.433547] [drm] add ip block number 9 [ 13.497757] [drm] VCN decode is enabled in VM mode [ 13.502540] [drm] VCN encode is enabled in VM mode [ 13.508785] [drm] JPEG decode is enabled in VM mode [ 13.529596] [drm] vm size is 262144 GB, 4 levels, block size is 9-bit, fragment size is 9-bit [ 13.564762] [drm] Detected VRAM RAM=8176M, BAR=256M [ 13.569628] [drm] RAM width 2048bits HBM [ 13.574167] [drm] amdgpu: 8176M of VRAM memory ready [ 13.579125] [drm] amdgpu: 15998M of GTT memory ready. [ 13.584184] [drm] GART: num cpu pages 131072, num gpu pages 131072 [ 13.590505] [drm] PCIE GART of 512M enabled (table at 0x00800030). [ 13.598749] [drm] Found VCN firmware Version ENC: 1.16 DEC: 5 VEP: 0 Revision: 4 [ 13.671786] [drm] reserve 0xe0 from 0x81fd00 for PSP TMR [ 13.801235] [drm] Display Core v3.2.247 initialized on DCN 2.0 [ 13.807061] [drm] DP-HDM
Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time
Am 07.09.23 um 14:32 schrieb suijingfeng: Hi, On 2023/9/7 17:08, Christian König wrote: Well, I have over 25 years of experience with display hardware and what you describe here was never an issue. I want to give you an example to let you know more. I have a ASRock AD2550B-ITX board[1], When another discrete video card is mounted into it mini PCIe slot or PCI slot, The IGD cannot be the primary display adapter anymore. The display is totally black. I have try to draft a few trivial patch to help fix this[2]. And I want to use the IGD as primary, does this count as an issue? No, this is completely expected behavior and a limitation of the hardware design. As far as I know both AMD and Intel GPUs work the same here. Regards, Christian. [1] https://www.asrock.com/mb/Intel/AD2550-ITX/ [2] https://patchwork.freedesktop.org/series/123073/
Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time
Am 07.09.23 um 04:30 schrieb Sui Jingfeng: Hi, On 2023/9/6 17:40, Christian König wrote: Am 06.09.23 um 11:08 schrieb suijingfeng: Well, welcome to correct me if I'm wrong. You seem to have some very basic misunderstandings here. The term framebuffer describes some VRAM memory used for scanout. This framebuffer is exposed to userspace through some framebuffer driver, on UEFI platforms that is usually efifb but can be quite a bunch of different drivers. When the DRM drivers load they remove the previous drivers using drm_aperture_remove_conflicting_pci_framebuffers() (or similar function), but this does not mean that the framebuffer or scanout parameters are modified in any way. It just means that the framebuffer is just no longer exposed through this driver. Take over is the perfectly right description here because that's exactly what's happening. The framebuffer configuration including the VRAM memory as well as the parameters for scanout are exposed by the newly loaded DRM driver. In other words userspace can query through the DRM interfaces which monitors already driven by the hardware and so in your terminology figure out which is the primary one. I'm a little bit of not convinced about this idea, you might be correct. Well I can point you to the code if you don't believe me. But there cases where three are multiple monitors and each video card connect one. Yeah, but this is irrelevant. The key point is the configuration is taken over when the driver loads. So whatever is there before as setup (one monitor showing console, three monitors mirrored, whatever) should be there after loading the driver as well. This configuration is just immediately overwritten because nobody cares about it. It also quite common that no monitors is connected, let the machine boot first, then find a monitors to connect to a random display output. See which will display. I don't expect the primary shake with. The primary one have to be determined as early as possible, because of the VGA console and the framebuffer console may directly output the primary. Well that is simply not correct. There is not concept of "primary" display, it can just be that a monitor was brought up by the BIOS or bootloader and we take over this configuration. Get the DDC and/or HPD involved may necessary complicated the problem. There are ASpeed BMC who add a virtual connector in order to able display remotely. There are also have commands to force a connector to be connected status. It's just that as Thomas explained as well that this completely irrelevant to any modern desktop. Both X and Wayland both iterate the available devices and start rendering to them which one was used during boot doesn't really matter to them. You may be correct, but I'm still not sure. I probably need more times to investigate. Me and my colleagues are mainly using X server, the version varies from 1.20.4 and 1.21.1.4. Even this is true, the problems still exist for non-modern desktops. Well, I have over 25 years of experience with display hardware and what you describe here was never an issue. What you have is simply a broken display driver which for some reason can't handle your use case. I strongly suggest that you just completely drop this here and go into the AST driver and try to fix it. Regards, Christian. Apart from that ranting like this and trying to explain stuff to people who obviously have much better background in the topic is not going to help your patches getting upstream. Thanks for you tell me so much knowledge, I'm realized where are the problems now. I will try to resolve the concerns at the next version. Regards, Christian.
Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time
Am 06.09.23 um 12:31 schrieb Sui Jingfeng: Hi, On 2023/9/6 14:45, Christian König wrote: Firmware framebuffer device already get killed by the drm_aperture_remove_conflicting_pci_framebuffers() function (or its siblings). So, this series is definitely not to interact with the firmware framebuffer (or more intelligent framebuffer drivers). It is for user space program, such as X server and Wayland compositor. Its for Linux user or drm drivers testers, which allow them to direct graphic display server using right hardware of interested as primary video card. Also, I believe that X server and Wayland compositor are the best test examples. If a specific DRM driver can't work with X server as a primary, then there probably have something wrong. But what's the use case for overriding this setting? On a specific machine with multiple GPUs mounted, only the primary graphics get POST-ed (initialized) by the firmware. Therefore, the DRM drivers for the rest video cards, have to choose to work without the prerequisite setups done by firmware, This is called as POST. Well, you don't seem to understand the background here. This is perfectly normal behavior. Secondary cards are posted after loading the appropriate DRM driver. At least for amdgpu this is done by calling the appropriate functions in the BIOS. Well, thanks for you tell me this. You know more than me and definitely have a better understanding. Are you telling me that the POST function for AMDGPU reside in the BIOS? The kernel call into the BIOS? Yes, exactly that. Does the BIOS here refer to the UEFI runtime or ATOM BIOS or something else? On dGPUs it's the VBIOS on a flashrom on the board, for iGPUs (APUs as AMD calls them) it's part of the system BIOS. UEFI is actually just a small subsystem in the system BIOS which replaced the old interface used between system BIOS, video BIOS and operating system. But the POST function for the drm ast, reside in the kernel space (in other word, in ast.ko). Is this statement correct? I don't know the ast driver well enough to answer that, but I assume they just read the BIOS and execute the appropriate functions. I means that for ASpeed BMC chip, if the firmware not POST the display controller. Then we have to POST it at the kernel space before doing various modeset option. We can only POST this chip by directly operate the various registers. Am I correct for the judgement about ast drm driver? Well POST just means Power On Self Test, but what you mean is initializing the hardware. Some drivers can of course initialize the hardware without the help of the BIOS, but I don't think AST can do that. As far as I know it's a relatively simple driver. BTW firmware is not the same as the BIOS (which runs the POST), firmware usually refers to something run on microcontrollers inside the ASIC while the (system or video) BIOS runs on the host CPU. Regards, Christian. Thanks for your reviews.
Re: [Intel-gfx] [Nouveau] [RFC, drm-misc-next v4 0/9] PCI/VGA: Allowing the user to select the primary video adapter at boot time
Am 06.09.23 um 11:08 schrieb suijingfeng: Well, welcome to correct me if I'm wrong. You seem to have some very basic misunderstandings here. The term framebuffer describes some VRAM memory used for scanout. This framebuffer is exposed to userspace through some framebuffer driver, on UEFI platforms that is usually efifb but can be quite a bunch of different drivers. When the DRM drivers load they remove the previous drivers using drm_aperture_remove_conflicting_pci_framebuffers() (or similar function), but this does not mean that the framebuffer or scanout parameters are modified in any way. It just means that the framebuffer is just no longer exposed through this driver. Take over is the perfectly right description here because that's exactly what's happening. The framebuffer configuration including the VRAM memory as well as the parameters for scanout are exposed by the newly loaded DRM driver. In other words userspace can query through the DRM interfaces which monitors already driven by the hardware and so in your terminology figure out which is the primary one. It's just that as Thomas explained as well that this completely irrelevant to any modern desktop. Both X and Wayland both iterate the available devices and start rendering to them which one was used during boot doesn't really matter to them. Apart from that ranting like this and trying to explain stuff to people who obviously have much better background in the topic is not going to help your patches getting upstream. Regards, Christian.