[PATCH 2/2] drm/ttm: Add a device flag to propagate -ENOSPC on OOM
Some graphics APIs differentiate between out-of-graphics-memory and
out-of-host-memory (system memory). Add a device init flag to have
-ENOSPC propagated from the resource managers instead of being
converted to -ENOMEM, to aid driver stacks in determining what error
code to return or whether corrective action can be taken at the
driver level.

Cc: Christian König
Cc: Matthew Brost
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/ttm/ttm_bo.c     |  2 +-
 drivers/gpu/drm/ttm/ttm_device.c |  1 +
 include/drm/ttm/ttm_device.h     | 13 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index 320592435252..c4bec2ad301b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -835,7 +835,7 @@ int ttm_bo_validate(struct ttm_buffer_object *bo,
 
 	/* For backward compatibility with userspace */
 	if (ret == -ENOSPC)
-		return -ENOMEM;
+		return bo->bdev->propagate_enospc ? ret : -ENOMEM;
 
 	/*
 	 * We might need to add a TTM.
diff --git a/drivers/gpu/drm/ttm/ttm_device.c b/drivers/gpu/drm/ttm/ttm_device.c
index 0c85d10e5e0b..aee9d52d745b 100644
--- a/drivers/gpu/drm/ttm/ttm_device.c
+++ b/drivers/gpu/drm/ttm/ttm_device.c
@@ -203,6 +203,7 @@ int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *func
 	}
 
 	bdev->funcs = funcs;
+	bdev->propagate_enospc = flags.propagate_enospc;
 
 	ttm_sys_man_init(bdev);
diff --git a/include/drm/ttm/ttm_device.h b/include/drm/ttm/ttm_device.h
index 1534bd946c78..f9da78bbd925 100644
--- a/include/drm/ttm/ttm_device.h
+++ b/include/drm/ttm/ttm_device.h
@@ -266,6 +266,13 @@ struct ttm_device {
 	 * @wq: Work queue structure for the delayed delete workqueue.
 	 */
 	struct workqueue_struct *wq;
+
+	/**
+	 * @propagate_enospc: Whether -ENOSPC should be propagated to the caller after
+	 * graphics memory allocation failure. If false, this will be converted to
+	 * -ENOMEM, which is the default behaviour.
+	 */
+	bool propagate_enospc;
 };
 
 int ttm_global_swapout(struct ttm_operation_ctx *ctx, gfp_t gfp_flags);
@@ -295,6 +302,12 @@ struct ttm_device_init_flags {
 	u32 use_dma_alloc : 1;
 	/** @use_dma32: If we should use GFP_DMA32 for device memory allocations. */
 	u32 use_dma32 : 1;
+	/**
+	 * @propagate_enospc: Whether -ENOSPC should be propagated to the caller after
+	 * graphics memory allocation failure. If false, this will be converted to
+	 * -ENOMEM, which is the default behaviour.
+	 */
+	u32 propagate_enospc : 1;
 };
 
 int ttm_device_init(struct ttm_device *bdev, const struct ttm_device_funcs *funcs,
-- 
2.46.0
[PATCH 1/2] drm/ttm: Change ttm_device_init to use a struct instead of multiple bools
The ttm_device_init function uses multiple bool arguments. That means
readability in the caller becomes poor, and all callers need to change
if yet another bool is added. Instead, use a struct with multiple
single-bit flags, which addresses both problems. Prefer it over
defines or enums with explicit bit shifts, since converting to and
from those bit values requires logical operations or tests that are
implicit with the struct usage, and the struct also provides
type-checking.

This is in preparation for adding yet another bool flag parameter to
the function.

Cc: Christian König
Cc: amd-...@lists.freedesktop.org
Cc: intel-...@lists.freedesktop.org
Cc: nouveau@lists.freedesktop.org
Cc: spice-de...@lists.freedesktop.org
Cc: dri-de...@lists.freedesktop.org
Cc: Zack Rusin
Cc:
Cc: Sui Jingfeng
Cc:
Signed-off-by: Thomas Hellström
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  6 --
 drivers/gpu/drm/i915/intel_region_ttm.c       |  3 ++-
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  5 -
 drivers/gpu/drm/nouveau/nouveau_ttm.c         |  7 +--
 drivers/gpu/drm/qxl/qxl_ttm.c                 |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c           |  6 --
 drivers/gpu/drm/ttm/tests/ttm_bo_test.c       | 16 +++
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  3 ++-
 drivers/gpu/drm/ttm/tests/ttm_device_test.c   | 16 ---
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 20 ---
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h |  6 ++
 drivers/gpu/drm/ttm/ttm_device.c              |  7 +++
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |  4 ++--
 drivers/gpu/drm/xe/xe_device.c                |  3 ++-
 include/drm/ttm/ttm_device.h                  | 12 ++-
 15 files changed, 67 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index b8bc7fa8c375..9439fc12c17b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1853,8 +1853,10 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 	r = ttm_device_init(&adev->mman.bdev, &amdgpu_bo_driver, adev->dev,
 			    adev_to_drm(adev)->anon_inode->i_mapping,
 			    adev_to_drm(adev)->vma_offset_manager,
-			    adev->need_swiotlb,
-			    dma_addressing_limited(adev->dev));
+			    (struct ttm_device_init_flags){
+				    .use_dma_alloc = adev->need_swiotlb,
+				    .use_dma32 = dma_addressing_limited(adev->dev)
+			    });
 	if (r) {
 		DRM_ERROR("failed initializing buffer object driver(%d).\n", r);
 		return r;
diff --git a/drivers/gpu/drm/i915/intel_region_ttm.c b/drivers/gpu/drm/i915/intel_region_ttm.c
index 04525d92bec5..db34da63814c 100644
--- a/drivers/gpu/drm/i915/intel_region_ttm.c
+++ b/drivers/gpu/drm/i915/intel_region_ttm.c
@@ -34,7 +34,8 @@ int intel_region_ttm_device_init(struct drm_i915_private *dev_priv)
 
 	return ttm_device_init(&dev_priv->bdev, i915_ttm_driver(),
 			       drm->dev, drm->anon_inode->i_mapping,
-			       drm->vma_offset_manager, false, false);
+			       drm->vma_offset_manager,
+			       (struct ttm_device_init_flags){});
 }
 
 /**
diff --git a/drivers/gpu/drm/loongson/lsdc_ttm.c b/drivers/gpu/drm/loongson/lsdc_ttm.c
index 2e42c6970c9f..c684f1636f3f 100644
--- a/drivers/gpu/drm/loongson/lsdc_ttm.c
+++ b/drivers/gpu/drm/loongson/lsdc_ttm.c
@@ -544,7 +544,10 @@ int lsdc_ttm_init(struct lsdc_device *ldev)
 
 	ret = ttm_device_init(&ldev->bdev, &lsdc_bo_driver, ddev->dev,
 			      ddev->anon_inode->i_mapping,
-			      ddev->vma_offset_manager, false, true);
+			      ddev->vma_offset_manager,
+			      (struct ttm_device_init_flags){
+				      .use_dma32 = true
+			      });
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index e244927eb5d4..5f89d2b40425 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -302,8 +302,11 @@ nouveau_ttm_init(struct nouveau_drm *drm)
 	ret = ttm_device_init(&drm->ttm.bdev, &nouveau_bo_driver,
 			      drm->dev->dev, dev->anon_inode->i_mapping,
 			      dev->vma_offset_manager,
-			      drm_need_swiotlb(drm->client.mmu.dmabits),
-			      drm->client.mmu.dmabits <=
[PATCH 0/2] drm/ttm: Add an option to report graphics memory OOM
Some graphics APIs differentiate between out-of-graphics-memory and
out-of-host-memory (system memory). Add a device init flag to have
-ENOSPC propagated from the resource managers instead of being
converted to -ENOMEM, to aid driver stacks in determining what error
code to return or whether corrective action can be taken at the
driver level.

The first patch deals with a ttm_device_init() interface change, and
the second patch adds the actual functionality.

A follow-up will be posted for Xe once this is merged / backmerged.

Thomas Hellström (2):
  drm/ttm: Change ttm_device_init to use a struct instead of multiple
    bools
  drm/ttm: Add a device flag to propagate -ENOSPC on OOM

 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       |  6 +++--
 drivers/gpu/drm/i915/intel_region_ttm.c       |  3 ++-
 drivers/gpu/drm/loongson/lsdc_ttm.c           |  5 +++-
 drivers/gpu/drm/nouveau/nouveau_ttm.c         |  7 --
 drivers/gpu/drm/qxl/qxl_ttm.c                 |  2 +-
 drivers/gpu/drm/radeon/radeon_ttm.c           |  6 +++--
 drivers/gpu/drm/ttm/tests/ttm_bo_test.c       | 16 ++--
 .../gpu/drm/ttm/tests/ttm_bo_validate_test.c  |  3 ++-
 drivers/gpu/drm/ttm/tests/ttm_device_test.c   | 16 ++--
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.c | 20 ++-
 drivers/gpu/drm/ttm/tests/ttm_kunit_helpers.h |  6 ++---
 drivers/gpu/drm/ttm/ttm_bo.c                  |  2 +-
 drivers/gpu/drm/ttm/ttm_device.c              |  8 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c           |  4 +--
 drivers/gpu/drm/xe/xe_device.c                |  3 ++-
 include/drm/ttm/ttm_device.h                  | 25 ++-
 16 files changed, 82 insertions(+), 50 deletions(-)

-- 
2.46.0
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v5
Hi, Christian

On Thu, 2024-02-29 at 14:40 +0100, Christian König wrote:
> Previously we would never try to move a BO into the preferred
> placements
> when it ever landed in a busy placement since those were considered
> compatible.
> 
> Rework the whole handling and finally unify the idle and busy
> handling.
> ttm_bo_validate() is now responsible to try idle placement first and
> then
> use the busy placement if that didn't worked.
> 
> Drawback is that we now always try the idle placement first for each
> validation which might cause some additional CPU overhead on
> overcommit.
> 
> v2: fix kerneldoc warning and coding style
> v3: take care of XE as well
> v4: keep the ttm_bo_mem_space functionality as it is for now, only
> add
> new handling for ttm_bo_validate as suggested by Thomas
> v5: fix bug pointed out by Matthew
> 
> Signed-off-by: Christian König
> Reviewed-by: Zack Rusin v3

Now Xe CI passes \o/

Still some checkpatch.pl warnings on both these lines. For the first
line I think it uses From: in the email as the author and when that
doesn't match the SOB, it becomes unhappy.

With that fixed,
Reviewed-by: Thomas Hellström

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 231 +--
> --
>  drivers/gpu/drm/ttm/ttm_resource.c | 16 +-
>  include/drm/ttm/ttm_resource.h | 3 +-
>  3 files changed, 121 insertions(+), 129 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> b/drivers/gpu/drm/ttm/ttm_bo.c
> index 96a724e8f3ff..e059b1e1b13b 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct
> ttm_buffer_object *bo,
> 	return ret;
> }
> 
> -/*
> - * Repeatedly evict memory from the LRU for @mem_type until we
> create enough
> - * space, or we've evicted everything and there isn't enough space.
> - */
> -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
> -				  const struct ttm_place *place,
> -				  struct ttm_resource **mem,
> -				  struct ttm_operation_ctx *ctx)
> -{
> -	struct ttm_device *bdev = bo->bdev;
> -	struct ttm_resource_manager *man;
> -	struct ww_acquire_ctx *ticket;
> -	int ret;
> -
> -	man = ttm_manager_type(bdev, place->mem_type);
> -	ticket = dma_resv_locking_ctx(bo->base.resv);
> -	do {
> -		ret = ttm_resource_alloc(bo, place, mem);
> -		if (likely(!ret))
> -			break;
> -		if (unlikely(ret != -ENOSPC))
> -			return ret;
> -		ret = ttm_mem_evict_first(bdev, man, place, ctx,
> -					  ticket);
> -		if (unlikely(ret != 0))
> -			return ret;
> -	} while (1);
> -
> -	return ttm_bo_add_move_fence(bo, man, *mem, ctx-
> >no_wait_gpu);
> -}
> -
> /**
> - * ttm_bo_mem_space
> + * ttm_bo_alloc_resource - Allocate backing store for a BO
> *
> - * @bo: Pointer to a struct ttm_buffer_object. the data of which
> - * we want to allocate space for.
> - * @placement: Proposed new placement for the buffer object.
> - * @mem: A struct ttm_resource.
> + * @bo: Pointer to a struct ttm_buffer_object of which we want a
> resource for
> + * @placement: Proposed new placement for the buffer object
> * @ctx: if and how to sleep, lock buffers and alloc memory
> + * @force_space: If we should evict buffers to force space
> + * @res: The resulting struct ttm_resource.
> *
> - * Allocate memory space for the buffer object pointed to by @bo,
> using
> - * the placement flags in @placement, potentially evicting other
> idle buffer objects.
> - * This function may sleep while waiting for space to become
> available.
> + * Allocates a resource for the buffer object pointed to by @bo,
> using the
> + * placement flags in @placement, potentially evicting other buffer
> objects when
> + * @force_space is true.
> + * This function may sleep while waiting for resources to become
> available.
> * Returns:
> - * -EBUSY: No space available (only if no_wait == 1).
> + * -EBUSY: No space available (only if no_wait == true).
> * -ENOSPC: Could not allocate space for the buffer object, either
> due to
> * fragmentation or concurrent allocators.
> * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
> */
> -int ttm_bo_mem_space(struct ttm_buffer_object *bo,
> -		     struct ttm_placement *placement,
> -		     struct ttm_resource **mem,
> -		     struct ttm_operation_ctx *ctx)
> +static
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4
Hi, Christian

On Fri, 2024-02-23 at 15:30 +0100, Christian König wrote:
> Am 06.02.24 um 13:56 schrieb Christian König:
> > Am 06.02.24 um 13:53 schrieb Thomas Hellström:
> > > Hi, Christian,
> > > 
> > > On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote:
> > > > Previously we would never try to move a BO into the preferred
> > > > placements
> > > > when it ever landed in a busy placement since those were
> > > > considered
> > > > compatible.
> > > > 
> > > > Rework the whole handling and finally unify the idle and busy
> > > > handling.
> > > > ttm_bo_validate() is now responsible to try idle placement
> > > > first and
> > > > then
> > > > use the busy placement if that didn't worked.
> > > > 
> > > > Drawback is that we now always try the idle placement first for
> > > > each
> > > > validation which might cause some additional CPU overhead on
> > > > overcommit.
> > > > 
> > > > v2: fix kerneldoc warning and coding style
> > > > v3: take care of XE as well
> > > > v4: keep the ttm_bo_mem_space functionality as it is for now,
> > > > only
> > > > add
> > > > new handling for ttm_bo_validate as suggested by Thomas
> > > > 
> > > > Signed-off-by: Christian König
> > > > Reviewed-by: Zack Rusin v3
> > > Sending this through xe CI, will try to review asap.
> > 
> > Take your time. At the moment people are bombarding me with work
> > and I
> > have only two hands and one head as well :(
> 
> So I've digged myself out of that hole and would rather like to get
> this
> new feature into 6.9.
> 
> Any time to review it? I can also plan some time to review your LRU
> changes next week.
> 
> Thanks,
> Christian.

Sorry for the late response. Was planning to review but saw that there
was still an xe CI failure.

https://intel-gfx-ci.01.org/tree/intel-xe/xe-pw-129579v1/bat-atsm-2/igt@xe_evict_...@evict-overcommit-parallel-nofree-samefd.html

I haven't really had time to look into what might be causing this,
though.

/Thomas

> 
> > 
> > Christian.
> > > > > > > > /Thomas > > > > > > > > > > --- > > > > drivers/gpu/drm/ttm/ttm_bo.c | 231 +--- > > > > --- > > > > -- > > > > drivers/gpu/drm/ttm/ttm_resource.c | 16 +- > > > > include/drm/ttm/ttm_resource.h | 3 +- > > > > 3 files changed, 121 insertions(+), 129 deletions(-) > > > > > > > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c > > > > b/drivers/gpu/drm/ttm/ttm_bo.c > > > > index ba3f09e2d7e6..b12f435542a9 100644 > > > > --- a/drivers/gpu/drm/ttm/ttm_bo.c > > > > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > > > > @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct > > > > ttm_buffer_object *bo, > > > > return ret; > > > > } > > > > -/* > > > > - * Repeatedly evict memory from the LRU for @mem_type until we > > > > create enough > > > > - * space, or we've evicted everything and there isn't enough > > > > space. > > > > - */ > > > > -static int ttm_bo_mem_force_space(struct ttm_buffer_object > > > > *bo, > > > > - const struct ttm_place *place, > > > > - struct ttm_resource **mem, > > > > - struct ttm_operation_ctx *ctx) > > > > -{ > > > > - struct ttm_device *bdev = bo->bdev; > > > > - struct ttm_resource_manager *man; > > > > - struct ww_acquire_ctx *ticket; > > > > - int ret; > > > > - > > > > - man = ttm_manager_type(bdev, place->mem_type); > > > > - ticket = dma_resv_locking_ctx(bo->base.resv); > > > > - do { > > > > - ret = ttm_resource_alloc(bo, place, mem); > > > > - if (likely(!ret)) > > > > - break; > > > > - if (unlikely(ret != -ENOSPC)) > > > > - return ret; > > > > - ret = ttm_mem_evict_first(bdev, man, place, ctx, > > > > - ticket); > > > > - if (unlikely(ret != 0)) > > > > - return ret; > > > > - } while (1); > > > > - > > > > -
Re: [PATCH 1/2] drm/ttm: improve idle/busy handling v4
Hi, Christian,

On Fri, 2024-01-26 at 15:09 +0100, Christian König wrote:
> Previously we would never try to move a BO into the preferred
> placements
> when it ever landed in a busy placement since those were considered
> compatible.
> 
> Rework the whole handling and finally unify the idle and busy
> handling.
> ttm_bo_validate() is now responsible to try idle placement first and
> then
> use the busy placement if that didn't worked.
> 
> Drawback is that we now always try the idle placement first for each
> validation which might cause some additional CPU overhead on
> overcommit.
> 
> v2: fix kerneldoc warning and coding style
> v3: take care of XE as well
> v4: keep the ttm_bo_mem_space functionality as it is for now, only
> add
> new handling for ttm_bo_validate as suggested by Thomas
> 
> Signed-off-by: Christian König
> Reviewed-by: Zack Rusin v3

Sending this through xe CI, will try to review asap.

/Thomas

> ---
>  drivers/gpu/drm/ttm/ttm_bo.c | 231 +--
> --
>  drivers/gpu/drm/ttm/ttm_resource.c | 16 +-
>  include/drm/ttm/ttm_resource.h | 3 +-
>  3 files changed, 121 insertions(+), 129 deletions(-)
> 
> diff --git a/drivers/gpu/drm/ttm/ttm_bo.c
> b/drivers/gpu/drm/ttm/ttm_bo.c
> index ba3f09e2d7e6..b12f435542a9 100644
> --- a/drivers/gpu/drm/ttm/ttm_bo.c
> +++ b/drivers/gpu/drm/ttm/ttm_bo.c
> @@ -724,64 +724,36 @@ static int ttm_bo_add_move_fence(struct
> ttm_buffer_object *bo,
> 	return ret;
> }
> 
> -/*
> - * Repeatedly evict memory from the LRU for @mem_type until we
> create enough
> - * space, or we've evicted everything and there isn't enough space.
> - */
> -static int ttm_bo_mem_force_space(struct ttm_buffer_object *bo,
> -				  const struct ttm_place *place,
> -				  struct ttm_resource **mem,
> -				  struct ttm_operation_ctx *ctx)
> -{
> -	struct ttm_device *bdev = bo->bdev;
> -	struct ttm_resource_manager *man;
> -	struct ww_acquire_ctx *ticket;
> -	int ret;
> -
> -	man = ttm_manager_type(bdev, place->mem_type);
> -	ticket = dma_resv_locking_ctx(bo->base.resv);
> -	do {
> -		ret = ttm_resource_alloc(bo, place, mem);
> -		if (likely(!ret))
> -			break;
> -		if (unlikely(ret != -ENOSPC))
> -			return ret;
> -		ret = ttm_mem_evict_first(bdev, man, place, ctx,
> -					  ticket);
> -		if (unlikely(ret != 0))
> -			return ret;
> -	} while (1);
> -
> -	return ttm_bo_add_move_fence(bo, man, *mem, ctx-
> >no_wait_gpu);
> -}
> -
> /**
> - * ttm_bo_mem_space
> + * ttm_bo_alloc_resource - Allocate backing store for a BO
> *
> - * @bo: Pointer to a struct ttm_buffer_object. the data of which
> - * we want to allocate space for.
> - * @placement: Proposed new placement for the buffer object.
> - * @mem: A struct ttm_resource.
> + * @bo: Pointer to a struct ttm_buffer_object of which we want a
> resource for
> + * @placement: Proposed new placement for the buffer object
> * @ctx: if and how to sleep, lock buffers and alloc memory
> + * @force_space: If we should evict buffers to force space
> + * @res: The resulting struct ttm_resource.
> *
> - * Allocate memory space for the buffer object pointed to by @bo,
> using
> - * the placement flags in @placement, potentially evicting other
> idle buffer objects.
> - * This function may sleep while waiting for space to become
> available.
> + * Allocates a resource for the buffer object pointed to by @bo,
> using the
> + * placement flags in @placement, potentially evicting other buffer
> objects when
> + * @force_space is true.
> + * This function may sleep while waiting for resources to become
> available.
> * Returns:
> - * -EBUSY: No space available (only if no_wait == 1).
> + * -EBUSY: No space available (only if no_wait == true).
> * -ENOSPC: Could not allocate space for the buffer object, either
> due to
> * fragmentation or concurrent allocators.
> * -ERESTARTSYS: An interruptible sleep was interrupted by a signal.
> */
> -int ttm_bo_mem_space(struct ttm_buffer_object *bo,
> -		     struct ttm_placement *placement,
> -		     struct ttm_resource **mem,
> -		     struct ttm_operation_ctx *ctx)
> +static int ttm_bo_alloc_resource(struct ttm_buffer_object *bo,
> +				 struct ttm_placement *placement,
> +				 struct ttm_operation_ctx *ctx,
> +				 bool force_space,
> +				 struct ttm_resource **res)
> {
> 	struct ttm_device *bdev = bo->bdev;
> -	bool type_found = false;
> +	struct ww_acquire_ctx *ticket;
> 	int i, ret;
> 
> +	ticket = dma_resv_locking_ctx(bo->base.resv);
> 	ret = dma_resv_reserve_fences(bo->base.resv, 1);
> 	if (unlikely(ret))
> 		return
Re: Re: Re: [PATCH 3/5] drm/ttm: replace busy placement with flags v6
On Fri, 2024-01-26 at 16:22 -0600, Lucas De Marchi wrote:
> On Fri, Jan 26, 2024 at 04:16:58PM -0600, Lucas De Marchi wrote:
> > On Thu, Jan 18, 2024 at 05:38:16PM +0100, Thomas Hellström wrote:
> > > 
> > > On 1/17/24 13:27, Thomas Hellström wrote:
> > > > 
> > > > On 1/17/24 11:47, Thomas Hellström wrote:
> > > > > Hi, Christian
> > > > > 
> > > > > Xe changes look good. Will send the series to xe ci to check
> > > > > for
> > > > > regressions.
> > > > 
> > > > Hmm, there are some checkpatch warnings about author / SOB
> > > > email
> > > > mismatch,
> > > 
> > > With those fixed, this patch is
> > > 
> > > Reviewed-by: Thomas Hellström
> > 
> > it actually broke drm-tip now that this is merged:
> > 
> > ../drivers/gpu/drm/xe/xe_bo.c:41:10: error: ‘struct ttm_placement’
> > has no member named ‘num_busy_placement’; did you mean
> > ‘num_placement’
> >    41 |         .num_busy_placement = 1,
> >       |          ^~
> >       |          num_placement
> > ../drivers/gpu/drm/xe/xe_bo.c:41:31: error: excess elements in
> > struct initializer [-Werror]
> >    41 |         .num_busy_placement = 1,
> >       |                               ^
> > 
> > Apparently a conflict with another patch that got applied a few
> > days
> > ago: a201c6ee37d6 ("drm/xe/bo: Evict VRAM to TT rather than to
> > system")
> 
> oh, no... apparently that commit is from a long time ago. The
> problem
> was that drm-misc-next was not yet in sync with drm-next. Thomas, do
> you
> have a fixup for this to put in rerere?
> 
> Lucas De Marchi

I added this as a manual fixup and ran some quick igt tests. Seems to
work.
Re: [PATCH 4/5] drm/ttm: improve idle/busy handling v3
On 1/18/24 15:24, Thomas Hellström wrote: On Fri, 2024-01-12 at 13:51 +0100, Christian König wrote: Previously we would never try to move a BO into the preferred placements when it ever landed in a busy placement since those were considered compatible. Rework the whole handling and finally unify the idle and busy handling. ttm_bo_validate() is now responsible to try idle placement first and then use the busy placement if that didn't worked. Drawback is that we now always try the idle placement first for each validation which might cause some additional CPU overhead on overcommit. v2: fix kerneldoc warning and coding style v3: take care of XE as well Signed-off-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- drivers/gpu/drm/ttm/ttm_bo.c | 131 --- -- drivers/gpu/drm/ttm/ttm_resource.c | 16 ++- drivers/gpu/drm/xe/xe_bo.c | 4 +- include/drm/ttm/ttm_bo.h | 3 +- include/drm/ttm/ttm_resource.h | 3 +- 7 files changed, 68 insertions(+), 93 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index b671b0665492..06fb3fc47eaa 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct amdgpu_device *adev, (*bo_ptr)->placements[i].lpfn = (offset + size) >> PAGE_SHIFT; } r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, - &(*bo_ptr)->tbo.resource, &ctx); + &(*bo_ptr)->tbo.resource, &ctx, false); if (r) goto error; diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 8722beba494e..f23cdc7c5b08 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -966,7 +966,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo) placements.mem_type = TTM_PL_TT; placements.flags = bo->resource->placement; - r = ttm_bo_mem_space(bo, &placement, &tmp, 
&ctx); + r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true); if (unlikely(r)) return r; diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index a5e11a92e0b9..3783be24d832 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct ttm_buffer_object *bo, hop_placement.placement = hop; /* find space in the bounce domain */ - ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx); + ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, true); if (ret) return ret; /* move to the bounce domain */ @@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object *bo, return ttm_bo_pipeline_gutting(bo); } - ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx); + ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, true); This is what breaks xe's selftest since the evict-flags preferred placement is never tried so it changes the behavior. The xe evict flags were set up to "Try to evict to TT first, but if that causes recursive eviction, try to evict to system". This here ignores the preferred eviction placement. So "Preferred" in effect changes meaning from "Don't try this if it will cause an eviction", to "Don't try this in the evict path", which is hard for the driver to have any knowledge about. Then also it sounds from the commit message with this patch "Preferred" also gets overloaded with "Always retry preferred on validate", but shouldn't that really be a dynamic driver decision and not something TTM should try to enforce in a static way? Drivers could have short- circuited the ttm_bo_validate() call if it succeeded once, and have a deeper thought about when to migrate from, say TT to VRAM and vice versa. For the specific behaviour sought here, there is (or at least used to be) a construct in the vmwgfx driver that first called ttm_bo_validate() with VRAM as preferred placement and no fallback. 
If that failed due to VRAM being full, it called ttm_bo_validate() again, this time with fallback and VRAM allowing eviction. /Thomas To conclude here, a suggestion would be, 1) Could we separate the semantics of the "Preferred" flag. Perhaps creating yet another one. first flag: "Retry this placement even if bo is in a valid placement" second flag: "Don't use this placement if it would cause an eviction" (Retain ol
Re: [PATCH 3/5] drm/ttm: replace busy placement with flags v6
On 1/17/24 13:27, Thomas Hellström wrote: On 1/17/24 11:47, Thomas Hellström wrote: Hi, Christian Xe changes look good. Will send the series to xe ci to check for regressions. Hmm, there are some checkpatch warnings about author / SOB email mismatch, With those fixed, this patch is Reviewed-by: Thomas Hellström But worserthere are some regressions in the dma-buf ktest (it tests evicting of a dynamic dma-buf), https://patchwork.freedesktop.org/series/128873/ I'll take a look later today or tomorrow. These are from the next patch. Will continue the discussion there. /Thomas /Thomas /Thomas On 1/12/24 13:51, Christian König wrote: From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch v5: cleanup some rebase problems with VMWGFX v6: implement some missing VMWGFX functionality pointed out by Zack, rename the flags as suggested by Michel, rebase on drm-tip and adjust XE as well Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 +--- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 37 +-- drivers/gpu/drm/loongson/lsdc_ttm.c | 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +++-- drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c | 8 +-- drivers/gpu/drm/radeon/radeon_uvd.c | 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 --- drivers/gpu/drm/ttm/ttm_resource.c | 73 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 33 +++--- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 -- drivers/gpu/drm/xe/xe_bo.c | 33 +- include/drm/ttm/ttm_placement.h | 10 
+-- include/drm/ttm/ttm_resource.h | 8 +-- 19 files changed, 118 insertions(+), 197 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 425cebcc5cbf..b671b0665492 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1397,8 +1394,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_DESIRED; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 75c9fd2c6c2a..8722beba494e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement 
= 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gm
Re: [PATCH 4/5] drm/ttm: improve idle/busy handling v3
On Fri, 2024-01-12 at 13:51 +0100, Christian König wrote: > Previously we would never try to move a BO into the preferred > placements > when it ever landed in a busy placement since those were considered > compatible. > > Rework the whole handling and finally unify the idle and busy > handling. > ttm_bo_validate() is now responsible to try idle placement first and > then > use the busy placement if that didn't worked. > > Drawback is that we now always try the idle placement first for each > validation which might cause some additional CPU overhead on > overcommit. > > v2: fix kerneldoc warning and coding style > v3: take care of XE as well > > Signed-off-by: Christian König > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 2 +- > drivers/gpu/drm/ttm/ttm_bo.c | 131 --- > -- > drivers/gpu/drm/ttm/ttm_resource.c | 16 ++- > drivers/gpu/drm/xe/xe_bo.c | 4 +- > include/drm/ttm/ttm_bo.h | 3 +- > include/drm/ttm/ttm_resource.h | 3 +- > 7 files changed, 68 insertions(+), 93 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > index b671b0665492..06fb3fc47eaa 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c > @@ -404,7 +404,7 @@ int amdgpu_bo_create_kernel_at(struct > amdgpu_device *adev, > (*bo_ptr)->placements[i].lpfn = (offset + size) >> > PAGE_SHIFT; > } > r = ttm_bo_mem_space(&(*bo_ptr)->tbo, &(*bo_ptr)->placement, > - &(*bo_ptr)->tbo.resource, &ctx); > + &(*bo_ptr)->tbo.resource, &ctx, false); > if (r) > goto error; > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > index 8722beba494e..f23cdc7c5b08 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c > @@ -966,7 +966,7 @@ int amdgpu_ttm_alloc_gart(struct > ttm_buffer_object *bo) > placements.mem_type = TTM_PL_TT; > placements.flags = 
bo->resource->placement; > > - r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx); > + r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx, true); > if (unlikely(r)) > return r; > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c > b/drivers/gpu/drm/ttm/ttm_bo.c > index a5e11a92e0b9..3783be24d832 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -414,7 +414,7 @@ static int ttm_bo_bounce_temp_buffer(struct > ttm_buffer_object *bo, > hop_placement.placement = hop; > > /* find space in the bounce domain */ > - ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx); > + ret = ttm_bo_mem_space(bo, &hop_placement, &hop_mem, ctx, > true); > if (ret) > return ret; > /* move to the bounce domain */ > @@ -454,7 +454,7 @@ static int ttm_bo_evict(struct ttm_buffer_object > *bo, > return ttm_bo_pipeline_gutting(bo); > } > > - ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx); > + ret = ttm_bo_mem_space(bo, &placement, &evict_mem, ctx, > true); This is what breaks xe's selftest, since the evict-flags preferred placement is never tried, so it changes the behavior. The xe evict flags were set up to "Try to evict to TT first, but if that causes recursive eviction, try to evict to system". This here ignores the preferred eviction placement. So "Preferred" in effect changes meaning from "Don't try this if it will cause an eviction" to "Don't try this in the evict path", which is hard for the driver to have any knowledge about. From the commit message it also sounds like, with this patch, "Preferred" gets overloaded with "Always retry preferred on validate", but shouldn't that really be a dynamic driver decision and not something TTM should try to enforce in a static way? Drivers could have short-circuited the ttm_bo_validate() call if it succeeded once, and have a deeper thought about when to migrate from, say, TT to VRAM and vice versa.
For the specific behaviour sought here, there is (or at least used to be) a construct in the vmwgfx driver that first called ttm_bo_validate() with VRAM as preferred placement and no fallback. If that failed due to VRAM being full, it called ttm_bo_validate() again, this time with fallback and VRAM allowing eviction. /Thomas > if (ret) { > if (ret != -ERESTARTSYS) { > pr_err("Failed to find memory space for > buffer 0x%p eviction\n", > @@ -724,37 +724,6 @@ static int ttm_bo_add_move_fence(struct > ttm_buffer_object *bo, > return ret; > } > > -/* > - * Repeatedly evict memory from the LRU for @mem_type until we > create enough > - * space, or we've evicted everything and there isn't enough space. > - */ > -stati
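The two-pass vmwgfx construct described above can be sketched roughly as follows. This is illustrative only: the placement-setup helpers are hypothetical, error handling is abbreviated, and the actual vmwgfx code differs in detail.

```c
/*
 * Sketch of the two-pass validate described above (illustrative only).
 * vmw_placement_vram_only() / vmw_placement_vram_evictable() are
 * hypothetical helpers filling in struct ttm_placement.
 */
static int vmw_bo_try_vram_then_evict(struct ttm_buffer_object *bo)
{
	struct ttm_operation_ctx ctx = { .interruptible = true };
	struct ttm_placement vram_only, vram_evictable;
	int ret;

	vmw_placement_vram_only(&vram_only);           /* VRAM, no fallback */
	vmw_placement_vram_evictable(&vram_evictable); /* VRAM, may evict */

	/* Pass 1: VRAM as the only placement, without evicting anything. */
	ret = ttm_bo_validate(bo, &vram_only, &ctx);
	if (ret != -ENOMEM)
		return ret;

	/* Pass 2: VRAM was full; retry with fallback and eviction allowed. */
	return ttm_bo_validate(bo, &vram_evictable, &ctx);
}
```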
Re: [PATCH 3/5] drm/ttm: replace busy placement with flags v6
On 1/17/24 11:47, Thomas Hellström wrote: Hi, Christian Xe changes look good. Will send the series to xe ci to check for regressions. Hmm, there are some checkpatch warnings about author / SOB email mismatch. But worse, there are some regressions in the dma-buf ktest (it tests evicting of a dynamic dma-buf), https://patchwork.freedesktop.org/series/128873/ I'll take a look later today or tomorrow. /Thomas On 1/12/24 13:51, Christian König wrote: From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch v5: cleanup some rebase problems with VMWGFX v6: implement some missing VMWGFX functionality pointed out by Zack, rename the flags as suggested by Michel, rebase on drm-tip and adjust XE as well Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 11 +--- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 37 +-- drivers/gpu/drm/loongson/lsdc_ttm.c | 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +++-- drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c | 8 +-- drivers/gpu/drm/radeon/radeon_uvd.c | 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 --- drivers/gpu/drm/ttm/ttm_resource.c | 73 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 33 +++--- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 -- drivers/gpu/drm/xe/xe_bo.c | 33 +- include/drm/ttm/ttm_placement.h | 10 +-- include/drm/ttm/ttm_resource.h | 8 +-- 19 files changed, 118 insertions(+), 197 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 425cebcc5cbf..b671b0665492 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1397,8 +1394,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_DESIRED; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 75c9fd2c6c2a..8722beba494e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ 
amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@
Re: [PATCH 3/5] drm/ttm: replace busy placement with flags v6
Hi, Christian Xe changes look good. Will send the series to xe ci to check for regressions. /Thomas On 1/12/24 13:51, Christian König wrote: From: Somalapuram Amaranath Instead of a list of separate busy placement add flags which indicate that a placement should only be used when there is room or if we need to evict. v2: add missing TTM_PL_FLAG_IDLE for i915 v3: fix auto build test ERROR on drm-tip/drm-tip v4: fix some typos pointed out by checkpatch v5: cleanup some rebase problems with VMWGFX v6: implement some missing VMWGFX functionality pointed out by Zack, rename the flags as suggested by Michel, rebase on drm-tip and adjust XE as well Signed-off-by: Christian König Signed-off-by: Somalapuram Amaranath --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +--- drivers/gpu/drm/drm_gem_vram_helper.c | 2 - drivers/gpu/drm/i915/gem/i915_gem_ttm.c| 37 +-- drivers/gpu/drm/loongson/lsdc_ttm.c| 2 - drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +++-- drivers/gpu/drm/nouveau/nouveau_bo.h | 1 - drivers/gpu/drm/qxl/qxl_object.c | 2 - drivers/gpu/drm/qxl/qxl_ttm.c | 2 - drivers/gpu/drm/radeon/radeon_object.c | 2 - drivers/gpu/drm/radeon/radeon_ttm.c| 8 +-- drivers/gpu/drm/radeon/radeon_uvd.c| 1 - drivers/gpu/drm/ttm/ttm_bo.c | 21 --- drivers/gpu/drm/ttm/ttm_resource.c | 73 +- drivers/gpu/drm/vmwgfx/vmwgfx_bo.c | 33 +++--- drivers/gpu/drm/vmwgfx/vmwgfx_ttm_buffer.c | 4 -- drivers/gpu/drm/xe/xe_bo.c | 33 +- include/drm/ttm/ttm_placement.h| 10 +-- include/drm/ttm/ttm_resource.h | 8 +-- 19 files changed, 118 insertions(+), 197 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c index 425cebcc5cbf..b671b0665492 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -220,9 +220,6 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain) placement->num_placement = c; placement->placement = places; - - 
placement->num_busy_placement = c; - placement->busy_placement = places; } /** @@ -1397,8 +1394,7 @@ vm_fault_t amdgpu_bo_fault_reserve_notify(struct ttm_buffer_object *bo) AMDGPU_GEM_DOMAIN_GTT); /* Avoid costly evictions; only set GTT as a busy placement */ - abo->placement.num_busy_placement = 1; - abo->placement.busy_placement = &abo->placements[1]; + abo->placements[0].flags |= TTM_PL_FLAG_DESIRED; r = ttm_bo_validate(bo, &abo->placement, &ctx); if (unlikely(r == -EBUSY || r == -ERESTARTSYS)) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 75c9fd2c6c2a..8722beba494e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -102,23 +102,19 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, /* Don't handle scatter gather BOs */ if (bo->type == ttm_bo_type_sg) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } /* Object isn't an AMDGPU object so ignore */ if (!amdgpu_bo_is_amdgpu_bo(bo)) { placement->placement = &placements; - placement->busy_placement = &placements; placement->num_placement = 1; - placement->num_busy_placement = 1; return; } abo = ttm_to_amdgpu_bo(bo); if (abo->flags & AMDGPU_GEM_CREATE_DISCARDABLE) { placement->num_placement = 0; - placement->num_busy_placement = 0; return; } @@ -128,13 +124,13 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, case AMDGPU_PL_OA: case AMDGPU_PL_DOORBELL: placement->num_placement = 0; - placement->num_busy_placement = 0; return; case TTM_PL_VRAM: if (!adev->mman.buffer_funcs_enabled) { /* Move to system memory */ amdgpu_bo_placement_from_domain(abo, AMDGPU_GEM_DOMAIN_CPU); + } else if (!amdgpu_gmc_vram_full_visible(&adev->gmc) && !(abo->flags & AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED) && amdgpu_bo_in_cpu_visible_vram(abo)) { @@ -149,8 +145,7 @@ static void amdgpu_evict_flags(struct ttm_buffer_object *bo, AMDGPU_GEM_DOMAIN_CPU); abo->placements[0].fpfn = 
adev->gmc.visible_vram_size >> PAGE_SHIFT; abo->placements[0].lpfn = 0;
Re: [PATCH 2/5] drm/ttm: return ENOSPC from ttm_bo_mem_space
Hi, On 1/12/24 13:51, Christian König wrote: Only convert it to ENOMEM in ttm_bo_validate. This allows ttm_bo_validate to distinct between an out of memory situation and just out of space in a placement domain. NIT: s/distinct/distinguish/ In fact it would be nice if this could be propagated back to drivers as well at some point, but then perhaps guarded with a flag in the operation context. In any case Reviewed-by: Thomas Hellström Signed-off-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index edf10618fe2b..8c1eaa74fa21 100644 --- a/drivers/gpu/drm/ttm/ttm_bo.c +++ b/drivers/gpu/drm/ttm/ttm_bo.c @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object *bo, goto error; } - ret = -ENOMEM; + ret = -ENOSPC; if (!type_found) { pr_err(TTM_PFX "No compatible memory type found\n"); ret = -EINVAL; @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, return -EINVAL; ret = ttm_bo_move_buffer(bo, placement, ctx); + /* For backward compatibility with userspace */ + if (ret == -ENOSPC) + return -ENOMEM; if (ret) return ret;
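The driver-side propagation wished for above could eventually look something like the following sketch. This is hypothetical: it assumes -ENOSPC is allowed out of ttm_bo_validate() via some opt-in mechanism (such as the propagate_enospc device flag proposed later), and the Vulkan error mapping is only an example of what a driver stack might do with the distinction.

```c
/*
 * Sketch only: assumes -ENOSPC is propagated out of ttm_bo_validate()
 * (e.g. via an opt-in device flag) instead of being folded into -ENOMEM.
 */
ret = ttm_bo_validate(bo, placement, &ctx);
if (ret == -ENOSPC) {
	/* Out of space in the requested placement domain (e.g. VRAM full).
	 * A user-space driver could map this to something like
	 * VK_ERROR_OUT_OF_DEVICE_MEMORY, or retry with other placements. */
} else if (ret == -ENOMEM) {
	/* Out of host (system) memory; e.g. VK_ERROR_OUT_OF_HOST_MEMORY. */
}
```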
Re: [PATCH 2/5] drm/ttm: return ENOSPC from ttm_bo_mem_space
On Tue, 2024-01-09 at 08:47 +0100, Christian König wrote: > Only convert it to ENOMEM in ttm_bo_validate. > Could we have a more elaborate commit description here (why is this change needed)? > Signed-off-by: Christian König > --- > drivers/gpu/drm/ttm/ttm_bo.c | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/ttm/ttm_bo.c > b/drivers/gpu/drm/ttm/ttm_bo.c > index edf10618fe2b..8c1eaa74fa21 100644 > --- a/drivers/gpu/drm/ttm/ttm_bo.c > +++ b/drivers/gpu/drm/ttm/ttm_bo.c > @@ -830,7 +830,7 @@ int ttm_bo_mem_space(struct ttm_buffer_object > *bo, > goto error; > } > > - ret = -ENOMEM; > + ret = -ENOSPC; > if (!type_found) { > pr_err(TTM_PFX "No compatible memory type found\n"); > ret = -EINVAL; > @@ -916,6 +916,9 @@ int ttm_bo_validate(struct ttm_buffer_object *bo, > return -EINVAL; > > ret = ttm_bo_move_buffer(bo, placement, ctx); > + /* For backward compatibility with userspace */ > + if (ret == -ENOSPC) > + return -ENOMEM; > if (ret) > return ret; >
Re: Rework TTMs busy handling
Hi, Christian On Tue, 2024-01-09 at 08:47 +0100, Christian König wrote: > Hi guys, > > I've been trying to make this functionality a bit more useful for years now > since we have multiple reports that the behavior of drivers can be suboptimal > when multiple placements are given. > > So basically instead of hacking around the TTM behavior in the driver > once more I've gone ahead and changed the idle/busy placement list > into idle/busy placement flags. This not only saves a bunch of code, > but also allows setting some placements as fallback which are used if > allocating from the preferred ones didn't work. > > Zack pointed out that some removed VMWGFX code was brought back > because > of rebasing, fixed in this version. > > Intel CI seems to be happy with those patches, so any more comments? Looks like Xe changes are missing? (xe is now in drm-tip). I also have some doubts about the naming "idle" vs "busy", since an elaborate eviction mechanism would probably at some point want to check for gpu idle vs gpu busy, and this might create some confusion moving forward for people confusing busy as in memory overcommit with busy as in gpu activity. I can't immediately think of something better, though. /Thomas > > Regards, > Christian. > >
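With the rework discussed above, the idle/busy distinction is expressed per placement rather than with a second busy_placement list. A driver-side sketch of what that looks like (flag names as used in v6 of the series; treat the exact names as subject to change):

```c
/*
 * Sketch: VRAM preferred, GTT only used as a fallback (e.g. under
 * eviction pressure), using the per-placement flags that replace the
 * separate busy_placement list. Flag names follow this series (v6).
 */
static const struct ttm_place places[] = {
	{ .mem_type = TTM_PL_VRAM, .flags = TTM_PL_FLAG_DESIRED },
	{ .mem_type = TTM_PL_TT,   .flags = TTM_PL_FLAG_FALLBACK },
};

static const struct ttm_placement placement = {
	.num_placement = ARRAY_SIZE(places),
	.placement = places,
};
```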
Re: [Nouveau] [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm_gpuvm structures
On 11/10/23 11:42, Christian König wrote: Am 10.11.23 um 10:39 schrieb Thomas Hellström: [SNIP] I was thinking more of the general design of a base-class that needs to be refcounted. Say a driver vm that inherits from gpu-vm, gem_object and yet another base-class that supplies its own refcount. What's the best-practice way to do refcounting? All base-classes supplying a refcount of its own, or the subclass supplying a refcount and the base-classes supply destroy helpers. From my experience the most common design pattern in the Linux kernel is that you either have reference counted objects which contain a private pointer (like struct file, struct inode etc..) or the lifetime is defined by the user of the object instead of reference counting and in this case you can embed it into your own object. But to be clear this is nothing I see needing urgent attention. Well, I have never seen stuff like that in the kernel. Might be that this works, but I would rather not try if avoidable. That would also make it possible for the driver to decide the context for the put() call: If the driver needs to be able to call put() from irq / atomic context but the base-class'es destructor doesn't allow atomic context, the driver can push freeing out to a work item if needed. Finally, the refcount overflow Christian pointed out. Limiting the number of mapping sounds like a reasonable remedy to me. Well that depends, I would rather avoid having a dependency for mappings. Taking the CPU VM handling as example as far as I know vm_area_structs doesn't grab a reference to their mm_struct either. Instead they get automatically destroyed when the mm_struct is destroyed. Certainly, that would be possible. However, thinking about it, this might call for huge trouble. First of all, we'd still need to reference count a GPUVM and take a reference for each VM_BO, as we do already. 
Now instead of simply increasing the reference count for each mapping as well, we'd need a *mandatory* driver callback that is called when the GPUVM reference count drops to zero. Maybe something like vm_destroy(). The reason is that GPUVM can't just remove all mappings from the tree nor can it free them by itself, since drivers might use them for tracking their allocated page tables and/or other stuff. Now, let's think about the scope this callback might be called from. When a VM_BO is destroyed the driver might hold a couple of locks (for Xe it would be the VM's shared dma-resv lock and potentially the corresponding object's dma-resv lock if they're not the same already). If destroying this VM_BO leads to the VM being destroyed, the drivers vm_destroy() callback would be called with those locks being held as well. I feel like doing this finally opens the doors of the locking hell entirely. I think we should really avoid that. I don't think we need to worry much about this particular locking hell because if we hold I have to agree with Danilo here. Especially you have cases where you usually lock BO->VM (for example eviction) as well as cases where you need to lock VM->BO (command submission). Because of this in amdgpu we used (or abused?) the dma_resv of the root BO as lock for the VM. Since this is a ww_mutex locking it in both VM, BO as well as BO, VM order works. Yes, gpuvm is doing the same. (although not necessarily using the page-table root bo, but any bo of the driver's choice). But I read it as Danilo feared the case where the VM destructor was called with a VM resv (or possibly bo resv) held. I meant the driver can easily ensure that's not happening, and in some cases it can't happen. Thanks, Thomas Regards, Christian. , for example a vm and bo resv when putting the vm_bo, we need to keep additional strong references for the bo / vm pointer we use for unlocking. Hence putting the vm_bo under those locks can never lead to the vm getting destroyed. 
Also, don't we already sort of have a mandatory vm_destroy callback? + if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free)) + return; That's a really good point, but I fear exactly that's the use case. I would expect that VM_BO structures are added in the drm_gem_object_funcs.open callback and freed in drm_gem_object_funcs.close. Since it is perfectly legal for userspace to close a BO while there are still mappings (can trivial be that the app is killed) I would expect that the drm_gem_object_funcs.close handling is something like asking drm_gpuvm destroying the VM_BO and getting the mappings which should be cleared in the page table in return. In amdgpu we even go a step further and the VM structure keeps track of all the mappings of deleted VM_BOs so that higher level can query those and clear them later on. Background is that the drm_gem_object_funcs.close can't fail, bu
Re: [Nouveau] [PATCH drm-misc-next v9 09/12] drm/gpuvm: reference count drm_gpuvm structures
On 11/8/23 01:12, Danilo Krummrich wrote: Implement reference counting for struct drm_gpuvm. Signed-off-by: Danilo Krummrich Reviewed-by: Thomas Hellström --- drivers/gpu/drm/drm_gpuvm.c| 56 +- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 20 ++--- include/drm/drm_gpuvm.h| 31 +- 3 files changed, 90 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 53e2c406fb04..ef968eba6fe6 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -746,6 +746,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name, gpuvm->rb.tree = RB_ROOT_CACHED; INIT_LIST_HEAD(&gpuvm->rb.list); + kref_init(&gpuvm->kref); + gpuvm->name = name ? name : "unknown"; gpuvm->flags = flags; gpuvm->ops = ops; @@ -770,15 +772,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name, } EXPORT_SYMBOL_GPL(drm_gpuvm_init); -/** - * drm_gpuvm_destroy() - cleanup a &drm_gpuvm - * @gpuvm: pointer to the &drm_gpuvm to clean up - * - * Note that it is a bug to call this function on a manager that still - * holds GPU VA mappings. - */ -void -drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) +static void +drm_gpuvm_fini(struct drm_gpuvm *gpuvm) { gpuvm->name = NULL; @@ -790,7 +785,35 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) drm_gem_object_put(gpuvm->r_obj); } -EXPORT_SYMBOL_GPL(drm_gpuvm_destroy); + +static void +drm_gpuvm_free(struct kref *kref) +{ + struct drm_gpuvm *gpuvm = container_of(kref, struct drm_gpuvm, kref); + + drm_gpuvm_fini(gpuvm); + + if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free)) + return; + + gpuvm->ops->vm_free(gpuvm); +} + +/** + * drm_gpuvm_put() - drop a struct drm_gpuvm reference + * @gpuvm: the &drm_gpuvm to release the reference of + * + * This releases a reference to @gpuvm. + * + * This function may be called from atomic context. 
+ */ +void +drm_gpuvm_put(struct drm_gpuvm *gpuvm) +{ + if (gpuvm) + kref_put(&gpuvm->kref, drm_gpuvm_free); +} +EXPORT_SYMBOL_GPL(drm_gpuvm_put); static int __drm_gpuva_insert(struct drm_gpuvm *gpuvm, @@ -839,11 +862,21 @@ drm_gpuva_insert(struct drm_gpuvm *gpuvm, { u64 addr = va->va.addr; u64 range = va->va.range; + int ret; if (unlikely(!drm_gpuvm_range_valid(gpuvm, addr, range))) return -EINVAL; - return __drm_gpuva_insert(gpuvm, va); + ret = __drm_gpuva_insert(gpuvm, va); + if (likely(!ret)) + /* Take a reference of the GPUVM for the successfully inserted +* drm_gpuva. We can't take the reference in +* __drm_gpuva_insert() itself, since we don't want to increse +* the reference count for the GPUVM's kernel_alloc_node. +*/ + drm_gpuvm_get(gpuvm); + + return ret; } EXPORT_SYMBOL_GPL(drm_gpuva_insert); @@ -876,6 +909,7 @@ drm_gpuva_remove(struct drm_gpuva *va) } __drm_gpuva_remove(va); + drm_gpuvm_put(va->vm); } EXPORT_SYMBOL_GPL(drm_gpuva_remove); diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c b/drivers/gpu/drm/nouveau/nouveau_uvmm.c index 54be12c1272f..cb2f06565c46 100644 --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c @@ -1780,6 +1780,18 @@ nouveau_uvmm_bo_unmap_all(struct nouveau_bo *nvbo) } } +static void +nouveau_uvmm_free(struct drm_gpuvm *gpuvm) +{ + struct nouveau_uvmm *uvmm = uvmm_from_gpuvm(gpuvm); + + kfree(uvmm); +} + +static const struct drm_gpuvm_ops gpuvm_ops = { + .vm_free = nouveau_uvmm_free, +}; + int nouveau_uvmm_ioctl_vm_init(struct drm_device *dev, void *data, @@ -1830,7 +1842,7 @@ nouveau_uvmm_ioctl_vm_init(struct drm_device *dev, NOUVEAU_VA_SPACE_END, init->kernel_managed_addr, init->kernel_managed_size, - NULL); + &gpuvm_ops); /* GPUVM takes care from here on. 
*/ drm_gem_object_put(r_obj); @@ -1849,8 +1861,7 @@ nouveau_uvmm_ioctl_vm_init(struct drm_device *dev, return 0; out_gpuvm_fini: - drm_gpuvm_destroy(&uvmm->base); - kfree(uvmm); + drm_gpuvm_put(&uvmm->base); out_unlock: mutex_unlock(&cli->mutex); return ret; @@ -1902,7 +1913,6 @@ nouveau_uvmm_fini(struct nouveau_uvmm *uvmm) mutex_lock(&cli->mutex); nouveau_vmm_fini(&uvmm->vmm); - drm_gpuvm_destroy(&uvmm->base); - kfree(uvmm); + drm_gpuvm_put(&uvmm->base); mutex_unlock(&cli->mutex); } diff --git a/include/drm/drm_gpuvm
Re: [Nouveau] [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm_gpuvm structures
On 11/10/23 09:50, Christian König wrote: Am 09.11.23 um 19:34 schrieb Danilo Krummrich: On 11/9/23 17:03, Christian König wrote: Am 09.11.23 um 16:50 schrieb Thomas Hellström: [SNIP] Did we get any resolution on this? FWIW, my take on this is that it would be possible to get GPUVM to work both with and without internal refcounting; If with, the driver needs a vm close to resolve cyclic references, if without that's not necessary. If GPUVM is allowed to refcount in mappings and vm_bos, that comes with a slight performance drop but as Danilo pointed out, the VM lifetime problem iterating over a vm_bo's mapping becomes much easier and the code thus becomes easier to maintain moving forward. That convinced me it's a good thing. I strongly believe you guys stumbled over one of the core problems with the VM here and I think that reference counting is the right answer to solving this. The big question is that what is reference counted and in which direction does the dependency points, e.g. we have here VM, BO, BO_VM and Mapping objects. Those patches here suggest a counted Mapping -> VM reference and I'm pretty sure that this isn't a good idea. What we should rather really have is a BO -> VM or BO_VM ->VM reference. In other words that each BO which is part of the VM keeps a reference to the VM. We have both. Please see the subsequent patch introducing VM_BO structures for that. As I explained, mappings (struct drm_gpuva) keep a pointer to their VM they're mapped in and besides that it doesn't make sense to free a VM that still contains mappings, the reference count ensures that. This simply ensures memory safety. BTW: At least in amdgpu we can have BOs which (temporary) doesn't have any mappings, but are still considered part of the VM. That should be possible. Another issue Christian brought up is that something intended to be embeddable (a base class) shouldn't really have its own refcount. I think that's a valid point. 
If you at some point need to derive from multiple such structs, each having its own refcount, things will start to get weird. One way to resolve that would be to have the driver's subclass provide get() and put() ops, and export a destructor for the base-class, rather than have the base-class provide the refcount and a destructor ops. GPUVM simply follows the same pattern we have with drm_gem_objects. And I think it makes sense. Why would we want to embed two struct drm_gpuvm in a single driver structure? Because you need one drm_gpuvm structure for each application using the driver? Or am I missing something? As far as I can see a driver would want to embed that into its fpriv structure, which is allocated during the drm_driver.open callback. I was thinking more of the general design of a base-class that needs to be refcounted. Say a driver vm that inherits from gpu-vm, gem_object and yet another base-class that supplies its own refcount. What's the best-practice way to do refcounting? All base-classes supplying a refcount of their own, or the subclass supplying a refcount and the base-classes supplying destroy helpers. But to be clear this is nothing I see needing urgent attention. Well, I have never seen stuff like that in the kernel. Might be that this works, but I would rather not try if avoidable. That would also make it possible for the driver to decide the context for the put() call: If the driver needs to be able to call put() from irq / atomic context but the base class's destructor doesn't allow atomic context, the driver can push freeing out to a work item if needed. Finally, the refcount overflow Christian pointed out. Limiting the number of mappings sounds like a reasonable remedy to me. Well that depends, I would rather avoid having a dependency for mappings. Taking the CPU VM handling as an example, as far as I know vm_area_structs don't grab a reference to their mm_struct either. Instead they get automatically destroyed when the mm_struct is destroyed.
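The work-item pattern mentioned above (making put() safe from atomic context even when the destructor must sleep) is a common kernel idiom and can be sketched like this; struct my_vm and its helpers are illustrative, not from the posted series:

```c
/*
 * Sketch: defer the sleeping part of VM destruction to process
 * context, so the final reference may be dropped from atomic context.
 * free_work is assumed to be INIT_WORK()'d at VM creation (not shown).
 */
struct my_vm {
	struct drm_gpuvm base;
	struct work_struct free_work;
};

static void my_vm_free_work(struct work_struct *work)
{
	struct my_vm *vm = container_of(work, struct my_vm, free_work);

	/* Process context: sleeping teardown is fine here. */
	kfree(vm);
}

/* ops->vm_free callback, potentially reached from an atomic put() */
static void my_vm_free(struct drm_gpuvm *gpuvm)
{
	struct my_vm *vm = container_of(gpuvm, struct my_vm, base);

	schedule_work(&vm->free_work);
}
```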
Certainly, that would be possible. However, thinking about it, this might call for huge trouble. First of all, we'd still need to reference count a GPUVM and take a reference for each VM_BO, as we do already. Now instead of simply increasing the reference count for each mapping as well, we'd need a *mandatory* driver callback that is called when the GPUVM reference count drops to zero. Maybe something like vm_destroy(). The reason is that GPUVM can't just remove all mappings from the tree nor can it free them by itself, since drivers might use them for tracking their allocated page tables and/or other stuff. Now, let's think about the scope this callback might be called from. When a VM_BO is destroyed the driver might hold a couple of locks (for Xe it would be the VM's shared dma-resv lock and potentially the
Re: [Nouveau] [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm_gpuvm structures
Danilo, Christian On 11/6/23 17:42, Danilo Krummrich wrote: On Mon, Nov 06, 2023 at 04:10:50PM +0100, Christian König wrote: Am 06.11.23 um 15:11 schrieb Danilo Krummrich: On Mon, Nov 06, 2023 at 02:05:13PM +0100, Christian König wrote: Am 06.11.23 um 13:16 schrieb Danilo Krummrich: [SNIP] This reference count just prevents the VM from being freed as long as other resources are attached to it that carry a VM pointer, such as mappings and VM_BOs. The motivation for that are VM_BOs. For mappings it's indeed a bit paranoid, but it doesn't hurt either and keeps it consistent. Ah! Yeah, we have similar semantics in amdgpu as well. But we keep the reference to the root GEM object and not the VM. Ok, that makes much more sense than keeping one reference for each mapping. Because of this the mapping should *never* have a reference to the VM, but rather the VM destroys all mappings when it is destroyed itself. Hence, if the VM is still alive at a point where you don't expect it to be, then it's simply a driver bug. Driver bugs are just what I try to prevent here. When individual mappings keep the VM structure alive then drivers are responsible to clean them up; if the VM cleans up after itself then we don't need to worry about it in the driver. Drivers are *always* responsible for that. This has nothing to do with whether the VM is reference counted or not. GPUVM can't clean up mappings after itself. Why not? I feel like we're talking past each other here, at least to some extent. However, I can't yet see where exactly the misunderstanding resides. +1 At least in amdgpu we have it exactly like that. E.g. the higher level can cleanup the BO_VM structure at any time possible, even when there are mappings. What do you mean with "cleanup the VM_BO structure" exactly? The VM_BO structure keeps track of all the mappings mapped in the VM_BO's VM being backed by the VM_BO's GEM object. And the GEM object keeps a list of the corresponding VM_BOs.
Hence, as long as there are mappings that this VM_BO keeps track of, this VM_BO should stay alive. No, exactly the other way around. When the VM_BO structure is destroyed the mappings are destroyed with it. This seems to be the same misunderstanding as with the VM reference count. It seems to me that you want to say that for amdgpu it seems to be a use-case to get rid of all mappings backed by a given BO and mapped in a given VM, hence a VM_BO. You can do that. There's even a helper for that in GPUVM. But also in this case you first need to get rid of all mappings before you *free* the VM_BO - GPUVM ensures that. Otherwise you would need to destroy each individual mapping separately before teardown, which is quite inefficient. Not sure what you mean, but I don't see a difference between walking all VM_BOs and removing their mappings and walking the VM's tree of mappings and removing each of them. Comes down to the same effort in the end. But surely it can go both ways if you know all the existing VM_BOs. The VM then keeps track of which areas still need to be invalidated in the physical representation of the page tables. And the VM does that through its tree of mappings (struct drm_gpuva). Hence, if the VM would just remove those structures on cleanup by itself, you'd lose the ability of cleaning up the page tables. Unless you track this separately, which would make the whole tracking of GPUVM itself kinda pointless. But how do you then keep track of areas which are freed and need to be updated so that nobody can access the underlying memory any more? "areas which are freed", what do you refer to? What do you mean by that? Do you mean areas of the VA space not containing mappings? Why would I need to track them explicitly? When the mapping is removed the corresponding page tables / page table entries are gone as well, hence no subsequent access to the underlying memory would be possible. I would expect that the generalized GPU VM handling would need something similar.
If we leave that to the driver then each driver would have to implement that stuff on its own again.

Similar to what? What exactly do you think can be generalized here?

Similar to how amdgpu works.

I don't think it's quite fair to just throw the "look at what amdgpu does" argument at me. What am I supposed to do? Read and understand *every* detail of *every* driver? Did you read through the GPUVM code? That's an honest question and I'm asking it because I feel like you're picking up some details from commit messages and start questioning them (and that's perfectly fine and absolutely welcome). But if the answers don't satisfy you or do not lead to a better understanding it just seems you ask others to check out amdgpu rather than taking the time to go through the proposed code yourself, making suggestions to improve it or explicitly pointing out the changes you require.

From what I can see you are basically re-inventing everything we already have in there and asking the same questions
Re: [Nouveau] [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm_gpuvm structures
On Thu, 2023-11-02 at 18:32 +0100, Danilo Krummrich wrote: > Hi Thomas, > > thanks for your timely response on that! > > On 11/2/23 18:09, Thomas Hellström wrote: > > On Thu, 2023-11-02 at 00:31 +0100, Danilo Krummrich wrote: > > > Implement reference counting for struct drm_gpuvm. > > > > > > Signed-off-by: Danilo Krummrich > > > --- > > > drivers/gpu/drm/drm_gpuvm.c | 44 > > > +++- > > > -- > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 20 +--- > > > include/drm/drm_gpuvm.h | 31 +- > > > 3 files changed, 78 insertions(+), 17 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > b/drivers/gpu/drm/drm_gpuvm.c > > > index 53e2c406fb04..6a88eafc5229 100644 > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > @@ -746,6 +746,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > > > char *name, > > > gpuvm->rb.tree = RB_ROOT_CACHED; > > > INIT_LIST_HEAD(&gpuvm->rb.list); > > > > > > + kref_init(&gpuvm->kref); > > > + > > > gpuvm->name = name ? name : "unknown"; > > > gpuvm->flags = flags; > > > gpuvm->ops = ops; > > > @@ -770,15 +772,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, > > > const > > > char *name, > > > } > > > EXPORT_SYMBOL_GPL(drm_gpuvm_init); > > > > > > -/** > > > - * drm_gpuvm_destroy() - cleanup a &drm_gpuvm > > > - * @gpuvm: pointer to the &drm_gpuvm to clean up > > > - * > > > - * Note that it is a bug to call this function on a manager that > > > still > > > - * holds GPU VA mappings. 
> > > - */ > > > -void > > > -drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) > > > +static void > > > +drm_gpuvm_fini(struct drm_gpuvm *gpuvm) > > > { > > > gpuvm->name = NULL; > > > > > > @@ -790,7 +785,33 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) > > > > > > drm_gem_object_put(gpuvm->r_obj); > > > } > > > -EXPORT_SYMBOL_GPL(drm_gpuvm_destroy); > > > + > > > +static void > > > +drm_gpuvm_free(struct kref *kref) > > > +{ > > > + struct drm_gpuvm *gpuvm = container_of(kref, struct > > > drm_gpuvm, kref); > > > + > > > + if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free)) > > > + return; > > > + > > > + drm_gpuvm_fini(gpuvm); > > > + > > > + gpuvm->ops->vm_free(gpuvm); > > > +} > > > + > > > +/** > > > + * drm_gpuvm_bo_put() - drop a struct drm_gpuvm reference > > copy-paste error in function name. > > > > Also it appears like xe might put a vm from irq context so we > > should > > document the context where this function call is allowable, and if > > applicable add a might_sleep(). > > From GPUVM PoV I don't see why we can't call this from an IRQ > context. > It depends on the driver callbacks of GPUVM (->vm_free) and the resv > GEM's > free callback. Both are controlled by the driver. Hence, I don't see > the > need for a restriction here. OK. we should keep in mind though, that if such a restriction is needed in the future, it might be some work to fix the drivers. > > > > > If this function needs to sleep we can work around that in Xe by > > keeping an xe-private refcount for the xe vm container, but I'd > > like to > > avoid that if possible and piggy-back on the refcount introduced > > here. > > > > > + * @gpuvm: the &drm_gpuvm to release the reference of > > > + * > > > + * This releases a reference to @gpuvm. 
> > > + */ > > > +void > > > +drm_gpuvm_put(struct drm_gpuvm *gpuvm) > > > +{ > > > + if (gpuvm) > > > + kref_put(&gpuvm->kref, drm_gpuvm_free); > > > +} > > > +EXPORT_SYMBOL_GPL(drm_gpuvm_put); > > > > > > static int > > > __drm_gpuva_insert(struct drm_gpuvm *gpuvm, > > > @@ -843,7 +864,7 @@ drm_gpuva_insert(struct drm_gpuvm *gpuvm, > > > if (unlikely(!drm_gpuvm_range_valid(gpuvm, addr, > > > range))) > > > return -EINVAL; > > > > > > - return __drm_gpuva_insert(gp
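The kref pattern this patch applies to struct drm_gpuvm (kref_init() at creation, kref_put() with a release callback that runs the fini and the driver's vm_free) can be sketched in userspace C. The kernel's kref wraps refcount_t and container_of(); a plain counter and a first-member cast stand in here, and the `freed` test hook is illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Minimal userspace stand-in for the kernel's struct kref. */
struct kref { unsigned int refcount; };

static void kref_init(struct kref *k) { k->refcount = 1; }
static void kref_get(struct kref *k)  { k->refcount++; }

/* Returns 1 if the last reference was dropped and release() ran,
 * matching the kernel kref_put() contract. */
static int kref_put(struct kref *k, void (*release)(struct kref *k))
{
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

/* Toy gpuvm: kref is the first member, so the cast below plays the
 * role of container_of() in drm_gpuvm_free(). */
struct gpuvm {
    struct kref kref;
    int *freed; /* test hook: set when the object is released */
};

static void gpuvm_free(struct kref *k)
{
    struct gpuvm *vm = (struct gpuvm *)k;

    *vm->freed = 1; /* stands in for drm_gpuvm_fini() + ops->vm_free() */
    free(vm);
}
```
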
Re: [Nouveau] [PATCH drm-misc-next v8 10/12] drm/gpuvm: add an abstraction for a VM / BO combination
gpuva_link(ctx->new_va, ctx->vm_bo); > * > * // prevent the new GPUVA from being freed in > * // driver_mapping_create() > @@ -577,22 +611,23 @@ > * int driver_gpuva_remap(struct drm_gpuva_op *op, void *__ctx) > * { > * struct driver_context *ctx = __ctx; > + * struct drm_gpuva *va = op->remap.unmap->va; > * > * drm_gpuva_remap(ctx->prev_va, ctx->next_va, &op- > >remap); > * > - * drm_gpuva_unlink(op->remap.unmap->va); > - * kfree(op->remap.unmap->va); > - * > * if (op->remap.prev) { > - * drm_gpuva_link(ctx->prev_va); > + * drm_gpuva_link(ctx->prev_va, va->vm_bo); > * ctx->prev_va = NULL; > * } > * > * if (op->remap.next) { > - * drm_gpuva_link(ctx->next_va); > + * drm_gpuva_link(ctx->next_va, va->vm_bo); > * ctx->next_va = NULL; > * } > * > + * drm_gpuva_unlink(va); > + * kfree(va); > + * > * return 0; > * } > * > @@ -813,6 +848,195 @@ drm_gpuvm_put(struct drm_gpuvm *gpuvm) > } > EXPORT_SYMBOL_GPL(drm_gpuvm_put); > > +/** > + * drm_gpuvm_bo_create() - create a new instance of struct > drm_gpuvm_bo > + * @gpuvm: The &drm_gpuvm the @obj is mapped in. > + * @obj: The &drm_gem_object being mapped in the @gpuvm. > + * > + * If provided by the driver, this function uses the &drm_gpuvm_ops > + * vm_bo_alloc() callback to allocate. 
> + * > + * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on > failure > + */ > +struct drm_gpuvm_bo * > +drm_gpuvm_bo_create(struct drm_gpuvm *gpuvm, > + struct drm_gem_object *obj) > +{ > + const struct drm_gpuvm_ops *ops = gpuvm->ops; > + struct drm_gpuvm_bo *vm_bo; > + > + if (ops && ops->vm_bo_alloc) > + vm_bo = ops->vm_bo_alloc(); > + else > + vm_bo = kzalloc(sizeof(*vm_bo), GFP_KERNEL); > + > + if (unlikely(!vm_bo)) > + return NULL; > + > + vm_bo->vm = drm_gpuvm_get(gpuvm); > + vm_bo->obj = obj; > + drm_gem_object_get(obj); > + > + kref_init(&vm_bo->kref); > + INIT_LIST_HEAD(&vm_bo->list.gpuva); > + INIT_LIST_HEAD(&vm_bo->list.entry.gem); > + > + return vm_bo; > +} > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_create); > + > +static void > +drm_gpuvm_bo_destroy(struct kref *kref) > +{ > + struct drm_gpuvm_bo *vm_bo = container_of(kref, struct > drm_gpuvm_bo, > + kref); > + struct drm_gpuvm *gpuvm = vm_bo->vm; > + const struct drm_gpuvm_ops *ops = gpuvm->ops; > + struct drm_gem_object *obj = vm_bo->obj; > + bool lock = !drm_gpuvm_resv_protected(gpuvm); > + > + if (!lock) > + drm_gpuvm_resv_assert_held(gpuvm); > + > + drm_gem_gpuva_assert_lock_held(obj); > + list_del(&vm_bo->list.entry.gem); > + > + if (ops && ops->vm_bo_free) > + ops->vm_bo_free(vm_bo); > + else > + kfree(vm_bo); > + > + drm_gpuvm_put(gpuvm); > + drm_gem_object_put(obj); > +} > + > +/** > + * drm_gpuvm_bo_put() - drop a struct drm_gpuvm_bo reference > + * @vm_bo: the &drm_gpuvm_bo to release the reference of > + * > + * This releases a reference to @vm_bo. > + * > + * If the reference count drops to zero, the &gpuvm_bo is destroyed, > which > + * includes removing it from the GEMs gpuva list. Hence, if a call > to this > + * function can potentially let the reference count to zero the > caller must > + * hold the dma-resv or driver specific GEM gpuva lock. 
Should ideally document the context for this function as well, to avoid future pitfalls and arguments, and also potentially add a might_sleep(). Reviewed-by: Thomas Hellström
Re: [Nouveau] [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm_gpuvm structures
On Thu, 2023-11-02 at 00:31 +0100, Danilo Krummrich wrote: > Implement reference counting for struct drm_gpuvm. > > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 44 +++- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 20 +--- > include/drm/drm_gpuvm.h | 31 +- > 3 files changed, 78 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 53e2c406fb04..6a88eafc5229 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -746,6 +746,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > char *name, > gpuvm->rb.tree = RB_ROOT_CACHED; > INIT_LIST_HEAD(&gpuvm->rb.list); > > + kref_init(&gpuvm->kref); > + > gpuvm->name = name ? name : "unknown"; > gpuvm->flags = flags; > gpuvm->ops = ops; > @@ -770,15 +772,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > char *name, > } > EXPORT_SYMBOL_GPL(drm_gpuvm_init); > > -/** > - * drm_gpuvm_destroy() - cleanup a &drm_gpuvm > - * @gpuvm: pointer to the &drm_gpuvm to clean up > - * > - * Note that it is a bug to call this function on a manager that > still > - * holds GPU VA mappings. > - */ > -void > -drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) > +static void > +drm_gpuvm_fini(struct drm_gpuvm *gpuvm) > { > gpuvm->name = NULL; > > @@ -790,7 +785,33 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) > > drm_gem_object_put(gpuvm->r_obj); > } > -EXPORT_SYMBOL_GPL(drm_gpuvm_destroy); > + > +static void > +drm_gpuvm_free(struct kref *kref) > +{ > + struct drm_gpuvm *gpuvm = container_of(kref, struct > drm_gpuvm, kref); > + > + if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free)) > + return; > + > + drm_gpuvm_fini(gpuvm); > + > + gpuvm->ops->vm_free(gpuvm); > +} > + > +/** > + * drm_gpuvm_bo_put() - drop a struct drm_gpuvm reference copy-paste error in function name. Also it appears like xe might put a vm from irq context so we should document the context where this function call is allowable, and if applicable add a might_sleep(). 
If this function needs to sleep we can work around that in Xe by keeping an xe-private refcount for the xe vm container, but I'd like to avoid that if possible and piggy-back on the refcount introduced here. > + * @gpuvm: the &drm_gpuvm to release the reference of > + * > + * This releases a reference to @gpuvm. > + */ > +void > +drm_gpuvm_put(struct drm_gpuvm *gpuvm) > +{ > + if (gpuvm) > + kref_put(&gpuvm->kref, drm_gpuvm_free); > +} > +EXPORT_SYMBOL_GPL(drm_gpuvm_put); > > static int > __drm_gpuva_insert(struct drm_gpuvm *gpuvm, > @@ -843,7 +864,7 @@ drm_gpuva_insert(struct drm_gpuvm *gpuvm, > if (unlikely(!drm_gpuvm_range_valid(gpuvm, addr, range))) > return -EINVAL; > > - return __drm_gpuva_insert(gpuvm, va); > + return __drm_gpuva_insert(drm_gpuvm_get(gpuvm), va); Here we leak a reference if __drm_gpuva_insert() fails, and IMO the reference should be taken where the pointer holding the reference is assigned (in this case in __drm_gpuva_insert()), or document the reference transfer from the argument close to the assignment. But since a va itself is not refcounted it clearly can't outlive the vm, so is a reference really needed here? I'd suggest using an accessor that instead of using va->vm uses va->vm_bo->vm, to avoid needing to worry about the va->vm refcount altogether. Thanks, Thomas
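The leak flagged in this review (taking a reference inside the argument of a fallible call, as in `__drm_gpuva_insert(drm_gpuvm_get(gpuvm), va)`) can be demonstrated with a toy refcount. The names here are illustrative; the point is only the ordering of get() relative to the failure check.

```c
#include <assert.h>

/* Toy global refcount standing in for gpuvm->kref. */
static int refcount;

static void *get(void *obj) { refcount++; return obj; }
static void put(void)       { refcount--; }

/* A fallible insert; -22 stands in for -EINVAL. */
static int insert_fails(void *obj) { (void)obj; return -22; }

/* Leaky pattern: mirrors __drm_gpuva_insert(drm_gpuvm_get(gpuvm), va).
 * The reference is taken unconditionally, but nobody drops it when the
 * insert fails. */
static int insert_leaky(void *obj)
{
    return insert_fails(get(obj));
}

/* Fixed pattern: take the reference only once the insert succeeded,
 * so an error path leaves the count untouched. */
static int insert_safe(void *obj)
{
    int ret = insert_fails(obj);

    if (ret == 0)
        get(obj);
    return ret;
}
```
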
Re: [Nouveau] [PATCH drm-misc-next v8 10/12] drm/gpuvm: add an abstraction for a VM / BO combination
On Thu, 2023-11-02 at 00:31 +0100, Danilo Krummrich wrote: > Add an abstraction layer between the drm_gpuva mappings of a > particular > drm_gem_object and this GEM object itself. The abstraction represents > a > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > holds > a list of drm_gpuvm_bo structures (the structure representing this > abstraction), while each drm_gpuvm_bo contains list of mappings of > this > GEM object. > > This has multiple advantages: > > 1) We can use the drm_gpuvm_bo structure to attach it to various > lists > of the drm_gpuvm. This is useful for tracking external and evicted > objects per VM, which is introduced in subsequent patches. > > 2) Finding mappings of a certain drm_gem_object mapped in a certain > drm_gpuvm becomes much cheaper. > > 3) Drivers can derive and extend the structure to easily represent > driver specific states of a BO for a certain GPUVM. > > The idea of this abstraction was taken from amdgpu, hence the credit > for > this idea goes to the developers of amdgpu. > > Cc: Christian König > Reviewed-by: Boris Brezillon > Signed-off-by: Danilo Krummrich Reviewed-by: Thomas Hellström > --- > drivers/gpu/drm/drm_gpuvm.c | 336 +-- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 63 +++-- > include/drm/drm_gem.h | 32 +-- > include/drm/drm_gpuvm.h | 185 +- > 4 files changed, 530 insertions(+), 86 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 6a88eafc5229..2c8fdefb19f0 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -70,6 +70,18 @@ > * &drm_gem_object, such as the &drm_gem_object containing the root > page table, > * but it can also be a 'dummy' object, which can be allocated with > * drm_gpuvm_resv_object_alloc(). 
> + * > + * In order to connect a struct drm_gpuva its backing > &drm_gem_object each > + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and > each > + * &drm_gpuvm_bo contains a list of &drm_gpuva structures. > + * > + * A &drm_gpuvm_bo is an abstraction that represents a combination > of a > + * &drm_gpuvm and a &drm_gem_object. Every such combination should > be unique. > + * This is ensured by the API through drm_gpuvm_bo_obtain() and > + * drm_gpuvm_bo_obtain_prealloc() which first look into the > corresponding > + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance > of this > + * particular combination. If not existent a new instance is created > and linked > + * to the &drm_gem_object. > */ > > /** > @@ -395,21 +407,28 @@ > /** > * DOC: Locking > * > - * Generally, the GPU VA manager does not take care of locking > itself, it is > - * the drivers responsibility to take care about locking. Drivers > might want to > - * protect the following operations: inserting, removing and > iterating > - * &drm_gpuva objects as well as generating all kinds of operations, > such as > - * split / merge or prefetch. > - * > - * The GPU VA manager also does not take care of the locking of the > backing > - * &drm_gem_object buffers GPU VA lists by itself; drivers are > responsible to > - * enforce mutual exclusion using either the GEMs dma_resv lock or > alternatively > - * a driver specific external lock. For the latter see also > - * drm_gem_gpuva_set_lock(). > - * > - * However, the GPU VA manager contains lockdep checks to ensure > callers of its > - * API hold the corresponding lock whenever the &drm_gem_objects GPU > VA list is > - * accessed by functions such as drm_gpuva_link() or > drm_gpuva_unlink(). > + * In terms of managing &drm_gpuva entries DRM GPUVM does not take > care of > + * locking itself, it is the drivers responsibility to take care > about locking. 
> + * Drivers might want to protect the following operations: > inserting, removing > + * and iterating &drm_gpuva objects as well as generating all kinds > of > + * operations, such as split / merge or prefetch. > + * > + * DRM GPUVM also does not take care of the locking of the backing > + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo > abstractions by > + * itself; drivers are responsible to enforce mutual exclusion using > either the > + * GEMs dma_resv lock or alternatively a driver specific external > lock. For the > + * latter see also drm_gem_gpuva_set_lock(). > + * > + * However, DRM GPUVM contains lockdep checks to ensure callers of > its API hold > + * the co
Re: [Nouveau] [PATCH drm-misc-next v8 09/12] drm/gpuvm: reference count drm_gpuvm structures
On Thu, 2023-11-02 at 00:31 +0100, Danilo Krummrich wrote: > Implement reference counting for struct drm_gpuvm. > > Signed-off-by: Danilo Krummrich Will port the Xe series over to check that it works properly and get back with review on this one. > --- > drivers/gpu/drm/drm_gpuvm.c | 44 +++- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 20 +--- > include/drm/drm_gpuvm.h | 31 +- > 3 files changed, 78 insertions(+), 17 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 53e2c406fb04..6a88eafc5229 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -746,6 +746,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > char *name, > gpuvm->rb.tree = RB_ROOT_CACHED; > INIT_LIST_HEAD(&gpuvm->rb.list); > > + kref_init(&gpuvm->kref); > + > gpuvm->name = name ? name : "unknown"; > gpuvm->flags = flags; > gpuvm->ops = ops; > @@ -770,15 +772,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > char *name, > } > EXPORT_SYMBOL_GPL(drm_gpuvm_init); > > -/** > - * drm_gpuvm_destroy() - cleanup a &drm_gpuvm > - * @gpuvm: pointer to the &drm_gpuvm to clean up > - * > - * Note that it is a bug to call this function on a manager that > still > - * holds GPU VA mappings. > - */ > -void > -drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) > +static void > +drm_gpuvm_fini(struct drm_gpuvm *gpuvm) > { > gpuvm->name = NULL; > > @@ -790,7 +785,33 @@ drm_gpuvm_destroy(struct drm_gpuvm *gpuvm) > > drm_gem_object_put(gpuvm->r_obj); > } > -EXPORT_SYMBOL_GPL(drm_gpuvm_destroy); > + > +static void > +drm_gpuvm_free(struct kref *kref) > +{ > + struct drm_gpuvm *gpuvm = container_of(kref, struct > drm_gpuvm, kref); > + > + if (drm_WARN_ON(gpuvm->drm, !gpuvm->ops->vm_free)) > + return; > + > + drm_gpuvm_fini(gpuvm); > + > + gpuvm->ops->vm_free(gpuvm); > +} > + > +/** > + * drm_gpuvm_bo_put() - drop a struct drm_gpuvm reference > + * @gpuvm: the &drm_gpuvm to release the reference of > + * > + * This releases a reference to @gpuvm. 
> + */ > +void > +drm_gpuvm_put(struct drm_gpuvm *gpuvm) > +{ > + if (gpuvm) > + kref_put(&gpuvm->kref, drm_gpuvm_free); > +} > +EXPORT_SYMBOL_GPL(drm_gpuvm_put); > > static int > __drm_gpuva_insert(struct drm_gpuvm *gpuvm, > @@ -843,7 +864,7 @@ drm_gpuva_insert(struct drm_gpuvm *gpuvm, > if (unlikely(!drm_gpuvm_range_valid(gpuvm, addr, range))) > return -EINVAL; > > - return __drm_gpuva_insert(gpuvm, va); > + return __drm_gpuva_insert(drm_gpuvm_get(gpuvm), va); > } > EXPORT_SYMBOL_GPL(drm_gpuva_insert); > > @@ -876,6 +897,7 @@ drm_gpuva_remove(struct drm_gpuva *va) > } > > __drm_gpuva_remove(va); > + drm_gpuvm_put(va->vm); > } > EXPORT_SYMBOL_GPL(drm_gpuva_remove); > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > index 54be12c1272f..cb2f06565c46 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > @@ -1780,6 +1780,18 @@ nouveau_uvmm_bo_unmap_all(struct nouveau_bo > *nvbo) > } > } > > +static void > +nouveau_uvmm_free(struct drm_gpuvm *gpuvm) > +{ > + struct nouveau_uvmm *uvmm = uvmm_from_gpuvm(gpuvm); > + > + kfree(uvmm); > +} > + > +static const struct drm_gpuvm_ops gpuvm_ops = { > + .vm_free = nouveau_uvmm_free, > +}; > + > int > nouveau_uvmm_ioctl_vm_init(struct drm_device *dev, > void *data, > @@ -1830,7 +1842,7 @@ nouveau_uvmm_ioctl_vm_init(struct drm_device > *dev, > NOUVEAU_VA_SPACE_END, > init->kernel_managed_addr, > init->kernel_managed_size, > - NULL); > + &gpuvm_ops); > /* GPUVM takes care from here on. 
*/ > drm_gem_object_put(r_obj); > > @@ -1849,8 +1861,7 @@ nouveau_uvmm_ioctl_vm_init(struct drm_device > *dev, > return 0; > > out_gpuvm_fini: > - drm_gpuvm_destroy(&uvmm->base); > - kfree(uvmm); > + drm_gpuvm_put(&uvmm->base); > out_unlock: > mutex_unlock(&cli->mutex); > return ret; > @@ -1902,7 +1913,6 @@ nouveau_uvmm_fini(struct nouveau_uvmm *uvmm) > > mutex_lock(&cli->mutex); > nouveau_vmm_fini(&uvmm->vmm); > - drm_gpuvm_destroy(&uvmm->base); > - kfree(uvmm); > + drm_gpuvm_put(&uvmm->base); > mutex_unlock(&cli->mutex); > } > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h > index 0c2e24155a93..4e6e1fd3485a 100644 > --- a/include/drm/drm_gpuvm.h > +++ b/include/drm/drm_gpuvm.h > @@ -247,6 +247,11 @@ struct drm_gpuvm { > struct list_head list; > } rb; > > + /** > + * @kref: reference count of this object > + */ > +
Re: [Nouveau] [PATCH drm-misc-next v8 03/12] drm/gpuvm: export drm_gpuvm_range_valid()
On Thu, 2023-11-02 at 00:30 +0100, Danilo Krummrich wrote: > Drivers may use this function to validate userspace requests in > advance, > hence export it. > > Signed-off-by: Danilo Krummrich Reviewed-by: Thomas Hellström > --- > drivers/gpu/drm/drm_gpuvm.c | 14 +- > include/drm/drm_gpuvm.h | 1 + > 2 files changed, 14 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 445767f8fbc4..2669f9bbc377 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -649,7 +649,18 @@ drm_gpuvm_in_kernel_node(struct drm_gpuvm > *gpuvm, u64 addr, u64 range) > return krange && addr < kend && kstart < end; > } > > -static bool > +/** > + * drm_gpuvm_range_valid() - checks whether the given range is valid > for the > + * given &drm_gpuvm > + * @gpuvm: the GPUVM to check the range for > + * @addr: the base address > + * @range: the range starting from the base address > + * > + * Checks whether the range is within the GPUVM's managed > boundaries. > + * > + * Returns: true for a valid range, false otherwise > + */ > +bool > drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm, > u64 addr, u64 range) > { > @@ -657,6 +668,7 @@ drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm, > drm_gpuvm_in_mm_range(gpuvm, addr, range) && > !drm_gpuvm_in_kernel_node(gpuvm, addr, range); > } > +EXPORT_SYMBOL_GPL(drm_gpuvm_range_valid); > > /** > * drm_gpuvm_init() - initialize a &drm_gpuvm > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h > index 687fd5893624..13eac6f70061 100644 > --- a/include/drm/drm_gpuvm.h > +++ b/include/drm/drm_gpuvm.h > @@ -253,6 +253,7 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, > const char *name, > const struct drm_gpuvm_ops *ops); > void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm); > > +bool drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm, u64 addr, u64 > range); > bool drm_gpuvm_interval_empty(struct drm_gpuvm *gpuvm, u64 addr, u64 > range); > > static inline struct drm_gpuva *
Re: [Nouveau] [PATCH drm-misc-next v8 02/12] drm/gpuvm: don't always WARN in drm_gpuvm_check_overflow()
On Thu, 2023-11-02 at 00:30 +0100, Danilo Krummrich wrote: > Don't always WARN in drm_gpuvm_check_overflow() and separate it into > a > drm_gpuvm_check_overflow() and a dedicated > drm_gpuvm_warn_check_overflow() variant. > > This avoids printing warnings due to invalid userspace requests. > > Signed-off-by: Danilo Krummrich Reviewed-by: Thomas Hellström > --- > drivers/gpu/drm/drm_gpuvm.c | 20 +--- > 1 file changed, 13 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index d7367a202fee..445767f8fbc4 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -614,12 +614,18 @@ static int __drm_gpuva_insert(struct drm_gpuvm > *gpuvm, > static void __drm_gpuva_remove(struct drm_gpuva *va); > > static bool > -drm_gpuvm_check_overflow(struct drm_gpuvm *gpuvm, u64 addr, u64 > range) > +drm_gpuvm_check_overflow(u64 addr, u64 range) > { > u64 end; > > - return drm_WARN(gpuvm->drm, check_add_overflow(addr, range, > &end), > - "GPUVA address limited to %zu bytes.\n", > sizeof(end)); > + return check_add_overflow(addr, range, &end); > +} > + > +static bool > +drm_gpuvm_warn_check_overflow(struct drm_gpuvm *gpuvm, u64 addr, u64 > range) > +{ > + return drm_WARN(gpuvm->drm, drm_gpuvm_check_overflow(addr, > range), > + "GPUVA address limited to %zu bytes.\n", > sizeof(addr)); > } > > static bool > @@ -647,7 +653,7 @@ static bool > drm_gpuvm_range_valid(struct drm_gpuvm *gpuvm, > u64 addr, u64 range) > { > - return !drm_gpuvm_check_overflow(gpuvm, addr, range) && > + return !drm_gpuvm_check_overflow(addr, range) && > drm_gpuvm_in_mm_range(gpuvm, addr, range) && > !drm_gpuvm_in_kernel_node(gpuvm, addr, range); > } > @@ -682,7 +688,7 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > char *name, > gpuvm->ops = ops; > gpuvm->drm = drm; > > - drm_gpuvm_check_overflow(gpuvm, start_offset, range); > + drm_gpuvm_warn_check_overflow(gpuvm, start_offset, range); > gpuvm->mm_start = start_offset; > 
gpuvm->mm_range = range; > > @@ -691,8 +697,8 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, const > char *name, > gpuvm->kernel_alloc_node.va.addr = reserve_offset; > gpuvm->kernel_alloc_node.va.range = reserve_range; > > - if (likely(!drm_gpuvm_check_overflow(gpuvm, > reserve_offset, > - reserve_range))) > + if (likely(!drm_gpuvm_warn_check_overflow(gpuvm, > reserve_offset, > + > reserve_range))) > __drm_gpuva_insert(gpuvm, &gpuvm- > >kernel_alloc_node); > } > }
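The split this patch makes, a plain overflow check suitable for validating userspace-supplied ranges and a warning variant reserved for driver bugs, can be sketched in userspace C. The kernel's check_add_overflow() wraps the same compiler builtin used here; the fprintf() stands in for drm_WARN().

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Silent check: safe to call on untrusted userspace input, returns
 * true if addr + range would wrap a u64. */
static bool gpuvm_check_overflow(uint64_t addr, uint64_t range)
{
    uint64_t end;

    return __builtin_add_overflow(addr, range, &end);
}

/* Warning variant: same check, but complains loudly. Only used for
 * values the driver itself supplies, where overflow is a bug rather
 * than an invalid request. */
static bool gpuvm_warn_check_overflow(uint64_t addr, uint64_t range)
{
    bool bad = gpuvm_check_overflow(addr, range);

    if (bad) /* stands in for drm_WARN() */
        fprintf(stderr, "GPUVA address limited to %zu bytes.\n",
                sizeof(addr));
    return bad;
}
```
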
Re: [Nouveau] [PATCH drm-misc-next v8 12/12] drm/nouveau: use GPUVM common infrastructure
On Thu, 2023-11-02 at 00:31 +0100, Danilo Krummrich wrote: > GPUVM provides common infrastructure to track external and evicted > GEM > objects as well as locking and validation helpers. > > Especially external and evicted object tracking is a huge improvement > compared to the current brute force approach of iterating all > mappings > in order to lock and validate the GPUVM's GEM objects. Hence, make us > of > it. > > Signed-off-by: Danilo Krummrich NIT: Multiple checkpatch warnings in this one. > --- > drivers/gpu/drm/nouveau/nouveau_bo.c | 4 +- > drivers/gpu/drm/nouveau/nouveau_exec.c | 57 -- > drivers/gpu/drm/nouveau/nouveau_exec.h | 4 - > drivers/gpu/drm/nouveau/nouveau_sched.c | 9 ++- > drivers/gpu/drm/nouveau/nouveau_sched.h | 7 +- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 99 --- > -- > 6 files changed, 90 insertions(+), 90 deletions(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c > b/drivers/gpu/drm/nouveau/nouveau_bo.c > index 7afad86da64b..b7dda486a7ea 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_bo.c > +++ b/drivers/gpu/drm/nouveau/nouveau_bo.c > @@ -1061,17 +1061,18 @@ nouveau_bo_move(struct ttm_buffer_object *bo, > bool evict, > { > struct nouveau_drm *drm = nouveau_bdev(bo->bdev); > struct nouveau_bo *nvbo = nouveau_bo(bo); > + struct drm_gem_object *obj = &bo->base; > struct ttm_resource *old_reg = bo->resource; > struct nouveau_drm_tile *new_tile = NULL; > int ret = 0; > > - > if (new_reg->mem_type == TTM_PL_TT) { > ret = nouveau_ttm_tt_bind(bo->bdev, bo->ttm, > new_reg); > if (ret) > return ret; > } > > + drm_gpuvm_bo_gem_evict(obj, evict); > nouveau_bo_move_ntfy(bo, new_reg); > ret = ttm_bo_wait_ctx(bo, ctx); > if (ret) > @@ -1136,6 +1137,7 @@ nouveau_bo_move(struct ttm_buffer_object *bo, > bool evict, > out_ntfy: > if (ret) { > nouveau_bo_move_ntfy(bo, bo->resource); > + drm_gpuvm_bo_gem_evict(obj, !evict); > } > return ret; > } > diff --git a/drivers/gpu/drm/nouveau/nouveau_exec.c > b/drivers/gpu/drm/nouveau/nouveau_exec.c > index 
bf6c12f4342a..9d9835fb5970 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_exec.c > +++ b/drivers/gpu/drm/nouveau/nouveau_exec.c > @@ -1,7 +1,5 @@ > // SPDX-License-Identifier: MIT > > -#include > - > #include "nouveau_drv.h" > #include "nouveau_gem.h" > #include "nouveau_mem.h" > @@ -86,14 +84,12 @@ > */ > > static int > -nouveau_exec_job_submit(struct nouveau_job *job) > +nouveau_exec_job_submit(struct nouveau_job *job, > + struct drm_gpuvm_exec *vme) > { > struct nouveau_exec_job *exec_job = to_nouveau_exec_job(job); > struct nouveau_cli *cli = job->cli; > struct nouveau_uvmm *uvmm = nouveau_cli_uvmm(cli); > - struct drm_exec *exec = &job->exec; > - struct drm_gem_object *obj; > - unsigned long index; > int ret; > > /* Create a new fence, but do not emit yet. */ > @@ -102,52 +98,29 @@ nouveau_exec_job_submit(struct nouveau_job *job) > return ret; > > nouveau_uvmm_lock(uvmm); > - drm_exec_init(exec, DRM_EXEC_INTERRUPTIBLE_WAIT | > - DRM_EXEC_IGNORE_DUPLICATES); > - drm_exec_until_all_locked(exec) { > - struct drm_gpuva *va; > - > - drm_gpuvm_for_each_va(va, &uvmm->base) { > - if (unlikely(va == &uvmm- > >base.kernel_alloc_node)) > - continue; > - > - ret = drm_exec_prepare_obj(exec, va->gem.obj, > 1); > - drm_exec_retry_on_contention(exec); > - if (ret) > - goto err_uvmm_unlock; > - } > + ret = drm_gpuvm_exec_lock(vme); > + if (ret) { > + nouveau_uvmm_unlock(uvmm); > + return ret; > } > nouveau_uvmm_unlock(uvmm); > > - drm_exec_for_each_locked_object(exec, index, obj) { > - struct nouveau_bo *nvbo = nouveau_gem_object(obj); > - > - ret = nouveau_bo_validate(nvbo, true, false); > - if (ret) > - goto err_exec_fini; > + ret = drm_gpuvm_exec_validate(vme); > + if (ret) { > + drm_gpuvm_exec_unlock(vme); > + return ret; > } > > return 0; > - > -err_uvmm_unlock: > - nouveau_uvmm_unlock(uvmm); > -err_exec_fini: > - drm_exec_fini(exec); > - return ret; > - > } > > static void > -nouveau_exec_job_armed_submit(struct nouveau_job *job) > +nouveau_exec_job_armed_submit(struct 
nouveau_job *job, > + struct drm_gpuvm_exec *vme) > { > - struct drm_exec *exec = &job->exec; > - struct drm_gem_object *obj; > - unsign
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Wed, 2023-11-01 at 18:21 +0100, Danilo Krummrich wrote: > On 11/1/23 17:38, Thomas Hellström wrote: > > On Tue, 2023-10-31 at 18:38 +0100, Danilo Krummrich wrote: > > > On 10/31/23 11:32, Thomas Hellström wrote: > > > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > > > Add an abstraction layer between the drm_gpuva mappings of a > > > > > particular > > > > > drm_gem_object and this GEM object itself. The abstraction > > > > > represents > > > > > a > > > > > combination of a drm_gem_object and drm_gpuvm. The > > > > > drm_gem_object > > > > > holds > > > > > a list of drm_gpuvm_bo structures (the structure representing > > > > > this > > > > > abstraction), while each drm_gpuvm_bo contains list of > > > > > mappings > > > > > of > > > > > this > > > > > GEM object. > > > > > > > > > > This has multiple advantages: > > > > > > > > > > 1) We can use the drm_gpuvm_bo structure to attach it to > > > > > various > > > > > lists > > > > > of the drm_gpuvm. This is useful for tracking external > > > > > and > > > > > evicted > > > > > objects per VM, which is introduced in subsequent > > > > > patches. > > > > > > > > > > 2) Finding mappings of a certain drm_gem_object mapped in a > > > > > certain > > > > > drm_gpuvm becomes much cheaper. > > > > > > > > > > 3) Drivers can derive and extend the structure to easily > > > > > represent > > > > > driver specific states of a BO for a certain GPUVM. > > > > > > > > > > The idea of this abstraction was taken from amdgpu, hence the > > > > > credit > > > > > for > > > > > this idea goes to the developers of amdgpu. 
> > > > > > > > > > Cc: Christian König > > > > > Signed-off-by: Danilo Krummrich > > > > > --- > > > > > drivers/gpu/drm/drm_gpuvm.c | 335 > > > > > +-- > > > > > -- > > > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > > > > > include/drm/drm_gem.h | 32 +-- > > > > > include/drm/drm_gpuvm.h | 188 > > > > > +- > > > > > 4 files changed, 533 insertions(+), 86 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > > > b/drivers/gpu/drm/drm_gpuvm.c > > > > > index c03332883432..7f4f5919f84c 100644 > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > > > @@ -70,6 +70,18 @@ > > > > > * &drm_gem_object, such as the &drm_gem_object containing > > > > > the > > > > > root > > > > > page table, > > > > > * but it can also be a 'dummy' object, which can be > > > > > allocated > > > > > with > > > > > * drm_gpuvm_resv_object_alloc(). > > > > > + * > > > > > + * In order to connect a struct drm_gpuva its backing > > > > > &drm_gem_object each > > > > > + * &drm_gem_object maintains a list of &drm_gpuvm_bo > > > > > structures, > > > > > and > > > > > each > > > > > + * &drm_gpuvm_bo contains a list of &drm_gpuva structures. > > > > > + * > > > > > + * A &drm_gpuvm_bo is an abstraction that represents a > > > > > combination > > > > > of a > > > > > + * &drm_gpuvm and a &drm_gem_object. Every such combination > > > > > should > > > > > be unique. > > > > > + * This is ensured by the API through drm_gpuvm_bo_obtain() > > > > > and > > > > > + * drm_gpuvm_bo_obtain_prealloc() which first look into the > > > > > corresponding > > > > > + * &drm_gem_object list of &drm_gpuvm_bos for an existing > > > > > instance > > > > > of this > > > > > + * particular combination. If not existent a new instance is > > > > > created >
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Tue, 2023-10-31 at 18:38 +0100, Danilo Krummrich wrote: > On 10/31/23 11:32, Thomas Hellström wrote: > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > Add an abstraction layer between the drm_gpuva mappings of a > > > particular > > > drm_gem_object and this GEM object itself. The abstraction > > > represents > > > a > > > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > > > holds > > > a list of drm_gpuvm_bo structures (the structure representing > > > this > > > abstraction), while each drm_gpuvm_bo contains list of mappings > > > of > > > this > > > GEM object. > > > > > > This has multiple advantages: > > > > > > 1) We can use the drm_gpuvm_bo structure to attach it to various > > > lists > > > of the drm_gpuvm. This is useful for tracking external and > > > evicted > > > objects per VM, which is introduced in subsequent patches. > > > > > > 2) Finding mappings of a certain drm_gem_object mapped in a > > > certain > > > drm_gpuvm becomes much cheaper. > > > > > > 3) Drivers can derive and extend the structure to easily > > > represent > > > driver specific states of a BO for a certain GPUVM. > > > > > > The idea of this abstraction was taken from amdgpu, hence the > > > credit > > > for > > > this idea goes to the developers of amdgpu. 
> > > > > > Cc: Christian König > > > Signed-off-by: Danilo Krummrich > > > --- > > > drivers/gpu/drm/drm_gpuvm.c | 335 > > > +-- > > > -- > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > > > include/drm/drm_gem.h | 32 +-- > > > include/drm/drm_gpuvm.h | 188 +- > > > 4 files changed, 533 insertions(+), 86 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > b/drivers/gpu/drm/drm_gpuvm.c > > > index c03332883432..7f4f5919f84c 100644 > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > @@ -70,6 +70,18 @@ > > > * &drm_gem_object, such as the &drm_gem_object containing the > > > root > > > page table, > > > * but it can also be a 'dummy' object, which can be allocated > > > with > > > * drm_gpuvm_resv_object_alloc(). > > > + * > > > + * In order to connect a struct drm_gpuva its backing > > > &drm_gem_object each > > > + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, > > > and > > > each > > > + * &drm_gpuvm_bo contains a list of &drm_gpuva structures. > > > + * > > > + * A &drm_gpuvm_bo is an abstraction that represents a > > > combination > > > of a > > > + * &drm_gpuvm and a &drm_gem_object. Every such combination > > > should > > > be unique. > > > + * This is ensured by the API through drm_gpuvm_bo_obtain() and > > > + * drm_gpuvm_bo_obtain_prealloc() which first look into the > > > corresponding > > > + * &drm_gem_object list of &drm_gpuvm_bos for an existing > > > instance > > > of this > > > + * particular combination. If not existent a new instance is > > > created > > > and linked > > > + * to the &drm_gem_object. > > > */ > > > > > > /** > > > @@ -395,21 +407,28 @@ > > > /** > > > * DOC: Locking > > > * > > > - * Generally, the GPU VA manager does not take care of locking > > > itself, it is > > > - * the drivers responsibility to take care about locking. 
> > > Drivers > > > might want to > > > - * protect the following operations: inserting, removing and > > > iterating > > > - * &drm_gpuva objects as well as generating all kinds of > > > operations, > > > such as > > > - * split / merge or prefetch. > > > - * > > > - * The GPU VA manager also does not take care of the locking of > > > the > > > backing > > > - * &drm_gem_object buffers GPU VA lists by itself; drivers are > > > responsible to > > > - * enforce mutual exclusion using either the GEMs dma_resv lock > > > or > > > alternatively > > > - * a driver specific external lock. For the latter see also >
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Wed, 2023-11-01 at 10:41 +0100, Thomas Hellström wrote: > Hi, Danilo, > > On Tue, 2023-10-31 at 18:52 +0100, Danilo Krummrich wrote: > > On 10/31/23 17:45, Thomas Hellström wrote: > > > On Tue, 2023-10-31 at 17:39 +0100, Danilo Krummrich wrote: > > > > On 10/31/23 12:25, Thomas Hellström wrote: > > > > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > > > > Add an abstraction layer between the drm_gpuva mappings of > > > > > > a > > > > > > particular > > > > > > drm_gem_object and this GEM object itself. The abstraction > > > > > > represents > > > > > > a > > > > > > combination of a drm_gem_object and drm_gpuvm. The > > > > > > drm_gem_object > > > > > > holds > > > > > > a list of drm_gpuvm_bo structures (the structure > > > > > > representing > > > > > > this > > > > > > abstraction), while each drm_gpuvm_bo contains list of > > > > > > mappings > > > > > > of > > > > > > this > > > > > > GEM object. > > > > > > > > > > > > This has multiple advantages: > > > > > > > > > > > > 1) We can use the drm_gpuvm_bo structure to attach it to > > > > > > various > > > > > > lists > > > > > > of the drm_gpuvm. This is useful for tracking external > > > > > > and > > > > > > evicted > > > > > > objects per VM, which is introduced in subsequent > > > > > > patches. > > > > > > > > > > > > 2) Finding mappings of a certain drm_gem_object mapped in a > > > > > > certain > > > > > > drm_gpuvm becomes much cheaper. > > > > > > > > > > > > 3) Drivers can derive and extend the structure to easily > > > > > > represent > > > > > > driver specific states of a BO for a certain GPUVM. > > > > > > > > > > > > The idea of this abstraction was taken from amdgpu, hence > > > > > > the > > > > > > credit > > > > > > for > > > > > > this idea goes to the developers of amdgpu. 
> > > > > > > > > > > > Cc: Christian König > > > > > > Signed-off-by: Danilo Krummrich > > > > > > --- > > > > > > drivers/gpu/drm/drm_gpuvm.c | 335 > > > > > > +-- > > > > > > -- > > > > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > > > > > > include/drm/drm_gem.h | 32 +-- > > > > > > include/drm/drm_gpuvm.h | 188 > > > > > > +- > > > > > > 4 files changed, 533 insertions(+), 86 deletions(-) > > > > > > > > > > That checkpatch.pl error still remains as well. > > > > > > > > I guess you refer to: > > > > > > > > ERROR: do not use assignment in if condition > > > > #633: FILE: drivers/gpu/drm/nouveau/nouveau_uvmm.c:1165: > > > > + if (!(op->gem.obj = obj)) > > > > > > > > This was an intentional decision, since in this specific case > > > > it > > > > seems to > > > > be more readable than the alternatives. > > > > > > > > However, if we consider this to be a hard rule, which we never > > > > ever > > > > break, > > > > I'm fine changing it too. > > > > > > With the errors, sooner or later they are going to start generate > > > patches to "fix" them. In this particular case also Xe CI is > > > complaining and abort building when I submit the Xe adaptation, > > > so > > > it'd > > > be good to be checkpatch.pl conformant IMHO. > > > > Ok, I will change this one. > > > > However, in general my opinion on coding style is that we should > > preserve us > > the privilege to deviate from it when we agree it makes sense and > > improves > > the code quality. > > > > Having a CI forcing people to *blindly* follow certain rules and > > even > > abort > > building isn't very beneficial in that respect. > > > > Also, consider patches which partially change a line of code that > > already > > contains a coding style "issue" - the CI would also block you on > > that > > one I > > guess. Besides that it seems to block you on unrelated code, note > > that the > > assignment in question is from Nouveau and not from GPUVM. 
> > Yes, I completely agree that having CI enforce error free coding > style > checks is bad, and I'll see if I can get that changed on Xe CI. To my > Knowledge It hasn't always been like that. > > But OTOH my take on this is that if there are coding style rules and > recommendations we should try to follow them unless there are > *strong* > reasons not to. Sometimes that may result in code that may be a > little > harder to read, but OTOH a reviewer won't have to read up on the > component's style flavor before reviewing and it will avoid future > style fix patches. Basically, this means I'll continue to point those out when reviewing in case the author made an oversight, but won't require fixing for an R-B if the component owner thinks otherwise. Thanks, Thomas > > Thanks, > Thomas > > > > > - Danilo > > > > > > > > Thanks, > > > Thomas > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > Thomas > > > > > > > > > > > > > >
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
Hi, Danilo, On Tue, 2023-10-31 at 18:52 +0100, Danilo Krummrich wrote: > On 10/31/23 17:45, Thomas Hellström wrote: > > On Tue, 2023-10-31 at 17:39 +0100, Danilo Krummrich wrote: > > > On 10/31/23 12:25, Thomas Hellström wrote: > > > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > > > Add an abstraction layer between the drm_gpuva mappings of a > > > > > particular > > > > > drm_gem_object and this GEM object itself. The abstraction > > > > > represents > > > > > a > > > > > combination of a drm_gem_object and drm_gpuvm. The > > > > > drm_gem_object > > > > > holds > > > > > a list of drm_gpuvm_bo structures (the structure representing > > > > > this > > > > > abstraction), while each drm_gpuvm_bo contains list of > > > > > mappings > > > > > of > > > > > this > > > > > GEM object. > > > > > > > > > > This has multiple advantages: > > > > > > > > > > 1) We can use the drm_gpuvm_bo structure to attach it to > > > > > various > > > > > lists > > > > > of the drm_gpuvm. This is useful for tracking external > > > > > and > > > > > evicted > > > > > objects per VM, which is introduced in subsequent > > > > > patches. > > > > > > > > > > 2) Finding mappings of a certain drm_gem_object mapped in a > > > > > certain > > > > > drm_gpuvm becomes much cheaper. > > > > > > > > > > 3) Drivers can derive and extend the structure to easily > > > > > represent > > > > > driver specific states of a BO for a certain GPUVM. > > > > > > > > > > The idea of this abstraction was taken from amdgpu, hence the > > > > > credit > > > > > for > > > > > this idea goes to the developers of amdgpu. 
> > > > > > > > > > Cc: Christian König > > > > > Signed-off-by: Danilo Krummrich > > > > > --- > > > > > drivers/gpu/drm/drm_gpuvm.c | 335 > > > > > +-- > > > > > -- > > > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > > > > > include/drm/drm_gem.h | 32 +-- > > > > > include/drm/drm_gpuvm.h | 188 > > > > > +- > > > > > 4 files changed, 533 insertions(+), 86 deletions(-) > > > > > > > > That checkpatch.pl error still remains as well. > > > > > > I guess you refer to: > > > > > > ERROR: do not use assignment in if condition > > > #633: FILE: drivers/gpu/drm/nouveau/nouveau_uvmm.c:1165: > > > + if (!(op->gem.obj = obj)) > > > > > > This was an intentional decision, since in this specific case it > > > seems to > > > be more readable than the alternatives. > > > > > > However, if we consider this to be a hard rule, which we never > > > ever > > > break, > > > I'm fine changing it too. > > > > With the errors, sooner or later they are going to start generate > > patches to "fix" them. In this particular case also Xe CI is > > complaining and abort building when I submit the Xe adaptation, so > > it'd > > be good to be checkpatch.pl conformant IMHO. > > Ok, I will change this one. > > However, in general my opinion on coding style is that we should > preserve us > the privilege to deviate from it when we agree it makes sense and > improves > the code quality. > > Having a CI forcing people to *blindly* follow certain rules and even > abort > building isn't very beneficial in that respect. > > Also, consider patches which partially change a line of code that > already > contains a coding style "issue" - the CI would also block you on that > one I > guess. Besides that it seems to block you on unrelated code, note > that the > assignment in question is from Nouveau and not from GPUVM. Yes, I completely agree that having CI enforce error free coding style checks is bad, and I'll see if I can get that changed on Xe CI. 
To my knowledge it hasn't always been like that. But OTOH my take on this is that if there are coding style rules and recommendations we should try to follow them unless there are *strong* reasons not to. Sometimes that may result in code that may be a little harder to read, but OTOH a reviewer won't have to read up on the component's style flavor before reviewing and it will avoid future style fix patches. Thanks, Thomas > > - Danilo > > > > > Thanks, > > Thomas > > > > > > > > > > > > > > > > > > > Thanks, > > > > Thomas > > > > > >
Re: [Nouveau] [PATCH drm-misc-next v7 5/7] drm/gpuvm: track/lock/validate external/evicted objects
On Tue, 2023-10-31 at 17:41 +0100, Danilo Krummrich wrote: > On 10/31/23 12:34, Thomas Hellström wrote: > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > Currently the DRM GPUVM offers common infrastructure to track GPU > > > VA > > > allocations and mappings, generically connect GPU VA mappings to > > > their > > > backing buffers and perform more complex mapping operations on > > > the > > > GPU VA > > > space. > > > > > > However, there are more design patterns commonly used by drivers, > > > which > > > can potentially be generalized in order to make the DRM GPUVM > > > represent > > > a basis for GPU-VM implementations. In this context, this patch > > > aims > > > at generalizing the following elements. > > > > > > 1) Provide a common dma-resv for GEM objects not being used > > > outside > > > of > > > this GPU-VM. > > > > > > 2) Provide tracking of external GEM objects (GEM objects which > > > are > > > shared with other GPU-VMs). > > > > > > 3) Provide functions to efficiently lock all GEM objects dma-resv > > > the > > > GPU-VM contains mappings of. > > > > > > 4) Provide tracking of evicted GEM objects the GPU-VM contains > > > mappings > > > of, such that validation of evicted GEM objects is > > > accelerated. > > > > > > 5) Provide some convinience functions for common patterns. > > > > > > Big thanks to Boris Brezillon for his help to figure out locking > > > for > > > drivers updating the GPU VA space within the fence signalling > > > path. > > > > > > Suggested-by: Matthew Brost > > > Signed-off-by: Danilo Krummrich > > > > The checkpatch.pl warning still persists: > > WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP > > #627: FILE: drivers/gpu/drm/drm_gpuvm.c:1347: > > + return -ENOTSUPP; > > Hm, I thought I changed this one. Seems like it slipped through. > Gonna > fix that. 
> > > > > > --- > > > drivers/gpu/drm/drm_gpuvm.c | 633 > > > > > > include/drm/drm_gpuvm.h | 250 ++ > > > 2 files changed, 883 insertions(+) > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > b/drivers/gpu/drm/drm_gpuvm.c > > > index 7f4f5919f84c..01cbeb98755a 100644 > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > @@ -82,6 +82,21 @@ > > > * &drm_gem_object list of &drm_gpuvm_bos for an existing > > > instance > > > of this > > > * particular combination. If not existent a new instance is > > > created > > > and linked > > > * to the &drm_gem_object. > > > + * > > > + * &drm_gpuvm_bo structures, since unique for a given > > > &drm_gpuvm, > > > are also used > > > + * as entry for the &drm_gpuvm's lists of external and evicted > > > objects. Those > > > + * lists are maintained in order to accelerate locking of dma- > > > resv > > > locks and > > > + * validation of evicted objects bound in a &drm_gpuvm. For > > > instance, all > > > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be > > > locked > > > by calling > > > + * drm_gpuvm_exec_lock(). Once locked drivers can call > > > drm_gpuvm_validate() in > > > + * order to validate all evicted &drm_gem_objects. It is also > > > possible to lock > > > + * additional &drm_gem_objects by providing the corresponding > > > parameters to > > > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop > > > while making > > > + * use of helper functions such as drm_gpuvm_prepare_range() or > > > + * drm_gpuvm_prepare_objects(). > > > + * > > > + * Every bound &drm_gem_object is treated as external object > > > when > > > its &dma_resv > > > + * structure is different than the &drm_gpuvm's common &dma_resv > > > structure. > > > */ > > > > > > /** > > > @@ -429,6 +444,20 @@ > > > * Subsequent calls to drm_gpuvm_bo_obtain() for the same > > > &drm_gpuvm > > > and > > > * &drm_gem_object must be able to observe
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Tue, 2023-10-31 at 17:30 +0100, Danilo Krummrich wrote: > On 10/31/23 12:45, Jani Nikula wrote: > > On Tue, 31 Oct 2023, Thomas Hellström > > wrote: > > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > > + * Returns: a pointer to the &drm_gpuvm_bo on success, NULL on > > > > > > Still needs s/Returns:/Return:/g > > > > FWIW, both work to accommodate the variance across the kernel, > > although > > I think only the latter is documented and recommended. It's also > > the > > most popular: > > > > 10577 Return > > 3596 Returns > > I'd like to keep "Returns", since that's what GPUVM uses already > everywhere else. Ok. It looks like the Returns: are converted to Return in the rendered output so I guess that's why it's the form that is documented. I pointed this out since in the last review you replied you were going to change it, and also when the code starts seeing updates from others, it might become inconsistent if those patches follow the documented way. But I'm OK either way. /Thomas > > > 1104 RETURN > > 568 return > > 367 returns > > 352 RETURNS > > 1 RETURNs > > > > BR, > > Jani. > > > > >
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Tue, 2023-10-31 at 17:39 +0100, Danilo Krummrich wrote: > On 10/31/23 12:25, Thomas Hellström wrote: > > On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > > > Add an abstraction layer between the drm_gpuva mappings of a > > > particular > > > drm_gem_object and this GEM object itself. The abstraction > > > represents > > > a > > > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > > > holds > > > a list of drm_gpuvm_bo structures (the structure representing > > > this > > > abstraction), while each drm_gpuvm_bo contains list of mappings > > > of > > > this > > > GEM object. > > > > > > This has multiple advantages: > > > > > > 1) We can use the drm_gpuvm_bo structure to attach it to various > > > lists > > > of the drm_gpuvm. This is useful for tracking external and > > > evicted > > > objects per VM, which is introduced in subsequent patches. > > > > > > 2) Finding mappings of a certain drm_gem_object mapped in a > > > certain > > > drm_gpuvm becomes much cheaper. > > > > > > 3) Drivers can derive and extend the structure to easily > > > represent > > > driver specific states of a BO for a certain GPUVM. > > > > > > The idea of this abstraction was taken from amdgpu, hence the > > > credit > > > for > > > this idea goes to the developers of amdgpu. > > > > > > Cc: Christian König > > > Signed-off-by: Danilo Krummrich > > > --- > > > drivers/gpu/drm/drm_gpuvm.c | 335 > > > +-- > > > -- > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > > > include/drm/drm_gem.h | 32 +-- > > > include/drm/drm_gpuvm.h | 188 +- > > > 4 files changed, 533 insertions(+), 86 deletions(-) > > > > That checkpatch.pl error still remains as well. > > I guess you refer to: > > ERROR: do not use assignment in if condition > #633: FILE: drivers/gpu/drm/nouveau/nouveau_uvmm.c:1165: > + if (!(op->gem.obj = obj)) > > This was an intentional decision, since in this specific case it > seems to > be more readable than the alternatives. 
> > However, if we consider this to be a hard rule, which we never ever > break, > I'm fine changing it too. With the errors, sooner or later they are going to start generating patches to "fix" them. In this particular case also Xe CI is complaining and aborting the build when I submit the Xe adaptation, so it'd be good to be checkpatch.pl conformant IMHO. Thanks, Thomas > > > > > Thanks, > > Thomas > > >
Re: [Nouveau] [PATCH drm-misc-next v7 5/7] drm/gpuvm: track/lock/validate external/evicted objects
On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > Currently the DRM GPUVM offers common infrastructure to track GPU VA > allocations and mappings, generically connect GPU VA mappings to > their > backing buffers and perform more complex mapping operations on the > GPU VA > space. > > However, there are more design patterns commonly used by drivers, > which > can potentially be generalized in order to make the DRM GPUVM > represent > a basis for GPU-VM implementations. In this context, this patch aims > at generalizing the following elements. > > 1) Provide a common dma-resv for GEM objects not being used outside > of > this GPU-VM. > > 2) Provide tracking of external GEM objects (GEM objects which are > shared with other GPU-VMs). > > 3) Provide functions to efficiently lock all GEM objects dma-resv the > GPU-VM contains mappings of. > > 4) Provide tracking of evicted GEM objects the GPU-VM contains > mappings > of, such that validation of evicted GEM objects is accelerated. > > 5) Provide some convinience functions for common patterns. > > Big thanks to Boris Brezillon for his help to figure out locking for > drivers updating the GPU VA space within the fence signalling path. > > Suggested-by: Matthew Brost > Signed-off-by: Danilo Krummrich The checkpatch.pl warning still persists: WARNING: ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP #627: FILE: drivers/gpu/drm/drm_gpuvm.c:1347: + return -ENOTSUPP; > --- > drivers/gpu/drm/drm_gpuvm.c | 633 > > include/drm/drm_gpuvm.h | 250 ++ > 2 files changed, 883 insertions(+) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 7f4f5919f84c..01cbeb98755a 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -82,6 +82,21 @@ > * &drm_gem_object list of &drm_gpuvm_bos for an existing instance > of this > * particular combination. If not existent a new instance is created > and linked > * to the &drm_gem_object. 
> + * > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, > are also used > + * as entry for the &drm_gpuvm's lists of external and evicted > objects. Those > + * lists are maintained in order to accelerate locking of dma-resv > locks and > + * validation of evicted objects bound in a &drm_gpuvm. For > instance, all > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked > by calling > + * drm_gpuvm_exec_lock(). Once locked drivers can call > drm_gpuvm_validate() in > + * order to validate all evicted &drm_gem_objects. It is also > possible to lock > + * additional &drm_gem_objects by providing the corresponding > parameters to > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop > while making > + * use of helper functions such as drm_gpuvm_prepare_range() or > + * drm_gpuvm_prepare_objects(). > + * > + * Every bound &drm_gem_object is treated as external object when > its &dma_resv > + * structure is different than the &drm_gpuvm's common &dma_resv > structure. > */ > > /** > @@ -429,6 +444,20 @@ > * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm > and > * &drm_gem_object must be able to observe previous creations and > destructions > * of &drm_gpuvm_bos in order to keep instances unique. > + * > + * The &drm_gpuvm's lists for keeping track of external and evicted > objects are > + * protected against concurrent insertion / removal and iteration > internally. > + * > + * However, drivers still need ensure to protect concurrent calls to > functions > + * iterating those lists, namely drm_gpuvm_prepare_objects() and > + * drm_gpuvm_validate(). > + * > + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag > to indicate > + * that the corresponding &dma_resv locks are held in order to > protect the > + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is > disabled and > + * the corresponding lockdep checks are enabled. 
This is an > optimization for > + * drivers which are capable of taking the corresponding &dma_resv > locks and > + * hence do not require internal locking. > */ > > /** > @@ -641,6 +670,201 @@ > * } > */ > > +/** > + * get_next_vm_bo_from_list() - get the next vm_bo element > + * @__gpuvm: the &drm_gpuvm > + * @__list_name: the name of the list we're iterating on > + * @__local_list: a pointer to the local list used to store already > iterated items > + * @__prev_vm_bo: the previous element we got from > get_next_vm_bo_from_list() > + * > + * This helper is here to provide lockless list iteration. Lockless > as in, the > + * iterator releases the lock immediately after picking the first > element from > + * the list, so list insertion deletion can happen concurrently. > + * > + * Elements popped from the original list are kept in a local list, > so removal > + * and is_empty checks can still happen while we're iterating the > list. > + */ > +#define ge
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > Add an abstraction layer between the drm_gpuva mappings of a > particular > drm_gem_object and this GEM object itself. The abstraction represents > a > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > holds > a list of drm_gpuvm_bo structures (the structure representing this > abstraction), while each drm_gpuvm_bo contains list of mappings of > this > GEM object. > > This has multiple advantages: > > 1) We can use the drm_gpuvm_bo structure to attach it to various > lists > of the drm_gpuvm. This is useful for tracking external and evicted > objects per VM, which is introduced in subsequent patches. > > 2) Finding mappings of a certain drm_gem_object mapped in a certain > drm_gpuvm becomes much cheaper. > > 3) Drivers can derive and extend the structure to easily represent > driver specific states of a BO for a certain GPUVM. > > The idea of this abstraction was taken from amdgpu, hence the credit > for > this idea goes to the developers of amdgpu. > > Cc: Christian König > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 335 +-- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > include/drm/drm_gem.h | 32 +-- > include/drm/drm_gpuvm.h | 188 +- > 4 files changed, 533 insertions(+), 86 deletions(-) That checkpatch.pl error still remains as well. Thanks, Thomas
Re: [Nouveau] [PATCH drm-misc-next v7 4/7] drm/gpuvm: add an abstraction for a VM / BO combination
On Mon, 2023-10-23 at 22:16 +0200, Danilo Krummrich wrote: > Add an abstraction layer between the drm_gpuva mappings of a > particular > drm_gem_object and this GEM object itself. The abstraction represents > a > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > holds > a list of drm_gpuvm_bo structures (the structure representing this > abstraction), while each drm_gpuvm_bo contains list of mappings of > this > GEM object. > > This has multiple advantages: > > 1) We can use the drm_gpuvm_bo structure to attach it to various > lists > of the drm_gpuvm. This is useful for tracking external and evicted > objects per VM, which is introduced in subsequent patches. > > 2) Finding mappings of a certain drm_gem_object mapped in a certain > drm_gpuvm becomes much cheaper. > > 3) Drivers can derive and extend the structure to easily represent > driver specific states of a BO for a certain GPUVM. > > The idea of this abstraction was taken from amdgpu, hence the credit > for > this idea goes to the developers of amdgpu. > > Cc: Christian König > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 335 +-- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > include/drm/drm_gem.h | 32 +-- > include/drm/drm_gpuvm.h | 188 +- > 4 files changed, 533 insertions(+), 86 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index c03332883432..7f4f5919f84c 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -70,6 +70,18 @@ > * &drm_gem_object, such as the &drm_gem_object containing the root > page table, > * but it can also be a 'dummy' object, which can be allocated with > * drm_gpuvm_resv_object_alloc(). > + * > + * In order to connect a struct drm_gpuva its backing > &drm_gem_object each > + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and > each > + * &drm_gpuvm_bo contains a list of &drm_gpuva structures. 
> + * > + * A &drm_gpuvm_bo is an abstraction that represents a combination > of a > + * &drm_gpuvm and a &drm_gem_object. Every such combination should > be unique. > + * This is ensured by the API through drm_gpuvm_bo_obtain() and > + * drm_gpuvm_bo_obtain_prealloc() which first look into the > corresponding > + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance > of this > + * particular combination. If not existent a new instance is created > and linked > + * to the &drm_gem_object. > */ > > /** > @@ -395,21 +407,28 @@ > /** > * DOC: Locking > * > - * Generally, the GPU VA manager does not take care of locking > itself, it is > - * the drivers responsibility to take care about locking. Drivers > might want to > - * protect the following operations: inserting, removing and > iterating > - * &drm_gpuva objects as well as generating all kinds of operations, > such as > - * split / merge or prefetch. > - * > - * The GPU VA manager also does not take care of the locking of the > backing > - * &drm_gem_object buffers GPU VA lists by itself; drivers are > responsible to > - * enforce mutual exclusion using either the GEMs dma_resv lock or > alternatively > - * a driver specific external lock. For the latter see also > - * drm_gem_gpuva_set_lock(). > - * > - * However, the GPU VA manager contains lockdep checks to ensure > callers of its > - * API hold the corresponding lock whenever the &drm_gem_objects GPU > VA list is > - * accessed by functions such as drm_gpuva_link() or > drm_gpuva_unlink(). > + * In terms of managing &drm_gpuva entries DRM GPUVM does not take > care of > + * locking itself, it is the drivers responsibility to take care > about locking. > + * Drivers might want to protect the following operations: > inserting, removing > + * and iterating &drm_gpuva objects as well as generating all kinds > of > + * operations, such as split / merge or prefetch. 
> + * > + * DRM GPUVM also does not take care of the locking of the backing > + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo > abstractions by > + * itself; drivers are responsible to enforce mutual exclusion using > either the > + * GEMs dma_resv lock or alternatively a driver specific external > lock. For the > + * latter see also drm_gem_gpuva_set_lock(). > + * > + * However, DRM GPUVM contains lockdep checks to ensure callers of > its API hold > + * the corresponding lock whenever the &drm_gem_objects GPU VA list > is accessed > + * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but > also > + * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put(). > + * > + * The latter is required since on creation and destruction of a > &drm_gpuvm_bo > + * the &drm_gpuvm_bo is attached / removed from the &drm_gem_objects > gpuva list. > + * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm > and > + * &drm_gem_object must be able to observe previous creations and > destructions
Re: [Nouveau] [PATCH drm-misc-next v7 1/7] drm/gpuvm: convert WARN() to drm_WARN() variants
> - WARN(1, "Can't destroy kernel reserved node.\n"); > + drm_WARN(gpuvm->drm, 1, > + "Can't destroy kernel reserved node.\n"); > return; > } > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > index 5cf892c50f43..aaf5d28bd587 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > @@ -1808,6 +1808,7 @@ int > nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli > *cli, > u64 kernel_managed_addr, u64 kernel_managed_size) > { > + struct drm_device *drm = cli->drm->dev; > int ret; > u64 kernel_managed_end = kernel_managed_addr + > kernel_managed_size; > > @@ -1836,7 +1837,7 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, > struct nouveau_cli *cli, > uvmm->kernel_managed_addr = kernel_managed_addr; > uvmm->kernel_managed_size = kernel_managed_size; > > - drm_gpuvm_init(&uvmm->base, cli->name, > + drm_gpuvm_init(&uvmm->base, cli->name, drm, > NOUVEAU_VA_SPACE_START, > NOUVEAU_VA_SPACE_END, > kernel_managed_addr, kernel_managed_size, > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h > index bdfafc4a7705..687fd5893624 100644 > --- a/include/drm/drm_gpuvm.h > +++ b/include/drm/drm_gpuvm.h > @@ -29,6 +29,7 @@ > #include > #include > > +#include > #include > > struct drm_gpuvm; > @@ -201,6 +202,11 @@ struct drm_gpuvm { > */ > const char *name; > > + /** > + * @drm: the &drm_device this VM lives in > + */ Could a one-liner do? /** */ > + struct drm_device *drm; > + > /** > * @mm_start: start of the VA space > */ > @@ -241,6 +247,7 @@ struct drm_gpuvm { > }; > > void drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name, > + struct drm_device *drm, > u64 start_offset, u64 range, > u64 reserve_offset, u64 reserve_range, > const struct drm_gpuvm_ops *ops); I figure Christian's comment can be addressed in a follow-up patch if needed. Reviewed-by: Thomas Hellström
Re: [Nouveau] [PATCH drm-misc-next v6 3/6] drm/gpuvm: add an abstraction for a VM / BO combination
Hi, On 10/17/23 11:58, Danilo Krummrich wrote: On Fri, Oct 13, 2023 at 02:30:29PM +0200, Thomas Hellström wrote: On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: Add an abstraction layer between the drm_gpuva mappings of a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains list of mappings of this GEM object. This has multiple advantages: 1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches. 2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper. 3) Drivers can derive and extend the structure to easily represent driver specific states of a BO for a certain GPUVM. The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu. Cc: Christian König Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 332 +-- -- drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- include/drm/drm_gem.h | 32 +-- include/drm/drm_gpuvm.h | 177 - 4 files changed, 521 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 6368dfdbe9dd..28282283ddaf 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -70,6 +70,18 @@ * &drm_gem_object, such as the &drm_gem_object containing the root page table, * but it can also be a 'dummy' object, which can be allocated with * drm_gpuvm_root_object_alloc(). 
+ * + * In order to connect a struct drm_gpuva its backing &drm_gem_object each NIT: Same as previous patch regarding kerneldoc references I was intentionally using generic references here to make the documentation more readable while still keeping references to be able to look up the structure's fields. + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each + * &drm_gpuvm_bo contains a list of &&drm_gpuva structures. + * + * A &drm_gpuvm_bo is an abstraction that represents a combination of a + * &drm_gpuvm and a &drm_gem_object. Every such combination should be unique. + * This is ensured by the API through drm_gpuvm_bo_obtain() and + * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this + * particular combination. If not existent a new instance is created and linked + * to the &drm_gem_object. */ /** @@ -395,21 +407,28 @@ /** * DOC: Locking * - * Generally, the GPU VA manager does not take care of locking itself, it is - * the drivers responsibility to take care about locking. Drivers might want to - * protect the following operations: inserting, removing and iterating - * &drm_gpuva objects as well as generating all kinds of operations, such as - * split / merge or prefetch. - * - * The GPU VA manager also does not take care of the locking of the backing - * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to - * enforce mutual exclusion using either the GEMs dma_resv lock or alternatively - * a driver specific external lock. For the latter see also - * drm_gem_gpuva_set_lock(). - * - * However, the GPU VA manager contains lockdep checks to ensure callers of its - * API hold the corresponding lock whenever the &drm_gem_objects GPU VA list is - * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink(). 
+ * In terms of managing &drm_gpuva entries DRM GPUVM does not take care of + * locking itself, it is the drivers responsibility to take care about locking. + * Drivers might want to protect the following operations: inserting, removing + * and iterating &drm_gpuva objects as well as generating all kinds of + * operations, such as split / merge or prefetch. + * + * DRM GPUVM also does not take care of the locking of the backing + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by + * itself; drivers are responsible to enforce mutual exclusion using either the + * GEMs dma_resv lock or alternatively a driver specific external lock. For the + * latter see also drm_gem_gpuva_set_lock(). + * + * However, DRM GPUVM contains lockdep checks to ensure callers of its API hold + * the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed + * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also + * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put(). + * + * The latter is required since on creation and destruction of a &drm_gpuvm_bo + * the &drm_gpuvm_bo is
Re: [Nouveau] [PATCH drm-misc-next v6 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 10/13/23 15:37, Thomas Hellström wrote: Hi, On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 646 include/drm/drm_gpuvm.h | 246 ++ 2 files changed, 892 insertions(+) There's a checkpatch.pl warning and a number of random macro CHECKs if using --strict. Also the overall s/Returns:/Return/ (and possibly function line break). diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 28282283ddaf..6977bd30eca5 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -82,6 +82,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. 
+ * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those + * list are maintained in order to accelerate locking of dma-resv locks and s/list/lists/ + * validation of evicted objects bound in a &drm_gpuvm. For instance, all + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -429,6 +444,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally. + * + * However, drivers still need ensure to protect concurrent calls to functions + * iterating those lists, namely drm_gpuvm_prepare_objects() and + * drm_gpuvm_validate(). + * + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate + * that the corresponding &dma_resv locks are held in order to protect the + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and + * the corresponding lockdep checks are enabled. This is an optimization for + * drivers which are capable of taking the corresponding &dma_resv locks and + * hence do not require internal locking. 
*/ /** @@ -641,6 +670,195 @@ * } */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element macros use a different kerneldoc syntax: https://return42.github.io/linuxdoc/linuxdoc-howto/kernel-doc-syntax.html#macro The syntax for macros in that page does not appear to be valid from what I can tell. Please ignore that. /Thomas
Re: [Nouveau] [PATCH drm-misc-next v6 4/6] drm/gpuvm: track/lock/validate external/evicted objects
Hi, On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: > Currently the DRM GPUVM offers common infrastructure to track GPU VA > allocations and mappings, generically connect GPU VA mappings to > their > backing buffers and perform more complex mapping operations on the > GPU VA > space. > > However, there are more design patterns commonly used by drivers, > which > can potentially be generalized in order to make the DRM GPUVM > represent > a basis for GPU-VM implementations. In this context, this patch aims > at generalizing the following elements. > > 1) Provide a common dma-resv for GEM objects not being used outside > of > this GPU-VM. > > 2) Provide tracking of external GEM objects (GEM objects which are > shared with other GPU-VMs). > > 3) Provide functions to efficiently lock all GEM objects dma-resv the > GPU-VM contains mappings of. > > 4) Provide tracking of evicted GEM objects the GPU-VM contains > mappings > of, such that validation of evicted GEM objects is accelerated. > > 5) Provide some convinience functions for common patterns. > > Big thanks to Boris Brezillon for his help to figure out locking for > drivers updating the GPU VA space within the fence signalling path. > > Suggested-by: Matthew Brost > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 646 > > include/drm/drm_gpuvm.h | 246 ++ > 2 files changed, 892 insertions(+) > There's a checkpatch.pl warning and a number of random macro CHECKs if using --strict. Also the overall s/Returns:/Return/ (and possibly function line break). > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 28282283ddaf..6977bd30eca5 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -82,6 +82,21 @@ > * &drm_gem_object list of &drm_gpuvm_bos for an existing instance > of this > * particular combination. If not existent a new instance is created > and linked > * to the &drm_gem_object. 
> + * > + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, > are also used > + * as entry for the &drm_gpuvm's lists of external and evicted > objects. Those > + * list are maintained in order to accelerate locking of dma-resv > locks and s/list/lists/ > + * validation of evicted objects bound in a &drm_gpuvm. For > instance, all > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked > by calling > + * drm_gpuvm_exec_lock(). Once locked drivers can call > drm_gpuvm_validate() in > + * order to validate all evicted &drm_gem_objects. It is also > possible to lock > + * additional &drm_gem_objects by providing the corresponding > parameters to > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop > while making > + * use of helper functions such as drm_gpuvm_prepare_range() or > + * drm_gpuvm_prepare_objects(). > + * > + * Every bound &drm_gem_object is treated as external object when > its &dma_resv > + * structure is different than the &drm_gpuvm's common &dma_resv > structure. > */ > > /** > @@ -429,6 +444,20 @@ > * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm > and > * &drm_gem_object must be able to observe previous creations and > destructions > * of &drm_gpuvm_bos in order to keep instances unique. > + * > + * The &drm_gpuvm's lists for keeping track of external and evicted > objects are > + * protected against concurrent insertion / removal and iteration > internally. > + * > + * However, drivers still need ensure to protect concurrent calls to > functions > + * iterating those lists, namely drm_gpuvm_prepare_objects() and > + * drm_gpuvm_validate(). > + * > + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag > to indicate > + * that the corresponding &dma_resv locks are held in order to > protect the > + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is > disabled and > + * the corresponding lockdep checks are enabled. 
This is an > optimization for > + * drivers which are capable of taking the corresponding &dma_resv > locks and > + * hence do not require internal locking. > */ > > /** > @@ -641,6 +670,195 @@ > * } > */ > > +/** > + * get_next_vm_bo_from_list() - get the next vm_bo element macros use a different kerneldoc syntax: https://return42.github.io/linuxdoc/linuxdoc-howto/kernel-doc-syntax.html#macro > + * @__gpuvm: The GPU VM > + * @__list_name: The name of the list we're iterating on > + * @__local_list: A pointer to the local list used to store already > iterated items > + * @__prev_vm_bo: The previous element we got from > drm_gpuvm_get_next_cached_vm_bo() > + * > + * This helper is here to provide lockless list iteration. Lockless > as in, the > + * iterator releases the lock immediately after picking the first > element from > + * the list, so list insertion deletion can happen concurrently. > + * > + * Elements popped from the original list are kept in a
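[The "lockless" iteration described in the quoted kerneldoc above — take the lock only to unlink the next element, process it with the lock dropped, and park visited elements on a local list — can be sketched as follows. This is a userspace sketch under stated assumptions: the `sk_*` names are hypothetical, and the `lock_held` flag merely models the spinlock that the kernel code would take around the unlink.]

```c
#include <stddef.h>

struct sk_node {
	struct sk_node *next;
	int id;
};

struct sk_list {
	int lock_held;		/* models spin_lock()/spin_unlock() */
	struct sk_node *head;
};

static void sk_lock(struct sk_list *l)   { l->lock_held = 1; }
static void sk_unlock(struct sk_list *l) { l->lock_held = 0; }

/* Unlink the next element under the lock, park it on @local and return
 * it; the caller then processes it with the lock dropped, so concurrent
 * insertion / removal on the original list can proceed. */
struct sk_node *sk_pop_next(struct sk_list *list, struct sk_node **local)
{
	struct sk_node *n;

	sk_lock(list);
	n = list->head;
	if (n) {
		list->head = n->next;	/* unlink under the lock... */
		n->next = *local;	/* ...and park on the local list */
		*local = n;
	}
	sk_unlock(list);
	return n;
}

/* Restore the original list once the iteration has finished. */
void sk_splice_back(struct sk_list *list, struct sk_node *local)
{
	sk_lock(list);
	while (local) {
		struct sk_node *n = local;

		local = n->next;
		n->next = list->head;
		list->head = n;
	}
	sk_unlock(list);
}
```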
Re: [Nouveau] [PATCH drm-misc-next v6 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On Fri, 2023-10-13 at 14:04 +0200, Danilo Krummrich wrote: > On 10/10/23 08:26, Thomas Hellström wrote: > > > > On 10/9/23 16:45, Danilo Krummrich wrote: > > > On 10/9/23 15:36, Thomas Hellström wrote: > > > > > > > > On 10/9/23 01:32, Danilo Krummrich wrote: > > > > > Currently the DRM GPUVM offers common infrastructure to track > > > > > GPU VA > > > > > allocations and mappings, generically connect GPU VA mappings > > > > > to their > > > > > backing buffers and perform more complex mapping operations > > > > > on the GPU VA > > > > > space. > > > > > > > > > > However, there are more design patterns commonly used by > > > > > drivers, which > > > > > can potentially be generalized in order to make the DRM GPUVM > > > > > represent > > > > > a basis for GPU-VM implementations. In this context, this > > > > > patch aims > > > > > at generalizing the following elements. > > > > > > > > > > 1) Provide a common dma-resv for GEM objects not being used > > > > > outside of > > > > > this GPU-VM. > > > > > > > > > > 2) Provide tracking of external GEM objects (GEM objects > > > > > which are > > > > > shared with other GPU-VMs). > > > > > > > > > > 3) Provide functions to efficiently lock all GEM objects dma- > > > > > resv the > > > > > GPU-VM contains mappings of. > > > > > > > > > > 4) Provide tracking of evicted GEM objects the GPU-VM > > > > > contains mappings > > > > > of, such that validation of evicted GEM objects is > > > > > accelerated. > > > > > > > > > > 5) Provide some convinience functions for common patterns. > > > > > > > > > > Big thanks to Boris Brezillon for his help to figure out > > > > > locking for > > > > > drivers updating the GPU VA space within the fence signalling > > > > > path. 
> > > > > > > > > > Suggested-by: Matthew Brost > > > > > Signed-off-by: Danilo Krummrich > > > > > --- > > > > > drivers/gpu/drm/drm_gpuvm.c | 646 > > > > > > > > > > include/drm/drm_gpuvm.h | 246 ++ > > > > > 2 files changed, 892 insertions(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > > > b/drivers/gpu/drm/drm_gpuvm.c > > > > > index 28282283ddaf..6977bd30eca5 100644 > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > > > @@ -82,6 +82,21 @@ > > > > > * &drm_gem_object list of &drm_gpuvm_bos for an existing > > > > > instance of this > > > > > * particular combination. If not existent a new instance > > > > > is created and linked > > > > > * to the &drm_gem_object. > > > > > + * > > > > > + * &drm_gpuvm_bo structures, since unique for a given > > > > > &drm_gpuvm, are also used > > > > > + * as entry for the &drm_gpuvm's lists of external and > > > > > evicted objects. Those > > > > > + * list are maintained in order to accelerate locking of > > > > > dma-resv locks and > > > > > + * validation of evicted objects bound in a &drm_gpuvm. For > > > > > instance, all > > > > > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be > > > > > locked by calling > > > > > + * drm_gpuvm_exec_lock(). Once locked drivers can call > > > > > drm_gpuvm_validate() in > > > > > + * order to validate all evicted &drm_gem_objects. It is > > > > > also possible to lock > > > > > + * additional &drm_gem_objects by providing the > > > > > corresponding parameters to > > > > > + * drm_gpuvm_exec_lock() as well as open code the &drm_exec > > > > > loop while making > > > > > + * use of helper functions such as drm_gpuvm_prepare_range() > > > > > or > > > > > + * drm_gpuvm_prepare_objects(). > > > > > + * > > > >
Re: [Nouveau] [PATCH drm-misc-next v6 1/6] drm/gpuvm: add common dma-resv per struct drm_gpuvm
On Fri, 2023-10-13 at 13:51 +0200, Danilo Krummrich wrote: > On 10/13/23 13:38, Thomas Hellström wrote: > > On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: > > > Provide a common dma-resv for GEM objects not being used outside > > > of > > > this > > > GPU-VM. This is used in a subsequent patch to generalize dma- > > > resv, > > > external and evicted object handling and GEM validation. > > > > > > Signed-off-by: Danilo Krummrich > > > --- > > > drivers/gpu/drm/drm_gpuvm.c | 56 > > > +- > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 13 +- > > > include/drm/drm_gpuvm.h | 35 +++- > > > 3 files changed, 99 insertions(+), 5 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > b/drivers/gpu/drm/drm_gpuvm.c > > > index 02ecb45a2544..ebda9d594165 100644 > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > @@ -61,6 +61,15 @@ > > > * contained within struct drm_gpuva already. Hence, for > > > inserting > > > &drm_gpuva > > > * entries from within dma-fence signalling critical sections > > > it is > > > enough to > > > * pre-allocate the &drm_gpuva structures. > > > + * > > > + * &drm_gem_objects which are private to a single VM can share a > > > common > > > + * &dma_resv in order to improve locking efficiency (e.g. with > > > &drm_exec). > > > + * For this purpose drivers must pass a &drm_gem_object to > > > drm_gpuvm_init(), in > > > + * the following called 'root object', which serves as the > > > container > > > > Nit: Perhaps resv object altough it might typically be the root > > page- > > table object, that doesn't have any meaning to drm_gpuvm, which > > uses it > > solely as a container for the resv? > > With "root" I didn't want to refer to the object representing the > root > page-table object, but being *the* object every other (internal) > object > needs to keep a reference to. OK, yes but the reason they need a reference is because of the shared resv, so IMO resv_object is a good fit. 
(I later noticed there's even the function name drm_gpuvm_resv_obj()). And it will probably get confused with the driver's "root" page table object, but up to you. > Maybe I should be more explicit here and say > that drivers need to make sure every internal object requires a > reference > to take a reference to this root object. > > > > > > of the > > > + * GPUVM's shared &dma_resv. This root object can be a driver > > > specific > > > + * &drm_gem_object, such as the &drm_gem_object containing the > > > root > > > page table, > > > + * but it can also be a 'dummy' object, which can be allocated > > > with > > > + * drm_gpuvm_root_object_alloc(). > > > */ > > > > > > /** > > > @@ -652,9 +661,47 @@ drm_gpuvm_range_valid(struct drm_gpuvm > > > *gpuvm, > > > !drm_gpuvm_in_kernel_node(gpuvm, addr, range); > > > } > > > > > > +static void > > > +drm_gpuvm_gem_object_free(struct drm_gem_object *obj) > > > +{ > > > + drm_gem_object_release(obj); > > > + kfree(obj); > > > +} > > > + > > > +static const struct drm_gem_object_funcs drm_gpuvm_object_funcs > > > = { > > > + .free = drm_gpuvm_gem_object_free, > > > +}; > > > + > > > +/** > > > + * drm_gpuvm_root_object_alloc() - allocate a dummy > > > &drm_gem_object > > > + * @drm: the drivers &drm_device > > > + * > > > + * Allocates a dummy &drm_gem_object which can be passed to > > > drm_gpuvm_init() in > > > + * order to serve as root GEM object providing the &drm_resv > > > shared > > > across > > > + * &drm_gem_objects local to a single GPUVM. > > > + * > > > + * Returns: the &drm_gem_object on success, NULL on failure > > > + */ > > > +struct drm_gem_object * > > > +drm_gpuvm_root_object_alloc(struct drm_device *drm) > > > +{ > > > + struct drm_gem_object *obj; > > > + > > > + obj = kzalloc(sizeof(*obj), GFP_KERNEL); > > > + if (!obj) > > > + return NULL; &g
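[The lifetime rule discussed above — every VM-local object must hold a reference on the object carrying the shared dma_resv, so the container outlives all of its users — can be sketched like this. All `sk_*` names are hypothetical; `sk_resv_obj` stands in for the GEM "resv/root" object and the integer refcount for the kernel's kref.]

```c
#include <stddef.h>

struct sk_resv_obj {
	int refcount;
	int freed;		/* would be the free callback in real code */
};

struct sk_local_obj {
	struct sk_resv_obj *resv;	/* shared lock container */
};

void sk_resv_get(struct sk_resv_obj *r)
{
	r->refcount++;
}

void sk_resv_put(struct sk_resv_obj *r)
{
	if (--r->refcount == 0)
		r->freed = 1;
}

void sk_local_obj_init(struct sk_local_obj *o, struct sk_resv_obj *r)
{
	sk_resv_get(r);		/* local object pins the shared container */
	o->resv = r;
}

void sk_local_obj_fini(struct sk_local_obj *o)
{
	sk_resv_put(o->resv);
	o->resv = NULL;
}
```

[The point of the refcount is exactly the one raised in the exchange: the container is kept alive by its users, not by the VM alone, so local objects may be torn down after the VM itself.]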
Re: [Nouveau] [PATCH drm-misc-next v6 3/6] drm/gpuvm: add an abstraction for a VM / BO combination
On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: > Add an abstraction layer between the drm_gpuva mappings of a > particular > drm_gem_object and this GEM object itself. The abstraction represents > a > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > holds > a list of drm_gpuvm_bo structures (the structure representing this > abstraction), while each drm_gpuvm_bo contains list of mappings of > this > GEM object. > > This has multiple advantages: > > 1) We can use the drm_gpuvm_bo structure to attach it to various > lists > of the drm_gpuvm. This is useful for tracking external and evicted > objects per VM, which is introduced in subsequent patches. > > 2) Finding mappings of a certain drm_gem_object mapped in a certain > drm_gpuvm becomes much cheaper. > > 3) Drivers can derive and extend the structure to easily represent > driver specific states of a BO for a certain GPUVM. > > The idea of this abstraction was taken from amdgpu, hence the credit > for > this idea goes to the developers of amdgpu. > > Cc: Christian König > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 332 +-- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > include/drm/drm_gem.h | 32 +-- > include/drm/drm_gpuvm.h | 177 - > 4 files changed, 521 insertions(+), 84 deletions(-) Forgot to mention, there are a couple of checkpatch.pl --strict issues with this patch that might need looking at. Thanks, Thomas
Re: [Nouveau] [PATCH drm-misc-next v6 3/6] drm/gpuvm: add an abstraction for a VM / BO combination
On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: > Add an abstraction layer between the drm_gpuva mappings of a > particular > drm_gem_object and this GEM object itself. The abstraction represents > a > combination of a drm_gem_object and drm_gpuvm. The drm_gem_object > holds > a list of drm_gpuvm_bo structures (the structure representing this > abstraction), while each drm_gpuvm_bo contains list of mappings of > this > GEM object. > > This has multiple advantages: > > 1) We can use the drm_gpuvm_bo structure to attach it to various > lists > of the drm_gpuvm. This is useful for tracking external and evicted > objects per VM, which is introduced in subsequent patches. > > 2) Finding mappings of a certain drm_gem_object mapped in a certain > drm_gpuvm becomes much cheaper. > > 3) Drivers can derive and extend the structure to easily represent > driver specific states of a BO for a certain GPUVM. > > The idea of this abstraction was taken from amdgpu, hence the credit > for > this idea goes to the developers of amdgpu. > > Cc: Christian König > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 332 +-- > -- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- > include/drm/drm_gem.h | 32 +-- > include/drm/drm_gpuvm.h | 177 - > 4 files changed, 521 insertions(+), 84 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index 6368dfdbe9dd..28282283ddaf 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -70,6 +70,18 @@ > * &drm_gem_object, such as the &drm_gem_object containing the root > page table, > * but it can also be a 'dummy' object, which can be allocated with > * drm_gpuvm_root_object_alloc(). 
> + * > + * In order to connect a struct drm_gpuva its backing > &drm_gem_object each NIT: Same as previous patch regarding kerneldoc references > + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and > each > + * &drm_gpuvm_bo contains a list of &&drm_gpuva structures. > + * > + * A &drm_gpuvm_bo is an abstraction that represents a combination > of a > + * &drm_gpuvm and a &drm_gem_object. Every such combination should > be unique. > + * This is ensured by the API through drm_gpuvm_bo_obtain() and > + * drm_gpuvm_bo_obtain_prealloc() which first look into the > corresponding > + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance > of this > + * particular combination. If not existent a new instance is created > and linked > + * to the &drm_gem_object. > */ > > /** > @@ -395,21 +407,28 @@ > /** > * DOC: Locking > * > - * Generally, the GPU VA manager does not take care of locking > itself, it is > - * the drivers responsibility to take care about locking. Drivers > might want to > - * protect the following operations: inserting, removing and > iterating > - * &drm_gpuva objects as well as generating all kinds of operations, > such as > - * split / merge or prefetch. > - * > - * The GPU VA manager also does not take care of the locking of the > backing > - * &drm_gem_object buffers GPU VA lists by itself; drivers are > responsible to > - * enforce mutual exclusion using either the GEMs dma_resv lock or > alternatively > - * a driver specific external lock. For the latter see also > - * drm_gem_gpuva_set_lock(). > - * > - * However, the GPU VA manager contains lockdep checks to ensure > callers of its > - * API hold the corresponding lock whenever the &drm_gem_objects GPU > VA list is > - * accessed by functions such as drm_gpuva_link() or > drm_gpuva_unlink(). > + * In terms of managing &drm_gpuva entries DRM GPUVM does not take > care of > + * locking itself, it is the drivers responsibility to take care > about locking. 
> + * Drivers might want to protect the following operations: > inserting, removing > + * and iterating &drm_gpuva objects as well as generating all kinds > of > + * operations, such as split / merge or prefetch. > + * > + * DRM GPUVM also does not take care of the locking of the backing > + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo > abstractions by > + * itself; drivers are responsible to enforce mutual exclusion using > either the > + * GEMs dma_resv lock or alternatively a driver specific external > lock. For the > + * latter see also drm_gem_gpuva_set_lock(). > + * > + * However, DRM GPUVM contains lockdep checks to ensure callers of > its API hold > + * the corresponding lock whenever the &drm_gem_objects GPU VA list > is accessed > + * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but > also > + * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put(). > + * > + * The latter is required since on creation and destruction of a > &drm_gpuvm_bo > + * the &drm_gpuvm_bo is attached / removed from the &drm_gem_objects > gpuva list. > + * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm > and > + * &drm_gem_object m
Re: [Nouveau] [PATCH drm-misc-next v6 2/6] drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm
On Mon, 2023-10-09 at 01:32 +0200, Danilo Krummrich wrote: > Introduce flags for struct drm_gpuvm, this required by subsequent > commits. > > Signed-off-by: Danilo Krummrich > --- > drivers/gpu/drm/drm_gpuvm.c | 4 +++- > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 2 +- > include/drm/drm_gpuvm.h | 17 - > 3 files changed, 20 insertions(+), 3 deletions(-) > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > b/drivers/gpu/drm/drm_gpuvm.c > index ebda9d594165..6368dfdbe9dd 100644 > --- a/drivers/gpu/drm/drm_gpuvm.c > +++ b/drivers/gpu/drm/drm_gpuvm.c > @@ -703,6 +703,7 @@ EXPORT_SYMBOL_GPL(drm_gpuvm_root_object_alloc); > * @gpuvm: pointer to the &drm_gpuvm to initialize > * @r_obj: the root &drm_gem_object providing the GPUVM's common > &dma_resv > * @name: the name of the GPU VA space > + * @flags: the &drm_gpuvm_flags for this GPUVM NIT: It looks like kerneldoc guidelines recommends using &enum drm_gpuvm_flags in new code > * @start_offset: the start offset of the GPU VA space > * @range: the size of the GPU VA space > * @reserve_offset: the start of the kernel reserved GPU VA area > @@ -716,7 +717,7 @@ EXPORT_SYMBOL_GPL(drm_gpuvm_root_object_alloc); > */ > void > drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_gem_object > *r_obj, > - const char *name, > + const char *name, enum drm_gpuvm_flags flags, > u64 start_offset, u64 range, > u64 reserve_offset, u64 reserve_range, > const struct drm_gpuvm_ops *ops) > @@ -729,6 +730,7 @@ drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct > drm_gem_object *r_obj, > gpuvm->mm_range = range; > > gpuvm->name = name ? 
name : "unknown"; > + gpuvm->flags = flags; > gpuvm->ops = ops; > gpuvm->r_obj = r_obj; > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > index 4dea847ef989..93ad2ba7ec8b 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > @@ -1843,7 +1843,7 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, > struct nouveau_cli *cli, > uvmm->kernel_managed_addr = kernel_managed_addr; > uvmm->kernel_managed_size = kernel_managed_size; > > - drm_gpuvm_init(&uvmm->base, r_obj, cli->name, > + drm_gpuvm_init(&uvmm->base, r_obj, cli->name, 0, > NOUVEAU_VA_SPACE_START, > NOUVEAU_VA_SPACE_END, > kernel_managed_addr, kernel_managed_size, > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h > index 0aec14d8b259..13539f32c2e2 100644 > --- a/include/drm/drm_gpuvm.h > +++ b/include/drm/drm_gpuvm.h > @@ -183,6 +183,16 @@ static inline bool drm_gpuva_invalidated(struct > drm_gpuva *va) > return va->flags & DRM_GPUVA_INVALIDATED; > } > > +/** > + * enum drm_gpuvm_flags - flags for struct drm_gpuvm > + */ > +enum drm_gpuvm_flags { > + /** > + * @DRM_GPUVM_USERBITS: user defined bits > + */ > + DRM_GPUVM_USERBITS = (1 << 0), BIT(0) > +}; > + > /** > * struct drm_gpuvm - DRM GPU VA Manager > * > @@ -201,6 +211,11 @@ struct drm_gpuvm { > */ > const char *name; > > + /** > + * @flags: the &drm_gpuvm_flags of this GPUVM enum? > + */ > + enum drm_gpuvm_flags flags; > + > /** > * @mm_start: start of the VA space > */ > @@ -246,7 +261,7 @@ struct drm_gpuvm { > }; > > void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_gem_object > *r_obj, > - const char *name, > + const char *name, enum drm_gpuvm_flags flags, > u64 start_offset, u64 range, > u64 reserve_offset, u64 reserve_range, > const struct drm_gpuvm_ops *ops); Reviewed-by: Thomas Hellström
Re: [Nouveau] [PATCH drm-misc-next v6 1/6] drm/gpuvm: add common dma-resv per struct drm_gpuvm
ice in struct drm_gpuvm and use drm_warn() here instead of WARN? > + > + drm_gem_object_put(gpuvm->r_obj); > } > EXPORT_SYMBOL_GPL(drm_gpuvm_destroy); > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > index 5cf892c50f43..4dea847ef989 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > @@ -1808,8 +1808,9 @@ int > nouveau_uvmm_init(struct nouveau_uvmm *uvmm, struct nouveau_cli > *cli, > u64 kernel_managed_addr, u64 kernel_managed_size) > { > - int ret; > + struct drm_gem_object *r_obj; > u64 kernel_managed_end = kernel_managed_addr + > kernel_managed_size; > + int ret; > > mutex_init(&uvmm->mutex); > dma_resv_init(&uvmm->resv); > @@ -1833,14 +1834,22 @@ nouveau_uvmm_init(struct nouveau_uvmm *uvmm, > struct nouveau_cli *cli, > goto out_unlock; > } > > + r_obj = drm_gpuvm_root_object_alloc(cli->drm->dev); > + if (!r_obj) { > + ret = -ENOMEM; > + goto out_unlock; > + } > + > uvmm->kernel_managed_addr = kernel_managed_addr; > uvmm->kernel_managed_size = kernel_managed_size; > > - drm_gpuvm_init(&uvmm->base, cli->name, > + drm_gpuvm_init(&uvmm->base, r_obj, cli->name, > NOUVEAU_VA_SPACE_START, > NOUVEAU_VA_SPACE_END, > kernel_managed_addr, kernel_managed_size, > NULL); > + /* GPUVM takes care from here on. */ > + drm_gem_object_put(r_obj); > > ret = nvif_vmm_ctor(&cli->mmu, "uvmm", > cli->vmm.vmm.object.oclass, RAW, > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h > index c7ed6bf441d4..0aec14d8b259 100644 > --- a/include/drm/drm_gpuvm.h > +++ b/include/drm/drm_gpuvm.h > @@ -238,9 +238,15 @@ struct drm_gpuvm { > * @ops: &drm_gpuvm_ops providing the split/merge steps to > drivers > */ > const struct drm_gpuvm_ops *ops; > + > + /** > + * @r_obj: Root GEM object; representing the GPUVM's common > &dma_resv. 
> + */ > + struct drm_gem_object *r_obj; > }; > > -void drm_gpuvm_init(struct drm_gpuvm *gpuvm, const char *name, > +void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_gem_object > *r_obj, > + const char *name, > u64 start_offset, u64 range, > u64 reserve_offset, u64 reserve_range, > const struct drm_gpuvm_ops *ops); > @@ -248,6 +254,33 @@ void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm); > > bool drm_gpuvm_interval_empty(struct drm_gpuvm *gpuvm, u64 addr, u64 > range); > > +struct drm_gem_object * > +drm_gpuvm_root_object_alloc(struct drm_device *drm); > + > +/** > + * drm_gpuvm_resv() - returns the &drm_gpuvm's &dma_resv > + * @gpuvm__: the &drm_gpuvm > + * > + * Returns: a pointer to the &drm_gpuvm's shared &dma_resv > + */ > +#define drm_gpuvm_resv(gpuvm__) ((gpuvm__)->r_obj->resv) > + > +/** > + * drm_gpuvm_resv_obj() - returns the &drm_gem_object holding the > &drm_gpuvm's > + * &dma_resv > + * @gpuvm__: the &drm_gpuvm > + * > + * Returns: a pointer to the &drm_gem_object holding the > &drm_gpuvm's shared > + * &dma_resv > + */ > +#define drm_gpuvm_resv_obj(gpuvm__) ((gpuvm__)->r_obj) > + > +#define drm_gpuvm_resv_held(gpuvm__) \ > + dma_resv_held(drm_gpuvm_resv(gpuvm__)) > + > +#define drm_gpuvm_resv_assert_held(gpuvm__) \ > + dma_resv_assert_held(drm_gpuvm_resv(gpuvm__)) > + > static inline struct drm_gpuva * > __drm_gpuva_next(struct drm_gpuva *va) > { Reviewed-by: Thomas Hellström
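The ownership handover in the quoted nouveau hunk ("GPUVM takes care from here on." followed by drm_gem_object_put()) follows the usual refcounting pattern: init takes its own reference on the root object, so the driver can drop its allocation reference immediately afterwards. A minimal userspace model, with hypothetical names and a `freed` flag standing in for actual object destruction:

```c
#include <assert.h>

/* Stand-in for a refcounted GEM object; illustrative only. */
struct gem_obj_model {
	int refcount;
	int freed;
};

static void gem_get(struct gem_obj_model *obj)
{
	obj->refcount++;
}

static void gem_put(struct gem_obj_model *obj)
{
	if (--obj->refcount == 0)
		obj->freed = 1;	/* real code would free the object here */
}

struct gpuvm_model {
	struct gem_obj_model *r_obj;
};

/* Like drm_gpuvm_init(): the VM takes its own reference so the root
 * object (and hence the common resv) stays alive for the VM's lifetime. */
static void gpuvm_model_init(struct gpuvm_model *vm,
			     struct gem_obj_model *r_obj)
{
	gem_get(r_obj);
	vm->r_obj = r_obj;
}

/* Like drm_gpuvm_destroy(): drop the VM's root-object reference. */
static void gpuvm_model_destroy(struct gpuvm_model *vm)
{
	gem_put(vm->r_obj);
}
```

After init, the driver's own reference is redundant, which is why the nouveau code can put it right away.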
Re: [Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On Wed, 2023-10-11 at 06:23 +1000, Dave Airlie wrote: > > I think we're then optimizing for different scenarios. Our compute > > driver will use mostly external objects only, and if shared, I > > don't > > forsee them bound to many VMs. What saves us currently here is that > > in > > compute mode we only really traverse the extobj list after a > > preempt > > fence wait, or when a vm is using a new context for the first time. > > So > > vm's extobj list is pretty large. Each bo's vma list will typically > > be > > pretty small. > > Can I ask why we are optimising for this userspace, this seems > incredibly broken. First, judging from the discussion with Christian, this is not really uncommon. There *are* tricks of assorted cleverness we can play in KMD to reduce the extobj list size, but doing that in KMD wouldn't be much different from accepting a large extobj list size and doing what we can to reduce the overhead of iterating over it. Second, the discussion here really was about whether we should be using a lower-level lock to allow for async state updates, with a rather complex mechanism with weak reference counting and a requirement to drop the locks within the loop to avoid locking inversion. If that were a simplification with little or no overhead, all fine, but IMO it's not a simplification? > > We've has this sort of problem in the past with Intel letting the > tail > wag the horse, does anyone remember optimising relocations for a > userspace that didn't actually need to use relocations? > > We need to ask why this userspace is doing this, can we get some > pointers to it? compute driver should have no reason to use mostly > external objects, the OpenCL and level0 APIs should be good enough to > figure this out. TBH for the compute UMD case, I'd be prepared to drop the *performance* argument for fine-grained locking of the extobj list, since it's really only traversed on new contexts and preemption. But as Christian mentions, there might be other cases.
We should perhaps figure those out and document them? /Thomas > > Dave.
Re: [Nouveau] [PATCH drm-misc-next v6 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 10/9/23 01:32, Danilo Krummrich wrote: Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich +/** + * drm_gpuvm_resv_add_fence - add fence to private and all extobj + * dma-resv + * @gpuvm: the &drm_gpuvm to add a fence to + * @exec: the &drm_exec locking context + * @fence: fence to add + * @private_usage: private dma-resv usage + * @extobj_usage: extobj dma-resv usage + */ +void +drm_gpuvm_resv_add_fence(struct drm_gpuvm *gpuvm, +struct drm_exec *exec, +struct dma_fence *fence, +enum dma_resv_usage private_usage, +enum dma_resv_usage extobj_usage) +{ + struct drm_gem_object *obj; + unsigned long index; + + drm_exec_for_each_locked_object(exec, index, obj) { + dma_resv_assert_held(obj->resv); + dma_resv_add_fence(obj->resv, fence, + drm_gpuvm_is_extobj(gpuvm, obj) ? + private_usage : extobj_usage); It looks like private_usage and extobj_usage are mixed up above? 
+ } +} +EXPORT_SYMBOL_GPL(drm_gpuvm_resv_add_fence); + Thanks, Thomas
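The mix-up Thomas points at is just the ternary's arms being swapped: an external object should receive extobj_usage and a private one private_usage. A small userspace model of the corrected selection, using stand-in types (the real code uses enum dma_resv_usage and struct dma_resv):

```c
#include <assert.h>

/* Illustrative stand-ins, not the kernel types. */
enum resv_usage_model { USAGE_PRIVATE_KIND, USAGE_EXTOBJ_KIND };

struct resv_model { int dummy; };
struct obj_model { struct resv_model *resv; };
struct vm_model { struct resv_model *common_resv; };

/* An object is external iff it does not share the VM's common resv. */
static int is_extobj(const struct vm_model *vm, const struct obj_model *obj)
{
	return obj->resv != vm->common_resv;
}

/* Corrected selection: the posted patch had the two arms swapped. */
static enum resv_usage_model
pick_usage(const struct vm_model *vm, const struct obj_model *obj,
	   enum resv_usage_model private_usage,
	   enum resv_usage_model extobj_usage)
{
	return is_extobj(vm, obj) ? extobj_usage : private_usage;
}
```

With the arms in this order, fences on the VM's private objects carry the private usage and fences on shared objects carry the extobj usage, which is what the kerneldoc of drm_gpuvm_resv_add_fence() describes.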
Re: [Nouveau] [PATCH drm-misc-next v6 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 10/9/23 16:45, Danilo Krummrich wrote: On 10/9/23 15:36, Thomas Hellström wrote: On 10/9/23 01:32, Danilo Krummrich wrote: Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 646 include/drm/drm_gpuvm.h | 246 ++ 2 files changed, 892 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 28282283ddaf..6977bd30eca5 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -82,6 +82,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. 
Those + * list are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. For instance, all + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -429,6 +444,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally. + * + * However, drivers still need ensure to protect concurrent calls to functions + * iterating those lists, namely drm_gpuvm_prepare_objects() and + * drm_gpuvm_validate(). + * + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate + * that the corresponding &dma_resv locks are held in order to protect the + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and + * the corresponding lockdep checks are enabled. This is an optimization for + * drivers which are capable of taking the corresponding &dma_resv locks and + * hence do not require internal locking. 
*/ /** @@ -641,6 +670,195 @@ * } */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. + * + * Elements popped from the original list are kept in a local list, so removal + * and is_empty checks can still happen while we're iterating the list. + */ +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \ + ({ \ + struct
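The pop-to-local-list iteration that the quoted get_next_vm_bo_from_list() kerneldoc describes (truncated above) can be modeled in plain C: hold the lock only long enough to pop one element onto a caller-owned local list, so concurrent insertion and removal remain possible while the caller processes the element unlocked. This sketch uses a pthread mutex in place of the spinlock and omits the vm_bo refcounting (drm_gpuvm_bo_put()) that the real macro performs:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct node {
	struct node *next;
	int value;
};

struct locked_list {
	pthread_mutex_t lock;	/* stands in for the kernel spinlock */
	struct node *head;
};

/* Pop the first element under the lock and move it to the caller's
 * local list; returns NULL when the original list is exhausted. */
static struct node *pop_to_local(struct locked_list *list,
				 struct node **local)
{
	struct node *n;

	pthread_mutex_lock(&list->lock);
	n = list->head;
	if (n) {
		list->head = n->next;
		n->next = *local;	/* keep popped nodes locally */
		*local = n;
	}
	pthread_mutex_unlock(&list->lock);
	return n;
}

/* Splice the local list back once iteration is done. */
static void restore_local(struct locked_list *list, struct node **local)
{
	pthread_mutex_lock(&list->lock);
	while (*local) {
		struct node *n = *local;

		*local = n->next;
		n->next = list->head;
		list->head = n;
	}
	pthread_mutex_unlock(&list->lock);
}
```

Because popped elements live on the local list, removal and is_empty checks against the original list stay well defined mid-iteration, exactly the property the kerneldoc calls out.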
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 9/28/23 21:16, Danilo Krummrich wrote: Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 642 include/drm/drm_gpuvm.h | 240 ++ 2 files changed, 882 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 27100423154b..770bb3d68d1f 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -82,6 +82,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those + * list are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. 
For instance, all + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -429,6 +444,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally. + * + * However, drivers still need ensure to protect concurrent calls to functions + * iterating those lists, namely drm_gpuvm_prepare_objects() and + * drm_gpuvm_validate(). + * + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate + * that the corresponding &dma_resv locks are held in order to protect the + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and + * the corresponding lockdep checks are enabled. This is an optimization for + * drivers which are capable of taking the corresponding &dma_resv locks and + * hence do not require internal locking. 
*/ /** @@ -641,6 +670,195 @@ *} */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. + * + * Elements popped from the original list are kept in a local list, so removal + * and is_empty checks can still happen while we're iterating the list. + */ +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \ + ({ \ + struct drm_gpuvm_bo *__vm_bo = NULL; \ + \ + drm_gpuvm_bo_put(__prev_vm_bo); \
Re: [Nouveau] [PATCH drm-misc-next v5 3/6] drm/gpuvm: add an abstraction for a VM / BO combination
Hi, On 9/28/23 21:16, Danilo Krummrich wrote: This patch adds an abstraction layer between the drm_gpuva mappings of NIT: imperative: s/This patch adds/Add/ a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains list of mappings of this GEM object. This has multiple advantages: 1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches. 2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper. 3) Drivers can derive and extend the structure to easily represent driver specific states of a BO for a certain GPUVM. The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu. Cc: Christian König Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c| 334 + drivers/gpu/drm/nouveau/nouveau_uvmm.c | 64 +++-- include/drm/drm_gem.h | 32 +-- include/drm/drm_gpuvm.h| 177 - 4 files changed, 523 insertions(+), 84 deletions(-) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 6368dfdbe9dd..27100423154b 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -70,6 +70,18 @@ * &drm_gem_object, such as the &drm_gem_object containing the root page table, * but it can also be a 'dummy' object, which can be allocated with * drm_gpuvm_root_object_alloc(). + * + * In order to connect a struct drm_gpuva its backing &drm_gem_object each + * &drm_gem_object maintains a list of &drm_gpuvm_bo structures, and each + * &drm_gpuvm_bo contains a list of &&drm_gpuva structures. + * + * A &drm_gpuvm_bo is an abstraction that represents a combination of a + * &drm_gpuvm and a &drm_gem_object. 
Every such combination should be unique. + * This is ensured by the API through drm_gpuvm_bo_obtain() and + * drm_gpuvm_bo_obtain_prealloc() which first look into the corresponding + * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this + * particular combination. If not existent a new instance is created and linked + * to the &drm_gem_object. */ /** @@ -395,21 +407,28 @@ /** * DOC: Locking * - * Generally, the GPU VA manager does not take care of locking itself, it is - * the drivers responsibility to take care about locking. Drivers might want to - * protect the following operations: inserting, removing and iterating - * &drm_gpuva objects as well as generating all kinds of operations, such as - * split / merge or prefetch. - * - * The GPU VA manager also does not take care of the locking of the backing - * &drm_gem_object buffers GPU VA lists by itself; drivers are responsible to - * enforce mutual exclusion using either the GEMs dma_resv lock or alternatively - * a driver specific external lock. For the latter see also - * drm_gem_gpuva_set_lock(). - * - * However, the GPU VA manager contains lockdep checks to ensure callers of its - * API hold the corresponding lock whenever the &drm_gem_objects GPU VA list is - * accessed by functions such as drm_gpuva_link() or drm_gpuva_unlink(). + * In terms of managing &drm_gpuva entries DRM GPUVM does not take care of + * locking itself, it is the drivers responsibility to take care about locking. + * Drivers might want to protect the following operations: inserting, removing + * and iterating &drm_gpuva objects as well as generating all kinds of + * operations, such as split / merge or prefetch. + * + * DRM GPUVM also does not take care of the locking of the backing + * &drm_gem_object buffers GPU VA lists and &drm_gpuvm_bo abstractions by + * itself; drivers are responsible to enforce mutual exclusion using either the + * GEMs dma_resv lock or alternatively a driver specific external lock. 
For the + * latter see also drm_gem_gpuva_set_lock(). + * + * However, DRM GPUVM contains lockdep checks to ensure callers of its API hold + * the corresponding lock whenever the &drm_gem_objects GPU VA list is accessed + * by functions such as drm_gpuva_link() or drm_gpuva_unlink(), but also + * drm_gpuvm_bo_obtain() and drm_gpuvm_bo_put(). + * + * The latter is required since on creation and destruction of a &drm_gpuvm_bo + * the &drm_gpuvm_bo is attached / removed from the &drm_gem_objects gpuva list. + * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and + * &drm_gem_object must be able to observe previous creations and destructions + * of &drm_gpuvm_bos in order to keep instances unique. */ /** @@ -439,6 +458,7 @@ *{ *struct drm_gpuva_ops *ops; *struct drm_gpuva_op *op + *
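The uniqueness guarantee described above (at most one drm_gpuvm_bo per gpuvm/GEM-object combination, enforced by the obtain() helpers) boils down to a lookup-or-create over the GEM object's vm_bo list. A simplified userspace model, with no locking shown — the real code requires the GEM's dma-resv or a driver-specific lock to be held, as the quoted locking documentation explains:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative model of a (vm, obj) combination with a refcount. */
struct vm_bo {
	struct vm_bo *next;	/* link in the object's vm_bo list */
	const void *vm;
	const void *obj;
	int refcount;
};

struct gem_obj {
	struct vm_bo *vm_bos;	/* list of per-VM combinations */
};

/* Lookup-or-create, mirroring drm_gpuvm_bo_obtain(): return the existing
 * combination with an extra reference, or create and link a new one so
 * subsequent calls can observe it. */
static struct vm_bo *vm_bo_obtain(struct gem_obj *obj, const void *vm)
{
	struct vm_bo *vb;

	for (vb = obj->vm_bos; vb; vb = vb->next) {
		if (vb->vm == vm) {
			vb->refcount++;
			return vb;
		}
	}

	vb = calloc(1, sizeof(*vb));
	if (!vb)
		return NULL;
	vb->vm = vm;
	vb->obj = obj;
	vb->refcount = 1;
	vb->next = obj->vm_bos;
	obj->vm_bos = vb;
	return vb;
}
```

This is why the lock matters: creation and destruction must be mutually exclusive with the lookup, or two callers could race and create duplicate combinations.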
Re: [Nouveau] [PATCH drm-misc-next v5 0/6] [RFC] DRM GPUVM features
Hi, Danilo On 9/28/23 21:16, Danilo Krummrich wrote: Currently GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make GPUVM represent the basis of a VM implementation. In this context, this patch series aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. The implementation introduces struct drm_gpuvm_bo, which serves as abstraction combining a struct drm_gpuvm and struct drm_gem_object, similar to what amdgpu does with struct amdgpu_bo_vm. While this adds a bit of complexity it improves the efficiency of tracking external and evicted GEM objects. This patch series is also available at [3]. [1] https://gitlab.freedesktop.org/nouvelles/kernel/-/commits/gpuvm-next Changes in V2: == - rename 'drm_gpuva_manager' -> 'drm_gpuvm' which generally leads to more consistent naming - properly separate commits (introduce common dma-resv, drm_gpuvm_bo abstraction, etc.) 
- remove maple tree for tracking external objects, use a list drm_gpuvm_bos per drm_gpuvm instead - rework dma-resv locking helpers (Thomas) - add a locking helper for a given range of the VA space (Christian) - make the GPUVA manager buildable as module, rather than drm_exec builtin (Christian) Changes in V3: == - rename missing function and files (Boris) - warn if vm_obj->obj != obj in drm_gpuva_link() (Boris) - don't expose drm_gpuvm_bo_destroy() (Boris) - unlink VM_BO from GEM in drm_gpuvm_bo_destroy() rather than drm_gpuva_unlink() and link within drm_gpuvm_bo_obtain() to keep drm_gpuvm_bo instances unique - add internal locking to external and evicted object lists to support drivers updating the VA space from within the fence signalling critical path (Boris) - unlink external objects and evicted objects from the GPUVM's list in drm_gpuvm_bo_destroy() - add more documentation and fix some kernel doc issues Changes in V4: == - add a drm_gpuvm_resv() helper (Boris) - add a drm_gpuvmlocal_list field (Boris) - remove drm_gpuvm_bo_get_unless_zero() helper (Boris) - fix missing NULL assignment in get_next_vm_bo_from_list() (Boris) - keep a drm_gem_object reference on potential vm_bo destroy (alternatively we could free the vm_bo and drop the vm_bo's drm_gem_object reference through async work) - introduce DRM_GPUVM_RESV_PROTECTED flag to indicate external locking through the corresponding dma-resv locks to optimize for drivers already holding them when needed; add the corresponding lock_assert_held() calls (Thomas) - make drm_gpuvm_bo_evict() per vm_bo and add a drm_gpuvm_bo_gem_evict() helper (Thomas) - pass a drm_gpuvm_bo in drm_gpuvm_ops::vm_bo_validate() (Thomas) - documentation fixes Changes in V5: == - use a root drm_gem_object provided by the driver as a base for the VM's common dma-resv (Christian) - provide a helper to allocate a "dummy" root GEM object in case a driver specific root GEM object isn't available - add a dedicated patch for nouveau to make use 
of the GPUVM's shared dma-resv - improve documentation (Boris) - the following patches are removed from the series, since they already landed in drm-misc-next - f72c2db47080 ("drm/gpuvm: rename struct drm_gpuva_manager to struct drm_gpuvm") - fe7acaa727e1 ("drm/gpuvm: allow building as module") - 78f54469b871 ("drm/nouveau: uvmm: rename 'umgr' to 'base'") Danilo Krummrich (6): drm/gpuvm: add common dma-resv per struct drm_gpuvm drm/gpuvm: add drm_gpuvm_flags to drm_gpuvm drm/gpuvm: add an abstraction for a VM / BO combination drm/gpuvm: track/lock/validate external/evicted objects drm/nouveau: make use of the GPUVM's shared dma-resv drm/nouveau: use GPUVM common infrastructure drivers/gpu/drm/drm_gpuvm.c | 1036 +-- drivers/gpu/drm/nouveau/nouveau_bo.c| 15 +- drivers/gpu/drm/nouveau/nouveau_bo.h|5 + drivers/gpu/drm/nouveau/nouveau_exec.c | 52 +- drivers/gpu/drm/nouveau/nouveau_exec.h |4 - drivers/gpu/drm/nouveau/nouveau_gem.c | 10 +- drivers/gpu/drm/nouveau/nouveau_sched.h |4 +- drivers/gpu/drm/nouveau/nouveau_uvmm.c
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On Wed, 2023-10-04 at 19:17 +0200, Danilo Krummrich wrote: > On 10/4/23 17:29, Thomas Hellström wrote: > > > > On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote: > > > On 10/3/23 11:11, Thomas Hellström wrote: > > > > > > > > > > > > > > > + > > > > > > +/** > > > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to > > > > > > / > > > > > > from the &drm_gpuvms > > > > > > + * evicted list > > > > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove > > > > > > + * @evict: indicates whether the object is evicted > > > > > > + * > > > > > > + * Adds a &drm_gpuvm_bo to or removes it from the > > > > > > &drm_gpuvms > > > > > > evicted list. > > > > > > + */ > > > > > > +void > > > > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict) > > > > > > +{ > > > > > > + struct drm_gem_object *obj = vm_bo->obj; > > > > > > + > > > > > > + dma_resv_assert_held(obj->resv); > > > > > > + > > > > > > + /* Always lock list transactions, even if > > > > > > DRM_GPUVM_RESV_PROTECTED is > > > > > > + * set. This is required to protect multiple > > > > > > concurrent > > > > > > calls to > > > > > > + * drm_gpuvm_bo_evict() with BOs with different > > > > > > dma_resv. > > > > > > + */ > > > > > > > > > > This doesn't work. The RESV_PROTECTED case requires the > > > > > evicted > > > > > flag we discussed before. The list is either protected by the > > > > > spinlock or the resv. Otherwise a list add could race with a > > > > > list > > > > > removal elsewhere. > > > > > > I think it does unless I miss something, but it might be a bit > > > subtle > > > though. > > > > > > Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. > > > Additionally, when > > > drm_gpuvm_bo_evict() is called we hold the dma-resv of the > > > corresponding GEM object. 
> > > > > > In drm_gpuvm_validate() I assert that we hold *all* dma-resv, > > > which > > > implies that no > > > one can call drm_gpuvm_bo_evict() on any of the VM's objects and > > > no > > > one can add a new > > > one and directly call drm_gpuvm_bo_evict() on it either. > > > > But translated into how the data (the list in this case) is > > protected > > it becomes > > > > "Either the spinlock and the bo resv of a single list item OR the > > bo > > resvs of all bos that can potentially be on the list", > > > > while this is certainly possible to assert, any new / future code > > that > > manipulates the evict list will probably get this wrong and as a > > result > > the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() > > already > > gets it wrong in that it, while holding a single resv, doesn't take > > the > > spinlock. > > That's true and I don't like it either. Unfortunately, with the dma- > resv > locking scheme we can't really protect the evict list without the > drm_gpuvm_bo::evicted trick properly. > > But as pointed out in my other reply, I'm a bit worried about the > drm_gpuvm_bo::evicted trick being too restrictive, but maybe it's > fine > doing it in the RESV_PROTECTED case.

Ah, indeed. I misread that as discussing the current code rather than the drm_gpuvm_bo::evicted trick. If validating only a subset, or a range, then the drm_gpuvm_bo::evicted trick would be valid only for that subset. But the current code would break because the condition of locking "the resvs of all bos that can potentially be on the list" doesn't hold anymore, and you'd get list corruption.

What *would* work, though, is the solution currently in xe: the original evict list, and a staging evict list whose items are copied over on validation. The staging evict list would be protected by the spinlock, the original evict list by the resv, and they'd use separate list heads in the drm_gpuvm_bo, but that is yet another complication.
But I think if this becomes an issue, those VMs (perhaps OpenGL UMD VMs) that only want to validate a subset would simply initially rely on the current non-RESV solution. It looks like it's only a matter of flipping the flag on a per-vm basis.

/Thomas

> > So I think that needs fixing, and if keeping that protection I > > think it > > needs to be documented with the list member and ideally an assert. > > But > > also note that lockdep_assert_held will typically give false positives > > for > > dma_resv locks; as long as the first dma_resv lock locked in a > > drm_exec > > sequence remains locked, lockdep thinks *all* dma_resv locks are > > held. > > (or something along those lines), so the resv lockdep asserts are > > currently pretty useless. > > > > /Thomas > > > > > Thanks, > > > > > Thomas
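The drm_gpuvm_bo::evicted idea discussed above can be sketched as a small userspace model (all names here are hypothetical; the real code uses dma_resv locks, list_heads and TTM callbacks): the move callback, holding only the evicted BO's own resv, merely sets a per-BO flag, and the VM-wide evicted list is rebuilt from those flags only in validate(), when all resv locks are held, so the shared list is never touched under a single BO resv.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the drm_gpuvm_bo::evicted trick. */
struct vm_bo {
	bool evicted;		/* protected by this BO's own "resv" */
	struct vm_bo *next;	/* evicted-list link, "VM resv" protected */
};

struct vm {
	struct vm_bo *evicted_list;
	struct vm_bo *bos;	/* all vm_bos belonging to this VM */
	size_t num_bos;
};

/* Move-callback analogue: only this BO's resv is held here, so we
 * must not touch the shared evicted list, only the per-BO flag. */
static void bo_evict(struct vm_bo *bo)
{
	bo->evicted = true;
}

/* drm_gpuvm_validate() analogue: all resv locks are held, so
 * rebuilding the list from the per-BO flags is race free.
 * Returns the number of objects (re)validated. */
static size_t vm_validate(struct vm *vm)
{
	size_t validated = 0;

	vm->evicted_list = NULL;
	for (size_t i = 0; i < vm->num_bos; i++) {
		struct vm_bo *bo = &vm->bos[i];

		if (bo->evicted) {
			bo->next = vm->evicted_list;
			vm->evicted_list = bo;
		}
	}
	for (struct vm_bo *bo = vm->evicted_list; bo; bo = bo->next) {
		bo->evicted = false;	/* "validated" again */
		validated++;
	}
	vm->evicted_list = NULL;
	return validated;
}
```

As noted in the thread, this only holds when validate() sweeps *all* vm_bos; validating a subset would need the staging-list variant instead.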
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On Wed, 2023-10-04 at 14:57 +0200, Danilo Krummrich wrote: > On 10/3/23 11:11, Thomas Hellström wrote: > > > > > > > + > > > > +/** > > > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gpuvm_bo to / > > > > from the &drm_gpuvms > > > > + * evicted list > > > > + * @vm_bo: the &drm_gpuvm_bo to add or remove > > > > + * @evict: indicates whether the object is evicted > > > > + * > > > > + * Adds a &drm_gpuvm_bo to or removes it from the &drm_gpuvms > > > > evicted list. > > > > + */ > > > > +void > > > > +drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict) > > > > +{ > > > > + struct drm_gem_object *obj = vm_bo->obj; > > > > + > > > > + dma_resv_assert_held(obj->resv); > > > > + > > > > + /* Always lock list transactions, even if > > > > DRM_GPUVM_RESV_PROTECTED is > > > > + * set. This is required to protect multiple concurrent > > > > calls to > > > > + * drm_gpuvm_bo_evict() with BOs with different dma_resv. > > > > + */ > > > > > > This doesn't work. The RESV_PROTECTED case requires the evicted > > > flag we discussed before. The list is either protected by the > > > spinlock or the resv. Otherwise a list add could race with a list > > > removal elsewhere. > > I think it does unless I miss something, but it might be a bit subtle > though. > > Concurrent drm_gpuvm_bo_evict() are protected by the spinlock. > Additionally, when > drm_gpuvm_bo_evict() is called we hold the dma-resv of the > corresponding GEM object. > > In drm_gpuvm_validate() I assert that we hold *all* dma-resv, which > implies that no > one can call drm_gpuvm_bo_evict() on any of the VM's objects and no > one can add a new > one and directly call drm_gpuvm_bo_evict() on it either. 
But translated into how the data (the list in this case) is protected it becomes "Either the spinlock and the bo resv of a single list item OR the bo resvs of all bos that can potentially be on the list". While this is certainly possible to assert, any new / future code that manipulates the evict list will probably get this wrong, and as a result the code becomes pretty fragile. I think drm_gpuvm_bo_destroy() already gets it wrong in that it, while holding a single resv, doesn't take the spinlock.

So I think that needs fixing, and if keeping that protection I think it needs to be documented with the list member and ideally an assert. But also note that lockdep_assert_held will typically give false positives for dma_resv locks; as long as the first dma_resv lock locked in a drm_exec sequence remains locked, lockdep thinks *all* dma_resv locks are held (or something along those lines), so the resv lockdep asserts are currently pretty useless.

/Thomas

> > > Thanks, > > > Thomas
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
On 10/3/23 18:55, Danilo Krummrich wrote:

It seems like we're mostly aligned on this series, except for the key controversy we're discussing for a few versions now: locking of the internal lists. Hence, let's just re-iterate the options we have to get this out of the way.

(1) The spinlock dance. This basically works for every use case, updating the VA space from the IOCTL, from the fence signaling path or anywhere else. However, it has the downside of requiring spin_lock() / spin_unlock() for *each* list element when locking all external objects and validating all evicted objects. Typically, the amount of extobjs and evicted objects shouldn't be excessive, but there might be exceptions, e.g. Xe.

(2) The dma-resv lock dance. This is convenient for drivers updating the VA space from a VM_BIND ioctl() and is especially efficient if such drivers have a huge amount of external and/or evicted objects to manage. However, the downsides are that it requires a few tricks in drivers updating the VA space from the fence signaling path (e.g. job_run()). Design-wise, I'm still skeptical that it is a good idea to protect internal data structures with external locks in a way that it's not clear to callers that a certain function would access one of those resources and hence needs protection. E.g. it is counter-intuitive that drm_gpuvm_bo_put() would require both the dma-resv lock of the corresponding object and the VM's dma-resv lock held. (Additionally, there were some concerns from amdgpu regarding flexibility in terms of using GPUVM for non-VM_BIND uAPIs and compute, however, AFAICS those discussions did not complete and to me it's still unclear why it wouldn't work.)

(3) Simply use an internal mutex per list. This adds a tiny (IMHO negligible) overhead for drivers updating the VA space from a VM_BIND ioctl(), namely a *single* mutex_lock()/mutex_unlock() when locking all external objects and validating all evicted objects.
And it still requires some tricks for drivers updating the VA space from the fence signaling path. However, it's as simple as it can be and hence way less error prone, as well as self-contained and hence easy to use. Additionally, it's flexible in a way that we don't have any expectations on drivers to already hold certain locks that the driver in some situation might not be able to acquire in the first place.

(4) Arbitrary combinations of the above. For instance, the current V5 implements both (1) and (2) (as either one or the other). But also (1) and (3) (as in (1) additionally to (3)) would be an option, where a driver could opt in for the spinlock dance in case it updates the VA space from the fence signaling path.

I also considered a few other options as well, however, they don't seem to be flexible enough. For instance, by now we could use SRCU for the external object list. However, this falls apart once a driver wants to remove and re-add extobjs for the same VM_BO instance. (For the same reason it wouldn't work for evicted objects.)

Personally, after seeing the weird implications of (1), (2) and a combination of both, I tend to go with (3). Optionally, with an opt-in for (1). The reason for the latter is that with (3) the weirdness of (1) on its own mostly disappears.

Please let me know what you think, and, of course, other ideas than the mentioned ones above are still welcome.

- Danilo

Here are the locking principles Daniel put together and Dave once called out for us to be applying when reviewing DRM code. These were prompted by very fragile and hard-to-understand locking patterns in the i915 driver, and I think the xe vm_bind locking design was made with these in mind (not sure exactly who wrote what, though, so can't say for sure).
https://blog.ffwll.ch/2022/07/locking-engineering.html
https://blog.ffwll.ch/2022/08/locking-hierarchy.html

At least to me, this motivates using the resv design unless we strictly need lower-level locks that are taken in the eviction paths or userptr invalidation paths, but it doesn't rule out spinlocks or lock-dropping tricks where these are really necessary. It pretty much rules out RCU / SRCU from what I can tell, though. It also calls for documenting how individual members of structs are protected whenever possible.

Thanks,
Thomas
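Option (3) above can be sketched in a few lines of userspace C (hypothetical names; pthreads stand in for a kernel mutex): the entire list walk costs a single lock/unlock of a dedicated internal mutex, in contrast to the per-element lock/unlock of the spinlock dance in option (1).

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model of "an internal mutex per list". */
struct extobj {
	struct extobj *next;
};

struct gpuvm_lists {
	pthread_mutex_t extobj_lock;	/* protects extobj_list */
	struct extobj *extobj_list;
};

/* Walk the external-object list under its dedicated mutex: one
 * lock/unlock pair for the whole walk, not one per element. */
static int for_each_extobj(struct gpuvm_lists *vm,
			   int (*fn)(struct extobj *, void *), void *arg)
{
	int ret = 0;

	pthread_mutex_lock(&vm->extobj_lock);
	for (struct extobj *obj = vm->extobj_list; obj && !ret; obj = obj->next)
		ret = fn(obj, arg);
	pthread_mutex_unlock(&vm->extobj_lock);
	return ret;
}

/* Example callback: count the entries. */
static int count_cb(struct extobj *obj, void *arg)
{
	(void)obj;
	++*(int *)arg;
	return 0;
}
```

The catch Thomas raises in the following mail is what the callback is allowed to do: if fn() has to take dma_resv locks (as validate() does), the walk now nests resv locks inside this mutex.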
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
Hi, Danilo On Tue, 2023-10-03 at 18:55 +0200, Danilo Krummrich wrote: > It seems like we're mostly aligned on this series, except for the key > controversy we're discussing for a few versions now: locking of the > internal > lists. Hence, let's just re-iterate the options we have to get this > out of the > way. > > (1) The spinlock dance. This basically works for every use case, > updating the VA > space from the IOCTL, from the fence signaling path or anywhere > else. > However, it has the downside of requiring spin_lock() / > spin_unlock() for > *each* list element when locking all external objects and > validating all > evicted objects. Typically, the amount of extobjs and evicted > objects > shouldn't be excessive, but there might be exceptions, e.g. Xe. > > (2) The dma-resv lock dance. This is convinient for drivers updating > the VA > space from a VM_BIND ioctl() and is especially efficient if such > drivers > have a huge amount of external and/or evicted objects to manage. > However, > the downsides are that it requires a few tricks in drivers > updating the VA > space from the fence signaling path (e.g. job_run()). Design > wise, I'm still > skeptical that it is a good idea to protect internal data > structures with > external locks in a way that it's not clear to callers that a > certain > function would access one of those resources and hence needs > protection. > E.g. it is counter intuitive that drm_gpuvm_bo_put() would > require both the > dma-resv lock of the corresponding object and the VM's dma-resv > lock held. > (Additionally, there were some concerns from amdgpu regarding > flexibility in > terms of using GPUVM for non-VM_BIND uAPIs and compute, however, > AFAICS > those discussions did not complete and to me it's still unclear > why it > wouldn't work.) > > (3) Simply use an internal mutex per list. 
This adds a tiny (IMHO > negligible) > overhead for drivers updating the VA space from a VM_BIND > ioctl(), namely > a *single* mutex_lock()/mutex_unlock() when locking all external > objects > and validating all evicted objects.

Such an overhead is fully OK IMO. But didn't we conclude at some point that using a mutex in this way isn't possible due to the fact that validate() needs to be able to lock dma_resv, and then we have dma_resv()->mutex->dma_resv()?

> > (4) Arbitrary combinations of the above. For instance, the current V5 > implements > both (1) and (2) (as either one or the other). But also (1) and > (3) (as in > (1) additionally to (3)) would be an option, where a driver could > opt-in for > the spinlock dance in case it updates the VA space from the fence > signaling > path. > > I also considered a few other options as well, however, they don't > seem to be > flexible enough. For instance, as by now we could use SRCU for the > external > object list. However, this falls apart once a driver wants to remove > and re-add > extobjs for the same VM_BO instance. (For the same reason it wouldn't > work for > evicted objects.) > > Personally, after seeing the weird implications of (1), (2) and a > combination of > both, I tend to go with (3). Optionally, with an opt-in for (1). The > reason for > the latter is that with (3) the weirdness of (1) by its own mostly > disappears. > > Please let me know what you think, and, of course, other ideas than > the > mentioned ones above are still welcome.
Personally, after converting xe to version 5, I think it's pretty convenient for the driver (although I had to add the evict trick), so I think I'd vote for this, even if not currently using the opt-in for (1).

/Thomas

> > - Danilo > > On Tue, Oct 03, 2023 at 04:21:43PM +0200, Boris Brezillon wrote: > > On Tue, 03 Oct 2023 14:25:56 +0200 > > Thomas Hellström wrote: > > > > > > > > +/** > > > > > > + * get_next_vm_bo_from_list() - get the next vm_bo element > > > > > > + * @__gpuvm: The GPU VM > > > > > > + * @__list_name: The name of the lis
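Thomas's dma_resv()->mutex->dma_resv() concern from the previous mail can be made concrete with a toy lockdep-style order checker (userspace, hypothetical names): once a VM_BIND path holds a resv and takes the internal list mutex, and validate() under that mutex then has to take another BO's resv, the two lock classes have been seen in both orders, which is exactly what lockdep would flag.

```c
#include <stdbool.h>

/* Toy lock-order checker: two lock classes, modeled after a real
 * lockdep's ordering matrix (grossly simplified). */
enum lock_class { RESV, MUTEX, NUM_CLASSES };

/* seen_before[a][b]: class b was acquired while class a was held */
static bool seen_before[NUM_CLASSES][NUM_CLASSES];

struct held_set {
	bool h[NUM_CLASSES];
};

/* Record acquiring "cls" with everything in "held" already held.
 * Returns false when both orders between two classes have now been
 * observed, i.e. a potential deadlock (inversion). */
static bool acquire(struct held_set *held, enum lock_class cls)
{
	bool ok = true;

	for (int i = 0; i < NUM_CLASSES; i++) {
		if (held->h[i]) {
			seen_before[i][cls] = true;
			if (seen_before[cls][i])
				ok = false;
		}
	}
	held->h[cls] = true;
	return ok;
}
```

In the modeled scenario, the third acquisition (a second resv under the mutex) is the one the checker rejects; real dma_resv locks additionally need ww_mutex handling for the resv-within-resv part, which this toy ignores.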
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
Hi, Boris, On Tue, 2023-10-03 at 12:05 +0200, Boris Brezillon wrote: > Hello Thomas, > > On Tue, 3 Oct 2023 10:36:10 +0200 > Thomas Hellström wrote: > > > > +/** > > > + * get_next_vm_bo_from_list() - get the next vm_bo element > > > + * @__gpuvm: The GPU VM > > > + * @__list_name: The name of the list we're iterating on > > > + * @__local_list: A pointer to the local list used to store > > > already iterated items > > > + * @__prev_vm_bo: The previous element we got from > > > drm_gpuvm_get_next_cached_vm_bo() > > > + * > > > + * This helper is here to provide lockless list iteration. > > > Lockless as in, the > > > + * iterator releases the lock immediately after picking the > > > first element from > > > + * the list, so list insertion deletion can happen concurrently. > > > + * > > > + * Elements popped from the original list are kept in a local > > > list, so removal > > > + * and is_empty checks can still happen while we're iterating > > > the list. > > > + */ > > > +#define get_next_vm_bo_from_list(__gpuvm, __list_name, > > > __local_list, __prev_vm_bo) \ > > > + ({ > > > \ > > > + struct drm_gpuvm_bo *__vm_bo = > > > NULL;\ > > > + > > > \ > > > + drm_gpuvm_bo_put(__prev_vm_bo); > > > \ > > > + > > > \ > > > + spin_lock(&(__gpuvm)- > > > >__list_name.lock);\ > > > > Here we unconditionally take the spinlocks while iterating, and the > > main > > point of DRM_GPUVM_RESV_PROTECTED was really to avoid that? 
> > > > > > > + if (!(__gpuvm)- > > > >__list_name.local_list) \ > > > + (__gpuvm)->__list_name.local_list = > > > __local_list; \ > > > + else > > > \ > > > + WARN_ON((__gpuvm)->__list_name.local_list > > > != __local_list); \ > > > + > > > \ > > > + while (!list_empty(&(__gpuvm)->__list_name.list)) > > > { \ > > > + __vm_bo = list_first_entry(&(__gpuvm)- > > > >__list_name.list,\ > > > + struct > > > drm_gpuvm_bo, \ > > > + > > > list.entry.__list_name); \ > > > + if (kref_get_unless_zero(&__vm_bo->kref)) > > > { > > And unnecessarily grab a reference in the RESV_PROTECTED case. > > > \ > > > + list_move_tail(&(__vm_bo)- > > > >list.entry.__list_name, \ > > > + > > > __local_list); \ > > > + break; > > > \ > > > + } else > > > {\ > > > + list_del_init(&(__vm_bo)- > > > >list.entry.__list_name); \ > > > + __vm_bo = > > > NULL; \ > > > + } > > > \ > > > + } > > > \ > > > + spin_unlock(&(__gpuvm)- > > > >__list_name.lock); \ > > > + > > > \ > > > + __vm_bo; > > > \ > > > + }) > > > > IMHO this lockless list itera
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
Hi Again, On 10/3/23 10:36, Thomas Hellström wrote: Hi, Danilo, On 9/28/23 21:16, Danilo Krummrich wrote: Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 642 include/drm/drm_gpuvm.h | 240 ++ 2 files changed, 882 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 27100423154b..770bb3d68d1f 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -82,6 +82,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. 
Those + * list are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. For instance, all + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -429,6 +444,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally. + * + * However, drivers still need ensure to protect concurrent calls to functions + * iterating those lists, namely drm_gpuvm_prepare_objects() and + * drm_gpuvm_validate(). + * + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate + * that the corresponding &dma_resv locks are held in order to protect the + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and + * the corresponding lockdep checks are enabled. This is an optimization for + * drivers which are capable of taking the corresponding &dma_resv locks and + * hence do not require internal locking. 
*/ /** @@ -641,6 +670,195 @@ * } */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. + * + * Elements popped from the original list are kept in a local list, so removal + * and is_empty checks can still happen while we're iterating the list. + */ +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \ + ({ \ + struct drm_gpuvm_bo *__vm_bo = NULL; \ +
Re: [Nouveau] [PATCH drm-misc-next v5 4/6] drm/gpuvm: track/lock/validate external/evicted objects
Hi, Danilo, On 9/28/23 21:16, Danilo Krummrich wrote: Currently the DRM GPUVM offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVM represent a basis for GPU-VM implementations. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convinience functions for common patterns. Big thanks to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 642 include/drm/drm_gpuvm.h | 240 ++ 2 files changed, 882 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index 27100423154b..770bb3d68d1f 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -82,6 +82,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those + * list are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. 
For instance, all + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -429,6 +444,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally. + * + * However, drivers still need ensure to protect concurrent calls to functions + * iterating those lists, namely drm_gpuvm_prepare_objects() and + * drm_gpuvm_validate(). + * + * Alternatively, drivers can set the &DRM_GPUVM_RESV_PROTECTED flag to indicate + * that the corresponding &dma_resv locks are held in order to protect the + * lists. If &DRM_GPUVM_RESV_PROTECTED is set, internal locking is disabled and + * the corresponding lockdep checks are enabled. This is an optimization for + * drivers which are capable of taking the corresponding &dma_resv locks and + * hence do not require internal locking. 
*/ /** @@ -641,6 +670,195 @@ *} */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. + * + * Elements popped from the original list are kept in a local list, so removal + * and is_empty checks can still happen while we're iterating the list. + */ +#define get_next_vm_bo_from_list(__gpuvm, __list_name, __local_list, __prev_vm_bo) \ + ({ \ + struct drm_gpuvm_bo *__vm_bo = NULL; \ + \ + drm_gpuvm_bo_put(__prev_vm_bo);
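The get_next_vm_bo_from_list() macro quoted in the patch above can be modeled in plain C (hypothetical userspace names: pthreads instead of the kernel spinlock, a plain integer instead of a kref): each step takes the lock, pops the first entry whose refcount can still be raised, parks it on a local list, and drops the lock again, so concurrent insertion and removal on the original list remain safe between steps.

```c
#include <pthread.h>
#include <stddef.h>

/* Model of one iteration step of the pop-to-local-list scheme. */
struct node {
	int refs;		/* 0 means "being destroyed": skip it */
	struct node *next;
};

struct locked_list {
	pthread_mutex_t lock;
	struct node *head;
};

/* Pop the next live entry onto *local under the lock and return it,
 * or NULL when the original list is exhausted. Dying entries
 * (refs == 0) are unlinked and skipped, mirroring the
 * kref_get_unless_zero() / list_del_init() branch of the macro. */
static struct node *pop_next(struct locked_list *list, struct node **local)
{
	struct node *n = NULL;

	pthread_mutex_lock(&list->lock);
	while (list->head) {
		n = list->head;
		list->head = n->next;	/* unlink from the original list */
		if (n->refs > 0) {	/* kref_get_unless_zero() analogue */
			n->refs++;
			n->next = *local;	/* park on the local list */
			*local = n;
			break;
		}
		n = NULL;		/* dying entry: drop and continue */
	}
	pthread_mutex_unlock(&list->lock);
	return n;
}
```

Boris's objection in the thread is that these per-step lock/unlock cycles and reference grabs are exactly the cost the RESV_PROTECTED mode was meant to avoid.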
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi On 9/20/23 15:48, Christian König wrote: Am 20.09.23 um 15:38 schrieb Thomas Hellström: On 9/20/23 15:06, Christian König wrote: Am 20.09.23 um 14:06 schrieb Thomas Hellström: On 9/20/23 12:51, Christian König wrote: Am 20.09.23 um 09:44 schrieb Thomas Hellström: Hi, On 9/20/23 07:37, Christian König wrote: Am 19.09.23 um 17:23 schrieb Thomas Hellström: On 9/19/23 17:16, Danilo Krummrich wrote: On 9/19/23 14:21, Thomas Hellström wrote: Hi Christian On 9/19/23 14:07, Christian König wrote: Am 13.09.23 um 17:46 schrieb Danilo Krummrich: On 9/13/23 17:33, Christian König wrote: Am 13.09.23 um 17:15 schrieb Danilo Krummrich: On 9/13/23 16:26, Christian König wrote: Am 13.09.23 um 14:16 schrieb Danilo Krummrich: As mentioned in a different mail thread, the reply is based on the assumption that we don't support anything else than GPUVM updates from the IOCTL. I think that this assumption is incorrect. Well, more precisely I should have said "don't support GPUVM updated from within fence signaling critical sections". And looking at the code, that doesn't seem what you're doing there. Vulkan is just once specific use case, but this here should probably be able to handle other use cases as well. Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock. What do you mean with "invalidate GPUVM mappings" in this context? drm_gpuvm_bo_evict() should only be called from a ttm_device_funcs::move callback, we should hold the dma-resv lock there. Well the question is which dma-resv lock do we hold? In the move callback we only hold the dma-resv lock of the BO which is moved, but when that is a shared BO then that's not the same as the one for the VM. Correct, Thomas' idea was to use the GEM's dma_resv lock to protect drm_gpuvm_bo::evicted and then actually move the drm_gpuvm_bo to the VM's evicted list once we grabbed all dma-resv locks when locking the VM's BOs using drm_exec. 
We can remove them from the evicted list on validate(). This way we never touch the evicted list without holding at least the VM's dma-resv lock. Do you have any concerns about that? Scratching my head a bit how that is supposed to work. This implies that you go over all the evicted BOs during validation and not just the one mentioned in the CS. That might work for Vulkan, but is pretty much a no-go for OpenGL. See what the eviction lock in amdgpu is doing for example. The eviction_lock seems to protect a VM state "evicting" of whether any BO that is associated with the VM is currently evicting. At the same time amdgpu protects the eviceted list of the VM with a different lock. So this seems to be entirely unrelated. Tracking a "currently evicting" state is not part of the GPUVM implementation currently and hence nothing would change for amdgpu there. Sorry for the confusion we use different terminology in amdgpu. The eviction lock and evicted state is for the VM page tables, e.g. if the whole VM is currently not used and swapped out or even de-allocated. This is necessary because we have cases where we need to access the VM data without holding the dma-resv lock of this VM. Especially figuring out which parts of an address space contain mappings and which doesn't. I think this is fine, this has nothing to do with lists of evicted GEM objects or external GEM objects, right? Marking mappings (drm_gpuva) as invalidated (DRM_GPUVA_INVALIDATED) or accessing the VA space does not require any dma-resv locks. I hope so, but I'm not 100% sure. This is a requirement which comes with HMM handling, you won't see this with Vulkan (or OpenGL, VAAPI etc..). The invalidation lock on the other hand is what in this discussion is called eviction lock. This one is needed because what I wrote above, during the move callback only the dma-resv of the BO which is moved is locked, but not necessarily the dma-resv of the VM. That's yet another thing, right? 
This is used to track whether *any* BO that belongs to the VM is currently being evicted, correct? As mentioned, as by now this is not supported in GPUVM and hence would be the same driver specific code with the same driver specifc lock. That is most likely a show stopper using this for OpenGL based workloads as far as I can see. For those you need to able to figure out which non-VM BOs have been evicted and which parts of the VM needs updates. We identify those with a bool in the gpuvm_bo, and that bool is protected by the bo_resv. In essence, the "evicted" list must be made up-to-date with all relevant locks held before traversing in the next exec. What I still miss with this idea is how do we find all the drm_gpuvm_bo structures with the evicted bool set to true? When doing the drm_exec
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On 9/20/23 15:06, Christian König wrote:
On 9/20/23 14:06, Thomas Hellström wrote:
On 9/20/23 12:51, Christian König wrote:
On 9/20/23 09:44, Thomas Hellström wrote:
Hi,
On 9/20/23 07:37, Christian König wrote:
On 9/19/23 17:23, Thomas Hellström wrote:
On 9/19/23 17:16, Danilo Krummrich wrote:
On 9/19/23 14:21, Thomas Hellström wrote:
Hi Christian,
On 9/19/23 14:07, Christian König wrote:
On 9/13/23 17:46, Danilo Krummrich wrote:
On 9/13/23 17:33, Christian König wrote:
On 9/13/23 17:15, Danilo Krummrich wrote:
On 9/13/23 16:26, Christian König wrote:
On 9/13/23 14:16, Danilo Krummrich wrote:

As mentioned in a different mail thread, the reply is based on the assumption that we don't support anything else than GPUVM updates from the IOCTL. I think that this assumption is incorrect.

Well, more precisely I should have said "don't support GPUVM updates from within fence signaling critical sections". And looking at the code, that doesn't seem to be what you're doing there.

Vulkan is just one specific use case, but this here should probably be able to handle other use cases as well. Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock.

What do you mean by "invalidate GPUVM mappings" in this context? drm_gpuvm_bo_evict() should only be called from a ttm_device_funcs::move callback; we should hold the dma-resv lock there.

Well, the question is which dma-resv lock do we hold? In the move callback we only hold the dma-resv lock of the BO which is moved, but when that is a shared BO then that's not the same as the one for the VM.

Correct. Thomas' idea was to use the GEM's dma_resv lock to protect drm_gpuvm_bo::evicted and then actually move the drm_gpuvm_bo to the VM's evicted list once we grabbed all dma-resv locks when locking the VM's BOs using drm_exec. We can remove them from the evicted list on validate(). This way we never touch the evicted list without holding at least the VM's dma-resv lock. Do you have any concerns about that?

Scratching my head a bit how that is supposed to work. This implies that you go over all the evicted BOs during validation and not just the one mentioned in the CS. That might work for Vulkan, but is pretty much a no-go for OpenGL. See what the eviction lock in amdgpu is doing for example.

The eviction_lock seems to protect a VM state "evicting", i.e. whether any BO that is associated with the VM is currently evicting. At the same time amdgpu protects the evicted list of the VM with a different lock, so this seems to be entirely unrelated. Tracking a "currently evicting" state is not part of the GPUVM implementation currently, and hence nothing would change for amdgpu there.

Sorry for the confusion; we use different terminology in amdgpu. The eviction lock and evicted state is for the VM page tables, e.g. if the whole VM is currently not used and swapped out or even de-allocated. This is necessary because we have cases where we need to access the VM data without holding the dma-resv lock of this VM. Especially figuring out which parts of an address space contain mappings and which don't.

I think this is fine; this has nothing to do with lists of evicted GEM objects or external GEM objects, right? Marking mappings (drm_gpuva) as invalidated (DRM_GPUVA_INVALIDATED) or accessing the VA space does not require any dma-resv locks.

I hope so, but I'm not 100% sure. This is a requirement which comes with HMM handling; you won't see this with Vulkan (or OpenGL, VAAPI etc.). The invalidation lock on the other hand is what in this discussion is called the eviction lock. This one is needed because of what I wrote above: during the move callback only the dma-resv of the BO which is moved is locked, but not necessarily the dma-resv of the VM.

That's yet another thing, right? This is used to track whether *any* BO that belongs to the VM is currently being evicted, correct? As mentioned, as of now this is not supported in GPUVM and hence it would be the same driver-specific code with the same driver-specific lock.

That is most likely a show stopper for using this for OpenGL based workloads as far as I can see. For those you need to be able to figure out which non-VM BOs have been evicted and which parts of the VM need updates.

We identify those with a bool in the gpuvm_bo, and that bool is protected by the bo_resv. In essence, the "evicted" list must be made up-to-date with all relevant locks held before traversing in the next exec.

What I still miss with this idea is how do we find all the drm_gpuvm_bo structures with the evicted bool set to true? When doing the drm_exec dance we come across all external ones and can add them to the list if needed, but what about the BOs having the VM's dma-resv?

Oh, they can be added to the evict list directly (no bool needed) in the eviction code, like in v3, since for those we indeed hold the VM's dma_resv, as it's aliased with the object's dma-resv.

/Thomas
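The scheme discussed in this exchange (mark the vm_bo as evicted under the BO's dma-resv, promote it to the VM's evicted list once all locks are held in the exec path, drop it again on validate) can be modeled in a few lines. This is a toy illustration only: the struct and function names below are made up and stand in for the real drm_gpuvm machinery, and the dma-resv locks themselves are elided (comments note which lock would be held).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the discussed scheme; all names are hypothetical. */
struct toy_vm_bo {
	bool evicted;             /* written under the BO's dma-resv */
	bool on_vm_evicted_list;  /* written under the VM's dma-resv */
};

/* move callback: only the moved BO's resv is held, so just set the flag. */
static void toy_bo_evict(struct toy_vm_bo *vm_bo)
{
	vm_bo->evicted = true;
}

/* exec path: after drm_exec has locked every resv, promote flagged
 * vm_bos to the VM's evicted list. */
static void toy_vm_lock_done(struct toy_vm_bo **vm_bos, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (vm_bos[i]->evicted)
			vm_bos[i]->on_vm_evicted_list = true;
}

/* validate(): the BO has new backing store, drop it from the list. */
static void toy_validate(struct toy_vm_bo *vm_bo)
{
	vm_bo->evicted = false;
	vm_bo->on_vm_evicted_list = false;
}
```

BOs that share the VM's dma-resv would skip the flag entirely and be added to the list directly in the eviction code, as noted at the end of the exchange.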
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi Christian,

On 9/19/23 14:07, Christian König wrote:
> [...]
> That is most likely a show stopper for using this for OpenGL based workloads as far as I can see. For those you need to be able to figure out which non-VM BOs have been evicted and which parts of the VM need updates.

We identify those with a bool in the gpuvm_bo, and that bool is protected by the bo_resv. In essence, the "evicted" list must be made up-to-date with all relevant locks held before traversing in the next exec.

If you mean that we need to unbind all vmas of all vms of evicted bos before evicting: we don't do that, at least not in Xe, since when evicting we wait for VM idle, and it can't access anything through the stale vmas until they have been revalidated and rebound.

/Thomas
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On Thu, 2023-09-14 at 18:36 +0200, Danilo Krummrich wrote:
> On 9/14/23 15:48, Thomas Hellström wrote:
> > Hi, Danilo
> >
> > Some additional minor comments as xe conversion progresses.
> >
> > On 9/9/23 17:31, Danilo Krummrich wrote:
> > > So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space.
> > >
> > > However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements.
> > >
> > > 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM.
> > >
> > > 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs).
> > >
> > > 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of.
> > >
> > > 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated.
> > >
> > > 5) Provide some convenience functions for common patterns.
> > >
> > > Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA manager's basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions.
> > >
> > > Big kudos to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path.
> > >
> > > Suggested-by: Matthew Brost
> > > Signed-off-by: Danilo Krummrich
> > > ---
> > >
> > > +/**
> > > + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
> > > + * &drm_gpuvms evicted list
> > > + * @obj: the &drm_gem_object to add or remove
> > > + * @evict: indicates whether the object is evicted
> > > + *
> > > + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms evicted
> > > + * list containing a mapping of this &drm_gem_object.
> > > + */
> > > +void
> > > +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
> > > +{
> > > +	struct drm_gpuvm_bo *vm_bo;
> > > +
> > > +	drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
> > > +		if (evict)
> > > +			drm_gpuvm_bo_list_add(vm_bo, evict);
> > > +		else
> > > +			drm_gpuvm_bo_list_del(vm_bo, evict);
> > > +	}
> > > +}
> > > +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
> > > +
> >
> > We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...) that puts a single gpuvm_bo on the list; the above function could perhaps be renamed as drm_gpuvm_gem_obj_evict(obj, ).
>
> Makes sense - gonna change that.
>
> > Reason is some VMs are faulting VMs which don't have an evict list, but validate from the pagefault handler. Also evict == false is dangerous because if called from within an exec, it might remove the obj from other VMs' evict list before they've had a chance to rebind their VMAs.
> >
> > > static int
> > > __drm_gpuva_insert(struct drm_gpuvm *gpuvm, struct drm_gpuva *va)
> > > diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
> > > index afa50b9059a2..834bb6d6617e 100644
> > > --- a/include/drm/drm_gpuvm.h
> > > +++ b/include/drm/drm_gpuvm.h
> > > @@ -26,10 +26,12 @@
> > >  */
> > > #include
> > > +#include
> > > #include
> > > #include
> > > #include
> > > +#include
> > > struct drm_gpuvm;
> > > struct drm_gpuvm_bo;
> > > @@ -259,6 +261,38 @@ struct
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On Thu, 2023-09-14 at 17:27 +0200, Danilo Krummrich wrote: > On 9/14/23 13:32, Thomas Hellström wrote: > > > > On 9/14/23 12:57, Danilo Krummrich wrote: > > > On 9/13/23 14:16, Danilo Krummrich wrote: > > > > > > > > > > > > > > > And validate() can remove it while still holding all dma- > > > > > > resv locks, > > > > > > neat! > > > > > > However, what if two tasks are trying to lock the VA space > > > > > > concurrently? What > > > > > > do we do when the drm_gpuvm_bo's refcount drops to zero in > > > > > > drm_gpuva_unlink()? > > > > > > Are we guaranteed that at this point of time the > > > > > > drm_gpuvm_bo is not > > > > > > on the > > > > > > evicted list? Because otherwise we would call > > > > > > drm_gpuvm_bo_destroy() > > > > > > with the > > > > > > dma-resv lock held, which wouldn't be allowed, since > > > > > > drm_gpuvm_bo_destroy() > > > > > > might drop the last reference to the drm_gem_object and > > > > > > hence we'd > > > > > > potentially > > > > > > free the dma-resv lock while holding it, at least if it's > > > > > > an external > > > > > > object. > > > > > > > > > > Easiest way in this scheme is to think of the lists as being > > > > > protected > > > > > by the vm's resv lock. That means anybody calling unlink() > > > > > must also > > > > > hold the vm's resv lock. (Which is OK from a UAF point of > > > > > view, but > > > > > perhaps not from a locking inversion POV from an async list > > > > > update). > > > > This would mean that on unlink() we'd need to hold the VM's > > > > resv lock and the > > > > corresponding GEM's resv lock (in case they're not the same > > > > anyways) because the > > > > VM's resv lock would protect the external / evicted object > > > > lists and the GEM > > > > object's resv lock protects the GEM's list of drm_gpuvm_bos and > > > > the > > > > drm_gpuvm_bo's list of drm_gpuvas. 
> > > > > > As mentioned below the same applies for drm_gpuvm_bo_put() since > > > it might > > > destroy the vm_bo, which includes removing the vm_bo from > > > external / evicted > > > object lists and the GEM's list of vm_bos. > > > > > > As mentioned, if the GEM's dma-resv is different from the VM's > > > dma-resv we need > > > to take both locks. Ultimately, this would mean we need a > > > drm_exec loop, because > > > we can't know the order in which to take these locks. Doing a > > > full drm_exec loop > > > just to put() a vm_bo doesn't sound reasonable to me. > > > > > > Can we instead just have an internal mutex for locking the lists > > > such that we > > > avoid taking and dropping the spinlocks, which we use currently, > > > in a loop? > > > > You'd have the same locking inversion problem with a mutex, right? > > Since in the eviction path you have resv->mutex, from exec you have > > resv->mutex->resv because validate would attempt to grab resv. > > Both lists, evict and extobj, would need to have a separate mutex, > not a common one. > We'd also need a dedicated GEM gpuva lock. Then the only rule would > be that you can't > hold the dma-resv lock when calling put(). Which I admit is not that > nice. > > With the current spinlock solution drivers wouldn't need to worry > about anything locking > related though. So maybe I come back to your proposal of having a > switch for external > locking with dma-resv locks entirely. Such that with external dma-resv > locking I skip > all the spinlocks and add lockdep checks instead. > > I think that makes the most sense in terms of taking advantage of > external dma-resv locking > where possible and on the other hand having a self-contained solution > if not. This should > get all concerns out of the way, yours, Christian's and Boris'. If we need additional locks yes, I'd prefer the opt-in/opt-out spinlock solution, and check back after a while to see if we can remove either option once most pitfalls are hit. 
Thanks, /Thomas >
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi, On 9/14/23 13:54, Boris Brezillon wrote: On Thu, 14 Sep 2023 12:45:44 +0200 Thomas Hellström wrote: On 9/14/23 10:20, Boris Brezillon wrote: On Wed, 13 Sep 2023 15:22:56 +0200 Thomas Hellström wrote: On 9/13/23 13:33, Boris Brezillon wrote: On Wed, 13 Sep 2023 12:39:01 +0200 Thomas Hellström wrote: Hi, On 9/13/23 09:19, Boris Brezillon wrote: On Wed, 13 Sep 2023 17:05:42 +1000 Dave Airlie wrote: On Wed, 13 Sept 2023 at 17:03, Boris Brezillon wrote: On Tue, 12 Sep 2023 18:20:32 +0200 Thomas Hellström wrote: +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously? Any driver calling _[un]link() from its drm_gpu_scheduler::run_job() hook will be in this situation (Panthor at the moment, PowerVR soon). I get that Xe and Nouveau don't need that because they update the VM state early (in the ioctl path), but I keep thinking this will hurt us if we don't think it through from the beginning, because once you've set this logic to depend only on resv locks, it will be pretty hard to get back to a solution which lets synchronous VM_BINDs take precedence on asynchronous request, and, with vkQueueBindSparse() passing external deps (plus the fact the VM_BIND queue might be pretty deep), it can take a long time to get your synchronous VM_BIND executed... 
So this would boil down to either (possibly opt-in) keeping the spinlock approach or pushing the unlink out to a wq then? Deferred _unlink() would not be an issue, since I already defer the drm_gpuva destruction to a wq, it would just be a matter of moving the _unlink() call there as well. But _link() also takes the GEM gpuva list lock, and that one is a bit tricky, in that sm_map() can trigger 2 more _link() calls for the prev/next mappings, which we can't guess until we get to execute the VM update. If we mandate the use of the GEM resv lock, that simply means async VM updates (AKA calling drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone agrees on, then I'd like the APIs that make this sort of async VM update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step* methods, and probably other things) to be dropped, so we don't make it look like it's something we support. BTW, as also asked in a reply to Danilo, how do you call unlink from run_job() when it was requiring the obj->dma_resv lock, or was that a WIP? _unlink() makes sure the GEM gpuva list lock is taken, but this can be a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have panthor_gem_object::gpuva_list_lock that's dedicated to the gpuva list protection. We make sure we never take this lock while allocating memory to guarantee the dma-signalling path can't deadlock. btw what is the use case for this? do we have actual vulkan applications we know will have problems here? I don't, but I think that's a concern Faith raised at some point (dates back to when I was reading threads describing how VM_BIND on i915 should work, and I was clearly discovering this whole VM_BIND thing at that time, so maybe I misunderstood). it feels like a bit of premature optimisation, but maybe we have use cases. Might be, but that's the sort of thing that would put us in a corner if we don't have a plan for when the needs arise. 
Besides, if we don't want to support that case because it's too complicated, I'd recommend dropping all the drm_gpuvm APIs that let people think this mode is valid/supported (map/remap/unmap hooks in drm_gpuvm_ops, drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the confusion. Xe allows bypassing the bind-queue with another bind-queue, but to completely avoid dependencies between queues the Operations may not overlap. So, you check the VM state with some VM lock held (would be the VM resv in my case), and if the mapping is new (no overlaps with pre-existing mappings), you queue it to the fast-track/sync-VM_BIND queue. What would be missing I guess is a way to know if the mapping is active (MMU has been updated) or pending (MMU update queued to the bind-queue), so I can fast-track mapping/unmapping of active mappings. Ok, so I started modifying the implementation, and quickly realized the overlap test can't be done without your xe_range_fenc
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi, Danilo Some additional minor comments as xe conversion progresses. On 9/9/23 17:31, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convenience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA manager's basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. Big kudos to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- +/** + * drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a + * &drm_gpuvms evicted list + * @obj: the &drm_gem_object to add or remove + * @evict: indicates whether the object is evicted + * + * Adds a &drm_gem_object to or removes it from all &drm_gpuvms evicted + * list containing a mapping of this &drm_gem_object. 
+ */ +void +drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict) +{ + struct drm_gpuvm_bo *vm_bo; + + drm_gem_for_each_gpuvm_bo(vm_bo, obj) { + if (evict) + drm_gpuvm_bo_list_add(vm_bo, evict); + else + drm_gpuvm_bo_list_del(vm_bo, evict); + } +} +EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict); + We need a drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, ...) that puts a single gpuvm_bo on the list, the above function could perhaps be renamed as drm_gpuvm_gem_obj_evict(obj, ). Reason is some vm's are faulting vms which don't have an evict list, but validate from the pagefault handler. Also evict == false is dangerous because if called from within an exec, it might remove the obj from other vm's evict list before they've had a chance to rebind their VMAs. static int __drm_gpuva_insert(struct drm_gpuvm *gpuvm, struct drm_gpuva *va) diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h index afa50b9059a2..834bb6d6617e 100644 --- a/include/drm/drm_gpuvm.h +++ b/include/drm/drm_gpuvm.h @@ -26,10 +26,12 @@ */ #include +#include #include #include #include +#include struct drm_gpuvm; struct drm_gpuvm_bo; @@ -259,6 +261,38 @@ struct drm_gpuvm { * space */ struct dma_resv *resv; + + /** +* @extobj: structure holding the extobj list +*/ + struct { + /** +* @list: &list_head storing &drm_gpuvm_bos serving as +* external object +*/ + struct list_head list; + + /** +* @lock: spinlock to protect the extobj list +*/ + spinlock_t lock; + } extobj; + + /** +* @evict: structure holding the evict list and evict list lock +*/ + struct { + /** +* @list: &list_head storing &drm_gpuvm_bos currently being +* evicted +*/ + struct list_head list; + + /** +* @lock: spinlock to protect the evict list +*/ + spinlock_t lock; + } evict; }; void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm, @@ -268,6 +302,21 @@ void drm_gpuvm_init(struct drm_gpuvm *gpuvm, struct drm_device *drm, const struct drm_gpuvm_ops *ops); void drm_gpuvm_destroy(struct drm_gpuvm *gpuvm); +/** + * 
drm_gpuvm_is_extobj() - indicates whether the given &drm_gem_object is an + * external object + * @gpuvm: the &drm_gpuvm to check + * @obj: the &drm_gem_object to check + * + * Returns: true if the &drm_gem_object &dma_resv differs from the + * &drm_gpuvms &dma_resv, false otherwise + */ +static inline bool drm_gpuvm_is_extobj(struct drm_gpuvm *gpuvm, + struct drm_gem_object *obj) +{ + return obj && obj->resv != gpuvm->resv; +}
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On 9/14/23 12:57, Danilo Krummrich wrote: On 9/13/23 14:16, Danilo Krummrich wrote: And validate() can remove it while still holding all dma-resv locks, neat! However, what if two tasks are trying to lock the VA space concurrently? What do we do when the drm_gpuvm_bo's refcount drops to zero in drm_gpuva_unlink()? Are we guaranteed that at this point of time the drm_gpuvm_bo is not on the evicted list? Because otherwise we would call drm_gpuvm_bo_destroy() with the dma-resv lock held, which wouldn't be allowed, since drm_gpuvm_bo_destroy() might drop the last reference to the drm_gem_object and hence we'd potentially free the dma-resv lock while holding it, at least if it's an external object. Easiest way in this scheme is to think of the lists as being protected by the vm's resv lock. That means anybody calling unlink() must also hold the vm's resv lock. (Which is OK from a UAF point of view, but perhaps not from a locking inversion POV from an async list update). This would mean that on unlink() we'd need to hold the VM's resv lock and the corresponding GEM's resv lock (in case they're not the same anyways) because the VM's resv lock would protect the external / evicted object lists and the GEM object's resv lock protects the GEM's list of drm_gpuvm_bos and the drm_gpuvm_bo's list of drm_gpuvas. As mentioned below the same applies for drm_gpuvm_bo_put() since it might destroy the vm_bo, which includes removing the vm_bo from external / evicted object lists and the GEM's list of vm_bos. As mentioned, if the GEM's dma-resv is different from the VM's dma-resv we need to take both locks. Ultimately, this would mean we need a drm_exec loop, because we can't know the order in which to take these locks. Doing a full drm_exec loop just to put() a vm_bo doesn't sound reasonable to me. Can we instead just have an internal mutex for locking the lists such that we avoid taking and dropping the spinlocks, which we use currently, in a loop? 
You'd have the same locking inversion problem with a mutex, right? Since in the eviction path you have resv->mutex, from exec you have resv->mutex->resv because validate would attempt to grab resv. That said, xe currently indeed does the vm+bo exec dance on vma put. One reason why that seemingly horrible construct is good, is that when evicting an extobj and you need to access individual vmas to Zap page table entries or TLB flush, those VMAs are not allowed to go away (we're not refcounting them). Holding the bo resv on gpuva put prevents that from happening. Possibly one could use another mutex to protect the gem->vm_bo list to achieve the same, but we'd need to hold it on gpuva put. /Thomas - Danilo For extobjs an outer lock would be enough in case of Xe, but I really would not like to add even more complexity just to get the spinlock out of the way in case the driver already has an outer lock protecting this path. I must disagree here. These spinlocks and atomic operations are pretty costly and as discussed earlier this type of locking was the reason (at least according to the commit message) that made Christian drop the XArray use in drm_exec for the same set of objects: "The locking overhead is unecessary and measurable". IMHO the spinlock is the added complexity and a single wide lock following the drm locking guidelines set out by Daniel and David should really be the default choice with an opt-in for a spinlock if needed for async and pushing out to a wq is not an option. For the external object list an outer lock would work as long as it's not the dma-resv lock of the corresponding GEM object, since here we actually need to remove the list entry from the external object list on drm_gpuvm_bo_destroy(). 
It's just a bit weird design wise that drivers would need to take this outer lock on:
- drm_gpuvm_bo_extobj_add()
- drm_gpuvm_bo_destroy() (and hence also drm_gpuvm_bo_put())
- drm_gpuva_unlink() (because it needs to call drm_gpuvm_bo_put())
- drm_gpuvm_exec_lock()
- drm_gpuvm_exec_lock_array()
- drm_gpuvm_prepare_range()
Given that it seems reasonable to do all the required locking internally. From a design POV, there has been a clear direction in XE to make things similar to mmap() / munmap(), so this outer lock, which in Xe is an rwsem, is used in a similar way as the mmap_lock. It's protecting the page-table structures and vma rb tree, the userptr structures and the extobj list. Basically it's taken early in the exec IOCTL, the VM_BIND ioctl, the compute rebind worker and the pagefault handler, so all of the above are just asserting that it is taken in the correct mode. But strictly with this scheme one could also use the vm's dma_resv for the extobj list since with drm_exec, it's locked before traversing the list. The whole point of this scheme is to rely on locks that you already are supposed to be holding for various reasons and is simple to comprehend. I don't agree that we're suppos
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On 9/14/23 10:20, Boris Brezillon wrote: On Wed, 13 Sep 2023 15:22:56 +0200 Thomas Hellström wrote: On 9/13/23 13:33, Boris Brezillon wrote: On Wed, 13 Sep 2023 12:39:01 +0200 Thomas Hellström wrote: Hi, On 9/13/23 09:19, Boris Brezillon wrote: On Wed, 13 Sep 2023 17:05:42 +1000 Dave Airlie wrote: On Wed, 13 Sept 2023 at 17:03, Boris Brezillon wrote: On Tue, 12 Sep 2023 18:20:32 +0200 Thomas Hellström wrote: +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously? Any driver calling _[un]link() from its drm_gpu_scheduler::run_job() hook will be in this situation (Panthor at the moment, PowerVR soon). I get that Xe and Nouveau don't need that because they update the VM state early (in the ioctl path), but I keep thinking this will hurt us if we don't think it through from the beginning, because once you've set this logic to depend only on resv locks, it will be pretty hard to get back to a solution which lets synchronous VM_BINDs take precedence on asynchronous request, and, with vkQueueBindSparse() passing external deps (plus the fact the VM_BIND queue might be pretty deep), it can take a long time to get your synchronous VM_BIND executed... So this would boil down to either (possibly opt-in) keeping the spinlock approach or pushing the unlink out to a wq then? 
Deferred _unlink() would not be an issue, since I already defer the drm_gpuva destruction to a wq, it would just be a matter of moving the _unlink() call there as well. But _link() also takes the GEM gpuva list lock, and that one is a bit tricky, in that sm_map() can trigger 2 more _link() calls for the prev/next mappings, which we can't guess until we get to execute the VM update. If we mandate the use of the GEM resv lock, that simply means async VM updates (AKA calling drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone agrees on, then I'd like the APIs that make this sort of async VM update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step* methods, and probably other things) to be dropped, so we don't make it look like it's something we support. BTW, as also asked in a reply to Danilo, how do you call unlink from run_job() when it was requiring the obj->dma_resv lock, or was that a WIP? _unlink() makes sure the GEM gpuva list lock is taken, but this can be a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have panthor_gem_object::gpuva_list_lock that's dedicated to the gpuva list protection. We make sure we never take this lock while allocating memory to guarantee the dma-signalling path can't deadlock. btw what is the use case for this? do we have actual vulkan applications we know will have problems here? I don't, but I think that's a concern Faith raised at some point (dates back to when I was reading threads describing how VM_BIND on i915 should work, and I was clearly discovering this whole VM_BIND thing at that time, so maybe I misunderstood). it feels like a bit of premature optimisation, but maybe we have use cases. Might be, but that's the sort of thing that would put us in a corner if we don't have a plan for when the needs arise. 
Besides, if we don't want to support that case because it's too complicated, I'd recommend dropping all the drm_gpuvm APIs that let people think this mode is valid/supported (map/remap/unmap hooks in drm_gpuvm_ops, drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the confusion. Xe allows bypassing the bind-queue with another bind-queue, but to completely avoid dependencies between queues the Operations may not overlap. So, you check the VM state with some VM lock held (would be the VM resv in my case), and if the mapping is new (no overlaps with pre-existing mappings), you queue it to the fast-track/sync-VM_BIND queue. What would be missing I guess is a way to know if the mapping is active (MMU has been updated) or pending (MMU update queued to the bind-queue), so I can fast-track mapping/unmapping of active mappings. Ok, so I started modifying the implementation, and quickly realized the overlap test can't be done without your xe_range_fence tree because of unmaps. Since we call drm_gpuva_unmap() early/in the IOCTL path (IOW, before the mapping
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi Christian On 9/13/23 16:26, Christian König wrote: On 9/13/23 14:16, Danilo Krummrich wrote: As mentioned in a different mail thread, the reply is based on the assumption that we don't support anything else than GPUVM updates from the IOCTL. I think that this assumption is incorrect. Vulkan is just one specific use case, but this here should probably be able to handle other use cases as well. Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock. Are you referring to the MMU range invalidation notifiers here? See what the eviction lock in amdgpu is doing for example. IMO the statement regarding GPUVM updates from the IOCTL mostly refers to the need to protect the evicted- and extobj lists with additional spinlocks. Supporting userptr and faulting will ofc require additional locks / locking mechanisms. But this code doesn't do that yet. Is your concern that these particular spinlocks for these lists are indeed needed? /Thomas Regards, Christian. On Wed, Sep 13, 2023 at 11:14:46AM +0200, Thomas Hellström wrote: Hi! On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote: On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote: On 9/12/23 18:50, Danilo Krummrich wrote: On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote: Hi, Danilo, On 9/9/23 17:31, Danilo Krummrich wrote: So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 
2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convenience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA manager's basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. Big kudos to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 516 include/drm/drm_gpuvm.h | 197 ++ 2 files changed, 713 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index f4411047dbb3..8e62a043f719 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -73,6 +73,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those + * list are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. For instance the all + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. 
It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -420,6 +435,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concur
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On 9/13/23 16:01, Boris Brezillon wrote: On Wed, 13 Sep 2023 15:22:56 +0200 Thomas Hellström wrote: On 9/13/23 13:33, Boris Brezillon wrote: On Wed, 13 Sep 2023 12:39:01 +0200 Thomas Hellström wrote: Hi, On 9/13/23 09:19, Boris Brezillon wrote: On Wed, 13 Sep 2023 17:05:42 +1000 Dave Airlie wrote: On Wed, 13 Sept 2023 at 17:03, Boris Brezillon wrote: On Tue, 12 Sep 2023 18:20:32 +0200 Thomas Hellström wrote: +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously? Any driver calling _[un]link() from its drm_gpu_scheduler::run_job() hook will be in this situation (Panthor at the moment, PowerVR soon). I get that Xe and Nouveau don't need that because they update the VM state early (in the ioctl path), but I keep thinking this will hurt us if we don't think it through from the beginning, because once you've set this logic to depend only on resv locks, it will be pretty hard to get back to a solution which lets synchronous VM_BINDs take precedence on asynchronous request, and, with vkQueueBindSparse() passing external deps (plus the fact the VM_BIND queue might be pretty deep), it can take a long time to get your synchronous VM_BIND executed... So this would boil down to either (possibly opt-in) keeping the spinlock approach or pushing the unlink out to a wq then? 
Deferred _unlink() would not be an issue, since I already defer the drm_gpuva destruction to a wq, it would just be a matter of moving the _unlink() call there as well. But _link() also takes the GEM gpuva list lock, and that one is a bit tricky, in that sm_map() can trigger 2 more _link() calls for the prev/next mappings, which we can't guess until we get to execute the VM update. If we mandate the use of the GEM resv lock, that simply means async VM updates (AKA calling drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone agrees on, then I'd like the APIs that make this sort of async VM update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step* methods, and probably other things) to be dropped, so we don't make it look like it's something we support.

BTW, as also asked in a reply to Danilo, how do you call unlink from run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?

_unlink() makes sure the GEM gpuva list lock is taken, but this can be a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have panthor_gem_object::gpuva_list_lock that's dedicated to gpuva list protection. We make sure we never take this lock while allocating memory to guarantee the dma-signalling path can't deadlock.

btw what is the use case for this? do we have actual vulkan applications we know will have problems here?

I don't, but I think that's a concern Faith raised at some point (dates back from when I was reading threads describing how VM_BIND on i915 should work, and I was clearly discovering this whole VM_BIND thing at that time, so maybe I misunderstood).

it feels like a bit of premature optimisation, but maybe we have use cases.

Might be, but that's the sort of thing that would put us in a corner if we don't have a plan for when the needs arise.
Besides, if we don't want to support that case because it's too complicated, I'd recommend dropping all the drm_gpuvm APIs that let people think this mode is valid/supported (map/remap/unmap hooks in drm_gpuvm_ops, drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the confusion.

Xe allows bypassing the bind-queue with another bind-queue, but to completely avoid dependencies between queues the Operations may not overlap. So, you check the VM state with some VM lock held (would be the VM resv in my case), and if the mapping is new (no overlaps with pre-existing mappings), you queue it to the fast-track/sync-VM_BIND queue. What would be missing I guess is a way to know if the mapping is active (MMU has been updated) or pending (MMU update queued to the bind-queue), so I can fast-track mapping/unmapping of active mappings. This would leave overlapping sync/async VM updates, which can't happen in practice unless userspace is doing something wrong (sparse bindings always go through vkQueueBindSparse).

User-space is allowed to create new bind queues at will, and they execute independently save for range overlaps.
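The overlap check described above for deciding whether a bind can bypass the async queue can be sketched as follows. This is an illustrative user-space model only, not the Xe implementation; the names (`pending_op`, `can_fast_track`) are hypothetical:

```c
/* Sketch: decide whether a new VM_BIND range may be fast-tracked on the
 * sync queue, i.e. it overlaps no operation still pending on any async
 * bind queue. Hypothetical names; not the actual Xe code. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct pending_op {
	unsigned long start, last;	/* inclusive range of the queued op */
};

static bool ranges_overlap(unsigned long a_start, unsigned long a_last,
			   unsigned long b_start, unsigned long b_last)
{
	return a_start <= b_last && b_start <= a_last;
}

/* Returns true if [start, last] touches none of the pending operations,
 * so the synchronous bind can bypass the async bind-queue. */
static bool can_fast_track(const struct pending_op *ops, size_t n,
			   unsigned long start, unsigned long last)
{
	for (size_t i = 0; i < n; i++)
		if (ranges_overlap(start, last, ops[i].start, ops[i].last))
			return false;
	return true;
}
```

In a driver, the walk over pending operations would be done under the VM lock (the VM resv in the quoted proposal), so the decision cannot race with new async submissions.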
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On 9/13/23 13:33, Boris Brezillon wrote: On Wed, 13 Sep 2023 12:39:01 +0200 Thomas Hellström wrote: Hi, On 9/13/23 09:19, Boris Brezillon wrote: On Wed, 13 Sep 2023 17:05:42 +1000 Dave Airlie wrote: On Wed, 13 Sept 2023 at 17:03, Boris Brezillon wrote: On Tue, 12 Sep 2023 18:20:32 +0200 Thomas Hellström wrote:

+/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion/deletion can happen concurrently.

Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously?

Any driver calling _[un]link() from its drm_gpu_scheduler::run_job() hook will be in this situation (Panthor at the moment, PowerVR soon). I get that Xe and Nouveau don't need that because they update the VM state early (in the ioctl path), but I keep thinking this will hurt us if we don't think it through from the beginning, because once you've set this logic to depend only on resv locks, it will be pretty hard to get back to a solution which lets synchronous VM_BINDs take precedence over asynchronous requests, and, with vkQueueBindSparse() passing external deps (plus the fact the VM_BIND queue might be pretty deep), it can take a long time to get your synchronous VM_BIND executed...

So this would boil down to either (possibly opt-in) keeping the spinlock approach or pushing the unlink out to a wq then?

Deferred _unlink() would not be an issue, since I already defer the drm_gpuva destruction to a wq, it would just be a matter of moving the _unlink() call there as well. But _link() also takes the GEM gpuva list lock, and that one is a bit tricky, in that sm_map() can trigger 2 more _link() calls for the prev/next mappings, which we can't guess until we get to execute the VM update. If we mandate the use of the GEM resv lock, that simply means async VM updates (AKA calling drm_gpuvm_sm_[un]map()) are not an option. And if this is what everyone agrees on, then I'd like the APIs that make this sort of async VM update possible (drm_gpuvm_sm_[un]map(), the drm_gpuvm_ops::sm_step* methods, and probably other things) to be dropped, so we don't make it look like it's something we support.

BTW, as also asked in a reply to Danilo, how do you call unlink from run_job() when it was requiring the obj->dma_resv lock, or was that a WIP?

_unlink() makes sure the GEM gpuva list lock is taken, but this can be a custom lock (see drm_gem_gpuva_set_lock()). In panthor we have panthor_gem_object::gpuva_list_lock that's dedicated to gpuva list protection. We make sure we never take this lock while allocating memory to guarantee the dma-signalling path can't deadlock.

btw what is the use case for this? do we have actual vulkan applications we know will have problems here?

I don't, but I think that's a concern Faith raised at some point (dates back from when I was reading threads describing how VM_BIND on i915 should work, and I was clearly discovering this whole VM_BIND thing at that time, so maybe I misunderstood).

it feels like a bit of premature optimisation, but maybe we have use cases.

Might be, but that's the sort of thing that would put us in a corner if we don't have a plan for when the needs arise.

Besides, if we don't want to support that case because it's too complicated, I'd recommend dropping all the drm_gpuvm APIs that let people think this mode is valid/supported (map/remap/unmap hooks in drm_gpuvm_ops, drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the confusion.
Xe allows bypassing the bind-queue with another bind-queue, but to completely avoid dependencies between queues the Operations may not overlap. So, you check the VM state with some VM lock held (would be the VM resv in my case), and if the mapping is new (no overlaps with pre-existing mappings), you queue it to the fast-track/sync-VM_BIND queue. What would be missing I guess is a way to know if the mapping is active (MMU has been updated) or pending (MMU update queued to the bind-queue), so I can fast-track mapping/unmapping of active mappings. This would leave overlapping sync/async VM updates, which can't happen in practice unless userspace is doing something wrong (sparse bindings always go through vkQueueBindSparse). User-space is allowed to create new bind queues at will, and they execute independently save for range overlaps. And the overlapping granularity depends very much on the detail of the r
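The deferred-unlink idea discussed in this thread (push work that must not run in the dma-fence signalling path out to a workqueue) can be modelled in a few lines. This is a hedged, single-threaded user-space sketch; a real driver would use a kernel workqueue, and the names (`defer_unlink`, `flush_pending`) are hypothetical:

```c
/* Sketch of deferred unlink: the fence-signalling path only queues the
 * gpuva; the lock-taking _unlink() work happens later in worker context.
 * Single-threaded model, hypothetical names. */
#include <assert.h>
#include <stddef.h>

struct gpuva {
	struct gpuva *next_pending;	/* link in the deferred-work list */
	int linked;			/* 1 while on the GEM gpuva list */
};

static struct gpuva *pending;		/* work list consumed by the "worker" */

/* Called from the dma-fence path: only queue, never take the gpuva
 * list lock and never allocate. */
static void defer_unlink(struct gpuva *va)
{
	va->next_pending = pending;
	pending = va;
}

/* Worker context: here it is safe to take the GEM gpuva list lock
 * (or resv lock) and perform the actual _unlink(). */
static void flush_pending(void)
{
	while (pending) {
		struct gpuva *va = pending;
		pending = va->next_pending;
		va->linked = 0;		/* stands in for drm_gpuva_unlink() */
	}
}
```

The point of the pattern is that the signalling path touches only a lock-free (or trivially locked) producer list, so it can never block on a lock that might be held across an allocation.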
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi, On 9/13/23 09:19, Boris Brezillon wrote: On Wed, 13 Sep 2023 17:05:42 +1000 Dave Airlie wrote: On Wed, 13 Sept 2023 at 17:03, Boris Brezillon wrote: On Tue, 12 Sep 2023 18:20:32 +0200 Thomas Hellström wrote: +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion deletion can happen concurrently. Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously? Any driver calling _[un]link() from its drm_gpu_scheduler::run_job() hook will be in this situation (Panthor at the moment, PowerVR soon). I get that Xe and Nouveau don't need that because they update the VM state early (in the ioctl path), but I keep thinking this will hurt us if we don't think it through from the beginning, because once you've set this logic to depend only on resv locks, it will be pretty hard to get back to a solution which lets synchronous VM_BINDs take precedence on asynchronous request, and, with vkQueueBindSparse() passing external deps (plus the fact the VM_BIND queue might be pretty deep), it can take a long time to get your synchronous VM_BIND executed... So this would boil down to either (possibly opt-in) keeping the spinlock approach or pushing the unlink out to a wq then? BTW, as also asked in a reply to Danilo, how do you call unlink from run_job() when it was requiring the obj->dma_resv lock, or was that a WIP? btw what is the use case for this? do we have actual vulkan applications we know will have problems here? 
I don't, but I think that's a concern Faith raised at some point (dates back from when I was reading threads describing how VM_BIND on i915 should work, and I was clearly discovering this whole VM_BIND thing at that time, so maybe I misunderstood). it feels like a bit of premature optimisation, but maybe we have use cases. Might be, but that's the sort of thing that would put us in a corner if we don't have a plan for when the needs arise. Besides, if we don't want to support that case because it's too complicated, I'd recommend dropping all the drm_gpuvm APIs that let people think this mode is valid/supported (map/remap/unmap hooks in drm_gpuvm_ops, drm_gpuvm_sm_[un]map helpers, etc). Keeping them around just adds to the confusion. Xe allows bypassing the bind-queue with another bind-queue, but to completely avoid dependencies between queues the Operations may not overlap. (And the definition of overlap is currently page-table structure updates may not overlap) but no guarantees are made about priority. /Thomas
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi! On Wed, 2023-09-13 at 01:36 +0200, Danilo Krummrich wrote: > On Tue, Sep 12, 2023 at 09:23:08PM +0200, Thomas Hellström wrote: > > > > On 9/12/23 18:50, Danilo Krummrich wrote: > > > On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote: > > > > Hi, Danilo, > > > > > > > > On 9/9/23 17:31, Danilo Krummrich wrote: > > > > > So far the DRM GPUVA manager offers common infrastructure to > > > > > track GPU VA > > > > > allocations and mappings, generically connect GPU VA mappings > > > > > to their > > > > > backing buffers and perform more complex mapping operations > > > > > on the GPU VA > > > > > space. > > > > > > > > > > However, there are more design patterns commonly used by > > > > > drivers, which > > > > > can potentially be generalized in order to make the DRM GPUVA > > > > > manager > > > > > represent a basic GPU-VM implementation. In this context, > > > > > this patch aims > > > > > at generalizing the following elements. > > > > > > > > > > 1) Provide a common dma-resv for GEM objects not being used > > > > > outside of > > > > > this GPU-VM. > > > > > > > > > > 2) Provide tracking of external GEM objects (GEM objects > > > > > which are > > > > > shared with other GPU-VMs). > > > > > > > > > > 3) Provide functions to efficiently lock all GEM objects dma- > > > > > resv the > > > > > GPU-VM contains mappings of. > > > > > > > > > > 4) Provide tracking of evicted GEM objects the GPU-VM > > > > > contains mappings > > > > > of, such that validation of evicted GEM objects is > > > > > accelerated. > > > > > > > > > > 5) Provide some convinience functions for common patterns. 
> > > > > > > > > > Rather than being designed as a "framework", the target is to > > > > > make all > > > > > features appear as a collection of optional helper functions, > > > > > such that > > > > > drivers are free to make use of the DRM GPUVA managers basic > > > > > functionality and opt-in for other features without setting > > > > > any feature > > > > > flags, just by making use of the corresponding functions. > > > > > > > > > > Big kudos to Boris Brezillon for his help to figure out > > > > > locking for drivers > > > > > updating the GPU VA space within the fence signalling path. > > > > > > > > > > Suggested-by: Matthew Brost > > > > > Signed-off-by: Danilo Krummrich > > > > > --- > > > > > drivers/gpu/drm/drm_gpuvm.c | 516 > > > > > > > > > > include/drm/drm_gpuvm.h | 197 ++ > > > > > 2 files changed, 713 insertions(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > > > b/drivers/gpu/drm/drm_gpuvm.c > > > > > index f4411047dbb3..8e62a043f719 100644 > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > > > @@ -73,6 +73,21 @@ > > > > > * &drm_gem_object list of &drm_gpuvm_bos for an existing > > > > > instance of this > > > > > * particular combination. If not existent a new instance > > > > > is created and linked > > > > > * to the &drm_gem_object. > > > > > + * > > > > > + * &drm_gpuvm_bo structures, since unique for a given > > > > > &drm_gpuvm, are also used > > > > > + * as entry for the &drm_gpuvm's lists of external and > > > > > evicted objects. Those > > > > > + * list are maintained in order to accelerate locking of > > > > > dma-resv locks and > > > > > + * validation of evicted objects bound in a &drm_gpuvm. For > > > > > instance the all > > > > > + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be > > > > > locked by calling > > > > > + * drm_gpuvm_exec_lock(). Once locked drivers can call > > > > >
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
On 9/12/23 18:50, Danilo Krummrich wrote: On Tue, Sep 12, 2023 at 06:20:32PM +0200, Thomas Hellström wrote: Hi, Danilo, On 9/9/23 17:31, Danilo Krummrich wrote:

So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convenience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. Big kudos to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path.
Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 516 include/drm/drm_gpuvm.h | 197 ++ 2 files changed, 713 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index f4411047dbb3..8e62a043f719 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -73,6 +73,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination. If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those + * lists are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. For instance all the + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -420,6 +435,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally.
+ * + * However, drivers still need to protect concurrent calls to functions + * iterating those lists, such as drm_gpuvm_validate() and + * drm_gpuvm_prepare_objects(). Every such function contains a particular + * comment and lockdep checks if possible. + * + * Functions adding or removing entries from those lists, such as + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external + * locks being held, e.g. in order to avoid the corresponding list to be + * (safely) modified while potentially being iterated by other API functions. + * However, this is entirely optional. */ /** @@ -632,6 +661,131 @@ * } */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion/deletion can happen concurrently.

Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously?
Re: [Nouveau] [PATCH drm-misc-next v3 6/7] drm/gpuvm: generalize dma_resv/extobj handling and GEM validation
Hi, Danilo, On 9/9/23 17:31, Danilo Krummrich wrote:

So far the DRM GPUVA manager offers common infrastructure to track GPU VA allocations and mappings, generically connect GPU VA mappings to their backing buffers and perform more complex mapping operations on the GPU VA space. However, there are more design patterns commonly used by drivers, which can potentially be generalized in order to make the DRM GPUVA manager represent a basic GPU-VM implementation. In this context, this patch aims at generalizing the following elements. 1) Provide a common dma-resv for GEM objects not being used outside of this GPU-VM. 2) Provide tracking of external GEM objects (GEM objects which are shared with other GPU-VMs). 3) Provide functions to efficiently lock all GEM objects dma-resv the GPU-VM contains mappings of. 4) Provide tracking of evicted GEM objects the GPU-VM contains mappings of, such that validation of evicted GEM objects is accelerated. 5) Provide some convenience functions for common patterns. Rather than being designed as a "framework", the target is to make all features appear as a collection of optional helper functions, such that drivers are free to make use of the DRM GPUVA managers basic functionality and opt-in for other features without setting any feature flags, just by making use of the corresponding functions. Big kudos to Boris Brezillon for his help to figure out locking for drivers updating the GPU VA space within the fence signalling path. Suggested-by: Matthew Brost Signed-off-by: Danilo Krummrich --- drivers/gpu/drm/drm_gpuvm.c | 516 include/drm/drm_gpuvm.h | 197 ++ 2 files changed, 713 insertions(+) diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c index f4411047dbb3..8e62a043f719 100644 --- a/drivers/gpu/drm/drm_gpuvm.c +++ b/drivers/gpu/drm/drm_gpuvm.c @@ -73,6 +73,21 @@ * &drm_gem_object list of &drm_gpuvm_bos for an existing instance of this * particular combination.
If not existent a new instance is created and linked * to the &drm_gem_object. + * + * &drm_gpuvm_bo structures, since unique for a given &drm_gpuvm, are also used + * as entry for the &drm_gpuvm's lists of external and evicted objects. Those + * lists are maintained in order to accelerate locking of dma-resv locks and + * validation of evicted objects bound in a &drm_gpuvm. For instance all the + * &drm_gem_object's &dma_resv of a given &drm_gpuvm can be locked by calling + * drm_gpuvm_exec_lock(). Once locked drivers can call drm_gpuvm_validate() in + * order to validate all evicted &drm_gem_objects. It is also possible to lock + * additional &drm_gem_objects by providing the corresponding parameters to + * drm_gpuvm_exec_lock() as well as open code the &drm_exec loop while making + * use of helper functions such as drm_gpuvm_prepare_range() or + * drm_gpuvm_prepare_objects(). + * + * Every bound &drm_gem_object is treated as external object when its &dma_resv + * structure is different than the &drm_gpuvm's common &dma_resv structure. */ /** @@ -420,6 +435,20 @@ * Subsequent calls to drm_gpuvm_bo_obtain() for the same &drm_gpuvm and * &drm_gem_object must be able to observe previous creations and destructions * of &drm_gpuvm_bos in order to keep instances unique. + * + * The &drm_gpuvm's lists for keeping track of external and evicted objects are + * protected against concurrent insertion / removal and iteration internally. + * + * However, drivers still need to protect concurrent calls to functions + * iterating those lists, such as drm_gpuvm_validate() and + * drm_gpuvm_prepare_objects(). Every such function contains a particular + * comment and lockdep checks if possible. + * + * Functions adding or removing entries from those lists, such as + * drm_gpuvm_bo_evict() or drm_gpuvm_bo_extobj_add() may be called with external + * locks being held, e.g.
in order to avoid the corresponding list to be + * (safely) modified while potentially being iterated by other API functions. + * However, this is entirely optional. */ /** @@ -632,6 +661,131 @@ * } */ +/** + * get_next_vm_bo_from_list() - get the next vm_bo element + * @__gpuvm: The GPU VM + * @__list_name: The name of the list we're iterating on + * @__local_list: A pointer to the local list used to store already iterated items + * @__prev_vm_bo: The previous element we got from drm_gpuvm_get_next_cached_vm_bo() + * + * This helper is here to provide lockless list iteration. Lockless as in, the + * iterator releases the lock immediately after picking the first element from + * the list, so list insertion/deletion can happen concurrently.

Are the list spinlocks needed for that async state update from within the dma-fence critical section we've discussed previously? Otherwise it should be sufficient to protect the lists with the gpuvm's resv (or for the extobj list with an outer lock). If those sp
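The "local list" trick quoted above (hold the lock only while parking one element on a private list, so concurrent insertion/deletion on the main list remains possible) can be illustrated with a simplified user-space model. This is not the drm_gpuvm code; the lock here is a plain flag standing in for the spinlock, and reference counting is omitted:

```c
/* Simplified model of the get_next_vm_bo_from_list() iteration pattern:
 * each step takes the lock, moves one element to a local list, and drops
 * the lock before the caller processes the element. */
#include <assert.h>
#include <stddef.h>

struct node {
	struct node *next;
	int id;
};

struct llist {
	int locked;		/* stand-in for the list spinlock */
	struct node *head;
};

static void list_lock(struct llist *l)   { assert(!l->locked); l->locked = 1; }
static void list_unlock(struct llist *l) { assert(l->locked);  l->locked = 0; }

/* Pop the next element under the lock and park it on @local, so the lock
 * is not held while the element is being processed. */
static struct node *next_locked(struct llist *l, struct node **local)
{
	list_lock(l);
	struct node *n = l->head;
	if (n) {
		l->head = n->next;
		n->next = *local;	/* remember already-iterated items */
		*local = n;
	}
	list_unlock(l);
	return n;
}

/* Splice the iterated items back once the walk is done (order may be
 * reversed; the real helper tracks a cursor instead). */
static void restore_locked(struct llist *l, struct node **local)
{
	list_lock(l);
	while (*local) {
		struct node *n = *local;
		*local = n->next;
		n->next = l->head;
		l->head = n;
	}
	list_unlock(l);
}
```

The concern raised in the thread is precisely about the lock inside `next_locked()`: if the iteration only ever happens under the VM's resv (or an outer lock), that inner spinlock is redundant; it is needed only if entries may be added or removed from the fence-signalling path concurrently with iteration.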
Re: [Nouveau] [PATCH drm-misc-next v3 5/7] drm/gpuvm: add an abstraction for a VM / BO combination
On 9/12/23 12:06, Danilo Krummrich wrote: On Tue, Sep 12, 2023 at 09:42:44AM +0200, Thomas Hellström wrote: Hi, Danilo On 9/11/23 19:49, Danilo Krummrich wrote: Hi Thomas, On 9/11/23 19:19, Thomas Hellström wrote: Hi, Danilo On 9/9/23 17:31, Danilo Krummrich wrote: This patch adds an abstraction layer between the drm_gpuva mappings of a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains list of mappings of this GEM object. This has multiple advantages: 1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches. 2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper. 3) Drivers can derive and extend the structure to easily represent driver specific states of a BO for a certain GPUVM. The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu. Cc: Christian König Signed-off-by: Danilo Krummrich Did you consider having the drivers embed the struct drm_gpuvm_bo in their own bo definition? I figure that would mean using the gem bo's refcounting and providing a helper to call from the driver's bo release. Looks like that could potentially save a lot of code? Or is there something that won't work with that approach? There are drm_gpuvm_ops::vm_bo_alloc and drm_gpuvm_ops::vm_bo_free callback for drivers to register for that purpose. - Danilo Now after looking a bit deeper, I think actually the question could be rephrased as, why don't we just use the struct drm_gem_object::gpuva struct as the drm_gpuvm_bo in the spirit of keeping things simple? 
Drivers would then just embed it in their bo subclass and we'd avoid unnecessary fields in the struct drm_gem_object for drivers that don't do VM_BIND yet. struct drm_gem_object::gpuva is just a container containing a list in order to (currently) attach drm_gpuva structs to it and with this patch attach drm_gpuvm_bo structs (combination of BO + VM) to it. Doing the above basically means "leave everything as it is, but move the list_head of drm_gpuvs per GEM to the driver specific BO structure". Having a common connection between GEM objects and drm_gpuva structs was one of the goals of the initial GPUVA manager patch series however. Sure, this won't be per bo and per vm, but it'd really only make a slight difference where we have multiple VMAs per bo, where per-vm per-bo state either needs to be duplicated or attached to a single vma (as in the case of the external bo list). Correct, one implication is that we don't get a per VM and BO abstraction, and hence are left with a list of all drm_gpuva structs having the same backing BO, regardless of the VM. For amdgpu this was always a concern. Now that we want to keep track of external and evicted objects it's going to be a concern for most drivers I guess. Because the only structure we could use for tracking external and evicted objects we are left with (without having a VM_BO abstraction) is struct drm_gpuva. But this structure isn't unique and we need to consider cases where userspace just allocates rather huge BOs and creates tons of mappings from it. Running the full list of drm_gpuva structs (with even the ones from other VMs included) for adding an external or evicted object isn't very efficient. Not to mention that the maintenance when the mapping we've (randomly) picked as an entry for the external/evicted object list is unmapped, but there are still mappings left in the VM with the same backing BO. 
For the evicted object it's not much of an issue; we maintain a list of vmas needing rebinding for each VM rather than objects evicted, so there is no or very little additional overhead there. The extobj list is indeed a problem if many VMAs are bound to the same bo. Not that the code snippets are complicated, but the list traversals would be excessive. Now, a way to get rid of the VM_BO abstraction would be to use maple trees instead, since then we can store drm_gem_object structs directly for each VM. However, Xe had concerns about using maple trees and preferred lists, plus having maple trees wouldn't get rid of the concerns of amdgpu not having a VM_BO abstraction for cases with tons of VMs and tons of mappings per BO. Hence, having a VM_BO abstraction enabling us to track external/evicted objects with lists seems to satisfy everyone's needs. Indeed this is a tradeoff between a simple implementation that is OK for situations with not many VMs nor VMAs per bo vs a more complex implementation that optimizes for the opposite ca
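The cost argument above (why a per-VM-per-BO structure beats scanning every mapping) can be made concrete with a small model: with one vm_bo per (VM, BO) combination, adding a BO to a VM's extobj or evicted list is a lookup over the BO's few vm_bo entries rather than a walk over all drm_gpuva mappings across all VMs. This is an illustrative user-space sketch with hypothetical names, not the drm_gpuvm API:

```c
/* Sketch: one vm_bo per (vm, bo) combination, hanging off the BO.
 * Per-VM per-BO state (e.g. evicted) lives here exactly once. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct vm { int id; };			/* stand-in for drm_gpuvm */

struct vm_bo {
	struct vm_bo *next;		/* link in the BO's vm_bo list */
	struct vm *vm;			/* the VM this combination belongs to */
	int evicted;			/* per-VM per-BO state */
};

struct bo {
	struct vm_bo *vm_bos;		/* one entry per VM mapping this BO */
};

/* Look up or create the unique entry for (bo, vm). The list walked here
 * has one entry per VM, not one per mapping. */
static struct vm_bo *vm_bo_obtain(struct bo *bo, struct vm *vm)
{
	struct vm_bo *vb;

	for (vb = bo->vm_bos; vb; vb = vb->next)
		if (vb->vm == vm)
			return vb;

	vb = calloc(1, sizeof(*vb));
	if (!vb)
		return NULL;
	vb->vm = vm;
	vb->next = bo->vm_bos;
	bo->vm_bos = vb;
	return vb;
}
```

Because the entry is unique per combination, it can also serve directly as the element of the VM's extobj and evicted lists, which is what makes list-based tracking viable without maple trees.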
Re: [Nouveau] [PATCH drm-misc-next v3 5/7] drm/gpuvm: add an abstraction for a VM / BO combination
Hi, Danilo On 9/11/23 19:49, Danilo Krummrich wrote: Hi Thomas, On 9/11/23 19:19, Thomas Hellström wrote: Hi, Danilo On 9/9/23 17:31, Danilo Krummrich wrote: This patch adds an abstraction layer between the drm_gpuva mappings of a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains list of mappings of this GEM object. This has multiple advantages: 1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches. 2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper. 3) Drivers can derive and extend the structure to easily represent driver specific states of a BO for a certain GPUVM. The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu. Cc: Christian König Signed-off-by: Danilo Krummrich Did you consider having the drivers embed the struct drm_gpuvm_bo in their own bo definition? I figure that would mean using the gem bo's refcounting and providing a helper to call from the driver's bo release. Looks like that could potentially save a lot of code? Or is there something that won't work with that approach? There are drm_gpuvm_ops::vm_bo_alloc and drm_gpuvm_ops::vm_bo_free callback for drivers to register for that purpose. - Danilo Now after looking a bit deeper, I think actually the question could be rephrased as, why don't we just use the struct drm_gem_object::gpuva struct as the drm_gpuvm_bo in the spirit of keeping things simple? Drivers would then just embed it in their bo subclass and we'd avoid unnecessary fields in the struct drm_gem_object for drivers that don't do VM_BIND yet. 
Sure, this won't be per bo and per vm, but it'd really only make a slight difference where we have multiple VMAs per bo, where per-vm per-bo state either needs to be duplicated or attached to a single vma (as in the case of the external bo list). To me that looks like a substantial amount of less code / complexity? /Thomas Thanks, Thomas
Re: [Nouveau] [PATCH drm-misc-next v3 5/7] drm/gpuvm: add an abstraction for a VM / BO combination
On 9/11/23 19:49, Danilo Krummrich wrote: Hi Thomas, On 9/11/23 19:19, Thomas Hellström wrote: Hi, Danilo On 9/9/23 17:31, Danilo Krummrich wrote: This patch adds an abstraction layer between the drm_gpuva mappings of a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains list of mappings of this GEM object. This has multiple advantages: 1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches. 2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper. 3) Drivers can derive and extend the structure to easily represent driver specific states of a BO for a certain GPUVM. The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu. Cc: Christian König Signed-off-by: Danilo Krummrich Did you consider having the drivers embed the struct drm_gpuvm_bo in their own bo definition? I figure that would mean using the gem bo's refcounting and providing a helper to call from the driver's bo release. Looks like that could potentially save a lot of code? Or is there something that won't work with that approach? There are drm_gpuvm_ops::vm_bo_alloc and drm_gpuvm_ops::vm_bo_free callback for drivers to register for that purpose. Ah OK. Thanks, I'll take a deeper look. /Thomas - Danilo Thanks, Thomas
Re: [Nouveau] [PATCH drm-misc-next v3 5/7] drm/gpuvm: add an abstraction for a VM / BO combination
Hi, Danilo

On 9/9/23 17:31, Danilo Krummrich wrote:

This patch adds an abstraction layer between the drm_gpuva mappings of a particular drm_gem_object and this GEM object itself. The abstraction represents a combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds a list of drm_gpuvm_bo structures (the structure representing this abstraction), while each drm_gpuvm_bo contains a list of mappings of this GEM object.

This has multiple advantages:

1) We can use the drm_gpuvm_bo structure to attach it to various lists of the drm_gpuvm. This is useful for tracking external and evicted objects per VM, which is introduced in subsequent patches.

2) Finding mappings of a certain drm_gem_object mapped in a certain drm_gpuvm becomes much cheaper.

3) Drivers can derive and extend the structure to easily represent driver-specific states of a BO for a certain GPUVM.

The idea of this abstraction was taken from amdgpu, hence the credit for this idea goes to the developers of amdgpu.

Cc: Christian König
Signed-off-by: Danilo Krummrich

Did you consider having the drivers embed the struct drm_gpuvm_bo in their own bo definition? I figure that would mean using the gem bo's refcounting and providing a helper to call from the driver's bo release. Looks like that could potentially save a lot of code? Or is there something that won't work with that approach?

Thanks, Thomas
Re: [Nouveau] [PATCH drm-misc-next 2/3] drm/gpuva_mgr: generalize dma_resv/extobj handling and GEM validation
On 8/31/23 18:53, Thomas Hellström (Intel) wrote: Hi, On 8/31/23 13:18, Danilo Krummrich wrote: On Thu, Aug 31, 2023 at 11:04:06AM +0200, Thomas Hellström (Intel) wrote: Hi! On 8/30/23 17:00, Danilo Krummrich wrote: On Wed, Aug 30, 2023 at 03:42:08PM +0200, Thomas Hellström (Intel) wrote: On 8/30/23 14:49, Danilo Krummrich wrote: Hi Thomas, thanks for having a look! On Wed, Aug 30, 2023 at 09:27:45AM +0200, Thomas Hellström (Intel) wrote:

Hi, Danilo. Some quick comments since I'm doing some Xe work in this area. Will probably get back with more.

On 8/20/23 23:53, Danilo Krummrich wrote:

diff --git a/include/drm/drm_gpuva_mgr.h b/include/drm/drm_gpuva_mgr.h
index ed8d50200cc3..693e2da3f425 100644
--- a/include/drm/drm_gpuva_mgr.h
+++ b/include/drm/drm_gpuva_mgr.h
@@ -26,12 +26,16 @@
  */

 #include
+#include
+#include
 #include
 #include
 #include
+#include

 struct drm_gpuva_manager;
+struct drm_gpuva_gem;
 struct drm_gpuva_fn_ops;

 /**
@@ -140,7 +144,7 @@ struct drm_gpuva {
 int drm_gpuva_insert(struct drm_gpuva_manager *mgr, struct drm_gpuva *va);
 void drm_gpuva_remove(struct drm_gpuva *va);

-void drm_gpuva_link(struct drm_gpuva *va);
+void drm_gpuva_link(struct drm_gpuva *va, struct drm_gpuva_gem *vm_bo);
 void drm_gpuva_unlink(struct drm_gpuva *va);

 struct drm_gpuva *drm_gpuva_find(struct drm_gpuva_manager *mgr,
@@ -240,15 +244,137 @@ struct drm_gpuva_manager {
 	 * @ops: &drm_gpuva_fn_ops providing the split/merge steps to drivers
 	 */
 	const struct drm_gpuva_fn_ops *ops;
+
+	/**
+	 * @d_obj: Dummy GEM object; used internally to pass the GPU VMs
+	 * dma-resv to &drm_exec.
+	 */
+	struct drm_gem_object d_obj;
+
+	/**
+	 * @resv: the &dma_resv for &drm_gem_objects mapped in this GPU VA
+	 * space
+	 */
+	struct dma_resv *resv;
+
+	/**
+	 * @exec: the &drm_exec helper to lock external &drm_gem_objects
+	 */
+	struct drm_exec exec;
+
+	/**
+	 * @mt_ext: &maple_tree storing external &drm_gem_objects
+	 */
+	struct maple_tree mt_ext;

Why are you using a maple tree here?
Insertion and removal is O(log(n)) instead of O(1) for a list?

Having a list of drm_gem_objects directly wouldn't work, as multiple GPU-VMs could have mappings of the same extobj. I considered using the VM_BO abstraction (struct drm_gpuva_gem) as list entry instead, which also seems to be the obvious choice. However, there is a locking conflict.

A drm_gem_object keeps a list of drm_gpuva_gems, while each drm_gpuva_gem keeps a list of drm_gpuvas. Both lists are either protected with the dma-resv lock of the corresponding drm_gem_object, or with an external lock provided by the driver (see drm_gem_gpuva_set_lock()). The latter is used by drivers performing changes on the GPUVA space directly from the fence signalling path.

Now, similar to what drm_gpuva_link() and drm_gpuva_unlink() are doing already, we'd want to add a drm_gpuva_gem to the extobj list for the first mapping being linked and we'd want to remove it for the last one being unlinked. (Actually we'd want to add the drm_gpuva_gem object to the extobj list even before, because otherwise we'd not acquire the dma-resv lock of this GEM object through drm_gpuva_manager_lock(). But that's trivial, we could do that when we create the drm_gpuva_gem, which we need to do anyways.)

Anyway, we'd probably want to keep removing the drm_gpuva_gem from the extobj list from drm_gpuva_unlink() when the last mapping of this BO is unlinked. In order to do so, we'd (as discussed above) either need to hold the outer GPU-VM lock or the GPU-VM's dma-resv lock. Both would be illegal in the case drm_gpuva_unlink() is called from within the fence signalling path. For drivers like XE or Nouveau, we'd at least need to make sure to not mess up the locking hierarchy of the GPU-VM lock and the dma-resv lock of the corresponding BO.

Considering all that, I thought it's probably better to track extobjs separate from the drm_gpuva_gem, hence the maple tree choice.

Hm.
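The "add on first link, remove on last unlink" scheme being debated above can be sketched in a few lines of userspace C. This is a minimal model under stated assumptions: `vm`, `vm_bo`, `gpuva_link`/`gpuva_unlink` and the `external` flag are illustrative stand-ins, not the drm_gpuva API, and the sketch deliberately omits the locking that makes this step problematic from the fence signalling path.

```c
#include <assert.h>

/* Minimal doubly-linked list, standing in for the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }
static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}
static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}
static int list_empty(const struct list_head *h) { return h->next == h; }

/* Models a GPU VM with a list of external objects mapped in it. */
struct vm { struct list_head extobj_list; };

/* Models the per-VM, per-BO state (drm_gpuva_gem in the discussion). */
struct vm_bo {
	struct vm *vm;
	int external;          /* BO's dma-resv differs from the VM's */
	int map_count;         /* number of linked mappings */
	struct list_head extobj_entry;
};

/*
 * The first mapping linked puts the vm_bo on the VM's extobj list; the
 * last one unlinked takes it off again.  In the kernel, the removal is
 * the awkward step: it would need the outer GPU-VM lock or the VM's
 * dma-resv lock, neither of which may be taken from the fence
 * signalling path.
 */
static void gpuva_link(struct vm_bo *vbo)
{
	if (vbo->map_count++ == 0 && vbo->external)
		list_add(&vbo->extobj_entry, &vbo->vm->extobj_list);
}

static void gpuva_unlink(struct vm_bo *vbo)
{
	if (--vbo->map_count == 0 && vbo->external)
		list_del(&vbo->extobj_entry);
}
```

A keyed structure such as a maple tree sidesteps this by tracking extobjs separately from the per-BO mapping lists, at the cost of O(log n) insert/remove instead of O(1).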
OK, in Xe we're having a list of the xe_vmas (drm_gpuvas) that point to external objects, or in the case of multiple mappings to the same gem object, only one of the drm_gpuvas is in the list. These are protected by the GPU-VM lock. I don't see a problem with removing those from the fence signalling path, though?

I intentionally tried to avoid keeping a list of drm_gpuvas to track extobjs, since this is generic code and I don't know how many mappings of an external object the corresponding driver potentially creates. This could become a pretty large list to iterate. Another reason was that I want to keep the drm_gpuva structure as small as possible, hence avoiding another list_head.

Yes, the list might be pretty large, but OTOH you never iterate t
Re: [Nouveau] [PATCH 00/17] Convert TTM to the new fence interface.
On 2014-07-09 14:29, Maarten Lankhorst wrote:

This series applies on top of the driver-core-next branch of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git

Before converting ttm to the new fence interface I had to fix some drivers to require a reservation before poking with fence_obj. After flipping the switch RCU becomes available instead, and the extra reservations can be dropped again. :-)

I've done at least basic testing on all the drivers I've converted at some point, but more testing is definitely welcomed!

I'm currently on vacation for the next couple of weeks, so I can't test or review but otherwise

Acked-by: Thomas Hellstrom

---

Maarten Lankhorst (17):
  drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
  drm/ttm: kill off some members to ttm_validate_buffer
  drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
  drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence
  drm/ttm: call ttm_bo_wait while inside a reservation
  drm/ttm: kill fence_lock
  drm/nouveau: rework to new fence interface
  drm/radeon: add timeout argument to radeon_fence_wait_seq
  drm/radeon: use common fence implementation for fences
  drm/qxl: rework to new fence interface
  drm/vmwgfx: get rid of different types of fence_flags entirely
  drm/vmwgfx: rework to new fence interface
  drm/ttm: flip the switch, and convert to dma_fence
  drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
  drm/radeon: use rcu waits in some ioctls
  drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
  drm/ttm: use rcu in core ttm

 drivers/gpu/drm/nouveau/core/core/event.c | 4
 drivers/gpu/drm/nouveau/nouveau_bo.c | 59 +---
 drivers/gpu/drm/nouveau/nouveau_display.c | 25 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c | 431 +++--
 drivers/gpu/drm/nouveau/nouveau_fence.h | 22 +
 drivers/gpu/drm/nouveau/nouveau_gem.c | 55 +---
 drivers/gpu/drm/nouveau/nv04_fence.c | 4
 drivers/gpu/drm/nouveau/nv10_fence.c | 4
 drivers/gpu/drm/nouveau/nv17_fence.c | 2
 drivers/gpu/drm/nouveau/nv50_fence.c | 2
 drivers/gpu/drm/nouveau/nv84_fence.c | 11 -
 drivers/gpu/drm/qxl/Makefile | 2
 drivers/gpu/drm/qxl/qxl_cmd.c | 7
 drivers/gpu/drm/qxl/qxl_debugfs.c | 16 +
 drivers/gpu/drm/qxl/qxl_drv.h | 20 -
 drivers/gpu/drm/qxl/qxl_fence.c | 91 --
 drivers/gpu/drm/qxl/qxl_kms.c | 1
 drivers/gpu/drm/qxl/qxl_object.c | 2
 drivers/gpu/drm/qxl/qxl_object.h | 6
 drivers/gpu/drm/qxl/qxl_release.c | 172 ++--
 drivers/gpu/drm/qxl/qxl_ttm.c | 93 --
 drivers/gpu/drm/radeon/radeon.h | 15 -
 drivers/gpu/drm/radeon/radeon_cs.c | 10 +
 drivers/gpu/drm/radeon/radeon_device.c | 60
 drivers/gpu/drm/radeon/radeon_display.c | 21 +
 drivers/gpu/drm/radeon/radeon_fence.c | 283 +++
 drivers/gpu/drm/radeon/radeon_gem.c | 19 +
 drivers/gpu/drm/radeon/radeon_object.c | 8 -
 drivers/gpu/drm/radeon/radeon_ttm.c | 34 --
 drivers/gpu/drm/radeon/radeon_uvd.c | 10 -
 drivers/gpu/drm/radeon/radeon_vm.c | 16 +
 drivers/gpu/drm/ttm/ttm_bo.c | 187 ++---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 28 --
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 3
 drivers/gpu/drm/ttm/ttm_execbuf_util.c | 146 +++---
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c | 47 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 1
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 24 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c | 329 --
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h | 35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c | 43 +--
 include/drm/ttm/ttm_bo_api.h | 7
 include/drm/ttm/ttm_bo_driver.h | 29 --
 include/drm/ttm/ttm_execbuf_util.h | 22 +
 44 files changed, 1256 insertions(+), 1150 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] RFC: TTM extra bo space
Jerome Glisse wrote:
> On Thu, 2009-11-05 at 20:04 +0200, Pekka Paalanen wrote:
>> On Wed, 4 Nov 2009 17:42:26 + Jakob Bornecrantz wrote:
>>> Hi Jerome
>>> On 4 nov 2009, at 15.58, Jerome Glisse wrote:
>>>> Note: For reference, my issue is with the cursor on old radeon hw; the cursor must be in the next 128M from the crtc scanout buffer. We get issues when someone starts to resize their screen, at which point the scanout buffer can end up after the cursor in vram. Another solution would be to add a multiple-bo adjacent validation function to ttm (likely less ttm change).
>>> Can you solve your problem by being able to place the buffer at a certain location? We had the same need but managed to work around it with a quick and dirty hack. Implementing that would mostly be about changing drm_mm.c to handle placing buffers at certain locations, and in the TTM core being able to evict buffers that are in that place.
>> That sounds like something that could solve an issue in Nouveau with the nv04 card family. The following is hearsay, but I'll try to describe it. Of 32 MB of VRAM, the scanout buffer must reside within the first 16 MB. Any(?) other buffers do not have this limitation, e.g. textures. Setting up separate memory spaces for these halves, say, TTM VRAM and TTM PRIV1, would be inconvenient, because buffers could not be laid across the boundary. Does this make sense? Nouveau people, did I get this right?

Sorry for the late response to this. I'm mostly changing diapers nowadays... I think this would be excellent. There is a priv member of an mm block that can be a bo reference if needed. I think the semantics should be such that the call would fail with -EBUSY if it cannot succeed due to pinned buffers in the way, and otherwise sleep until it succeeds.

/Thomas

> I am a bit busy with bugs right now but I will look into doing the actual code soon (one of my bugs needs this).
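The placement restriction discussed above (cursor within the 128M of VRAM following the scanout buffer) can be expressed as a simple range check. This is an illustrative sketch only: `cursor_placement_ok`, the byte-offset parameters, and the exact window semantics are assumptions made for the example, not the radeon driver's code.

```c
#include <assert.h>

/* 128M window following the scanout buffer (assumed semantics). */
#define CURSOR_WINDOW (128ull << 20)

/*
 * Return non-zero if a proposed cursor BO placement satisfies the
 * hardware restriction: the cursor must start at or after the scanout
 * buffer and end within 128M of the scanout buffer's start.
 * All values are VRAM offsets / sizes in bytes.
 */
static int cursor_placement_ok(unsigned long long scanout_offset,
			       unsigned long long cursor_offset,
			       unsigned long long cursor_size)
{
	return cursor_offset >= scanout_offset &&
	       cursor_offset + cursor_size <=
	       scanout_offset + CURSOR_WINDOW;
}
```

A TTM-level fix would be the inverse operation: asking the allocator for a placement inside such a window (and evicting or failing with -EBUSY when pinned buffers are in the way), rather than merely checking a placement after the fact.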
___
Dri-devel mailing list
dri-de...@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [Nouveau] Why was old TTM removed from drm.git?
Michel Dänzer wrote:

On Wed, 2009-06-24 at 19:26 +0200, Thomas Hellström wrote: Just prior to the commit I sent out a message explaining what I was going to do and why, but apparently it didn't make it to the list (which seems to be the case of quite a few mails these days).

What was the From: address and subject of that mail (or any others that were apparently lost)? I can't seem to find anything in the dri-devel moderation queue mails around the weekend, so apparently it was dropped before it reached mailman. Maybe some sf.net spam filter or something.

Hi! Attaching the mail and another one that was stripped. Furthermore I had two patches sent by git-send-email stripped away, but when I routed them through the vmware smtp server they arrived. It might be that sf.net doesn't like my isp email server.

/Thomas.

--- Begin Message ---

Hi! I'm about to push a commit that strips old TTM from the drm git repo. Not sure if anybody uses it for other things than libdrm, but at least that will remove some unused code. The master nouveau driver will be disabled since it depends on old ttm.

/Thomas

--- End Message ---

--- Begin Message ---

Okias, the documentation available is the Xorg wiki TTM page (a little outdated) and the ttm header files, which are quite well commented. Currently there are three drivers using it:

1) The Radeon KMS driver, using a subset of the TTM functionality.
2) The Intel Moorestown / Poulsbo driver, which uses the full TTM functionality including modesetting. Look at the list archives for pointers to that.
3) The openchrome driver in the modesetting-newttm branch. No modesetting yet for that one.

Note that the latter 2 drivers are using a tiny TTM user-space interface which is never going to make it to the mainstream kernel. The openChrome driver will be patched up to use a driver-specific version of that interface.
/Thomas

okias wrote: Hello, does any documentation exist related to newttm (besides the already existing drivers), plus any HowTo for 'converting' an fb driver + xorg driver to support the memory manager + kms? Thanks, okias

--- End Message ---
Re: [Nouveau] Why was old TTM removed from drm.git?
Hi, Pekka!

I'm sorry for this breakage. I thought drm master was currently used only for libdrm development, but I see now that I didn't pay enough attention. Just prior to the commit I sent out a message explaining what I was going to do and why, but apparently it didn't make it to the list (which seems to be the case of quite a few mails these days).

Please feel free to revert that commit and pull it in again when you think it's OK, or perhaps anchor a nouveau drm branch just ahead of this commit. The old TTM is completely unmaintained and blocks any attempt to bring the drm.git master repo reasonably up to sync with what's in the kernel, so the sooner it goes away the better.

Again, sorry for the breakage.

/Thomas

Pekka Paalanen wrote:
> Hi Thomas,
>
> I meant to ask you this 24h ago: why did you deliberately break Nouveau in drm.git?
>
> The commit 9a33f62be1c478334572ea9384af60 "drm: Strip old ttm." not only removes the old TTM, it explicitly removes Nouveau from the Makefile. IMHO this means you knew it broke Nouveau and you did not care. And there is no explanation in the commit as to why. Did you ask any Nouveau developer if it was ok?
>
> It is true that Nouveau is aiming for its own kernel tree for DRM development, but it is not for all end users yet. The majority of our users use the drm.git kernel modules for Nouveau, and now we get to explain how it is not there anymore, and we have no guide to point them to.
>
> 2.6.31-rc1 is coming, bearing newTTM. If you had waited a week, we would have had the kernel tree up and user documentation written on how to build that, and breaking Nouveau in drm.git would not have mattered.
>
> Well, rc1 is so near that we probably won't bother fixing Nouveau in drm.git. I just wish the transition would have been Nouveau developers' decision.
>
> Regards, pq