Hi Krzysztof,

On Wed Nov 26, 2025 at 7:49 AM UTC, Krzysztof Karas wrote:
> igt_mmap_migrate() tests migration with various parameters.
> In one of the cases, where the FILL and UNFAULTABLE flags are
> set, a mock file is opened in igt_mmap_offset() during the first
> stages of the test, which results in allocating GEM objects for
> page table structures and scratch in GPU mappable memory.
>
> Then, also in igt_mmap_offset(), the file is closed (fput) and
> the cleanup of these objects is scheduled on a delayed workqueue,
> which is designed to run after an unspecified amount of time.
>
> Next, the test calls igt_fill_mappable() to fill mappable GPU
> memory. At this point, three scenarios are possible
> (N = max size of GPU memory for this test in MiB):
>  1) the objects allocated for the mock file are cleaned up after
>     the crucial part of the test is over, so the memory is full
>     with the 1 MiB they occupy plus the N - 1 MiB added by
>     igt_fill_mappable(), and the migration fails as expected;
>  2) the object cleanup fires before igt_fill_mappable()
>     completes, so the whole memory is populated with N MiB from
>     igt_fill_mappable(), and migration fails as well;
>  3) the object cleanup fires right after the fill is done, so
>     only N - 1 MiB remain occupied in the mappable portion of
>     GPU memory, allowing the migration to succeed - we'd expect
>     no space left to perform the migration, but an object was
>     able to fit in the freed 1 MiB, which let get_user() succeed,
>     so the page fault did not fail as expected.
>
> The test incorrectly assumes that the state of GPU mappable
> memory does not change while the test runs. Amend this by
> keeping the mock file open until the migration and page fault
> checks are complete.
>
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13929
> Signed-off-by: Krzysztof Karas <[email protected]>
> ---
> Resending because the patch did not get any reviews.
>
> There are 7 GEM objects in total allocated as a result of mock
> file creation:
> a) 1 object from __i915_gem_object_create_lmem_with_ps()
>  -> i915_gem_object_create_region(),
> b) 3 objects from i915_vm_alloc_pt_stash() -> alloc_pt()
>  -> alloc_pt_lmem(),
> c) 3 objects from i915_vm_alloc_pt_stash() -> alloc_pd()
>  -> alloc_pt_lmem().
>
> I inspected the behavior of this test on ATS-M and DG2
> platforms. The latter always freed the GEM objects originating
> from mock file creation at the end of the test. ATS-M, on the
> other hand, performed the release much sooner - around the time
> a call to igt_fill_mappable() would be returning, so there was
> not much leeway in the timing. I confirmed this by delaying the
> fill operation by one second, which made the issue disappear.
> Conversely, adding a delay to __i915_gem_free_objects()
> produced a 100% reproduction rate of the issue. However, I felt
> that juggling delays would not have resolved the problem, only
> decreased the probability of this race, so I decided to increase
> control over the contents of mappable memory instead.
>
> Chris Wilson suspected that this patch papers over a leaked
> vm_area struct, which was addressed in
> f768ebbba9110846c9f986a96109d70154d60b5d, but that did not
> resolve the original issue.
>
> v2:
> * change ownership of the file used in igt_mmap_offset*
>   functions to the caller (Krzysztof, Sebastian);
> * rename igt_mmap_offset_get_file() to
>   igt_mmap_offset_with_file();
>
> v3:
> * remove double fput() call (Krzysztof);
>
> v4:
> * extend the comment above mock_drm_getfile();
> * reword commit message to contain information about GEM
>   objects;
> * rebase;
>
>  .../drm/i915/gem/selftests/i915_gem_mman.c    | 23 +++++++---
>  drivers/gpu/drm/i915/selftests/igt_mmap.c     | 46 +++++++++++++------
>  drivers/gpu/drm/i915/selftests/igt_mmap.h     |  8 ++++
>  3 files changed, 57 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> index 0d250d57496a..c561df41ba49 100644
> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
> @@ -1159,6 +1159,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>       struct drm_i915_gem_object *obj;
>       struct i915_request *rq = NULL;
>       struct vm_area_struct *area;
> +     struct file *mock_file;
>       unsigned long addr;
>       LIST_HEAD(objects);
>       u64 offset;
> @@ -1178,13 +1179,22 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>               goto out_put;
>  
>       /*
> -      * This will eventually create a GEM context, due to opening dummy drm
> -      * file, which needs a tiny amount of mappable device memory for the top
> -      * level paging structures(and perhaps scratch), so make sure we
> -      * allocate early, to avoid tears.
> +      * Pretend to open("/dev/dri/card0"), which will eventually create a GEM
> +      * context along with multiple GEM objects (for paging structures and
> +      * scratch) that are placed in the mappable portion of GPU memory.
> +      * Calling fput() on the file places objects' cleanup routines on
> +      * delayed workqueues, which execute after an unspecified amount
> +      * of time.
> +      * Keep the file open until migration and page fault checks are done to
> +      * make sure object cleanup is not executed after igt_fill_mappable()
> +      * finishes and before migration is attempted - that would leave a gap
> +      * large enough for the migration to succeed, when we'd expect it
> +      * to fail.
>        */
> -     addr = igt_mmap_offset(i915, offset, obj->base.size,
> -                            PROT_WRITE, MAP_SHARED);
> +     mock_file = mock_drm_getfile(i915->drm.primary, O_RDWR);
> +     if (IS_ERR(mock_file))
> +             return PTR_ERR(mock_file);
> +
> +     addr = igt_mmap_offset_with_file(i915, offset, obj->base.size,
> +                                      PROT_WRITE, MAP_SHARED, mock_file);
>       if (IS_ERR_VALUE(addr)) {
>               err = addr;
>               goto out_put;
> @@ -1295,6 +1305,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>       vm_munmap(addr, obj->base.size);
>  
>  out_put:
> +     fput(mock_file);
>       i915_gem_object_put(obj);
>       igt_close_objects(i915, &objects);
>       return err;
> diff --git a/drivers/gpu/drm/i915/selftests/igt_mmap.c b/drivers/gpu/drm/i915/selftests/igt_mmap.c
> index e920a461bd36..6f1b6d5cc2d3 100644
> --- a/drivers/gpu/drm/i915/selftests/igt_mmap.c
> +++ b/drivers/gpu/drm/i915/selftests/igt_mmap.c
> @@ -9,17 +9,22 @@
>  #include "i915_drv.h"
>  #include "igt_mmap.h"
>  
> -unsigned long igt_mmap_offset(struct drm_i915_private *i915,
> -                           u64 offset,
> -                           unsigned long size,
> -                           unsigned long prot,
> -                           unsigned long flags)
> +unsigned long igt_mmap_offset_with_file(struct drm_i915_private *i915,
> +                                     u64 offset,
> +                                     unsigned long size,
> +                                     unsigned long prot,
> +                                     unsigned long flags,
> +                                     struct file *file)
>  {
>       struct drm_vma_offset_node *node;
> -     struct file *file;
>       unsigned long addr;
>       int err;
>  
> +     if (!file) {
Minor suggestion: consider using WARN_ON() here for better debugging.

This patch seems okay from my perspective.
Reviewed-by: Sebastian Brzezinka <[email protected]>

-- 
Best regards,
Sebastian
