On Wed, 26 Nov 2025, Sebastian Brzezinka <[email protected]> wrote:
> Hi Krzysztof
>
> On Wed Nov 26, 2025 at 7:49 AM UTC, Krzysztof Karas wrote:
>> igt_mmap_migrate() tests migration with various parameters.
>> In one of the cases, where FILL and UNFAULTABLE flags are set,
>> during the first stages of this test, a mock file is opened in
>> igt_mmap_offset(), which results in allocating GEM objects for
>> page table structures and scratch in GPU mappable memory.
>>
>> Then, also in igt_mmap_offset(), the file is closed (fput) and
>> the cleanup of these objects is scheduled on a delayed workqueue,
>> which is designed to execute after an unspecified amount of time.
>>
>> Next, the test calls igt_fill_mappable() to fill mappable GPU
>> memory. At this point, three scenarios are possible
>> (N = max size of GPU memory for this test in MiB):
>>  1) the objects allocated for the mock file get cleaned up after
>>     the crucial part of the test is over, so the memory is filled
>>     with the 1 MiB they occupy plus the N - 1 MiB added by
>>     igt_fill_mappable(), and the migration fails as expected;
>>  2) the object cleanup fires before igt_fill_mappable()
>>     completes, so the whole memory is populated with N MiB from
>>     igt_fill_mappable(), and migration fails as well;
>>  3) the object cleanup runs right after the fill is done, so only
>>     N - 1 MiB occupy the mappable portion of GPU memory, allowing
>>     the migration to succeed - we expect no space to be left for
>>     the migration, but an object fits into the remaining 1 MiB,
>>     so get_user() succeeds and the page fault does not fail as
>>     the test expects.
>>
>> The test incorrectly assumes that the state of mappable GPU memory
>> remains unchanged throughout the test. Amend this by keeping the
>> mock file open until the migration and page fault checks are done.
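>>
>> In short, after this change the test body holds the mock file open
>> across the fill and the migration attempt (a rough sketch only; the
>> actual change is in the diff below):
>>
>>   mock_file = mock_drm_getfile(i915->drm.primary, O_RDWR);
>>   addr = igt_mmap_offset_with_file(i915, offset, obj->base.size,
>>                                    PROT_WRITE, MAP_SHARED, mock_file);
>>   /* fill the rest of mappable memory, attempt the migration and
>>    * check the page fault result */
>>   fput(mock_file); /* only now may the deferred cleanup run */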
>>
>> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/13929
>> Signed-off-by: Krzysztof Karas <[email protected]>
>> ---
>> Resending, because the patch did not get any reviews.
>>
>> There are 7 GEM objects in total allocated as a result of mock
>> file creation:
>> a) 1 object from __i915_gem_object_create_lmem_with_ps()
>>  -> i915_gem_object_create_region(),
>> b) 3 objects from i915_vm_alloc_pt_stash() -> alloc_pt()
>>  -> alloc_pt_lmem(),
>> c) 3 objects from i915_vm_alloc_pt_stash() -> alloc_pd()
>>  -> alloc_pt_lmem().
>>
>> I inspected the behavior of this test on ATS-M and DG2
>> platforms. The latter always freed the GEM objects originating
>> from mock file creation at the end of the test. ATS-M, on the
>> other hand, performed the release much sooner - around the time
>> a call to igt_fill_mappable() would be returning, so there was
>> not much leeway in the timing. I confirmed this by delaying the
>> fill operation by one second, which made the issue disappear.
>> Conversely, adding a delay to __i915_gem_free_objects()
>> produced a 100% reproduction rate of the issue. However, I felt
>> that juggling delays would not have resolved the problem, only
>> decreased the probability of this race, so I decided to increase
>> control over the contents of mappable memory instead.
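>>
>> For illustration only (not part of the patch), the deterministic
>> reproduction came from widening the race window roughly like this;
>> the exact placement and delay value are arbitrary:
>>
>>   static void __i915_gem_free_objects(struct drm_i915_private *i915,
>>                                       struct llist_node *freed)
>>   {
>>           /* Delay the deferred free so that igt_fill_mappable()
>>            * always completes before the mock file's objects are
>>            * released (msleep() comes from <linux/delay.h>).
>>            */
>>           msleep(1000);
>>
>>           /* ... original freeing loop unchanged ... */
>>   }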
>>
>> Chris Wilson had a suspicion that this patch papers over a leaked
>> vm_area struct, which was addressed in
>> f768ebbba9110846c9f986a96109d70154d60b5d, but that did not
>> resolve the original issue.
>>
>> v2:
>> * change ownership of the file used in igt_mmap_offset*
>>   functions to the caller (Krzysztof, Sebastian);
>> * rename igt_mmap_offset_get_file() to
>>   igt_mmap_offset_with_file();
>>
>> v3:
>> * remove double fput() call (Krzysztof);
>>
>> v4:
>> * extend the comment above mock_drm_getfile();
>> * reword commit message to contain information about GEM
>>   objects;
>> * rebase;
>>
>>  .../drm/i915/gem/selftests/i915_gem_mman.c    | 23 +++++++---
>>  drivers/gpu/drm/i915/selftests/igt_mmap.c     | 46 +++++++++++++------
>>  drivers/gpu/drm/i915/selftests/igt_mmap.h     |  8 ++++
>>  3 files changed, 57 insertions(+), 20 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> index 0d250d57496a..c561df41ba49 100644
>> --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c
>> @@ -1159,6 +1159,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>>      struct drm_i915_gem_object *obj;
>>      struct i915_request *rq = NULL;
>>      struct vm_area_struct *area;
>> +    struct file *mock_file;
>>      unsigned long addr;
>>      LIST_HEAD(objects);
>>      u64 offset;
>> @@ -1178,13 +1179,22 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>>              goto out_put;
>>  
>>      /*
>> -     * This will eventually create a GEM context, due to opening dummy drm
>> -     * file, which needs a tiny amount of mappable device memory for the top
>> -     * level paging structures(and perhaps scratch), so make sure we
>> -     * allocate early, to avoid tears.
>> +     * Pretend to open("/dev/dri/card0"), which will eventually create a GEM
>> +     * context along with multiple GEM objects (for paging structures and
>> +     * scratch) that are placed in the mappable portion of GPU memory.
>> +     * Calling fput() on the file places objects' cleanup routines in delayed
>> +     * workqueues, which execute after an unspecified amount of time.
>> +     * Keep the file open until migration and page fault checks are done to
>> +     * make sure object cleanup is not executed after igt_fill_mappable()
>> +     * finishes and before migration is attempted - that would leave a gap
>> +     * large enough for the migration to succeed, when we'd expect it to fail.
>>       */
>> -    addr = igt_mmap_offset(i915, offset, obj->base.size,
>> -                           PROT_WRITE, MAP_SHARED);
>> +    mock_file = mock_drm_getfile(i915->drm.primary, O_RDWR);
>> +    if (IS_ERR(mock_file))
>> +            return PTR_ERR(mock_file);
>> +
>> +    addr = igt_mmap_offset_with_file(i915, offset, obj->base.size,
>> +                                     PROT_WRITE, MAP_SHARED, mock_file);
>>      if (IS_ERR_VALUE(addr)) {
>>              err = addr;
>>              goto out_put;
>> @@ -1295,6 +1305,7 @@ static int __igt_mmap_migrate(struct intel_memory_region **placements,
>>      vm_munmap(addr, obj->base.size);
>>  
>>  out_put:
>> +    fput(mock_file);
>>      i915_gem_object_put(obj);
>>      igt_close_objects(i915, &objects);
>>      return err;
>> diff --git a/drivers/gpu/drm/i915/selftests/igt_mmap.c b/drivers/gpu/drm/i915/selftests/igt_mmap.c
>> index e920a461bd36..6f1b6d5cc2d3 100644
>> --- a/drivers/gpu/drm/i915/selftests/igt_mmap.c
>> +++ b/drivers/gpu/drm/i915/selftests/igt_mmap.c
>> @@ -9,17 +9,22 @@
>>  #include "i915_drv.h"
>>  #include "igt_mmap.h"
>>  
>> -unsigned long igt_mmap_offset(struct drm_i915_private *i915,
>> -                          u64 offset,
>> -                          unsigned long size,
>> -                          unsigned long prot,
>> -                          unsigned long flags)
>> +unsigned long igt_mmap_offset_with_file(struct drm_i915_private *i915,
>> +                                    u64 offset,
>> +                                    unsigned long size,
>> +                                    unsigned long prot,
>> +                                    unsigned long flags,
>> +                                    struct file *file)
>>  {
>>      struct drm_vma_offset_node *node;
>> -    struct file *file;
>>      unsigned long addr;
>>      int err;
>>  
>> +    if (!file) {
> Minor suggestion, consider using WARN_ON() here for better debugging.

Please remove the whole check instead. It can't happen. Please don't set
the example of checking and error handling things that can't happen.

BR,
Jani.

>
> This patch seems okay from my perspective.
> Reviewed-by: Sebastian Brzezinka <[email protected]>

-- 
Jani Nikula, Intel
