On Wed, Jan 07, 2026 at 10:27:16AM -0800, Matthew Brost wrote:
> Device-to-device migration is causing xe_exec_system_allocator --r
> *race*no* to intermittently fail with engine resets and a kernel hang on
> a page lock. This should work but is clearly buggy somewhere. Disable
> device-to-device migration in the interim until the issue can be
> root-caused.
> 
> The only downside of disabling device-to-device migration is that memory
> will bounce through system memory during migration. However, this path
> should be rare, as it only occurs when madvise attributes are changed or
> atomics are used.
> 
> Cc: Thomas Hellström <[email protected]>
> Fixes: ec265e1f1cfc ("drm/pagemap: Support source migration over interconnect")
> Signed-off-by: Matthew Brost <[email protected]>
> ---
>  drivers/gpu/drm/drm_pagemap.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
> index aa43a8475100..03ee39a761a4 100644
> --- a/drivers/gpu/drm/drm_pagemap.c
> +++ b/drivers/gpu/drm/drm_pagemap.c
> @@ -480,8 +480,18 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
>               .start          = start,
>               .end            = end,
>               .pgmap_owner    = pagemap->owner,
> -             .flags          = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT |
> -             MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
> +             /*
> +              * FIXME: MIGRATE_VMA_SELECT_DEVICE_PRIVATE intermittently
> +              * causes 'xe_exec_system_allocator --r *race*no*' to trigger aa

s/aa/an/

Reviewed-by: Francois Dugast <[email protected]>

> +              * engine reset and a hard hang due to getting stuck on a folio
> +              * lock. This should work and needs to be root-caused. The only
> +              * downside of not selecting MIGRATE_VMA_SELECT_DEVICE_PRIVATE
> +              * is that device-to-device migrations won’t work; instead,
> +              * memory will bounce through system memory. This path should be
> +              * rare and only occur when the madvise attributes of memory are
> +              * changed or atomics are being used.
> +              */
> +             .flags          = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT,
>       };
>       unsigned long i, npages = npages_in_range(start, end);
>       unsigned long own_pages = 0, migrated_pages = 0;
> -- 
> 2.34.1
> 
