Device-to-device migration causes
'xe_exec_system_allocator --r *race*no*' to fail intermittently with
engine resets and a kernel hang on a page lock. This path should work
but is clearly buggy somewhere. Disable device-to-device migration until
the issue can be root-caused.

The only downside of disabling device-to-device migration is that memory
will bounce through system memory during migration. However, this path
should be rare, as it only occurs when madvise attributes are changed or
atomics are used.

Cc: Thomas Hellström <[email protected]>
Fixes: ec265e1f1cfc ("drm/pagemap: Support source migration over interconnect")
Signed-off-by: Matthew Brost <[email protected]>
---
 drivers/gpu/drm/drm_pagemap.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_pagemap.c b/drivers/gpu/drm/drm_pagemap.c
index aa43a8475100..03ee39a761a4 100644
--- a/drivers/gpu/drm/drm_pagemap.c
+++ b/drivers/gpu/drm/drm_pagemap.c
@@ -480,8 +480,18 @@ int drm_pagemap_migrate_to_devmem(struct drm_pagemap_devmem *devmem_allocation,
                .start          = start,
                .end            = end,
                .pgmap_owner    = pagemap->owner,
-               .flags          = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT |
-               MIGRATE_VMA_SELECT_DEVICE_PRIVATE,
+               /*
+                * FIXME: MIGRATE_VMA_SELECT_DEVICE_PRIVATE intermittently
+                * causes 'xe_exec_system_allocator --r *race*no*' to trigger an
+                * engine reset and a hard hang due to getting stuck on a folio
+                * lock. This should work and needs to be root-caused. The only
+                * downside of not selecting MIGRATE_VMA_SELECT_DEVICE_PRIVATE
+                * is that device-to-device migrations won't work; instead,
+                * memory will bounce through system memory. This path should be
+                * rare and only occur when the madvise attributes of memory are
+                * changed or atomics are being used.
+                */
+               .flags          = MIGRATE_VMA_SELECT_SYSTEM | MIGRATE_VMA_SELECT_DEVICE_COHERENT,
        };
        unsigned long i, npages = npages_in_range(start, end);
        unsigned long own_pages = 0, migrated_pages = 0;
-- 
2.34.1
