[Why]
The vm_pte entities currently have NORMAL priority. In the SRIOV
multi-VF use case, the VF FLR happens first and the job timeouts are
only detected afterwards. Several jobs can time out within a very
small time slice. If the timeout of an innocent SDMA job is detected
before that of the real bad job, the innocent SDMA job is marked
guilty because it only has NORMAL priority. This leads to a page
fault after the jobs are resubmitted.

[How]
The SDMA entities used for page table updates should always have
KERNEL priority. Kernel jobs are never marked guilty, so they are
always resubmitted.

Signed-off-by: Jingwen Chen <jingwen.ch...@amd.com>
---
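Note (not part of the commit message): the priority matters because,
as I understand it, the scheduler's timeout handling only blames a job
when it does not have KERNEL priority. The standalone sketch below is
loosely modeled on that check in drm_sched_increase_karma(). It is not
the kernel code: every sketch_* name is hypothetical, and the real
implementation uses atomics and entity lookups that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the scheduler's priority levels. */
enum sketch_priority {
	SKETCH_PRIORITY_NORMAL,
	SKETCH_PRIORITY_KERNEL,
};

/* Hypothetical, trimmed-down job: only the fields the sketch needs. */
struct sketch_job {
	enum sketch_priority priority;
	int karma;	/* number of times the job was blamed for a hang */
};

/*
 * Blame a timed-out job. Kernel-priority jobs are skipped, so they
 * never accumulate karma.
 */
static void sketch_increase_karma(struct sketch_job *job)
{
	assert(job != NULL);
	if (job->priority != SKETCH_PRIORITY_KERNEL)
		job->karma++;
}

/* A job is treated as guilty once its karma exceeds the hang limit. */
static bool sketch_job_is_guilty(const struct sketch_job *job, int hang_limit)
{
	return job->karma > hang_limit;
}
```

With a hang limit of 0, a NORMAL-priority job that times out once is
already guilty, while a KERNEL-priority job in the same situation is
not and stays eligible for resubmission.
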
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 358316d6a38c..f7526b67cc5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2923,13 +2923,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
        INIT_LIST_HEAD(&vm->done);
 
        /* create scheduler entities for page table updates */
-       r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL,
+       r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_KERNEL,
                                  adev->vm_manager.vm_pte_scheds,
                                  adev->vm_manager.vm_pte_num_scheds, NULL);
        if (r)
                return r;
 
-       r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_NORMAL,
+       r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_KERNEL,
                                  adev->vm_manager.vm_pte_scheds,
                                  adev->vm_manager.vm_pte_num_scheds, NULL);
        if (r)
-- 
2.25.1
