Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
On 08/29/2018 04:53 PM, Christian König wrote: Am 29.08.2018 um 04:52 schrieb Zhang, Jerry (Junwei): On 08/28/2018 08:40 PM, Emily Deng wrote: Fix the VMC page fault when the running sequence is as below: 1.amdgpu_gem_create_ioctl 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted, it won't set the bo_base->moved. IMO, the evicted bo should be created previously. On BO creation we will add it to the bo->va as below: amdgpu_gem_create_ioctl drm_gem_handle_create amdgpu_gem_object_open And here is the problem. Between creating the BO and opening it in the client the BO can be evicted. Thanks to explain that. It's the key point, falling in Murphy's law as well. Jerry That's what Emily's patch is handling here. Christian. amdgpu_vm_bo_add amdgpu_vm_bo_base_init list_add_tail(&base->bo_list, &bo->va) Then it could be set moved in bo invalidate when evicting. could you provide a bit more backgroud about the issue? looks a per vm bo is evicted and a new same bo created. Jerry 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called list_move_tail(&base->vm_status, &vm->evicted), but not set the bo_base->moved. 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is not set true, the function amdgpu_vm_bo_insert_map will call list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the moved list, not in the evict list. So VMC page fault occurs. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1f4b8df..015e20e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, * is validated on next vm use to avoid fault. * */ list_move_tail(&base->vm_status, &vm->evicted); +base->moved = true; } /** ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
Am 29.08.2018 um 04:52 schrieb Zhang, Jerry (Junwei): On 08/28/2018 08:40 PM, Emily Deng wrote: Fix the VMC page fault when the running sequence is as below: 1.amdgpu_gem_create_ioctl 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted, it won't set the bo_base->moved. IMO, the evicted bo should be created previously. On BO creation we will add it to the bo->va as below: amdgpu_gem_create_ioctl drm_gem_handle_create amdgpu_gem_object_open And here is the problem. Between creating the BO and opening it in the client the BO can be evicted. That's what Emily's patch is handling here. Christian. amdgpu_vm_bo_add amdgpu_vm_bo_base_init list_add_tail(&base->bo_list, &bo->va) Then it could be set moved in bo invalidate when evicting. could you provide a bit more backgroud about the issue? looks a per vm bo is evicted and a new same bo created. Jerry 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called list_move_tail(&base->vm_status, &vm->evicted), but not set the bo_base->moved. 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is not set true, the function amdgpu_vm_bo_insert_map will call list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the moved list, not in the evict list. So VMC page fault occurs. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1f4b8df..015e20e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, * is validated on next vm use to avoid fault. * */ list_move_tail(&base->vm_status, &vm->evicted); + base->moved = true; } /** ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
On 08/28/2018 08:40 PM, Emily Deng wrote: Fix the VMC page fault when the running sequence is as below: 1.amdgpu_gem_create_ioctl 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted, it won't set the bo_base->moved. IMO, the evicted bo should be created previously. On BO creation we will add it to the bo->va as below: amdgpu_gem_create_ioctl drm_gem_handle_create amdgpu_gem_object_open amdgpu_vm_bo_add amdgpu_vm_bo_base_init list_add_tail(&base->bo_list, &bo->va) Then it could be set moved in bo invalidate when evicting. could you provide a bit more backgroud about the issue? looks a per vm bo is evicted and a new same bo created. Jerry 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called list_move_tail(&base->vm_status, &vm->evicted), but not set the bo_base->moved. 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is not set true, the function amdgpu_vm_bo_insert_map will call list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the moved list, not in the evict list. So VMC page fault occurs. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1f4b8df..015e20e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, * is validated on next vm use to avoid fault. * */ list_move_tail(&base->vm_status, &vm->evicted); + base->moved = true; } /** ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
RE: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
>-Original Message- >From: amd-gfx On Behalf Of >zhoucm1 >Sent: Wednesday, August 29, 2018 10:12 AM >To: Koenig, Christian ; Deng, Emily >; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo > > > >On 2018年08月28日 20:47, Christian König wrote: >> Am 28.08.2018 um 14:40 schrieb Emily Deng: >>> Fix the VMC page fault when the running sequence is as below: >>> 1.amdgpu_gem_create_ioctl >>> 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called >>> amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, >>> &bo->va). Even the bo was evicted, it won't set the bo_base->moved. >>> 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called >>> list_move_tail(&base->vm_status, &vm->evicted), but not set the >>> bo_base->moved. >>> 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base- >>moved is >>> not set true, the function amdgpu_vm_bo_insert_map will call >>> list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't >>> validate the swapout bo, as it is only in the moved list, not in the >>> evict list. So VMC page fault occurs. >>> >>> Signed-off-by: Emily Deng >> >> Good catch, patch is Reviewed-by: Christian König >> >Really good debug, Emily, you can add my Reviewed-by: Chunming Zhou > as well if you still don't push it yet. David, thanks you for helping me to find out the root cause, already pushed the patch. >Regards, >David Zhou >> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + >>> 1 file changed, 1 insertion(+) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> index 1f4b8df..015e20e 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c >>> @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct >>> amdgpu_vm_bo_base *base, >>> * is validated on next vm use to avoid fault. >>> * */ >>> list_move_tail(&base->vm_status, &vm->evicted); >>> + base->moved = true; >>> } >>> /** >> >> ___ >> amd-gfx mailing list >> amd-gfx@lists.freedesktop.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > >___ >amd-gfx mailing list >amd-gfx@lists.freedesktop.org >https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
On 2018年08月28日 20:47, Christian König wrote: Am 28.08.2018 um 14:40 schrieb Emily Deng: Fix the VMC page fault when the running sequence is as below: 1.amdgpu_gem_create_ioctl 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted, it won't set the bo_base->moved. 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called list_move_tail(&base->vm_status, &vm->evicted), but not set the bo_base->moved. 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is not set true, the function amdgpu_vm_bo_insert_map will call list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the moved list, not in the evict list. So VMC page fault occurs. Signed-off-by: Emily Deng Good catch, patch is Reviewed-by: Christian König Really good debug, Emily, you can add my Reviewed-by: Chunming Zhou as well if you still don't push it yet. Regards, David Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1f4b8df..015e20e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, * is validated on next vm use to avoid fault. * */ list_move_tail(&base->vm_status, &vm->evicted); + base->moved = true; } /** ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: [PATCH] drm/amdgpu: Need to set moved to true when evict bo
Am 28.08.2018 um 14:40 schrieb Emily Deng: Fix the VMC page fault when the running sequence is as below: 1.amdgpu_gem_create_ioctl 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted, it won't set the bo_base->moved. 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called list_move_tail(&base->vm_status, &vm->evicted), but not set the bo_base->moved. 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is not set true, the function amdgpu_vm_bo_insert_map will call list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the moved list, not in the evict list. So VMC page fault occurs. Signed-off-by: Emily Deng Good catch, patch is Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1f4b8df..015e20e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, * is validated on next vm use to avoid fault. * */ list_move_tail(&base->vm_status, &vm->evicted); + base->moved = true; } /** ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
[PATCH] drm/amdgpu: Need to set moved to true when evict bo
Fix the VMC page fault when the running sequence is as below: 1.amdgpu_gem_create_ioctl 2.ttm_bo_swapout->amdgpu_vm_bo_invalidate, as not called amdgpu_vm_bo_base_init, so won't called list_add_tail(&base->bo_list, &bo->va). Even the bo was evicted, it won't set the bo_base->moved. 3.drm_gem_open_ioctl->amdgpu_vm_bo_base_init, here only called list_move_tail(&base->vm_status, &vm->evicted), but not set the bo_base->moved. 4.amdgpu_vm_bo_map->amdgpu_vm_bo_insert_map, as the bo_base->moved is not set true, the function amdgpu_vm_bo_insert_map will call list_move(&bo_va->base.vm_status, &vm->moved) 5.amdgpu_cs_ioctl won't validate the swapout bo, as it is only in the moved list, not in the evict list. So VMC page fault occurs. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 1f4b8df..015e20e 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -172,6 +172,7 @@ static void amdgpu_vm_bo_base_init(struct amdgpu_vm_bo_base *base, * is validated on next vm use to avoid fault. * */ list_move_tail(&base->vm_status, &vm->evicted); + base->moved = true; } /** -- 2.7.4 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx