On 17.05.2017 at 10:46, zhoucm1 wrote:


On 2017-05-17 16:40, Christian König wrote:
On 17.05.2017 at 10:01, Michel Dänzer wrote:
On 17/05/17 04:13 PM, zhoucm1 wrote:
On 2017-05-17 14:57, Michel Dänzer wrote:
On 17/05/17 01:28 PM, zhoucm1 wrote:
On 2017-05-17 11:15, Michel Dänzer wrote:
On 17/05/17 12:04 PM, zhoucm1 wrote:
On 2017-05-17 09:18, Michel Dänzer wrote:
On 16/05/17 06:25 PM, Chunming Zhou wrote:
Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
Signed-off-by: Chunming Zhou <david1.z...@amd.com>

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bca1fb5..f3e7525 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	case AMDGPU_VM_OP_UNRESERVE_VMID:
 		amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm, AMDGPU_GFXHUB);
 		break;
+	case AMDGPU_VM_OP_RESET:
+		fpriv->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
+		break;
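
For context, userspace would reach this new case through the existing DRM_IOCTL_AMDGPU_VM ioctl. A minimal sketch of such a call, assuming the proposed AMDGPU_VM_OP_RESET op value gets added to the uapi header alongside the existing VMID ops (error handling elided):

#include <string.h>
#include <sys/ioctl.h>
#include <amdgpu_drm.h>

/* Sketch only: AMDGPU_VM_OP_RESET is the op proposed by this patch and
 * not yet in amdgpu_drm.h; union drm_amdgpu_vm and DRM_IOCTL_AMDGPU_VM
 * are the existing uapi. */
static int amdgpu_vm_op_reset(int drm_fd)
{
	union drm_amdgpu_vm args;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_VM_OP_RESET;	/* proposed new op */
	return ioctl(drm_fd, DRM_IOCTL_AMDGPU_VM, &args);
}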
How do you envision the UMDs using this? I can mostly think of them calling this ioctl when a context is created or destroyed. But that would also allow any other remaining contexts using the same DRM file descriptor to use all ioctls again. So, I think there needs to be a
vram_lost_counter in struct amdgpu_ctx instead of in struct
amdgpu_fpriv.
struct amdgpu_fpriv is the proper place for vram_lost_counter, especially
for the ioctl return value.
If you need to reset contexts one by one, we can mark all contexts of that
VM, and then userspace can reset them.
I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
context calls this ioctl, all other contexts using the same file
descriptor will also be considered safe again, right?
Yes, but it really depends on the userspace requirements. If you need to
reset contexts one by one, we can mark all contexts of that VM as guilty,
and then userspace can reset one context at a time.
Still not sure what you mean by that.

E.g. what do you mean by "guilty"? I thought that refers to the context which caused a hang. But it seems like you're using it to refer to any
context which hasn't reacted yet to VRAM contents being lost.
When VRAM is lost, we treat all contexts as needing a reset.
Essentially, your patches only track VRAM contents being lost per file
descriptor, not per context. I'm not sure (rather skeptical) that this
is suitable for OpenGL UMDs, since state is usually tracked per context.
Marek / Nicolai?
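
To make the distinction concrete: a rough kernel-side sketch of the per-context alternative being suggested here, assuming the counter snapshot moves from struct amdgpu_fpriv into struct amdgpu_ctx (the field placement and helper name are illustrative, not from the patch):

/* Sketch: per-context VRAM-loss tracking instead of per-fd.
 * The context snapshots the device counter at creation time; any later
 * mismatch means VRAM contents were lost while this context existed. */
struct amdgpu_ctx {
	/* ...existing members elided... */
	uint32_t vram_lost_counter;	/* snapshot at context creation */
};

static bool amdgpu_ctx_vram_lost(struct amdgpu_device *adev,
				 struct amdgpu_ctx *ctx)
{
	return ctx->vram_lost_counter !=
	       atomic_read(&adev->vram_lost_counter);
}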

Oh, yeah that's a good point.

The problem with tracking it per context is that Vulkan also wants the ENODEV on amdgpu_gem_va_ioctl() and amdgpu_info_ioctl(), which are context-less.

But thinking more about this, blocking those two doesn't make much sense. The VM content can be restored, and why should we disallow reading GPU info?
I can re-paste the Vulkan APIs requiring ENODEV:
"

The Vulkan APIs listed below could return VK_ERROR_DEVICE_LOST according to the spec.

I tried to provide a list of u/k interfaces that could be called for each Vulkan API.


Well, those are the Vulkan requirements, but that doesn't necessarily mean we must follow them on the kernel side. Keep in mind that Vulkan can't make any requirements of the kernel driver.

IIRC we already have a query Vulkan can use to figure out if a GPU reset happened or not. So they could use that instead.

Regards,
Christian.
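
The query mentioned above is presumably the per-context reset status exposed through libdrm's amdgpu_cs_query_reset_state(), which wraps the AMDGPU_CTX_OP_QUERY_STATE ioctl. A minimal usage sketch:

#include <stdbool.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch: a Vulkan UMD polling whether a GPU reset has occurred for
 * this context, instead of relying on ENODEV from unrelated ioctls. */
static bool gpu_was_reset(amdgpu_context_handle ctx)
{
	uint32_t state = 0, hangs = 0;

	if (amdgpu_cs_query_reset_state(ctx, &state, &hangs))
		return false;	/* query itself failed; no information */
	return state != AMDGPU_CTX_NO_RESET;
}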

vkCreateDevice
        amdgpu_device_initialize
        amdgpu_query_gpu_info

vkQueueSubmit
        amdgpu_cs_submit

vkWaitForFences
        amdgpu_cs_wait_fences

vkGetEventStatus
vkQueueWaitIdle
vkDeviceWaitIdle
vkGetQueryPoolResults
        amdgpu_cs_query_fence_status

vkQueueBindSparse
        amdgpu_bo_va_op
        amdgpu_bo_va_op_raw

vkCreateSwapchainKHR
vkAcquireNextImageKHR
vkQueuePresentKHR
        Not related to any u/k interface.

Besides those listed above, I think amdgpu_cs_signal_sem/amdgpu_cs_wait_sem should respond to GPU reset as well."

Christian.
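
As an illustration of the ENODEV contract requested in the list above, a sketch of how a UMD might fold the errno from the submit path into VK_ERROR_DEVICE_LOST (the helper and mapping are assumptions for illustration, not actual radv or amdvlk code):

#include <errno.h>
#include <amdgpu.h>

/* Sketch: translating a kernel ENODEV (device lost / VRAM lost) from
 * command submission into Vulkan's device-lost result. */
static int submit_or_device_lost(amdgpu_context_handle ctx,
				 struct amdgpu_cs_request *req)
{
	int r = amdgpu_cs_submit(ctx, 0, req, 1);

	if (r == -ENODEV)
		return 1;	/* caller reports VK_ERROR_DEVICE_LOST */
	return r;		/* 0 on success, other -errno otherwise */
}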



