amdgpu: reset fpriv vram_lost_counter

zhoucm1 Tue, 16 May 2017 21:32:50 -0700


On 2017-05-17 11:15, Michel Dänzer wrote:

On 17/05/17 12:04 PM, zhoucm1 wrote:

On 2017-05-17 09:18, Michel Dänzer wrote:

On 16/05/17 06:25 PM, Chunming Zhou wrote:

Change-Id: I8eb6d7f558da05510e429d3bf1d48c8cec6c1977
Signed-off-by: Chunming Zhou <[email protected]>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 +++
   include/uapi/drm/amdgpu_drm.h          | 1 +
   2 files changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index bca1fb5..f3e7525 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2547,6 +2547,9 @@ int amdgpu_vm_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
       case AMDGPU_VM_OP_UNRESERVE_VMID:
           amdgpu_vm_free_reserved_vmid(adev, &fpriv->vm, AMDGPU_GFXHUB);
           break;
+    case AMDGPU_VM_OP_RESET:
+        fpriv->vram_lost_counter = atomic_read(&adev->vram_lost_counter);
+        break;

How do you envision the UMDs using this? I can mostly think of them
calling this ioctl when a context is created or destroyed. But that
would also allow any other remaining contexts using the same DRM file
descriptor to use all ioctls again. So, I think there needs to be a
vram_lost_counter in struct amdgpu_ctx instead of in struct amdgpu_fpriv.

struct amdgpu_fpriv is the proper place for vram_lost_counter, especially
for the ioctl return value.
If you need to reset contexts one by one, we can mark all contexts of that
VM and then have userspace reset them.

I'm not following. With vram_lost_counter in amdgpu_fpriv, if any
context calls this ioctl, all other contexts using the same file
descriptor will also be considered safe again, right?

Yes, but it really depends on userspace requirements. If you need to reset contexts one by one, we can mark all contexts of that VM as guilty and then have userspace reset one context at a time.
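Zhou's counter-proposal keeps the counter per file and tracks per-context state with a guilty mark. A minimal sketch of that idea, assuming a fixed-size context table (`MAX_CTX`), a plain `guilty` flag and -ENODEV as the rejection code; `struct g_ctx`, `struct g_fpriv` and the helpers are hypothetical and none of this is part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CTX 4 /* hypothetical fixed limit, only for this sketch */

struct g_ctx {
    bool in_use;  /* slot holds a live context */
    bool guilty;  /* set for every live context when VRAM is lost */
};

struct g_fpriv {
    struct g_ctx ctx[MAX_CTX];
};

/* On VRAM loss: mark every live context of this file/VM as guilty. */
void mark_all_guilty(struct g_fpriv *fpriv)
{
    for (size_t i = 0; i < MAX_CTX; i++)
        if (fpriv->ctx[i].in_use)
            fpriv->ctx[i].guilty = true;
}

/* Userspace resets (recreates) one context at a time. */
void reset_one_ctx(struct g_fpriv *fpriv, size_t id)
{
    assert(id < MAX_CTX && fpriv->ctx[id].in_use);
    fpriv->ctx[id].guilty = false;
}

/* Submissions on a guilty context are refused (error code assumed). */
int submit_on_ctx(const struct g_fpriv *fpriv, size_t id)
{
    assert(id < MAX_CTX && fpriv->ctx[id].in_use);
    return fpriv->ctx[id].guilty ? -ENODEV : 0;
}
```

Unlike a single per-file reset, clearing one context's mark here leaves the other contexts of the same file refused until each of them is reset as well.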

So I'm still not
sure how this is supposed to be used by the UMDs. Can you describe your
idea for that?

First, a correction: this idea was picked up from Christian. We just want to provide a way to handle ENODEV and recover the system, rather than having the system simply die when VRAM is lost. As for how UMDs should handle a reset, that obviously needs more discussion between kernel and userspace.


Regards,
David Zhou

It's hard to be sure whether that's workable for the UMD without at
least a working prototype...

Totally agree. If you can help do this in userspace, I'd like to
support you from the kernel side, or Christian could.

I'm busy with other stuff.


_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 4/4] drm/amdgpu: reset fpriv vram_lost_counter
