sriov:return -ENODEV if gpu reseted

Christian König Mon, 09 Oct 2017 23:59:32 -0700

As Nicolai explained that approach simply won't work.

The fd is used by more than just the closed source Vulkan driver and Ithink even by some components not developed by AMD (common X code?Michel please comment as well).

So closing it and reopening it to handle a GPU reset is simply not anoption.


Regards,
Christian.

Am 10.10.2017 um 06:26 schrieb Liu, Monk:

After VRAM lost happens, all clients no matter radv/mesa/ogl is useless,

Any drivers uses this FD should be denied by KMD after VRAM lost, and UMD can 
destroy/close this FD and re-open it and rebuild all resources

That's the only option for VRAM lost case



-----Original Message-----
From: Nicolai Hähnle [mailto:nhaeh...@gmail.com]
Sent: 2017年10月9日 19:01
To: Liu, Monk <monk....@amd.com>; Koenig, Christian <christian.koe...@amd.com>; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu reseted

On 09.10.2017 10:35, Liu, Monk wrote:

Please be aware that this policy is what the strict mode defined and
what customer want, And also please check VK spec, it defines that
after GPU reset all vk INSTANCE should close/release its
resource/device/ctx and all buffers, and call re-initvkinstance after
gpu reset

Sorry, but you simply cannot implement a correct user-space implementation of 
those specs on top of this.

It will break as soon as you have both OpenGL and Vulkan running in the same 
process (or heck, our Vulkan and radv :)), because both drivers will use the 
same fd.

Cheers,
Nicolai

So this whole approach is what just aligned with the spec, and to not
influence with current MESA/OGL client that's why I put the whole
approach into the strict mode And by default strict mode is not
selected


BR Monk

-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
Sent: 2017年10月9日 16:26
To: Liu, Monk <monk....@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu
reseted

Am 30.09.2017 um 08:03 schrieb Monk Liu:

for SRIOV strict mode gpu reset:

In kms open we mark the latest adev->gpu_reset_counter in fpriv we
return -ENODEV in cs_ioctl or info_ioctl if they found
fpriv->gpu_reset_counter != adev->gpu_reset_counter.

this way we prevent a potential bad process/FD from submitting cmds
and notify userspace with -ENODEV.

userspace should close all BO/ctx and re-open dri FD to re-create
virtual memory system for this process

The whole aproach is a NAK from my side.

We need to enable userspace to continue, not force it into process termination 
to recover. Otherwise we could send a SIGTERM in the first place.

Regards,
Christian.

Change-Id: Ib4c179f28a3d0783837566f29de07fc14aa9b9a4
Signed-off-by: Monk Liu <monk....@amd.com>
---
    drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 5 +++++
    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 7 +++++++
    3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index de9c164..b40d4ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -772,6 +772,7 @@ struct amdgpu_fpriv {
        struct idr              bo_list_handles;
        struct amdgpu_ctx_mgr   ctx_mgr;
        u32                     vram_lost_counter;
+       int gpu_reset_counter;
    };

/*

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 9467cf6..6a1515e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1199,6 +1199,11 @@ int amdgpu_cs_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
        if (amdgpu_kms_vram_lost(adev, fpriv))
                return -ENODEV;

+ if (amdgpu_sriov_vf(adev) &&

+               amdgpu_sriov_reset_level == 1 &&
+               fpriv->gpu_reset_counter < 
atomic_read(&adev->gpu_reset_counter))
+               return -ENODEV;
+
        parser.adev = adev;
        parser.filp = filp;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 282f45b..bd389cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -285,6 +285,11 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
        if (amdgpu_kms_vram_lost(adev, fpriv))
                return -ENODEV;

+ if (amdgpu_sriov_vf(adev) &&

+               amdgpu_sriov_reset_level == 1 &&
+               fpriv->gpu_reset_counter < 
atomic_read(&adev->gpu_reset_counter))
+               return -ENODEV;
+
        switch (info->query) {
        case AMDGPU_INFO_ACCEL_WORKING:
                ui32 = adev->accel_working;
@@ -824,6 +829,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
                goto out_suspend;
        }

+ fpriv->gpu_reset_counter = atomic_read(&adev->gpu_reset_counter);

+
        r = amdgpu_vm_init(adev, &fpriv->vm,
                           AMDGPU_VM_CONTEXT_GFX, 0);
        if (r) {


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.



_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH 09/12] drm/amdgpu/sriov:return -ENODEV if gpu reseted

Reply via email to