Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Christian König

On 05.01.22 at 08:34, JingWen Chen wrote:

On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote:

On 2022-01-04 6:36 a.m., Christian König wrote:

On 04.01.22 at 11:49, Liu, Monk wrote:

[AMD Official Use Only]


See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about 
to start or was already executed, but host will do FLR anyway without waiting 
for guest too long


Then we have a major design issue in the SRIOV protocol and really need to 
question this.

How do you want to prevent a race between the hypervisor resetting the hardware 
and the client trying the same because of a timeout?

As far as I can see the procedure should be:
1. We detect that a reset is necessary, either because of a fault, a timeout, or 
a signal from the hypervisor.
2. For each of those potential reset sources a work item is sent to the single 
workqueue.
3. One of those work items executes first and prepares the reset.
4. We either do the reset ourselves or notify the hypervisor that we are ready 
for the reset.
5. Clean up after the reset, eventually resubmit jobs etc.
6. Cancel work items which might have been scheduled from other reset sources.

It does make sense that the hypervisor resets the hardware without waiting for 
the clients for too long, but if we don't follow these general steps we will 
always have a race between the different components.
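
A minimal sketch of such a single-workqueue flow, with made-up helper names
(this is not the actual amdgpu code, just an illustration of the steps above):

    /* Sketch only -- every name below is hypothetical, not amdgpu's. */
    static struct workqueue_struct *reset_wq;  /* alloc_ordered_workqueue("reset", 0) */
    static struct work_struct timeout_work;    /* queued on job timeout (steps 1-2)   */
    static struct work_struct flr_work;        /* queued on hypervisor FLR signal     */

    static void handle_reset(struct work_struct *work)
    {
            prepare_reset();                    /* step 3: stop schedulers, block HW users */
            do_reset_or_notify_hypervisor();    /* step 4                                  */
            cleanup_and_resubmit_jobs();        /* step 5                                  */
            /* step 6: drop the item the other source may have queued meanwhile */
            cancel_work(work == &timeout_work ? &flr_work : &timeout_work);
    }

Because the workqueue is ordered, only one handler runs at a time, so whichever
reset source fires first owns the reset and the later items can simply be
cancelled.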


Monk, just to add to this - if indeed as you say that 'FLR from hypervisor is 
just to notify guest the hw VF FLR is about to start or was already executed, 
but host will do FLR anyway without waiting for guest too long'
and there is no strict waiting from the hypervisor for IDH_READY_TO_RESET to be 
received from the guest before starting the reset, then setting in_gpu_reset and 
locking reset_sem on the guest side is not really foolproof
protection from MMIO accesses by the guest - it only truly helps if the hypervisor 
waits for that message before initiating the HW reset.


Hi Andrey, this cannot be done. If somehow the guest kernel hangs and never has the 
chance to send the response back, then the other VFs will have to wait for its reset 
and all the VFs will hang in this case. Or sometimes the mailbox has some delay and 
the other VFs will also wait. The users of the other VFs will be affected in this case.


Yeah, agree completely with JingWen. The hypervisor is the one in charge 
here, not the guest.


What the hypervisor should do (and it already seems to be designed that 
way) is to send the guest a message that a reset is about to happen and 
give it some time to respond appropriately.


The guest on the other hand then tells the hypervisor that all 
processing has stopped and it is ready to restart. If that doesn't 
happen in time, the hypervisor should eliminate the guest and probably 
trigger even more severe consequences, e.g. restart the whole VM etc...
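
Roughly, the guest side of that handshake could look like the following sketch
(hypothetical names throughout; the real mailbox handling lives in the mxgpu_*
files):

    /* Hypothetical guest-side sketch, not the actual mxgpu_* implementation. */
    static void guest_handle_flr_notification(struct guest_gpu *gpu)
    {
            /* The hypervisor announced an imminent VF FLR: quiesce everything. */
            guest_stop_job_schedulers(gpu);
            guest_save_volatile_state(gpu);

            /*
             * Tell the host we are ready.  If this message never arrives, the
             * host resets after its own timeout and may escalate, e.g. by
             * restarting the whole VM.
             */
            guest_mailbox_send(gpu, GUEST_MSG_READY_TO_RESET);
    }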


Christian.


Andrey



Regards,
Christian.

On 04.01.22 at 11:49, Liu, Monk wrote:

[AMD Official Use Only]


See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about 
to start or was already executed, but host will do FLR anyway without waiting 
for guest too long


In other words I strongly think that the current SRIOV reset implementation is 
severely broken and what Andrey is doing is actually fixing it.

It makes the code crash ... how could it be a fix?

I'm afraid the patch is a NAK from me, but the cleanup is welcome if it does not 
ruin the logic. Andrey or Jingwen can try it if needed.

Thanks
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, January 4, 2022 6:19 PM
To: Chen, JingWen ; Christian König ; Grodzovsky, 
Andrey ; Deng, Emily ; Liu, Monk ; 
dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Chen, Horace ; Chen, JingWen 

Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
for SRIOV

Hi Jingwen,

well what I mean is that we need to adjust the implementation in amdgpu to 
actually match the requirements.

Could be that the reset sequence is questionable in general, but I doubt so at 
least for now.

See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between 

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen


On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-04 6:36 a.m., Christian König wrote:
>> Am 04.01.22 um 11:49 schrieb Liu, Monk:
>>> [AMD Official Use Only]
>>>
> See the FLR request from the hypervisor is just another source of 
> signaling the need for a reset, similar to each job timeout on each 
> queue. Otherwise you have a race condition between the hypervisor and the 
> scheduler.
>>> No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is 
>>> about to start or was already executed, but host will do FLR anyway without 
>>> waiting for guest too long
>>>
>>
>> Then we have a major design issue in the SRIOV protocol and really need to 
>> question this.
>>
>> How do you want to prevent a race between the hypervisor resetting the 
>> hardware and the client trying the same because of a timeout?
>>
>> As far as I can see the procedure should be:
>> 1. We detect that a reset is necessary, either because of a fault a timeout 
>> or signal from hypervisor.
>> 2. For each of those potential reset sources a work item is send to the 
>> single workqueue.
>> 3. One of those work items execute first and prepares the reset.
>> 4. We either do the reset our self or notify the hypervisor that we are 
>> ready for the reset.
>> 5. Cleanup after the reset, eventually resubmit jobs etc..
>> 6. Cancel work items which might have been scheduled from other reset 
>> sources.
>>
>> It does make sense that the hypervisor resets the hardware without waiting 
>> for the clients for too long, but if we don't follow this general steps we 
>> will always have a race between the different components.
>
>
> Monk, just to add to this - if indeed as you say that 'FLR from hypervisor is 
> just to notify guest the hw VF FLR is about to start or was already executed, 
> but host will do FLR anyway without waiting for guest too long'
> and there is no strict waiting from the hypervisor for IDH_READY_TO_RESET to 
> be recived from guest before starting the reset then setting in_gpu_reset and 
> locking reset_sem from guest side is not really full proof
> protection from MMIO accesses by the guest - it only truly helps if 
> hypervisor waits for that message before initiation of HW reset.
>
Hi Andrey, this cannot be done. If somehow the guest kernel hangs and never has the 
chance to send the response back, then the other VFs will have to wait for its reset 
and all the VFs will hang in this case. Or sometimes the mailbox has some delay and 
the other VFs will also wait. The users of the other VFs will be affected in this case.
> Andrey
>
>
>>
>> Regards,
>> Christian.
>>
>> Am 04.01.22 um 11:49 schrieb Liu, Monk:
>>> [AMD Official Use Only]
>>>
> See the FLR request from the hypervisor is just another source of 
> signaling the need for a reset, similar to each job timeout on each 
> queue. Otherwise you have a race condition between the hypervisor and the 
> scheduler.
>>> No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is 
>>> about to start or was already executed, but host will do FLR anyway without 
>>> waiting for guest too long
>>>
> In other words I strongly think that the current SRIOV reset 
> implementation is severely broken and what Andrey is doing is actually 
> fixing it.
>>> It makes the code to crash ... how could it be a fix ?
>>>
>>> I'm afraid the patch is NAK from me,  but it is welcome if the cleanup do 
>>> not ruin the logic, Andry or jingwen can try it if needed.
>>>
>>> Thanks
>>> ---
>>> Monk Liu | Cloud GPU & Virtualization Solution | AMD
>>> ---
>>> we are hiring software manager for CVS core team
>>> ---
>>>
>>> -Original Message-
>>> From: Koenig, Christian 
>>> Sent: Tuesday, January 4, 2022 6:19 PM
>>> To: Chen, JingWen ; Christian König 
>>> ; Grodzovsky, Andrey 
>>> ; Deng, Emily ; Liu, Monk 
>>> ; dri-de...@lists.freedesktop.org; 
>>> amd-gfx@lists.freedesktop.org; Chen, Horace ; Chen, 
>>> JingWen 
>>> Cc: dan...@ffwll.ch
>>> Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
>>> protection for SRIOV
>>>
>>> Hi Jingwen,
>>>
>>> well what I mean is that we need to adjust the implementation in amdgpu to 
>>> actually match the requirements.
>>>
>>> Could be that the reset sequence is questionable in general, but I doubt so 
>>> at least for now.
>>>
>>> See the FLR request from the hypervisor is just another source of signaling 
>>> the need for a reset, similar to each job timeout on each queue. Otherwise 
>>> you have a race condition between the hypervisor and the scheduler.
>>>
>>> Properly setting in_gpu_reset is indeed mandatory, but should happen at a 
>>> central place and not in the SRIOV specific code.
>>>
>>> In other words I strongly think that the current SRIOV reset implementation 
>>> is severely 

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen


On 2022/1/4 7:36 PM, Christian König wrote:
> Am 04.01.22 um 11:49 schrieb Liu, Monk:
>> [AMD Official Use Only]
>>
 See the FLR request from the hypervisor is just another source of 
 signaling the need for a reset, similar to each job timeout on each queue. 
 Otherwise you have a race condition between the hypervisor and the 
 scheduler.
>> No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is 
>> about to start or was already executed, but host will do FLR anyway without 
>> waiting for guest too long
>>
>
> Then we have a major design issue in the SRIOV protocol and really need to 
> question this.
>
> How do you want to prevent a race between the hypervisor resetting the 
> hardware and the client trying the same because of a timeout?
>
> As far as I can see the procedure should be:
> 1. We detect that a reset is necessary, either because of a fault a timeout 
> or signal from hypervisor.
> 2. For each of those potential reset sources a work item is send to the 
> single workqueue.

I think Andrey has already used the same ordered work queue to handle the reset 
from both ring timeout and hypervisor. (Patch 5)

So there should be no race between different reset sources. As the ring timeout is 
much longer than the world-switch time slice (6 ms), we should see the reset from 
the hypervisor queued into the reset domain wq first and, after the FLR work is 
done, the ring timeout reset queued into the reset domain.

> 3. One of those work items execute first and prepares the reset.
> 4. We either do the reset our self or notify the hypervisor that we are ready 
> for the reset.
> 5. Cleanup after the reset, eventually resubmit jobs etc..
> 6. Cancel work items which might have been scheduled from other reset sources.
>
> It does make sense that the hypervisor resets the hardware without waiting 
> for the clients for too long, but if we don't follow this general steps we 
> will always have a race between the different components.

So the reset_sem and in_gpu_reset are there to prevent races between the 
reset_domain (mostly the hypervisor source) and other user-space paths (e.g. KFD).
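
A simplified sketch of that guard pattern (not the exact amdgpu code; it assumes
reset_sem is an rwsem and in_gpu_reset an atomic flag, with error handling and
KFD specifics omitted):

    /* Reset side, e.g. the reset domain / flr_work. */
    static void reset_side(struct amdgpu_device *adev)
    {
            atomic_set(&adev->in_gpu_reset, 1);
            down_write(&adev->reset_sem);       /* shut out user-space entry points */
            /* ... perform the hardware reset ... */
            up_write(&adev->reset_sem);
            atomic_set(&adev->in_gpu_reset, 0);
    }

    /* User-space-facing side, e.g. an ioctl or KFD path touching MMIO. */
    static int user_facing_side(struct amdgpu_device *adev)
    {
            if (!down_read_trylock(&adev->reset_sem))
                    return -EAGAIN;             /* a reset is in flight, don't touch HW */
            /* ... access registers / submit work ... */
            up_read(&adev->reset_sem);
            return 0;
    }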

>
> Regards,
> Christian.
>
> Am 04.01.22 um 11:49 schrieb Liu, Monk:
>> [AMD Official Use Only]
>>
 See the FLR request from the hypervisor is just another source of 
 signaling the need for a reset, similar to each job timeout on each queue. 
 Otherwise you have a race condition between the hypervisor and the 
 scheduler.
>> No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is 
>> about to start or was already executed, but host will do FLR anyway without 
>> waiting for guest too long
>>
 In other words I strongly think that the current SRIOV reset 
 implementation is severely broken and what Andrey is doing is actually 
 fixing it.
>> It makes the code to crash ... how could it be a fix ?
>>
>> I'm afraid the patch is NAK from me,  but it is welcome if the cleanup do 
>> not ruin the logic, Andry or jingwen can try it if needed.
>>
>> Thanks
>> ---
>> Monk Liu | Cloud GPU & Virtualization Solution | AMD
>> ---
>> we are hiring software manager for CVS core team
>> ---
>>
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Tuesday, January 4, 2022 6:19 PM
>> To: Chen, JingWen ; Christian König 
>> ; Grodzovsky, Andrey 
>> ; Deng, Emily ; Liu, Monk 
>> ; dri-de...@lists.freedesktop.org; 
>> amd-gfx@lists.freedesktop.org; Chen, Horace ; Chen, 
>> JingWen 
>> Cc: dan...@ffwll.ch
>> Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
>> for SRIOV
>>
>> Hi Jingwen,
>>
>> well what I mean is that we need to adjust the implementation in amdgpu to 
>> actually match the requirements.
>>
>> Could be that the reset sequence is questionable in general, but I doubt so 
>> at least for now.
>>
>> See the FLR request from the hypervisor is just another source of signaling 
>> the need for a reset, similar to each job timeout on each queue. Otherwise 
>> you have a race condition between the hypervisor and the scheduler.
>>
>> Properly setting in_gpu_reset is indeed mandatory, but should happen at a 
>> central place and not in the SRIOV specific code.
>>
>> In other words I strongly think that the current SRIOV reset implementation 
>> is severely broken and what Andrey is doing is actually fixing it.
>>
>> Regards,
>> Christian.
>>
>> Am 04.01.22 um 10:07 schrieb JingWen Chen:
>>> Hi Christian,
>>> I'm not sure what do you mean by "we need to change SRIOV not the driver".
>>>
>>> Do you mean we should change the reset sequence in SRIOV? This will be a 
>>> huge change for our SRIOV solution.
>>>
>>>   From my point of view, we can directly use amdgpu_device_lock_adev
>>> and amdgpu_device_unlock_adev in flr_work instead of try_lock since no one 
>>> will 

[PATCH v2] drm/amdgpu: Unmap MMIO mappings when device is not unplugged

2022-01-04 Thread Leslie Shi
Patch 3efb17ae7e92 ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device
is unplugged to prevent crash in GPU initialization failure") made the call to
amdgpu_device_unmap_mmio() conditional on the device being unplugged. This patch
unmaps the MMIO mappings even when the device is not unplugged.

Signed-off-by: Leslie Shi 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 12 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 +++
 3 files changed, 34 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 412f377f80b1..16dc16c860cc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3832,6 +3832,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 
 static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)
 {
+
/* Clear all CPU mappings pointing to this device */
unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1);
 
@@ -3912,6 +3913,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
 {
+   int idx;
+
amdgpu_fence_driver_sw_fini(adev);
amdgpu_device_ip_fini(adev);
release_firmware(adev->firmware.gpu_info_fw);
@@ -3936,6 +3939,14 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
if ((adev->pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA)
vga_client_register(adev->pdev, NULL, NULL, NULL);
 
+   if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+
+   iounmap(adev->rmmio);
+   adev->rmmio = NULL;
+   amdgpu_device_doorbell_fini(adev);
+   drm_dev_exit(idx);
+   }
+
if (IS_ENABLED(CONFIG_PERF_EVENTS))
amdgpu_pmu_fini(adev);
if (adev->mman.discovery_bin)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 156002db24e1..ff9dc377a3a0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include "amdgpu.h"
@@ -1061,7 +1062,18 @@ int amdgpu_bo_init(struct amdgpu_device *adev)
  */
 void amdgpu_bo_fini(struct amdgpu_device *adev)
 {
+   int idx;
+
amdgpu_ttm_fini(adev);
+
+   if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+
+   if (!adev->gmc.xgmi.connected_to_cpu) {
+   arch_phys_wc_del(adev->gmc.vram_mtrr);
+   arch_io_free_memtype_wc(adev->gmc.aper_base, 
adev->gmc.aper_size);
+   }
+   drm_dev_exit(idx);
+   }
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 367abed1d6e6..ea897feeddd2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -1801,6 +1802,7 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
  */
 void amdgpu_ttm_fini(struct amdgpu_device *adev)
 {
+   int idx;
if (!adev->mman.initialized)
return;
 
@@ -1815,6 +1817,15 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
  NULL, NULL);
amdgpu_ttm_fw_reserve_vram_fini(adev);
 
+   if (drm_dev_enter(adev_to_drm(adev), &idx)) {
+
+   if (adev->mman.aper_base_kaddr)
+   iounmap(adev->mman.aper_base_kaddr);
+   adev->mman.aper_base_kaddr = NULL;
+
+   drm_dev_exit(idx);
+   }
+
amdgpu_vram_mgr_fini(adev);
amdgpu_gtt_mgr_fini(adev);
amdgpu_preempt_mgr_fini(adev);
-- 
2.25.1



RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Liu, Shaoyun
[AMD Official Use Only]

I see, I didn't notice you already have this implemented. So the flr_work 
routine itself is synced now; in this case, I agree it should be safe to 
remove the in_gpu_reset and reset_sem in the flr_work.

Regards
Shaoyun.liu

-Original Message-
From: Grodzovsky, Andrey  
Sent: Tuesday, January 4, 2022 3:55 PM
To: Liu, Shaoyun ; Koenig, Christian 
; Liu, Monk ; Chen, JingWen 
; Christian König ; 
Deng, Emily ; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Chen, Horace 
Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
for SRIOV

On 2022-01-04 12:13 p.m., Liu, Shaoyun wrote:

> [AMD Official Use Only]
>
> I mostly agree with the sequences Christian  described .  Just one  thing 
> might need to  discuss here.  For FLR notified from host,  in new sequenceas 
> described  , driver need to reply the  READY_TO_RESET in the  workitem  from 
> a reset  work queue which means inside flr_work, driver can not directly 
> reply to host but need to queue another workqueue .


Can you clarify why 'driver can not directly reply to host but need to queue 
another workqueue'? To my understanding, all steps 3-6 in Christian's 
description happen from the same single wq thread serially.


>   For current  code ,  the flr_work for sriov itself is a work queue queued 
> from ISR .  I think we should try to response to the host driver as soon as 
> possible.  Queue another workqueue  inside  the workqueue  doesn't sounds 
> efficient to me.


Check patch 5 please [1] - I just replaced schedule_work(&adev->virt.flr_work) with
queue_work(adev->reset_domain.wq, &adev->virt.flr_work), so there is no extra requeue
here; instead of being sent to the system_wq it's sent to the dedicated reset wq.

[1] -
https://lore.kernel.org/all/2021121400.790842-1-andrey.grodzov...@amd.com/

Andrey


> Anyway, what we need is a working  solution for our project.  So if we need 
> to change the sequence, we  need to make sure it's been tested first and 
> won't break the functionality before the code is landed in the branch .
>
> Regards
> Shaoyun.liu
>
>
> -Original Message-
> From: amd-gfx  On Behalf Of 
> Christian König
> Sent: Tuesday, January 4, 2022 6:36 AM
> To: Liu, Monk ; Chen, JingWen 
> ; Christian König 
> ; Grodzovsky, Andrey 
> ; Deng, Emily ; 
> dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Chen, 
> Horace 
> Cc: dan...@ffwll.ch
> Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
> protection for SRIOV
>
> Am 04.01.22 um 11:49 schrieb Liu, Monk:
>> [AMD Official Use Only]
>>
 See the FLR request from the hypervisor is just another source of 
 signaling the need for a reset, similar to each job timeout on each queue. 
 Otherwise you have a race condition between the hypervisor and the 
 scheduler.
>> No it's not, FLR from hypervisor is just to notify guest the hw VF 
>> FLR is about to start or was already executed, but host will do FLR 
>> anyway without waiting for guest too long
>>
> Then we have a major design issue in the SRIOV protocol and really need to 
> question this.
>
> How do you want to prevent a race between the hypervisor resetting the 
> hardware and the client trying the same because of a timeout?
>
> As far as I can see the procedure should be:
> 1. We detect that a reset is necessary, either because of a fault a timeout 
> or signal from hypervisor.
> 2. For each of those potential reset sources a work item is send to the 
> single workqueue.
> 3. One of those work items execute first and prepares the reset.
> 4. We either do the reset our self or notify the hypervisor that we are ready 
> for the reset.
> 5. Cleanup after the reset, eventually resubmit jobs etc..
> 6. Cancel work items which might have been scheduled from other reset sources.
>
> It does make sense that the hypervisor resets the hardware without waiting 
> for the clients for too long, but if we don't follow this general steps we 
> will always have a race between the different components.
>
> Regards,
> Christian.
>
> Am 04.01.22 um 11:49 schrieb Liu, Monk:
>> [AMD Official Use Only]
>>
 See the FLR request from the hypervisor is just another source of 
 signaling the need for a reset, similar to each job timeout on each queue. 
 Otherwise you have a race condition between the hypervisor and the 
 scheduler.
>> No it's not, FLR from hypervisor is just to notify guest the hw VF 
>> FLR is about to start or was already executed, but host will do FLR 
>> anyway without waiting for guest too long
>>
 In other words I strongly think that the current SRIOV reset 
 implementation is severely broken and what Andrey is doing is actually 
 fixing it.
>> It makes the code to crash ... how could it be a fix ?
>>
>> I'm afraid the patch is NAK from me,  but it is welcome if the cleanup do 
>> not ruin the logic, Andry or jingwen can try it if needed.
>>
>> Thanks
>> 

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Andrey Grodzovsky

On 2022-01-04 12:13 p.m., Liu, Shaoyun wrote:


[AMD Official Use Only]

I mostly agree with the sequences Christian  described .  Just one  thing might 
need to  discuss here.  For FLR notified from host,  in new sequenceas 
described  , driver need to reply the  READY_TO_RESET in the  workitem  from a 
reset  work queue which means inside flr_work, driver can not directly reply to 
host but need to queue another workqueue .



Can you clarify why 'driver can not directly reply to host but need to 
queue another workqueue'? To my understanding, all steps 3-6 in 
Christian's description happen from the same single wq thread serially.




  For current  code ,  the flr_work for sriov itself is a work queue queued 
from ISR .  I think we should try to response to the host driver as soon as 
possible.  Queue another workqueue  inside  the workqueue  doesn't sounds 
efficient to me.



Check patch 5 please [1] - I just replaced schedule_work(&adev->virt.flr_work) with
queue_work(adev->reset_domain.wq, &adev->virt.flr_work), so there is no extra
requeue here; instead of being sent to the system_wq it's sent to the dedicated
reset wq.

[1] - 
https://lore.kernel.org/all/2021121400.790842-1-andrey.grodzov...@amd.com/
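
For reference, that substitution boils down to something like the following
rough sketch (function names and the workqueue name string are approximations;
see patch 5 in the series for the real change):

    static int reset_domain_init(struct amdgpu_device *adev)
    {
            /* one single-threaded, ordered queue serializes all reset sources */
            adev->reset_domain.wq = alloc_ordered_workqueue("amdgpu-reset-domain", 0);
            return adev->reset_domain.wq ? 0 : -ENOMEM;
    }

    static void on_flr_notification(struct amdgpu_device *adev)
    {
            /* before patch 5: schedule_work(&adev->virt.flr_work);  (system_wq) */
            queue_work(adev->reset_domain.wq, &adev->virt.flr_work);
    }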


Andrey



Anyway, what we need is a working  solution for our project.  So if we need to 
change the sequence, we  need to make sure it's been tested first and won't 
break the functionality before the code is landed in the branch .

Regards
Shaoyun.liu


-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Tuesday, January 4, 2022 6:36 AM
To: Liu, Monk ; Chen, JingWen ; Christian König 
; Grodzovsky, Andrey ; Deng, Emily 
; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Chen, Horace 

Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
for SRIOV

On 04.01.22 at 11:49, Liu, Monk wrote:

[AMD Official Use Only]


See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

No it's not, FLR from hypervisor is just to notify guest the hw VF FLR
is about to start or was already executed, but host will do FLR anyway
without waiting for guest too long


Then we have a major design issue in the SRIOV protocol and really need to 
question this.

How do you want to prevent a race between the hypervisor resetting the hardware 
and the client trying the same because of a timeout?

As far as I can see the procedure should be:
1. We detect that a reset is necessary, either because of a fault a timeout or 
signal from hypervisor.
2. For each of those potential reset sources a work item is send to the single 
workqueue.
3. One of those work items execute first and prepares the reset.
4. We either do the reset our self or notify the hypervisor that we are ready 
for the reset.
5. Cleanup after the reset, eventually resubmit jobs etc..
6. Cancel work items which might have been scheduled from other reset sources.

It does make sense that the hypervisor resets the hardware without waiting for 
the clients for too long, but if we don't follow this general steps we will 
always have a race between the different components.

Regards,
Christian.

On 04.01.22 at 11:49, Liu, Monk wrote:

[AMD Official Use Only]


See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

No it's not, FLR from hypervisor is just to notify guest the hw VF FLR
is about to start or was already executed, but host will do FLR anyway
without waiting for guest too long


In other words I strongly think that the current SRIOV reset implementation is 
severely broken and what Andrey is doing is actually fixing it.

It makes the code to crash ... how could it be a fix ?

I'm afraid the patch is NAK from me,  but it is welcome if the cleanup do not 
ruin the logic, Andry or jingwen can try it if needed.

Thanks
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, January 4, 2022 6:19 PM
To: Chen, JingWen ; Christian König
; Grodzovsky, Andrey
; Deng, Emily ; Liu,
Monk ; dri-de...@lists.freedesktop.org;
amd-gfx@lists.freedesktop.org; Chen, Horace ;
Chen, JingWen 
Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset
protection for SRIOV

Hi Jingwen,

well what I mean is that we need to adjust the implementation in amdgpu to 
actually match the requirements.

Could be that the reset sequence is questionable in general, 

Re: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

2022-01-04 Thread Alex Deucher
On Tue, Jan 4, 2022 at 12:26 PM Limonciello, Mario
 wrote:
>
> [AMD Official Use Only]
>
>
>
> Maybe it was used more widely previously?
>
> The only place that I found it in use was amdgpu_device_evict_resources.
>
>
>
> From: Deucher, Alexander 
> Sent: Tuesday, January 4, 2022 11:24
> To: Limonciello, Mario ; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time
>
>
>
> [AMD Official Use Only]
>
>
>
> I don't think this will work properly.  The in_s3 flag was mainly for runtime 
> pm vs system suspend.  I'm not sure if in_s0ix is properly handled everywhere 
> we check in_s3.
>

In that case, Acked-by: Alex Deucher 

>
>
> Alex
>
>
>
> 
>
> From: amd-gfx  on behalf of Mario 
> Limonciello 
> Sent: Monday, January 3, 2022 10:23 AM
> To: amd-gfx@lists.freedesktop.org 
> Cc: Limonciello, Mario 
> Subject: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time
>
>
>
> This makes it clearer which codepaths are in use specifically in
> one state or the other.
>
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index db2a9dfd5918..413fecc89e6c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2165,9 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev)
>
>  if (amdgpu_acpi_is_s0ix_active(adev))
>  adev->in_s0ix = true;
> -   adev->in_s3 = true;
> +   else
> +   adev->in_s3 = true;
>  r = amdgpu_device_suspend(drm_dev, true);
> -   adev->in_s3 = false;
>  if (r)
>  return r;
>  if (!adev->in_s0ix)
> @@ -2188,6 +2188,8 @@ static int amdgpu_pmops_resume(struct device *dev)
>  r = amdgpu_device_resume(drm_dev, true);
>  if (amdgpu_acpi_is_s0ix_active(adev))
>  adev->in_s0ix = false;
> +   else
> +   adev->in_s3 = false;
>  return r;
>  }
>
> --
> 2.25.1


Re: [PATCH] drm/ttm: Don't inherit GEM object VMAs in child process

2022-01-04 Thread Felix Kuehling
[+Adrian]

On 2021-12-23 at 2:05 a.m., Christian König wrote:

> Am 22.12.21 um 21:53 schrieb Daniel Vetter:
>> On Mon, Dec 20, 2021 at 01:12:51PM -0500, Bhardwaj, Rajneesh wrote:
>>
>> [SNIP]
>> Still sounds funky. I think minimally we should have an ack from CRIU
>> developers that this is officially the right way to solve this
>> problem. I
>> really don't want to have random one-off hacks that don't work across
>> the
>> board, for a problem where we (drm subsystem) really shouldn't be the
>> only
>> one with this problem. Where "this problem" means that the mmap space is
>> per file description, and not per underlying inode or real device or
>> whatever. That part sounds like a CRIU problem, and I expect CRIU folks
>> want a consistent solution across the board for this. Hence please
>> grab an
>> ack from them.
>
> Unfortunately it's a KFD design problem. AMD used a single device
> node, then mmaped different objects from the same offset to different
> processes and expected it to work the rest of the fs subsystem without
> churn.

This may be true for mmaps in the KFD device, but not for mmaps in the
DRM render nodes.


>
> So yes, this is indeed because the mmap space is per file descriptor
> for the use case here.

No. This is a different problem.

The problem has to do with the way that DRM manages mmap permissions. In
order to be able to mmap an offset in the render node, there needs to be
a BO that was created in the same render node. If you fork a process, it
inherits the VMA. But KFD doesn't know anything about the inherited BOs
from the parent process. Therefore those BOs don't get checkpointed and
restored in the child process. When the CRIU checkpoint is restored, our
CRIU plugin never creates a BO corresponding to the VMA in the child
process' render node FD. We've also lost the relationship between the
parent and child-process' render node FDs. After "fork" the render node
FD points to the same struct file in parent and child. After restoring
the CRIU checkpoint, they are separate struct files, created by separate
"open" system calls. Therefore the mmap call that restores the VMA fails
in the child process.

At least for KFD, there is no point inheriting BOs from a child process,
because the GPU has no way of accessing the BOs in the child process.
The child process has no GPU address space, no user mode queues, no way
to do anything with the GPU before it completely reinitializes its KFD
context.

We can workaround this issue in user mode with madvise(...,
MADV_DONTFORK). In fact we've already done this for some BOs to avoid a
memory leak in the parent process while a child process exists. But it's
slightly racy because there is a short time window where VMA exists
without the VM_DONTCOPY flag. A fork during that time window could still
create a child process with an inherited VMA.

Therefore a safer solution is to set the vm_flags in the VMA in the
driver when the VMA is first created.
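
As a rough sketch of that safer option (illustrative only, with a hypothetical
wrapper name, not the actual KFD/amdgpu change):

    static int example_gem_mmap(struct file *filp, struct vm_area_struct *vma)
    {
            int ret;

            ret = drm_gem_mmap(filp, vma);  /* usual offset lookup + mapping setup */
            if (ret)
                    return ret;

            /*
             * Mark the mapping as not inheritable across fork(), closing the
             * window that the user-space madvise(..., MADV_DONTFORK) call
             * leaves open.
             */
            vma->vm_flags |= VM_DONTCOPY;
            return 0;
    }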

Regards,
  Felix


>
> And thanks for pointing this out, this indeed makes the whole change
> extremely questionable.
>
> Regards,
> Christian.
>
>>
>> Cheers, Daniel
>>
>


RE: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

2022-01-04 Thread Limonciello, Mario
[AMD Official Use Only]

Maybe it was used more widely previously?
The only place that I found it in use was amdgpu_device_evict_resources.

From: Deucher, Alexander 
Sent: Tuesday, January 4, 2022 11:24
To: Limonciello, Mario ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time


[AMD Official Use Only]

I don't think this will work properly.  The in_s3 flag was mainly for runtime 
pm vs system suspend.  I'm not sure if in_s0ix is properly handled everywhere 
we check in_s3.

Alex


From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> on behalf of Mario 
Limonciello <mario.limoncie...@amd.com>
Sent: Monday, January 3, 2022 10:23 AM
To: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
Cc: Limonciello, Mario <mario.limoncie...@amd.com>
Subject: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

This makes it clearer which codepaths are in use specifically in
one state or the other.

Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index db2a9dfd5918..413fecc89e6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2165,9 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev)

 if (amdgpu_acpi_is_s0ix_active(adev))
 adev->in_s0ix = true;
-   adev->in_s3 = true;
+   else
+   adev->in_s3 = true;
 r = amdgpu_device_suspend(drm_dev, true);
-   adev->in_s3 = false;
 if (r)
 return r;
 if (!adev->in_s0ix)
@@ -2188,6 +2188,8 @@ static int amdgpu_pmops_resume(struct device *dev)
 r = amdgpu_device_resume(drm_dev, true);
 if (amdgpu_acpi_is_s0ix_active(adev))
 adev->in_s0ix = false;
+   else
+   adev->in_s3 = false;
 return r;
 }

--
2.25.1


Re: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

2022-01-04 Thread Deucher, Alexander
[AMD Official Use Only]

I don't think this will work properly.  The in_s3 flag was mainly for runtime 
pm vs system suspend.  I'm not sure if in_s0ix is properly handled everywhere 
we check in_s3.

Alex


From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Monday, January 3, 2022 10:23 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH 2/2] drm/amdgpu: don't set s3 and s0ix at the same time

This makes it clearer which codepaths are in use specifically in
one state or the other.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index db2a9dfd5918..413fecc89e6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2165,9 +2165,9 @@ static int amdgpu_pmops_suspend(struct device *dev)

 if (amdgpu_acpi_is_s0ix_active(adev))
 adev->in_s0ix = true;
-   adev->in_s3 = true;
+   else
+   adev->in_s3 = true;
 r = amdgpu_device_suspend(drm_dev, true);
-   adev->in_s3 = false;
 if (r)
 return r;
 if (!adev->in_s0ix)
@@ -2188,6 +2188,8 @@ static int amdgpu_pmops_resume(struct device *dev)
 r = amdgpu_device_resume(drm_dev, true);
 if (amdgpu_acpi_is_s0ix_active(adev))
 adev->in_s0ix = false;
+   else
+   adev->in_s3 = false;
 return r;
 }

--
2.25.1



Re: [PATCH 1/2] drm/amdgpu: explicitly check for s0ix when evicting resources

2022-01-04 Thread Deucher, Alexander
[Public]

Reviewed-by: Alex Deucher 

From: amd-gfx  on behalf of Mario 
Limonciello 
Sent: Monday, January 3, 2022 10:23 AM
To: amd-gfx@lists.freedesktop.org 
Cc: Limonciello, Mario 
Subject: [PATCH 1/2] drm/amdgpu: explicitly check for s0ix when evicting 
resources

This codepath should be running in both s0ix and s3, but only does
currently because s3 and s0ix are both set in the s0ix case.

Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ce93a304292c..412f377f80b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3956,8 +3956,8 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
  */
 static void amdgpu_device_evict_resources(struct amdgpu_device *adev)
 {
-   /* No need to evict vram on APUs for suspend to ram */
-   if (adev->in_s3 && (adev->flags & AMD_IS_APU))
+   /* No need to evict vram on APUs for suspend to ram or s2idle */
+   if ((adev->in_s3 || adev->in_s0ix) && (adev->flags & AMD_IS_APU))
 return;

 if (amdgpu_ttm_evict_resources(adev, TTM_PL_VRAM))
--
2.25.1



RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Liu, Shaoyun
[AMD Official Use Only]

I mostly agree with the sequence Christian described. Just one thing might need 
discussion here. For an FLR notified from the host, in the new sequence as 
described, the driver needs to reply with READY_TO_RESET in the work item from a 
reset work queue, which means that inside flr_work the driver cannot directly 
reply to the host but needs to queue another work item. For the current code, 
the flr_work for SRIOV itself is a work item queued from the ISR. I think we 
should try to respond to the host driver as soon as possible; queueing another 
work item inside the workqueue doesn't sound efficient to me.
Anyway, what we need is a working solution for our project. So if we need to 
change the sequence, we need to make sure it's been tested first and won't 
break the functionality before the code lands in the branch.

Regards
Shaoyun.liu


-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Tuesday, January 4, 2022 6:36 AM
To: Liu, Monk ; Chen, JingWen ; 
Christian König ; Grodzovsky, Andrey 
; Deng, Emily ; 
dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Chen, Horace 

Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
for SRIOV

On 04.01.22 at 11:49, Liu, Monk wrote:
> [AMD Official Use Only]
>
>>> See the FLR request from the hypervisor is just another source of signaling 
>>> the need for a reset, similar to each job timeout on each queue. Otherwise 
>>> you have a race condition between the hypervisor and the scheduler.
> No it's not, FLR from hypervisor is just to notify guest the hw VF FLR 
> is about to start or was already executed, but host will do FLR anyway 
> without waiting for guest too long
>

Then we have a major design issue in the SRIOV protocol and really need to 
question this.

How do you want to prevent a race between the hypervisor resetting the hardware 
and the client trying the same because of a timeout?

As far as I can see the procedure should be:
1. We detect that a reset is necessary, either because of a fault a timeout or 
signal from hypervisor.
2. For each of those potential reset sources a work item is send to the single 
workqueue.
3. One of those work items execute first and prepares the reset.
4. We either do the reset our self or notify the hypervisor that we are ready 
for the reset.
5. Cleanup after the reset, eventually resubmit jobs etc..
6. Cancel work items which might have been scheduled from other reset sources.

It does make sense that the hypervisor resets the hardware without waiting for 
the clients for too long, but if we don't follow this general steps we will 
always have a race between the different components.

Regards,
Christian.

On 04.01.22 at 11:49, Liu, Monk wrote:
> [AMD Official Use Only]
>
>>> See the FLR request from the hypervisor is just another source of signaling 
>>> the need for a reset, similar to each job timeout on each queue. Otherwise 
>>> you have a race condition between the hypervisor and the scheduler.
> No it's not, FLR from hypervisor is just to notify guest the hw VF FLR 
> is about to start or was already executed, but host will do FLR anyway 
> without waiting for guest too long
>
>>> In other words I strongly think that the current SRIOV reset implementation 
>>> is severely broken and what Andrey is doing is actually fixing it.
> It makes the code to crash ... how could it be a fix ?
>
> I'm afraid the patch is NAK from me,  but it is welcome if the cleanup do not 
> ruin the logic, Andry or jingwen can try it if needed.
>
> Thanks
> ---
> Monk Liu | Cloud GPU & Virtualization Solution | AMD
> ---
> we are hiring software manager for CVS core team
> ---
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Tuesday, January 4, 2022 6:19 PM
> To: Chen, JingWen ; Christian König 
> ; Grodzovsky, Andrey 
> ; Deng, Emily ; Liu, 
> Monk ; dri-de...@lists.freedesktop.org; 
> amd-gfx@lists.freedesktop.org; Chen, Horace ; 
> Chen, JingWen 
> Cc: dan...@ffwll.ch
> Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
> protection for SRIOV
>
> Hi Jingwen,
>
> well what I mean is that we need to adjust the implementation in amdgpu to 
> actually match the requirements.
>
> Could be that the reset sequence is questionable in general, but I doubt so 
> at least for now.
>
> See the FLR request from the hypervisor is just another source of signaling 
> the need for a reset, similar to each job timeout on each queue. Otherwise 
> you have a race condition between the hypervisor and the scheduler.
>
> Properly setting in_gpu_reset is indeed mandatory, but should happen at a 
> central place and not in the SRIOV specific code.
>
> In other words I strongly think that the current SRIOV reset implementation 
> is 

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Andrey Grodzovsky



On 2022-01-04 6:36 a.m., Christian König wrote:

On 04.01.22 at 11:49, Liu, Monk wrote:

[AMD Official Use Only]

See the FLR request from the hypervisor is just another source of 
signaling the need for a reset, similar to each job timeout on each 
queue. Otherwise you have a race condition between the hypervisor 
and the scheduler.
No it's not, FLR from hypervisor is just to notify guest the hw VF 
FLR is about to start or was already executed, but host will do FLR 
anyway without waiting for guest too long




Then we have a major design issue in the SRIOV protocol and really 
need to question this.


How do you want to prevent a race between the hypervisor resetting the 
hardware and the client trying the same because of a timeout?


As far as I can see the procedure should be:
1. We detect that a reset is necessary, either because of a fault a 
timeout or signal from hypervisor.
2. For each of those potential reset sources a work item is send to 
the single workqueue.

3. One of those work items execute first and prepares the reset.
4. We either do the reset our self or notify the hypervisor that we 
are ready for the reset.

5. Cleanup after the reset, eventually resubmit jobs etc..
6. Cancel work items which might have been scheduled from other reset 
sources.


It does make sense that the hypervisor resets the hardware without 
waiting for the clients for too long, but if we don't follow this 
general steps we will always have a race between the different 
components.



Monk, just to add to this - if indeed as you say that 'FLR from 
hypervisor is just to notify guest the hw VF FLR is about to start or 
was already executed, but host will do FLR anyway without waiting for 
guest too long'
and there is no strict waiting from the hypervisor for 
IDH_READY_TO_RESET to be received from the guest before starting the reset, 
then setting in_gpu_reset and locking reset_sem on the guest side is not 
really foolproof
protection from MMIO accesses by the guest - it only truly helps if the 
hypervisor waits for that message before initiating the HW reset.


Andrey




Regards,
Christian.

On 04.01.22 at 11:49, Liu, Monk wrote:

[AMD Official Use Only]

See the FLR request from the hypervisor is just another source of 
signaling the need for a reset, similar to each job timeout on each 
queue. Otherwise you have a race condition between the hypervisor 
and the scheduler.
No it's not, FLR from hypervisor is just to notify guest the hw VF 
FLR is about to start or was already executed, but host will do FLR 
anyway without waiting for guest too long


In other words I strongly think that the current SRIOV reset 
implementation is severely broken and what Andrey is doing is 
actually fixing it.

It makes the code to crash ... how could it be a fix ?

I'm afraid the patch is NAK from me,  but it is welcome if the 
cleanup do not ruin the logic, Andry or jingwen can try it if needed.


Thanks
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, January 4, 2022 6:19 PM
To: Chen, JingWen ; Christian König 
; Grodzovsky, Andrey 
; Deng, Emily ; Liu, 
Monk ; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Chen, Horace ; 
Chen, JingWen 

Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
protection for SRIOV


Hi Jingwen,

well what I mean is that we need to adjust the implementation in 
amdgpu to actually match the requirements.


Could be that the reset sequence is questionable in general, but I 
doubt so at least for now.


See the FLR request from the hypervisor is just another source of 
signaling the need for a reset, similar to each job timeout on each 
queue. Otherwise you have a race condition between the hypervisor and 
the scheduler.


Properly setting in_gpu_reset is indeed mandatory, but should happen 
at a central place and not in the SRIOV specific code.


In other words I strongly think that the current SRIOV reset 
implementation is severely broken and what Andrey is doing is 
actually fixing it.


Regards,
Christian.

On 04.01.22 at 10:07, JingWen Chen wrote:

Hi Christian,
I'm not sure what do you mean by "we need to change SRIOV not the 
driver".


Do you mean we should change the reset sequence in SRIOV? This will 
be a huge change for our SRIOV solution.


  From my point of view, we can directly use amdgpu_device_lock_adev
and amdgpu_device_unlock_adev in flr_work instead of try_lock since 
no one will conflict with this thread with reset_domain introduced.
But we do need the reset_sem and adev->in_gpu_reset to keep device 
untouched via user space.


Best Regards,
Jingwen Chen

On 2022/1/3 6:17 PM, Christian König wrote:
Please don't. 

Re: [PATCH v2 08/11] lib: add support for device coherent type in test_hmm

2022-01-04 Thread Liam Howlett
* Alex Sierra  [211206 14:00]:
> Device Coherent type uses device memory that is coherently accesible by
> the CPU. This could be shown as SP (special purpose) memory range
> at the BIOS-e820 memory enumeration. If no SP memory is supported in
> system, this could be faked by setting CONFIG_EFI_FAKE_MEMMAP.
> 
> Currently, test_hmm only supports two different SP ranges of at least
> 256MB size. This could be specified in the kernel parameter variable
> efi_fake_mem. Ex. Two SP ranges of 1GB starting at 0x1 &
> 0x14000 physical address. Ex.
> efi_fake_mem=1G@0x1:0x4,1G@0x14000:0x4
> 
> Private and coherent device mirror instances can be created in the same
> probed. This is done by passing the module parameters spm_addr_dev0 &
> spm_addr_dev1. In this case, it will create four instances of
> device_mirror. The first two correspond to private device type, the
> last two to coherent type. Then, they can be easily accessed from user
> space through /dev/hmm_mirror. Usually num_device 0 and 1
> are for private, and 2 and 3 for coherent types. If no module
> parameters are passed, two instances of private type device_mirror will
> be created only.
> 
> Signed-off-by: Alex Sierra 
> ---
>  lib/test_hmm.c  | 252 +---
>  lib/test_hmm_uapi.h |  15 ++-
>  2 files changed, 198 insertions(+), 69 deletions(-)
> 
> diff --git a/lib/test_hmm.c b/lib/test_hmm.c
> index 9edeff52302e..a1985226d788 100644
> --- a/lib/test_hmm.c
> +++ b/lib/test_hmm.c
> @@ -29,11 +29,22 @@
>  
>  #include "test_hmm_uapi.h"
>  
> -#define DMIRROR_NDEVICES 2
> +#define DMIRROR_NDEVICES 4
>  #define DMIRROR_RANGE_FAULT_TIMEOUT  1000
>  #define DEVMEM_CHUNK_SIZE(256 * 1024 * 1024U)
>  #define DEVMEM_CHUNKS_RESERVE16
>  
> +/*
> + * For device_private pages, dpage is just a dummy struct page
> + * representing a piece of device memory. dmirror_devmem_alloc_page
> + * allocates a real system memory page as backing storage to fake a
> + * real device. zone_device_data points to that backing page. But
> + * for device_coherent memory, the struct page represents real
> + * physical CPU-accessible memory that we can use directly.
> + */
> +#define BACKING_PAGE(page) (is_device_private_page((page)) ? \
> +(page)->zone_device_data : (page))
> +
>  static unsigned long spm_addr_dev0;
>  module_param(spm_addr_dev0, long, 0644);
>  MODULE_PARM_DESC(spm_addr_dev0,
> @@ -122,6 +133,21 @@ static int dmirror_bounce_init(struct dmirror_bounce 
> *bounce,
>   return 0;
>  }
>  
> +static bool dmirror_is_private_zone(struct dmirror_device *mdevice)
> +{
> + return (mdevice->zone_device_type ==
> + HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ? true : false;
> +}
> +
> +static enum migrate_vma_direction
> + dmirror_select_device(struct dmirror *dmirror)
> +{
> + return (dmirror->mdevice->zone_device_type ==
> + HMM_DMIRROR_MEMORY_DEVICE_PRIVATE) ?
> + MIGRATE_VMA_SELECT_DEVICE_PRIVATE :
> + MIGRATE_VMA_SELECT_DEVICE_COHERENT;
> +}
> +
>  static void dmirror_bounce_fini(struct dmirror_bounce *bounce)
>  {
>   vfree(bounce->ptr);
> @@ -572,16 +598,19 @@ static int dmirror_allocate_chunk(struct dmirror_device 
> *mdevice,
>  static struct page *dmirror_devmem_alloc_page(struct dmirror_device *mdevice)
>  {
>   struct page *dpage = NULL;
> - struct page *rpage;
> + struct page *rpage = NULL;
>  
>   /*
> -  * This is a fake device so we alloc real system memory to store
> -  * our device memory.
> +  * For ZONE_DEVICE private type, this is a fake device so we alloc real
> +  * system memory to store our device memory.
> +  * For ZONE_DEVICE coherent type we use the actual dpage to store the 
> data
> +  * and ignore rpage.
>*/
> - rpage = alloc_page(GFP_HIGHUSER);
> - if (!rpage)
> - return NULL;
> -
> + if (dmirror_is_private_zone(mdevice)) {
> + rpage = alloc_page(GFP_HIGHUSER);
> + if (!rpage)
> + return NULL;
> + }
>   spin_lock(&mdevice->lock);
>  
>   if (mdevice->free_pages) {
> @@ -601,7 +630,8 @@ static struct page *dmirror_devmem_alloc_page(struct 
> dmirror_device *mdevice)
>   return dpage;
>  
>  error:
> - __free_page(rpage);
> + if (rpage)
> + __free_page(rpage);
>   return NULL;
>  }
>  
> @@ -627,12 +657,15 @@ static void dmirror_migrate_alloc_and_copy(struct 
> migrate_vma *args,
>* unallocated pte_none() or read-only zero page.
>*/
>   spage = migrate_pfn_to_page(*src);
> + WARN(spage && is_zone_device_page(spage),
> +  "page already in device spage pfn: 0x%lx\n",
> +  page_to_pfn(spage));
>  
>   dpage = dmirror_devmem_alloc_page(mdevice);
>   if (!dpage)
>   continue;

Re: [PATCH] drm/amd/display: explicitly update clocks when DC is set to D3

2022-01-04 Thread Harry Wentland



On 2022-01-04 10:33, Mario Limonciello wrote:
> The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for
> hard hang on HPD") causes a regression in s0ix where the system will
> fail to resume properly.  This may be because an HPD was active the last
> time clocks were updated but clocks didn't get updated again during s0ix.
> 
> So add an extra call to update clocks as part of the suspend routine:
> dm_suspend->dc_set_power_state->clk_mgr_optimize_pwr_state
> 
> Cc: Qingqing Zhuo 
> BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215436
> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1821
> Fixes: 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD")
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index 91c4874473d6..6968ad6c5a64 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -3299,6 +3299,8 @@ void dc_set_power_state(
>   }
>  
>   break;
> + case DC_ACPI_CM_POWER_STATE_D3:
> + clk_mgr_optimize_pwr_state(dc, dc->clk_mgr);

We probably want a "fallthrough;" statement here.

Harry

>   default:
>   ASSERT(dc->current_state->stream_count == 0);
>   /* Zero out the current context so that on resume we start with
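
In other words, the new case would then read roughly as follows (illustrative
sketch of the suggestion only, not the applied patch):

    case DC_ACPI_CM_POWER_STATE_D3:
            clk_mgr_optimize_pwr_state(dc, dc->clk_mgr);
            fallthrough;
    default:
            ASSERT(dc->current_state->stream_count == 0);

so the D3 path still executes the existing default-case cleanup and the
intentional fall-through is annotated explicitly.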


Re: [PATCH] drm/amdgpu: Delay unmapping MMIO VRAM to amdgpu_ttm_fini() in GPU initialization failure

2022-01-04 Thread Andrey Grodzovsky

On 2022-01-03 9:30 p.m., Leslie Shi wrote:

If the driver load failed during hw_init(), delay unmapping the MMIO VRAM until 
amdgpu_ttm_fini().
This prevents accessing an invalid memory address in vcn_v3_0_sw_fini().

Signed-off-by: Leslie Shi 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  4 
  2 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index ce93a304292c..d6006de57af5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3830,7 +3830,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
return r;
  }
  
-static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev)

+static void amdgpu_device_unmap_mmio(struct amdgpu_device *adev, bool 
unmap_mmio_vram)
  {
/* Clear all CPU mappings pointing to this device */
unmap_mapping_range(adev->ddev.anon_inode->i_mapping, 0, 0, 1);
@@ -3840,9 +3840,12 @@ static void amdgpu_device_unmap_mmio(struct 
amdgpu_device *adev)
  
  	iounmap(adev->rmmio);

adev->rmmio = NULL;
-   if (adev->mman.aper_base_kaddr)
-   iounmap(adev->mman.aper_base_kaddr);
-   adev->mman.aper_base_kaddr = NULL;



Why only VRAM? Why not the register BAR above? In general I don't see why 
not just follow
what I suggested here: https://www.spinics.net/lists/amd-gfx/msg72217.html

Andrey



+
+   if (unmap_mmio_vram) {
+   if (adev->mman.aper_base_kaddr)
+   iounmap(adev->mman.aper_base_kaddr);
+   adev->mman.aper_base_kaddr = NULL;
+   }
  
  	/* Memory manager related */

if (!adev->gmc.xgmi.connected_to_cpu) {
@@ -3905,8 +3908,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
  
  	amdgpu_gart_dummy_page_fini(adev);
  
-	if (drm_dev_is_unplugged(adev_to_drm(adev)))

-   amdgpu_device_unmap_mmio(adev);
+   amdgpu_device_unmap_mmio(adev, drm_dev_is_unplugged(adev_to_drm(adev)));
  
  }
  
@@ -5727,7 +5729,7 @@ void amdgpu_device_halt(struct amdgpu_device *adev)
  
  	adev->no_hw_access = true;
  
-	amdgpu_device_unmap_mmio(adev);

+   amdgpu_device_unmap_mmio(adev, true);
  
  	pci_disable_device(pdev);

pci_wait_for_pending_transaction(pdev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 367abed1d6e6..67cd12caf019 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1815,6 +1815,10 @@ void amdgpu_ttm_fini(struct amdgpu_device *adev)
  NULL, NULL);
amdgpu_ttm_fw_reserve_vram_fini(adev);
  
+	if (adev->mman.aper_base_kaddr)
+		iounmap(adev->mman.aper_base_kaddr);
+	adev->mman.aper_base_kaddr = NULL;
+
amdgpu_vram_mgr_fini(adev);
amdgpu_gtt_mgr_fini(adev);
amdgpu_preempt_mgr_fini(adev);


[PATCH] drm/amd/display: explicitly update clocks when DC is set to D3

2022-01-04 Thread Mario Limonciello
The WA from commit 5965280abd30 ("drm/amd/display: Apply w/a for
hard hang on HPD") causes a regression in s0ix where the system will
fail to resume properly.  This may be because an HPD was active the last
time clocks were updated but clocks didn't get updated again during s0ix.

So add an extra call to update clocks as part of the suspend routine:
dm_suspend->dc_set_power_state->clk_mgr_optimize_pwr_state

Cc: Qingqing Zhuo 
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=215436
BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1821
Fixes: 5965280abd30 ("drm/amd/display: Apply w/a for hard hang on HPD")
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/display/dc/core/dc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c 
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 91c4874473d6..6968ad6c5a64 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -3299,6 +3299,8 @@ void dc_set_power_state(
}
 
break;
+   case DC_ACPI_CM_POWER_STATE_D3:
+   clk_mgr_optimize_pwr_state(dc, dc->clk_mgr);
default:
ASSERT(dc->current_state->stream_count == 0);
/* Zero out the current context so that on resume we start with
-- 
2.25.1



Re: [PATCH v6 4/6] drm: implement a method to free unused pages

2022-01-04 Thread Matthew Auld

On 26/12/2021 22:24, Arunpravin wrote:

On contiguous allocation, we round up the size
to the *next* power of 2; implement a function
to free the unused pages after the newly allocated block.

v2(Matthew Auld):
   - replace function name 'drm_buddy_free_unused_pages' with
 drm_buddy_block_trim
   - replace input argument name 'actual_size' with 'new_size'
   - add more validation checks for input arguments
   - add overlaps check to avoid needless searching and splitting
   - merged the below patch to see the feature in action
  - add free unused pages support to i915 driver
   - take the lock around drm_buddy_block_trim() since the mark_free/mark_split
 functions it calls are globally visible

v3(Matthew Auld):
   - remove trim method error handling as we address the failure case
 at drm_buddy_block_trim() function

v4:
   - in case of trim, the split_block failure path at __alloc_range()
 marks the block as free and removes it from the original list,
 potentially also freeing it. To overcome this problem, we turn
 the drm_buddy_block_trim() input node into a temporary node to
 prevent recursively freeing itself, but still retain the
 un-splitting/freeing of the other nodes (Matthew Auld)

   - modify the drm_buddy_block_trim() function return type

Signed-off-by: Arunpravin 
---
  drivers/gpu/drm/drm_buddy.c   | 61 +++
  drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |  8 +++
  include/drm/drm_buddy.h   |  4 ++
  3 files changed, 73 insertions(+)

diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index eddc1eeda02e..855afcaf7edd 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -538,6 +538,67 @@ static int __drm_buddy_alloc_range(struct drm_buddy_mm *mm,
	return __alloc_range(mm, &dfs, start, size, blocks);
  }
  
+/**

+ * drm_buddy_block_trim - free unused pages
+ *
+ * @mm: DRM buddy manager
+ * @new_size: original size requested
+ * @blocks: output list head to add allocated blocks
+ *
+ * For contiguous allocation, we round up the size to the nearest
+ * power-of-two value; drivers consume the *actual* size, so the
+ * remaining portion is unused and can be freed.
+ */
+void drm_buddy_block_trim(struct drm_buddy_mm *mm,
+ u64 new_size,
+ struct list_head *blocks)


It might be better to just return the error, and let the user decide if 
they want to ignore it? Also we might want some kind of unit test for 
this, so having an actual return value might be useful there.



+{
+   struct drm_buddy_block *parent;
+   struct drm_buddy_block *block;
+   LIST_HEAD(dfs);
+   u64 new_start;
+   int err;
+
+   if (!list_is_singular(blocks))
+   return;
+
+   block = list_first_entry(blocks,
+struct drm_buddy_block,
+link);
+
+   if (!drm_buddy_block_is_allocated(block))
+   return;
+
+   if (new_size > drm_buddy_block_size(mm, block))
+   return;
+
+   if (!new_size && !IS_ALIGNED(new_size, mm->chunk_size))
+   return;
+
+   if (new_size == drm_buddy_block_size(mm, block))
+   return;
+
+   list_del(&block->link);
+   mark_free(mm, block);
+   mm->avail += drm_buddy_block_size(mm, block);
+
+   /* Prevent recursively freeing this node */
+   parent = block->parent;
+   block->parent = NULL;
+
+   new_start = drm_buddy_block_offset(block);
+   list_add(&block->tmp_link, &dfs);
+   err = __alloc_range(mm, &dfs, new_start, new_size, blocks);
+   if (err) {
+   mark_allocated(block);
+   mm->avail -= drm_buddy_block_size(mm, block);
+   list_add(&block->link, blocks);
+   }
+
+   block->parent = parent;
+}
+EXPORT_SYMBOL(drm_buddy_block_trim);
+
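
For what it's worth, the error-returning variant could be as simple as turning
the early exits into error codes (a sketch only; the exact error values and any
selftest wiring are up for discussion):

int drm_buddy_block_trim(struct drm_buddy_mm *mm,
			 u64 new_size,
			 struct list_head *blocks)
{
	/* locals and list handling as in the patch */

	if (!list_is_singular(blocks))
		return -EINVAL;

	if (new_size > drm_buddy_block_size(mm, block))
		return -EINVAL;

	if (!IS_ALIGNED(new_size, mm->chunk_size))
		return -EINVAL;

	if (new_size == drm_buddy_block_size(mm, block))
		return 0;

	/* splitting/trim logic as in the patch */

	return __alloc_range(mm, &dfs, new_start, new_size, blocks);
}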
  /**
   * drm_buddy_alloc - allocate power-of-two blocks
   *
diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c 
b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
index 7c58efb60dba..05f924f32e96 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.c
@@ -97,6 +97,14 @@ static int i915_ttm_buddy_man_alloc(struct 
ttm_resource_manager *man,
if (unlikely(err))
goto err_free_blocks;
  
+	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {

+   mutex_lock(&bman->lock);
+   drm_buddy_block_trim(mm,
+   (u64)n_pages << PAGE_SHIFT,


AFAIK, n_pages has already been rounded up to the next power-of-two 
here, so this becomes a noop. I assume we need to use the "original" 
size here?



+   &bman_res->blocks);
+   mutex_unlock(&bman->lock);
+   }
+
	*res = &bman_res->base;
return 0;
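
On the n_pages point above, that would mean snapshotting the requested size
before the power-of-two round-up and trimming with it, something along these
lines (a sketch; "original_size" is a made-up local):

	/* remember the size the caller actually asked for, before any
	 * power-of-two round-up done for the contiguous case */
	u64 original_size = (u64)n_pages << PAGE_SHIFT;

	/* ... round up and allocate as in the patch ... */

	if (place->flags & TTM_PL_FLAG_CONTIGUOUS) {
		mutex_lock(&bman->lock);
		drm_buddy_block_trim(mm, original_size, &bman_res->blocks);
		mutex_unlock(&bman->lock);
	}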
  
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h

index f573b02304f4..703866a87939 100644
--- a/include/drm/drm_buddy.h
+++ 

Re: [PATCH v6 2/6] drm: improve drm_buddy_alloc function

2022-01-04 Thread Matthew Auld

On 26/12/2021 22:24, Arunpravin wrote:

- Make drm_buddy_alloc a single function to handle
   range allocation and non-range allocation demands

- Implemented a new function alloc_range() which allocates
   the requested power-of-two block complying with range limitations

- Moved order computation and memory alignment logic from
   i915 driver to drm buddy

v2:
   merged below changes to keep the build unbroken
- drm_buddy_alloc_range() becomes obsolete and may be removed
- enable ttm range allocation (fpfn / lpfn) support in i915 driver
- apply enhanced drm_buddy_alloc() function to i915 driver

v3(Matthew Auld):
   - Fix alignment issues and remove unnecessary list_empty check
   - add more validation checks for input arguments
   - make alloc_range() block allocations as bottom-up
   - optimize order computation logic
   - replace uint64_t with u64, which is preferred in the kernel

v4(Matthew Auld):
   - keep drm_buddy_alloc_range() function implementation for generic
 actual range allocations
   - keep alloc_range() implementation for end bias allocations

Signed-off-by: Arunpravin 





@@ -73,34 +83,16 @@ static int i915_ttm_buddy_man_alloc(struct 
ttm_resource_manager *man,
  
  	n_pages = size >> ilog2(mm->chunk_size);
  
-	do {

-   struct drm_buddy_block *block;
-   unsigned int order;
-
-   order = fls(n_pages) - 1;
-   GEM_BUG_ON(order > mm->max_order);
-   GEM_BUG_ON(order < min_order);
-
-   do {
-   mutex_lock(&bman->lock);
-   block = drm_buddy_alloc(mm, order);
-   mutex_unlock(&bman->lock);
-   if (!IS_ERR(block))
-   break;
-
-   if (order-- == min_order) {
-   err = -ENOSPC;
-   goto err_free_blocks;
-   }
-   } while (1);
-
-   n_pages -= BIT(order);
-
-   list_add_tail(&block->link, &bman_res->blocks);
-
-   if (!n_pages)
-   break;
-   } while (1);
+   mutex_lock(&bman->lock);
+   err = drm_buddy_alloc(mm, (u64)place->fpfn << PAGE_SHIFT,
+   (u64)place->lpfn << PAGE_SHIFT,


place->lpfn will currently always be zero for i915, AFAIK. I assume here 
we want s/place->lpfn/lpfn/?


Also something in this series is preventing i915 from loading on 
discrete devices, according to CI. Hopefully that is just the lpfn 
issue...which might explain seeing -EINVAL here[1] when allocating some 
vram for the firmware.


[1] 
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_21904/bat-dg1-6/boot0.txt




+   (u64)n_pages << PAGE_SHIFT,
+min_page_size,
+&bman_res->blocks,
+bman_res->flags);
+   mutex_unlock(&bman->lock);
+   if (unlikely(err))
+   goto err_free_blocks;
  
	*res = &bman_res->base;

return 0;
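
On the lpfn point above, the fixup could look roughly like this (a sketch;
whether man->size is the right fallback depends on the unit the manager
stores its size in):

	unsigned long lpfn = place->lpfn;

	if (!lpfn)
		lpfn = man->size;	/* assumption: manager size is in pages */

	/* ... */

	err = drm_buddy_alloc(mm, (u64)place->fpfn << PAGE_SHIFT,
			      (u64)lpfn << PAGE_SHIFT,
			      (u64)n_pages << PAGE_SHIFT,
			      min_page_size,
			      &bman_res->blocks,
			      bman_res->flags);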
@@ -266,10 +258,17 @@ int i915_ttm_buddy_man_reserve(struct 
ttm_resource_manager *man,
  {
struct i915_ttm_buddy_manager *bman = to_buddy_manager(man);
	struct drm_buddy_mm *mm = &bman->mm;
+   unsigned long flags = 0;
int ret;
  
+	flags |= DRM_BUDDY_RANGE_ALLOCATION;

+
	mutex_lock(&bman->lock);
-   ret = drm_buddy_alloc_range(mm, &bman->reserved, start, size);
+   ret = drm_buddy_alloc(mm, start,
+   start + size,
+   size, mm->chunk_size,
+   &bman->reserved,
+   flags);
	mutex_unlock(&bman->lock);
  
  	return ret;

diff --git a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h 
b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
index fa644b512c2e..5ba490875f66 100644
--- a/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
+++ b/drivers/gpu/drm/i915/i915_ttm_buddy_manager.h
@@ -20,6 +20,7 @@ struct drm_buddy_mm;
   *
   * @base: struct ttm_resource base class we extend
   * @blocks: the list of struct i915_buddy_block for this resource/allocation
+ * @flags: DRM_BUDDY_*_ALLOCATION flags
   * @mm: the struct i915_buddy_mm for this resource
   *
   * Extends the struct ttm_resource to manage an address space allocation with
@@ -28,6 +29,7 @@ struct drm_buddy_mm;
  struct i915_ttm_buddy_resource {
struct ttm_resource base;
struct list_head blocks;
+   unsigned long flags;
struct drm_buddy_mm *mm;
  };
  
diff --git a/include/drm/drm_buddy.h b/include/drm/drm_buddy.h

index 09d73328c268..4368acaad222 100644
--- a/include/drm/drm_buddy.h
+++ b/include/drm/drm_buddy.h
@@ -13,15 +13,22 @@
  
  #include 
  
-#define range_overflows(start, size, max) ({ \

+#define check_range_overflow(start, end, size, max) ({ \
typeof(start) start__ = (start); \
+   typeof(end) end__ = (end);\
typeof(size) size__ = (size); \
typeof(max) max__ = (max); \
	(void)(&start__ == &size__); \
	(void)(&start__ == &max__); \
-

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Christian König

Am 04.01.22 um 11:49 schrieb Liu, Monk:

[AMD Official Use Only]


See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about 
to start or was already executed, but host will do FLR anyway without waiting 
for guest too long



Then we have a major design issue in the SRIOV protocol and really need 
to question this.


How do you want to prevent a race between the hypervisor resetting the 
hardware and the client trying the same because of a timeout?


As far as I can see the procedure should be:
1. We detect that a reset is necessary, either because of a fault a 
timeout or signal from hypervisor.
2. For each of those potential reset sources a work item is send to the 
single workqueue.

3. One of those work items execute first and prepares the reset.
4. We either do the reset our self or notify the hypervisor that we are 
ready for the reset.

5. Cleanup after the reset, eventually resubmit jobs etc..
6. Cancel work items which might have been scheduled from other reset 
sources.
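
A minimal sketch of that kind of serialization, assuming a dedicated ordered
workqueue per reset domain (the names below are illustrative, not the actual
amdgpu reset_domain code):

#include <linux/atomic.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

/* One ordered workqueue per reset domain; every reset source only queues. */
struct reset_domain {
	struct workqueue_struct *wq;
	atomic_t reset_pending;
};

struct reset_work {
	struct work_struct base;
	struct reset_domain *domain;
};

static void reset_work_func(struct work_struct *w)
{
	struct reset_work *rw = container_of(w, struct reset_work, base);

	/* The first queued item consumes the pending flag and performs (or
	 * acknowledges) the reset; items queued by other sources run after
	 * it, see the flag already cleared and return early. */
	if (!atomic_cmpxchg(&rw->domain->reset_pending, 1, 0))
		return;

	/* prepare the reset, do it or signal readiness to the hypervisor,
	 * then clean up and resubmit jobs */
}

static struct reset_domain *reset_domain_create(void)
{
	struct reset_domain *d = kzalloc(sizeof(*d), GFP_KERNEL);

	if (!d)
		return NULL;
	d->wq = alloc_ordered_workqueue("reset-domain", 0);
	if (!d->wq) {
		kfree(d);
		return NULL;
	}
	return d;
}

/* Called from a job timeout, a RAS fault or the hypervisor FLR message. */
static void reset_domain_signal(struct reset_domain *d, struct reset_work *rw)
{
	rw->domain = d;
	atomic_set(&d->reset_pending, 1);
	INIT_WORK(&rw->base, reset_work_func);
	queue_work(d->wq, &rw->base);
}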


It does make sense that the hypervisor resets the hardware without 
waiting for the clients for too long, but if we don't follow this 
general steps we will always have a race between the different components.


Regards,
Christian.

Am 04.01.22 um 11:49 schrieb Liu, Monk:

[AMD Official Use Only]


See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about 
to start or was already executed, but host will do FLR anyway without waiting 
for guest too long


In other words I strongly think that the current SRIOV reset implementation is 
severely broken and what Andrey is doing is actually fixing it.

It makes the code to crash ... how could it be a fix ?

I'm afraid the patch is NAK from me,  but it is welcome if the cleanup do not 
ruin the logic, Andry or jingwen can try it if needed.

Thanks
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, January 4, 2022 6:19 PM
To: Chen, JingWen ; Christian König ; Grodzovsky, 
Andrey ; Deng, Emily ; Liu, Monk ; 
dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Chen, Horace ; Chen, JingWen 

Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
for SRIOV

Hi Jingwen,

well what I mean is that we need to adjust the implementation in amdgpu to 
actually match the requirements.

Could be that the reset sequence is questionable in general, but I doubt so at 
least for now.

See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

Properly setting in_gpu_reset is indeed mandatory, but should happen at a 
central place and not in the SRIOV specific code.

In other words I strongly think that the current SRIOV reset implementation is 
severely broken and what Andrey is doing is actually fixing it.

Regards,
Christian.

Am 04.01.22 um 10:07 schrieb JingWen Chen:

Hi Christian,
I'm not sure what do you mean by "we need to change SRIOV not the driver".

Do you mean we should change the reset sequence in SRIOV? This will be a huge 
change for our SRIOV solution.

  From my point of view, we can directly use amdgpu_device_lock_adev
and amdgpu_device_unlock_adev in flr_work instead of try_lock since no one will 
conflict with this thread with reset_domain introduced.
But we do need the reset_sem and adev->in_gpu_reset to keep device untouched 
via user space.

Best Regards,
Jingwen Chen

On 2022/1/3 6:17 PM, Christian König wrote:

Please don't. This patch is vital to the cleanup of the reset procedure.

If SRIOV doesn't work with that we need to change SRIOV and not the driver.

Christian.

Am 30.12.21 um 19:45 schrieb Andrey Grodzovsky:

Sure, I guess i can drop this patch then.

Andrey

On 2021-12-24 4:57 a.m., JingWen Chen wrote:

I do agree with shaoyun, if the host find the gpu engine hangs first, and do 
the flr, guest side thread may not know this and still try to access HW(e.g. 
kfd is using a lot of amdgpu_in_reset and reset_sem to identify the reset 
status). And this may lead to very bad result.

On 2021/12/24 4:58 PM, Deng, Emily wrote:

These patches look good to me. JingWen 

RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Liu, Monk
[AMD Official Use Only]

>> See the FLR request from the hypervisor is just another source of signaling 
>> the need for a reset, similar to each job timeout on each queue. Otherwise 
>> you have a race condition between the hypervisor and the scheduler.
No it's not, FLR from hypervisor is just to notify guest the hw VF FLR is about 
to start or was already executed, but host will do FLR anyway without waiting 
for guest too long

>> In other words I strongly think that the current SRIOV reset implementation 
>> is severely broken and what Andrey is doing is actually fixing it.
It makes the code crash ... how could it be a fix?

I'm afraid the patch is a NAK from me, but the cleanup is welcome if it does
not ruin the logic; Andrey or Jingwen can try it if needed.

Thanks 
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-Original Message-
From: Koenig, Christian  
Sent: Tuesday, January 4, 2022 6:19 PM
To: Chen, JingWen ; Christian König 
; Grodzovsky, Andrey 
; Deng, Emily ; Liu, Monk 
; dri-de...@lists.freedesktop.org; 
amd-gfx@lists.freedesktop.org; Chen, Horace ; Chen, 
JingWen 
Cc: dan...@ffwll.ch
Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection 
for SRIOV

Hi Jingwen,

well what I mean is that we need to adjust the implementation in amdgpu to 
actually match the requirements.

Could be that the reset sequence is questionable in general, but I doubt so at 
least for now.

See the FLR request from the hypervisor is just another source of signaling the 
need for a reset, similar to each job timeout on each queue. Otherwise you have 
a race condition between the hypervisor and the scheduler.

Properly setting in_gpu_reset is indeed mandatory, but should happen at a 
central place and not in the SRIOV specific code.

In other words I strongly think that the current SRIOV reset implementation is 
severely broken and what Andrey is doing is actually fixing it.

Regards,
Christian.

Am 04.01.22 um 10:07 schrieb JingWen Chen:
> Hi Christian,
> I'm not sure what do you mean by "we need to change SRIOV not the driver".
>
> Do you mean we should change the reset sequence in SRIOV? This will be a huge 
> change for our SRIOV solution.
>
>  From my point of view, we can directly use amdgpu_device_lock_adev 
> and amdgpu_device_unlock_adev in flr_work instead of try_lock since no one 
> will conflict with this thread with reset_domain introduced.
> But we do need the reset_sem and adev->in_gpu_reset to keep device untouched 
> via user space.
>
> Best Regards,
> Jingwen Chen
>
> On 2022/1/3 6:17 PM, Christian König wrote:
>> Please don't. This patch is vital to the cleanup of the reset procedure.
>>
>> If SRIOV doesn't work with that we need to change SRIOV and not the driver.
>>
>> Christian.
>>
>> Am 30.12.21 um 19:45 schrieb Andrey Grodzovsky:
>>> Sure, I guess i can drop this patch then.
>>>
>>> Andrey
>>>
>>> On 2021-12-24 4:57 a.m., JingWen Chen wrote:
 I do agree with shaoyun, if the host find the gpu engine hangs first, and 
 do the flr, guest side thread may not know this and still try to access 
 HW(e.g. kfd is using a lot of amdgpu_in_reset and reset_sem to identify 
 the reset status). And this may lead to very bad result.

 On 2021/12/24 4:58 PM, Deng, Emily wrote:
> These patches look good to me. JingWen will pull these patches and do 
> some basic TDR test on sriov environment, and give feedback.
>
> Best wishes
> Emily Deng
>
>
>
>> -Original Message-
>> From: Liu, Monk 
>> Sent: Thursday, December 23, 2021 6:14 PM
>> To: Koenig, Christian ; Grodzovsky, 
>> Andrey ; 
>> dri-de...@lists.freedesktop.org; amd- g...@lists.freedesktop.org; 
>> Chen, Horace ; Chen, JingWen 
>> ; Deng, Emily 
>> Cc: dan...@ffwll.ch
>> Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
>> protection for SRIOV
>>
>> [AMD Official Use Only]
>>
>> @Chen, Horace @Chen, JingWen @Deng, Emily
>>
>> Please take a review on Andrey's patch
>>
>> Thanks
>> -
>> -- Monk Liu | Cloud GPU & Virtualization Solution | AMD
>> -
>> -- we are hiring software manager for CVS core team
>> -
>> --
>>
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Thursday, December 23, 2021 4:42 PM
>> To: Grodzovsky, Andrey ; dri- 
>> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
>> Cc: dan...@ffwll.ch; Liu, Monk ; Chen, Horace 
>> 
>> 

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread Christian König

Hi Jingwen,

well what I mean is that we need to adjust the implementation in amdgpu 
to actually match the requirements.


Could be that the reset sequence is questionable in general, but I doubt 
so at least for now.


See the FLR request from the hypervisor is just another source of 
signaling the need for a reset, similar to each job timeout on each 
queue. Otherwise you have a race condition between the hypervisor and 
the scheduler.


Properly setting in_gpu_reset is indeed mandatory, but should happen at 
a central place and not in the SRIOV specific code.


In other words I strongly think that the current SRIOV reset 
implementation is severely broken and what Andrey is doing is actually 
fixing it.


Regards,
Christian.

Am 04.01.22 um 10:07 schrieb JingWen Chen:

Hi Christian,
I'm not sure what do you mean by "we need to change SRIOV not the driver".

Do you mean we should change the reset sequence in SRIOV? This will be a huge 
change for our SRIOV solution.

 From my point of view, we can directly use
amdgpu_device_lock_adev and amdgpu_device_unlock_adev in flr_work instead of 
try_lock since no one will conflict with this thread with reset_domain 
introduced.
But we do need the reset_sem and adev->in_gpu_reset to keep device untouched 
via user space.

Best Regards,
Jingwen Chen

On 2022/1/3 6:17 PM, Christian König wrote:

Please don't. This patch is vital to the cleanup of the reset procedure.

If SRIOV doesn't work with that we need to change SRIOV and not the driver.

Christian.

Am 30.12.21 um 19:45 schrieb Andrey Grodzovsky:

Sure, I guess i can drop this patch then.

Andrey

On 2021-12-24 4:57 a.m., JingWen Chen wrote:

I do agree with shaoyun, if the host find the gpu engine hangs first, and do 
the flr, guest side thread may not know this and still try to access HW(e.g. 
kfd is using a lot of amdgpu_in_reset and reset_sem to identify the reset 
status). And this may lead to very bad result.

On 2021/12/24 4:58 PM, Deng, Emily wrote:

These patches look good to me. JingWen will pull these patches and do some 
basic TDR test on sriov environment, and give feedback.

Best wishes
Emily Deng




-Original Message-
From: Liu, Monk 
Sent: Thursday, December 23, 2021 6:14 PM
To: Koenig, Christian ; Grodzovsky, Andrey
; dri-de...@lists.freedesktop.org; amd-
g...@lists.freedesktop.org; Chen, Horace ; Chen,
JingWen ; Deng, Emily 
Cc: dan...@ffwll.ch
Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
for SRIOV

[AMD Official Use Only]

@Chen, Horace @Chen, JingWen @Deng, Emily

Please take a review on Andrey's patch

Thanks
---
Monk Liu | Cloud GPU & Virtualization Solution | AMD
---
we are hiring software manager for CVS core team
---

-Original Message-
From: Koenig, Christian 
Sent: Thursday, December 23, 2021 4:42 PM
To: Grodzovsky, Andrey ; dri-
de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
Cc: dan...@ffwll.ch; Liu, Monk ; Chen, Horace

Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection
for SRIOV

Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky:

Since now flr work is serialized against  GPU resets there is no need
for this.

Signed-off-by: Andrey Grodzovsky 

Acked-by: Christian König 


---
    drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 ---
    drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 ---
    2 files changed, 22 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 487cd654b69e..7d59a66e3988 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct

work_struct *work)

    struct amdgpu_device *adev = container_of(virt, struct

amdgpu_device, virt);

    int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;

-    /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
- * otherwise the mailbox msg will be ruined/reseted by
- * the VF FLR.
- */
-    if (!down_write_trylock(&adev->reset_sem))
-    return;
-
    amdgpu_virt_fini_data_exchange(adev);
-    atomic_set(&adev->in_gpu_reset, 1);

    xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);

@@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct

work_struct *work)

    } while (timeout > 1);

    flr_done:
-    atomic_set(&adev->in_gpu_reset, 0);
-    up_write(&adev->reset_sem);
-
    /* Trigger recovery for world switch failure if no TDR */
    if (amdgpu_device_should_recover_gpu(adev)
    && (!amdgpu_device_has_job_running(adev) || diff --git
a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
index e3869067a31d..f82c066c8e8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
@@ 

Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset protection for SRIOV

2022-01-04 Thread JingWen Chen
Hi Christian,
I'm not sure what you mean by "we need to change SRIOV not the driver".

Do you mean we should change the reset sequence in SRIOV? This will be a huge 
change for our SRIOV solution.

From my point of view, we can directly use
amdgpu_device_lock_adev and amdgpu_device_unlock_adev in flr_work instead of
try_lock, since no other thread will conflict with it once reset_domain is
introduced.
But we do need the reset_sem and adev->in_gpu_reset to keep the device
untouched by user space.
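
A rough sketch of what that could look like in the flr_work handler
(illustrative only; shown with the raw reset_sem/in_gpu_reset pair rather
than the amdgpu_device_lock_adev() helper, and assuming the reset_domain
queue already serializes all reset sources):

static void xgpu_ai_mailbox_flr_work(struct work_struct *work)
{
	struct amdgpu_virt *virt = container_of(work, struct amdgpu_virt, flr_work);
	struct amdgpu_device *adev = container_of(virt, struct amdgpu_device, virt);
	int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;

	/* Block until we own the reset lock instead of bailing out with
	 * try_lock: the single reset workqueue already serializes this
	 * work item against every other reset source. */
	down_write(&adev->reset_sem);
	atomic_set(&adev->in_gpu_reset, 1);

	amdgpu_virt_fini_data_exchange(adev);
	xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);

	/* ... poll for FLR completion as before, using timeout ... */

	atomic_set(&adev->in_gpu_reset, 0);
	up_write(&adev->reset_sem);

	/* trigger recovery for world switch failure if no TDR, as before */
}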

Best Regards,
Jingwen Chen

On 2022/1/3 6:17 PM, Christian König wrote:
> Please don't. This patch is vital to the cleanup of the reset procedure.
>
> If SRIOV doesn't work with that we need to change SRIOV and not the driver.
>
> Christian.
>
> Am 30.12.21 um 19:45 schrieb Andrey Grodzovsky:
>> Sure, I guess i can drop this patch then.
>>
>> Andrey
>>
>> On 2021-12-24 4:57 a.m., JingWen Chen wrote:
>>> I do agree with shaoyun, if the host find the gpu engine hangs first, and 
>>> do the flr, guest side thread may not know this and still try to access 
>>> HW(e.g. kfd is using a lot of amdgpu_in_reset and reset_sem to identify the 
>>> reset status). And this may lead to very bad result.
>>>
>>> On 2021/12/24 4:58 PM, Deng, Emily wrote:
 These patches look good to me. JingWen will pull these patches and do some 
 basic TDR test on sriov environment, and give feedback.

 Best wishes
 Emily Deng



> -Original Message-
> From: Liu, Monk 
> Sent: Thursday, December 23, 2021 6:14 PM
> To: Koenig, Christian ; Grodzovsky, Andrey
> ; dri-de...@lists.freedesktop.org; amd-
> g...@lists.freedesktop.org; Chen, Horace ; Chen,
> JingWen ; Deng, Emily 
> Cc: dan...@ffwll.ch
> Subject: RE: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
> protection
> for SRIOV
>
> [AMD Official Use Only]
>
> @Chen, Horace @Chen, JingWen @Deng, Emily
>
> Please take a review on Andrey's patch
>
> Thanks
> ---
> Monk Liu | Cloud GPU & Virtualization Solution | AMD
> ---
> we are hiring software manager for CVS core team
> ---
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Thursday, December 23, 2021 4:42 PM
> To: Grodzovsky, Andrey ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org
> Cc: dan...@ffwll.ch; Liu, Monk ; Chen, Horace
> 
> Subject: Re: [RFC v2 8/8] drm/amd/virt: Drop concurrent GPU reset 
> protection
> for SRIOV
>
> Am 22.12.21 um 23:14 schrieb Andrey Grodzovsky:
>> Since now flr work is serialized against  GPU resets there is no need
>> for this.
>>
>> Signed-off-by: Andrey Grodzovsky 
> Acked-by: Christian König 
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 11 ---
>>    drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c | 11 ---
>>    2 files changed, 22 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> index 487cd654b69e..7d59a66e3988 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> @@ -248,15 +248,7 @@ static void xgpu_ai_mailbox_flr_work(struct
> work_struct *work)
>>    struct amdgpu_device *adev = container_of(virt, struct
> amdgpu_device, virt);
>>    int timeout = AI_MAILBOX_POLL_FLR_TIMEDOUT;
>>
>> -    /* block amdgpu_gpu_recover till msg FLR COMPLETE received,
>> - * otherwise the mailbox msg will be ruined/reseted by
>> - * the VF FLR.
>> - */
>> -    if (!down_write_trylock(&adev->reset_sem))
>> -    return;
>> -
>>    amdgpu_virt_fini_data_exchange(adev);
>> -    atomic_set(&adev->in_gpu_reset, 1);
>>
>>    xgpu_ai_mailbox_trans_msg(adev, IDH_READY_TO_RESET, 0, 0, 0);
>>
>> @@ -269,9 +261,6 @@ static void xgpu_ai_mailbox_flr_work(struct
> work_struct *work)
>>    } while (timeout > 1);
>>
>>    flr_done:
>> -    atomic_set(&adev->in_gpu_reset, 0);
>> -    up_write(&adev->reset_sem);
>> -
>>    /* Trigger recovery for world switch failure if no TDR */
>>    if (amdgpu_device_should_recover_gpu(adev)
>>    && (!amdgpu_device_has_job_running(adev) || diff --git
>> a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> index e3869067a31d..f82c066c8e8d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c
>> @@ -277,15 +277,7 @@ static void xgpu_nv_mailbox_flr_work(struct
> work_struct *work)
>>    struct amdgpu_device *adev = container_of(virt, struct
> 

RE: [PATCH V3 01/12] drm/amdgpu: Unify ras block interface for each ras block

2022-01-04 Thread Zhou1, Tao
[AMD Official Use Only]

The series is:

Reviewed-by: Tao Zhou 

Please make sure basic RAS tests are successful before submitting the series.

> -Original Message-
> From: Chai, Thomas 
> Sent: Wednesday, December 29, 2021 2:32 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Chai, Thomas ; Zhang, Hawking
> ; Zhou1, Tao ; Clements,
> John ; Chai, Thomas 
> Subject: [PATCH V3 01/12] drm/amdgpu: Unify ras block interface for each ras
> block
> 
> 1. Define a unified ops interface for each block.
> 2. Add a ras_block_match function pointer to the ops interface; each ras block
> can customize a special match function to identify itself.
> 3. Add a new function, amdgpu_ras_block_match_default. If a ras block doesn't
> define .ras_block_match, amdgpu_ras_block_match_default is executed by default
> to identify this ras block.
> 4. Define unified basic ras block data for each ras block.
> 5. Create a dedicated amdgpu device ras block linked list to manage all of the
> ras blocks.
> 6. Add a new function interface, amdgpu_ras_register_ras_block, for each ras
> block to register itself with the ras controlling block.
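
A hedged sketch of how one IP block might use this interface (the .block and
.ras_block_match members and amdgpu_ras_register_ras_block() follow the patch;
the GFX example and the init call site are illustrative):

static int gfx_ras_block_match(struct amdgpu_ras_block_object *obj,
			       enum amdgpu_ras_block block,
			       uint32_t sub_block_index)
{
	/* custom match: here simply compare the block type */
	return (obj->block == block) ? 0 : -EINVAL;
}

static struct amdgpu_ras_block_object gfx_ras_block = {
	.block = AMDGPU_RAS_BLOCK__GFX,
	.ras_block_match = gfx_ras_block_match,
};

/* somewhere in the block's sw_init: */
amdgpu_ras_register_ras_block(adev, &gfx_ras_block);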
> 
> Signed-off-by: yipechai 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 46 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h| 28 +
>  4 files changed, 78 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index db1505455761..eddf230856e2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1151,6 +1151,8 @@ struct amdgpu_device {
>   boolbarrier_has_auto_waitcnt;
> 
>   struct amdgpu_reset_control *reset_cntl;
> +
> + struct list_head ras_list;
>  };
> 
>  static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev) diff
> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 73ec46140d68..0980396ee709 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3578,6 +3578,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
> 
>   INIT_LIST_HEAD(&adev->reset_list);
> 
> + INIT_LIST_HEAD(&adev->ras_list);
> +
>   INIT_DELAYED_WORK(&adev->delayed_init_work,
> amdgpu_device_delayed_init_work_handler);
>   INIT_DELAYED_WORK(&adev->gfx.gfx_off_delay_work,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 90f0db3b4f65..9dd698354e04 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -862,6 +862,40 @@ static int amdgpu_ras_enable_all_features(struct
> amdgpu_device *adev,  }
>  /* feature ctl end */
> 
> +int amdgpu_ras_block_match_default(struct amdgpu_ras_block_object*
> +block_obj, enum amdgpu_ras_block block) {
> + if(!block_obj)
> + return -EINVAL;
> +
> + if (block_obj->block == block)
> + return 0;
> +
> + return -EINVAL;
> +}
> +
> +static struct amdgpu_ras_block_object* amdgpu_ras_get_ras_block(struct
> amdgpu_device *adev,
> + enum amdgpu_ras_block block,
> uint32_t sub_block_index) {
> + struct amdgpu_ras_block_object *obj, *tmp;
> +
> + if (block >= AMDGPU_RAS_BLOCK__LAST)
> + return NULL;
> +
> + if (!amdgpu_ras_is_supported(adev, block))
> + return NULL;
> +
> + list_for_each_entry_safe(obj, tmp, &adev->ras_list, node) {
> + if (obj->ras_block_match) {
> + if (obj->ras_block_match(obj, block, sub_block_index)
> == 0)
> + return obj;
> + } else {
> + if (amdgpu_ras_block_match_default(obj, block) == 0)
> + return obj;
> + }
> + }
> +
> + return NULL;
> +}
> 
>  void amdgpu_ras_mca_query_error_status(struct amdgpu_device *adev,
>  struct ras_common_if *ras_block, @@ -
> 2739,3 +2773,15 @@ static void
> amdgpu_register_bad_pages_mca_notifier(void)
>  }
>  }
>  #endif
> +/* Register each ip ras block into amdgpu ras */
> +int amdgpu_ras_register_ras_block(struct amdgpu_device *adev,
> + struct amdgpu_ras_block_object* ras_block_obj) {
> + if (!adev || !ras_block_obj)
> + return -EINVAL;
> +
> + INIT_LIST_HEAD(&ras_block_obj->node);
> + list_add_tail(&ras_block_obj->node, &adev->ras_list);
> +
> + return 0;
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> index cdd0010a5389..9dbe8d49b891 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> @@ -469,6 +469,33 @@ struct ras_debug_if {
>   };
>   int op;
>  };
> +
> +struct amdgpu_ras_block_object {
> +