[bug report] drm/amd/display: Reverted DSC programming sequence change

2022-09-15 Thread Dan Carpenter
Hello Nagulendran, Iswara,

The patch c7783a6ed4fc: "drm/amd/display: Reverted DSC programming
sequence change" from Aug 23, 2022, leads to the following Smatch
static checker warning:

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c:4310 
core_link_enable_stream()
warn: if statement not indented

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link.c
4299 
4300 if (pipe_ctx->stream->dpms_off)
4301 return;
4302 
4303 /* Have to setup DSC before DIG FE and BE are connected (which happens before the
4304  * link training). This is to make sure the bandwidth sent to DIG BE won't be
4305  * bigger than what the link and/or DIG BE can handle. VBID[6]/CompressedStream_flag
4306  * will be automatically set at a later time when the video is enabled
4307  * (DP_VID_STREAM_EN = 1).
4308  */
4309 if (pipe_ctx->stream->timing.flags.DSC) {
--> 4310 if (dc_is_dp_signal(pipe_ctx->stream->signal) ||
4311 dc_is_virtual_signal(pipe_ctx->stream->signal))
4312 dp_set_dsc_enable(pipe_ctx, true);
4313 
4314 }

This seems like a bug?  Like a line was deleted.  Or should it be:

if (pipe_ctx->stream->timing.flags.DSC &&
(dc_is_dp_signal(pipe_ctx->stream->signal) ||
 dc_is_virtual_signal(pipe_ctx->stream->signal)))
dp_set_dsc_enable(pipe_ctx, true);

4315 
4316 status = enable_link(state, pipe_ctx);
4317 
4318 if (status != DC_OK) {
4319 DC_LOG_WARNING("enabling link %u failed: %d\n",
4320 pipe_ctx->stream->link->link_index,
4321 status);
4322 
4323 /* Abort stream enable *unless* the failure was due to
4324  * DP link training - some DP monitors will recover and
4325  * show the stream anyway. But MST displays can't proceed

regards,
dan carpenter


Re: [PATCH 2/2] drm/amdgpu: move enable irq later to avoid race with ih resume

2022-09-15 Thread Lazar, Lijo




On 9/15/2022 12:08 PM, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Lijo,

IH resume was added to resolve an issue found during mode2 bring up on sienna 
cichlid:
- close down mode2 reset and do a mode1 reset first
- open mode2 reset and do a mode2 reset. Mode2 reset was found to fail in this 
case.

Resume IH helps in this case



Sorry, what do you mean by 'close down' mode2 /'open mode2 reset'? Do 
you mean if mode-1 reset is done first, a subsequent mode-2 reset 
doesn't work without IH resume?


Thanks,
Lijo



Thanks,
Victor



-Original Message-
From: Lazar, Lijo 
Sent: Thursday, September 15, 2022 1:58 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey 

Subject: Re: [PATCH 2/2] drm/amdgpu: move enable irq later to avoid race with 
ih resume



On 9/14/2022 3:40 PM, Victor Zhao wrote:

[background]
On the current sienna cichlid mode2 reset, in the slow job hang cases,
since the page table context is reverted to completely stop the gpu,
page fault interrupts will be generated.

Since the irq is left enabled during the recovery stage, such an interrupt
may still be in processing during the ih resume step; processing advances the
ih ring rptr while ih resume sets rptr and wptr back to 0. This may cause


AFAIK, only GFX/SDMA are affected by mode-2. IH is not suspended before mode-2. 
Why do you resume IH after mode-2 when it is not suspended? Is it a special 
case for virtualization?

Thanks,
Lijo


rptr to become greater than wptr. This case is not handled in ih processing,
so rptr keeps increasing until it reaches the maximum, which makes the
fence fallback situation happen.

[how]
Move the irq enable to after ih is resumed and before the ib test.
Adjust the position of the irq enable on the other reset paths accordingly.

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 8 
   drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 1 +
   2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index c0cfae52f12b..0b658225e9ef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4625,8 +4625,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_fence_driver_force_completion(ring);
}
   
-	amdgpu_fence_driver_isr_toggle(adev, false);

-
if (job && job->vm)
drm_sched_increase_karma(&job->base);
   
@@ -4758,6 +4756,10 @@ int amdgpu_do_asic_reset(struct list_head *device_list_handle,

test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
skip_hw_reset = test_bit(AMDGPU_SKIP_HW_RESET,
&reset_context->flags);
   
+	list_for_each_entry (tmp_adev, device_list_handle, reset_list) {

+   amdgpu_fence_driver_isr_toggle(tmp_adev, false);
+   }
+
/*
 * ASIC reset has to be done on all XGMI hive nodes ASAP
 * to allow proper links negotiation in FW (within 1 sec) @@
-5031,8 +5033,6 @@ static void amdgpu_device_recheck_guilty_jobs(
/* Clear this failed job from fence array */
amdgpu_fence_driver_clear_job_fences(ring);
   
-			amdgpu_fence_driver_isr_toggle(adev, false);

-
/* Since the job won't signal and we go for
 * another resubmit drop this parent pointer
 */
diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
index 7aa570c1ce4a..953036482d1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
+++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
@@ -240,6 +240,7 @@ sienna_cichlid_mode2_restore_hwcontext(struct 
amdgpu_reset_control *reset_ctl,
* Add this ASIC as tracked as reset was already
* complete successfully.
*/
+   amdgpu_fence_driver_isr_toggle(tmp_adev, false);
amdgpu_register_gpu_instance(tmp_adev);
   
   	/* Resume RAS */




[PATCH 4/4] drm/amdgpu: Implement OS triggered MCBP(v4)

2022-09-15 Thread jiadong.zhu
From: "Jiadong.Zhu" 

Trigger Mid-Command Buffer Preemption according to the priority of the software
rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the software
ring together with the fence sequence number in its fifo queue, and pops out
those records when the fences are signalled. The locations are used to resubmit
packages in preemption scenarios by copying the chunks from the software ring.
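
For illustration, one plausible shape of such a fifo record (hypothetical names,
not the structs this patch actually adds):

	struct ib_frame_record {
		u64	fence_seq;	/* hw fence seq that retires this frame */
		u64	start_offset;	/* first dword of the frame on the sw ring */
		u64	end_offset;	/* one past the last dword of the frame */
	};

Records up to a given sequence number are popped when that fence signals; on
preemption the remaining records tell the muxer which chunks to copy from the
software ring onto the real ring.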

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/Makefile  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c |  91 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h |  29 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 153 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  22 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  27 
 9 files changed, 340 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 85224bc81ce5..24c5aa19bbf2 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
-   amdgpu_sw_ring.o amdgpu_ring_mux.o
+   amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
}
}
 
+   amdgpu_ring_ib_begin(ring);
if (job && ring->funcs->init_cond_exec)
patch_offset = amdgpu_ring_init_cond_exec(ring);
 
@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
ring->funcs->emit_wave_limit(ring, false);
 
+   amdgpu_ring_ib_end(ring);
amdgpu_ring_commit(ring);
return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
new file mode 100644
index ..4b0aae1a7ad6
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
@@ -0,0 +1,91 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "amdgpu.h"
+#include "amdgpu_mcbp.h"
+#include "amdgpu_ring.h"
+
+/* trigger mcbp and find if we need resubmit */
+int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
+{
+   struct amdgpu_mux_entry *e;
+   struct amdgpu_ring *ring = NULL;
+   int i;
+
+   spin_lock(&mux->lock);
+
+   amdgpu_ring_preempt_ib(mux->real_ring);
+
+   ring = NULL;
+   for (i = 0; i < mux->num_ring_entries; i++) {
+   e = &mux->ring_entry[i];
+   if (e->ring->hw_prio <= AMDGPU_RING_PRIO_DEFAULT) {
+   ring = e->ring;
+   break;
+   }
+   }
+
+   if (!ring) {
+   DRM_ERROR("cannot find low priority ring\n");
+   return -ENOENT;
+ 

RE: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Zhao, Victor
[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running a lot of 
containers submitting gfx jobs. We have advanced tdr mode and mode2 reset 
enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx pending list 
may be signaled after drm_sched_stop. So they will not be removed from the 
pending list but will have the DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and 
removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be resubmitted. 
Since it still has the signaled bit, drm_sched_job_done will be called directly. 
This decreases hw_rq_count, which allows more jobs to be emitted, but does not 
clear the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use num_fences_mask 
in amdgpu_fence_process, when an overflow happens the signalling of some jobs 
will be skipped, which results in an infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop avoids jobs being signaled after 
drm_sched_stop, and signalling jobs one by one in fence_process, instead of 
using a mask, handles the overflow situation.
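
A minimal sketch of that second point, with hypothetical names rather than the
actual amdgpu_fence_process() code, just to show the idea:

	struct fence_drv_sketch {
		unsigned int last_seq;		/* last signalled sequence */
		unsigned int num_fences_mask;	/* slot count - 1, power of two */
	};

	static void fence_process_all(struct fence_drv_sketch *drv, unsigned int hw_seq)
	{
		/* Walk every sequence number up to the latest one read back from
		 * the hardware, so an overflow of the slot ring can never skip a
		 * fence and leave its RCU pointer set.
		 */
		while ((int)(hw_seq - drv->last_seq) > 0) {	/* wrap-safe compare */
			drv->last_seq++;
			/* signal and clear the slot at
			 * drv->last_seq & drv->num_fences_mask
			 */
		}
	}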

Another fix could be to skip submitting jobs which have already signaled during 
the resubmit stage, which may look cleaner.

Please help give some advice.


Thanks,
Victor



-Original Message-
From: Koenig, Christian  
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; 
Grodzovsky, Andrey 
Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:
> [AMD Official Use Only - General]
>
> Ping.
>
> Hi @Koenig, Christian and @Grodzovsky, Andrey,
>
> We found some reset related issues during stress test on the sequence. Please 
> help give some comments.
>
>
> Thanks,
> Victor
>
>
>
> -Original Message-
> From: Victor Zhao 
> Sent: Wednesday, September 14, 2022 6:10 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deng, Emily ; Grodzovsky, Andrey 
> ; Zhao, Victor 
> Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow
>
> [background]
> For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from 
> another ring (e.g. gfx) may continue signaling during the drm_sched_stop 
> stage. The signal bit will not be cleared.
>
> At the resubmit stage after recovery, the job with the hw fence signaled bit 
> set will call job done directly instead of going through fence process.
> This makes hw_rq_count decrease but leaves the rcu fence pointer uncleared.
>
> Then an overflow happens in the fence driver slots and some jobs may be 
> skipped, leaving the rcu pointer uncleared, which makes an infinite wait for 
> the slot on the next fence emitted.
>
> This infinite wait causes a job timeout on the emitting job, and the driver 
> will get stuck at its sched stop step because kthread_park cannot be done.
>
> [how]
> 1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt 
> before drm sched stop 2. handle all fences in fence process to avoid 
> skipping when an overflow happens
>
> Signed-off-by: Victor Zhao 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  6 +-
>   2 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 943c9e750575..c0cfae52f12b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
> *adev,
>   amdgpu_virt_fini_data_exchange(adev);
>   }
>   
> - amdgpu_fence_driver_isr_toggle(adev, true);
> -
>   /* block all schedulers and reset given job's ring */
>   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>   struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 +5212,8 
> @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> amdgpu_device_ip_need_full_reset(tmp_adev))
>   amdgpu_ras_suspend(tmp_adev);
>   
> + amdgpu_fence_driver_isr_toggle(tmp_adev, true);
> +
>   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>   struct amdgpu_ring *ring = tmp_adev->rings[i];
>   
> @@ -5228,8 +5228,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device 
> *adev,
>   atomic_inc(&tmp_adev->gpu_reset_counter);
>   }
>   
> - if (need_emergency_restart)
> + if (need_emergency_restart) {
> + list_for_each_entry (tmp_adev, device_list_handle, reset_list) {
> + amdgpu_fence_driver_isr_toggle(tmp_adev, false);
> + }
>   goto skip_sched_resume;
> + }
>   
>   /*
>* Must check guilty signal here since after this point all old @@ 
> -5240,6 +5244,9 @@ int amdgpu_device_gpu_recover(struct amdgpu_d

RE: [PATCH 2/2] drm/amdgpu: move enable irq later to avoid race with ih resume

2022-09-15 Thread Zhao, Victor
[AMD Official Use Only - General]

Hi Lijo,

Yes, this is what I observed in sienna cichlid. 


Thanks,
Victor



-Original Message-
From: Lazar, Lijo  
Sent: Thursday, September 15, 2022 4:00 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey 

Subject: Re: [PATCH 2/2] drm/amdgpu: move enable irq later to avoid race with 
ih resume



On 9/15/2022 12:08 PM, Zhao, Victor wrote:
> [AMD Official Use Only - General]
> 
> Hi Lijo,
> 
> IH resume was added to resolve an issue found during mode2 bring up on sienna 
> cichlid:
> - close down mode2 reset and do a mode1 reset first
> - open mode2 reset and do a mode2 reset. Mode2 reset was found to fail in this 
> case.
> 
> Resume IH helps in this case
> 

Sorry, what do you mean by 'close down' mode2 /'open mode2 reset'? Do you mean 
if mode-1 reset is done first, a subsequent mode-2 reset doesn't work without 
IH resume?

Thanks,
Lijo

> 
> Thanks,
> Victor
> 
> 
> 
> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, September 15, 2022 1:58 PM
> To: Zhao, Victor ; amd-gfx@lists.freedesktop.org
> Cc: Deng, Emily ; Grodzovsky, Andrey 
> 
> Subject: Re: [PATCH 2/2] drm/amdgpu: move enable irq later to avoid 
> race with ih resume
> 
> 
> 
> On 9/14/2022 3:40 PM, Victor Zhao wrote:
>> [background]
>> On the current sienna cichlid mode2 reset, in the slow job hang cases, 
>> since the page table context is reverted to completely stop the gpu, 
>> page fault interrupts will be generated.
>>
>> Since the irq is left enabled during the recovery stage, such an interrupt 
>> may still be in processing during the ih resume step; processing advances 
>> the ih ring rptr while ih resume sets rptr and wptr back to 0. This may cause
> 
> AFAIK, only GFX/SDMA are affected by mode-2. IH is not suspended before 
> mode-2. Why do you resume IH after mode-2 when it is not suspended? Is it a 
> special case for virtualization?
> 
> Thanks,
> Lijo
> 
>> rptr to become greater than wptr. This case is not handled in ih processing, 
>> so rptr keeps increasing until it reaches the maximum, which makes the fence 
>> fallback situation happen.
>>
>> [how]
>> Move the irq enable to after ih is resumed and before the ib test.
>> Adjust the position of the irq enable on the other reset paths accordingly.
>>
>> Signed-off-by: Victor Zhao 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 8 
>>drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c | 1 +
>>2 files changed, 5 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index c0cfae52f12b..0b658225e9ef 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4625,8 +4625,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
>> *adev,
>>  amdgpu_fence_driver_force_completion(ring);
>>  }
>>
>> -amdgpu_fence_driver_isr_toggle(adev, false);
>> -
>>  if (job && job->vm)
>>  drm_sched_increase_karma(&job->base);
>>
>> @@ -4758,6 +4756,10 @@ int amdgpu_do_asic_reset(struct list_head 
>> *device_list_handle,
>>  test_bit(AMDGPU_NEED_FULL_RESET, &reset_context->flags);
>>  skip_hw_reset = test_bit(AMDGPU_SKIP_HW_RESET, 
>> &reset_context->flags);
>>
>> +list_for_each_entry (tmp_adev, device_list_handle, reset_list) {
>> +amdgpu_fence_driver_isr_toggle(tmp_adev, false);
>> +}
>> +
>>  /*
>>   * ASIC reset has to be done on all XGMI hive nodes ASAP
>>   * to allow proper links negotiation in FW (within 1 sec) @@
>> -5031,8 +5033,6 @@ static void amdgpu_device_recheck_guilty_jobs(
>>  /* Clear this failed job from fence array */
>>  amdgpu_fence_driver_clear_job_fences(ring);
>>
>> -amdgpu_fence_driver_isr_toggle(adev, false);
>> -
>>  /* Since the job won't signal and we go for
>>   * another resubmit drop this parent pointer
>>   */
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
>> b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
>> index 7aa570c1ce4a..953036482d1f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sienna_cichlid.c
>> @@ -240,6 +240,7 @@ sienna_cichlid_mode2_restore_hwcontext(struct 
>> amdgpu_reset_control *reset_ctl,
>>  * Add this ASIC as tracked as reset was already
>>  * complete successfully.
>>  */
>> +amdgpu_fence_driver_isr_toggle(tmp_adev, false);
>>  amdgpu_register_gpu_instance(tmp_adev);
>>
>>  /* Resume RAS */
>>


[PATCH] drivers/pci: Decrease VF memory BAR size to save host memory occupied by PTEs

2022-09-15 Thread Rui Ma
The VRAM space is fixed; as the number of VFs increases, the actual BAR
memory space used by each VF decreases. However, the BAR memory mapping is
always based on the initial size of the VRAM. Not mapping this unneeded
memory reduces the number of PTEs and so saves host memory.
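
As a rough illustration of the intended saving (not part of the patch itself),
the quirk below derives a shift from the number of enabled VFs and the VF BAR
end address is then computed from it:

	/* Worked example: num_VFs = 4, bar_num = 0
	 *   shift = 1: (1 << 1) = 2 <= 4  -> shift = 2
	 *   shift = 2: (1 << 2) = 4 <= 4  -> shift = 3
	 *   shift = 3: (1 << 3) = 8 >  4  -> stop
	 * end = start + (size >> (shift - 1)) - 1, i.e. each VF maps a quarter
	 * of the PF-sized BAR0 instead of the full size.
	 */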

Signed-off-by: Rui Ma 
---
 drivers/pci/iov.c| 11 +--
 drivers/pci/pci.h| 15 +++
 drivers/pci/quirks.c | 38 ++
 3 files changed, 62 insertions(+), 2 deletions(-)

diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
index 952217572113..d623f46669b6 100644
--- a/drivers/pci/iov.c
+++ b/drivers/pci/iov.c
@@ -295,6 +295,11 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
struct resource *res;
struct pci_sriov *iov = dev->sriov;
struct pci_bus *bus;
+/*
+ * Some SRIOV device's Bar is too large and occupy too much rmap size.
+ * Resize the request resource of VF.
+ */
+   u16 shift = 1;
 
bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
if (!bus)
@@ -328,8 +333,10 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
virtfn->resource[i].name = pci_name(virtfn);
virtfn->resource[i].flags = res->flags;
size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
+   shift = 1;
+   shift = virtfn_get_shift(dev, iov->num_VFs, i);
virtfn->resource[i].start = res->start + size * id;
-   virtfn->resource[i].end = virtfn->resource[i].start + size - 1;
+   virtfn->resource[i].end = virtfn->resource[i].start + (size >> 
(shift - 1)) - 1;
rc = request_resource(res, &virtfn->resource[i]);
BUG_ON(rc);
}
@@ -680,12 +687,12 @@ static int sriov_enable(struct pci_dev *dev, int 
nr_virtfn)
msleep(100);
pci_cfg_access_unlock(dev);
 
+   iov->num_VFs = nr_virtfn;
rc = sriov_add_vfs(dev, initial);
if (rc)
goto err_pcibios;
 
kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
-   iov->num_VFs = nr_virtfn;
 
return 0;
 
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 3d60cabde1a1..befc67a280eb 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -603,6 +603,21 @@ static inline int pci_dev_specific_reset(struct pci_dev 
*dev, bool probe)
 }
 #endif
 
+struct virtfn_get_shift_methods {
+   u16 vendor;
+   u16 device;
+   u16 (*get_shift)(struct pci_dev *dev, u16 arg, int arg2);
+};
+
+#ifdef CONFIG_PCI_QUIRKS
+u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2);
+#else
+static inline u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2)
+{
+   return (u16)1;
+}
+#endif
+
 #if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_ARM64)
 int acpi_get_rc_resources(struct device *dev, const char *hid, u16 segment,
  struct resource *res);
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index da829274fc66..82502c5923e4 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4085,6 +4085,44 @@ int pci_dev_specific_reset(struct pci_dev *dev, bool 
probe)
return -ENOTTY;
 }
 
+static u16 divided_by_VF(struct pci_dev *dev, u16 num_VFs, int bar_num)
+{
+   u16 shift = 1;
+
+   if (bar_num == 0) {
+   while ((1 << shift) <= num_VFs)
+   shift += 1;
+   }
+   pci_info(dev, "Bar %d get shift: %d.\n", bar_num, shift);
+   return shift;
+}
+
+static const struct virtfn_get_shift_methods virtfn_get_shift_methods[] = {
+   { PCI_VENDOR_ID_ATI, 0x73a1, divided_by_VF},
+   { 0 }
+};
+
+/*
+ * Get shift num to calculate SRIOV device bar.
+ * Sometimes the bar size for SRIOV device is too large
+ * and we want to calculate the size to define the end
+ * of virtfn.
+ */
+u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2)
+{
+   const struct virtfn_get_shift_methods *i;
+
+   for (i = virtfn_get_shift_methods; i->get_shift; i++) {
+   if ((i->vendor == dev->vendor ||
+i->vendor == (u16)PCI_ANY_ID) &&
+   (i->device == dev->device ||
+i->device == (u16)PCI_ANY_ID))
+   return i->get_shift(dev, arg1, arg2);
+   }
+
+   return (u16)1;
+}
+
 static void quirk_dma_func0_alias(struct pci_dev *dev)
 {
if (PCI_FUNC(dev->devfn) != 0)
-- 
2.25.1



Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Christian König

Hi Victor,

the advanced tdr mode is not a feature we want to enable in production 
and keep for much longer.


So would that issue happen without this as well? If not it is rather 
questionable if we should look into fixing this in the first place.


Regards,
Christian.

Am 15.09.22 um 12:09 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running a lot of 
containers submitting gfx jobs. We have advanced tdr mode and mode2 reset 
enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx pending list 
may be signaled after drm_sched_stop. So they will not be removed from the 
pending list but will have the DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and 
removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be resubmitted. 
Since it still has the signaled bit, drm_sched_job_done will be called directly. 
This decreases hw_rq_count, which allows more jobs to be emitted, but does not 
clear the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use num_fences_mask 
in amdgpu_fence_process, when an overflow happens the signalling of some jobs 
will be skipped, which results in an infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop avoids jobs being signaled after 
drm_sched_stop, and signalling jobs one by one in fence_process, instead of 
using a mask, handles the overflow situation.

Another fix could be to skip submitting jobs which have already signaled during 
the resubmit stage, which may look cleaner.

Please help give some advice.


Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, 
Andrey 
Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the sequence. Please 
help give some comments.


Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from 
another ring (e.g. gfx) may continue signaling during the drm_sched_stop stage. 
The signal bit will not be cleared.

At the resubmit stage after recovery, the job with the hw fence signaled bit set 
will call job done directly instead of going through fence process.
This makes hw_rq_count decrease but leaves the rcu fence pointer uncleared.

Then an overflow happens in the fence driver slots and some jobs may be skipped, 
leaving the rcu pointer uncleared, which makes an infinite wait for the slot 
on the next fence emitted.

This infinite wait causes a job timeout on the emitting job, and the driver will 
get stuck at its sched stop step because kthread_park cannot be done.

[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
before drm sched stop 2. handle all fences in fence process to avoid
skipping when an overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  6 +-
   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_virt_fini_data_exchange(adev);
}
   
-	amdgpu_fence_driver_isr_toggle(adev, true);

-
/* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 +5212,8 
@@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  amdgpu_device_ip_need_full_reset(tmp_adev))
amdgpu_ras_suspend(tmp_adev);
   
+		amdgpu_fence_driver_isr_toggle(tmp_adev, true);

+
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = tmp_adev->rings[i];
   
@@ -5228,8 +5228,12 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,

atomic_inc(&tmp_adev->gpu_reset_counter);
}
   
-	if (need_emergency_restart)

+   if (need_emergency_restart) {
+   list_for_each_entry (tmp_adev, device_list_handle, reset_list) {
+   amdgpu_fence_driver_isr_toggle(tmp_adev, false);
+   }
goto skip_sc

RE: [PATCH v5 06/21] drm/i915: Prepare to dynamic dma-buf locking specification

2022-09-15 Thread Ruhl, Michael J
>-Original Message-
>From: Dmitry Osipenko 
>Sent: Tuesday, September 13, 2022 3:28 PM
>To: David Airlie ; Gerd Hoffmann ;
>Gurchetan Singh ; Chia-I Wu
>; Daniel Vetter ; Daniel Almeida
>; Gert Wollny ;
>Gustavo Padovan ; Daniel Stone
>; Tomeu Vizoso ;
>Maarten Lankhorst ; Maxime Ripard
>; Thomas Zimmermann ;
>Rob Clark ; Sumit Semwal
>; Christian König ;
>Pan, Xinhui ; Thierry Reding
>; Tomasz Figa ; Marek
>Szyprowski ; Mauro Carvalho Chehab
>; Alex Deucher ; Jani
>Nikula ; Joonas Lahtinen
>; Vivi, Rodrigo ;
>Tvrtko Ursulin ; Thomas Hellström
>; Qiang Yu ; Srinivas
>Kandagatla ; Amol Maheshwari
>; Jason Gunthorpe ; Leon
>Romanovsky ; Gross, Jurgen ; Stefano
>Stabellini ; Oleksandr Tyshchenko
>; Tomi Valkeinen ;
>Russell King ; Lucas Stach ;
>Christian Gmeiner ; Ruhl, Michael J
>
>Cc: dri-de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Dmitry
>Osipenko ; linux-me...@vger.kernel.org; linaro-mm-
>s...@lists.linaro.org; amd-gfx@lists.freedesktop.org; intel-
>g...@lists.freedesktop.org; ker...@collabora.com; virtualization@lists.linux-
>foundation.org; linux-r...@vger.kernel.org; linux-arm-
>m...@vger.kernel.org
>Subject: [PATCH v5 06/21] drm/i915: Prepare to dynamic dma-buf locking
>specification
>
>Prepare the i915 driver for the common dynamic dma-buf locking convention
>by starting to use the unlocked versions of the dma-buf API functions
>and handling cases where the importer now holds the reservation lock.
>
>Acked-by: Christian König 
>Signed-off-by: Dmitry Osipenko 
>---
> drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c   |  2 +-
> drivers/gpu/drm/i915/gem/i915_gem_object.c   | 14 ++
> .../gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c | 16 
> 3 files changed, 23 insertions(+), 9 deletions(-)
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>index f5062d0c6333..07eee1c09aaf 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_dmabuf.c
>@@ -72,7 +72,7 @@ static int i915_gem_dmabuf_vmap(struct dma_buf
>*dma_buf,
>   struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
>   void *vaddr;
>
>-  vaddr = i915_gem_object_pin_map_unlocked(obj, I915_MAP_WB);
>+  vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
>   if (IS_ERR(vaddr))
>   return PTR_ERR(vaddr);
>
>diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>index 85482a04d158..7cab89618bad 100644
>--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>@@ -290,7 +290,21 @@ void __i915_gem_object_pages_fini(struct
>drm_i915_gem_object *obj)
>   __i915_gem_object_free_mmaps(obj);
>
>   atomic_set(&obj->mm.pages_pin_count, 0);
>+
>+  /*
>+   * dma_buf_unmap_attachment() requires reservation to be
>+   * locked. The imported GEM shouldn't share reservation lock
>+   * and ttm_bo_cleanup_memtype_use() shouldn't be invoked for
>+   * dma-buf, so it's safe to take the lock.
>+   */
>+  if (obj->base.import_attach)
>+  i915_gem_object_lock(obj, NULL);
>+
>   __i915_gem_object_put_pages(obj);
>+
>+  if (obj->base.import_attach)
>+  i915_gem_object_unlock(obj);
>+
>   GEM_BUG_ON(i915_gem_object_has_pages(obj));
> }

Hi Dmitry,

I think that this looks correct and reasonable.

Reviewed-by: Michael J. Ruhl 

m

>diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
>b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
>index 51ed824b020c..f2f3cfad807b 100644
>--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
>+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_dmabuf.c
>@@ -213,7 +213,7 @@ static int igt_dmabuf_import_same_driver(struct
>drm_i915_private *i915,
>   goto out_import;
>   }
>
>-  st = dma_buf_map_attachment(import_attach,
>DMA_BIDIRECTIONAL);
>+  st = dma_buf_map_attachment_unlocked(import_attach,
>DMA_BIDIRECTIONAL);
>   if (IS_ERR(st)) {
>   err = PTR_ERR(st);
>   goto out_detach;
>@@ -226,7 +226,7 @@ static int igt_dmabuf_import_same_driver(struct
>drm_i915_private *i915,
>   timeout = -ETIME;
>   }
>   err = timeout > 0 ? 0 : timeout;
>-  dma_buf_unmap_attachment(import_attach, st,
>DMA_BIDIRECTIONAL);
>+  dma_buf_unmap_attachment_unlocked(import_attach, st,
>DMA_BIDIRECTIONAL);
> out_detach:
>   dma_buf_detach(dmabuf, import_attach);
> out_import:
>@@ -296,7 +296,7 @@ static int igt_dmabuf_import(void *arg)
>   goto out_obj;
>   }
>
>-  err = dma_buf_vmap(dmabuf, &map);
>+  err = dma_buf_vmap_unlocked(dmabuf, &map);
>   dma_map = err ? NULL : map.vaddr;
>   if (!dma_map) {
>   pr_err("dma_buf_vmap failed\n");
>@@ -337,7 +337,7 @@ static int igt_dmabuf_import(void *arg)
>
>   err = 0;
> out_dma_map:
>-  dma_buf_vunmap(dmabuf, &map);
>+  dma_buf_vunmap_unloc

Re: [PATCH v5 1/1] drm/amdgpu: Fix amdgpu_vm_pt_free warning

2022-09-15 Thread Philip Yang



On 2022-09-15 02:21, Christian König wrote:

Am 14.09.22 um 19:45 schrieb Felix Kuehling:

Am 2022-09-14 um 12:08 schrieb Philip Yang:

Freeing page table BOs from a vm resv unlocked context generates the
warning below.

Add a pt_free_work in vm to free page table BOs from the vm->pt_freed list.
Pass the vm resv unlock status from the page table update caller, and add the
vm_bo entry to the vm->pt_freed list and schedule the pt_free_work if called
with vm resv unlocked.

WARNING: CPU: 12 PID: 3238 at
drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0
Call Trace:
  amdgpu_vm_pt_free+0x42/0xd0 [amdgpu]
  amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu]
  amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu]
  amdgpu_vm_update_range+0x2a6/0x640 [amdgpu]
  svm_range_unmap_from_gpus+0x110/0x300 [amdgpu]
  svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu]
  __mmu_notifier_invalidate_range_start+0x1cd/0x230
  unmap_vmas+0x9d/0x140
  unmap_region+0xa8/0x110

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  5 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  6 
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 41 
+--

  3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 59cac347baa3..20cfc8c9635b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2022,6 +2022,9 @@ int amdgpu_vm_init(struct amdgpu_device *adev, 
struct amdgpu_vm *vm)

  spin_lock_init(&vm->invalidated_lock);
  INIT_LIST_HEAD(&vm->freed);
  INIT_LIST_HEAD(&vm->done);
+    INIT_LIST_HEAD(&vm->pt_freed);
+    INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
+    spin_lock_init(&vm->pt_free_lock);
    /* create scheduler entities for page table updates */
  r = drm_sched_entity_init(&vm->immediate, 
DRM_SCHED_PRIORITY_NORMAL,
@@ -2223,6 +2226,8 @@ void amdgpu_vm_fini(struct amdgpu_device 
*adev, struct amdgpu_vm *vm)

    amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
  +    flush_work(&vm->pt_free_work);
+
  root = amdgpu_bo_ref(vm->root.bo);
  amdgpu_bo_reserve(root, true);
  amdgpu_vm_set_pasid(adev, vm, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 9ecb7f663e19..b77fe838c327 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -276,6 +276,11 @@ struct amdgpu_vm {
  /* BOs which are invalidated, has been updated in the PTs */
  struct list_head    done;
  +    /* PT BOs scheduled to free and fill with zero if vm_resv is 
not hold */

+    struct list_head    pt_freed;
+    struct work_struct    pt_free_work;
+    spinlock_t    pt_free_lock;
+
  /* contains the page directory */
  struct amdgpu_vm_bo_base root;
  struct dma_fence    *last_update;
@@ -471,6 +476,7 @@ int amdgpu_vm_pde_update(struct 
amdgpu_vm_update_params *params,

  int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
    uint64_t start, uint64_t end,
    uint64_t dst, uint64_t flags);
+void amdgpu_vm_pt_free_work(struct work_struct *work);
    #if defined(CONFIG_DEBUG_FS)
  void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct 
seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c

index 88de9f0d4728..c16579471f22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -641,6 +641,27 @@ static void amdgpu_vm_pt_free(struct 
amdgpu_vm_bo_base *entry)

  amdgpu_bo_unref(&entry->bo);
  }
  +void amdgpu_vm_pt_free_work(struct work_struct *work)
+{
+    struct amdgpu_vm_bo_base *entry, *next;
+    struct amdgpu_vm *vm;
+    LIST_HEAD(pt_freed);
+
+    vm = container_of(work, struct amdgpu_vm, pt_free_work);
+
+    spin_lock(&vm->pt_free_lock);
+    list_splice_init(&vm->pt_freed, &pt_freed);
+    spin_unlock(&vm->pt_free_lock);
+
+    // flush_work in amdgpu_vm_fini ensure vm->root.bo is valid
+    amdgpu_bo_reserve(vm->root.bo, true);
+
+    list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
+    amdgpu_vm_pt_free(entry);
+
+    amdgpu_bo_unreserve(vm->root.bo);
+}
+
  /**
   * amdgpu_vm_pt_free_dfs - free PD/PT levels
   *
@@ -652,11 +673,24 @@ static void amdgpu_vm_pt_free(struct 
amdgpu_vm_bo_base *entry)

   */
  static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
    struct amdgpu_vm *vm,
-  struct amdgpu_vm_pt_cursor *start)
+  struct amdgpu_vm_pt_cursor *start,
+  bool unlocked)
  {
  struct amdgpu_vm_pt_cursor cursor;
  struct amdgpu_vm_bo_base *entry;
  +    if (unlocked) {
+    spin_lock(&vm->pt_free_lock);
+    for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, entry)
+    list_move(&entry->vm_status, &vm->pt_freed);
+
+    if (start)
+    list_move(

Re: [PATCH v5 1/1] drm/amdgpu: Fix amdgpu_vm_pt_free warning

2022-09-15 Thread Christian König

Am 15.09.22 um 16:39 schrieb Philip Yang:


On 2022-09-15 02:21, Christian König wrote:

Am 14.09.22 um 19:45 schrieb Felix Kuehling:

Am 2022-09-14 um 12:08 schrieb Philip Yang:

Freeing page table BOs from a vm resv unlocked context generates the
warning below.

Add a pt_free_work in vm to free page table BOs from the vm->pt_freed list.
Pass the vm resv unlock status from the page table update caller, and add the
vm_bo entry to the vm->pt_freed list and schedule the pt_free_work if called
with vm resv unlocked.

WARNING: CPU: 12 PID: 3238 at
drivers/gpu/drm/ttm/ttm_bo.c:106 ttm_bo_set_bulk_move+0xa1/0xc0
Call Trace:
  amdgpu_vm_pt_free+0x42/0xd0 [amdgpu]
  amdgpu_vm_pt_free_dfs+0xb3/0xf0 [amdgpu]
  amdgpu_vm_ptes_update+0x52d/0x850 [amdgpu]
  amdgpu_vm_update_range+0x2a6/0x640 [amdgpu]
  svm_range_unmap_from_gpus+0x110/0x300 [amdgpu]
  svm_range_cpu_invalidate_pagetables+0x535/0x600 [amdgpu]
  __mmu_notifier_invalidate_range_start+0x1cd/0x230
  unmap_vmas+0x9d/0x140
  unmap_region+0xa8/0x110

Signed-off-by: Philip Yang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    |  5 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h    |  6 
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 41 
+--

  3 files changed, 49 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c

index 59cac347baa3..20cfc8c9635b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2022,6 +2022,9 @@ int amdgpu_vm_init(struct amdgpu_device 
*adev, struct amdgpu_vm *vm)

  spin_lock_init(&vm->invalidated_lock);
  INIT_LIST_HEAD(&vm->freed);
  INIT_LIST_HEAD(&vm->done);
+    INIT_LIST_HEAD(&vm->pt_freed);
+    INIT_WORK(&vm->pt_free_work, amdgpu_vm_pt_free_work);
+    spin_lock_init(&vm->pt_free_lock);
    /* create scheduler entities for page table updates */
  r = drm_sched_entity_init(&vm->immediate, 
DRM_SCHED_PRIORITY_NORMAL,
@@ -2223,6 +2226,8 @@ void amdgpu_vm_fini(struct amdgpu_device 
*adev, struct amdgpu_vm *vm)

    amdgpu_amdkfd_gpuvm_destroy_cb(adev, vm);
  +    flush_work(&vm->pt_free_work);
+
  root = amdgpu_bo_ref(vm->root.bo);
  amdgpu_bo_reserve(root, true);
  amdgpu_vm_set_pasid(adev, vm, 0);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h

index 9ecb7f663e19..b77fe838c327 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -276,6 +276,11 @@ struct amdgpu_vm {
  /* BOs which are invalidated, has been updated in the PTs */
  struct list_head    done;
  +    /* PT BOs scheduled to free and fill with zero if vm_resv is 
not hold */

+    struct list_head    pt_freed;
+    struct work_struct    pt_free_work;
+    spinlock_t    pt_free_lock;
+
  /* contains the page directory */
  struct amdgpu_vm_bo_base root;
  struct dma_fence    *last_update;
@@ -471,6 +476,7 @@ int amdgpu_vm_pde_update(struct 
amdgpu_vm_update_params *params,

  int amdgpu_vm_ptes_update(struct amdgpu_vm_update_params *params,
    uint64_t start, uint64_t end,
    uint64_t dst, uint64_t flags);
+void amdgpu_vm_pt_free_work(struct work_struct *work);
    #if defined(CONFIG_DEBUG_FS)
  void amdgpu_debugfs_vm_bo_info(struct amdgpu_vm *vm, struct 
seq_file *m);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c

index 88de9f0d4728..c16579471f22 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -641,6 +641,27 @@ static void amdgpu_vm_pt_free(struct 
amdgpu_vm_bo_base *entry)

  amdgpu_bo_unref(&entry->bo);
  }
  +void amdgpu_vm_pt_free_work(struct work_struct *work)
+{
+    struct amdgpu_vm_bo_base *entry, *next;
+    struct amdgpu_vm *vm;
+    LIST_HEAD(pt_freed);
+
+    vm = container_of(work, struct amdgpu_vm, pt_free_work);
+
+    spin_lock(&vm->pt_free_lock);
+    list_splice_init(&vm->pt_freed, &pt_freed);
+    spin_unlock(&vm->pt_free_lock);
+
+    // flush_work in amdgpu_vm_fini ensure vm->root.bo is valid
+    amdgpu_bo_reserve(vm->root.bo, true);
+
+    list_for_each_entry_safe(entry, next, &pt_freed, vm_status)
+    amdgpu_vm_pt_free(entry);
+
+    amdgpu_bo_unreserve(vm->root.bo);
+}
+
  /**
   * amdgpu_vm_pt_free_dfs - free PD/PT levels
   *
@@ -652,11 +673,24 @@ static void amdgpu_vm_pt_free(struct 
amdgpu_vm_bo_base *entry)

   */
  static void amdgpu_vm_pt_free_dfs(struct amdgpu_device *adev,
    struct amdgpu_vm *vm,
-  struct amdgpu_vm_pt_cursor *start)
+  struct amdgpu_vm_pt_cursor *start,
+  bool unlocked)
  {
  struct amdgpu_vm_pt_cursor cursor;
  struct amdgpu_vm_bo_base *entry;
  +    if (unlocked) {
+    spin_lock(&vm->pt_free_lock);
+    for_each_amdgpu_vm_pt_dfs_safe(adev, vm, start, cursor, 
entry)

+    list_move(&entry->vm_status, &vm->pt_freed);

[PATCH 0/4] support query rlcv/rlcp firmware version

2022-09-15 Thread Hawking Zhang
To allow querying rlcv/rlcp firmware version info

Hawking Zhang (4):
  drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx
  drm/amdgpu: support print rlc v2_x ucode hdr
  drm/amdgpu: add two new subquery ids
  drm/amdgpu: add rlcv/rlcp version info to debugfs

 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  24 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 168 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h |   4 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   5 +
 include/uapi/drm/amdgpu_drm.h |   4 +
 6 files changed, 159 insertions(+), 50 deletions(-)

-- 
2.17.1



[PATCH 1/4] drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx

2022-09-15 Thread Hawking Zhang
Cache rlcv/rlcp ucode version info in the amdgpu_gfx
structure

Signed-off-by: Hawking Zhang 
Reviewed-by: Likun Gao 
Reviewed-by: Feifei Xu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   | 4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h | 4 
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c| 5 +
 3 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 027e993ff45a..d60888f35d1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -314,6 +314,10 @@ struct amdgpu_gfx {
uint32_trlc_srlg_feature_version;
uint32_trlc_srls_fw_version;
uint32_trlc_srls_feature_version;
+   uint32_trlcp_ucode_version;
+   uint32_trlcp_ucode_feature_version;
+   uint32_trlcv_ucode_version;
+   uint32_trlcv_ucode_feature_version;
uint32_tmec_feature_version;
uint32_tmec2_feature_version;
boolmec_fw_write_wait;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
index 96b6cf4c4d54..59edf32f775e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h
@@ -260,8 +260,12 @@ struct rlc_firmware_header_v2_2 {
 /* version_major=2, version_minor=3 */
 struct rlc_firmware_header_v2_3 {
 struct rlc_firmware_header_v2_2 v2_2;
+uint32_t rlcp_ucode_version;
+uint32_t rlcp_ucode_feature_version;
 uint32_t rlcp_ucode_size_bytes;
 uint32_t rlcp_ucode_offset_bytes;
+uint32_t rlcv_ucode_version;
+uint32_t rlcv_ucode_feature_version;
 uint32_t rlcv_ucode_size_bytes;
 uint32_t rlcv_ucode_offset_bytes;
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index fa718318568e..ce8c792cef1a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -475,8 +475,13 @@ static void gfx_v11_0_init_rlcp_rlcv_microcode(struct 
amdgpu_device *adev)
const struct rlc_firmware_header_v2_3 *rlc_hdr;
 
rlc_hdr = (const struct rlc_firmware_header_v2_3 
*)adev->gfx.rlc_fw->data;
+   adev->gfx.rlcp_ucode_version = le32_to_cpu(rlc_hdr->rlcp_ucode_version);
+   adev->gfx.rlcp_ucode_feature_version = 
le32_to_cpu(rlc_hdr->rlcp_ucode_feature_version);
adev->gfx.rlc.rlcp_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlcp_ucode_size_bytes);
adev->gfx.rlc.rlcp_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlcp_ucode_offset_bytes);
+
+   adev->gfx.rlcv_ucode_version = le32_to_cpu(rlc_hdr->rlcv_ucode_version);
+   adev->gfx.rlcv_ucode_feature_version = 
le32_to_cpu(rlc_hdr->rlcv_ucode_feature_version);
adev->gfx.rlc.rlcv_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlcv_ucode_size_bytes);
adev->gfx.rlc.rlcv_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlcv_ucode_offset_bytes);
 }
-- 
2.17.1



[PATCH 2/4] drm/amdgpu: support print rlc v2_x ucode hdr

2022-09-15 Thread Hawking Zhang
add rlc v2_x support to print_rlc_hdr helper

Signed-off-by: Hawking Zhang 
Reviewed-by: Likun Gao 
Reviewed-by: Feifei Xu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 168 +++---
 1 file changed, 118 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
index 939c8614f0e3..dd0bc649a57d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
@@ -164,70 +164,138 @@ void amdgpu_ucode_print_rlc_hdr(const struct 
common_firmware_header *hdr)
} else if (version_major == 2) {
const struct rlc_firmware_header_v2_0 *rlc_hdr =
container_of(hdr, struct rlc_firmware_header_v2_0, 
header);
+   const struct rlc_firmware_header_v2_1 *rlc_hdr_v2_1 =
+   container_of(rlc_hdr, struct rlc_firmware_header_v2_1, 
v2_0);
+   const struct rlc_firmware_header_v2_2 *rlc_hdr_v2_2 =
+   container_of(rlc_hdr_v2_1, struct 
rlc_firmware_header_v2_2, v2_1);
+   const struct rlc_firmware_header_v2_3 *rlc_hdr_v2_3 =
+   container_of(rlc_hdr_v2_2, struct 
rlc_firmware_header_v2_3, v2_2);
+   const struct rlc_firmware_header_v2_4 *rlc_hdr_v2_4 =
+   container_of(rlc_hdr_v2_3, struct 
rlc_firmware_header_v2_4, v2_3);
 
-   DRM_DEBUG("ucode_feature_version: %u\n",
- le32_to_cpu(rlc_hdr->ucode_feature_version));
-   DRM_DEBUG("jt_offset: %u\n", le32_to_cpu(rlc_hdr->jt_offset));
-   DRM_DEBUG("jt_size: %u\n", le32_to_cpu(rlc_hdr->jt_size));
-   DRM_DEBUG("save_and_restore_offset: %u\n",
- le32_to_cpu(rlc_hdr->save_and_restore_offset));
-   DRM_DEBUG("clear_state_descriptor_offset: %u\n",
- le32_to_cpu(rlc_hdr->clear_state_descriptor_offset));
-   DRM_DEBUG("avail_scratch_ram_locations: %u\n",
- le32_to_cpu(rlc_hdr->avail_scratch_ram_locations));
-   DRM_DEBUG("reg_restore_list_size: %u\n",
- le32_to_cpu(rlc_hdr->reg_restore_list_size));
-   DRM_DEBUG("reg_list_format_start: %u\n",
- le32_to_cpu(rlc_hdr->reg_list_format_start));
-   DRM_DEBUG("reg_list_format_separate_start: %u\n",
- le32_to_cpu(rlc_hdr->reg_list_format_separate_start));
-   DRM_DEBUG("starting_offsets_start: %u\n",
- le32_to_cpu(rlc_hdr->starting_offsets_start));
-   DRM_DEBUG("reg_list_format_size_bytes: %u\n",
- le32_to_cpu(rlc_hdr->reg_list_format_size_bytes));
-   DRM_DEBUG("reg_list_format_array_offset_bytes: %u\n",
- 
le32_to_cpu(rlc_hdr->reg_list_format_array_offset_bytes));
-   DRM_DEBUG("reg_list_size_bytes: %u\n",
- le32_to_cpu(rlc_hdr->reg_list_size_bytes));
-   DRM_DEBUG("reg_list_array_offset_bytes: %u\n",
- le32_to_cpu(rlc_hdr->reg_list_array_offset_bytes));
-   DRM_DEBUG("reg_list_format_separate_size_bytes: %u\n",
- 
le32_to_cpu(rlc_hdr->reg_list_format_separate_size_bytes));
-   DRM_DEBUG("reg_list_format_separate_array_offset_bytes: %u\n",
- 
le32_to_cpu(rlc_hdr->reg_list_format_separate_array_offset_bytes));
-   DRM_DEBUG("reg_list_separate_size_bytes: %u\n",
- le32_to_cpu(rlc_hdr->reg_list_separate_size_bytes));
-   DRM_DEBUG("reg_list_separate_array_offset_bytes: %u\n",
- 
le32_to_cpu(rlc_hdr->reg_list_separate_array_offset_bytes));
-   if (version_minor == 1) {
-   const struct rlc_firmware_header_v2_1 *v2_1 =
-   container_of(rlc_hdr, struct 
rlc_firmware_header_v2_1, v2_0);
+   switch (version_minor) {
+   case 0:
+   /* rlc_hdr v2_0 */
+   DRM_DEBUG("ucode_feature_version: %u\n",
+ le32_to_cpu(rlc_hdr->ucode_feature_version));
+   DRM_DEBUG("jt_offset: %u\n", 
le32_to_cpu(rlc_hdr->jt_offset));
+   DRM_DEBUG("jt_size: %u\n", 
le32_to_cpu(rlc_hdr->jt_size));
+   DRM_DEBUG("save_and_restore_offset: %u\n",
+ 
le32_to_cpu(rlc_hdr->save_and_restore_offset));
+   DRM_DEBUG("clear_state_descriptor_offset: %u\n",
+ 
le32_to_cpu(rlc_hdr->clear_state_descriptor_offset));
+   DRM_DEBUG("avail_scratch_ram_locations: %u\n",
+ 
le32_to_cpu(rlc_hdr->avail_scratch_ram_locations));
+   DRM_DEBUG("reg_r

[PATCH 4/4] drm/amdgpu: add rlcv/rlcp version info to debugfs

2022-09-15 Thread Hawking Zhang
amdgpu_firmware_info debugfs will show rlcv/rlcp
ucode version info

Signed-off-by: Hawking Zhang 
Reviewed-by: Likun Gao 
Reviewed-by: Feifei Xu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 4c022785ded1..49d5c43ae9b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -253,6 +253,14 @@ static int amdgpu_firmware_info(struct 
drm_amdgpu_info_firmware *fw_info,
fw_info->ver = adev->gfx.rlc_srls_fw_version;
fw_info->feature = adev->gfx.rlc_srls_feature_version;
break;
+   case AMDGPU_INFO_FW_GFX_RLCP:
+   fw_info->ver = adev->gfx.rlcp_ucode_version;
+   fw_info->feature = adev->gfx.rlcp_ucode_feature_version;
+   break;
+   case AMDGPU_INFO_FW_GFX_RLCV:
+   fw_info->ver = adev->gfx.rlcv_ucode_version;
+   fw_info->feature = adev->gfx.rlcv_ucode_feature_version;
+   break;
case AMDGPU_INFO_FW_GFX_MEC:
if (query_fw->index == 0) {
fw_info->ver = adev->gfx.mec_fw_version;
@@ -1514,6 +1522,22 @@ static int amdgpu_debugfs_firmware_info_show(struct 
seq_file *m, void *unused)
seq_printf(m, "RLC SRLS feature version: %u, firmware version: 
0x%08x\n",
   fw_info.feature, fw_info.ver);
 
+   /* RLCP */
+   query_fw.fw_type = AMDGPU_INFO_FW_GFX_RLCP;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "RLCP feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
+   /* RLCV */
+   query_fw.fw_type = AMDGPU_INFO_FW_GFX_RLCV;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "RLCV feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
/* MEC */
query_fw.fw_type = AMDGPU_INFO_FW_GFX_MEC;
query_fw.index = 0;
-- 
2.17.1



[PATCH 3/4] drm/amdgpu: add two new subquery ids

2022-09-15 Thread Hawking Zhang
To support querying rlcp and rlcv firmware versions from the
existing AMDGPU_INFO_FW_VERSION interface
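
For context, once these ids land, a userspace query could look roughly like
this (a sketch assuming libdrm's amdgpu_query_firmware_version() wrapper;
error handling omitted):

	#include <stdint.h>
	#include <stdio.h>
	#include <amdgpu.h>	/* libdrm, typically under /usr/include/libdrm */
	#include <amdgpu_drm.h>

	static void print_rlcp_fw(amdgpu_device_handle dev)
	{
		uint32_t ver = 0, feature = 0;

		/* fw_type takes the AMDGPU_INFO_FW_* subquery id */
		amdgpu_query_firmware_version(dev, AMDGPU_INFO_FW_GFX_RLCP,
					      0, 0, &ver, &feature);
		printf("RLCP firmware 0x%08x, feature %u\n", ver, feature);
	}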

Signed-off-by: Hawking Zhang 
Reviewed-by: Likun Gao 
Reviewed-by: Feifei Xu 
---
 include/uapi/drm/amdgpu_drm.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index f01498fe55fa..61a89bc27983 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -816,6 +816,10 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_FW_TOC  0x15
/* Subquery id: Query CAP firmware version */
#define AMDGPU_INFO_FW_CAP  0x16
+   /* Subquery id: Query GFX RLCP firmware version */
+   #define AMDGPU_INFO_FW_GFX_RLCP 0x17
+   /* Subquery id: Query GFX RLCV firmware version */
+   #define AMDGPU_INFO_FW_GFX_RLCV 0x18
 
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED0x0f
-- 
2.17.1



[PATCH 00/10] switch to common helper for rlc fw init

2022-09-15 Thread Hawking Zhang
Replace IP specific implementation with common rlc helper
for rlc firmware initialization.

This also helps reduce duplicated code when introducing new
ip block support.

Hawking Zhang (10):
  drm/amdgpu: add helper to init rlc fw in header v2_0
  drm/amdgpu: add helper to init rlc fw in header v2_1
  drm/amdgpu: add helper to init rlc fw in header v2_2
  drm/amdgpu: add helper to init rlc fw in header v2_3
  drm/amdgpu: add helper to init rlc fw in header v2_4
  drm/amdgpu: add helper to init rlc firmware
  drm/amdgpu/gfx8: switch to amdgpu_gfx_rlc_init_microcode
  drm/amdgpu/gfx9: switch to amdgpu_gfx_rlc_init_microcode
  drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode
  drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode

 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 264 
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h |   4 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 191 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 156 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  60 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 106 +-
 6 files changed, 289 insertions(+), 492 deletions(-)

-- 
2.17.1



[PATCH 02/10] drm/amdgpu: add helper to init rlc fw in header v2_1

2022-09-15 Thread Hawking Zhang
To initialize rlc firmware in header v2_1

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 40 +
 1 file changed, 40 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 10b998cae1ba..04bdb885918d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -336,3 +336,43 @@ static int amdgpu_gfx_rlc_init_microcode_v2_0(struct 
amdgpu_device *adev)
 
return 0;
 }
+
+static void amdgpu_gfx_rlc_init_microcode_v2_1(struct amdgpu_device *adev)
+{
+   const struct rlc_firmware_header_v2_1 *rlc_hdr;
+   struct amdgpu_firmware_info *info;
+
+   rlc_hdr = (const struct rlc_firmware_header_v2_1 
*)adev->gfx.rlc_fw->data;
+   adev->gfx.rlc_srlc_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_ucode_ver);
+   adev->gfx.rlc_srlc_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_feature_ver);
+   adev->gfx.rlc.save_restore_list_cntl_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_size_bytes);
+   adev->gfx.rlc.save_restore_list_cntl = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_offset_bytes);
+   adev->gfx.rlc_srlg_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_ucode_ver);
+   adev->gfx.rlc_srlg_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_feature_ver);
+   adev->gfx.rlc.save_restore_list_gpm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_size_bytes);
+   adev->gfx.rlc.save_restore_list_gpm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_offset_bytes);
+   adev->gfx.rlc_srls_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_ucode_ver);
+   adev->gfx.rlc_srls_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_feature_ver);
+   adev->gfx.rlc.save_restore_list_srm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_size_bytes);
+   adev->gfx.rlc.save_restore_list_srm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_srm_offset_bytes);
+   adev->gfx.rlc.reg_list_format_direct_reg_list_length =
+   le32_to_cpu(rlc_hdr->reg_list_format_direct_reg_list_length);
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   if (adev->gfx.rlc.save_restore_list_gpm_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM];
+   info->ucode_id = 
AMDGPU_UCODE_ID_RLC_RESTORE_LIST_GPM_MEM;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.save_restore_list_gpm_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.save_restore_list_srm_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM];
+   info->ucode_id = 
AMDGPU_UCODE_ID_RLC_RESTORE_LIST_SRM_MEM;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.save_restore_list_srm_size_bytes), PAGE_SIZE);
+   }
+   }
+}
-- 
2.17.1
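All of these v2_x helpers follow the same pattern: the rlc firmware blob starts
with a versioned header, and every ucode section is located by an offset/size
pair stored in that header. A minimal sketch of the pattern, with made-up struct
and field names (the real layouts live in the rlc_firmware_header_v2_x structs):

/* Illustrative only - not the real header layout. */
struct example_rlc_section_desc {
	__le32 section_offset_bytes;	/* offset from the header start */
	__le32 section_size_bytes;	/* section size in bytes */
};

static void example_locate_section(const void *fw_data,
				   const u8 **section, u32 *size)
{
	const struct example_rlc_section_desc *hdr = fw_data;

	/* pointer into the firmware blob: header base + little-endian offset */
	*section = (const u8 *)fw_data + le32_to_cpu(hdr->section_offset_bytes);
	*size = le32_to_cpu(hdr->section_size_bytes);
}

When PSP front-door loading is used (AMDGPU_FW_LOAD_PSP), each located section
is additionally registered in adev->firmware.ucode[] and its page-aligned size
is added to adev->firmware.fw_size, which is what the if-blocks at the end of
each helper do.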



[PATCH 03/10] drm/amdgpu: add helper to init rlc fw in header v2_2

2022-09-15 Thread Hawking Zhang
To initialize rlc firmware in header v2_2

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 30 +
 1 file changed, 30 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 04bdb885918d..a055818d87f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -376,3 +376,33 @@ static void amdgpu_gfx_rlc_init_microcode_v2_1(struct 
amdgpu_device *adev)
}
}
 }
+
+static void amdgpu_gfx_rlc_init_microcode_v2_2(struct amdgpu_device *adev)
+{
+   const struct rlc_firmware_header_v2_2 *rlc_hdr;
+   struct amdgpu_firmware_info *info;
+
+   rlc_hdr = (const struct rlc_firmware_header_v2_2 
*)adev->gfx.rlc_fw->data;
+   adev->gfx.rlc.rlc_iram_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlc_iram_ucode_size_bytes);
+   adev->gfx.rlc.rlc_iram_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlc_iram_ucode_offset_bytes);
+   adev->gfx.rlc.rlc_dram_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlc_dram_ucode_size_bytes);
+   adev->gfx.rlc.rlc_dram_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlc_dram_ucode_offset_bytes);
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   if (adev->gfx.rlc.rlc_iram_ucode_size_bytes) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_IRAM];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_IRAM;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.rlc_iram_ucode_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.rlc_dram_ucode_size_bytes) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_DRAM];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_DRAM;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.rlc_dram_ucode_size_bytes), PAGE_SIZE);
+   }
+   }
+}
-- 
2.17.1



[PATCH 01/10] drm/amdgpu: add helper to init rlc fw in header v2_0

2022-09-15 Thread Hawking Zhang
To initialize rlc firmware in header v2_0

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 64 +
 1 file changed, 64 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 6373bfb47d55..10b998cae1ba 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -272,3 +272,67 @@ void amdgpu_gfx_rlc_fini(struct amdgpu_device *adev)
  &adev->gfx.rlc.cp_table_gpu_addr,
  (void **)&adev->gfx.rlc.cp_table_ptr);
 }
+
+static int amdgpu_gfx_rlc_init_microcode_v2_0(struct amdgpu_device *adev)
+{
+   const struct common_firmware_header *common_hdr;
+   const struct rlc_firmware_header_v2_0 *rlc_hdr;
+   struct amdgpu_firmware_info *info;
+   unsigned int *tmp;
+   unsigned int i;
+
+   rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
+
+   adev->gfx.rlc_fw_version = le32_to_cpu(rlc_hdr->header.ucode_version);
+   adev->gfx.rlc_feature_version = 
le32_to_cpu(rlc_hdr->ucode_feature_version);
+   adev->gfx.rlc.save_and_restore_offset =
+   le32_to_cpu(rlc_hdr->save_and_restore_offset);
+   adev->gfx.rlc.clear_state_descriptor_offset =
+   le32_to_cpu(rlc_hdr->clear_state_descriptor_offset);
+   adev->gfx.rlc.avail_scratch_ram_locations =
+   le32_to_cpu(rlc_hdr->avail_scratch_ram_locations);
+   adev->gfx.rlc.reg_restore_list_size =
+   le32_to_cpu(rlc_hdr->reg_restore_list_size);
+   adev->gfx.rlc.reg_list_format_start =
+   le32_to_cpu(rlc_hdr->reg_list_format_start);
+   adev->gfx.rlc.reg_list_format_separate_start =
+   le32_to_cpu(rlc_hdr->reg_list_format_separate_start);
+   adev->gfx.rlc.starting_offsets_start =
+   le32_to_cpu(rlc_hdr->starting_offsets_start);
+   adev->gfx.rlc.reg_list_format_size_bytes =
+   le32_to_cpu(rlc_hdr->reg_list_format_size_bytes);
+   adev->gfx.rlc.reg_list_size_bytes =
+   le32_to_cpu(rlc_hdr->reg_list_size_bytes);
+   adev->gfx.rlc.register_list_format =
+   kmalloc(le32_to_cpu(adev->gfx.rlc.reg_list_format_size_bytes) +
+   le32_to_cpu(adev->gfx.rlc.reg_list_size_bytes), 
GFP_KERNEL);
+   if (!adev->gfx.rlc.register_list_format) {
+   dev_err(adev->dev, "failed to allocate memory for rlc 
register_list_format\n");
+   return -ENOMEM;
+   }
+
+   tmp = (unsigned int *)((uintptr_t)rlc_hdr +
+   
le32_to_cpu(rlc_hdr->reg_list_format_array_offset_bytes));
+   for (i = 0 ; i < (rlc_hdr->reg_list_format_size_bytes >> 2); i++)
+   adev->gfx.rlc.register_list_format[i] = le32_to_cpu(tmp[i]);
+
+   adev->gfx.rlc.register_restore = adev->gfx.rlc.register_list_format + i;
+
+   tmp = (unsigned int *)((uintptr_t)rlc_hdr +
+   le32_to_cpu(rlc_hdr->reg_list_array_offset_bytes));
+   for (i = 0 ; i < (rlc_hdr->reg_list_size_bytes >> 2); i++)
+   adev->gfx.rlc.register_restore[i] = le32_to_cpu(tmp[i]);
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_G];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_G;
+   info->fw = adev->gfx.rlc_fw;
+   if (info->fw) {
+   common_hdr = (const struct common_firmware_header 
*)info->fw->data;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(common_hdr->ucode_size_bytes), PAGE_SIZE);
+   }
+   }
+
+   return 0;
+}
-- 
2.17.1
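One detail worth noting in the v2_0 helper above: register_list_format and
register_restore share a single allocation. The buffer is sized for both arrays,
the format entries are copied in first, and register_restore points just past
them. A rough sketch of that layout (sizes are byte counts taken from the rlc
header; this is an illustration, not the driver code):

/* assumes <linux/slab.h>; format_size and list_size come from the rlc header */
static u32 *example_alloc_register_lists(u32 format_size, u32 list_size,
					 u32 **register_restore)
{
	u32 *register_list_format;

	/* one buffer holds both arrays back to back */
	register_list_format = kmalloc(format_size + list_size, GFP_KERNEL);
	if (!register_list_format)
		return NULL;

	/* the restore array starts right after the format entries */
	*register_restore = register_list_format + (format_size >> 2);

	return register_list_format;
}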



[PATCH 04/10] drm/amdgpu: add helper to init rlc fw in header v2_3

2022-09-15 Thread Hawking Zhang
To initialize rlc firmware in header v2_3

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 35 +
 1 file changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index a055818d87f7..adbb9bed86fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -406,3 +406,38 @@ static void amdgpu_gfx_rlc_init_microcode_v2_2(struct 
amdgpu_device *adev)
}
}
 }
+
+static void amdgpu_gfx_rlc_init_microcode_v2_3(struct amdgpu_device *adev)
+{
+   const struct rlc_firmware_header_v2_3 *rlc_hdr;
+   struct amdgpu_firmware_info *info;
+
+   rlc_hdr = (const struct rlc_firmware_header_v2_3 
*)adev->gfx.rlc_fw->data;
+   adev->gfx.rlcp_ucode_version = le32_to_cpu(rlc_hdr->rlcp_ucode_version);
+   adev->gfx.rlcp_ucode_feature_version = 
le32_to_cpu(rlc_hdr->rlcp_ucode_feature_version);
+   adev->gfx.rlc.rlcp_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlcp_ucode_size_bytes);
+   adev->gfx.rlc.rlcp_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlcp_ucode_offset_bytes);
+
+   adev->gfx.rlcv_ucode_version = le32_to_cpu(rlc_hdr->rlcv_ucode_version);
+   adev->gfx.rlcv_ucode_feature_version = 
le32_to_cpu(rlc_hdr->rlcv_ucode_feature_version);
+   adev->gfx.rlc.rlcv_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlcv_ucode_size_bytes);
+   adev->gfx.rlc.rlcv_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlcv_ucode_offset_bytes);
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   if (adev->gfx.rlc.rlcp_ucode_size_bytes) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_P];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_P;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.rlcp_ucode_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.rlcv_ucode_size_bytes) {
+   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_V];
+   info->ucode_id = AMDGPU_UCODE_ID_RLC_V;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.rlcv_ucode_size_bytes), PAGE_SIZE);
+   }
+   }
+}
-- 
2.17.1



[PATCH 05/10] drm/amdgpu: add helper to init rlc fw in header v2_4

2022-09-15 Thread Hawking Zhang
To initialize rlc firmware in header v2_4

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 60 +
 1 file changed, 60 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index adbb9bed86fc..34b3290de746 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -441,3 +441,63 @@ static void amdgpu_gfx_rlc_init_microcode_v2_3(struct 
amdgpu_device *adev)
}
}
 }
+
+static void amdgpu_gfx_rlc_init_microcode_v2_4(struct amdgpu_device *adev)
+{
+   const struct rlc_firmware_header_v2_4 *rlc_hdr;
+   struct amdgpu_firmware_info *info;
+
+   rlc_hdr = (const struct rlc_firmware_header_v2_4 
*)adev->gfx.rlc_fw->data;
+   adev->gfx.rlc.global_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->global_tap_delays_ucode_size_bytes);
+   adev->gfx.rlc.global_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->global_tap_delays_ucode_offset_bytes);
+   adev->gfx.rlc.se0_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se0_tap_delays_ucode_size_bytes);
+   adev->gfx.rlc.se0_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se0_tap_delays_ucode_offset_bytes);
+   adev->gfx.rlc.se1_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se1_tap_delays_ucode_size_bytes);
+   adev->gfx.rlc.se1_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se1_tap_delays_ucode_offset_bytes);
+   adev->gfx.rlc.se2_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se2_tap_delays_ucode_size_bytes);
+   adev->gfx.rlc.se2_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se2_tap_delays_ucode_offset_bytes);
+   adev->gfx.rlc.se3_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se3_tap_delays_ucode_size_bytes);
+   adev->gfx.rlc.se3_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se3_tap_delays_ucode_offset_bytes);
+
+   if (adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) {
+   if (adev->gfx.rlc.global_tap_delays_ucode_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_GLOBAL_TAP_DELAYS];
+   info->ucode_id = AMDGPU_UCODE_ID_GLOBAL_TAP_DELAYS;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.global_tap_delays_ucode_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.se0_tap_delays_ucode_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_SE0_TAP_DELAYS];
+   info->ucode_id = AMDGPU_UCODE_ID_SE0_TAP_DELAYS;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.se0_tap_delays_ucode_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.se1_tap_delays_ucode_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_SE1_TAP_DELAYS];
+   info->ucode_id = AMDGPU_UCODE_ID_SE1_TAP_DELAYS;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.se1_tap_delays_ucode_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.se2_tap_delays_ucode_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_SE2_TAP_DELAYS];
+   info->ucode_id = AMDGPU_UCODE_ID_SE2_TAP_DELAYS;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.se2_tap_delays_ucode_size_bytes), PAGE_SIZE);
+   }
+
+   if (adev->gfx.rlc.se3_tap_delays_ucode_size_bytes) {
+   info = 
&adev->firmware.ucode[AMDGPU_UCODE_ID_SE3_TAP_DELAYS];
+   info->ucode_id = AMDGPU_UCODE_ID_SE3_TAP_DELAYS;
+   info->fw = adev->gfx.rlc_fw;
+   adev->firmware.fw_size +=
+   
ALIGN(le32_to_cpu(adev->gfx.rlc.se3_tap_delays_ucode_size_bytes), PAGE_SIZE);
+   }
+   }
+}
-- 
2.17.1



[PATCH 06/10] drm/amdgpu: add helper to init rlc firmware

2022-09-15 Thread Hawking Zhang
To initialize rlc firmware according to the rlc
firmware header version

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 35 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h |  4 ++-
 2 files changed, 38 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
index 34b3290de746..8b115003986a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c
@@ -501,3 +501,38 @@ static void amdgpu_gfx_rlc_init_microcode_v2_4(struct 
amdgpu_device *adev)
}
}
 }
+
+int amdgpu_gfx_rlc_init_microcode(struct amdgpu_device *adev,
+ uint16_t version_major,
+ uint16_t version_minor)
+{
+   int err;
+
+   if (version_major < 2) {
+   /* only support rlc_hdr v2.x and onwards */
+   dev_err(adev->dev, "unsupported rlc fw hdr\n");
+   return -EINVAL;
+   }
+
+   /* is_rlc_v2_1 is still used in APU code path */
+   if (version_major == 2 && version_minor == 1)
+   adev->gfx.rlc.is_rlc_v2_1 = true;
+
+   if (version_minor >= 0) {
+   err = amdgpu_gfx_rlc_init_microcode_v2_0(adev);
+   if (err) {
+   dev_err(adev->dev, "fail to init rlc v2_0 microcode\n");
+   return err;
+   }
+   }
+   if (version_minor >= 1)
+   amdgpu_gfx_rlc_init_microcode_v2_1(adev);
+   if (version_minor >= 2)
+   amdgpu_gfx_rlc_init_microcode_v2_2(adev);
+   if (version_minor >= 3)
+   amdgpu_gfx_rlc_init_microcode_v2_3(adev);
+   if (version_minor >= 4)
+   amdgpu_gfx_rlc_init_microcode_v2_4(adev);
+
+   return 0;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
index 03ac36b2c2cf..23f060db9255 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h
@@ -267,5 +267,7 @@ int amdgpu_gfx_rlc_init_csb(struct amdgpu_device *adev);
 int amdgpu_gfx_rlc_init_cpt(struct amdgpu_device *adev);
 void amdgpu_gfx_rlc_setup_cp_table(struct amdgpu_device *adev);
 void amdgpu_gfx_rlc_fini(struct amdgpu_device *adev);
-
+int amdgpu_gfx_rlc_init_microcode(struct amdgpu_device *adev,
+ uint16_t version_major,
+ uint16_t version_minor);
 #endif
-- 
2.17.1
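With this helper in place, a per-IP init_microcode path only needs to read the
header version out of the rlc blob and hand it over; each v2_x stage is applied
cumulatively based on the minor version. A rough sketch of the call site - this
is essentially what the gfx8/9/10/11 patches in this series add:

/* adev with an already requested rlc_fw is assumed */
static int example_init_rlc_from_header(struct amdgpu_device *adev)
{
	const struct rlc_firmware_header_v2_0 *rlc_hdr;
	uint16_t version_major, version_minor;
	int err;

	err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
	if (err)
		return err;

	rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data;
	version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
	version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);

	/* v2_0 is always parsed; v2_1..v2_4 sections are layered on top
	 * depending on the minor version.
	 */
	return amdgpu_gfx_rlc_init_microcode(adev, version_major, version_minor);
}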



[PATCH 07/10] drm/amdgpu/gfx8: switch to amdgpu_gfx_rlc_init_microcode

2022-09-15 Thread Hawking Zhang
switch to common helper to initialize rlc firmware
for gfx8

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 60 +--
 1 file changed, 11 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 606d0c873d71..09c63e2d4637 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -951,7 +951,7 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device 
*adev)
const struct common_firmware_header *header = NULL;
const struct gfx_firmware_header_v1_0 *cp_hdr;
const struct rlc_firmware_header_v2_0 *rlc_hdr;
-   unsigned int *tmp = NULL, i;
+   uint16_t version_major, version_minor;
 
DRM_DEBUG("\n");
 
@@ -1064,49 +1064,18 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device 
*adev)
if (err)
goto out;
err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
-   rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc_fw_version = le32_to_cpu(rlc_hdr->header.ucode_version);
-   adev->gfx.rlc_feature_version = 
le32_to_cpu(rlc_hdr->ucode_feature_version);
-
-   adev->gfx.rlc.save_and_restore_offset =
-   le32_to_cpu(rlc_hdr->save_and_restore_offset);
-   adev->gfx.rlc.clear_state_descriptor_offset =
-   le32_to_cpu(rlc_hdr->clear_state_descriptor_offset);
-   adev->gfx.rlc.avail_scratch_ram_locations =
-   le32_to_cpu(rlc_hdr->avail_scratch_ram_locations);
-   adev->gfx.rlc.reg_restore_list_size =
-   le32_to_cpu(rlc_hdr->reg_restore_list_size);
-   adev->gfx.rlc.reg_list_format_start =
-   le32_to_cpu(rlc_hdr->reg_list_format_start);
-   adev->gfx.rlc.reg_list_format_separate_start =
-   le32_to_cpu(rlc_hdr->reg_list_format_separate_start);
-   adev->gfx.rlc.starting_offsets_start =
-   le32_to_cpu(rlc_hdr->starting_offsets_start);
-   adev->gfx.rlc.reg_list_format_size_bytes =
-   le32_to_cpu(rlc_hdr->reg_list_format_size_bytes);
-   adev->gfx.rlc.reg_list_size_bytes =
-   le32_to_cpu(rlc_hdr->reg_list_size_bytes);
-
-   adev->gfx.rlc.register_list_format =
-   kmalloc(adev->gfx.rlc.reg_list_format_size_bytes +
-   adev->gfx.rlc.reg_list_size_bytes, 
GFP_KERNEL);
-
-   if (!adev->gfx.rlc.register_list_format) {
-   err = -ENOMEM;
+   if (err)
goto out;
+   rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
+   if (!rlc_hdr) {
+   dev_err(adev->dev, "invalid rlc fw data\n");
+   return -EINVAL;
}
-
-   tmp = (unsigned int *)((uintptr_t)rlc_hdr +
-   
le32_to_cpu(rlc_hdr->reg_list_format_array_offset_bytes));
-   for (i = 0 ; i < (adev->gfx.rlc.reg_list_format_size_bytes >> 2); i++)
-   adev->gfx.rlc.register_list_format[i] = le32_to_cpu(tmp[i]);
-
-   adev->gfx.rlc.register_restore = adev->gfx.rlc.register_list_format + i;
-
-   tmp = (unsigned int *)((uintptr_t)rlc_hdr +
-   le32_to_cpu(rlc_hdr->reg_list_array_offset_bytes));
-   for (i = 0 ; i < (adev->gfx.rlc.reg_list_size_bytes >> 2); i++)
-   adev->gfx.rlc.register_restore[i] = le32_to_cpu(tmp[i]);
+   version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
+   version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
+   err = amdgpu_gfx_rlc_init_microcode(adev, version_major, version_minor);
+   if (err)
+   goto out;
 
if (adev->asic_type >= CHIP_POLARIS10 && adev->asic_type <= 
CHIP_POLARIS12) {
snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_mec_2.bin", 
chip_name);
@@ -1178,13 +1147,6 @@ static int gfx_v8_0_init_microcode(struct amdgpu_device 
*adev)
adev->firmware.fw_size +=
ALIGN(le32_to_cpu(header->ucode_size_bytes), PAGE_SIZE);
 
-   info = &adev->firmware.ucode[AMDGPU_UCODE_ID_RLC_G];
-   info->ucode_id = AMDGPU_UCODE_ID_RLC_G;
-   info->fw = adev->gfx.rlc_fw;
-   header = (const struct common_firmware_header *)info->fw->data;
-   adev->firmware.fw_size +=
-   ALIGN(le32_to_cpu(header->ucode_size_bytes), PAGE_SIZE);
-
info = &adev->firmware.ucode[AMDGPU_UCODE_ID_CP_MEC1];
info->ucode_id = AMDGPU_UCODE_ID_CP_MEC1;
info->fw = adev->gfx.mec_fw;
-- 
2.17.1



[PATCH 08/10] drm/amdgpu/gfx9: switch to amdgpu_gfx_rlc_init_microcode

2022-09-15 Thread Hawking Zhang
switch to common helper to initialize rlc firmware
for gfx9

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 106 +-
 1 file changed, 3 insertions(+), 103 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 253ffa4c62be..07c24380edf7 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1091,27 +1091,6 @@ static void gfx_v9_0_free_microcode(struct amdgpu_device 
*adev)
kfree(adev->gfx.rlc.register_list_format);
 }
 
-static void gfx_v9_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_1 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_1 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc_srlc_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_ucode_ver);
-   adev->gfx.rlc_srlc_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_feature_ver);
-   adev->gfx.rlc.save_restore_list_cntl_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_size_bytes);
-   adev->gfx.rlc.save_restore_list_cntl = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_offset_bytes);
-   adev->gfx.rlc_srlg_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_ucode_ver);
-   adev->gfx.rlc_srlg_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_feature_ver);
-   adev->gfx.rlc.save_restore_list_gpm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_size_bytes);
-   adev->gfx.rlc.save_restore_list_gpm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_offset_bytes);
-   adev->gfx.rlc_srls_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_ucode_ver);
-   adev->gfx.rlc_srls_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_feature_ver);
-   adev->gfx.rlc.save_restore_list_srm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_size_bytes);
-   adev->gfx.rlc.save_restore_list_srm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_srm_offset_bytes);
-   adev->gfx.rlc.reg_list_format_direct_reg_list_length =
-   
le32_to_cpu(rlc_hdr->reg_list_format_direct_reg_list_length);
-}
-
 static void gfx_v9_0_check_fw_write_wait(struct amdgpu_device *adev)
 {
adev->gfx.me_fw_write_wait = false;
@@ -1353,11 +1332,7 @@ static int gfx_v9_0_init_rlc_microcode(struct 
amdgpu_device *adev,
 {
char fw_name[30];
int err;
-   struct amdgpu_firmware_info *info = NULL;
-   const struct common_firmware_header *header = NULL;
const struct rlc_firmware_header_v2_0 *rlc_hdr;
-   unsigned int *tmp = NULL;
-   unsigned int i = 0;
uint16_t version_major;
uint16_t version_minor;
uint32_t smu_version;
@@ -1386,88 +1361,13 @@ static int gfx_v9_0_init_rlc_microcode(struct 
amdgpu_device *adev,
if (err)
goto out;
err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
+   if (err)
+   goto out;
rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
 
version_major = le16_to_cpu(rlc_hdr->header.header_version_major);
version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
-   if (version_major == 2 && version_minor == 1)
-   adev->gfx.rlc.is_rlc_v2_1 = true;
-
-   adev->gfx.rlc_fw_version = le32_to_cpu(rlc_hdr->header.ucode_version);
-   adev->gfx.rlc_feature_version = 
le32_to_cpu(rlc_hdr->ucode_feature_version);
-   adev->gfx.rlc.save_and_restore_offset =
-   le32_to_cpu(rlc_hdr->save_and_restore_offset);
-   adev->gfx.rlc.clear_state_descriptor_offset =
-   le32_to_cpu(rlc_hdr->clear_state_descriptor_offset);
-   adev->gfx.rlc.avail_scratch_ram_locations =
-   le32_to_cpu(rlc_hdr->avail_scratch_ram_locations);
-   adev->gfx.rlc.reg_restore_list_size =
-   le32_to_cpu(rlc_hdr->reg_restore_list_size);
-   adev->gfx.rlc.reg_list_format_start =
-   le32_to_cpu(rlc_hdr->reg_list_format_start);
-   adev->gfx.rlc.reg_list_format_separate_start =
-   le32_to_cpu(rlc_hdr->reg_list_format_separate_start);
-   adev->gfx.rlc.starting_offsets_start =
-   le32_to_cpu(rlc_hdr->starting_offsets_start);
-   adev->gfx.rlc.reg_list_format_size_bytes =
-   le32_to_cpu(rlc_hdr->reg_list_format_size_bytes);
-   adev->gfx.rlc.reg_list_size_bytes =
-   le32_to_cpu(rlc_hdr->reg_list_size_bytes);
-   adev->gfx.rlc.register_list_format =
-   kmalloc(adev->gfx.rlc.reg_list_format_size_bytes +
-   adev->gfx.rlc.reg_list_size_bytes, GFP_KERNEL);
-   if (!adev->gfx.rlc.register_list_format) {
-   err = -ENOMEM;
-   goto out;
-   }
-
- 

[PATCH 09/10] drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode

2022-09-15 Thread Hawking Zhang
switch to common helper to initialize rlc firmware
for gfx10

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 191 +
 1 file changed, 4 insertions(+), 187 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
index 26ec04fd313b..423b1b6d31b6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
@@ -3943,56 +3943,6 @@ static void gfx_v10_0_check_fw_write_wait(struct 
amdgpu_device *adev)
DRM_WARN_ONCE("CP firmware version too old, please update!");
 }
 
-
-static void gfx_v10_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_1 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_1 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc_srlc_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_ucode_ver);
-   adev->gfx.rlc_srlc_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_feature_ver);
-   adev->gfx.rlc.save_restore_list_cntl_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_size_bytes);
-   adev->gfx.rlc.save_restore_list_cntl = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_offset_bytes);
-   adev->gfx.rlc_srlg_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_ucode_ver);
-   adev->gfx.rlc_srlg_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_feature_ver);
-   adev->gfx.rlc.save_restore_list_gpm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_size_bytes);
-   adev->gfx.rlc.save_restore_list_gpm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_offset_bytes);
-   adev->gfx.rlc_srls_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_ucode_ver);
-   adev->gfx.rlc_srls_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_feature_ver);
-   adev->gfx.rlc.save_restore_list_srm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_size_bytes);
-   adev->gfx.rlc.save_restore_list_srm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_srm_offset_bytes);
-   adev->gfx.rlc.reg_list_format_direct_reg_list_length =
-   
le32_to_cpu(rlc_hdr->reg_list_format_direct_reg_list_length);
-}
-
-static void gfx_v10_0_init_rlc_iram_dram_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_2 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_2 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc.rlc_iram_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlc_iram_ucode_size_bytes);
-   adev->gfx.rlc.rlc_iram_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlc_iram_ucode_offset_bytes);
-   adev->gfx.rlc.rlc_dram_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlc_dram_ucode_size_bytes);
-   adev->gfx.rlc.rlc_dram_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlc_dram_ucode_offset_bytes);
-}
-
-static void gfx_v10_0_init_tap_delays_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_4 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_4 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc.global_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->global_tap_delays_ucode_size_bytes);
-   adev->gfx.rlc.global_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->global_tap_delays_ucode_offset_bytes);
-   adev->gfx.rlc.se0_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se0_tap_delays_ucode_size_bytes);
-   adev->gfx.rlc.se0_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se0_tap_delays_ucode_offset_bytes);
-   adev->gfx.rlc.se1_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se1_tap_delays_ucode_size_bytes);
-   adev->gfx.rlc.se1_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se1_tap_delays_ucode_offset_bytes);
-   adev->gfx.rlc.se2_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se2_tap_delays_ucode_size_bytes);
-   adev->gfx.rlc.se2_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se2_tap_delays_ucode_offset_bytes);
-   adev->gfx.rlc.se3_tap_delays_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->se3_tap_delays_ucode_size_bytes);
-   adev->gfx.rlc.se3_tap_delays_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->se3_tap_delays_ucode_offset_bytes);
-}
-
 static bool gfx_v10_0_navi10_gfxoff_should_enable(struct amdgpu_device *adev)
 {
bool ret = false;
@@ -4032,8 +3982,6 @@ static int gfx_v10_0_init_microcode(struct amdgpu_device 
*adev)
const struct common_firmware_header *header = NULL;
const struct gfx_firmware_header_v1_0 *cp_hdr;
const struct rlc_firmware_header_v2_0 *rlc_hdr;
-   unsigned int *tmp = NULL;
-   unsigned int i = 0;
uint16_t version_major;
uint16_t version_minor;
 
@@ -4123,59 +4071,14 @@ static int gfx_v10_0_init_microcode(struct 
amdgpu_device *adev)
if (err)
goto out;
  

[PATCH 10/10] drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode

2022-09-15 Thread Hawking Zhang
switch to common helper to initialize rlc firmware
for gfx11

Signed-off-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 156 +
 1 file changed, 4 insertions(+), 152 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index ce8c792cef1a..565d7492a43c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -438,54 +438,6 @@ static void gfx_v11_0_free_microcode(struct amdgpu_device 
*adev)
kfree(adev->gfx.rlc.register_list_format);
 }
 
-static void gfx_v11_0_init_rlc_ext_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_1 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_1 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc_srlc_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_ucode_ver);
-   adev->gfx.rlc_srlc_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_feature_ver);
-   adev->gfx.rlc.save_restore_list_cntl_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_size_bytes);
-   adev->gfx.rlc.save_restore_list_cntl = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_cntl_offset_bytes);
-   adev->gfx.rlc_srlg_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_ucode_ver);
-   adev->gfx.rlc_srlg_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_feature_ver);
-   adev->gfx.rlc.save_restore_list_gpm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_size_bytes);
-   adev->gfx.rlc.save_restore_list_gpm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_gpm_offset_bytes);
-   adev->gfx.rlc_srls_fw_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_ucode_ver);
-   adev->gfx.rlc_srls_feature_version = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_feature_ver);
-   adev->gfx.rlc.save_restore_list_srm_size_bytes = 
le32_to_cpu(rlc_hdr->save_restore_list_srm_size_bytes);
-   adev->gfx.rlc.save_restore_list_srm = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->save_restore_list_srm_offset_bytes);
-   adev->gfx.rlc.reg_list_format_direct_reg_list_length =
-   
le32_to_cpu(rlc_hdr->reg_list_format_direct_reg_list_length);
-}
-
-static void gfx_v11_0_init_rlc_iram_dram_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_2 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_2 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlc.rlc_iram_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlc_iram_ucode_size_bytes);
-   adev->gfx.rlc.rlc_iram_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlc_iram_ucode_offset_bytes);
-   adev->gfx.rlc.rlc_dram_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlc_dram_ucode_size_bytes);
-   adev->gfx.rlc.rlc_dram_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlc_dram_ucode_offset_bytes);
-}
-
-static void gfx_v11_0_init_rlcp_rlcv_microcode(struct amdgpu_device *adev)
-{
-   const struct rlc_firmware_header_v2_3 *rlc_hdr;
-
-   rlc_hdr = (const struct rlc_firmware_header_v2_3 
*)adev->gfx.rlc_fw->data;
-   adev->gfx.rlcp_ucode_version = le32_to_cpu(rlc_hdr->rlcp_ucode_version);
-   adev->gfx.rlcp_ucode_feature_version = 
le32_to_cpu(rlc_hdr->rlcp_ucode_feature_version);
-   adev->gfx.rlc.rlcp_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlcp_ucode_size_bytes);
-   adev->gfx.rlc.rlcp_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlcp_ucode_offset_bytes);
-
-   adev->gfx.rlcv_ucode_version = le32_to_cpu(rlc_hdr->rlcv_ucode_version);
-   adev->gfx.rlcv_ucode_feature_version = 
le32_to_cpu(rlc_hdr->rlcv_ucode_feature_version);
-   adev->gfx.rlc.rlcv_ucode_size_bytes = 
le32_to_cpu(rlc_hdr->rlcv_ucode_size_bytes);
-   adev->gfx.rlc.rlcv_ucode = (u8 *)rlc_hdr + 
le32_to_cpu(rlc_hdr->rlcv_ucode_offset_bytes);
-}
-
 static int gfx_v11_0_init_microcode(struct amdgpu_device *adev)
 {
char fw_name[40];
@@ -496,8 +448,6 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device 
*adev)
const struct gfx_firmware_header_v1_0 *cp_hdr;
const struct gfx_firmware_header_v2_0 *cp_hdr_v2_0;
const struct rlc_firmware_header_v2_0 *rlc_hdr;
-   unsigned int *tmp = NULL;
-   unsigned int i = 0;
uint16_t version_major;
uint16_t version_minor;
 
@@ -552,58 +502,14 @@ static int gfx_v11_0_init_microcode(struct amdgpu_device 
*adev)
if (err)
goto out;
err = amdgpu_ucode_validate(adev->gfx.rlc_fw);
+   if (err)
+   goto out;
rlc_hdr = (const struct rlc_firmware_header_v2_0 
*)adev->gfx.rlc_fw->data;
version_major = 
le16_to_cpu(rlc_hdr->header.header_version_major);
version_minor = 
le16_to_cpu(rlc_hdr->header.header_version_minor);
-
-   adev->gfx.rlc_fw_version = 
le32_to_cpu(rlc_hdr->header.ucode_version);
-   adev->

Re: [PATCH 00/10] switch to common helper for rlc fw init

2022-09-15 Thread Christian König
Nice cleanup, feel free to add an Acked-by: Christian König 
 to the whole series.


Regards,
Christian.

Am 15.09.22 um 19:01 schrieb Hawking Zhang:

Replace IP specific implementation with common rlc helper
for rlc firmware initialization.

This also helps reduce duplicated code when introducing new
ip block support.

Hawking Zhang (10):
   drm/amdgpu: add helper to init rlc fw in header v2_0
   drm/amdgpu: add helper to init rlc fw in header v2_1
   drm/amdgpu: add helper to init rlc fw in header v2_2
   drm/amdgpu: add helper to init rlc fw in header v2_3
   drm/amdgpu: add helper to init rlc fw in header v2_4
   drm/amdgpu: add helper to init rlc firmware
   drm/amdgpu/gfx8: switch to amdgpu_gfx_rlc_init_microcode
   drm/amdgpu/gfx9: switch to amdgpu_gfx_rlc_init_microcode
   drm/amdgpu/gfx10: switch to amdgpu_gfx_rlc_init_microcode
   drm/amdgpu/gfx11: switch to amdgpu_gfx_rlc_init_microcode

  drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.c | 264 
  drivers/gpu/drm/amd/amdgpu/amdgpu_rlc.h |   4 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c  | 191 +
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c  | 156 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   |  60 +-
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 106 +-
  6 files changed, 289 insertions(+), 492 deletions(-)





Re: [PATCH] drivers/pci: Decrease VF memory BAR size to Save host memory occupied by PTEs:

2022-09-15 Thread Alex Deucher
On Thu, Sep 15, 2022 at 6:34 AM Rui Ma  wrote:
>
> The VRAM space is fixed, so as the number of VFs increases, the actual BAR
> memory space used by each VF decreases. However, the BAR memory mapping is
> always based on the initial size of the VRAM. Not mapping this unneeded
> memory reduces the number of PTEs and saves host memory.
>
> Signed-off-by: Rui Ma 
> ---
>  drivers/pci/iov.c| 11 +--
>  drivers/pci/pci.h| 15 +++
>  drivers/pci/quirks.c | 38 ++

Please use scripts/get_maintainer.pl to get the appropriate lists to
send this patch to for review.

Alex

>  3 files changed, 62 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> index 952217572113..d623f46669b6 100644
> --- a/drivers/pci/iov.c
> +++ b/drivers/pci/iov.c
> @@ -295,6 +295,11 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> struct resource *res;
> struct pci_sriov *iov = dev->sriov;
> struct pci_bus *bus;
> +/*
> + * Some SRIOV device's Bar is too large and occupy too much rmap size.
> + * Resize the request resource of VF.
> + */
> +   u16 shift = 1;
>
> bus = virtfn_add_bus(dev->bus, pci_iov_virtfn_bus(dev, id));
> if (!bus)
> @@ -328,8 +333,10 @@ int pci_iov_add_virtfn(struct pci_dev *dev, int id)
> virtfn->resource[i].name = pci_name(virtfn);
> virtfn->resource[i].flags = res->flags;
> size = pci_iov_resource_size(dev, i + PCI_IOV_RESOURCES);
> +   shift = 1;
> +   shift = virtfn_get_shift(dev, iov->num_VFs, i);
> virtfn->resource[i].start = res->start + size * id;
> -   virtfn->resource[i].end = virtfn->resource[i].start + size - 
> 1;
> +   virtfn->resource[i].end = virtfn->resource[i].start + (size 
> >> (shift - 1)) - 1;
> rc = request_resource(res, &virtfn->resource[i]);
> BUG_ON(rc);
> }
> @@ -680,12 +687,12 @@ static int sriov_enable(struct pci_dev *dev, int 
> nr_virtfn)
> msleep(100);
> pci_cfg_access_unlock(dev);
>
> +   iov->num_VFs = nr_virtfn;
> rc = sriov_add_vfs(dev, initial);
> if (rc)
> goto err_pcibios;
>
> kobject_uevent(&dev->dev.kobj, KOBJ_CHANGE);
> -   iov->num_VFs = nr_virtfn;
>
> return 0;
>
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index 3d60cabde1a1..befc67a280eb 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -603,6 +603,21 @@ static inline int pci_dev_specific_reset(struct pci_dev 
> *dev, bool probe)
>  }
>  #endif
>
> +struct virtfn_get_shift_methods {
> +   u16 vendor;
> +   u16 device;
> +   u16 (*get_shift)(struct pci_dev *dev, u16 arg, int arg2);
> +};
> +
> +#ifdef CONFIG_PCI_QUIRKS
> +u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2);
> +#else
> +static inline u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2)
> +{
> +   return (u16)1;
> +}
> +#endif
> +
>  #if defined(CONFIG_PCI_QUIRKS) && defined(CONFIG_ARM64)
>  int acpi_get_rc_resources(struct device *dev, const char *hid, u16 segment,
>   struct resource *res);
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index da829274fc66..82502c5923e4 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -4085,6 +4085,44 @@ int pci_dev_specific_reset(struct pci_dev *dev, bool 
> probe)
> return -ENOTTY;
>  }
>
> +static u16 divided_by_VF(struct pci_dev *dev, u16 num_VFs, int bar_num)
> +{
> +   u16 shift = 1;
> +
> +   if (bar_num == 0) {
> +   while ((1 << shift) <= num_VFs)
> +   shift += 1;
> +   }
> +   pci_info(dev, "Bar %d get shift: %d.\n", bar_num, shift);
> +   return shift;
> +}
> +
> +static const struct virtfn_get_shift_methods virtfn_get_shift_methods[] = {
> +   { PCI_VENDOR_ID_ATI, 0x73a1, divided_by_VF},
> +   { 0 }
> +};
> +
> +/*
> + * Get shift num to calculate SRIOV device bar.
> + * Sometimes the bar size for SRIOV device is too large
> + * and we want to calculate the size to define the end
> + * of virtfn.
> + */
> +u16 virtfn_get_shift(struct pci_dev *dev, u16 arg1, int arg2)
> +{
> +   const struct virtfn_get_shift_methods *i;
> +
> +   for (i = virtfn_get_shift_methods; i->get_shift; i++) {
> +   if ((i->vendor == dev->vendor ||
> +i->vendor == (u16)PCI_ANY_ID) &&
> +   (i->device == dev->device ||
> +i->device == (u16)PCI_ANY_ID))
> +   return i->get_shift(dev, arg1, arg2);
> +   }
> +
> +   return (u16)1;
> +}
> +
>  static void quirk_dma_func0_alias(struct pci_dev *dev)
>  {
> if (PCI_FUNC(dev->devfn) != 0)
> --
> 2.25.1
>
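To make the shift arithmetic in divided_by_VF() above concrete: for BAR 0 the
shift is the smallest value with (1 << shift) > num_VFs, and the VF resource is
then mapped at size >> (shift - 1). A small standalone illustration (plain
userspace C; the 256 MiB starting size is an assumption, the real value comes
from pci_iov_resource_size()):

#include <stdio.h>

/* mirrors divided_by_VF() for bar 0 */
static unsigned int vf_shift(unsigned int num_VFs)
{
	unsigned int shift = 1;

	while ((1u << shift) <= num_VFs)
		shift++;
	return shift;
}

int main(void)
{
	unsigned long long size = 256ULL << 20;	/* assumed original per-VF BAR */
	unsigned int counts[] = { 1, 2, 4, 8, 16 };

	for (unsigned int i = 0; i < sizeof(counts) / sizeof(counts[0]); i++) {
		unsigned int shift = vf_shift(counts[i]);

		printf("num_VFs=%2u shift=%u mapped=%llu MiB\n",
		       counts[i], shift, (size >> (shift - 1)) >> 20);
	}
	return 0;
}

For power-of-two VF counts the mapped window shrinks to size/num_VFs, which is
what reduces the PTEs the host has to set up; other BARs keep shift = 1 and are
still mapped at full size.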


Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Andrey Grodzovsky



On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence executes a compute engine hang while running a lot of
containers submitting gfx jobs. We have advanced tdr mode and mode2 reset
enabled on the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx pending list
may be signaled after drm_sched_stop. So they will not be removed from the
pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be rerun and
removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be resubmitted.
Since it still has the signaled bit, drm_sched_job_done will be called directly.
This decreases the hw_rq_count, which allows more jobs to be emitted, but does
not clean the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we use num_fences_mask
in amdgpu_fence_process, when the overflow happens the signal of some job will
be skipped, which results in an infinite wait for the fence_drv rcu ptr.

So closing the irq before sched_stop could avoid signaling jobs after
drm_sched_stop. And signaling jobs one by one in fence_process instead of using
a mask will handle the overflow situation.

Another fix could be to skip submitting jobs which already signaled during the
resubmit stage, which may look cleaner.

Please help give some advice.
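To illustrate the overflow described above: the fence slots form a power-of-two
ring indexed by seq & num_fences_mask, and processing only compares masked
sequence numbers. A minimal userspace sketch of how a full ring's worth of
unprocessed emissions can look like "nothing to do" (names and the ring size
are assumptions, not the driver's actual structures):

#include <stdio.h>

#define NUM_FENCES 8			/* power of two; MASK mirrors num_fences_mask */
#define MASK (NUM_FENCES - 1)

int main(void)
{
	unsigned int emitted = 0;	/* fences written to the ring */
	unsigned int last_seq = 0;	/* last processed seqno */

	/* exactly one ring's worth of fences gets emitted while processing
	 * is blocked (e.g. while the scheduler is stopped)
	 */
	emitted += NUM_FENCES;

	/* a masked comparison makes a full wrap look like "nothing new",
	 * so every one of those fences is skipped
	 */
	if ((emitted & MASK) == (last_seq & MASK))
		printf("%u fences emitted, but processing sees no work\n",
		       emitted - last_seq);
	return 0;
}

Once a skipped slot's rcu pointer is never cleared, the next fence emit that
maps to the same slot waits on it forever, which is the deadlock being described.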



How about the code below instead? The real problem is that we reuse a
dma fence twice, which is not according to dma fence design, so maybe
this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+
    /* TO be inline with external fence creation and other 
drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; amd-gfx@lists.freedesktop.org; Grodzovsky, 
Andrey 
Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the sequence. Please 
help give some comments.


Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), jobs from 
another ring (e.g. gfx) may continue signaling during drm_sched_stop stage. The 
signal bit will not be cleared.

At the resubmit stage after recovery, the job with the hw fence signaled bit set
will call job done directly instead of going through fence process.
This makes the hw_rq_count decrease but leaves the rcu fence pointer uncleared.

Then an overflow happens in the fence driver slots and some jobs may be skipped,
leaving the rcu pointer not cleared, which makes an infinite wait for the slot
on the next fence emitted.

This infinite wait causes a job timeout on the emitting job. And the driver will
get stuck at its sched stop step because kthread_park cannot be done.

[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
   before drm sched stop
2. handle all fences in fence process to avoid skipping when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  6 +-
   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct amdgpu_device 
*adev,
amdgpu_virt_fini_data_exchange(adev);
}
   
-	amdgpu_fence_driver_isr_toggle(adev, true);

-
/* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 +5212,8 
@@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
  amdgpu_device_ip_need_full_reset(tmp_adev))
 

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Andrey Grodzovsky

Had a typo - see below

On 2022-09-15 14:29, Andrey Grodzovsky wrote:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running a 
lot of containers submitting gfx jobs. We have advanced tdr mode and 
mode2 reset enabled on driver.
When a compute hang job timeout happens, the 2 jobs on the gfx 
pending list maybe signaled after drm_sched_stop. So they will not be 
removed from pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be 
rerun and removed from pending list.
At the resubmit setp, the second job (with signaled bit) will be 
resubmitted. Since it still has signaled bit, drm_sched_job_done will 
be called directly. This decrease the hw_rq_count which allows more 
jobs emitted but did not clean fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we will use 
num_fences_mask in amdgpu_fence_process, when overflow happens, the 
signal of some job will be skipped which result in an infinite wait 
for the fence_drv rcu ptr.


So close irq before sched_stop could avoid signal jobs after 
drm_sched_stop. And signal job one by one in fence_process instead of 
using a mask will handle the overflow situation.


Another fix could be skip submitting jobs which already signaled 
during resubmit stage, which may look cleaner.


Please help give some advice.



How about the code bellow  instead ? The real problem is that we reuse 
a dma fence twice which is not according to fma fence design, so maybe 
this can help ?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags);
+
    /* TO be inline with external fence creation and other 
drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), jobs 
from another ring (e.g. gfx) may continue signaling during 
drm_sched_stop stage. The signal bit will not be cleared.


At the resubmit stage after recovery, the job with hw fence signaled 
bit set will call job done directly instead go through fence process.
This makes the hw_rq_count decrease but rcu fence pointer not 
cleared yet.


Then overflow happens in the fence driver slots and some jobs may be 
skipped and leave the rcu pointer not cleared which makes an 
infinite wait for the slot on the next fence emitted.


This infinite wait cause a job timeout on the emitting job. And 
driver will stuck at the its sched stop step because kthread_park 
cannot be done.


[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close interrupt
before drm sched stop 2. handle all fences in fence process to aviod
skip when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  6 +-

   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

   amdgpu_virt_fini_data_exchange(adev);
   }
   -    amdgpu_fence_driver_isr_toggle(adev, true);
-
   /* block all schedulers and reset given job's ring */
   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
   struct amdgpu_ring *ring = adev->rings[i]; @@ -5214,6 
+5212,8 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,

  

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Christian König

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running a 
lot of containers submitting gfx jobs. We have advanced tdr mode and 
mode2 reset enabled on driver.
When a compute hang job timeout happens, the 2 jobs on the gfx 
pending list maybe signaled after drm_sched_stop. So they will not be 
removed from pending list but have the DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be 
rerun and removed from pending list.
At the resubmit setp, the second job (with signaled bit) will be 
resubmitted. Since it still has signaled bit, drm_sched_job_done will 
be called directly. This decrease the hw_rq_count which allows more 
jobs emitted but did not clean fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we will use 
num_fences_mask in amdgpu_fence_process, when overflow happens, the 
signal of some job will be skipped which result in an infinite wait 
for the fence_drv rcu ptr.


So close irq before sched_stop could avoid signal jobs after 
drm_sched_stop. And signal job one by one in fence_process instead of 
using a mask will handle the overflow situation.


Another fix could be skip submitting jobs which already signaled 
during resubmit stage, which may look cleaner.


Please help give some advice.



How about the code bellow  instead ? The real problem is that we reuse 
a dma fence twice which is not according to fma fence design, so maybe 
this can help ?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the singled bit */
+   celar_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a massive 
no-go.


Christian.



    /* TO be inline with external fence creation and other 
drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), jobs 
from another ring (e.g. gfx) may continue signaling during 
drm_sched_stop stage. The signal bit will not be cleared.


At the resubmit stage after recovery, the job with hw fence signaled 
bit set will call job done directly instead go through fence process.
This makes the hw_rq_count decrease but rcu fence pointer not 
cleared yet.


Then overflow happens in the fence driver slots and some jobs may be 
skipped and leave the rcu pointer not cleared which makes an 
infinite wait for the slot on the next fence emitted.


This infinite wait cause a job timeout on the emitting job. And 
driver will stuck at the its sched stop step because kthread_park 
cannot be done.


[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close interrupt
before drm sched stop 2. handle all fences in fence process to aviod
skip when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---  
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c |  6 +-

   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

   amdgpu_virt_fini_data_exchange(adev);
   }
   -    amdgpu_fence_driver_isr_toggle(adev, true);
-
   /* block all schedulers and reset given job's ring */
   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
   struct

Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Andrey Grodzovsky



On 2022-09-15 15:26, Christian König wrote:

Am 15.09.22 um 20:29 schrieb Andrey Grodzovsky:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running a 
lot of containers submitting gfx jobs. We have advanced tdr mode and 
mode2 reset enabled on driver.
When a compute hang job timeout happens, the 2 jobs on the gfx 
pending list maybe signaled after drm_sched_stop. So they will not 
be removed from pending list but have the 
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will be 
rerun and removed from pending list.
At the resubmit setp, the second job (with signaled bit) will be 
resubmitted. Since it still has signaled bit, drm_sched_job_done 
will be called directly. This decrease the hw_rq_count which allows 
more jobs emitted but did not clean fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since we will use 
num_fences_mask in amdgpu_fence_process, when overflow happens, the 
signal of some job will be skipped which result in an infinite wait 
for the fence_drv rcu ptr.


So close irq before sched_stop could avoid signal jobs after 
drm_sched_stop. And signal job one by one in fence_process instead 
of using a mask will handle the overflow situation.


Another fix could be skip submitting jobs which already signaled 
during resubmit stage, which may look cleaner.


Please help give some advice.



How about the code bellow  instead ? The real problem is that we 
reuse a dma fence twice which is not according to fma fence design, 
so maybe this can help ?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the singled bit */
+   celar_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a massive 
no-go.


Christian.



Is it worse than doing fence->seqno = seq; ? This is already a huge hack,
no?


Andrey






    /* TO be inline with external fence creation and 
other drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



Am 15.09.22 um 06:02 schrieb Zhao, Victor:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress test on the 
sequence. Please help give some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), 
jobs from another ring (e.g. gfx) may continue signaling during 
the drm_sched_stop stage. The signaled bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence 
signaled bit set will call job done directly instead of going through 
the fence process.
This makes the hw_rq_count decrease but leaves the rcu fence pointer 
not cleared yet.


Then an overflow happens in the fence driver slots and some jobs may 
be skipped, leaving the rcu pointer not cleared, which causes an 
infinite wait for the slot on the next fence emitted.


This infinite wait causes a job timeout on the emitting job. And the 
driver will get stuck at its sched stop step because kthread_park 
cannot be done.


[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
before drm sched stop 2. handle all fences in fence process to avoid
skipping when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  6 +-

   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(struct 
amdgpu_device *adev,

   amdgpu_virt_fini_data_exchange(adev);
   }
   -    amdgpu_fence_driver_isr_to

Re: [PATCH V3 46/47] drm/amd/display: Fix failures of disabling primary plans

2022-09-15 Thread Rodrigo Siqueira Jordao

Hi Michel,

First of all, thanks a lot for your review. I want to take this 
opportunity to discuss this topic in more depth and learn more from you 
and others.


+(Nick, Leo, Daniel, Mark, Dave, Sean, Simon)

On 2022-09-15 04:55, Michel Dänzer wrote:

On 2022-09-14 22:08, Alex Hung wrote:

On 2022-09-14 10:55, Michel Dänzer wrote:

On 2022-09-14 18:30, Alex Hung wrote:

On 2022-09-14 07:40, Michel Dänzer wrote:

On 2022-09-14 15:31, Michel Dänzer wrote:

On 2022-09-14 07:10, Wayne Lin wrote:

From: Alex Hung 

[Why & How]
This fixes kernel errors when IGT disables primary planes during the
tests kms_universal_plane::functional_test_pipe/pageflip_test_pipe.


NAK.

This essentially reverts commit b836a274b797 ("drm/amdgpu/dc: Require primary plane 
to be enabled whenever the CRTC is") (except that it goes even further and 
completely removes the requirement for any HW plane to be enabled when the HW cursor is), 
so it would reintroduce the issues described in that commit log.


Actually not exactly the same issues, due to going even further than reverting 
my fix.

Instead, the driver will claim that an atomic commit which enables the CRTC and 
the cursor plane, while leaving all other KMS planes disabled, succeeds. But 
the HW cursor will not actually be visible.


I did not observe problems with cursors (GNOME 42.4 - desktop and youtube/mpv 
video playback: windowed/fullscreen). Are there steps to reproduce cursor 
problems?


As described in my last follow-up e-mail: An atomic KMS commit which enables 
the CRTC and the cursor plane, but disables all other KMS planes for the CRTC. 
The commit will succeed, but will not result in the HW cursor being actually 
visible. (I don't know of any specific application or test which exercises this)
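
For illustration, such a commit can be put together with the libdrm 
atomic API roughly as follows (object and property IDs are assumed to 
have been looked up beforehand, SRC/CRTC coordinate props are omitted; 
this is only a sketch, not an existing test):

	drmModeAtomicReq *req = drmModeAtomicAlloc();

	/* enable the CRTC */
	drmModeAtomicAddProperty(req, crtc_id, crtc_active_prop, 1);
	drmModeAtomicAddProperty(req, crtc_id, crtc_mode_id_prop, mode_blob_id);

	/* attach only the cursor plane to the CRTC */
	drmModeAtomicAddProperty(req, cursor_plane_id, cursor_crtc_id_prop, crtc_id);
	drmModeAtomicAddProperty(req, cursor_plane_id, cursor_fb_id_prop, cursor_fb_id);

	/* leave the primary plane disabled */
	drmModeAtomicAddProperty(req, primary_plane_id, primary_crtc_id_prop, 0);
	drmModeAtomicAddProperty(req, primary_plane_id, primary_fb_id_prop, 0);

	/* the commit succeeds, but no HW cursor becomes visible */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_ALLOW_MODESET, NULL);
	drmModeAtomicFree(req);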


Did you mean cursor plane depends on primary plane (i.e. no primary plane = no 
visible HW cursor)?


Sort of. I understand the HW cursor isn't an actual separate plane in AMD HW. 
Instead, the HW cursor can be displayed as part of any other HW plane. This 
means that the HW cursor can only be visible if any other plane is enabled.


The commit that you mentioned (b836a274b797) was created to address some 
issues reported by the user. Please, correct me if I'm wrong, but the 
original issue could be reproduced by following these steps on Gnome 
with Ellesmere:


1. Lock the screen and wait for suspending;
2. Wake up the system a few minutes later;
3. See two cursors, one that can be used and another one that is not 
working.


I tried to reproduce this issue using an Ellesmere board (+this 
patchset), and Daniel has tested it on multiple ASICs; we never 
reproduced that issue (Gnome and ChromeOS). It is not evident to me why 
we cannot reproduce this problem. Do you have some suggestions? If we 
find a case showing this bug, we can add it as part of our tests.


I feel that the commit b836a274b797 is not directly related to that 
specific bug. I mean, it might make sense to have it, but for other reasons.





If there is no primary plane, in what scenario would it be required to 
draw a cursor?

If this is a rare case, would it still be a concern?


Yes. In the KMS API, the cursor plane is like any other plane. A Wayland compositor or 
other user space may legitimately try to display something (not necessarily a 
"cursor") using only the cursor plane. The driver must accurately signal that 
this cannot work. The established way to do so (if a bit indirectly) is to require the 
primary plane to be enabled whenever the CRTC is.
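
As a sketch of what that requirement looks like in a driver's CRTC 
atomic_check (illustrative only, not the exact amdgpu code):

	static int crtc_atomic_check(struct drm_crtc *crtc,
				     struct drm_atomic_state *state)
	{
		struct drm_crtc_state *crtc_state =
			drm_atomic_get_new_crtc_state(state, crtc);

		/* reject an active CRTC whose primary plane is disabled, so
		 * user space learns that a cursor-only configuration cannot
		 * work on this hardware
		 */
		if (crtc_state->active &&
		    !(crtc_state->plane_mask & drm_plane_mask(crtc->primary)))
			return -EINVAL;

		return 0;
	}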


Is there any real case for this scenario? Is this scenario strong enough 
to say that a driver does not support CRTC enabled without planes?






Also see the commit log of bc92c06525e5 ("drm/amd/display: Allow commits with no 
planes active").


Does it mean dm_crtc_helper_atomic_check does not need to return -EINVAL if 
there is no active cursor because there are no cursors to be shown anyways, 
[...]


This was considered in the review discussion for b836a274b797 ("drm/amdgpu/dc: 
Require primary plane to be enabled whenever the CRTC is"), see 
https://patchwork.freedesktop.org/patch/387230/ .

TL;DR: There were already other KMS drivers requiring the primary plane to be enabled whenever the CRTC is, and there's a special case for that in atomic_remove_fb. 


iirc, this requirement is only available in drm_simple_kms_helper, and 
drivers under the tiny folder are the only ones using it.



Adding another special case for the cursor plane would make things much more 
complicated for common DRM code and user space (and possibly even introduce 
issues which cannot be solved at all).



If IGT tests disable the primary plane while leaving the CRTC enabled, those 
tests are broken and need to be fixed instead.


There are IGT cursor tests fixed by this:

    igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions
    igt@kms_cursor_legacy@flip-vs-cursor@atomic-transitions-varying-size

It also reduces the amdgpu workarounds in IGT's kms_concurrent:
    https

[PATCH] drm/amdgpu: add MES and MES-KIQ version in debugfs

2022-09-15 Thread Yifan Zhang
This patch adds the MES and MES-KIQ versions in debugfs.

Signed-off-by: Yifan Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 24 
 include/uapi/drm/amdgpu_drm.h   |  4 
 2 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 1369c25448dc..bb0ed358909f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -328,6 +328,14 @@ static int amdgpu_firmware_info(struct 
drm_amdgpu_info_firmware *fw_info,
fw_info->ver = adev->psp.cap_fw_version;
fw_info->feature = adev->psp.cap_feature_version;
break;
+   case AMDGPU_INFO_FW_MES_KIQ:
+   fw_info->ver = adev->mes.ucode_fw_version[0];
+   fw_info->feature = 0;
+   break;
+   case AMDGPU_INFO_FW_MES:
+   fw_info->ver = adev->mes.ucode_fw_version[1];
+   fw_info->feature = 0;
+   break;
default:
return -EINVAL;
}
@@ -1581,6 +1589,22 @@ static int amdgpu_debugfs_firmware_info_show(struct 
seq_file *m, void *unused)
fw_info.feature, fw_info.ver);
}
 
+   /* MES_KIQ */
+   query_fw.fw_type = AMDGPU_INFO_FW_MES_KIQ;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "MES_KIQ feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
+   /* MES */
+   query_fw.fw_type = AMDGPU_INFO_FW_MES;
+   ret = amdgpu_firmware_info(&fw_info, &query_fw, adev);
+   if (ret)
+   return ret;
+   seq_printf(m, "MES feature version: %u, firmware version: 0x%08x\n",
+  fw_info.feature, fw_info.ver);
+
seq_printf(m, "VBIOS version: %s\n", ctx->vbios_version);
 
return 0;
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index c2c9c674a223..12fdf62730b8 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -755,6 +755,10 @@ struct drm_amdgpu_cs_chunk_data {
#define AMDGPU_INFO_FW_TOC  0x15
/* Subquery id: Query CAP firmware version */
#define AMDGPU_INFO_FW_CAP  0x16
+   /* Subquery id: Query MES_KIQ firmware version */
+   #define AMDGPU_INFO_FW_MES_KIQ  0x17
+   /* Subquery id: Query MES firmware version */
+   #define AMDGPU_INFO_FW_MES  0x18
 
 /* number of bytes moved for TTM migration */
 #define AMDGPU_INFO_NUM_BYTES_MOVED0x0f
-- 
2.37.3
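
With this patch the two new lines show up in the amdgpu_firmware_info 
debugfs file, and user space can also query them through the info 
ioctl. A minimal sketch (assumes an open render node fd, libdrm 
headers, and the new AMDGPU_INFO_FW_MES define from this patch; error 
handling omitted):

	#include <stdint.h>
	#include <stdio.h>
	#include <xf86drm.h>
	#include <drm/amdgpu_drm.h>

	static void print_mes_fw_version(int fd)
	{
		struct drm_amdgpu_info_firmware fw = {0};
		struct drm_amdgpu_info req = {0};

		req.return_pointer = (uintptr_t)&fw;
		req.return_size = sizeof(fw);
		req.query = AMDGPU_INFO_FW_VERSION;
		req.query_fw.fw_type = AMDGPU_INFO_FW_MES;

		if (drmCommandWrite(fd, DRM_AMDGPU_INFO, &req, sizeof(req)) == 0)
			printf("MES feature %u, firmware 0x%08x\n",
			       fw.feature, fw.ver);
	}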



Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

2022-09-15 Thread Christian König

On 15.09.22 at 22:37, Andrey Grodzovsky wrote:


On 2022-09-15 15:26, Christian König wrote:

On 15.09.22 at 20:29, Andrey Grodzovsky wrote:


On 2022-09-15 06:09, Zhao, Victor wrote:

[AMD Official Use Only - General]

Hi Christian,

The test sequence is executing a compute engine hang while running 
a lot of containers submitting gfx jobs. We have advanced tdr mode 
and mode2 reset enabled in the driver.
When a compute hang job timeout happens, the 2 jobs on the gfx 
pending list may be signaled after drm_sched_stop. So they will not 
be removed from the pending list but will have the 
DMA_FENCE_FLAG_SIGNALED_BIT set.
At the amdgpu_device_recheck_guilty_jobs step, the first job will 
be rerun and removed from the pending list.
At the resubmit step, the second job (with the signaled bit) will be 
resubmitted. Since it still has the signaled bit, drm_sched_job_done 
will be called directly. This decreases the hw_rq_count, which allows 
more jobs to be emitted, but does not clear the fence_drv rcu ptr.
This results in an overflow in the fence_drv. Since num_fences_mask 
is used in amdgpu_fence_process, when the overflow happens, the 
signaling of some jobs is skipped, which results in an infinite wait 
for the fence_drv rcu ptr.


So closing the irq before sched_stop could avoid signaling jobs after 
drm_sched_stop. And signaling jobs one by one in fence_process, instead 
of using a mask, will handle the overflow situation.


Another fix could be to skip submitting jobs which have already signaled 
during the resubmit stage, which may look cleaner.


Please help give some advice.



How about the code below instead? The real problem is that we 
reuse a dma fence twice, which is not in line with the dma fence design, 
so maybe this can help?



diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c

index 8adeb7469f1e..033f0ae16784 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -164,6 +164,10 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, 
struct dma_fence **f, struct amd

    if (job && job->job_run_counter) {
    /* reinit seq for resubmitted jobs */
    fence->seqno = seq;
+
+   /* For resubmitted job clear the signaled bit */
+   clear_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags);
+


Upstream will pretty much kill you for that.

Re-setting a fence from a signaled to an unsignaled state is a 
massive no-go.


Christian.



Is it worse than doing fence->seqno = seq; ? This is already a huge 
hack, no?


No, it's equally bad. I don't think we can do either.

Christian.



Andrey






    /* TO be inline with external fence creation and 
other drivers */

    dma_fence_get(fence);
    } else {


Andrey





Thanks,
Victor



-Original Message-
From: Koenig, Christian 
Sent: Thursday, September 15, 2022 2:32 PM
To: Zhao, Victor ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 


Cc: Deng, Emily 
Subject: Re: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow



On 15.09.22 at 06:02, Zhao, Victor wrote:

[AMD Official Use Only - General]

Ping.

Hi @Koenig, Christian and @Grodzovsky, Andrey,

We found some reset related issues during stress tests on this 
sequence. Please help by giving some comments.



Thanks,
Victor



-Original Message-
From: Victor Zhao 
Sent: Wednesday, September 14, 2022 6:10 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily ; Grodzovsky, Andrey
; Zhao, Victor 
Subject: [PATCH 1/2] drm/amdgpu: fix deadlock caused by overflow

[background]
For a gpu recovery caused by a hang on one ring (e.g. compute), 
jobs from another ring (e.g. gfx) may continue signaling during 
the drm_sched_stop stage. The signaled bit will not be cleared.


At the resubmit stage after recovery, the job with the hw fence 
signaled bit set will call job done directly instead of going through 
the fence process.
This makes the hw_rq_count decrease but leaves the rcu fence pointer 
not cleared yet.


Then an overflow happens in the fence driver slots and some jobs may 
be skipped, leaving the rcu pointer not cleared, which causes an 
infinite wait for the slot on the next fence emitted.


This infinite wait causes a job timeout on the emitting job. And the 
driver will get stuck at its sched stop step because kthread_park 
cannot be done.


[how]
1. move amdgpu_fence_driver_isr_toggle earlier to close the interrupt
before drm sched stop 2. handle all fences in fence process to avoid
skipping when overflow happens

Signed-off-by: Victor Zhao 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 16 +---
   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  6 +-

   2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 943c9e750575..c0cfae52f12b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4610,8 +4610,6 @@ int amdgpu_device_pre_asic_reset(str

RE: [PATCH 0/4] support query rlcv/rlcp firmware version

2022-09-15 Thread Xu, Feifei
[AMD Official Use Only - General]



Series is Reviewed-by: Feifei Xu 

-Original Message-
From: amd-gfx  On Behalf Of Hawking Zhang
Sent: Friday, September 16, 2022 1:00 AM
To: amd-gfx@lists.freedesktop.org; Gao, Likun ; Deucher, 
Alexander 
Cc: Zhang, Hawking 
Subject: [PATCH 0/4] support query rlcv/rlcp firmware version

To allow querying rlcv/rlcp firmware version info

Hawking Zhang (4):
  drm/amdgpu: save rlcv/rlcp ucode version in amdgpu_gfx
  drm/amdgpu: support print rlc v2_x ucode hdr
  drm/amdgpu: add two new subquery ids
  drm/amdgpu: add rlcv/rlcp version info to debugfs

 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h   |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  24 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 168 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h |   4 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c|   5 +
 include/uapi/drm/amdgpu_drm.h |   4 +
 6 files changed, 159 insertions(+), 50 deletions(-)

-- 
2.17.1