Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-28 Thread Michel Dänzer
On 2022-09-28 16:46, Christian König wrote:
> On 28.09.22 at 15:52, Michel Dänzer wrote:
>> On 2022-09-28 03:01, Zhu, Jiadong wrote:
>>> Please make sure the UMD is calling the libdrm function to create contexts with 
>>> different priorities,
>>> amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, 
>>> _handle).
>> Yes, I double-checked that, and that it returns success.
>>
>>
>>> Here is the behavior we should see:
>>> 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named 
>>> gfx_sw in the previous patch revision) are visible in UMR. We can check the 
>>> wptr/rptr to see if they are being used.
>>> 2. MCBP happens while two IBs of different priority are submitted at the 
>>> same time. We can check the fence info as below:
>>> "Last signaled trailing fence" increments when the MCBP is triggered by the 
>>> KMD. "Last preempted" may not increase, as the MCBP is not triggered from the CP.
>>>
>>> --- ring 0 (gfx) ---
>>> Last signaled fence  0x0001
>>> Last emitted 0x0001
>>> Last signaled trailing fence 0x0022eb84
>>> Last emitted 0x0022eb84
>>> Last preempted   0x
>>> Last reset   0x
>> I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled 
>> trailing fence" line is changing here (seems to always match the "Last 
>> emitted" line).
>>
>> However, mutter's frame rate still cannot exceed that of GPU-limited 
>> clients. BTW, you can test with a GNOME Wayland session, even without my MR 
>> referenced below. Preemption will just be less effective without that MR. 
>> mutter has used a high priority context when possible for a long time.
>>
>> I'm also seeing intermittent freezes, where not even the mouse cursor moves 
>> for up to around one second, e.g. when interacting with the GNOME top bar. 
>> I'm not sure yet if these are related to this patch series, but I never 
>> noticed it before. I wonder if the freezes might occur when GPU preemption 
>> is attempted.
> 
> Keep in mind that this doesn't have the same fine granularity as the separate 
> hw ring buffer found on gfx10.
> 
> With MCBP we can only preempt on draw command boundary, while the separate hw 
> ring solution can preempt as soon as a CU is available.

Right, but so far I haven't noticed any positive effect. That and the 
intermittent freezes indicate the MCBP-based preemption isn't actually working 
as intended yet.


>>> From: Koenig, Christian 
>>>
 This work is solely for gfx9 (e.g. Vega) and older.

 Navi has a completely separate high priority gfx queue we can use for this.
>> Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was 
>> for Sienna Cichlid only, and turned out to be unstable, so it had to be 
>> reverted.
>>
>> It would be nice to make the SW ring solution take effect by default 
>> whenever there is no separate high priority HW gfx queue available (and any 
>> other requirements are met).
> 
> I don't think that this will be a good idea. The hw ring buffer or even hw 
> scheduler have much nicer properties and we should focus on getting that 
> working when available.

Of course, the HW features should have priority. I mean as a fallback when the 
HW features effectively aren't available (which is currently always the case 
with amdgpu, even when the GPU has the HW features).


>>> On 27.09.22 at 19:49, Michel Dänzer wrote:
 On 2022-09-27 08:06, Christian König wrote:
> Hey Michel,
>
> Jiadong is working on exposing high/low priority gfx queues for gfx9 and 
> older hw generations by using mid command buffer preemption.
 Yeah, I've been keeping an eye on these patches. I'm looking forward to 
 this working.


> I know that you have been working on Gnome Mutter to make use of this from 
> userspace. Do you have time to run some tests with that?
 I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel 
 command line, then with it, since I wasn't sure if it's needed) with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.

 Unfortunately, I'm not seeing any change in behaviour. Even though mutter 
 uses a high priority context via the EGL_IMG_context_priority extension, 
 it's unable to reach a higher frame rate than a GPU-limited client[0]. The 
 "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains 
 at 0x.

 Did I miss a step?


 [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel 
 iGPU, mutter can achieve a higher frame rate than plot3d, though not than 
 pixmark piano (presumably due to limited GPU preemption granularity).

Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-28 Thread Christian König

On 28.09.22 at 15:52, Michel Dänzer wrote:

On 2022-09-28 03:01, Zhu, Jiadong wrote:

Please make sure the UMD is calling the libdrm function to create contexts with 
different priorities,
amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, _handle).

Yes, I double-checked that, and that it returns success.



Here is the behavior we should see:
1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named gfx_sw 
in the previous patch revision) are visible in UMR. We can check the wptr/rptr to 
see if they are being used.
2. MCBP happens while two IBs of different priority are submitted at the same 
time. We can check the fence info as below:
"Last signaled trailing fence" increments when the MCBP is triggered by the KMD. 
"Last preempted" may not increase, as the MCBP is not triggered from the CP.

--- ring 0 (gfx) ---
Last signaled fence  0x0001
Last emitted 0x0001
Last signaled trailing fence 0x0022eb84
Last emitted 0x0022eb84
Last preempted   0x
Last reset   0x

I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled trailing fence" 
line is changing here (seems to always match the "Last emitted" line).

However, mutter's frame rate still cannot exceed that of GPU-limited clients. 
BTW, you can test with a GNOME Wayland session, even without my MR referenced 
below. Preemption will just be less effective without that MR. mutter has used 
a high priority context when possible for a long time.

I'm also seeing intermittent freezes, where not even the mouse cursor moves for 
up to around one second, e.g. when interacting with the GNOME top bar. I'm not 
sure yet if these are related to this patch series, but I never noticed it 
before. I wonder if the freezes might occur when GPU preemption is attempted.


Keep in mind that this doesn't have the same fine granularity as the 
separate hw ring buffer found on gfx10.


With MCBP we can only preempt on draw command boundary, while the 
separate hw ring solution can preempt as soon as a CU is available.



From: Koenig, Christian 


This work is solely for gfx9 (e.g. Vega) and older.

Navi has a completely separate high priority gfx queue we can use for this.

Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was for 
Sienna Cichlid only, and turned out to be unstable, so it had to be reverted.

It would be nice to make the SW ring solution take effect by default whenever 
there is no separate high priority HW gfx queue available (and any other 
requirements are met).


I don't think that this will be a good idea. The hw ring buffer or even 
hw scheduler have much nicer properties and we should focus on getting 
that working when available.


Regards,
Christian.





On 27.09.22 at 19:49, Michel Dänzer wrote:

On 2022-09-27 08:06, Christian König wrote:

Hey Michel,

Jiadong is working on exposing high/low priority gfx queues for gfx9 and older 
hw generations by using mid command buffer preemption.

Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
working.



I know that you have been working on Gnome Mutter to make use of this from 
userspace. Do you have time to run some tests with that?

I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then 
with it, since I wasn't sure if it's needed) with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.

Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high 
priority context via the EGL_IMG_context_priority extension, it's unable to reach a 
higher frame rate than a GPU-limited client[0]. The "Last preempted" line of 
/sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x.

Did I miss a step?


[0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, 
mutter can achieve a higher frame rate than plot3d, though not than pixmark piano 
(presumably due to limited GPU preemption granularity).






Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-28 Thread Michel Dänzer
On 2022-09-28 03:01, Zhu, Jiadong wrote:
> Please make sure the UMD is calling the libdrm function to create contexts with 
> different priorities,
> amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, 
> _handle).

Yes, I double-checked that, and that it returns success.


> Here is the behavior we should see:
> 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named 
> gfx_sw in the previous patch revision) are visible in UMR. We can check the 
> wptr/rptr to see if they are being used.
> 2. MCBP happens while two IBs of different priority are submitted at the 
> same time. We can check the fence info as below:
> "Last signaled trailing fence" increments when the MCBP is triggered by the KMD. 
> "Last preempted" may not increase, as the MCBP is not triggered from the CP.
> 
> --- ring 0 (gfx) ---
> Last signaled fence  0x0001
> Last emitted 0x0001
> Last signaled trailing fence 0x0022eb84
> Last emitted 0x0022eb84
> Last preempted   0x
> Last reset   0x

I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled 
trailing fence" line is changing here (seems to always match the "Last emitted" 
line).

However, mutter's frame rate still cannot exceed that of GPU-limited clients. 
BTW, you can test with a GNOME Wayland session, even without my MR referenced 
below. Preemption will just be less effective without that MR. mutter has used 
a high priority context when possible for a long time.

I'm also seeing intermittent freezes, where not even the mouse cursor moves for 
up to around one second, e.g. when interacting with the GNOME top bar. I'm not 
sure yet if these are related to this patch series, but I never noticed it 
before. I wonder if the freezes might occur when GPU preemption is attempted.


> From: Koenig, Christian 
> 
> > This work is solely for gfx9 (e.g. Vega) and older.
> > 
> > Navi has a completely separate high priority gfx queue we can use for this.

Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was for 
Sienna Cichlid only, and turned out to be unstable, so it had to be reverted.

It would be nice to make the SW ring solution take effect by default whenever 
there is no separate high priority HW gfx queue available (and any other 
requirements are met).


> On 27.09.22 at 19:49, Michel Dänzer wrote:
>> On 2022-09-27 08:06, Christian König wrote:
>>> Hey Michel,
>>>
>>> Jiadong is working on exposing high/low priority gfx queues for gfx9 and 
>>> older hw generations by using mid command buffer preemption.
>> Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
>> working.
>>
>>
>>> I know that you have been working on Gnome Mutter to make use of this from 
>>> userspace. Do you have time to run some tests with that?
>> I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel 
>> command line, then with it, since I wasn't sure if it's needed) with 
>> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.
>>
>> Unfortunately, I'm not seeing any change in behaviour. Even though mutter 
>> uses a high priority context via the EGL_IMG_context_priority extension, 
>> it's unable to reach a higher frame rate than a GPU-limited client[0]. The 
>> "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains 
>> at 0x.
>>
>> Did I miss a step?
>>
>>
>> [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel 
>> iGPU, mutter can achieve a higher frame rate than plot3d, though not than 
>> pixmark piano (presumably due to limited GPU preemption granularity).


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer




RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-27 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

Please make sure the UMD is calling the libdrm function to create contexts with 
different priorities,
amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, _handle).
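
For example, a minimal user-space sketch of that call could look like the 
following (the render node path is an illustrative assumption; note the kernel 
only permits priorities above NORMAL for sufficiently privileged clients):

/* Minimal sketch (not part of this patch set): create a high priority
 * amdgpu context via libdrm. The render node path is an assumption; use
 * whichever node belongs to the amdgpu device.
 * Build roughly with: gcc ctx.c $(pkg-config --cflags --libs libdrm_amdgpu)
 */
#include <amdgpu.h>
#include <amdgpu_drm.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	uint32_t major, minor;
	amdgpu_device_handle dev;
	amdgpu_context_handle ctx;
	int fd = open("/dev/dri/renderD128", O_RDWR);

	if (fd < 0 || amdgpu_device_initialize(fd, &major, &minor, &dev))
		return 1;

	/* The kernel may refuse priorities above NORMAL for unprivileged
	 * clients (-EACCES), so always check the return value. */
	int r = amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH, &ctx);
	printf("high priority context: %s\n", r ? "failed" : "created");

	if (!r)
		amdgpu_cs_ctx_free(ctx);
	amdgpu_device_deinitialize(dev);
	close(fd);
	return r;
}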

Here is the behavior we should see:
1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named gfx_sw 
in the previous patch revision) are visible in UMR. We can check the wptr/rptr to 
see if they are being used.
2. MCBP happens while two IBs of different priority are submitted at the same 
time. We can check the fence info as below:
"Last signaled trailing fence" increments when the MCBP is triggered by the KMD. 
"Last preempted" may not increase, as the MCBP is not triggered from the CP.

--- ring 0 (gfx) ---
Last signaled fence  0x0001
Last emitted 0x0001
Last signaled trailing fence 0x0022eb84
Last emitted 0x0022eb84
Last preempted   0x
Last reset   0x

Thanks,
Jiadong
-Original Message-
From: Koenig, Christian 
Sent: Wednesday, September 28, 2022 2:20 AM
To: Michel Dänzer 
Cc: Grodzovsky, Andrey ; Tuikov, Luben 
; Zhu, Jiadong ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

This work is solely for gfx9 (e.g. Vega) and older.

Navi has a completely separate high priority gfx queue we can use for this.

Thanks,
Christian.

On 27.09.22 at 19:49, Michel Dänzer wrote:
> On 2022-09-27 08:06, Christian König wrote:
>> Hey Michel,
>>
>> Jiadong is working on exposing high/low priority gfx queues for gfx9 and 
>> older hw generations by using mid command buffer preemption.
> Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
> working.
>
>
>> I know that you have been working on Gnome Mutter to make use of this from 
>> userspace. Do you have time to run some tests with that?
> I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel 
> command line, then with it, since I wasn't sure if it's needed) with 
> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.
>
> Unfortunately, I'm not seeing any change in behaviour. Even though mutter 
> uses a high priority context via the EGL_IMG_context_priority extension, it's 
> unable to reach a higher frame rate than a GPU-limited client[0]. The "Last 
> preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 
> 0x.
>
> Did I miss a step?
>
>
> [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, 
> mutter can achieve a higher frame rate than plot3d, though not than pixmark 
> piano (presumably due to limited GPU preemption granularity).
>
>> On 27.09.22 at 05:18, Zhu, Jiadong wrote:
>>> [AMD Official Use Only - General]
>>>
>>>> I need more time for an in-depth review of this, but from the one-mile-high 
>>>> view it looks correct to me now.
>>>> Can we do some pre-commit qa testing with this?
>>> I changed the drm test "Command submission Test (GFX)" to send high
>>> priority IBs while running Manhattan onscreen / Unigine Heaven in the
>>> foreground, checking that mcbp/resubmit is triggered via cat
>>> /sys/kernel/debug/dri/0/amdgpu_fence_info
>>>
>>> I have kept this scenario running for two days and one night; no hangs have 
>>> happened yet (lots of hangs were fixed by the previous patches).
>>>
>>> I will ask the QA team to do more testing.
>>>
>>> Thanks,
>>> Jiadong
>>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Monday, September 26, 2022 2:49 PM
>>> To: Zhu, Jiadong ;
>>> amd-gfx@lists.freedesktop.org
>>> Cc: Tuikov, Luben ; Koenig, Christian
>>> ; Grodzovsky, Andrey
>>> 
>>> Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler
>>> (v6)
>>>
>>> Caution: This message originated from an External Source. Use proper 
>>> caution when opening attachments, clicking links, or responding.
>>>
>>>
>>> On 23.09.22 at 15:16, jiadong@amd.com wrote:
>>>> From: "Jiadong.Zhu" 
>>>>
>>>> Trigger Mid-Command Buffer Preemption according to the priority of
>>>> the software rings and the hw fence signalling condition.
>>>

Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-27 Thread Christian König

This work is solely for gfx9 (e.g. Vega) and older.

Navi has a completely separate high priority gfx queue we can use for this.

Thanks,
Christian.

On 27.09.22 at 19:49, Michel Dänzer wrote:

On 2022-09-27 08:06, Christian König wrote:

Hey Michel,

Jiadong is working on exposing high/low priority gfx queues for gfx9 and older 
hw generations by using mid command buffer preemption.

Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
working.



I know that you have been working on Gnome Mutter to make use of this from 
userspace. Do you have time to run some tests with that?

I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then 
with it, since I wasn't sure if it's needed) with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.

Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high 
priority context via the EGL_IMG_context_priority extension, it's unable to reach a 
higher frame rate than a GPU-limited client[0]. The "Last preempted" line of 
/sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x.

Did I miss a step?


[0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, 
mutter can achieve a higher frame rate than plot3d, though not than pixmark piano 
(presumably due to limited GPU preemption granularity).


On 27.09.22 at 05:18, Zhu, Jiadong wrote:

[AMD Official Use Only - General]


I need more time for an in-depth review of this, but from the one-mile-high view 
it looks correct to me now.
Can we do some pre-commit qa testing with this?

I changed the drm test "Command submission Test (GFX)" to send high priority IBs 
while running Manhattan onscreen / Unigine Heaven in the foreground, checking that 
mcbp/resubmit is triggered via cat /sys/kernel/debug/dri/0/amdgpu_fence_info

I have kept this scenario running for two days and one night; no hangs have 
happened yet (lots of hangs were fixed by the previous patches).

I will ask the QA team to do more testing.

Thanks,
Jiadong

-Original Message-
From: Christian König 
Sent: Monday, September 26, 2022 2:49 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.


On 23.09.22 at 15:16, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

Trigger Mid-Command Buffer Preemption according to the priority of the
software rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the
software ring together with the fence sequence number in its fifo
queue, and pops out those records when the fences are signalled. The
locations are used to resubmit packages in preemption scenarios by copying the 
chunks from the software ring.

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Acked-by: Luben Tuikov 
Signed-off-by: Jiadong.Zhu 

I need more time for an in-depth review of this, but from the one-mile-high view 
it looks correct to me now.

Can we do some pre-commit qa testing with this?

Thanks,
Christian.


---
    drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
    drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
    drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    |  10 +-
    8 files changed, 368 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
    }
    }

+ amdgpu_ring_ib_begin(ring);
    if (job && ring->funcs->init_cond_exec)
    patch_offset = amdgpu_ring_init_cond_exec(ring);

@@ -285,6 +286,7 @@ int amdgpu_ib_sche

Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-27 Thread Michel Dänzer
On 2022-09-27 08:06, Christian König wrote:
> Hey Michel,
> 
> Jiadong is working on exposing high/low priority gfx queues for gfx9 and 
> older hw generations by using mid command buffer preemption.

Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
working.


> I know that you have been working on Gnome Mutter to make use of this from 
> userspace. Do you have time to run some tests with that?

I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command 
line, then with it, since I wasn't sure if it's needed) with 
https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.

Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses 
a high priority context via the EGL_IMG_context_priority extension, it's unable 
to reach a higher frame rate than a GPU-limited client[0]. The "Last preempted" 
line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x.

Did I miss a step?


[0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, 
mutter can achieve a higher frame rate than plot3d, though not than pixmark 
piano (presumably due to limited GPU preemption granularity).
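
For reference, requesting such a context via EGL_IMG_context_priority amounts 
to roughly the following (a sketch based on the extension spec, not mutter's 
actual code; display/config setup is minimal and error handling is omitted):

/* Sketch: request a high priority context via EGL_IMG_context_priority.
 * Illustrative only (not mutter's actual code). Drivers may silently
 * fall back to medium priority, hence the query at the end. */
#include <EGL/egl.h>
#include <EGL/eglext.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	EGLDisplay dpy = eglGetDisplay(EGL_DEFAULT_DISPLAY);
	EGLint cfg_attribs[] = { EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT, EGL_NONE };
	EGLint ctx_attribs[] = {
		EGL_CONTEXT_CLIENT_VERSION, 2,
		EGL_CONTEXT_PRIORITY_LEVEL_IMG, EGL_CONTEXT_PRIORITY_HIGH_IMG,
		EGL_NONE,
	};
	EGLConfig cfg;
	EGLint n, level = 0;

	eglInitialize(dpy, NULL, NULL);
	if (!strstr(eglQueryString(dpy, EGL_EXTENSIONS), "EGL_IMG_context_priority"))
		return 1;

	eglBindAPI(EGL_OPENGL_ES_API);
	eglChooseConfig(dpy, cfg_attribs, &cfg, 1, &n);

	EGLContext ctx = eglCreateContext(dpy, cfg, EGL_NO_CONTEXT, ctx_attribs);

	/* The attribute is queryable, so verify what we actually got. */
	eglQueryContext(dpy, ctx, EGL_CONTEXT_PRIORITY_LEVEL_IMG, &level);
	printf("high priority granted: %s\n",
	       level == EGL_CONTEXT_PRIORITY_HIGH_IMG ? "yes" : "no");
	return 0;
}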

> On 27.09.22 at 05:18, Zhu, Jiadong wrote:
>> [AMD Official Use Only - General]
>>
>>> I need more time for an in-depth review of this, but from the one-mile-high 
>>> view it looks correct to me now.
>>> Can we do some pre-commit qa testing with this?
>> I changed the drm test "Command submission Test (GFX)" to send high priority IBs 
>> while running Manhattan onscreen / Unigine Heaven in the foreground, checking 
>> that mcbp/resubmit is triggered via cat /sys/kernel/debug/dri/0/amdgpu_fence_info
>>
>> I have kept this scenario running for two days and one night; no hangs have 
>> happened yet (lots of hangs were fixed by the previous patches).
>>
>> I will ask the QA team to do more testing.
>>
>> Thanks,
>> Jiadong
>>
>> -Original Message-
>> From: Christian König 
>> Sent: Monday, September 26, 2022 2:49 PM
>> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
>> Cc: Tuikov, Luben ; Koenig, Christian 
>> ; Grodzovsky, Andrey 
>> Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
>>
>> Caution: This message originated from an External Source. Use proper caution 
>> when opening attachments, clicking links, or responding.
>>
>>
>> On 23.09.22 at 15:16, jiadong@amd.com wrote:
>>> From: "Jiadong.Zhu" 
>>>
>>> Trigger Mid-Command Buffer Preemption according to the priority of the
>>> software rings and the hw fence signalling condition.
>>>
>>> The muxer saves the locations of the indirect buffer frames from the
>>> software ring together with the fence sequence number in its fifo
>>> queue, and pops out those records when the fences are signalled. The
>>> locations are used to resubmit packages in preemption scenarios by copying 
>>> the chunks from the software ring.
>>>
>>> v2: Update comment style.
>>> v3: Fix conflict caused by previous modifications.
>>> v4: Remove unnecessary prints.
>>> v5: Fix corner cases for resubmission cases.
>>> v6: Refactor functions for resubmission, calling fence_process in irq 
>>> handler.
>>>
>>> Cc: Christian Koenig 
>>> Cc: Luben Tuikov 
>>> Cc: Andrey Grodzovsky 
>>> Acked-by: Luben Tuikov 
>>> Signed-off-by: Jiadong.Zhu 
>> I need more time for an in-depth review of this, but from the one-mile-high 
>> view it looks correct to me now.
>>
>> Can we do some pre-commit qa testing with this?
>>
>> Thanks,
>> Christian.
>>
>>> ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>>>    drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c    |  10 +-
>>>    8 files changed, 368 insertions(+), 41 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> index 258cffe3c06a..af86d87e2f3b 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>>> @@ -211,6 +211,7 @@ 

Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-27 Thread Christian König

Hey Michel,

Jiadong is working on exposing high/low priority gfx queues for gfx9 and 
older hw generations by using mid command buffer preemption.


I know that you have been working on Gnome Mutter to make use of this from 
userspace. Do you have time to run some tests with that?


Thanks,
Christian.

On 27.09.22 at 05:18, Zhu, Jiadong wrote:

[AMD Official Use Only - General]


I need more time for an in-depth review of this, but from the one-mile-high view 
it looks correct to me now.
Can we do some pre-commit qa testing with this?

I changed the drm test "Command submission Test (GFX)" to send high priority IBs 
while running Manhattan onscreen / Unigine Heaven in the foreground, checking that 
mcbp/resubmit is triggered via cat /sys/kernel/debug/dri/0/amdgpu_fence_info

I have kept this scenario running for two days and one night; no hangs have 
happened yet (lots of hangs were fixed by the previous patches).

I will ask the QA team to do more testing.

Thanks,
Jiadong

-Original Message-
From: Christian König 
Sent: Monday, September 26, 2022 2:49 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.


On 23.09.22 at 15:16, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

Trigger Mid-Command Buffer Preemption according to the priority of the
software rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the
software ring together with the fence sequence number in its fifo
queue, and pops out those records when the fences are signalled. The
locations are used to resubmit packages in preemption scenarios by copying the 
chunks from the software ring.

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Acked-by: Luben Tuikov 
Signed-off-by: Jiadong.Zhu 

I need more time for an in-depth review of this, but from the one-mile-high view 
it looks correct to me now.

Can we do some pre-commit qa testing with this?

Thanks,
Christian.


---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
   8 files changed, 368 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
   }
   }

+ amdgpu_ring_ib_begin(ring);
   if (job && ring->funcs->init_cond_exec)
   patch_offset = amdgpu_ring_init_cond_exec(ring);

@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
   ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
   ring->funcs->emit_wave_limit(ring, false);

+ amdgpu_ring_ib_end(ring);
   amdgpu_ring_commit(ring);
   return 0;
   }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 13db99d653bd..84b0b3c7d40f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -33,6 +33,7 @@

   #include 
   #include "amdgpu.h"
+#include "amdgpu_sw_ring.h"
   #include "atom.h"

   /*
@@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring
*ring)

   return mqd_mgr->init_mqd(adev, ring->mqd_ptr, );
   }
+
+void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
+{
+	if (ring->is_sw_ring)
+		amdgpu_sw_ring_ib_begin(ring);
+}
+
+void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
+{
+	if (ring->is_sw_ring)
+		amdgpu_sw_ring_ib_end(ring);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index e90d327a589e..6fbc1627dab7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -312,6 +312,9 @@ struct amdgpu_ring {
   #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)

   int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
+void amdgpu_ring_ib_beg

RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-26 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>I need more time for an in-depth review of this, but from the one-mile-high 
>view it looks correct to me now.

>Can we do some pre-commit qa testing with this?

I changed the drm test "Command submission Test (GFX)" to send high priority IBs 
while running Manhattan onscreen / Unigine Heaven in the foreground, checking 
that mcbp/resubmit is triggered via cat /sys/kernel/debug/dri/0/amdgpu_fence_info
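
For convenience, a small watcher can poll that file instead of repeated cat; a 
sketch, assuming the dump format quoted earlier in this thread (debugfs needs 
root, and only the first ring's line is parsed):

/* Sketch: poll amdgpu_fence_info and report when the first ring's
 * "Last signaled trailing fence" advances, i.e. when a kmd-triggered
 * MCBP completed. Assumes the dump format quoted above; debugfs is
 * root-only, and only the first matching line (ring 0/gfx) is read. */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	unsigned long prev = 0, cur;
	char line[256];

	for (;;) {
		FILE *f = fopen("/sys/kernel/debug/dri/0/amdgpu_fence_info", "r");
		if (!f)
			return 1;
		cur = prev;
		while (fgets(line, sizeof(line), f))
			if (sscanf(line, "Last signaled trailing fence 0x%lx", &cur) == 1)
				break;
		fclose(f);
		if (cur != prev)
			printf("trailing fence advanced: 0x%08lx\n", cur);
		prev = cur;
		sleep(1);
	}
}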

I have kept this scenario running for two days and one night; no hangs have 
happened yet (lots of hangs were fixed by the previous patches).

I will ask the QA team to do more testing.

Thanks,
Jiadong

-Original Message-
From: Christian König 
Sent: Monday, September 26, 2022 2:49 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

Caution: This message originated from an External Source. Use proper caution 
when opening attachments, clicking links, or responding.


On 23.09.22 at 15:16, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> Trigger Mid-Command Buffer Preemption according to the priority of the
> software rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the
> software ring together with the fence sequence number in its fifo
> queue, and pops out those records when the fences are signalled. The
> locations are used to resubmit packages in preemption scenarios by copying the 
> chunks from the software ring.
>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
>
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Acked-by: Luben Tuikov 
> Signed-off-by: Jiadong.Zhu 

I need more time for an in-depth review of this, but from the one-mile-high view 
it looks correct to me now.

Can we do some pre-commit qa testing with this?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
>   8 files changed, 368 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   }
>
> + amdgpu_ring_ib_begin(ring);
>   if (job && ring->funcs->init_cond_exec)
>   patch_offset = amdgpu_ring_init_cond_exec(ring);
>
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
>   ring->funcs->emit_wave_limit(ring, false);
>
> + amdgpu_ring_ib_end(ring);
>   amdgpu_ring_commit(ring);
>   return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 13db99d653bd..84b0b3c7d40f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -33,6 +33,7 @@
>
>   #include 
>   #include "amdgpu.h"
> +#include "amdgpu_sw_ring.h"
>   #include "atom.h"
>
>   /*
> @@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring
> *ring)
>
>   return mqd_mgr->init_mqd(adev, ring->mqd_ptr, );
>   }
> +
> +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
> +{
> +	if (ring->is_sw_ring)
> +		amdgpu_sw_ring_ib_begin(ring);
> +}
> +
> +void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
> +{
> +	if (ring->is_sw_ring)
> +		amdgpu_sw_ring_ib_end(ring);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index e90d327a589e..6fbc1627dab7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -312,6 +312,9 @@ struct amdgpu_ring {
>   #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
>
>   int amdgpu_ring_alloc(struct amd

Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-26 Thread Christian König

On 23.09.22 at 15:16, jiadong@amd.com wrote:

From: "Jiadong.Zhu" 

Trigger Mid-Command Buffer Preemption according to the priority of the software
rings and the hw fence signalling condition.

The muxer saves the locations of the indirect buffer frames from the software
ring together with the fence sequence number in its fifo queue, and pops out
those records when the fences are signalled. The locations are used to resubmit
packages in preemption scenarios by copying the chunks from the software ring.
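
In other words, the bookkeeping amounts to something like this self-contained 
user-space sketch of the fifo idea (invented names; the actual patch keeps 
per-ring records in kernel structures rather than a flat array):

/* Self-contained sketch of the bookkeeping idea, with invented names:
 * each IB frame records its [start, end) offsets on the software ring
 * plus the fence seq that retires it; signalled fences pop records,
 * and whatever remains is what a preemption has to resubmit. */
#include <stdint.h>
#include <stdio.h>

#define MAX_CHUNKS 16

struct chunk {
	uint32_t seq;        /* fence seq emitted for this IB frame */
	uint64_t start, end; /* sw ring offsets of the frame, [start, end) */
};

static struct chunk fifo[MAX_CHUNKS];
static unsigned head, tail; /* head: oldest live record, tail: next free */

static void save_chunk(uint32_t seq, uint64_t s, uint64_t e)
{
	fifo[tail++ % MAX_CHUNKS] = (struct chunk){ seq, s, e };
}

/* Drop the records whose fences have already signalled. */
static void pop_signalled(uint32_t last_signalled)
{
	while (head != tail && fifo[head % MAX_CHUNKS].seq <= last_signalled)
		head++;
}

/* On preemption, everything still in the fifo gets copied from the
 * software ring onto the real ring again. */
static void resubmit(void)
{
	for (unsigned i = head; i != tail; i++) {
		const struct chunk *c = &fifo[i % MAX_CHUNKS];
		printf("resubmit seq %u: [%llu, %llu)\n", c->seq,
		       (unsigned long long)c->start, (unsigned long long)c->end);
	}
}

int main(void)
{
	save_chunk(1, 0, 64);
	save_chunk(2, 64, 160);
	pop_signalled(1); /* fence 1 done: only seq 2 needs resubmission */
	resubmit();
	return 0;
}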

v2: Update comment style.
v3: Fix conflict caused by previous modifications.
v4: Remove unnecessary prints.
v5: Fix corner cases for resubmission cases.
v6: Refactor functions for resubmission, calling fence_process in irq handler.

Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Acked-by: Luben Tuikov 
Signed-off-by: Jiadong.Zhu 


I need more time for an in-depth review of this, but from the one-mile-high 
view it looks correct to me now.


Can we do some pre-commit qa testing with this?

Thanks,
Christian.


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
  8 files changed, 368 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 258cffe3c06a..af86d87e2f3b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 		}
 	}
 
+	amdgpu_ring_ib_begin(ring);
 	if (job && ring->funcs->init_cond_exec)
 		patch_offset = amdgpu_ring_init_cond_exec(ring);
 
@@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
 	    ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
 		ring->funcs->emit_wave_limit(ring, false);
 
+	amdgpu_ring_ib_end(ring);
 	amdgpu_ring_commit(ring);
 	return 0;
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 13db99d653bd..84b0b3c7d40f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -33,6 +33,7 @@
 
 #include 
 #include "amdgpu.h"
+#include "amdgpu_sw_ring.h"
 #include "atom.h"
 
 /*

@@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring *ring)
 
 	return mqd_mgr->init_mqd(adev, ring->mqd_ptr, );
 }
+
+void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
+{
+   if (ring->is_sw_ring)
+   amdgpu_sw_ring_ib_begin(ring);
+}
+
+void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
+{
+   if (ring->is_sw_ring)
+   amdgpu_sw_ring_ib_end(ring);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index e90d327a589e..6fbc1627dab7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -312,6 +312,9 @@ struct amdgpu_ring {
  #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
  
  int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);

+void amdgpu_ring_ib_begin(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_end(struct amdgpu_ring *ring);
+
  void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);
  void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib 
*ib);
  void amdgpu_ring_commit(struct amdgpu_ring *ring);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 662aadebf111..788567e3b743 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -28,23 +28,146 @@
 
 #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
 
+static struct kmem_cache *amdgpu_mux_chunk_slab;
+
+static inline struct amdgpu_mux_entry *amdgpu_ring_mux_sw_entry(struct amdgpu_ring_mux *mux,
+						struct amdgpu_ring *ring)
+{
+	return ring->entry_index < mux->ring_entry_size ?
+			&mux->ring_entry[ring->entry_index] : NULL;
+}
+
+/* copy packages on the sw ring, range [begin, end) */
+static void amdgpu_ring_mux_copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux,
+						  struct amdgpu_ring *ring,
+						  u64 s_start, u64 s_end)
+{
+	u64 start, end;
+	struct amdgpu_ring *real_ring = mux->real_ring;
+
+	start = s_start & ring->buf_mask;
+	end = s_end & ring->buf_mask;
+
+	if (start == end) {
+
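
The archive cuts the hunk off here. The point of the masking above is that the 
copy range may wrap past the end of the ring buffer; here is a self-contained 
sketch of the same idea (invented names, and a per-dword loop instead of the 
amdgpu ring helpers, which copy the two contiguous segments instead):

/* Sketch of a wrap-aware ring-to-ring copy, with invented names. */
#include <stdint.h>
#include <stdio.h>

#define RING_SIZE 8 /* power of two, so mask == size - 1 */

struct ring {
	uint32_t buf[RING_SIZE];
	uint64_t mask;
	uint64_t wptr; /* monotonically increasing; masked on access */
};

/* Copy the dwords written to src in [s_start, s_end) onto dst. Masking
 * every index handles the wrap; the kernel patch instead masks start/end
 * once and, when start > end, copies two contiguous segments. */
static void ring_copy(struct ring *dst, const struct ring *src,
		      uint64_t s_start, uint64_t s_end)
{
	for (uint64_t i = s_start; i != s_end; i++)
		dst->buf[dst->wptr++ & dst->mask] = src->buf[i & src->mask];
}

int main(void)
{
	struct ring sw = { .mask = RING_SIZE - 1 }, hw = { .mask = RING_SIZE - 1 };

	/* write offsets 6..10 on the sw ring: slots 6, 7, then wrap to 0, 1, 2 */
	for (uint64_t i = 6; i < 11; i++)
		sw.buf[i & sw.mask] = (uint32_t)i;

	ring_copy(&hw, &sw, 6, 11); /* resubmission-style copy of [6, 11) */
	for (int i = 0; i < 5; i++)
		printf("%u ", hw.buf[i]); /* prints: 6 7 8 9 10 */
	printf("\n");
	return 0;
}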