Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
On 2022-09-28 16:46, Christian König wrote: > Am 28.09.22 um 15:52 schrieb Michel Dänzer: >> On 2022-09-28 03:01, Zhu, Jiadong wrote:> >>> Please make sure umd is calling the libdrm function to create context with >>> different priories, >>> amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, >>> &context_handle). >> Yes, I double-checked that, and that it returns success. >> >> >>> Here is the behavior we could see: >>> 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low(in >>> previous patch named gfx_sw) is visible in UMR. We could check the wptr/ptr >>> to see if it is being used. >>> 2. MCBP happens while the two different priority ibs are submitted at the >>> same time. We could check fence info as below: >>> Last signaled trailing fence++ when the mcbp triggers by kmd. Last >>> preempted may not increase as the mcbp is not triggered from CP. >>> >>> --- ring 0 (gfx) --- >>> Last signaled fence 0x0001 >>> Last emitted 0x0001 >>> Last signaled trailing fence 0x0022eb84 >>> Last emitted 0x0022eb84 >>> Last preempted 0x >>> Last reset 0x >> I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled >> trailing fence" line is changing here (seems to always match the "Last >> emitted" line). >> >> However, mutter's frame rate still cannot exceed that of GPU-limited >> clients. BTW, you can test with a GNOME Wayland session, even without my MR >> referenced below. Preemption will just be less effective without that MR. >> mutter has used a high priority context when possible for a long time. >> >> I'm also seeing intermittent freezes, where not even the mouse cursor moves >> for up to around one second, e.g. when interacting with the GNOME top bar. >> I'm not sure yet if these are related to this patch series, but I never >> noticed it before. I wonder if the freezes might occur when GPU preemption >> is attempted. > > Keep in mind that this doesn't have the same fine granularity as the separate > hw ring buffer found on gfx10. > > With MCBP we can only preempt on draw command boundary, while the separate hw > ring solution can preempt as soon as a CU is available. Right, but so far I haven't noticed any positive effect. That and the intermittent freezes indicate the MCBP based preemption isn't actually working as intended yet. >>> From: Koenig, Christian >>> This work is solely for gfx9 (e.g. Vega) and older. Navi has a completely separate high priority gfx queue we can use for this. >> Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was >> for Sienna Cichlid only, and turned out to be unstable, so it had to >> reverted. >> >> It would be nice to make the SW ring solution take effect by default >> whenever there is no separate high priority HW gfx queue available (and any >> other requirements are met). > > I don't think that this will be a good idea. The hw ring buffer or even hw > scheduler have much nicer properties and we should focus on getting that > working when available. Of course, the HW features should have priority. I mean as a fallback when the HW features effectively aren't available (which is currently always the case with amdgpu, even when the GPU has the HW features). >>> Am 27.09.22 um 19:49 schrieb Michel Dänzer: On 2022-09-27 08:06, Christian König wrote: > Hey Michel, > > JIadong is working on exposing high/low priority gfx queues for gfx9 and > older hw generations by using mid command buffer preemption. Yeah, I've been keeping an eye on these patches. I'm looking forward to this working. > I know that you have been working on Gnome Mutter to make use from > userspace for this. Do you have time to run some tests with that? I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then with it, since I wasn't sure if it's needed) with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&data=05%7C01%7Cchristian.koenig%40amd.com%7Cc6345d9230004549ba4d08daa0b0abcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637998977913548768%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=P1Qo2AwDmfmPrxJe2SxTFsVjdJ9vjabK8s84ZVz%2Beh8%3D&reserved=0 on Navi 14. Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high priority context via the EGL_IMG_context_priority extension, it's unable to reach a higher frame rate than a GPU-limited client[0]. The "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x. Did I miss a step? [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, mutter can achieve a higher f
Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
Am 28.09.22 um 15:52 schrieb Michel Dänzer: On 2022-09-28 03:01, Zhu, Jiadong wrote:> Please make sure umd is calling the libdrm function to create context with different priories, amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, &context_handle). Yes, I double-checked that, and that it returns success. Here is the behavior we could see: 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low(in previous patch named gfx_sw) is visible in UMR. We could check the wptr/ptr to see if it is being used. 2. MCBP happens while the two different priority ibs are submitted at the same time. We could check fence info as below: Last signaled trailing fence++ when the mcbp triggers by kmd. Last preempted may not increase as the mcbp is not triggered from CP. --- ring 0 (gfx) --- Last signaled fence 0x0001 Last emitted 0x0001 Last signaled trailing fence 0x0022eb84 Last emitted 0x0022eb84 Last preempted 0x Last reset 0x I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled trailing fence" line is changing here (seems to always match the "Last emitted" line). However, mutter's frame rate still cannot exceed that of GPU-limited clients. BTW, you can test with a GNOME Wayland session, even without my MR referenced below. Preemption will just be less effective without that MR. mutter has used a high priority context when possible for a long time. I'm also seeing intermittent freezes, where not even the mouse cursor moves for up to around one second, e.g. when interacting with the GNOME top bar. I'm not sure yet if these are related to this patch series, but I never noticed it before. I wonder if the freezes might occur when GPU preemption is attempted. Keep in mind that this doesn't have the same fine granularity as the separate hw ring buffer found on gfx10. With MCBP we can only preempt on draw command boundary, while the separate hw ring solution can preempt as soon as a CU is available. From: Koenig, Christian This work is solely for gfx9 (e.g. Vega) and older. Navi has a completely separate high priority gfx queue we can use for this. Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was for Sienna Cichlid only, and turned out to be unstable, so it had to reverted. It would be nice to make the SW ring solution take effect by default whenever there is no separate high priority HW gfx queue available (and any other requirements are met). I don't think that this will be a good idea. The hw ring buffer or even hw scheduler have much nicer properties and we should focus on getting that working when available. Regards, Christian. Am 27.09.22 um 19:49 schrieb Michel Dänzer: On 2022-09-27 08:06, Christian König wrote: Hey Michel, JIadong is working on exposing high/low priority gfx queues for gfx9 and older hw generations by using mid command buffer preemption. Yeah, I've been keeping an eye on these patches. I'm looking forward to this working. I know that you have been working on Gnome Mutter to make use from userspace for this. Do you have time to run some tests with that? I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then with it, since I wasn't sure if it's needed) with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&data=05%7C01%7Cchristian.koenig%40amd.com%7Cc6345d9230004549ba4d08daa0b0abcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637998977913548768%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=P1Qo2AwDmfmPrxJe2SxTFsVjdJ9vjabK8s84ZVz%2Beh8%3D&reserved=0 on Navi 14. Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high priority context via the EGL_IMG_context_priority extension, it's unable to reach a higher frame rate than a GPU-limited client[0]. The "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x. Did I miss a step? [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, mutter can achieve a higher frame rate than plot3d, though not than pixmark piano (presumably due to limited GPU preemption granularity).
Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
On 2022-09-28 03:01, Zhu, Jiadong wrote:> > Please make sure umd is calling the libdrm function to create context with > different priories, > amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, > &context_handle). Yes, I double-checked that, and that it returns success. > Here is the behavior we could see: > 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low(in > previous patch named gfx_sw) is visible in UMR. We could check the wptr/ptr > to see if it is being used. > 2. MCBP happens while the two different priority ibs are submitted at the > same time. We could check fence info as below: > Last signaled trailing fence++ when the mcbp triggers by kmd. Last preempted > may not increase as the mcbp is not triggered from CP. > > --- ring 0 (gfx) --- > Last signaled fence 0x0001 > Last emitted 0x0001 > Last signaled trailing fence 0x0022eb84 > Last emitted 0x0022eb84 > Last preempted 0x > Last reset 0x I've now tested on this Picasso (GFX9) laptop as well. The "Last signaled trailing fence" line is changing here (seems to always match the "Last emitted" line). However, mutter's frame rate still cannot exceed that of GPU-limited clients. BTW, you can test with a GNOME Wayland session, even without my MR referenced below. Preemption will just be less effective without that MR. mutter has used a high priority context when possible for a long time. I'm also seeing intermittent freezes, where not even the mouse cursor moves for up to around one second, e.g. when interacting with the GNOME top bar. I'm not sure yet if these are related to this patch series, but I never noticed it before. I wonder if the freezes might occur when GPU preemption is attempted. > From: Koenig, Christian > > > This work is solely for gfx9 (e.g. Vega) and older. > > > > Navi has a completely separate high priority gfx queue we can use for this. Right, but 4c7631800e6b ("drm/amd/amdgpu: add pipe1 hardware support") was for Sienna Cichlid only, and turned out to be unstable, so it had to reverted. It would be nice to make the SW ring solution take effect by default whenever there is no separate high priority HW gfx queue available (and any other requirements are met). > Am 27.09.22 um 19:49 schrieb Michel Dänzer: >> On 2022-09-27 08:06, Christian König wrote: >>> Hey Michel, >>> >>> JIadong is working on exposing high/low priority gfx queues for gfx9 and >>> older hw generations by using mid command buffer preemption. >> Yeah, I've been keeping an eye on these patches. I'm looking forward to this >> working. >> >> >>> I know that you have been working on Gnome Mutter to make use from >>> userspace for this. Do you have time to run some tests with that? >> I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel >> command line, then with it, since I wasn't sure if it's needed) with >> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&data=05%7C01%7Cchristian.koenig%40amd.com%7Cc6345d9230004549ba4d08daa0b0abcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637998977913548768%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=P1Qo2AwDmfmPrxJe2SxTFsVjdJ9vjabK8s84ZVz%2Beh8%3D&reserved=0 >> on Navi 14. >> >> Unfortunately, I'm not seeing any change in behaviour. Even though mutter >> uses a high priority context via the EGL_IMG_context_priority extension, >> it's unable to reach a higher frame rate than a GPU-limited client[0]. The >> "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains >> at 0x. >> >> Did I miss a step? >> >> >> [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel >> iGPU, mutter can achieve a higher frame rate than plot3d, though not than >> pixmark piano (presumably due to limited GPU preemption granularity). -- Earthling Michel Dänzer| https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
[AMD Official Use Only - General] Hi Michel, Please make sure umd is calling the libdrm function to create context with different priories, amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, &context_handle). Here is the behavior we could see: 1. After modprobe amdgpu, two software rings named gfx_high/gfx_low(in previous patch named gfx_sw) is visible in UMR. We could check the wptr/ptr to see if it is being used. 2. MCBP happens while the two different priority ibs are submitted at the same time. We could check fence info as below: Last signaled trailing fence++ when the mcbp triggers by kmd. Last preempted may not increase as the mcbp is not triggered from CP. --- ring 0 (gfx) --- Last signaled fence 0x0001 Last emitted 0x0001 Last signaled trailing fence 0x0022eb84 Last emitted 0x0022eb84 Last preempted 0x Last reset 0x Thanks, Jiadong -Original Message- From: Koenig, Christian Sent: Wednesday, September 28, 2022 2:20 AM To: Michel Dänzer Cc: Grodzovsky, Andrey ; Tuikov, Luben ; Zhu, Jiadong ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6) This work is solely for gfx9 (e.g. Vega) and older. Navi has a completely separate high priority gfx queue we can use for this. Thanks, Christian. Am 27.09.22 um 19:49 schrieb Michel Dänzer: > On 2022-09-27 08:06, Christian König wrote: >> Hey Michel, >> >> JIadong is working on exposing high/low priority gfx queues for gfx9 and >> older hw generations by using mid command buffer preemption. > Yeah, I've been keeping an eye on these patches. I'm looking forward to this > working. > > >> I know that you have been working on Gnome Mutter to make use from userspace >> for this. Do you have time to run some tests with that? > I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel > command line, then with it, since I wasn't sure if it's needed) with > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&data=05%7C01%7Cchristian.koenig%40amd.com%7Cc6345d9230004549ba4d08daa0b0abcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637998977913548768%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=P1Qo2AwDmfmPrxJe2SxTFsVjdJ9vjabK8s84ZVz%2Beh8%3D&reserved=0 > on Navi 14. > > Unfortunately, I'm not seeing any change in behaviour. Even though mutter > uses a high priority context via the EGL_IMG_context_priority extension, it's > unable to reach a higher frame rate than a GPU-limited client[0]. The "Last > preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at > 0x. > > Did I miss a step? > > > [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, > mutter can achieve a higher frame rate than plot3d, though not than pixmark > piano (presumably due to limited GPU preemption granularity). > >> Am 27.09.22 um 05:18 schrieb Zhu, Jiadong: >>> [AMD Official Use Only - General] >>> >>>> I need more time for an in deep review of this, but form the one mile high >>>> view it looks correct to me now. >>>> Can we do some pre-commit qa testing with this? >>> I changed drm test "Command submission Test (GFX)" to send high >>> priority ibs meanwhile running Manhattan on Screen/Unigine heaven >>> foreground, checking mcbp/resubmit triggered by cat >>> /sys/kernel/debug/dri/0/amdgpu_fence_info >>> >>> I have continued running this scenario for 2 daytime and 1 night, no hangs >>> happen yet(lots of hangs has been fixed in the previous patches). >>> >>> I will ask QA team to do more test. >>> >>> Thanks, >>> JIadong >>> >>> -Original Message- >>> From: Christian König >>> Sent: Monday, September 26, 2022 2:49 PM >>> To: Zhu, Jiadong ; >>> amd-gfx@lists.freedesktop.org >>> Cc: Tuikov, Luben ; Koenig, Christian >>> ; Grodzovsky, Andrey >>> >>> Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler >>> (v6) >>> >>> Caution: This message originated from an External Source. Use proper >>> caution when opening attachments, clicking links, or responding. >>> >>> >>> Am 23.09.22 um 15:16 schrieb jiadong@amd.com: >>>> From: "Jiadong.Zhu" >>>> >>>> Trigger Mid-Command Buffer Preemption according to the priority of >>>> the software rings
Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
This work is solely for gfx9 (e.g. Vega) and older. Navi has a completely separate high priority gfx queue we can use for this. Thanks, Christian. Am 27.09.22 um 19:49 schrieb Michel Dänzer: On 2022-09-27 08:06, Christian König wrote: Hey Michel, JIadong is working on exposing high/low priority gfx queues for gfx9 and older hw generations by using mid command buffer preemption. Yeah, I've been keeping an eye on these patches. I'm looking forward to this working. I know that you have been working on Gnome Mutter to make use from userspace for this. Do you have time to run some tests with that? I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then with it, since I wasn't sure if it's needed) with https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgitlab.gnome.org%2FGNOME%2Fmutter%2F-%2Fmerge_requests%2F1880&data=05%7C01%7Cchristian.koenig%40amd.com%7Cc6345d9230004549ba4d08daa0b0abcd%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637998977913548768%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=P1Qo2AwDmfmPrxJe2SxTFsVjdJ9vjabK8s84ZVz%2Beh8%3D&reserved=0 on Navi 14. Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high priority context via the EGL_IMG_context_priority extension, it's unable to reach a higher frame rate than a GPU-limited client[0]. The "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x. Did I miss a step? [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, mutter can achieve a higher frame rate than plot3d, though not than pixmark piano (presumably due to limited GPU preemption granularity). Am 27.09.22 um 05:18 schrieb Zhu, Jiadong: [AMD Official Use Only - General] I need more time for an in deep review of this, but form the one mile high view it looks correct to me now. Can we do some pre-commit qa testing with this? I changed drm test "Command submission Test (GFX)" to send high priority ibs meanwhile running Manhattan on Screen/Unigine heaven foreground, checking mcbp/resubmit triggered by cat /sys/kernel/debug/dri/0/amdgpu_fence_info I have continued running this scenario for 2 daytime and 1 night, no hangs happen yet(lots of hangs has been fixed in the previous patches). I will ask QA team to do more test. Thanks, JIadong -Original Message- From: Christian König Sent: Monday, September 26, 2022 2:49 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Koenig, Christian ; Grodzovsky, Andrey Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6) Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. Am 23.09.22 um 15:16 schrieb jiadong@amd.com: From: "Jiadong.Zhu" Trigger Mid-Command Buffer Preemption according to the priority of the software rings and the hw fence signalling condition. The muxer saves the locations of the indirect buffer frames from the software ring together with the fence sequence number in its fifo queue, and pops out those records when the fences are signalled. The locations are used to resubmit packages in preemption scenarios by coping the chunks from the software ring. v2: Update comment style. v3: Fix conflict caused by previous modifications. v4: Remove unnecessary prints. v5: Fix corner cases for resubmission cases. v6: Refactor functions for resubmission, calling fence_process in irq handler. Cc: Christian Koenig Cc: Luben Tuikov Cc: Andrey Grodzovsky Acked-by: Luben Tuikov Signed-off-by: Jiadong.Zhu I need more time for an in deep review of this, but form the one mile high view it looks correct to me now. Can we do some pre-commit qa testing with this? Thanks, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 30 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 26 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 +- 8 files changed, 368 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 258cffe3c06a..af86d87e2f3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, } } + amdgpu_ring_ib_begin(ring); if (job && ring->funcs->init_cond_exec) patch_offset = amdgpu_ring_init_cond_exec(ring
Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
On 2022-09-27 08:06, Christian König wrote: > Hey Michel, > > JIadong is working on exposing high/low priority gfx queues for gfx9 and > older hw generations by using mid command buffer preemption. Yeah, I've been keeping an eye on these patches. I'm looking forward to this working. > I know that you have been working on Gnome Mutter to make use from userspace > for this. Do you have time to run some tests with that? I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel command line, then with it, since I wasn't sure if it's needed) with https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14. Unfortunately, I'm not seeing any change in behaviour. Even though mutter uses a high priority context via the EGL_IMG_context_priority extension, it's unable to reach a higher frame rate than a GPU-limited client[0]. The "Last preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 0x. Did I miss a step? [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, mutter can achieve a higher frame rate than plot3d, though not than pixmark piano (presumably due to limited GPU preemption granularity). > Am 27.09.22 um 05:18 schrieb Zhu, Jiadong: >> [AMD Official Use Only - General] >> >>> I need more time for an in deep review of this, but form the one mile high >>> view it looks correct to me now. >>> Can we do some pre-commit qa testing with this? >> I changed drm test "Command submission Test (GFX)" to send high priority ibs >> meanwhile running Manhattan on Screen/Unigine heaven foreground, checking >> mcbp/resubmit triggered by cat /sys/kernel/debug/dri/0/amdgpu_fence_info >> >> I have continued running this scenario for 2 daytime and 1 night, no hangs >> happen yet(lots of hangs has been fixed in the previous patches). >> >> I will ask QA team to do more test. >> >> Thanks, >> JIadong >> >> -Original Message- >> From: Christian König >> Sent: Monday, September 26, 2022 2:49 PM >> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org >> Cc: Tuikov, Luben ; Koenig, Christian >> ; Grodzovsky, Andrey >> Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6) >> >> Caution: This message originated from an External Source. Use proper caution >> when opening attachments, clicking links, or responding. >> >> >> Am 23.09.22 um 15:16 schrieb jiadong@amd.com: >>> From: "Jiadong.Zhu" >>> >>> Trigger Mid-Command Buffer Preemption according to the priority of the >>> software rings and the hw fence signalling condition. >>> >>> The muxer saves the locations of the indirect buffer frames from the >>> software ring together with the fence sequence number in its fifo >>> queue, and pops out those records when the fences are signalled. The >>> locations are used to resubmit packages in preemption scenarios by coping >>> the chunks from the software ring. >>> >>> v2: Update comment style. >>> v3: Fix conflict caused by previous modifications. >>> v4: Remove unnecessary prints. >>> v5: Fix corner cases for resubmission cases. >>> v6: Refactor functions for resubmission, calling fence_process in irq >>> handler. >>> >>> Cc: Christian Koenig >>> Cc: Luben Tuikov >>> Cc: Andrey Grodzovsky >>> Acked-by: Luben Tuikov >>> Signed-off-by: Jiadong.Zhu >> I need more time for an in deep review of this, but form the one mile high >> view it looks correct to me now. >> >> Can we do some pre-commit qa testing with this? >> >> Thanks, >> Christian. >> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 + >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 13 + >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 30 ++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 26 ++ >>> drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 + >>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 +- >>> 8 files changed, 368 insertions(+), 41 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >>> index 258cffe3c06a..af86d87e2f3b 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c >
Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
Hey Michel, JIadong is working on exposing high/low priority gfx queues for gfx9 and older hw generations by using mid command buffer preemption. I know that you have been working on Gnome Mutter to make use from userspace for this. Do you have time to run some tests with that? Thanks, Christian. Am 27.09.22 um 05:18 schrieb Zhu, Jiadong: [AMD Official Use Only - General] I need more time for an in deep review of this, but form the one mile high view it looks correct to me now. Can we do some pre-commit qa testing with this? I changed drm test "Command submission Test (GFX)" to send high priority ibs meanwhile running Manhattan on Screen/Unigine heaven foreground, checking mcbp/resubmit triggered by cat /sys/kernel/debug/dri/0/amdgpu_fence_info I have continued running this scenario for 2 daytime and 1 night, no hangs happen yet(lots of hangs has been fixed in the previous patches). I will ask QA team to do more test. Thanks, JIadong -Original Message- From: Christian König Sent: Monday, September 26, 2022 2:49 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Koenig, Christian ; Grodzovsky, Andrey Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6) Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. Am 23.09.22 um 15:16 schrieb jiadong@amd.com: From: "Jiadong.Zhu" Trigger Mid-Command Buffer Preemption according to the priority of the software rings and the hw fence signalling condition. The muxer saves the locations of the indirect buffer frames from the software ring together with the fence sequence number in its fifo queue, and pops out those records when the fences are signalled. The locations are used to resubmit packages in preemption scenarios by coping the chunks from the software ring. v2: Update comment style. v3: Fix conflict caused by previous modifications. v4: Remove unnecessary prints. v5: Fix corner cases for resubmission cases. v6: Refactor functions for resubmission, calling fence_process in irq handler. Cc: Christian Koenig Cc: Luben Tuikov Cc: Andrey Grodzovsky Acked-by: Luben Tuikov Signed-off-by: Jiadong.Zhu I need more time for an in deep review of this, but form the one mile high view it looks correct to me now. Can we do some pre-commit qa testing with this? Thanks, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 30 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 26 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 10 +- 8 files changed, 368 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 258cffe3c06a..af86d87e2f3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, } } + amdgpu_ring_ib_begin(ring); if (job && ring->funcs->init_cond_exec) patch_offset = amdgpu_ring_init_cond_exec(ring); @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH) ring->funcs->emit_wave_limit(ring, false); + amdgpu_ring_ib_end(ring); amdgpu_ring_commit(ring); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 13db99d653bd..84b0b3c7d40f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -33,6 +33,7 @@ #include #include "amdgpu.h" +#include "amdgpu_sw_ring.h" #include "atom.h" /* @@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring *ring) return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop); } + +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring) { + if (ring->is_sw_ring) + amdgpu_sw_ring_ib_begin(ring); } + +void amdgpu_ring_ib_end(struct amdgpu_ring *ring) { + if (ring->is_sw_ring) + amdgpu_sw_ring_ib_end(ring); } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index e90d327a589e..6fbc1627dab7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -312,6 +312,9 @@ struct amdgpu_ring { #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r) int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); +void amdgpu_ring_ib
RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
[AMD Official Use Only - General] >I need more time for an in deep review of this, but form the one mile high >view it looks correct to me now. >Can we do some pre-commit qa testing with this? I changed drm test "Command submission Test (GFX)" to send high priority ibs meanwhile running Manhattan on Screen/Unigine heaven foreground, checking mcbp/resubmit triggered by cat /sys/kernel/debug/dri/0/amdgpu_fence_info I have continued running this scenario for 2 daytime and 1 night, no hangs happen yet(lots of hangs has been fixed in the previous patches). I will ask QA team to do more test. Thanks, JIadong -Original Message- From: Christian König Sent: Monday, September 26, 2022 2:49 PM To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org Cc: Tuikov, Luben ; Koenig, Christian ; Grodzovsky, Andrey Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6) Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding. Am 23.09.22 um 15:16 schrieb jiadong@amd.com: > From: "Jiadong.Zhu" > > Trigger Mid-Command Buffer Preemption according to the priority of the > software rings and the hw fence signalling condition. > > The muxer saves the locations of the indirect buffer frames from the > software ring together with the fence sequence number in its fifo > queue, and pops out those records when the fences are signalled. The > locations are used to resubmit packages in preemption scenarios by coping the > chunks from the software ring. > > v2: Update comment style. > v3: Fix conflict caused by previous modifications. > v4: Remove unnecessary prints. > v5: Fix corner cases for resubmission cases. > v6: Refactor functions for resubmission, calling fence_process in irq handler. > > Cc: Christian Koenig > Cc: Luben Tuikov > Cc: Andrey Grodzovsky > Acked-by: Luben Tuikov > Signed-off-by: Jiadong.Zhu I need more time for an in deep review of this, but form the one mile high view it looks correct to me now. Can we do some pre-commit qa testing with this? Thanks, Christian. > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 + > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 13 + > drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + > drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 30 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 26 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 + > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 10 +- > 8 files changed, 368 insertions(+), 41 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > index 258cffe3c06a..af86d87e2f3b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned > num_ibs, > } > } > > + amdgpu_ring_ib_begin(ring); > if (job && ring->funcs->init_cond_exec) > patch_offset = amdgpu_ring_init_cond_exec(ring); > > @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned > num_ibs, > ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH) > ring->funcs->emit_wave_limit(ring, false); > > + amdgpu_ring_ib_end(ring); > amdgpu_ring_commit(ring); > return 0; > } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > index 13db99d653bd..84b0b3c7d40f 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c > @@ -33,6 +33,7 @@ > > #include > #include "amdgpu.h" > +#include "amdgpu_sw_ring.h" > #include "atom.h" > > /* > @@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring > *ring) > > return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop); > } > + > +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring) { > + if (ring->is_sw_ring) > + amdgpu_sw_ring_ib_begin(ring); } > + > +void amdgpu_ring_ib_end(struct amdgpu_ring *ring) { > + if (ring->is_sw_ring) > + amdgpu_sw_ring_ib_end(ring); } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > index e90d327a589e..6fbc1627dab7 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h > @@ -312,6 +312,9 @@ struct amdgpu_ring { > #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r) > > int amdgpu_ring_alloc(
Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)
Am 23.09.22 um 15:16 schrieb jiadong@amd.com: From: "Jiadong.Zhu" Trigger Mid-Command Buffer Preemption according to the priority of the software rings and the hw fence signalling condition. The muxer saves the locations of the indirect buffer frames from the software ring together with the fence sequence number in its fifo queue, and pops out those records when the fences are signalled. The locations are used to resubmit packages in preemption scenarios by coping the chunks from the software ring. v2: Update comment style. v3: Fix conflict caused by previous modifications. v4: Remove unnecessary prints. v5: Fix corner cases for resubmission cases. v6: Refactor functions for resubmission, calling fence_process in irq handler. Cc: Christian Koenig Cc: Luben Tuikov Cc: Andrey Grodzovsky Acked-by: Luben Tuikov Signed-off-by: Jiadong.Zhu I need more time for an in deep review of this, but form the one mile high view it looks correct to me now. Can we do some pre-commit qa testing with this? Thanks, Christian. --- drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 13 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 30 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c | 26 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 + drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 10 +- 8 files changed, 368 insertions(+), 41 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c index 258cffe3c06a..af86d87e2f3b 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, } } + amdgpu_ring_ib_begin(ring); if (job && ring->funcs->init_cond_exec) patch_offset = amdgpu_ring_init_cond_exec(ring); @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH) ring->funcs->emit_wave_limit(ring, false); + amdgpu_ring_ib_end(ring); amdgpu_ring_commit(ring); return 0; } diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c index 13db99d653bd..84b0b3c7d40f 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c @@ -33,6 +33,7 @@ #include #include "amdgpu.h" +#include "amdgpu_sw_ring.h" #include "atom.h" /* @@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring *ring) return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop); } + +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring) +{ + if (ring->is_sw_ring) + amdgpu_sw_ring_ib_begin(ring); +} + +void amdgpu_ring_ib_end(struct amdgpu_ring *ring) +{ + if (ring->is_sw_ring) + amdgpu_sw_ring_ib_end(ring); +} diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index e90d327a589e..6fbc1627dab7 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h @@ -312,6 +312,9 @@ struct amdgpu_ring { #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r) int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw); +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring); +void amdgpu_ring_ib_end(struct amdgpu_ring *ring); + void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count); void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib); void amdgpu_ring_commit(struct amdgpu_ring *ring); diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c index 662aadebf111..788567e3b743 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c @@ -28,23 +28,146 @@ #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2) +static struct kmem_cache *amdgpu_mux_chunk_slab; + +static inline struct amdgpu_mux_entry *amdgpu_ring_mux_sw_entry(struct amdgpu_ring_mux *mux, + struct amdgpu_ring *ring) +{ + return ring->entry_index < mux->ring_entry_size ? + &mux->ring_entry[ring->entry_index] : NULL; +} + +/* copy packages on sw ring range[begin, end) */ +static void amdgpu_ring_mux_copy_pkt_from_sw_ring(struct amdgpu_ring_mux *mux, + struct amdgpu_ring *ring, + u64 s_start, u64 s_end) +{ + u64 start, end; + struct amdgpu_ring *real_ring = mux->real_ring; + + start = s_start & ring->buf_mask; + end = s_end & ring->buf_mask; + + if (start == end)