RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[AMD Official Use Only - General]

Hi Michel,

I haven't reproduced the hang yet, but I found a race condition related to fence signaling time. I updated the patch series based on kernel 5.18.

Thanks,
Jiadong

-----Original Message-----
From: Michel Dänzer
Sent: Tuesday, November 15, 2022 1:15 AM
To: Zhu, Jiadong
Cc: Tuikov, Luben; Huang, Ray; Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-10 18:00, Michel Dänzer wrote:
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
>>
>> I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low priority scheduler thread while a high priority IB is executing. I'll drop this patch, as we cannot identify GPU-bound work from the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.

I'm afraid I may have run into another issue. I just hit a GPU hang, see the journalctl excerpt below. (I tried rebooting the machine via SSH after this, but it never seemed to complete, so I had to hard-power-off the machine by holding the power button for a few seconds.)

I can't be sure that the GPU hang is directly related to this series, but it seems plausible, and I hadn't hit a GPU hang in months if not over a year before. If this series results in potentially hitting a GPU hang every few days, it definitely doesn't provide enough benefit to justify that.

Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166051, emitted seq=1166052
Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: free PSP TMR buffer
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: MODE2 reset
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 14 17:21:22 thor kernel: [drm] PCIE GART of 1024M enabled.
Nov 14 17:21:22 thor kernel: [drm] PTB located at 0x00F400A0
Nov 14 17:21:22 thor kernel: [drm] VRAM is lost due to GPU reset!
Nov 14 17:21:22 thor kernel: [drm] PSP is resuming...
Nov 14 17:21:22 thor kernel: [drm] reserve 0x40 from 0xf431c0 for PSP TMR
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The process will be terminated.
Nov 14 17:21:23 thor kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 14 17:21:23 thor kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Nov 14 17:21:23 thor kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset(2) failed
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset end with ret = -110
Nov 14 17:21:23 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
[...]
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166052, emitted seq=1166052
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:33 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!
-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On 2022-11-10 18:00, Michel Dänzer wrote:
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
>>
>> I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low priority scheduler thread while a high priority IB is executing. I'll drop this patch, as we cannot identify GPU-bound work from the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.

I'm afraid I may have run into another issue. I just hit a GPU hang, see the journalctl excerpt below. (I tried rebooting the machine via SSH after this, but it never seemed to complete, so I had to hard-power-off the machine by holding the power button for a few seconds.)

I can't be sure that the GPU hang is directly related to this series, but it seems plausible, and I hadn't hit a GPU hang in months if not over a year before. If this series results in potentially hitting a GPU hang every few days, it definitely doesn't provide enough benefit to justify that.

Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166051, emitted seq=1166052
Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: free PSP TMR buffer
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: MODE2 reset
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 14 17:21:22 thor kernel: [drm] PCIE GART of 1024M enabled.
Nov 14 17:21:22 thor kernel: [drm] PTB located at 0x00F400A0
Nov 14 17:21:22 thor kernel: [drm] VRAM is lost due to GPU reset!
Nov 14 17:21:22 thor kernel: [drm] PSP is resuming...
Nov 14 17:21:22 thor kernel: [drm] reserve 0x40 from 0xf431c0 for PSP TMR
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The process will be terminated.
Nov 14 17:21:23 thor kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 14 17:21:23 thor kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Nov 14 17:21:23 thor kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset(2) failed
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset end with ret = -110
Nov 14 17:21:23 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
[...]
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166052, emitted seq=1166052
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:33 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[AMD Official Use Only - General]

Hi Michel,

It is true that we don't see an obvious performance improvement from these patches. The original motivation for using MCBP is that when a very long IB with many draw commands is running at low priority and using up the GPU, we give high priority IBs a chance to be executed by the GPU. Total throughput can drop, since MCBP drains the pipe and the low priority IBs have to be resubmitted afterwards.

This set of patches mainly implements priority queues via software rings. We may later use a method other than MCBP to improve this.

Thanks,
Jiadong

-----Original Message-----
From: Alex Deucher
Sent: Friday, November 11, 2022 1:54 AM
To: Michel Dänzer
Cc: Zhu, Jiadong; Tuikov, Luben; Huang, Ray; Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On Thu, Nov 10, 2022 at 12:00 PM Michel Dänzer wrote:
>
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
> > From: Michel Dänzer
> >
> >>>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
> >>>
> >>> Hi Michel,
> >>>
> >>> Thanks for trying.
> >>> Are there high priority jobs running while executing glxgears?
> >>
> >> Yes, mutter is submitting high priority jobs. However, I don't think that can explain the problem by itself:
> >>
> >> mutter only draws once per display refresh cycle. Let's assume mutter's GPU work takes ~6-7ms (a conservative example; it should usually be less than that). That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.
> >>
> >> [0] Worst case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.
> >
> > I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low priority scheduler thread while a high priority IB is executing. I'll drop this patch, as we cannot identify GPU-bound work from the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.
>
> The only issue is that there's hardly any positive effect either. While constantly moving the window of a GPU-limited GpuTest benchmark in circles, most of the time it looks exactly the same as without these patches. Only occasionally, at most every few seconds, I notice that the window movement becomes smoother for an instant.
>

I think it will largely depend on the workload. The gfx pipe can only be preempted on draw boundaries, so if most operations are a single draw, you probably won't see much difference.

Alex
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On 2022-10-18 05:08, jiadong@amd.com wrote:
> From: "Jiadong.Zhu"
>
> Trigger Mid-Command Buffer Preemption according to the priority of the software
> rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the software
> ring together with the fence sequence number in its fifo queue, and pops out
> those records when the fences are signalled. The locations are used to resubmit
> packages in preemption scenarios by copying the chunks from the software ring.
>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
> v7: Solve conflict for removing amdgpu_sw_ring.c.
> v8: Add time threshold to judge if preemption request is needed.
>
> Cc: Christian Koenig
> Cc: Luben Tuikov
> Cc: Andrey Grodzovsky
> Cc: Michel Dänzer
> Signed-off-by: Jiadong.Zhu
> Acked-by: Luben Tuikov
> Acked-by: Huang Rui
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c    |  53 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c     |  12 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h     |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 353 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  29 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c       |   2 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c        |   7 +-
>  8 files changed, 420 insertions(+), 44 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 790f7bfdc654..470448bc1ebb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -54,6 +54,7 @@ struct amdgpu_fence {
>
>  /* RB, DMA, etc. */
>  struct amdgpu_ring *ring;
> + ktime_t start_timestamp;
>  };
>
>  static struct kmem_cache *amdgpu_fence_slab;
> @@ -199,6 +200,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **f, struct amd
>  rcu_assign_pointer(*ptr, dma_fence_get(fence));
>
>  *f = fence;
> + to_amdgpu_fence(fence)->start_timestamp = ktime_get();
>
>  return 0;
>  }
> @@ -400,6 +402,57 @@ unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring)
>  return lower_32_bits(emitted);
>  }
>
> +/**
> + * amdgpu_fence_last_unsignaled_time_us - the time fence emited till now

Spelling: emitted until

> + * @ring: ring the fence is associated with
> + *
> + * Find the earlist fence unsignaled till now, calculate the time delta

Spelling: earliest fence unsignalled until

> + * between the time fence emitted and now.
> + */
> +u64 amdgpu_fence_last_unsignaled_time_us(struct amdgpu_ring *ring)
> +{
> + struct amdgpu_fence_driver *drv = &ring->fence_drv;
> + struct dma_fence *fence;
> + uint32_t last_seq, sync_seq;
> +
> + last_seq = atomic_read(&ring->fence_drv.last_seq);
> + sync_seq = READ_ONCE(ring->fence_drv.sync_seq);
> + if (last_seq == sync_seq)
> + return 0;
> +
> + ++last_seq;
> + last_seq &= drv->num_fences_mask;
> + fence = drv->fences[last_seq];
> + if (!fence)
> + return 0;
> +
> + return ktime_us_delta(ktime_get(),
> + to_amdgpu_fence(fence)->start_timestamp);
> +}
> +
> +/**
> + * amdgpu_fence_update_start_timestamp - update the timestamp of the fence
> + * @ring: ring the fence is associated with
> + * @seq: the fence seq number to update.
> + * @timestamp: the start timestamp to update.
> + *
> + * The function called at the time the fence and related ib is about to
> + * resubmit to gpu in MCBP scenario. Thus we do not consider race condition
> + * with amdgpu_fence_process to modify the same fence.
> + */
> +void amdgpu_fence_update_start_timestamp(struct amdgpu_ring *ring, uint32_t seq, ktime_t timestamp)
> +{
> + struct amdgpu_fence_driver *drv = &ring->fence_drv;
> + struct dma_fence *fence;
> +
> + seq &= drv->num_fences_mask;
> + fence = drv->fences[seq];
> + if (!fence)
> + return;
> +
> + to_amdgpu_fence(fence)->start_timestamp = timestamp;
> +}
> +
>  /**
>  * amdgpu_fence_driver_start_ring - make the fence driver
>  * ready for use on the requested ring.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
>  }
>  }
>
> + amdgpu_ring_ib_begin(ring);
>  if (job && ring->funcs->init_cond_exec)
>  patch_offset = amdgpu_ring_init_cond_exec(ring);
>
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On Thu, Nov 10, 2022 at 12:00 PM Michel Dänzer wrote:
>
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
> > From: Michel Dänzer
> >
> >>>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
> >>>
> >>> Hi Michel,
> >>>
> >>> Thanks for trying.
> >>> Are there high priority jobs running while executing glxgears?
> >>
> >> Yes, mutter is submitting high priority jobs. However, I don't think that can explain the problem by itself:
> >>
> >> mutter only draws once per display refresh cycle. Let's assume mutter's GPU work takes ~6-7ms (a conservative example; it should usually be less than that). That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.
> >>
> >> [0] Worst case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.
> >
> > I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low priority scheduler thread while a high priority IB is executing. I'll drop this patch, as we cannot identify GPU-bound work from the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.
>
> The only issue is that there's hardly any positive effect either. While constantly moving the window of a GPU-limited GpuTest benchmark in circles, most of the time it looks exactly the same as without these patches. Only occasionally, at most every few seconds, I notice that the window movement becomes smoother for an instant.
>

I think it will largely depend on the workload. The gfx pipe can only be preempted on draw boundaries, so if most operations are a single draw, you probably won't see much difference.

Alex
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On 2022-11-08 09:01, Zhu, Jiadong wrote:
> From: Michel Dänzer
>
>>>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
>>>
>>> Hi Michel,
>>>
>>> Thanks for trying.
>>> Are there high priority jobs running while executing glxgears?
>>
>> Yes, mutter is submitting high priority jobs. However, I don't think that can explain the problem by itself:
>>
>> mutter only draws once per display refresh cycle. Let's assume mutter's GPU work takes ~6-7ms (a conservative example; it should usually be less than that). That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.
>>
>> [0] Worst case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.
>
> I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low priority scheduler thread while a high priority IB is executing. I'll drop this patch, as we cannot identify GPU-bound work from the unsignaled fence, etc.

Okay, I'm testing with patches 1-4 only now.

So far I haven't noticed any negative effects, no slowdowns or intermittent freezes.

The only issue is that there's hardly any positive effect either. While constantly moving the window of a GPU-limited GpuTest benchmark in circles, most of the time it looks exactly the same as without these patches. Only occasionally, at most every few seconds, I notice that the window movement becomes smoother for an instant.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[AMD Official Use Only - General]

Hi Michel,

I reproduced the glxgears 400fps scenario locally. The issue is caused by patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which slows down the low priority scheduler thread while a high priority IB is executing. I'll drop this patch, as we cannot identify GPU-bound work from the unsignaled fence, etc.

Thanks,
Jiadong

-----Original Message-----
From: Michel Dänzer
Sent: Thursday, November 3, 2022 5:05 PM
To: Zhu, Jiadong
Cc: Tuikov, Luben; Huang, Ray; Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-03 03:58, Zhu, Jiadong wrote:
> [AMD Official Use Only - General]
>
>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
>
> Hi Michel,
>
> Thanks for trying.
> Are there high priority jobs running while executing glxgears?

Yes, mutter is submitting high priority jobs. However, I don't think that can explain the problem by itself:

mutter only draws once per display refresh cycle. Let's assume mutter's GPU work takes ~6-7ms (a conservative example; it should usually be less than that). That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.

[0] Worst case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.

> I am running glxgears while submitting high priority IBs using amdgpu_test; the fps ranges from 6000 to 8000.

It's getting clear that artificial tests such as amdgpu_test don't suffice for evaluating the real-world impact of this kind of change.

> Continuous preemption and resubmission may cause the slow fps. Could you check how fast the trailing fence seqNo advances? On my side, the increment of "Last signaled trailing fence" is < 10 per second.

I had to go back to a kernel without this series, as it was just unusable. As this is my main machine, I don't know when I'll get a chance to check this.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On 2022-11-03 03:58, Zhu, Jiadong wrote:
> [AMD Official Use Only - General]
>
>> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.
>
> Hi Michel,
>
> Thanks for trying.
> Are there high priority jobs running while executing glxgears?

Yes, mutter is submitting high priority jobs. However, I don't think that can explain the problem by itself:

mutter only draws once per display refresh cycle. Let's assume mutter's GPU work takes ~6-7ms (a conservative example; it should usually be less than that). That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU work from glxgears & Xwayland can run without getting preempted. Since glxgears runs at ~7000 fps without this series, it should be able to draw at least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 of that.

[0] Worst case consideration, ignoring the fact that without this series, glxgears runs at ~7000 fps while mutter sustains 60 fps.

> I am running glxgears while submitting high priority IBs using amdgpu_test; the fps ranges from 6000 to 8000.

It's getting clear that artificial tests such as amdgpu_test don't suffice for evaluating the real-world impact of this kind of change.

> Continuous preemption and resubmission may cause the slow fps. Could you check how fast the trailing fence seqNo advances? On my side, the increment of "Last signaled trailing fence" is < 10 per second.

I had to go back to a kernel without this series, as it was just unusable. As this is my main machine, I don't know when I'll get a chance to check this.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[AMD Official Use Only - General]

> The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.

Hi Michel,

Thanks for trying.
Are there high priority jobs running while executing glxgears? I am running glxgears while submitting high priority IBs using amdgpu_test; the fps ranges from 6000 to 8000.

Continuous preemption and resubmission may cause the slow fps. Could you check how fast the trailing fence seqNo advances? On my side, the increment of "Last signaled trailing fence" is < 10 per second.

cat /sys/kernel/debug/dri/0/amdgpu_fence_info
--- ring 0 (gfx) ---
Last signaled fence          0x0001
Last emitted                 0x0001
Last signaled trailing fence 0x013c
Last emitted                 0x013c
Last preempted               0x

Thanks,
Jiadong

-----Original Message-----
From: Michel Dänzer
Sent: Wednesday, November 2, 2022 7:26 PM
To: Zhu, Jiadong
Cc: Tuikov, Luben; Huang, Ray; Koenig, Christian; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

[ Dropping Andrey's no longer working address from Cc ]

On 2022-11-01 11:09, Michel Dänzer wrote:
> On 2022-11-01 10:58, Zhu, Jiadong wrote:
>>
>>> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends up dereferencing a NULL pointer.
>>
>> That's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx, not gfx_v9_0_sw_ring_funcs_gfx:
>>
>> [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
>> @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
>>  .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
>>  .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
>>  .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
>> + .preempt_ib = gfx_v9_0_ring_preempt_ib,
>>  .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
>>  .emit_wreg = gfx_v9_0_ring_emit_wreg,
>>  .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
>
> Ah! Looks like stg applied patch 3 incorrectly for me. :(
>
> I'll try and test with this fixed this week, and report back.

I'm now running with patch 3 applied correctly, and with patch 5 as well.

The good news is that I'm now seeing a positive effect with GpuTest benchmarks which are GPU-limited at low frame rates. In particular, with the pixmark piano benchmark, the GNOME Wayland session now actually stays more responsive on this machine than it does on my work laptop with an Intel iGPU. However, with the plot3d benchmark (with /plot3d_vertex_density=1750 on the command line to increase GPU load), it still doesn't quite manage to keep the desktop running at full frame rate, in contrast to the Intel iGPU.

The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.

In summary, while the benefits are promising, the downsides are unacceptable for enabling this by default.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[ Dropping Andrey's no longer working address from Cc ]

On 2022-11-01 11:09, Michel Dänzer wrote:
> On 2022-11-01 10:58, Zhu, Jiadong wrote:
>>
>>> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends up dereferencing a NULL pointer.
>>
>> That's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx, not gfx_v9_0_sw_ring_funcs_gfx:
>>
>> [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
>> @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
>>  .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
>>  .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
>>  .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
>> + .preempt_ib = gfx_v9_0_ring_preempt_ib,
>>  .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
>>  .emit_wreg = gfx_v9_0_ring_emit_wreg,
>>  .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
>
> Ah! Looks like stg applied patch 3 incorrectly for me. :(
>
> I'll try and test with this fixed this week, and report back.

I'm now running with patch 3 applied correctly, and with patch 5 as well.

The good news is that I'm now seeing a positive effect with GpuTest benchmarks which are GPU-limited at low frame rates. In particular, with the pixmark piano benchmark, the GNOME Wayland session now actually stays more responsive on this machine than it does on my work laptop with an Intel iGPU. However, with the plot3d benchmark (with /plot3d_vertex_density=1750 on the command line to increase GPU load), it still doesn't quite manage to keep the desktop running at full frame rate, in contrast to the Intel iGPU.

The bad news is that this series still makes some things very slow. The most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, I can see it drawing line by line; before it was almost instantaneous). I suspect this series makes the overhead of running a single GPU job much bigger. On the bright side, I'm not noticing any significant intermittent freezes anymore.

In summary, while the benefits are promising, the downsides are unacceptable for enabling this by default.

-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On 2022-11-01 10:58, Zhu, Jiadong wrote: > >> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in >> gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt >> presumably uses the latter, which would explain why amdgpu_ring_preempt_ib >> ends up dereferencing a NULL pointer. > > It's weird the assignment should be in gfx_v9_0_ring_funcs_gfx instead of > gfx_v9_0_sw_ring_funcs_gfx. > > [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4): > @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs > gfx_v9_0_ring_funcs_gfx = { > .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl, > .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec, > .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec, > + .preempt_ib = gfx_v9_0_ring_preempt_ib, > .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl, > .emit_wreg = gfx_v9_0_ring_emit_wreg, > .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, > diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h > b/drivers/gpu/drm/amd/amdgpu/soc15d.h Ah! Looks like stg applied patch 3 incorrectly for me. :( I'll try and test with this fixed this week, and report back. -- Earthling Michel Dänzer| https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[AMD Official Use Only - General] >Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in >gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt >presumably uses the latter, which would explain why amdgpu_ring_preempt_ib >ends up dereferencing a NULL pointer. It's weird the assignment should be in gfx_v9_0_ring_funcs_gfx instead of gfx_v9_0_sw_ring_funcs_gfx. [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4): @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = { .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl, .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec, .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec, + .preempt_ib = gfx_v9_0_ring_preempt_ib, .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl, .emit_wreg = gfx_v9_0_ring_emit_wreg, .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h Thanks, Jiadong -Original Message- From: Michel Dänzer Sent: Tuesday, November 1, 2022 5:11 PM To: Zhu, Jiadong Cc: amd-gfx@lists.freedesktop.org; Andrey Grodzovsky ; Huang, Ray ; Koenig, Christian ; Tuikov, Luben Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8) [ Please don't top-post ] On 2022-11-01 02:04, Zhu, Jiadong wrote: > > It is a macro defined in amdgpu_ring.h, calling function pointer preempt_ib > directly: > #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r) > > The real ring's hook is assigned in gfx_v9_0.c: > static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = { ... > .preempt_ib = gfx_v9_0_ring_preempt_ib, //please have a check this line > existed. > ... > } Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends up dereferencing a NULL pointer. 
> Here is the eglinfo on my board, I have once built mesa component > (22.3.0-devel), which may update the libEGL_mesa.so Looks like EGL_IMG_context_priority is supported (it would be interesting to see if it's listed for the Device Platform as well, but there's probably no reason to assume otherwise). Strange that you can't reproduce then. -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[ Please don't top-post ] On 2022-11-01 02:04, Zhu, Jiadong wrote: > > It is a macro defined in amdgpu_ring.h, calling function pointer preempt_ib > directly: > #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r) > > The real ring's hook is assigned in gfx_v9_0.c: > static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = { > ... > .preempt_ib = gfx_v9_0_ring_preempt_ib, //please have a check this line > existed. > ... > } Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends up dereferencing a NULL pointer. > Here is the eglinfo on my board, I have once built mesa component > (22.3.0-devel), which may update the libEGL_mesa.so Looks like EGL_IMG_context_priority is supported (it would be interesting to see if it's listed for the Device Platform as well, but there's probably no reason to assume otherwise). Strange that you can't reproduce then. -- Earthling Michel Dänzer| https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
[AMD Official Use Only - General] It is a macro defined in amdgpu_ring.h, calling function pointer preempt_ib directly: #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r) The real ring's hook is assigned in gfx_v9_0.c: static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = { ... .preempt_ib = gfx_v9_0_ring_preempt_ib, //please have a check this line existed. ... } Here is the eglinfo on my board, I have once built mesa component (22.3.0-devel), which may update the libEGL_mesa.so EGL client extensions string: EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses EGL_EXT_client_extensions EGL_KHR_debug EGL_EXT_platform_device EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_EXT_platform_xcb EGL_MESA_platform_gbm EGL_KHR_platform_gbm EGL_MESA_platform_surfaceless GBM platform: EGL API version: 1.5 EGL vendor string: Mesa Project EGL version string: 1.5 EGL client APIs: OpenGL OpenGL_ES EGL extensions string: EGL_ANDROID_blob_cache EGL_ANDROID_native_fence_sync EGL_EXT_buffer_age EGL_EXT_create_context_robustness EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers EGL_EXT_protected_surface EGL_IMG_context_priority EGL_KHR_cl_event2 EGL_KHR_config_attribs EGL_KHR_context_flush_control EGL_KHR_create_context EGL_KHR_create_context_no_error EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image EGL_MESA_image_dma_buf_export EGL_MESA_query_driver Thanks, Jiadong -Original Message- From: Michel Dänzer Sent: Monday, October 31, 2022 8:02 PM To: Zhu, Jiadong Cc: Andrey Grodzovsky ; Huang, Ray ; Tuikov, Luben ; Koenig, 
Christian ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8) On 2022-10-18 11:08, jiadong@amd.com wrote: > From: "Jiadong.Zhu" > > Trigger Mid-Command Buffer Preemption according to the priority of the > software rings and the hw fence signalling condition. > > The muxer saves the locations of the indirect buffer frames from the > software ring together with the fence sequence number in its fifo > queue, and pops out those records when the fences are signalled. The > locations are used to resubmit packages in preemption scenarios by copying the > chunks from the software ring. > > v2: Update comment style. > v3: Fix conflict caused by previous modifications. > v4: Remove unnecessary prints. > v5: Fix corner cases for resubmission cases. > v6: Refactor functions for resubmission, calling fence_process in irq handler. > v7: Solve conflict for removing amdgpu_sw_ring.c. > v8: Add time threshold to judge if preemption request is needed. > > Cc: Christian Koenig > Cc: Luben Tuikov > Cc: Andrey Grodzovsky > Cc: Michel Dänzer > Signed-off-by: Jiadong.Zhu > Acked-by: Luben Tuikov > Acked-by: Huang Rui [...] > +/* Trigger Mid-Command Buffer Preemption (MCBP) and find if we need > +to resubmit. */ int amdgpu_mcbp_trigger_preempt(struct > +amdgpu_ring_mux *mux) { > + int r; > + > + spin_lock(&mux->lock); > + mux->pending_trailing_fence_signaled = true; > + r = amdgpu_ring_preempt_ib(mux->real_ring); > + spin_unlock(&mux->lock); > + return r; > +} AFAICT amdgpu_mcbp_trigger_preempt is used only in this file, so it should be declared static.
(I didn't audit the other new functions added by this series for this, just happened to notice this one) -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and Xwayland developer
Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)
On 2022-10-18 11:08, jiadong@amd.com wrote: > From: "Jiadong.Zhu" > > Trigger Mid-Command Buffer Preemption according to the priority of the > software > rings and the hw fence signalling condition. > > The muxer saves the locations of the indirect buffer frames from the software > ring together with the fence sequence number in its fifo queue, and pops out > those records when the fences are signalled. The locations are used to > resubmit > packages in preemption scenarios by copying the chunks from the software ring. > > v2: Update comment style. > v3: Fix conflict caused by previous modifications. > v4: Remove unnecessary prints. > v5: Fix corner cases for resubmission cases. > v6: Refactor functions for resubmission, calling fence_process in irq handler. > v7: Solve conflict for removing amdgpu_sw_ring.c. > v8: Add time threshold to judge if preemption request is needed. > > Cc: Christian Koenig > Cc: Luben Tuikov > Cc: Andrey Grodzovsky > Cc: Michel Dänzer > Signed-off-by: Jiadong.Zhu > Acked-by: Luben Tuikov > Acked-by: Huang Rui [...] > +/* Trigger Mid-Command Buffer Preemption (MCBP) and find if we need to > resubmit. */ > +int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux) > +{ > + int r; > + > + spin_lock(&mux->lock); > + mux->pending_trailing_fence_signaled = true; > + r = amdgpu_ring_preempt_ib(mux->real_ring); > + spin_unlock(&mux->lock); > + return r; > +} AFAICT amdgpu_mcbp_trigger_preempt is used only in this file, so it should be declared static. (I didn't audit the other new functions added by this series for this, just happened to notice this one) -- Earthling Michel Dänzer | https://redhat.com Libre software enthusiast | Mesa and Xwayland developer