RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-16 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

I haven't reproduced the hang yet, but I found a race condition related to 
fence signaling time. I have updated the patch series based on kernel 5.18.

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Tuesday, November 15, 2022 1:15 AM
To: Zhu, Jiadong 
Cc: Tuikov, Luben ; Huang, Ray ; 
Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-10 18:00, Michel Dänzer wrote:
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
>>
>> I reproduced the glxgears 400fps scenario locally. The issue is caused by 
>> the patch5 "drm/amdgpu: Improve the software rings priority scheduler" which 
>> slows down the low priority scheduler thread if high priority ib is under 
>> executing. I'll drop this patch as we cannot identify gpu bound according to 
>> the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent 
> freezes.

I'm afraid I may have run into another issue. I just hit a GPU hang, see the 
journalctl excerpt below.

(I tried rebooting the machine via SSH after this, but it never seemed to 
complete, so I had to hard-power-off the machine by holding the power button 
for a few seconds)

I can't be sure that the GPU hang is directly related to this series, but it 
seems plausible, and I hadn't hit a GPU hang in months if not over a year 
before. If this series results in potentially hitting a GPU hang every few 
days, it definitely doesn't provide enough benefit to justify that.


Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166051, emitted seq=1166052
Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: free PSP TMR buffer
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: MODE2 reset
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 14 17:21:22 thor kernel: [drm] PCIE GART of 1024M enabled.
Nov 14 17:21:22 thor kernel: [drm] PTB located at 0x00F400A0
Nov 14 17:21:22 thor kernel: [drm] VRAM is lost due to GPU reset!
Nov 14 17:21:22 thor kernel: [drm] PSP is resuming...
Nov 14 17:21:22 thor kernel: [drm] reserve 0x40 from 0xf431c0 for PSP TMR
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The process will be terminated.
Nov 14 17:21:23 thor kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 14 17:21:23 thor kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Nov 14 17:21:23 thor kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block  failed -110
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset(2) failed
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset end with ret = -110
Nov 14 17:21:23 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
[...]
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166052, emitted seq=1166052
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:33 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-14 Thread Michel Dänzer
On 2022-11-10 18:00, Michel Dänzer wrote:
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
>>
>> I reproduced the glxgears 400fps scenario locally. The issue is caused by 
>> the patch5 "drm/amdgpu: Improve the software rings priority scheduler" which 
>> slows down the low priority scheduler thread if high priority ib is under 
>> executing. I'll drop this patch as we cannot identify gpu bound according to 
>> the unsignaled fence, etc.
> 
> Okay, I'm testing with patches 1-4 only now.
> 
> So far I haven't noticed any negative effects, no slowdowns or intermittent 
> freezes.

I'm afraid I may have run into another issue. I just hit a GPU hang, see the
journalctl excerpt below.

(I tried rebooting the machine via SSH after this, but it never seemed to
complete, so I had to hard-power-off the machine by holding the power
button for a few seconds)

I can't be sure that the GPU hang is directly related to this series,
but it seems plausible, and I hadn't hit a GPU hang in months if not
over a year before. If this series results in potentially hitting a
GPU hang every few days, it definitely doesn't provide enough benefit
to justify that.


Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring 
gfx_high timeout, signaled seq=1166051, emitted seq=1166052
Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process 
information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: free PSP TMR buffer
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: MODE2 reset
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset succeeded, 
trying to resume
Nov 14 17:21:22 thor kernel: [drm] PCIE GART of 1024M enabled.
Nov 14 17:21:22 thor kernel: [drm] PTB located at 0x00F400A0
Nov 14 17:21:22 thor kernel: [drm] VRAM is lost due to GPU reset!
Nov 14 17:21:22 thor kernel: [drm] PSP is resuming...
Nov 14 17:21:22 thor kernel: [drm] reserve 0x40 from 0xf431c0 for PSP 
TMR
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAS: optional ras ta 
ucode is not available
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAP: optional rap ta 
ucode is not available
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The CS has been rejected 
(-125), but the context isn't robust.
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The process will be terminated.
Nov 14 17:21:23 thor kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: [drm:amdgpu_ring_test_helper 
[amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 14 17:21:23 thor kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* 
KCQ enable failed
Nov 14 17:21:23 thor kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] 
*ERROR* resume of IP block  failed -110
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset(2) failed
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset end with 
ret = -110
Nov 14 17:21:23 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU 
Recovery Failed: -110
[...]
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring 
gfx_high timeout, signaled seq=1166052, emitted seq=1166052
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process 
information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:33 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-10 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

It is true that we don't get an obvious performance improvement with these 
patches.
The original motivation for using MCBP is that, when a very long low-priority 
IB package with many draw commands is using up the GPU, we give high-priority 
IBs a chance to be executed.
Total performance can drop, as MCBP drains the pipe and the low-priority IBs 
have to be resubmitted afterwards.

This patch set mainly implements priority queues via software rings. We may 
use a method other than MCBP to improve this later.
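For context on the resubmission bookkeeping: per the commit message of patch 4, the muxer saves the location of each indirect-buffer frame on the software ring together with its fence seqno in a FIFO, pops records once the fence signals, and copies the still-unsignaled chunks to the real ring after a preemption. The following is a toy sketch of that idea, not the actual amdgpu_ring_mux code (names, sizes, and layout are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define MUX_FIFO_SIZE 8

/* One saved IB frame: its location on the software ring plus the
 * fence seqno emitted for it. */
struct mux_chunk {
	uint64_t start, end;   /* frame location on the software ring */
	uint32_t fence_seq;    /* fence seqno emitted for this frame */
};

struct mux_fifo {
	struct mux_chunk entries[MUX_FIFO_SIZE];
	unsigned int head, tail;   /* pop at head, push at tail */
};

/* Record a newly submitted frame. */
static void mux_push(struct mux_fifo *f, uint64_t start, uint64_t end,
		     uint32_t seq)
{
	struct mux_chunk c = { start, end, seq };

	f->entries[f->tail % MUX_FIFO_SIZE] = c;
	f->tail++;
}

/* Drop all records whose fence has signaled; the count returned is how
 * many frames would still need to be copied to the real ring after a
 * mid-command-buffer preemption. */
static unsigned int mux_pop_signaled(struct mux_fifo *f,
				     uint32_t last_signaled)
{
	while (f->head != f->tail &&
	       f->entries[f->head % MUX_FIFO_SIZE].fence_seq <= last_signaled)
		f->head++;
	return f->tail - f->head;
}
```

The real driver additionally tracks ring write pointers and updates fence start timestamps on resubmission; this sketch only models the push/pop lifecycle.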

Thanks,
Jiadong

-Original Message-
From: Alex Deucher 
Sent: Friday, November 11, 2022 1:54 AM
To: Michel Dänzer 
Cc: Zhu, Jiadong ; Tuikov, Luben ; 
Huang, Ray ; Koenig, Christian ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On Thu, Nov 10, 2022 at 12:00 PM Michel Dänzer  wrote:
>
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
> > From: Michel Dänzer
>
> >
> >>>> The bad news is that this series still makes some things very slow. The 
> >>>> most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 
> >>>> fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page 
> >>>> now takes ~1 second, I can see it drawing line by line; before it was 
> >>>> almost instantaneous). I suspect this series makes the overhead of 
> >>>> running a single GPU job much bigger. On the bright side, I'm not 
> >>>> noticing any significant intermittent freezes anymore.
> >>>
> >>> Hi Michel,
> >>>
> >>> Thanks for trying it.
> >>> Are there high priority jobs running while executing glxgears?
> >>
> >> Yes, mutter is submitting high priority jobs. However, I don't think that 
> >> can explain the problem by itself:
> >>
> >> mutter only draws once per display refresh cycle. Let's assume mutter's 
> >> GPU work takes ~6-7ms (conservative example, should be less than that 
> >> usually). That leaves ~10ms per display refresh cycle (at 60 Hz refresh 
> >> rate) where GPU work from glxgears & Xwayland can run without getting 
> >> preempted. Since glxgears runs at ~7000 fps without this series, it should 
> >> be able to draw at least ~70 frames in 10ms[0], which corresponds to over 
> >> 4000 fps. Yet it manages only 1/10 of that.
> >>
> >> [0] Worst case consideration, ignoring the fact that without this series, 
> >> glxgears runs at ~7000 fps while mutter sustains 60 fps.
> >
> > I reproduced the glxgears 400fps scenario locally. The issue is caused by 
> > the patch5 "drm/amdgpu: Improve the software rings priority scheduler" 
> > which slows down the low priority scheduler thread if high priority ib is 
> > under executing. I'll drop this patch as we cannot identify gpu bound 
> > according to the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent 
> freezes.
>
> The only issue is that there's hardly any positive effect either. While 
> constantly moving the window of a GPU-limited GpuTest benchmark in circles, 
> most of the time it looks exactly the same as without these patches. Only 
> occasionally, at most every few seconds, I notice that the window movement 
> becomes smoother for an instant.
>

I think it will largely depend on the workload.  The gfx pipe can only be 
preempted on draw boundaries so if most operations are a single draw, you 
probably won't see much difference.

Alex


Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-10 Thread Luben Tuikov
On 2022-10-18 05:08, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
> 
> Trigger Mid-Command Buffer Preemption according to the priority of the 
> software
> rings and the hw fence signalling condition.
> 
> The muxer saves the locations of the indirect buffer frames from the software
> ring together with the fence sequence number in its fifo queue, and pops out
> those records when the fences are signalled. The locations are used to 
> resubmit
> packages in preemption scenarios by copying the chunks from the software ring.
> 
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
> v7: Solve conflict for removing amdgpu_sw_ring.c.
> v8: Add time threshold to judge if preemption request is needed.
> 
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Cc: Michel Dänzer 
> Signed-off-by: Jiadong.Zhu 
> Acked-by: Luben Tuikov 
> Acked-by: Huang Rui 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c|  53 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   6 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 353 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  29 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|   7 +-
>  8 files changed, 420 insertions(+), 44 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 790f7bfdc654..470448bc1ebb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -54,6 +54,7 @@ struct amdgpu_fence {
>  
>   /* RB, DMA, etc. */
>   struct amdgpu_ring  *ring;
> + ktime_t start_timestamp;
>  };
>  
>  static struct kmem_cache *amdgpu_fence_slab;
> @@ -199,6 +200,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring, struct 
> dma_fence **f, struct amd
>   rcu_assign_pointer(*ptr, dma_fence_get(fence));
>  
>   *f = fence;
> + to_amdgpu_fence(fence)->start_timestamp = ktime_get();
>  
>   return 0;
>  }
> @@ -400,6 +402,57 @@ unsigned amdgpu_fence_count_emitted(struct amdgpu_ring 
> *ring)
>   return lower_32_bits(emitted);
>  }
>  
> +/**
> + * amdgpu_fence_last_unsignaled_time_us - the time fence emited till now

Spelling: emitted until

> + * @ring: ring the fence is associated with
> + *
> + * Find the earlist fence unsignaled till now, calculate the time delta

Spelling: earliest fence unsignalled until

> + * between the time fence emitted and now.
> + */
> +u64 amdgpu_fence_last_unsignaled_time_us(struct amdgpu_ring *ring)
> +{
> + struct amdgpu_fence_driver *drv = >fence_drv;
> + struct dma_fence *fence;
> + uint32_t last_seq, sync_seq;
> +
> + last_seq = atomic_read(>fence_drv.last_seq);
> + sync_seq = READ_ONCE(ring->fence_drv.sync_seq);
> + if (last_seq == sync_seq)
> + return 0;
> +
> + ++last_seq;
> + last_seq &= drv->num_fences_mask;
> + fence = drv->fences[last_seq];
> + if (!fence)
> + return 0;
> +
> + return ktime_us_delta(ktime_get(),
> + to_amdgpu_fence(fence)->start_timestamp);
> +}
> +
> +/**
> + * amdgpu_fence_update_start_timestamp - update the timestamp of the fence
> + * @ring: ring the fence is associated with
> + * @seq: the fence seq number to update.
> + * @timestamp: the start timestamp to update.
> + *
> + * The function called at the time the fence and related ib is about to
> + * resubmit to gpu in MCBP scenario. Thus we do not consider race condition
> + * with amdgpu_fence_process to modify the same fence.
> + */
> +void amdgpu_fence_update_start_timestamp(struct amdgpu_ring *ring, uint32_t 
> seq, ktime_t timestamp)
> +{
> + struct amdgpu_fence_driver *drv = >fence_drv;
> + struct dma_fence *fence;
> +
> + seq &= drv->num_fences_mask;
> + fence = drv->fences[seq];
> + if (!fence)
> + return;
> +
> + to_amdgpu_fence(fence)->start_timestamp = timestamp;
> +}
> +
>  /**
>   * amdgpu_fence_driver_start_ring - make the fence driver
>   * ready for use on the requested ring.
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   }
>  
> + amdgpu_ring_ib_begin(ring);
>   if (job && ring->funcs->init_cond_exec)
>   patch_offset = amdgpu_ring_init_cond_exec(ring);
>  
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring 

Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-10 Thread Alex Deucher
On Thu, Nov 10, 2022 at 12:00 PM Michel Dänzer  wrote:
>
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
> > From: Michel Dänzer
>
> >
> >>>> The bad news is that this series still makes some things very slow. The 
> >>>> most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 
> >>>> fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page 
> >>>> now takes ~1 second, I can see it drawing line by line; before it was 
> >>>> almost instantaneous). I suspect this series makes the overhead of 
> >>>> running a single GPU job much bigger. On the bright side, I'm not 
> >>>> noticing any significant intermittent freezes anymore.
> >>>
> >>> Hi Michel,
> >>>
> >>> Thanks for trying it.
> >>> Are there high priority jobs running while executing glxgears?
> >>
> >> Yes, mutter is submitting high priority jobs. However, I don't think that 
> >> can explain the problem by itself:
> >>
> >> mutter only draws once per display refresh cycle. Let's assume mutter's 
> >> GPU work takes ~6-7ms (conservative example, should be less than that 
> >> usually). That leaves ~10ms per display refresh cycle (at 60 Hz refresh 
> >> rate) where GPU work from glxgears & Xwayland can run without getting 
> >> preempted. Since glxgears runs at ~7000 fps without this series, it should 
> >> be able to draw at least ~70 frames in 10ms[0], which corresponds to over 
> >> 4000 fps. Yet it manages only 1/10 of that.
> >>
> >> [0] Worst case consideration, ignoring the fact that without this series, 
> >> glxgears runs at ~7000 fps while mutter sustains 60 fps.
> >
> > I reproduced the glxgears 400fps scenario locally. The issue is caused by 
> > the patch5 "drm/amdgpu: Improve the software rings priority scheduler" 
> > which slows down the low priority scheduler thread if high priority ib is 
> > under executing. I'll drop this patch as we cannot identify gpu bound 
> > according to the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent 
> freezes.
>
> The only issue is that there's hardly any positive effect either. While 
> constantly moving the window of a GPU-limited GpuTest benchmark in circles, 
> most of the time it looks exactly the same as without these patches. Only 
> occasionally, at most every few seconds, I notice that the window movement 
> becomes smoother for an instant.
>

I think it will largely depend on the workload.  The gfx pipe can only
be preempted on draw boundaries so if most operations are a single
draw, you probably won't see much difference.

Alex


Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-10 Thread Michel Dänzer
On 2022-11-08 09:01, Zhu, Jiadong wrote:
> From: Michel Dänzer

>
>>>> The bad news is that this series still makes some things very slow. The 
>>>> most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps 
>>>> before, i.e. almost 20x slowdown) and hexchat (scrolling one page now 
>>>> takes ~1 second, I can see it drawing line by line; before it was almost 
>>>> instantaneous). I suspect this series makes the overhead of running a 
>>>> single GPU job much bigger. On the bright side, I'm not noticing any 
>>>> significant intermittent freezes anymore.
>>>
>>> Hi Michel,
>>>
>>> Thanks for trying it.
>>> Are there high priority jobs running while executing glxgears?
>> 
>> Yes, mutter is submitting high priority jobs. However, I don't think that 
>> can explain the problem by itself:
>> 
>> mutter only draws once per display refresh cycle. Let's assume mutter's GPU 
>> work takes ~6-7ms (conservative example, should be less than that usually). 
>> That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where 
>> GPU work from glxgears & Xwayland can run without getting preempted. Since 
>> glxgears runs at ~7000 fps without this series, it should be able to draw at 
>> least ~70 frames in 10ms[0], which corresponds to over 4000 fps. Yet it 
>> manages only 1/10 of that.
>> 
>> [0] Worst case consideration, ignoring the fact that without this series, 
>> glxgears runs at ~7000 fps while mutter sustains 60 fps.
>
> I reproduced the glxgears 400fps scenario locally. The issue is caused by the 
> patch5 "drm/amdgpu: Improve the software rings priority scheduler" which 
> slows down the low priority scheduler thread if high priority ib is under 
> executing. I'll drop this patch as we cannot identify gpu bound according to 
> the unsignaled fence, etc.

Okay, I'm testing with patches 1-4 only now.

So far I haven't noticed any negative effects, no slowdowns or intermittent 
freezes.

The only issue is that there's hardly any positive effect either. While 
constantly moving the window of a GPU-limited GpuTest benchmark in circles, 
most of the time it looks exactly the same as without these patches. Only 
occasionally, at most every few seconds, I notice that the window movement 
becomes smoother for an instant.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer




RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-08 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

I reproduced the glxgears 400fps scenario locally. The issue is caused by the 
patch5 "drm/amdgpu: Improve the software rings priority scheduler" which slows 
down the low priority scheduler thread if high priority ib is under executing. 
I'll drop this patch as we cannot identify gpu bound according to the 
unsignaled fence, etc.

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Thursday, November 3, 2022 5:05 PM
To: Zhu, Jiadong 
Cc: Tuikov, Luben ; Huang, Ray ; 
Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-03 03:58, Zhu, Jiadong wrote:
> [AMD Official Use Only - General]
>
>> The bad news is that this series still makes some things very slow. The most 
>> extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps 
>> before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes 
>> ~1 second, I can see it drawing line by line; before it was almost 
>> instantaneous). I suspect this series makes the overhead of running a single 
>> GPU job much bigger. On the bright side, I'm not noticing any significant 
>> intermittent freezes anymore.
>
> Hi Michel,
>
> Thanks for trying it.
> Are there high priority jobs running while executing glxgears?

Yes, mutter is submitting high priority jobs. However, I don't think that can 
explain the problem by itself:

mutter only draws once per display refresh cycle. Let's assume mutter's GPU 
work takes ~6-7ms (conservative example, should be less than that usually). 
That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU 
work from glxgears & Xwayland can run without getting preempted. Since glxgears 
runs at ~7000 fps without this series, it should be able to draw at least ~70 
frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 
of that.

[0] Worst case consideration, ignoring the fact that without this series, 
glxgears runs at ~7000 fps while mutter sustains 60 fps.
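To make the worst-case arithmetic explicit: at ~7000 fps, glxgears fits ~70 frames into the ~10 ms unpreempted window left in each 60 Hz refresh cycle, which corresponds to ~4200 fps sustained, i.e. the "over 4000 fps" figure above. A throwaway check (all inputs are the approximate numbers quoted in this thread, not measurements):

```c
#include <assert.h>

/* Frames a client rendering at `fps` fits into an unpreempted window
 * of `window_ms` milliseconds. */
static int frames_in_window(int fps, int window_ms)
{
	return fps * window_ms / 1000;
}

/* Sustained frame rate if that many frames are drawn once per display
 * refresh cycle. */
static int sustained_fps(int frames_per_cycle, int refresh_hz)
{
	return frames_per_cycle * refresh_hz;
}
```

With fps = 7000, window_ms = 10 and refresh_hz = 60, this gives 70 frames per cycle and 4200 fps sustained, roughly ten times the ~400 fps actually observed.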


> I am running glxgears while submitting high priority ibs using amdgpu_test, 
> the fps ranges from 6000~8000.

It's getting clear that artificial tests such as amdgpu_test don't suffice for 
evaluating the real-world impact of this kind of change.


> Continuous preemption and resubmission may cause the low fps. Could you check 
> how fast the trailing fence seqNo advances? On my side, the increment of the 
> Last signaled trailing fence is < 10 per second.

I had to go back to a kernel without this series, as it was just unusable. As 
this is my main machine, I don't know when I'll get a chance to check this.


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-03 Thread Michel Dänzer
On 2022-11-03 03:58, Zhu, Jiadong wrote:
> [AMD Official Use Only - General]
> 
>> The bad news is that this series still makes some things very slow. The most 
>> extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps 
>> before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes 
>> ~1 second, I can see it drawing line by line; before it was almost 
>> instantaneous). I suspect this series makes the overhead of running a single 
>> GPU job much bigger. On the bright side, I'm not noticing any significant 
>> intermittent freezes anymore.
> 
> Hi Michel,
> 
> Thanks for trying it.
> Are there high priority jobs running while executing glxgears?

Yes, mutter is submitting high priority jobs. However, I don't think that can 
explain the problem by itself:

mutter only draws once per display refresh cycle. Let's assume mutter's GPU 
work takes ~6-7ms (conservative example, should be less than that usually). 
That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU 
work from glxgears & Xwayland can run without getting preempted. Since glxgears 
runs at ~7000 fps without this series, it should be able to draw at least ~70 
frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 
of that.

[0] Worst case consideration, ignoring the fact that without this series, 
glxgears runs at ~7000 fps while mutter sustains 60 fps.


> I am running glxgears while submitting high priority ibs using amdgpu_test, 
> the fps ranges from 6000~8000.

It's getting clear that artificial tests such as amdgpu_test don't suffice for 
evaluating the real-world impact of this kind of change.


> Continuous preemption and resubmission may cause the low fps. Could you check 
> how fast the trailing fence seqNo advances? On my side, the increment of the 
> Last signaled trailing fence is < 10 per second.

I had to go back to a kernel without this series, as it was just unusable. As 
this is my main machine, I don't know when I'll get a chance to check this.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-02 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>The bad news is that this series still makes some things very slow. The most 
>extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, 
>i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, 
>I can see it drawing line by line; before it was almost instantaneous). I 
>suspect this series makes the overhead of running a single GPU job much 
>bigger. On the bright side, I'm not noticing any significant intermittent 
>freezes anymore.

Hi Michel,

Thanks for trying it.
Are there high priority jobs running while executing glxgears? I am running 
glxgears while submitting high priority ibs using amdgpu_test; the fps ranges 
from 6000 to 8000.

Continuous preemption and resubmission may cause the low fps. Could you check 
how fast the trailing fence seqNo advances? On my side, the increment of the 
Last signaled trailing fence is < 10 per second.


cat /sys/kernel/debug/dri/0/amdgpu_fence_info
--- ring 0 (gfx) ---
Last signaled fence  0x0001
Last emitted 0x0001
Last signaled trailing fence 0x013c
Last emitted 0x013c
Last preempted   0x
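One way to measure the per-second increment is to capture the amdgpu_fence_info debugfs text twice, about a second apart, and diff the two trailing-fence values. A small hypothetical parser for such a snapshot (the helper name is mine, not part of the driver):

```c
#include <stdio.h>
#include <string.h>

/* Extract the "Last signaled trailing fence" seqno from a captured
 * amdgpu_fence_info snapshot like the one shown above. Returns the
 * seqno, or 0 if the line is not present. Diffing the values from two
 * snapshots taken ~1 s apart gives the per-second increment. */
static unsigned int trailing_fence_seq(const char *snapshot)
{
	const char *key = "Last signaled trailing fence";
	const char *p = strstr(snapshot, key);
	unsigned int seq = 0;

	if (p && sscanf(p + strlen(key), " 0x%x", &seq) == 1)
		return seq;
	return 0;
}
```

Note this parses only the first ring's block; a multi-ring snapshot would need the `--- ring N ---` headers handled as well.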

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Wednesday, November 2, 2022 7:26 PM
To: Zhu, Jiadong 
Cc: Tuikov, Luben ; Huang, Ray ; 
Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)


[ Dropping Andrey's no longer working address from Cc ]

On 2022-11-01 11:09, Michel Dänzer wrote:
> On 2022-11-01 10:58, Zhu, Jiadong wrote:
>>
>>> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
>>> gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
>>> presumably uses the latter, which would explain why amdgpu_ring_preempt_ib 
>>> ends up dereferencing a NULL pointer.
>>
>> It's weird the assignment should be in gfx_v9_0_ring_funcs_gfx instead of 
>> gfx_v9_0_sw_ring_funcs_gfx.
>>
>> [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
>> @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs 
>> gfx_v9_0_ring_funcs_gfx = {
>> .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
>> .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
>> .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
>> +   .preempt_ib = gfx_v9_0_ring_preempt_ib,
>> .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
>> .emit_wreg = gfx_v9_0_ring_emit_wreg,
>> .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, diff --git
>> a/drivers/gpu/drm/amd/amdgpu/soc15d.h
>> b/drivers/gpu/drm/amd/amdgpu/soc15d.h
>
> Ah! Looks like stg applied patch 3 incorrectly for me. :(
>
> I'll try and test with this fixed this week, and report back.

I'm now running with patch 3 applied correctly, and with patch 5 as well.


The good news is that I'm now seeing a positive effect with GpuTest benchmarks 
which are GPU-limited at low frame rates. In particular, with the pixmark piano 
benchmark, the GNOME Wayland session now actually stays more responsive on this 
machine than it does on my work laptop with an Intel iGPU. However, with the 
plot3d benchmark (with /plot3d_vertex_density=1750 on the command line to 
increase GPU load), it still doesn't quite manage to keep the desktop running 
at full frame rate, in contrast to the Intel iGPU.

The bad news is that this series still makes some things very slow. The most 
extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, 
i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, 
I can see it drawing line by line; before it was almost instantaneous). I 
suspect this series makes the overhead of running a single GPU job much bigger. 
On the bright side, I'm not noticing any significant intermittent freezes 
anymore.


In summary, while the benefits are promising, the downsides are unacceptable 
for enabling this by default.


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-02 Thread Michel Dänzer


[ Dropping Andrey's no longer working address from Cc ]

On 2022-11-01 11:09, Michel Dänzer wrote:
> On 2022-11-01 10:58, Zhu, Jiadong wrote:
>>
>>> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
>>> gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
>>> presumably uses the latter, which would explain why amdgpu_ring_preempt_ib 
>>> ends up dereferencing a NULL pointer.
>>
>> It's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx instead of 
>> gfx_v9_0_sw_ring_funcs_gfx.
>>
>> [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
>> @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
>> .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
>> .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
>> .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
>> +   .preempt_ib = gfx_v9_0_ring_preempt_ib,
>> .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
>> .emit_wreg = gfx_v9_0_ring_emit_wreg,
>> .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h
> 
> Ah! Looks like stg applied patch 3 incorrectly for me. :(
> 
> I'll try and test with this fixed this week, and report back.

I'm now running with patch 3 applied correctly, and with patch 5 as well.


The good news is that I'm now seeing a positive effect with GpuTest benchmarks 
which are GPU-limited at low frame rates. In particular, with the pixmark piano 
benchmark, the GNOME Wayland session now actually stays more responsive on this 
machine than it does on my work laptop with an Intel iGPU. However, with the 
plot3d benchmark (with /plot3d_vertex_density=1750 on the command line to 
increase GPU load), it still doesn't quite manage to keep the desktop running 
at full frame rate, in contrast to the Intel iGPU.

The bad news is that this series still makes some things very slow. The most 
extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, 
i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, 
I can see it drawing line by line; before it was almost instantaneous). I 
suspect this series makes the overhead of running a single GPU job much bigger. 
On the bright side, I'm not noticing any significant intermittent freezes 
anymore.


In summary, while the benefits are promising, the downsides are unacceptable 
for enabling this by default.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-01 Thread Michel Dänzer
On 2022-11-01 10:58, Zhu, Jiadong wrote:
> 
>> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
>> gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
>> presumably uses the latter, which would explain why amdgpu_ring_preempt_ib 
>> ends up dereferencing a NULL pointer.
> 
> It's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx instead of 
> gfx_v9_0_sw_ring_funcs_gfx.
> 
> [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
> @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
> .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
> .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
> +   .preempt_ib = gfx_v9_0_ring_preempt_ib,
> .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
> .emit_wreg = gfx_v9_0_ring_emit_wreg,
> .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h

Ah! Looks like stg applied patch 3 incorrectly for me. :(

I'll try and test with this fixed this week, and report back.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-01 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
>gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
>presumably uses the latter, which would explain why amdgpu_ring_preempt_ib 
>ends up dereferencing a NULL pointer.

It's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx instead of 
gfx_v9_0_sw_ring_funcs_gfx.

[PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
@@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
.emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
.init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
.patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
+   .preempt_ib = gfx_v9_0_ring_preempt_ib,
.emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
.emit_wreg = gfx_v9_0_ring_emit_wreg,
.emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h b/drivers/gpu/drm/amd/amdgpu/soc15d.h

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Tuesday, November 1, 2022 5:11 PM
To: Zhu, Jiadong 
Cc: amd-gfx@lists.freedesktop.org; Andrey Grodzovsky 
; Huang, Ray ; Koenig, Christian 
; Tuikov, Luben 
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)


[ Please don't top-post ]


On 2022-11-01 02:04, Zhu, Jiadong wrote:
>
> It is a macro defined in amdgpu_ring.h, calling function pointer preempt_ib 
> directly:
> #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
>
> The real ring's hook is assigned in gfx_v9_0.c:
> static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = { ...
> .preempt_ib = gfx_v9_0_ring_preempt_ib, // please check that this line exists
> ...
> }

Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends 
up dereferencing a NULL pointer.


> Here is the eglinfo output on my board. I once built a Mesa component 
> (22.3.0-devel), which may have updated libEGL_mesa.so

Looks like EGL_IMG_context_priority is supported (it would be interesting to 
see if it's listed for the Device Platform as well, but there's probably no 
reason to assume otherwise). Strange that you can't reproduce then.


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer
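The NULL dereference described in this exchange can be sketched in a few lines of standalone C. Everything below is an illustrative model, not the driver's actual code: the struct and function names only mirror the amdgpu ones, and the literal -22 stands in for -EINVAL.

```c
#include <assert.h>
#include <stddef.h>

struct ring_funcs {
	int (*preempt_ib)(void *ring);  /* may be left unassigned, as in the mis-applied patch */
};

struct ring {
	const struct ring_funcs *funcs;
};

static int preempt_ib_impl(void *ring)
{
	(void)ring;
	return 0;  /* pretend the preemption request succeeded */
}

/* Table that assigns the hook (what gfx_v9_0_ring_funcs_gfx needs). */
static const struct ring_funcs funcs_with_hook = {
	.preempt_ib = preempt_ib_impl,
};

/* Table that leaves it NULL (the mis-applied-patch situation). */
static const struct ring_funcs funcs_without_hook = { NULL };

/* The real macro dereferences unconditionally:
 *     #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
 * so calling it on a ring whose funcs table lacks the hook crashes.
 * A guarded wrapper returns an error instead of dereferencing NULL. */
static int ring_preempt_ib(struct ring *r)
{
	if (!r->funcs->preempt_ib)
		return -22;  /* -EINVAL */
	return r->funcs->preempt_ib(r);
}
```

This mirrors why the preemption path oopsed: mux->real_ring uses gfx_v9_0_ring_funcs_gfx, so if only gfx_v9_0_sw_ring_funcs_gfx gets the preempt_ib assignment, the real ring's hook stays NULL.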



Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-01 Thread Michel Dänzer


[ Please don't top-post ]


On 2022-11-01 02:04, Zhu, Jiadong wrote:
> 
> It is a macro defined in amdgpu_ring.h, calling function pointer preempt_ib 
> directly:
> #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
> 
> The real ring's hook is assigned in gfx_v9_0.c:
> static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
> ...
> .preempt_ib = gfx_v9_0_ring_preempt_ib, // please check that this line exists
> ...
> }

Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends 
up dereferencing a NULL pointer.


> Here is the eglinfo output on my board. I once built a Mesa component 
> (22.3.0-devel), which may have updated libEGL_mesa.so

Looks like EGL_IMG_context_priority is supported (it would be interesting to 
see if it's listed for the Device Platform as well, but there's probably no 
reason to assume otherwise). Strange that you can't reproduce then.


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-10-31 Thread Zhu, Jiadong
[AMD Official Use Only - General]

It is a macro defined in amdgpu_ring.h, calling function pointer preempt_ib 
directly:
#define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)

The real ring's hook is assigned in gfx_v9_0.c:
static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
...
.preempt_ib = gfx_v9_0_ring_preempt_ib, // please check that this line exists
...
}

Here is the eglinfo output on my board. I once built a Mesa component 
(22.3.0-devel), which may have updated libEGL_mesa.so:
EGL client extensions string:
EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query
EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses
EGL_EXT_client_extensions EGL_KHR_debug EGL_EXT_platform_device
EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_EXT_platform_xcb
EGL_MESA_platform_gbm EGL_KHR_platform_gbm
EGL_MESA_platform_surfaceless

GBM platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
EGL extensions string:
EGL_ANDROID_blob_cache EGL_ANDROID_native_fence_sync
EGL_EXT_buffer_age EGL_EXT_create_context_robustness
EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers
EGL_EXT_protected_surface EGL_IMG_context_priority EGL_KHR_cl_event2
EGL_KHR_config_attribs EGL_KHR_context_flush_control
EGL_KHR_create_context EGL_KHR_create_context_no_error
EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses
EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image
EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image
EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base
EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync
EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float
EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image
EGL_MESA_image_dma_buf_export EGL_MESA_query_driver

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Monday, October 31, 2022 8:02 PM
To: Zhu, Jiadong 
Cc: Andrey Grodzovsky ; Huang, Ray 
; Tuikov, Luben ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-10-18 11:08, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> Trigger Mid-Command Buffer Preemption according to the priority of the
> software rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the
> software ring together with the fence sequence number in its fifo
> queue, and pops out those records when the fences are signalled. The
> locations are used to resubmit packages in preemption scenarios by copying the 
> chunks from the software ring.
>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
> v7: Solve conflict for removing amdgpu_sw_ring.c.
> v8: Add time threshold to judge if preemption request is needed.
>
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Cc: Michel Dänzer 
> Signed-off-by: Jiadong.Zhu 
> Acked-by: Luben Tuikov 
> Acked-by: Huang Rui 

[...]

> +/* Trigger Mid-Command Buffer Preemption (MCBP) and find if we need to resubmit. */
> +int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
> +{
> + int r;
> +
> + spin_lock(&mux->lock);
> + mux->pending_trailing_fence_signaled = true;
> + r = amdgpu_ring_preempt_ib(mux->real_ring);
> + spin_unlock(&mux->lock);
> + return r;
> +}

AFAICT amdgpu_mcbp_trigger_preempt is used only in this file, so it should be 
declared static.

(I didn't audit the other new functions added by this series for this, just 
happened to notice this one)


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-10-31 Thread Michel Dänzer
On 2022-10-18 11:08, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
> 
> Trigger Mid-Command Buffer Preemption according to the priority of the 
> software
> rings and the hw fence signalling condition.
> 
> The muxer saves the locations of the indirect buffer frames from the software
> ring together with the fence sequence number in its fifo queue, and pops out
> those records when the fences are signalled. The locations are used to 
> resubmit
> packages in preemption scenarios by copying the chunks from the software ring.
> 
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
> v7: Solve conflict for removing amdgpu_sw_ring.c.
> v8: Add time threshold to judge if preemption request is needed.
> 
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Cc: Michel Dänzer 
> Signed-off-by: Jiadong.Zhu 
> Acked-by: Luben Tuikov 
> Acked-by: Huang Rui 
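For reference, the fifo bookkeeping the commit message describes (record each software-ring IB frame's location with its fence sequence number, drop records once the fence signals) can be modeled as a small standalone sketch. All names and sizes here are illustrative, not the driver's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

#define MUX_FIFO_SIZE 8

struct mux_chunk {
	uint64_t sync_seq;  /* hw fence seq the chunk completes with */
	uint64_t start;     /* IB frame location in the software ring */
	uint64_t end;
};

struct mux_fifo {
	struct mux_chunk e[MUX_FIFO_SIZE];
	unsigned head, tail;  /* pop at head, push at tail */
};

/* Record a submitted IB frame together with its fence sequence number. */
static int fifo_push(struct mux_fifo *f, uint64_t seq, uint64_t start, uint64_t end)
{
	unsigned next = (f->tail + 1) % MUX_FIFO_SIZE;

	if (next == f->head)
		return -1;  /* full */
	f->e[f->tail] = (struct mux_chunk){ seq, start, end };
	f->tail = next;
	return 0;
}

/* Drop every record whose fence has signalled; the chunks still queued
 * afterwards are the ones that would be copied back from the software
 * ring when resubmitting after a preemption. */
static unsigned fifo_pop_signaled(struct mux_fifo *f, uint64_t last_signaled_seq)
{
	unsigned dropped = 0;

	while (f->head != f->tail && f->e[f->head].sync_seq <= last_signaled_seq) {
		f->head = (f->head + 1) % MUX_FIFO_SIZE;
		dropped++;
	}
	return dropped;
}
```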

[...]

> +/* Trigger Mid-Command Buffer Preemption (MCBP) and find if we need to resubmit. */
> +int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
> +{
> + int r;
> +
> + spin_lock(&mux->lock);
> + mux->pending_trailing_fence_signaled = true;
> + r = amdgpu_ring_preempt_ib(mux->real_ring);
> + spin_unlock(&mux->lock);
> + return r;
> +}

AFAICT amdgpu_mcbp_trigger_preempt is used only in this file, so it should be 
declared static.

(I didn't audit the other new functions added by this series for this, just 
happened to notice this one)


-- 
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer
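The shape of the change Michel suggests can be sketched in isolation as follows. This is a hedged, standalone model: the toy int `lock` merely stands in for the kernel spinlock, `preempt_real_ring` stands in for `amdgpu_ring_preempt_ib(mux->real_ring)`, and all names are illustrative rather than the driver's real API.

```c
#include <assert.h>
#include <stdbool.h>

struct ring_mux {
	int lock;  /* toy stand-in for spinlock_t */
	bool pending_trailing_fence_signaled;
};

static void mux_lock(struct ring_mux *mux)   { assert(mux->lock == 0); mux->lock = 1; }
static void mux_unlock(struct ring_mux *mux) { assert(mux->lock == 1); mux->lock = 0; }

static int preempt_real_ring(struct ring_mux *mux)
{
	(void)mux;
	return 0;  /* pretend the preemption request succeeded */
}

/* `static` gives the function internal linkage: the symbol is no longer
 * visible outside this translation unit, matching the review comment that
 * the function is referenced only within one file. */
static int mcbp_trigger_preempt(struct ring_mux *mux)
{
	int r;

	mux_lock(mux);
	mux->pending_trailing_fence_signaled = true;
	r = preempt_real_ring(mux);
	mux_unlock(mux);
	return r;
}
```

Declaring file-local functions static also lets the compiler warn when one becomes unused, which would have flagged a dead amdgpu_mcbp_trigger_preempt.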