RE: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Zhu, Jiadong
[AMD Official Use Only - AMD Internal Distribution Only]

> -Original Message-
> From: Christian König 
> Sent: Wednesday, July 10, 2024 8:46 PM
> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org;
> Deucher, Alexander 
> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the
> right place
>
> On 10.07.24 at 12:15, Zhu, Jiadong wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >> -Original Message-
> >> From: Christian König 
> >> Sent: Wednesday, July 10, 2024 5:27 PM
> >> To: Zhu, Jiadong ;
> >> amd-gfx@lists.freedesktop.org; Deucher, Alexander
> >> 
> >> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in
> >> the right place
> >>
> >> On 10.07.24 at 09:54, Zhu, Jiadong wrote:
> >>> [AMD Official Use Only - AMD Internal Distribution Only]
> >>>
> >>>> -Original Message-
> >>>> From: Christian König 
> >>>> Sent: Wednesday, July 10, 2024 3:17 PM
> >>>> To: Zhu, Jiadong ; amd-
> >> g...@lists.freedesktop.org
> >>>> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in
> >>>> the right place
> >>>>
> >>>> On 10.07.24 at 02:31, jiadong@amd.com wrote:
> >>>>> From: Jiadong Zhu 
> >>>>>
> >>>>> The job's embedded fence is a dma_fence which shall not be converted
> >>>>> to amdgpu_fence.
> >>>> Good catch.
> >>>>
> >>>>> The start timestamp shall be saved on the job for hw_fence.
> >>>> But NAK to that approach. Why do we need the start time here in the
> >>>> first place?
> >>>>
> >>>> Regards,
> >>>> Christian.
> >>>>
> >>> The start timestamp is used by the ring mux to check whether fences
> >>> have been unsignaled for a period of time under MCBP scenarios (by
> >>> calling amdgpu_fence_last_unsignaled_time_us()).
> >>
> >> I can't find a reason for doing that in the first place. What is the
> >> background of this?
> >>
> >> Regards,
> >> Christian.
> >>
> > It is about OS-triggered MCBP on gfx9. When we are using the software ring
> > and ring mux on gfx9, the ring mux checks the fence unsignaled time of the
> > low priority context when a high priority job arrives. If that duration
> > exceeds a certain threshold, the mux triggers MCBP.
> > We could add an adev->gfx.mcbp check when setting start_timestamp for those
> > fences.
>
> So you basically want to guarantee some forward progress?
This patch is to fix the memory overlap on job->hw_fence. For the other part
we leave it as it was.
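
For illustration, a minimal standalone sketch of the overlap being fixed here,
using stand-in types rather than the real kernel headers (the fields around
start_timestamp are assumptions; only the layout relationship matters):

#include <stddef.h>

/* Stand-ins for the kernel types, just to show the layout problem. */
struct dma_fence { unsigned long flags; /* ... */ };
typedef long long ktime_t;

struct amdgpu_fence {
	struct dma_fence base;
	void *ring;                /* tail fields exist only when the fence  */
	ktime_t start_timestamp;   /* was really allocated as an amdgpu_fence */
};

struct amdgpu_job {
	struct dma_fence hw_fence; /* bare dma_fence, no tail fields */
	unsigned int job_run_counter;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* For a job-embedded fence this amdgpu_fence view extends past hw_fence,
 * so writing start_timestamp through it scribbles over adjacent job state. */
static struct amdgpu_fence *bad_cast(struct dma_fence *f)
{
	return container_of(f, struct amdgpu_fence, base);
}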

> While this is nice to have I don't think we need that in the first place.
>
> I mean when I have two hardware queues the high priority one would starve
> the low priority one as well.
HWS has two modes to handle queue priority: in priority mode, a high priority
queue preempts a low priority queue as long as it has work pending; in quantum
mode, all the queues have the same priority and a queue is preempted when it
uses up its time slice.
The hardware team suggested the OS use quantum mode as it does not starve the
low priority queue. Our implementation partially followed that design.

Thanks,
Jiadong

> Regards,
> Christian.
>
> >
> > Thanks,
> > Jiadong
> >
> >>> Thanks,
> >>> Jiadong
> >>>>> v2: optimize get_fence_start_time.
> >>>>> Signed-off-by: Jiadong Zhu 
> >>>>> ---
> >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 31 ++++++++++++++++++++++++++++---
> >>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_job.h   |  3 +++
> >>>>> 2 files changed, 31 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>>> index 2f24a6aa13bf..72bb007e48c8 100644
> >>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>>>> @@ -88,6 +88,31 @@ static inline struct amdgpu_fence
> >>>> *to_amdgpu_fence(struct dma_fence *f)
> >>>>>   return NULL;
> >>>>> }
> >>>>>
> >>>>> +static inline void set_fence_start_time(struct dma_fence *f,
> >>>>> +kt

RE: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Zhu, Jiadong
[AMD Official Use Only - AMD Internal Distribution Only]

> -Original Message-
> From: Christian König 
> Sent: Wednesday, July 10, 2024 5:27 PM
> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org;
> Deucher, Alexander 
> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the
> right place
>
> On 10.07.24 at 09:54, Zhu, Jiadong wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >> -Original Message-
> >> From: Christian König 
> >> Sent: Wednesday, July 10, 2024 3:17 PM
> >> To: Zhu, Jiadong ; amd-
> g...@lists.freedesktop.org
> >> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in
> >> the right place
> >>
> >> On 10.07.24 at 02:31, jiadong@amd.com wrote:
> >>> From: Jiadong Zhu 
> >>>
> >>> The job's embedded fence is a dma_fence which shall not be converted
> >>> to amdgpu_fence.
> >> Good catch.
> >>
> >>> The start timestamp shall be saved on the job for hw_fence.
> >> But NAK to that approach. Why do we need the start time here in the
> >> first place?
> >>
> >> Regards,
> >> Christian.
> >>
> > The start timestamp is used by the ring mux to check whether fences have
> > been unsignaled for a period of time under MCBP scenarios (by calling
> > amdgpu_fence_last_unsignaled_time_us()).
>
> I can't find a reason for doing that in the first place. What is the 
> background
> of this?
>
> Regards,
> Christian.
>

It is about OS-triggered MCBP on gfx9. When we are using the software ring and
ring mux on gfx9, the ring mux checks the fence unsignaled time of the low
priority context when a high priority job arrives. If that duration exceeds a
certain threshold, the mux triggers MCBP.
We could add an adev->gfx.mcbp check when setting start_timestamp for those fences.

Thanks,
Jiadong
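
As a rough sketch of the check described above (the helper is the real one
named earlier; the threshold name and value are assumptions for illustration):

/* Hypothetical threshold; the real value is a driver tuning decision. */
#define MUX_PREEMPT_THRESHOLD_US	1000

/* Sketch: should the mux preempt the low priority ring when a high
 * priority job arrives? */
static bool mux_should_preempt(struct amdgpu_ring *low_prio_ring)
{
	/* microseconds for which the oldest fence has stayed unsignaled */
	u64 unsignaled_us =
		amdgpu_fence_last_unsignaled_time_us(low_prio_ring);

	return unsignaled_us >= MUX_PREEMPT_THRESHOLD_US;
}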

> >
> > Thanks,
> > Jiadong
> >>> v2: optimize get_fence_start_time.
> >>> Signed-off-by: Jiadong Zhu 
> >>> ---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 31 ++++++++++++++++++++++++++++---
> >>>drivers/gpu/drm/amd/amdgpu/amdgpu_job.h   |  3 +++
> >>>2 files changed, 31 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> index 2f24a6aa13bf..72bb007e48c8 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> >>> @@ -88,6 +88,31 @@ static inline struct amdgpu_fence
> >> *to_amdgpu_fence(struct dma_fence *f)
> >>>  return NULL;
> >>>}
> >>>
> >>> +static inline void set_fence_start_time(struct dma_fence *f,
> >>> +					 ktime_t start_timestamp)
> >>> +{
> >>> +	if (f->ops == &amdgpu_fence_ops) {
> >>> +		struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence,
> >>> +							base);
> >>> +
> >>> +		__f->start_timestamp = start_timestamp;
> >>> +	} else if (f->ops == &amdgpu_job_fence_ops) {
> >>> +		struct amdgpu_job *job = container_of(f, struct amdgpu_job,
> >>> +						      hw_fence);
> >>> +
> >>> +		job->start_timestamp = start_timestamp;
> >>> +	}
> >>> +}
> >>> +
> >>> +static inline ktime_t get_fence_start_time(struct dma_fence *f)
> >>> +{
> >>> +	if (unlikely(f->ops == &amdgpu_fence_ops)) {
> >>> +		struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence,
> >>> +							base);
> >>> +
> >>> +		return __f->start_timestamp;
> >>> +	}
> >>> +
> >>> +	struct amdgpu_job *job = container_of(f, struct amdgpu_job,
> >>> +					      hw_fence);
> >>> +
> >>> +	return job->start_timestamp;
> >>> +}
> >>> +
> >>>/**
> >>> * amdgpu_fence_write - write a fence value
> >>> *
> >>> @@ -197,7 +222,7 @@ int amdgpu_fence_emit(struct amdgpu_ring
> *ring,
> >> struct dma_fence **f, struct amd
> >>>  }
> >>>  }
> >>>
> >>> -   to_amdgpu_fence(fence)->start_timestamp = ktime_get();
> >>> +   set_fence_start_time(fence, ktime_get());
> >>>
> >>>  /* This function can't be called concurrently anyway, otherwise
> >>> 

RE: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Zhu, Jiadong
[AMD Official Use Only - AMD Internal Distribution Only]

> -Original Message-
> From: Christian König 
> Sent: Wednesday, July 10, 2024 3:17 PM
> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the
> right place
>
> On 10.07.24 at 02:31, jiadong@amd.com wrote:
> > From: Jiadong Zhu 
> >
> > The job's embedded fence is a dma_fence which shall not be converted to
> > amdgpu_fence.
>
> Good catch.
>
> > The start timestamp shall be saved on the job for hw_fence.
>
> But NAK to that approach. Why do we need the start time here in the first
> place?
>
> Regards,
> Christian.
>

The start timestamp is used by the ring mux to check whether fences have been
unsignaled for a period of time under MCBP scenarios (by calling
amdgpu_fence_last_unsignaled_time_us()).

Thanks,
Jiadong
> >
> > v2: optimize get_fence_start_time.
>
> >
> > Signed-off-by: Jiadong Zhu 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 31 ++++++++++++++++++++++++++++---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_job.h   |  3 +++
> >   2 files changed, 31 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 2f24a6aa13bf..72bb007e48c8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -88,6 +88,31 @@ static inline struct amdgpu_fence
> *to_amdgpu_fence(struct dma_fence *f)
> > return NULL;
> >   }
> >
> > +static inline void set_fence_start_time(struct dma_fence *f,
> > +					 ktime_t start_timestamp)
> > +{
> > +	if (f->ops == &amdgpu_fence_ops) {
> > +		struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence,
> > +							base);
> > +
> > +		__f->start_timestamp = start_timestamp;
> > +	} else if (f->ops == &amdgpu_job_fence_ops) {
> > +		struct amdgpu_job *job = container_of(f, struct amdgpu_job,
> > +						      hw_fence);
> > +
> > +		job->start_timestamp = start_timestamp;
> > +	}
> > +}
> > +
> > +static inline ktime_t get_fence_start_time(struct dma_fence *f)
> > +{
> > +	if (unlikely(f->ops == &amdgpu_fence_ops)) {
> > +		struct amdgpu_fence *__f = container_of(f, struct amdgpu_fence,
> > +							base);
> > +
> > +		return __f->start_timestamp;
> > +	}
> > +
> > +	struct amdgpu_job *job = container_of(f, struct amdgpu_job,
> > +					      hw_fence);
> > +
> > +	return job->start_timestamp;
> > +}
> > +
> >   /**
> >* amdgpu_fence_write - write a fence value
> >*
> > @@ -197,7 +222,7 @@ int amdgpu_fence_emit(struct amdgpu_ring *ring,
> struct dma_fence **f, struct amd
> > }
> > }
> >
> > -   to_amdgpu_fence(fence)->start_timestamp = ktime_get();
> > +   set_fence_start_time(fence, ktime_get());
> >
> > /* This function can't be called concurrently anyway, otherwise
> >  * emitting the fence would mess up the hardware ring buffer.
> > @@ -428,7 +453,7 @@ u64
> amdgpu_fence_last_unsignaled_time_us(struct amdgpu_ring *ring)
> > return 0;
> >
> > return ktime_us_delta(ktime_get(),
> > -   to_amdgpu_fence(fence)->start_timestamp);
> > +   get_fence_start_time(fence));
> >   }
> >
> >   /**
> > @@ -451,7 +476,7 @@ void
> amdgpu_fence_update_start_timestamp(struct amdgpu_ring *ring,
> uint32_t seq,
> > if (!fence)
> > return;
> >
> > -   to_amdgpu_fence(fence)->start_timestamp = timestamp;
> > +   set_fence_start_time(fence, timestamp);
> >   }
> >
> >   /**
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > index a963a25ddd62..3a73fe11a1ce 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
> > @@ -73,6 +73,9 @@ struct amdgpu_job {
> > uint64_tgds_va;
> > boolinit_shadow;
> >
> > +   /* start timestamp for hw_fence*/
> > +   ktime_t start_timestamp;
> > +
> > /* job_run_counter >= 1 means a resubmit job */
> > uint32_tjob_run_counter;
> >



RE: [PATCH v2] drm/amdgpu: set start timestamp of fence in the right place

2024-07-10 Thread Zhu, Jiadong
[AMD Official Use Only - AMD Internal Distribution Only]

> -Original Message-
> From: Christian König 
> Sent: Wednesday, July 10, 2024 3:17 PM
> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH v2] drm/amdgpu: set start timestamp of fence in the
> right place
>
> On 10.07.24 at 02:31, jiadong@amd.com wrote:
> > From: Jiadong Zhu 
> >
> > The job's embedded fence is a dma_fence which shall not be converted to
> > amdgpu_fence.
>
> Good catch.
>
> > The start timestamp shall be saved on the job for hw_fence.
>
> But NAK to that approach. Why do we need the start time here in the first
> place?
>
> Regards,
> Christian.
>
[Zhu, Jiadong


RE: [PATCH] drm/amdgpu: disable ring_muxer if mcbp is off

2024-03-04 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Acked-by: Jiadong Zhu 

-Original Message-
From: amd-gfx  On Behalf Of Christian 
König
Sent: Tuesday, February 27, 2024 12:02 AM
To: Alex Deucher ; Pelloux-Prayer, Pierre-Eric 

Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: disable ring_muxer if mcbp is off

On 23.02.24 at 17:30, Alex Deucher wrote:
> On Fri, Feb 23, 2024 at 4:48 AM Pierre-Eric Pelloux-Prayer
>  wrote:
>> Using the ring_muxer without preemption adds overhead for no reason
>> since mcbp cannot be triggered.
>>
>> Moving back to a single queue in this case also helps when high
>> priority apps are used: in this case the gpu_scheduler priority
>> handling will work as expected - much better than the ring_muxer with its
>> 2 independent schedulers competing for the same hardware queue.
>>
>> This change requires moving amdgpu_device_set_mcbp above
>> amdgpu_device_ip_early_init because we use adev->gfx.mcbp.
>>
>> Signed-off-by: Pierre-Eric Pelloux-Prayer
>> 
> Reviewed-by: Alex Deucher 

Acked-by: Christian König 

>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  4 ++--
>>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 21 ++++++++++++---------
>>   2 files changed, 14 insertions(+), 11 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index d534e192e260..40516d24026c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -4054,13 +4054,13 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>  return r;
>>  }
>>
>> +   amdgpu_device_set_mcbp(adev);
>> +
>>  /* early init functions */
>>  r = amdgpu_device_ip_early_init(adev);
>>  if (r)
>>  return r;
>>
>> -   amdgpu_device_set_mcbp(adev);
>> -
>>  /* Get rid of things like offb */
>>  r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev,
>> &amdgpu_kms_driver);
>>  if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> index 169d45268ef6..f682f830f7f6 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>> @@ -2080,7 +2080,7 @@ static int gfx_v9_0_sw_init(void *handle)
>>  ring->doorbell_index =
>> adev->doorbell_index.gfx_ring0 << 1;
>>
>>  /* disable scheduler on the real ring */
>> -   ring->no_scheduler = true;
>> +   ring->no_scheduler = adev->gfx.mcbp;
>>  ring->vm_hub = AMDGPU_GFXHUB(0);
>>  r = amdgpu_ring_init(adev, ring, 1024, &adev->gfx.eop_irq,
>>
>> AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP, @@ -2090,7 +2090,7 @@ static int 
>> gfx_v9_0_sw_init(void *handle)
>>  }
>>
>>  /* set up the software rings */
>> -   if (adev->gfx.num_gfx_rings) {
>> +   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
>>  for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++) {
>>  ring = &adev->gfx.sw_gfx_ring[i];
>>  ring->ring_obj = NULL;
>> @@ -2181,7 +2181,7 @@ static int gfx_v9_0_sw_fini(void *handle)
>>  int i;
>>  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>
>> -   if (adev->gfx.num_gfx_rings) {
>> +   if (adev->gfx.mcbp && adev->gfx.num_gfx_rings) {
>>  for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
>>  amdgpu_ring_fini(&adev->gfx.sw_gfx_ring[i]);
>>  amdgpu_ring_mux_fini(&adev->gfx.muxer);
>> @@ -5910,11 +5910,14 @@ static int gfx_v9_0_eop_irq(struct
>> amdgpu_device *adev,
>>
>>  switch (me_id) {
>>  case 0:
>> -   if (adev->gfx.num_gfx_rings &&
>> -   !amdgpu_mcbp_handle_trailing_fence_irq(&adev->gfx.muxer)) {
>> -   /* Fence signals are handled on the software rings*/
>> -   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
>> -   amdgpu_fence_process(&adev->gfx.sw_gfx_ring[i]);
>> +   if (adev->gfx.num_gfx_rings) {
>> +   if (!adev->gfx.mcbp) {
>> +   amdgpu_fence_process(&adev->gfx.gfx_ring[0]);
>> +   } else if (!amdgpu_mcbp_handle_trailing_fence_irq(&adev->gfx.muxer)) {
>> +   /* Fence signals are handled on the software rings*/
>> +   for (i = 0; i < GFX9_NUM_SW_GFX_RINGS; i++)
>> +   amdgpu_fence_process(&adev->gfx.sw_gfx_ring[i]);
>> +   }
>>  }
>>  break;
>>  case 1:
>> @@ -7051,7 +7054,7 @@ static void gfx_v9_0_set_ring_funcs(struct 
>> amdgpu_device *adev)
>>  for (i = 0; i < adev->gfx.num_gfx_rings; i++)
>>  adev->gfx.gfx_ring[i].funcs = &gfx_v9_0_ring_funcs_gfx;
>>
>> -   if (adev->gfx.num_gfx_rings) 

RE: [PATCH] drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well

2024-01-03 Thread Zhu, Jiadong
[AMD Official Use Only - General]

The patch is: Reviewed-and-tested-by: Jiadong Zhu 

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Thursday, January 4, 2024 5:55 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH] drm/amdgpu: apply the RV2 system aperture fix to RN/CZN as well

These chips need the same fix.  This was previously not seen on them since the
AGP aperture expanded the system aperture, but this showed up again when AGP
was disabled.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c  | 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c  | 4 +++-
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c   | 4 +++-
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 ++--
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 53a2ba5fcf4b..22175da0e16a 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -102,7 +102,9 @@ static void gfxhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
WREG32_SOC15_RLC(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
min(adev->gmc.fb_start, adev->gmc.agp_start) >> 18);

-   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
+   if (adev->apu_flags & (AMD_APU_IS_RAVEN2 |
+  AMD_APU_IS_RENOIR |
+  AMD_APU_IS_GREEN_SARDINE))
   /*
* Raven2 has a HW issue that it is unable to use the
* vram which is out of MC_VM_SYSTEM_APERTURE_HIGH_ADDR.
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
index 55423ff1bb49..95d06da544e2 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_2.c
@@ -139,7 +139,9 @@ gfxhub_v1_2_xcc_init_system_aperture_regs(struct 
amdgpu_device *adev,
WREG32_SOC15_RLC(GC, GET_INST(GC, i), 
regMC_VM_SYSTEM_APERTURE_LOW_ADDR,
min(adev->gmc.fb_start, adev->gmc.agp_start) >> 
18);

-   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
+   if (adev->apu_flags & (AMD_APU_IS_RAVEN2 |
+  AMD_APU_IS_RENOIR |
+  AMD_APU_IS_GREEN_SARDINE))
   /*
* Raven2 has a HW issue that it is unable to 
use the
* vram which is out of 
MC_VM_SYSTEM_APERTURE_HIGH_ADDR.
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 843219a91736..e3ddd22aa172 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -96,7 +96,9 @@ static void mmhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
WREG32_SOC15(MMHUB, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
 min(adev->gmc.fb_start, adev->gmc.agp_start) >> 18);

-   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
+   if (adev->apu_flags & (AMD_APU_IS_RAVEN2 |
+  AMD_APU_IS_RENOIR |
+  AMD_APU_IS_GREEN_SARDINE))
/*
 * Raven2 has a HW issue that it is unable to use the vram which
 * is out of MC_VM_SYSTEM_APERTURE_HIGH_ADDR. So here is the 
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 5a7ac1c35b58..0c35e4122612 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -1294,7 +1294,9 @@ static void mmhub_read_system_context(struct 
amdgpu_device *adev, struct dc_phy_
/* AGP aperture is disabled */
if (agp_bot > agp_top) {
logical_addr_low = adev->gmc.fb_start >> 18;
-   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
+   if (adev->apu_flags & (AMD_APU_IS_RAVEN2 |
+  AMD_APU_IS_RENOIR |
+  AMD_APU_IS_GREEN_SARDINE))
/*
 * Raven2 has a HW issue that it is unable to use the 
vram which
 * is out of MC_VM_SYSTEM_APERTURE_HIGH_ADDR. So here 
is the @@ -1306,7 +1308,9 @@ static void mmhub_read_system_context(struct 
amdgpu_device *adev, struct dc_phy_
logical_addr_high = adev->gmc.fb_end >> 18;
} else {
logical_addr_low = min(adev->gmc.fb_start, adev->gmc.agp_start) 
>> 18;
-   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
+   if (adev->apu_flags & (AMD_APU_IS_RAVEN2 |
+  AMD_APU_IS_RENOIR |
+  

RE: [PATCH 2/2] drm/amdgpu: enable mcbp by default on gfx9

2023-06-26 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Reviewed-and-tested-by: Jiadong Zhu 

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Saturday, June 17, 2023 5:10 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 2/2] drm/amdgpu: enable mcbp by default on gfx9

It's required for high priority queues.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 6 +++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 78c6265fe79b..3eb370b77ad9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3677,6 +3677,11 @@ static void amdgpu_device_set_mcbp(struct amdgpu_device 
*adev)
if (amdgpu_mcbp == 1)
adev->gfx.mcbp = true;

+   if ((adev->ip_versions[GC_HWIP][0] >= IP_VERSION(9, 0, 0)) &&
+   (adev->ip_versions[GC_HWIP][0] < IP_VERSION(10, 0, 0)) &&
+   adev->gfx.num_gfx_rings)
+   adev->gfx.mcbp = true;
+
if (amdgpu_sriov_vf(adev))
adev->gfx.mcbp = true;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 03874371af60..308149dd7d00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -180,7 +180,7 @@ uint amdgpu_dc_feature_mask = 2;
 uint amdgpu_dc_debug_mask;
 uint amdgpu_dc_visual_confirm;
 int amdgpu_async_gfx_ring = 1;
-int amdgpu_mcbp;
+int amdgpu_mcbp = -1;
 int amdgpu_discovery = -1;
 int amdgpu_mes;
 int amdgpu_mes_kiq;
@@ -635,10 +635,10 @@ module_param_named(async_gfx_ring, amdgpu_async_gfx_ring, 
int, 0444);

 /**
  * DOC: mcbp (int)
- * It is used to enable mid command buffer preemption. (0 = disabled 
(default), 1 = enabled)
+ * It is used to enable mid command buffer preemption. (0 = disabled, 1
+ = enabled, -1 auto (default))
  */
 MODULE_PARM_DESC(mcbp,
-   "Enable Mid-command buffer preemption (0 = disabled (default), 1 = 
enabled)");
+   "Enable Mid-command buffer preemption (0 = disabled, 1 = enabled), -1
+= auto (default)");
 module_param_named(mcbp, amdgpu_mcbp, int, 0444);

 /**
--
2.40.1



RE: [PATCH 1/2] drm/amdgpu: make mcbp a per device setting

2023-06-26 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Reviewed-and-tested-by: Jiadong Zhu 

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Saturday, June 17, 2023 5:10 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 1/2] drm/amdgpu: make mcbp a per device setting

So we can selectively enable it on certain devices.  No intended functional 
change.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h|  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|  4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   |  3 ---
 drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c |  4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c |  2 +-
 7 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index f39db4a2c2cf..78c6265fe79b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2551,7 +2551,7 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
adev->ip_blocks[i].status.hw = true;

/* right after GMC hw init, we create CSA */
-   if (amdgpu_mcbp) {
+   if (adev->gfx.mcbp) {
r = amdgpu_allocate_static_csa(adev, 
&adev->virt.csa_obj,
   
AMDGPU_GEM_DOMAIN_VRAM |
   
AMDGPU_GEM_DOMAIN_GTT,
@@ -3672,6 +3672,18 @@ static const struct attribute *amdgpu_dev_attributes[] = 
{
NULL
 };

+static void amdgpu_device_set_mcbp(struct amdgpu_device *adev) {
+   if (amdgpu_mcbp == 1)
+   adev->gfx.mcbp = true;
+
+   if (amdgpu_sriov_vf(adev))
+   adev->gfx.mcbp = true;
+
+   if (adev->gfx.mcbp)
+   DRM_INFO("MCBP is enabled\n");
+}
+
 /**
  * amdgpu_device_init - initialize the driver
  *
@@ -3823,9 +3835,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
DRM_INFO("register mmio base: 0x%08X\n", (uint32_t)adev->rmmio_base);
DRM_INFO("register mmio size: %u\n", (unsigned)adev->rmmio_size);

-   if (amdgpu_mcbp)
-   DRM_INFO("MCBP is enabled\n");
-
/*
 * Reset domain needs to be present early, before XGMI hive discovered
 * (if any) and intitialized to use reset sem and in_gpu reset flag @@ 
-3851,6 +3860,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
if (r)
return r;

+   amdgpu_device_set_mcbp(adev);
+
/* Get rid of things like offb */
r = drm_aperture_remove_conflicting_pci_framebuffers(adev->pdev, 
&amdgpu_kms_driver);
if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index ce0f7a8ad4b8..a4ff515ce896 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -434,6 +434,7 @@ struct amdgpu_gfx {
 	uint16_t			xcc_mask;
 	uint32_t			num_xcc_per_xcp;
 	struct mutex			partition_mutex;
+	bool				mcbp; /* mid command buffer preemption */
 };

 struct amdgpu_gfx_ras_reg_entry {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e3531aa3c8bd..cca5a495611f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -805,7 +805,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, 
struct drm_file *filp)
dev_info->ids_flags = 0;
if (adev->flags & AMD_IS_APU)
dev_info->ids_flags |= AMDGPU_IDS_FLAGS_FUSION;
-   if (amdgpu_mcbp)
+   if (adev->gfx.mcbp)
dev_info->ids_flags |= AMDGPU_IDS_FLAGS_PREEMPTION;
if (amdgpu_is_tmz(adev))
dev_info->ids_flags |= AMDGPU_IDS_FLAGS_TMZ; @@ -1247,7 
+1247,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file 
*file_priv)
goto error_vm;
}

-   if (amdgpu_mcbp) {
+   if (adev->gfx.mcbp) {
uint64_t csa_addr = amdgpu_csa_vaddr(adev) & 
AMDGPU_GMC_HOLE_MASK;

 r = amdgpu_map_static_csa(adev, &fpriv->vm, adev->virt.csa_obj, 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
index 78ec3420ef85..dacf281d2b21 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
@@ -72,7 +72,7 @@ uint64_t amdgpu_sdma_get_csa_mc_addr(struct amdgpu_ring *ring,
int r;

/* don't enable OS preemption on SDMA under SRIOV */
-   if (amdgpu_sriov_vf(adev) || vmid == 0 || !amdgpu_mcbp)
+   

RE: [6.4-rc7][regression] slab-out-of-bounds in amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]

2023-06-21 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi,

It is fixed by
https://patchwork.freedesktop.org/patch/542647/?series=119384&rev=2

Could you make sure this patch is included?

Thanks,
Jiadong

-Original Message-
From: amd-gfx  On Behalf Of Mikhail 
Gavrilov
Sent: Wednesday, June 21, 2023 3:38 PM
To: Zhu, Jiadong ; Deucher, Alexander 
; amd-gfx list ; 
Linux List Kernel Mailing 
Subject: [6.4-rc7][regression] slab-out-of-bounds in 
amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]

Hi,
after commit 5b711e7f9c73e5ff44d6ac865711d9a05c2a0360 I see a KASAN sanitizer
bug message at every boot:

Backtrace:
[   18.600551] 
==
[   18.600558] BUG: KASAN: slab-out-of-bounds in
amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.600943] Write of size 8 at addr 8881e4d3a098 by task kworker/8:1/133

[   18.600952] CPU: 8 PID: 133 Comm: kworker/8:1 Tainted: GW
 L---  ---  6.4.0-0.rc7.53.fc39.x86_64+debug #1
[   18.600960] Hardware name: ASUSTeK COMPUTER INC. ROG Strix
G513QY_G513QY/G513QY, BIOS G513QY.331 02/24/2023
[   18.600966] Workqueue: events
amdgpu_device_delayed_init_work_handler [amdgpu]
[   18.601253] Call Trace:
[   18.601256]  
[   18.601260]  dump_stack_lvl+0x76/0xd0
[   18.601267]  print_report+0xcf/0x670
[   18.601275]  ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.601573]  ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.601865]  kasan_report+0xa8/0xe0
[   18.601870]  ? amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.602163]  amdgpu_sw_ring_ib_mark_offset+0x2c1/0x2e0 [amdgpu]
[   18.602455]  gfx_v9_0_ring_emit_ib_gfx+0x4cc/0xd50 [amdgpu]
[   18.602767]  ? amdgpu_sw_ring_ib_begin+0x1b4/0x3d0 [amdgpu]
[   18.603061]  amdgpu_ib_schedule+0x7cb/0x1570 [amdgpu]
[   18.603354]  gfx_v9_0_ring_test_ib+0x375/0x540 [amdgpu]
[   18.603656]  ? __pfx_gfx_v9_0_ring_test_ib+0x10/0x10 [amdgpu]
[   18.603959]  ? __pfx_lock_acquire+0x10/0x10
[   18.603966]  amdgpu_ib_ring_tests+0x2bc/0x490 [amdgpu]
[   18.604260]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[   18.604544]  process_one_work+0x888/0x1460
[   18.604551]  ? worker_thread+0x2c8/0x12c0
[   18.604555]  ? __pfx_process_one_work+0x10/0x10
[   18.604562]  worker_thread+0x104/0x12c0
[   18.604567]  ? __kthread_parkme+0xc1/0x1f0
[   18.604573]  ? __pfx_worker_thread+0x10/0x10
[   18.604577]  kthread+0x2ee/0x3c0
[   18.604581]  ? __pfx_kthread+0x10/0x10
[   18.604586]  ret_from_fork+0x2c/0x50
[   18.604593]  

[   18.604598] Allocated by task 466:
[   18.604601]  kasan_save_stack+0x33/0x60
[   18.604606]  kasan_set_track+0x25/0x30
[   18.604610]  __kasan_kmalloc+0x8f/0xa0
[   18.604614]  __kmalloc+0x62/0x160
[   18.604618]  amdgpu_ring_mux_init+0x6e/0x1b0 [amdgpu]
[   18.604905]  gfx_v9_0_sw_init+0xffe/0x2930 [amdgpu]
[   18.605197]  amdgpu_device_init+0x3c36/0x7fc0 [amdgpu]
[   18.605476]  amdgpu_driver_load_kms+0x1d/0x4b0 [amdgpu]
[   18.605753]  amdgpu_pci_probe+0x279/0x9a0 [amdgpu]
[   18.606029]  local_pci_probe+0xdd/0x190
[   18.606034]  pci_device_probe+0x23a/0x770
[   18.606039]  really_probe+0x3e2/0xb80
[   18.606044]  __driver_probe_device+0x18c/0x450
[   18.606048]  driver_probe_device+0x4a/0x120
[   18.606052]  __driver_attach+0x1e5/0x4a0
[   18.606056]  bus_for_each_dev+0x109/0x190
[   18.606061]  bus_add_driver+0x2a1/0x570
[   18.606064]  driver_register+0x134/0x460
[   18.606069]  do_one_initcall+0xd5/0x3b0
[   18.606073]  do_init_module+0x238/0x770
[   18.606079]  load_module+0x5581/0x6f10
[   18.606082]  __do_sys_init_module+0x1f2/0x220
[   18.606086]  do_syscall_64+0x60/0x90
[   18.606091]  entry_SYSCALL_64_after_hwframe+0x72/0xdc

[   18.606099] The buggy address belongs to the object at 8881e4d3a000
which belongs to the cache kmalloc-128 of size 128
[   18.606106] The buggy address is located 24 bytes to the right of
allocated 128-byte region [8881e4d3a000, 8881e4d3a080)

[   18.606115] The buggy address belongs to the physical page:
[   18.606119] page:024dbf3d refcount:1 mapcount:0
mapping: index:0x0 pfn:0x1e4d3a
[   18.606126] head:024dbf3d order:1 entire_mapcount:0
nr_pages_mapped:0 pincount:0
[   18.606132] flags:
0x17c0010200(slab|head|node=0|zone=2|lastcpupid=0x1f)
[   18.606138] page_type: 0x()
[   18.606143] raw: 0017c0010200 8881000428c0 dead0122

[   18.606148] raw:  00200020 0001

[   18.606153] page dumped because: kasan: bad access detected

[   18.606159] Memory state around the buggy address:
[   18.606162]  8881e4d39f80: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[   18.606167]  8881e4d3a000: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[   18.606172] >8881e4d3a080: fc fc fc fc fc fc fc fc fc fc fc fc
fc fc fc fc
[   18.606176] ^
[   18.606180]  8881e4d3a100: 00 00

RE: [PATCH 1/2] drm/amdgpu: Modify indirect buffer packages for resubmission

2023-05-30 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Ping for the series.

Thanks,
Jiadong

-Original Message-
From: Zhu, Jiadong 
Sent: Friday, May 26, 2023 9:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Zhu, Jiadong 
Subject: [PATCH 1/2] drm/amdgpu: Modify indirect buffer packages for 
resubmission

From: Jiadong Zhu 

When the preempted IB frame is resubmitted to the CP, we need to modify the
frame data, including:
1. set PRE_RESUME 1 in CONTEXT_CONTROL.
2. use meta data (DE and CE) read from the CSA in WRITE_DATA.

Add functions to save the location the first time IBs are emitted, and a
callback to patch the package when resubmission happens.

Signed-off-by: Jiadong Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c     | 18 ++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h     |  9 +++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 60 ++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h | 15 +++++++++++++++
 4 files changed, 102 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 7429b20257a6..12ba863e69f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -692,3 +692,21 @@ void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
if (ring->is_sw_ring)
amdgpu_sw_ring_ib_end(ring);
 }
+
+void amdgpu_ring_ib_on_emit_cntl(struct amdgpu_ring *ring)
+{
+	if (ring->is_sw_ring)
+		amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_CONTROL);
+}
+
+void amdgpu_ring_ib_on_emit_ce(struct amdgpu_ring *ring)
+{
+	if (ring->is_sw_ring)
+		amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_CE);
+}
+
+void amdgpu_ring_ib_on_emit_de(struct amdgpu_ring *ring)
+{
+	if (ring->is_sw_ring)
+		amdgpu_sw_ring_ib_mark_offset(ring, AMDGPU_MUX_OFFSET_TYPE_DE);
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index baa03527bf8b..702ce55b962a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -229,6 +229,9 @@ struct amdgpu_ring_funcs {
int (*preempt_ib)(struct amdgpu_ring *ring);
void (*emit_mem_sync)(struct amdgpu_ring *ring);
void (*emit_wave_limit)(struct amdgpu_ring *ring, bool enable);
+   void (*patch_cntl)(struct amdgpu_ring *ring, unsigned offset);
+   void (*patch_ce)(struct amdgpu_ring *ring, unsigned offset);
+   void (*patch_de)(struct amdgpu_ring *ring, unsigned offset);
 };

 struct amdgpu_ring {
@@ -323,11 +326,17 @@ struct amdgpu_ring {
 #define amdgpu_ring_init_cond_exec(r) (r)->funcs->init_cond_exec((r))
 #define amdgpu_ring_patch_cond_exec(r,o) (r)->funcs->patch_cond_exec((r),(o))
 #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
+#define amdgpu_ring_patch_cntl(r, o) ((r)->funcs->patch_cntl((r), (o)))
+#define amdgpu_ring_patch_ce(r, o) ((r)->funcs->patch_ce((r), (o)))
+#define amdgpu_ring_patch_de(r, o) ((r)->funcs->patch_de((r), (o)))

 unsigned int amdgpu_ring_max_ibs(enum amdgpu_ring_type type);
 int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned ndw);
 void amdgpu_ring_ib_begin(struct amdgpu_ring *ring);
 void amdgpu_ring_ib_end(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_on_emit_cntl(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_on_emit_ce(struct amdgpu_ring *ring);
+void amdgpu_ring_ib_on_emit_de(struct amdgpu_ring *ring);

 void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count);  void 
amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib); 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 62079f0e3ee8..73516abef662 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -105,6 +105,16 @@ static void amdgpu_mux_resubmit_chunks(struct 
amdgpu_ring_mux *mux)
amdgpu_fence_update_start_timestamp(e->ring,

chunk->sync_seq,

ktime_get());
+			if (chunk->sync_seq ==
+			    le32_to_cpu(*(e->ring->fence_drv.cpu_addr + 2))) {
+				if (chunk->cntl_offset <= e->ring->buf_mask)
+					amdgpu_ring_patch_cntl(e->ring,
+							       chunk->cntl_offset);
+				if (chunk->ce_offset <= e->ring->buf_mask)
+					amdgpu_ring_patch_ce(e->ring, chunk->ce_offset);
+				if (chunk->de_offset <= e->rin

RE: [PATCH] drm/amdgpu: Program gds backup address as zero if no gds allocated

2023-05-24 Thread Zhu, Jiadong
[AMD Official Use Only - General]

> Presumably other gfx versions require something similar?

From the firmware guys, that is always the case: the firmware checks whether
gds_backup_addr is zero or not to restore the GDS partition in the resume
sequence. Currently, only gfx9 has the scenario where the OS sends a
resubmission IB after MCBP happens.

Thanks,
Jiadong

-Original Message-
From: Alex Deucher 
Sent: Wednesday, May 24, 2023 8:30 PM
To: Zhu, Jiadong 
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Program gds backup address as zero if no gds 
allocated

On Wed, May 24, 2023 at 5:00 AM  wrote:
>
> From: Jiadong Zhu 
>
> It is a firmware requirement to set gds_backup_addrlo and
> gds_backup_addrhi of the DE meta both to zero if no gds partition is
> allocated for the frame.

Presumably other gfx versions require something similar?

Reviewed-by: Alex Deucher 

>
> Signed-off-by: Jiadong Zhu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index cbdd9918b3e7..cbcf6126cce5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -765,7 +765,7 @@ static void gfx_v9_0_set_rlc_funcs(struct
> amdgpu_device *adev);  static int gfx_v9_0_get_cu_info(struct amdgpu_device 
> *adev,
> struct amdgpu_cu_info *cu_info);
> static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device
> *adev); -static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring
> *ring, bool resume);
> +static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> +resume, bool usegds);
>  static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
> static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
>   void *ras_error_status); @@
> -5160,7 +5160,8 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring 
> *ring,
> gfx_v9_0_ring_emit_de_meta(ring,
>
> (!amdgpu_sriov_vf(ring->adev) &&
>flags & 
> AMDGPU_IB_PREEMPTED) ?
> -  true : false);
> +  true : false,
> +  job->gds_size > 0
> + && job->gds_base != 0);
> }
>
> amdgpu_ring_write(ring, header); @@ -5435,7 +5436,7 @@ static
> int gfx_v9_0_ring_preempt_ib(struct amdgpu_ring *ring)
> return r;
>  }
>
> -static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> resume)
> +static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> +resume, bool usegds)
>  {
> struct amdgpu_device *adev = ring->adev;
> struct v9_de_ib_state de_payload = {0}; @@ -5466,8 +5467,10 @@
> static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool resume)
>  PAGE_SIZE);
> }
>
> -   de_payload.gds_backup_addrlo = lower_32_bits(gds_addr);
> -   de_payload.gds_backup_addrhi = upper_32_bits(gds_addr);
> +   if (usegds) {
> +   de_payload.gds_backup_addrlo = lower_32_bits(gds_addr);
> +   de_payload.gds_backup_addrhi = upper_32_bits(gds_addr);
> +   }
>
> cnt = (sizeof(de_payload) >> 2) + 4 - 2;
> amdgpu_ring_write(ring, PACKET3(PACKET3_WRITE_DATA, cnt));
> --
> 2.25.1
>


RE: [RFC 2/7] drm/amdgpu: Add usermode queue for gfx work

2023-01-04 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Shashank,

I don't see how amdgpu_userq_ioctl is called; shall
DRM_IOCTL_DEF_DRV(amdgpu_userq_ioctl, ...) be added somewhere to expose the
ioctl?

Thanks,
Jiadong
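
For reference, a rough sketch of the registration the question is about,
following the usual drm_ioctl_desc table pattern; the AMDGPU_USERQ ioctl name
and the flag set are assumptions for illustration, not part of this series:

/* Hypothetical entry in the driver's ioctl table (amdgpu_ioctls_kms[]);
 * the ioctl name and flags are illustrative assumptions. */
const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
	/* ... existing entries ... */
	DRM_IOCTL_DEF_DRV(AMDGPU_USERQ, amdgpu_userq_ioctl,
			  DRM_AUTH | DRM_RENDER_ALLOW),
};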

-Original Message-
From: amd-gfx  On Behalf Of Shashank 
Sharma
Sent: Saturday, December 24, 2022 3:37 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Sharma, Shashank 
; Koenig, Christian ; Yadav, 
Arvind ; Paneer Selvam, Arunpravin 

Subject: [RFC 2/7] drm/amdgpu: Add usermode queue for gfx work

This patch adds skeleton code for usermode queue creation. It typically 
contains:
- A new structure to keep all the user queue data in one place.
- An IOCTL function to create/free a usermode queue.
- A function to generate unique index for the queue.
- A global ptr in amdgpu_dev

Cc: Alex Deucher 
Cc: Christian Koenig 
Signed-off-by: Shashank Sharma 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 187 ++
 .../drm/amd/include/amdgpu_usermode_queue.h   |  50 +
 5 files changed, 246 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
 create mode 100644 drivers/gpu/drm/amd/include/amdgpu_usermode_queue.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 6ad39cf71bdd..e2a34ee57bfb 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -209,6 +209,8 @@ amdgpu-y += \
 # add amdkfd interfaces
 amdgpu-y += amdgpu_amdkfd.o

+# add usermode queue
+amdgpu-y += amdgpu_userqueue.o

 ifneq ($(CONFIG_HSA_AMD),)
 AMDKFD_PATH := ../amdkfd
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 8639a4f9c6e8..4b566fcfca18 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -749,6 +749,11 @@ struct amdgpu_mqd {
struct amdgpu_mqd_prop *p);
 };

+struct amdgpu_userq_globals {
+   struct ida ida;
+   struct mutex userq_mutex;
+};
+
 #define AMDGPU_RESET_MAGIC_NUM 64
 #define AMDGPU_MAX_DF_PERFMONS 4
 #define AMDGPU_PRODUCT_NAME_LEN 64
@@ -955,6 +960,7 @@ struct amdgpu_device {
 	bool				enable_mes_kiq;
 	struct amdgpu_mes		mes;
 	struct amdgpu_mqd		mqds[AMDGPU_HW_IP_NUM];
+	struct amdgpu_userq_globals	userq;

/* df */
struct amdgpu_dfdf;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
index 0fa0e56daf67..f7413859b14f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h
@@ -57,6 +57,7 @@ struct amdgpu_ctx {
unsigned long   ras_counter_ce;
unsigned long   ras_counter_ue;
 	uint32_t			stable_pstate;
+   struct amdgpu_usermode_queue*userq;
 };

 struct amdgpu_ctx_mgr {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
new file mode 100644
index ..3b6e8f75495c
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
@@ -0,0 +1,187 @@
+/*
+ * Copyright 2022 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person
+obtaining a
+ * copy of this software and associated documentation files (the
+"Software"),
+ * to deal in the Software without restriction, including without
+limitation
+ * the rights to use, copy, modify, merge, publish, distribute,
+sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom
+the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be
+included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
+SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
+DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
+OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
+OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+
+#include "amdgpu.h"
+#include "amdgpu_vm.h"
+#include "amdgpu_mes.h"
+#include "amdgpu_usermode_queue.h"
+#include "soc15_common.h"
+
+#define CHECK_ACCESS(a) (access_ok((const void __user *)a,
+sizeof(__u64)))
+
+static int
+amdgpu_userqueue_index(struct amdgpu_device *adev) {
+int index;
+struct amdgpu_userq_globals *uqg = &adev->userq;
+
+index = ida_simple_get(&uqg->ida, 2, AMDGPU_MAX_USERQ, GFP_KERNEL);
+return index;
+}
+
+static void

RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-16 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

I haven't reproduced the hang yet, but I found a race condition related to
fence signaling time. I updated the patch series based on kernel 5.18.

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Tuesday, November 15, 2022 1:15 AM
To: Zhu, Jiadong 
Cc: Tuikov, Luben ; Huang, Ray ; 
Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-10 18:00, Michel Dänzer wrote:
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
>>
> >> I reproduced the glxgears 400fps scenario locally. The issue is caused by
> >> patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which
> >> slows down the low priority scheduler thread while a high priority IB is
> >> executing. I'll drop this patch, as we cannot identify whether we are GPU
> >> bound from the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent 
> freezes.

I'm afraid I may have run into another issue. I just hit a GPU hang, see the 
journalctl excerpt below.

(I tried rebooting the machine via SSH after this, but it never seemed to 
complete, so I had to hard-power-off the machine by holding the power button 
for a few seconds)

I can't be sure that the GPU hang is directly related to this series, but it 
seems plausible, and I hadn't hit a GPU hang in months if not over a year 
before. If this series results in potentially hitting a GPU hang every few 
days, it definitely doesn't provide enough benefit to justify that.


Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166051, emitted seq=1166052
Nov 14 17:21:22 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: free PSP TMR buffer
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: MODE2 reset
Nov 14 17:21:22 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset succeeded, trying to resume
Nov 14 17:21:22 thor kernel: [drm] PCIE GART of 1024M enabled.
Nov 14 17:21:22 thor kernel: [drm] PTB located at 0x00F400A0
Nov 14 17:21:22 thor kernel: [drm] VRAM is lost due to GPU reset!
Nov 14 17:21:22 thor kernel: [drm] PSP is resuming...
Nov 14 17:21:22 thor kernel: [drm] reserve 0x40 from 0xf431c0 for PSP TMR
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAS: optional ras ta ucode is not available
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: RAP: optional rap ta ucode is not available
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The CS has been rejected (-125), but the context isn't robust.
Nov 14 17:21:23 thor gnome-shell[3639]: amdgpu: The process will be terminated.
Nov 14 17:21:23 thor kernel: [drm] kiq ring mec 2 pipe 1 q 0
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Nov 14 17:21:23 thor kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Nov 14 17:21:23 thor kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block  failed -110
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset(2) failed
Nov 14 17:21:23 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset end with ret = -110
Nov 14 17:21:23 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
[...]
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_high timeout, signaled seq=1166052, emitted seq=1166052
Nov 14 17:21:33 thor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process gnome-shell pid 2828 thread gnome-shel:cs0 pid 2860
Nov 14 17:21:33 thor kernel: amdgpu :05:00.0: amdgpu: GPU reset begin!


--
Earthling Michel Dänzer            |  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-10 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

It is true that we don’t get obvious improvement on performance with these 
patches.
The original requirement for using MCBP is that when there is a very long IB
package with many draw commands on low priority which uses up GPU utilization,
we give high priority IBs a chance to be executed by the GPU.
The total performance can drop, as MCBP drains the pipe and the low priority
IBs are resubmitted again after that.

This set of patches is mainly to implement priority queues via software rings.
We may use another method instead of MCBP to improve it later.

Thanks,
Jiadong

-Original Message-
From: Alex Deucher 
Sent: Friday, November 11, 2022 1:54 AM
To: Michel Dänzer 
Cc: Zhu, Jiadong ; Tuikov, Luben ; 
Huang, Ray ; Koenig, Christian ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On Thu, Nov 10, 2022 at 12:00 PM Michel Dänzer  wrote:
>
> On 2022-11-08 09:01, Zhu, Jiadong wrote:
> > From: Michel Dänzer
> >
> >
> >>>> The bad news is that this series still makes some things very slow. The 
> >>>> most extreme examples so far are glxgears (runs at ~400 fps now, ~7000 
> >>>> fps before, i.e. almost 20x slowdown) and hexchat (scrolling one page 
> >>>> now takes ~1 second, I can see it drawing line by line; before it was 
> >>>> almost instantaneous). I suspect this series makes the overhead of 
> >>>> running a single GPU job much bigger. On the bright side, I'm not 
> >>>> noticing any significant intermittent freezes anymore.
> >>>
> >>> Hi Michel,
> >>>
> >>> Thanks for the trying.
> >>> Is there high priority jobs running while executing glxgears?
> >>
> >> Yes, mutter is submitting high priority jobs. However, I don't think that 
> >> can explain the problem by itself:
> >>
> >> mutter only draws once per display refresh cycle. Let's assume mutter's 
> >> GPU work takes ~6-7ms (conservative example, should be less than that 
> >> usually). That leaves ~10ms per display refresh cycle (at 60 Hz refresh 
> >> rate) where GPU work from glxgears & Xwayland can run without getting 
> >> preempted. Since glxgears runs at ~7000 fps without this series, it should 
> >> be able to draw at least ~70 frames in 10ms[0], which corresponds to over 
> >> 4000 fps. Yet it manages only 1/10 of that.
> >>
> >> [0] Worst case consideration, ignoring the fact that without this series, 
> >> glxgears runs at ~7000 fps while mutter sustains 60 fps.
> >
> > I reproduced the glxgears 400fps scenario locally. The issue is caused by
> > patch 5, "drm/amdgpu: Improve the software rings priority scheduler",
> > which slows down the low priority scheduler thread while a high priority
> > IB is executing. I'll drop this patch, as we cannot identify whether we
> > are GPU bound from the unsignaled fence, etc.
>
> Okay, I'm testing with patches 1-4 only now.
>
> So far I haven't noticed any negative effects, no slowdowns or intermittent 
> freezes.
>
> The only issue is that there's hardly any positive effect either. While 
> constantly moving the window of a GPU-limited GpuTest benchmark in circles, 
> most of the time it looks exactly the same as without these patches. Only 
> occasionally, at most every few seconds, I notice that the window movement 
> becomes smoother for an instant.
>

I think it will largely depend on the workload.  The gfx pipe can only be 
preempted on draw boundaries so if most operations are a single draw, you 
probably won't see much difference.

Alex


RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-08 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

I reproduced the glxgears 400fps scenario locally. The issue is caused by
patch 5, "drm/amdgpu: Improve the software rings priority scheduler", which
slows down the low priority scheduler thread while a high priority IB is
executing. I'll drop this patch, as we cannot identify whether we are GPU
bound from the unsignaled fence, etc.

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Thursday, November 3, 2022 5:05 PM
To: Zhu, Jiadong 
Cc: Tuikov, Luben ; Huang, Ray ; 
Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-11-03 03:58, Zhu, Jiadong wrote:
> [AMD Official Use Only - General]
>
>> The bad news is that this series still makes some things very slow. The most 
>> extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps 
>> before, i.e. almost 20x slowdown) and hexchat (scrolling one page now takes 
>> ~1 second, I can see it drawing line by line; before it was almost 
>> instantaneous). I suspect this series makes the overhead of running a single 
>> GPU job much bigger. On the bright side, I'm not noticing any significant 
>> intermittent freezes anymore.
>
> Hi Michel,
>
> Thanks for the trying.
> Is there high priority jobs running while executing glxgears?

Yes, mutter is submitting high priority jobs. However, I don't think that can 
explain the problem by itself:

mutter only draws once per display refresh cycle. Let's assume mutter's GPU 
work takes ~6-7ms (conservative example, should be less than that usually). 
That leaves ~10ms per display refresh cycle (at 60 Hz refresh rate) where GPU 
work from glxgears & Xwayland can run without getting preempted. Since glxgears 
runs at ~7000 fps without this series, it should be able to draw at least ~70 
frames in 10ms[0], which corresponds to over 4000 fps. Yet it manages only 1/10 
of that.

[0] Worst case consideration, ignoring the fact that without this series, 
glxgears runs at ~7000 fps while mutter sustains 60 fps.


> I am running glxgears while submitting high priority ibs using amdgpu_test, 
> the fps ranges from 6000~8000.

It's getting clear that artificial tests such as amdgpu_test don't suffice for 
evaluating the real-world impact of this kind of change.


> Continuous preemption and resubmission may cause the slow fps. Could you have 
> a check about how fast the trailing fence seqNo expands. On my side, the 
> increment of Last signaled trailing fence is < 10 in a second.

I had to go back to a kernel without this series, as it was just unusable. As 
this is my main machine, I don't know when I'll get a chance to check this.


--
Earthling Michel Dänzer            |  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-02 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>The bad news is that this series still makes some things very slow. The most 
>extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, 
>i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, 
>I can see it drawing line by line; before it was almost instantaneous). I 
>suspect this series makes the overhead of running a single GPU job much 
>bigger. On the bright side, I'm not noticing any significant intermittent 
>freezes anymore.

Hi Michel,

Thanks for trying it.
Are there high priority jobs running while executing glxgears? I am running
glxgears while submitting high priority IBs using amdgpu_test, and the fps
ranges from 6000 to 8000.

Continuous preemption and resubmission may cause the low fps. Could you check 
how fast the trailing fence seqno advances? On my side, the increment of the 
Last signaled trailing fence counter is < 10 per second.


cat /sys/kernel/debug/dri/0/amdgpu_fence_info
--- ring 0 (gfx) ---
Last signaled fence  0x0001
Last emitted 0x0001
Last signaled trailing fence 0x013c
Last emitted 0x013c
Last preempted   0x

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Wednesday, November 2, 2022 7:26 PM
To: Zhu, Jiadong 
Cc: Tuikov, Luben ; Huang, Ray ; 
Koenig, Christian ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)


[ Dropping Andrey's no longer working address from Cc ]

On 2022-11-01 11:09, Michel Dänzer wrote:
> On 2022-11-01 10:58, Zhu, Jiadong wrote:
>>
>>> Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
>>> gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
>>> presumably uses the latter, which would explain why amdgpu_ring_preempt_ib 
>>> ends up dereferencing a NULL pointer.
>>
>> That's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx, not in 
>> gfx_v9_0_sw_ring_funcs_gfx.
>>
>> [PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
>> @@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs 
>> gfx_v9_0_ring_funcs_gfx = {
>> .emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
>> .init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
>> .patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
>> +   .preempt_ib = gfx_v9_0_ring_preempt_ib,
>> .emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
>> .emit_wreg = gfx_v9_0_ring_emit_wreg,
>> .emit_reg_wait = gfx_v9_0_ring_emit_reg_wait, diff --git
>> a/drivers/gpu/drm/amd/amdgpu/soc15d.h
>> b/drivers/gpu/drm/amd/amdgpu/soc15d.h
>
> Ah! Looks like stg applied patch 3 incorrectly for me. :(
>
> I'll try and test with this fixed this week, and report back.

I'm now running with patch 3 applied correctly, and with patch 5 as well.


The good news is that I'm now seeing a positive effect with GpuTest benchmarks 
which are GPU-limited at low frame rates. In particular, with the pixmark piano 
benchmark, the GNOME Wayland session now actually stays more responsive on this 
machine than it does on my work laptop with an Intel iGPU. However, with the 
plot3d benchmark (with /plot3d_vertex_density=1750 on the command line to 
increase GPU load), it still doesn't quite manage to keep the desktop running 
at full frame rate, in contrast to the Intel iGPU.

The bad news is that this series still makes some things very slow. The most 
extreme examples so far are glxgears (runs at ~400 fps now, ~7000 fps before, 
i.e. almost 20x slowdown) and hexchat (scrolling one page now takes ~1 second, 
I can see it drawing line by line; before it was almost instantaneous). I 
suspect this series makes the overhead of running a single GPU job much bigger. 
On the bright side, I'm not noticing any significant intermittent freezes 
anymore.


In summary, while the benefits are promising, the downsides are unacceptable 
for enabling this by default.


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-11-01 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
>gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
>presumably uses the latter, which would explain why amdgpu_ring_preempt_ib 
>ends up dereferencing a NULL pointer.

That's weird; the assignment should be in gfx_v9_0_ring_funcs_gfx, not in 
gfx_v9_0_sw_ring_funcs_gfx.

[PATCH 3/5] drm/amdgpu: Modify unmap_queue format for gfx9 (v4):
@@ -6925,6 +7047,7 @@ static const struct amdgpu_ring_funcs 
gfx_v9_0_ring_funcs_gfx = {
.emit_cntxcntl = gfx_v9_ring_emit_cntxcntl,
.init_cond_exec = gfx_v9_0_ring_emit_init_cond_exec,
.patch_cond_exec = gfx_v9_0_ring_emit_patch_cond_exec,
+   .preempt_ib = gfx_v9_0_ring_preempt_ib,
.emit_frame_cntl = gfx_v9_0_ring_emit_frame_cntl,
.emit_wreg = gfx_v9_0_ring_emit_wreg,
.emit_reg_wait = gfx_v9_0_ring_emit_reg_wait,
diff --git a/drivers/gpu/drm/amd/amdgpu/soc15d.h 
b/drivers/gpu/drm/amd/amdgpu/soc15d.h

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Tuesday, November 1, 2022 5:11 PM
To: Zhu, Jiadong 
Cc: amd-gfx@lists.freedesktop.org; Andrey Grodzovsky 
; Huang, Ray ; Koenig, Christian 
; Tuikov, Luben 
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)


[ Please don't top-post ]


On 2022-11-01 02:04, Zhu, Jiadong wrote:
>
> It is a macro defined in amdgpu_ring.h that calls the preempt_ib function 
> pointer directly:
> #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
>
> The real ring's hook is assigned  in gfx_v9_0.c:
> static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = { ...
> .preempt_ib = gfx_v9_0_ring_preempt_ib, /* please check that this line 
> exists */
> ...
> }

Patch 3 assigns preempt_ib in gfx_v9_0_sw_ring_funcs_gfx, but not in 
gfx_v9_0_ring_funcs_gfx. mux->real_ring in amdgpu_mcbp_trigger_preempt 
presumably uses the latter, which would explain why amdgpu_ring_preempt_ib ends 
up dereferencing a NULL pointer.
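
To illustrate the failure mode, here is a sketch; only the macro and the ring 
structures follow the patches, the guard itself is hypothetical and not part 
of the series. amdgpu_ring_preempt_ib() expands to a bare function-pointer 
call, so a real ring whose .preempt_ib was never assigned jumps straight to 
address 0:

/* hypothetical defensive variant, for illustration only */
static int amdgpu_mcbp_trigger_preempt_checked(struct amdgpu_ring_mux *mux)
{
	struct amdgpu_ring *ring = mux->real_ring;

	/* amdgpu_ring_preempt_ib(r) is (r)->funcs->preempt_ib(r) */
	if (!ring->funcs->preempt_ib)
		return -EOPNOTSUPP;	/* .preempt_ib never assigned */

	return amdgpu_ring_preempt_ib(ring);
}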


> Here is the eglinfo output on my board. I once built the Mesa component
> (22.3.0-devel), which may have updated libEGL_mesa.so.

Looks like EGL_IMG_context_priority is supported (it would be interesting to 
see if it's listed for the Device Platform as well, but there's probably no 
reason to assume otherwise). Strange that you can't reproduce then.


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

2022-10-31 Thread Zhu, Jiadong
[AMD Official Use Only - General]

It is a macro defined in amdgpu_ring.h that calls the preempt_ib function 
pointer directly:
#define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)

The real ring's hook is assigned in gfx_v9_0.c:
static const struct amdgpu_ring_funcs gfx_v9_0_ring_funcs_gfx = {
...
.preempt_ib = gfx_v9_0_ring_preempt_ib, /* please check that this line exists */
...
}

Here is the eglinfo output on my board. I once built the Mesa component 
(22.3.0-devel), which may have updated libEGL_mesa.so.
EGL client extensions string:
EGL_EXT_device_base EGL_EXT_device_enumeration EGL_EXT_device_query
EGL_EXT_platform_base EGL_KHR_client_get_all_proc_addresses
EGL_EXT_client_extensions EGL_KHR_debug EGL_EXT_platform_device
EGL_EXT_platform_x11 EGL_KHR_platform_x11 EGL_EXT_platform_xcb
EGL_MESA_platform_gbm EGL_KHR_platform_gbm
EGL_MESA_platform_surfaceless

GBM platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
EGL extensions string:
EGL_ANDROID_blob_cache EGL_ANDROID_native_fence_sync
EGL_EXT_buffer_age EGL_EXT_create_context_robustness
EGL_EXT_image_dma_buf_import EGL_EXT_image_dma_buf_import_modifiers
EGL_EXT_protected_surface EGL_IMG_context_priority EGL_KHR_cl_event2
EGL_KHR_config_attribs EGL_KHR_context_flush_control
EGL_KHR_create_context EGL_KHR_create_context_no_error
EGL_KHR_fence_sync EGL_KHR_get_all_proc_addresses
EGL_KHR_gl_colorspace EGL_KHR_gl_renderbuffer_image
EGL_KHR_gl_texture_2D_image EGL_KHR_gl_texture_3D_image
EGL_KHR_gl_texture_cubemap_image EGL_KHR_image EGL_KHR_image_base
EGL_KHR_image_pixmap EGL_KHR_no_config_context EGL_KHR_reusable_sync
EGL_KHR_surfaceless_context EGL_EXT_pixel_format_float
EGL_KHR_wait_sync EGL_MESA_configless_context EGL_MESA_drm_image
EGL_MESA_image_dma_buf_export EGL_MESA_query_driver

Thanks,
Jiadong

-Original Message-
From: Michel Dänzer 
Sent: Monday, October 31, 2022 8:02 PM
To: Zhu, Jiadong 
Cc: Andrey Grodzovsky ; Huang, Ray 
; Tuikov, Luben ; Koenig, Christian 
; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/5] drm/amdgpu: MCBP based on DRM scheduler (v8)

On 2022-10-18 11:08, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> Trigger Mid-Command Buffer Preemption according to the priority of the
> software rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the
> software ring together with the fence sequence number in its fifo
> queue, and pops out those records when the fences are signalled. The
> locations are used to resubmit packages in preemption scenarios by copying the 
> chunks from the software ring.
>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
> v7: Solve conflict for removing amdgpu_sw_ring.c.
> v8: Add time threshold to judge if preemption request is needed.
>
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Cc: Michel Dänzer 
> Signed-off-by: Jiadong.Zhu 
> Acked-by: Luben Tuikov 
> Acked-by: Huang Rui 
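
As a sketch of the FIFO mechanism described in that commit message (field and 
struct names here are illustrative and may differ from the patches):

/* illustrative only: one FIFO entry per IB frame on a software ring */
struct amdgpu_mux_chunk {
	struct list_head	entry;
	uint32_t		sync_seq;	/* hw fence seqno of this IB */
	u64			start;		/* sw-ring wptr where the frame begins */
	u64			end;		/* sw-ring wptr where the frame ends */
};

Entries are popped once their fence signals; on preemption the still-pending 
ranges are copied from the software ring to the real ring.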

[...]

> +/* Trigger Mid-Command Buffer Preemption (MCBP) and find if we need to resubmit. */
> +int amdgpu_mcbp_trigger_preempt(struct amdgpu_ring_mux *mux)
> +{
> +	int r;
> +
> +	spin_lock(&mux->lock);
> +	mux->pending_trailing_fence_signaled = true;
> +	r = amdgpu_ring_preempt_ib(mux->real_ring);
> +	spin_unlock(&mux->lock);
> +	return r;
> +}

AFAICT amdgpu_mcbp_trigger_preempt is used only in this file, so it should be 
declared static.

(I didn't audit the other new functions added by this series for this, just 
happened to notice this one)


--
Earthling Michel Dänzer|  https://redhat.com
Libre software enthusiast  | Mesa and Xwayland developer



RE: [PATCH 1/5] drm/amdgpu: Introduce gfx software ring (v8)

2022-10-31 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

Sorry for the late response. Most likely the NULL pointer dereference comes 
from amdgpu_ring_preempt_ib, as preempt_ib is not assigned.

Could you check that preempt_ib is set in gfx_v9_0.c by the patch 
"drm/amdgpu: Modify unmap_queue format for gfx9 (v4)"?

"
...
.preempt_ib = gfx_v9_0_ring_preempt_ib,
"

Btw, which branch of the KMD are you cherry-picking onto? Maybe my code base is 
too old. I tried using Wayland on Ubuntu 20.04 and could not reproduce the crash.

Thanks,
Jiadong

-Original Message-
From: Koenig, Christian 
Sent: Thursday, October 20, 2022 10:59 PM
To: Michel Dänzer ; Zhu, Jiadong ; 
amd-gfx@lists.freedesktop.org
Cc: Grodzovsky, Andrey ; Huang, Ray 
; Tuikov, Luben 
Subject: Re: [PATCH 1/5] drm/amdgpu: Introduce gfx software ring (v8)

Am 20.10.22 um 16:49 schrieb Michel Dänzer:
> On 2022-10-18 11:08, jiadong@amd.com wrote:
>> From: "Jiadong.Zhu" 
>>
>> The software ring is created to support priority context while there
>> is only one hardware queue for gfx.
>>
>> Every software ring has its fence driver and could be used as an
>> ordinary ring for the GPU scheduler.
>> Multiple software rings are bound to a real ring with the ring muxer.
>> The packages committed on the software ring are copied to the real ring.
>>
>> v2: Use array to store software ring entry.
>> v3: Remove unnecessary prints.
>> v4: Remove amdgpu_ring_sw_init/fini functions, using gtt for sw ring
>> buffer for later dma copy optimization.
>> v5: Allocate ring entry dynamically in the muxer.
>> v6: Update comments for the ring muxer.
>> v7: Modify for function naming.
>> v8: Combine software ring functions into amdgpu_ring_mux.c
> I tested patches 1-4 of this series (since Christian clearly nacked patch 5). 
> I hit the oops below.

As long as you don't try to reset the GPU you can also test patch 5.
It's just that we can't upstream stuff like this, or it would break 
immediately.

Regards,
Christian.

>
> amdgpu_sw_ring_ib_begin+0x70/0x80 is in amdgpu_mcbp_trigger_preempt according 
> to scripts/faddr2line, specifically line 376:
>
>   spin_unlock(&mux->lock);
>
> though I'm not sure why that would crash.
>
>
> Are you not able to reproduce this with a GNOME Wayland session?
>
>
> BUG: kernel NULL pointer dereference, address: 
> #PF: supervisor instruction fetch in kernel mode
> #PF: error_code(0x0010) - not-present page PGD 0 P4D 0
> Oops: 0010 [#1] PREEMPT SMP NOPTI
> CPU: 7 PID: 281 Comm: gfx_high Tainted: GE  6.0.2+ #1
> Hardware name: LENOVO 20NFGE/20NFGE, BIOS R11ET36W (1.16 )
> 03/30/2020
> RIP: 0010:0x0
> Code: Unable to access opcode bytes at RIP 0xffd6.
> RSP: 0018:bd594073bdc8 EFLAGS: 00010246
> RAX:  RBX: 993c4a62 RCX: 
> RDX:  RSI:  RDI: 993c4a62a350
> RBP: 993c4a62d530 R08:  R09: 
> R10:  R11:  R12: 0114
> R13: 993c4a62 R14:  R15: 993c4a62d128
> FS:  () GS:993ef0bc()
> knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: ffd6 CR3: 0001959fc000 CR4: 003506e0 Call
> Trace:
>   
>   amdgpu_sw_ring_ib_begin+0x70/0x80 [amdgpu]
>   amdgpu_ib_schedule+0x15f/0x5d0 [amdgpu]
>   amdgpu_job_run+0x102/0x1c0 [amdgpu]
>   drm_sched_main+0x19a/0x440 [gpu_sched]
>   ? dequeue_task_stop+0x70/0x70
>   ? drm_sched_resubmit_jobs+0x10/0x10 [gpu_sched]
>   kthread+0xe9/0x110
>   ? kthread_complete_and_exit+0x20/0x20
>   ret_from_fork+0x22/0x30
>   
> [...]
> note: gfx_high[281] exited with preempt_count 1 [...]
> [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout,
> signaled seq=14864, emitted seq=14865 [drm:amdgpu_job_timedout
> [amdgpu]] *ERROR* Process information: process firefox.dpkg-di pid 3540 
> thread firefox:cs0 pid 4666 amdgpu :05:00.0: amdgpu: GPU reset begin!
>
>



RE: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

2022-10-07 Thread Zhu, Jiadong
I think Likun means we should stop creating the software rings if no gfx ring 
exists.

Thanks,
Jiadong

-Original Message-
From: Zhang, Hawking  
Sent: Saturday, October 8, 2022 12:37 PM
To: Gao, Likun ; Zhu, Jiadong ; 
amd-gfx@lists.freedesktop.org
Cc: Grodzovsky, Andrey ; Michel Dänzer 
; Tuikov, Luben ; Zhu, Jiadong 
; Koenig, Christian 
Subject: RE: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

[AMD Official Use Only - General]

I don't think so. In that case, the current CWSR and user queue mechanism 
handles preemption very well. The command submission actually bypasses the DRM 
GPU scheduler.

Regards,
Hawking

-Original Message-
From: amd-gfx  On Behalf Of Gao, Likun
Sent: Saturday, October 8, 2022 11:52
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Grodzovsky, Andrey ; Michel Dänzer 
; Tuikov, Luben ; Zhu, Jiadong 
; Koenig, Christian 
Subject: RE: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

[AMD Official Use Only - General]

Do we need to deal with the situation where no real gfx ring exists 
(adev->gfx.num_gfx_rings is 0)?

Regards,
Likun

-Original Message-
From: amd-gfx  On Behalf Of 
jiadong@amd.com
Sent: Thursday, September 29, 2022 5:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Michel Dänzer ; 
Zhu, Jiadong ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

From: "Jiadong.Zhu" 

Set ring functions with software ring callbacks on gfx9.

The software ring could be tested by debugfs_test_ib case.

v2: Set sw_ring 2 to enable software ring by default.
v3: Remove the parameter for software ring enablement.
v4: Use amdgpu_ring_init/fini for software rings.
v5: Update for code format. Fix conflict.
v6: Remove unnecessary checks and enable software ring on gfx9 by default.
v7: Use static array for software ring names and priorities.

Acked-by: Luben Tuikov 
Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Cc: Michel Dänzer 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c |  20 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |   2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 104 ++-
 5 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 9996dadb39f7..4fdfc3ec134a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -348,6 +348,7 @@ struct amdgpu_gfx {
 
boolis_poweron;
 
+   struct amdgpu_ring  sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];
struct amdgpu_ring_mux  muxer;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 40b1277b4f0c..f08ee1ac281c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -39,6 +39,7 @@ struct amdgpu_vm;
 #define AMDGPU_MAX_RINGS   28
 #define AMDGPU_MAX_HWIP_RINGS  8
 #define AMDGPU_MAX_GFX_RINGS   2
+#define AMDGPU_MAX_SW_GFX_RINGS 2
 #define AMDGPU_MAX_COMPUTE_RINGS   8
 #define AMDGPU_MAX_VCE_RINGS   3
 #define AMDGPU_MAX_UVD_ENC_RINGS   2
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 43cab8a37754..2e64ffccc030 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -29,6 +29,14 @@
 
 #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
 
+static const struct ring_info {
+   unsigned int hw_pio;
+   const char *ring_name;
+} sw_ring_info[] = {
+   { AMDGPU_RING_PRIO_DEFAULT, "gfx_low"},
+   { AMDGPU_RING_PRIO_2, "gfx_high"},
+};
+
 int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
 unsigned int entry_size)
 {
@@ -215,3 +223,15 @@ void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count)  {
WARN_ON(!ring->is_sw_ring);
 }
+
+const char *amdgpu_sw_ring_name(int idx)
+{
+	return idx < ARRAY_SIZE(sw_ring_info) ?
+		sw_ring_info[idx].ring_name : NULL;
+}
+
+unsigned int amdgpu_sw_ring_priority(int idx)
+{
+	return idx < ARRAY_SIZE(sw_ring_info) ?
+		sw_ring_info[idx].hw_pio : AMDGPU_RING_PRIO_DEFAULT;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
index d91629589577..28399f4b0e5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
@@ -73,4 +73,6 @@ void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count);  void amdgpu_sw_ring_ib_begin(struct amdgpu_ring *ring);  void

RE: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

2022-10-07 Thread Zhu, Jiadong
I agree, let me update the patch.

Thanks,
Jiadong

-Original Message-
From: Gao, Likun  
Sent: Saturday, October 8, 2022 11:52 AM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Michel Dänzer ; 
Zhu, Jiadong ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: RE: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

[AMD Official Use Only - General]

Do we need to deal with the situation where no real gfx ring exists 
(adev->gfx.num_gfx_rings is 0)?

Regards,
Likun

-Original Message-
From: amd-gfx  On Behalf Of 
jiadong@amd.com
Sent: Thursday, September 29, 2022 5:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Michel Dänzer ; 
Zhu, Jiadong ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: [PATCH 2/4] drm/amdgpu: Add software ring callbacks for gfx9 (v7)

From: "Jiadong.Zhu" 

Set ring functions with software ring callbacks on gfx9.

The software ring could be tested by debugfs_test_ib case.

v2: Set sw_ring 2 to enable software ring by default.
v3: Remove the parameter for software ring enablement.
v4: Use amdgpu_ring_init/fini for software rings.
v5: Update for code format. Fix conflict.
v6: Remove unnecessary checks and enable software ring on gfx9 by default.
v7: Use static array for software ring names and priorities.

Acked-by: Luben Tuikov 
Cc: Christian Koenig 
Cc: Luben Tuikov 
Cc: Andrey Grodzovsky 
Cc: Michel Dänzer 
Signed-off-by: Jiadong.Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c |  20 
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |   2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 104 ++-
 5 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index 9996dadb39f7..4fdfc3ec134a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -348,6 +348,7 @@ struct amdgpu_gfx {
 
boolis_poweron;
 
+   struct amdgpu_ring  sw_gfx_ring[AMDGPU_MAX_SW_GFX_RINGS];
struct amdgpu_ring_mux  muxer;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 40b1277b4f0c..f08ee1ac281c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -39,6 +39,7 @@ struct amdgpu_vm;
 #define AMDGPU_MAX_RINGS   28
 #define AMDGPU_MAX_HWIP_RINGS  8
 #define AMDGPU_MAX_GFX_RINGS   2
+#define AMDGPU_MAX_SW_GFX_RINGS 2
 #define AMDGPU_MAX_COMPUTE_RINGS   8
 #define AMDGPU_MAX_VCE_RINGS   3
 #define AMDGPU_MAX_UVD_ENC_RINGS   2
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
index 43cab8a37754..2e64ffccc030 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
@@ -29,6 +29,14 @@
 
 #define AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT (HZ / 2)
 
+static const struct ring_info {
+   unsigned int hw_pio;
+   const char *ring_name;
+} sw_ring_info[] = {
+   { AMDGPU_RING_PRIO_DEFAULT, "gfx_low"},
+   { AMDGPU_RING_PRIO_2, "gfx_high"},
+};
+
 int amdgpu_ring_mux_init(struct amdgpu_ring_mux *mux, struct amdgpu_ring *ring,
 unsigned int entry_size)
 {
@@ -215,3 +223,15 @@ void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count)  {
WARN_ON(!ring->is_sw_ring);
 }
+
+const char *amdgpu_sw_ring_name(int idx)
+{
+	return idx < ARRAY_SIZE(sw_ring_info) ?
+		sw_ring_info[idx].ring_name : NULL;
+}
+
+unsigned int amdgpu_sw_ring_priority(int idx)
+{
+	return idx < ARRAY_SIZE(sw_ring_info) ?
+		sw_ring_info[idx].hw_pio : AMDGPU_RING_PRIO_DEFAULT;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
index d91629589577..28399f4b0e5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
@@ -73,4 +73,6 @@ void amdgpu_sw_ring_insert_nop(struct amdgpu_ring *ring, 
uint32_t count);  void amdgpu_sw_ring_ib_begin(struct amdgpu_ring *ring);  void 
amdgpu_sw_ring_ib_end(struct amdgpu_ring *ring);
 
+const char *amdgpu_sw_ring_name(int idx);
+unsigned int amdgpu_sw_ring_priority(int idx);
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6b609f33261f..3b607c09d267 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -47,6 +47,7 @@
 
 #include "amdgpu_ras.h"
 
+#include "amdgpu_ring_mux.h"
 #include "gfx_v9_4.h"
 #include "gfx_v9_0.h"
 #include "gfx_v9_4_2.h"
@@ -56,6 +57,7 @@
 #include "as

RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-27 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Michel,

Please make sure the UMD is calling the libdrm function to create contexts 
with different priorities, e.g. 
amdgpu_cs_ctx_create2(device_handle, AMDGPU_CTX_PRIORITY_HIGH, &context_handle).
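
A minimal sketch of that UMD-side setup, assuming libdrm's amdgpu wrapper and 
an already initialized device handle:

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* create one high and one normal priority context; submissions on the
 * high priority context should land on the gfx_high software ring */
static int create_test_contexts(amdgpu_device_handle dev,
				amdgpu_context_handle *high_ctx,
				amdgpu_context_handle *low_ctx)
{
	int r;

	r = amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_HIGH, high_ctx);
	if (r)
		return r;

	return amdgpu_cs_ctx_create2(dev, AMDGPU_CTX_PRIORITY_NORMAL, low_ctx);
}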

Here is the behavior we should see:
1. After modprobe amdgpu, two software rings named gfx_high/gfx_low (named 
gfx_sw in a previous patch revision) are visible in UMR. We can check the 
wptr/rptr to see whether they are being used.
2. MCBP happens while two IBs of different priority are submitted at the same 
time. We can check the fence info as below:
The Last signaled trailing fence counter increments when the MCBP is triggered 
by the KMD. Last preempted may not increase, as the MCBP is not triggered from 
the CP.

--- ring 0 (gfx) ---
Last signaled fence  0x0001
Last emitted 0x0001
Last signaled trailing fence 0x0022eb84
Last emitted 0x0022eb84
Last preempted   0x
Last reset   0x

Thanks,
Jiadong
-Original Message-
From: Koenig, Christian 
Sent: Wednesday, September 28, 2022 2:20 AM
To: Michel Dänzer 
Cc: Grodzovsky, Andrey ; Tuikov, Luben 
; Zhu, Jiadong ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

This work is solely for gfx9 (e.g. Vega) and older.

Navi has a completely separate high priority gfx queue we can use for this.

Thanks,
Christian.

Am 27.09.22 um 19:49 schrieb Michel Dänzer:
> On 2022-09-27 08:06, Christian König wrote:
>> Hey Michel,
>>
>> Jiadong is working on exposing high/low priority gfx queues for gfx9 and 
>> older hw generations by using mid command buffer preemption.
> Yeah, I've been keeping an eye on these patches. I'm looking forward to this 
> working.
>
>
>> I know that you have been working on Gnome Mutter to make use from userspace 
>> for this. Do you have time to run some tests with that?
> I just tested the v8 series (first without amdgpu.mcbp=1 on the kernel 
> command line, then with it, since I wasn't sure if it's needed) with 
> https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/1880 on Navi 14.
>
> Unfortunately, I'm not seeing any change in behaviour. Even though mutter 
> uses a high priority context via the EGL_IMG_context_priority extension, it's 
> unable to reach a higher frame rate than a GPU-limited client[0]. The "Last 
> preempted" line of /sys/kernel/debug/dri/0/amdgpu_fence_info remains at 
> 0x.
>
> Did I miss a step?
>
>
> [0] I used the GpuTest pixmark piano & plot3d benchmarks. With an Intel iGPU, 
> mutter can achieve a higher frame rate than plot3d, though not than pixmark 
> piano (presumably due to limited GPU preemption granularity).
>
>> Am 27.09.22 um 05:18 schrieb Zhu, Jiadong:
>>> [AMD Official Use Only - General]
>>>
>>>> I need more time for an in deep review of this, but form the one mile high 
>>>> view it looks correct to me now.
>>>> Can we do some pre-commit qa testing with this?
>>> I changed the drm test "Command submission Test (GFX)" to send high
>>> priority IBs while running Manhattan onscreen / Unigine Heaven in the
>>> foreground, checking the mcbp/resubmit counts via cat
>>> /sys/kernel/debug/dri/0/amdgpu_fence_info
>>>
>>> I have kept this scenario running for two days and one night; no hangs have 
>>> happened yet (lots of hangs were fixed by the previous patches).
>>>
>>> I will ask the QA team to do more testing.
>>>
>>> Thanks,
>>> Jiadong
>>>
>>> -Original Message-
>>> From: Christian König 
>>> Sent: Monday, September 26, 2022 2:49 PM
>>> To: Zhu, Jiadong ;
>>> amd-gfx@lists.freedesktop.org
>>> Cc: Tuikov, Luben ; Koenig, Christian
>>> ; Grodzovsky, Andrey
>>> 
>>> Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler
>>> (v6)
>>>
>>>
>>>
>>> Am 23.09.22 um 15:16 schrieb jiadong@amd.com:
>>>> From: "Jiadong.Zhu" 
>>>>
>>>> Trigger Mid-Command Buffer Preemption according to the priority of
>>>> the software rings and the hw fence signalling condition.
>>>

RE: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)

2022-09-26 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>I need more time for an in-depth review of this, but from the one-mile-high 
>view it looks correct to me now.

>Can we do some pre-commit qa testing with this?

I changed the drm test "Command submission Test (GFX)" to send high priority 
IBs while running Manhattan onscreen / Unigine Heaven in the foreground, 
checking the mcbp/resubmit counts via cat 
/sys/kernel/debug/dri/0/amdgpu_fence_info.

I have kept this scenario running for two days and one night; no hangs have 
happened yet (lots of hangs were fixed by the previous patches).

I will ask the QA team to do more testing.

Thanks,
Jiadong

-Original Message-
From: Christian König 
Sent: Monday, September 26, 2022 2:49 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Koenig, Christian 
; Grodzovsky, Andrey 
Subject: Re: [PATCH 4/4] drm/amdgpu: MCBP based on DRM scheduler (v6)



Am 23.09.22 um 15:16 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> Trigger Mid-Command Buffer Preemption according to the priority of the
> software rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the
> software ring together with the fence sequence number in its fifo
> queue, and pops out those records when the fences are signalled. The
> locations are used to resubmit packages in preemption scenarios by copying the 
> chunks from the software ring.
>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
> v6: Refactor functions for resubmission, calling fence_process in irq handler.
>
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Acked-by: Luben Tuikov 
> Signed-off-by: Jiadong.Zhu 

I need more time for an in-depth review of this, but from the one-mile-high 
view it looks correct to me now.

Can we do some pre-commit qa testing with this?

Thanks,
Christian.

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  13 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 323 ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  30 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c|  10 +-
>   8 files changed, 368 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   }
>
> + amdgpu_ring_ib_begin(ring);
>   if (job && ring->funcs->init_cond_exec)
>   patch_offset = amdgpu_ring_init_cond_exec(ring);
>
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
>   ring->funcs->emit_wave_limit(ring, false);
>
> + amdgpu_ring_ib_end(ring);
>   amdgpu_ring_commit(ring);
>   return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> index 13db99d653bd..84b0b3c7d40f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
> @@ -33,6 +33,7 @@
>
>   #include 
>   #include "amdgpu.h"
> +#include "amdgpu_sw_ring.h"
>   #include "atom.h"
>
>   /*
> @@ -569,3 +570,15 @@ int amdgpu_ring_init_mqd(struct amdgpu_ring
> *ring)
>
>   return mqd_mgr->init_mqd(adev, ring->mqd_ptr, &prop);
>   }
> +
> +void amdgpu_ring_ib_begin(struct amdgpu_ring *ring)
> +{
> +	if (ring->is_sw_ring)
> +		amdgpu_sw_ring_ib_begin(ring);
> +}
> +
> +void amdgpu_ring_ib_end(struct amdgpu_ring *ring)
> +{
> +	if (ring->is_sw_ring)
> +		amdgpu_sw_ring_ib_end(ring);
> +}
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index e90d327a589e..6fbc1627dab7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -312,6 +312,9 @@ struct amdgpu_ring {
>   #define amdgpu_ring_preempt_ib(r) (r)->funcs->preempt_ib(r)
>
>   int amdgpu_ring_alloc(struct amd

RE: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

2022-09-23 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>> Also what is the check of s_resubmit state good for here?
> The resubmission happens in the low priority ring's scheduler task. Consider 
> the situation where two high priority IB requests call amdgpu_mcbp_scan: we 
> don't want to trigger preemption twice while the low priority ring has not 
> yet resubmitted its packages.

>Mhm, I'm not sure if we should complicate things like this; I'd rather just 
>use a fixed low and high priority ring.

>So the flag basically means that some low priority work is still preempted 
>and waits for re-submission?

Yes, the resubmission happens the next time there is a low priority IB to 
emit, or via the timeout fallback if that was the last low priority IB.
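
A rough sketch of that fallback path (AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT is 
HZ / 2 in the series; the other names only approximate the patches and are not 
a verbatim quote):

/* timer callback armed when a preemption leaves low priority work pending */
static void amdgpu_mux_resubmit_fallback(struct timer_list *t)
{
	struct amdgpu_ring_mux *mux = from_timer(mux, t, resubmit_timer);

	/* no new low priority IB arrived in time, so copy the preempted
	 * chunks from the software ring back onto the real ring now */
	if (spin_trylock(&mux->lock)) {
		amdgpu_mux_resubmit_chunks(mux);
		spin_unlock(&mux->lock);
	} else {
		/* lock contended: re-arm and retry later */
		mod_timer(&mux->resubmit_timer,
			  jiffies + AMDGPU_MUX_RESUBMIT_JIFFIES_TIMEOUT);
	}
}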

Thanks,
Jiadong

-Original Message-
From: Koenig, Christian 
Sent: Friday, September 23, 2022 7:23 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Grodzovsky, Andrey 

Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

Am 23.09.22 um 12:51 schrieb Zhu, Jiadong:
> [AMD Official Use Only - General]
>
>>> + return need_preempt && !mux->s_resubmit;
>>> Well what exactly are you trying to do here? Finding if a lower priority 
>>> ring has unsignaled fences?
>> Yes, we are peeking at the fence_drv data at the time the high priority IBs 
>> are going to be emitted. The result is not necessarily accurate, because we 
>> would check the fence after the preemption completes.
>> Please use amdgpu_fence_count_emitted() for this instead.
> amdgpu_fence_count_emitted calls amdgpu_fence_process internally. We are in 
> the high priority ring's scheduler task when calling amdgpu_mcbp_scan, so 
> running amdgpu_fence_process on both rings might lower performance.
> Maybe we could add a function in amdgpu_fence.c that counts emitted fences 
> without calling amdgpu_fence_process?

Good point. This not only lowers the performance, but is also problematic for 
correctness, since dma_fence won't signal from the interrupt handler any more.

Yeah, feel free to remove the call to amdgpu_fence_process() from
amdgpu_fence_count_emitted() in a separate patch.

>
>> Also what is the check of s_resubmit state good for here?
> The resubmission happens in the low priority ring's scheduler task. Consider 
> the situation where two high priority IB requests call amdgpu_mcbp_scan: we 
> don't want to trigger preemption twice while the low priority ring has not 
> yet resubmitted its packages.

Mhm, I'm not sure if we should complicate things like this; I'd rather just use 
a fixed low and high priority ring.

So the flag basically means that some low priority work is still preempted and 
waits for re-submission?

Regards,
Christian.

>
> Thanks,
> Jiadong
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Friday, September 23, 2022 6:13 PM
> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
> Cc: Tuikov, Luben ; Grodzovsky, Andrey
> 
> Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)
>
> Am 23.09.22 um 11:24 schrieb Zhu, Jiadong:
>> [AMD Official Use Only - General]
>>
>> Inlined.
>>
>> Thanks,
>> Jiadong
>>
>> -Original Message-
>> From: Koenig, Christian 
>> Sent: Wednesday, September 21, 2022 9:01 PM
>> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
>> Cc: Tuikov, Luben ; Grodzovsky, Andrey
>> 
>> Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)
>>
>> Am 21.09.22 um 11:41 schrieb jiadong@amd.com:
>>> From: "Jiadong.Zhu" 
>>>
>>> Trigger Mid-Command Buffer Preemption according to the priority of
>>> the software rings and the hw fence signalling condition.
>>>
>>> The muxer saves the locations of the indirect buffer frames from the
>>> software ring together with the fence sequence number in its fifo
>>> queue, and pops out those records when the fences are signalled. The
>>> locations are used to resubmit packages in preemption scenarios by copying 
>>> the chunks from the software ring.
>> Maybe change the subject a bit. The MCBP is not really triggered by the core 
>> Linux kernel.
>>
>> Maybe write instead "MCBP based on DRM scheduler".
>>
>>> v2: Update comment style.
>>> v3: Fix conflict caused by previous modifications.
>>> v4: Remove unnecessary prints.
>>> v5: Fix corner cases for resubmission cases.
>>>
>>> Cc: Christian Koenig 
>>> Cc: Luben Tuikov 
>>> Cc: Andrey Grodzovsky 
>>> Acked-by: Luben Tuikov 
>>> Signed-off-by: Jiadong.Zhu 
>>> ---
>>> drivers/gpu/drm/amd/amdgpu/Makefile  |   2 +-
>>> drivers/gpu/dr

RE: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

2022-09-23 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>> + return need_preempt && !mux->s_resubmit;
>> Well what exactly are you trying to do here? Finding if a lower priority 
>> ring has unsignaled fences?
> Yes, we are peeking at the fence_drv data at the time the high priority IBs 
> are going to be emitted. The result is not necessarily accurate, because we 
> would check the fence after the preemption completes.

>Please use amdgpu_fence_count_emitted() for this instead.

amdgpu_fence_count_emitted calls amdgpu_fence_process internally. We are in the 
high priority ring's scheduler task when calling amdgpu_mcbp_scan, so running 
amdgpu_fence_process on both rings might lower performance.
Maybe we could add a function in amdgpu_fence.c that counts emitted fences 
without calling amdgpu_fence_process?
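
A minimal sketch of such a helper, i.e. the bookkeeping of 
amdgpu_fence_count_emitted() with the amdgpu_fence_process() call dropped 
(field names follow amdgpu_fence.c; the function itself is only a suggestion):

/* count emitted but not-yet-signaled fences without processing them */
static unsigned int amdgpu_fence_count_emitted_noprocess(struct amdgpu_ring *ring)
{
	uint64_t emitted;

	/* emitted minus last signaled seqno, as in amdgpu_fence_count_emitted(),
	 * just without the amdgpu_fence_process() call on this fast path */
	emitted = 0x100000000ull;
	emitted -= atomic_read(&ring->fence_drv.last_seq);
	emitted += READ_ONCE(ring->fence_drv.sync_seq);
	return lower_32_bits(emitted);
}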

>Also what is the check of s_resubmit state good for here?
The resubmission happens in the low priority ring's scheduler task. Consider 
the situation where two high priority IB requests call amdgpu_mcbp_scan: we 
don't want to trigger preemption twice while the low priority ring has not yet 
resubmitted its packages.

Thanks,
Jiadong

-Original Message-
From: Koenig, Christian 
Sent: Friday, September 23, 2022 6:13 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Grodzovsky, Andrey 

Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

Am 23.09.22 um 11:24 schrieb Zhu, Jiadong:
> [AMD Official Use Only - General]
>
> Inlined.
>
> Thanks,
> Jiadong
>
> -Original Message-
> From: Koenig, Christian 
> Sent: Wednesday, September 21, 2022 9:01 PM
> To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
> Cc: Tuikov, Luben ; Grodzovsky, Andrey
> 
> Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)
>
> Am 21.09.22 um 11:41 schrieb jiadong@amd.com:
>> From: "Jiadong.Zhu" 
>>
>> Trigger Mid-Command Buffer Preemption according to the priority of
>> the software rings and the hw fence signalling condition.
>>
>> The muxer saves the locations of the indirect buffer frames from the
>> software ring together with the fence sequence number in its fifo
>> queue, and pops out those records when the fences are signalled. The
>> locations are used to resubmit packages in preemption scenarios by copying 
>> the chunks from the software ring.
> Maybe change the subject a bit. The MCBP is not really triggered by the core 
> Linux kernel.
>
> Maybe write instead "MCBP based on DRM scheduler".
>
>> v2: Update comment style.
>> v3: Fix conflict caused by previous modifications.
>> v4: Remove unnecessary prints.
>> v5: Fix corner cases for resubmission cases.
>>
>> Cc: Christian Koenig 
>> Cc: Luben Tuikov 
>> Cc: Andrey Grodzovsky 
>> Acked-by: Luben Tuikov 
>> Signed-off-by: Jiadong.Zhu 
>> ---
>>drivers/gpu/drm/amd/amdgpu/Makefile  |   2 +-
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>>drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c |  91 +
>>drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h |  29 +++
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 186 ++-
>>drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  24 +++
>>drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  27 +++
>>drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>>10 files changed, 372 insertions(+), 6 deletions(-)
>>create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
>>create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>> index 85224bc81ce5..24c5aa19bbf2 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>> @@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>>amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>>amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>>amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
>> - amdgpu_sw_ring.o amdgpu_ring_mux.o
>> + amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o
> This functionality is spread over too many files. Probably better to move 
> this into amdgpu_ring_mux.c as well.
>
>>amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> index 258cffe3c06a..af86d87e2f3b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
>> +++ b/drivers/gpu/drm/amd/a

RE: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

2022-09-23 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Inlined.

Thanks,
Jiadong

-Original Message-
From: Koenig, Christian 
Sent: Wednesday, September 21, 2022 9:01 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Tuikov, Luben ; Grodzovsky, Andrey 

Subject: Re: [PATCH 4/5] drm/amdgpu: Implement OS triggered MCBP (v5)

Am 21.09.22 um 11:41 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> Trigger Mid-Command Buffer Preemption according to the priority of the
> software rings and the hw fence signalling condition.
>
> The muxer saves the locations of the indirect buffer frames from the
> software ring together with the fence sequence number in its fifo
> queue, and pops out those records when the fences are signalled. The
> locations are used to resubmit packages in preemption scenarios by copying the 
> chunks from the software ring.

Maybe change the subject a bit. The MCBP is not really triggered by the core 
Linux kernel.

Maybe write instead "MCBP based on DRM scheduler".

>
> v2: Update comment style.
> v3: Fix conflict caused by previous modifications.
> v4: Remove unnecessary prints.
> v5: Fix corner cases for resubmission cases.
>
> Cc: Christian Koenig 
> Cc: Luben Tuikov 
> Cc: Andrey Grodzovsky 
> Acked-by: Luben Tuikov 
> Signed-off-by: Jiadong.Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile  |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c |  91 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h |  29 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 186 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  24 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  27 +++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   |   2 +
>   10 files changed, 372 insertions(+), 6 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 85224bc81ce5..24c5aa19bbf2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>   amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
> - amdgpu_sw_ring.o amdgpu_ring_mux.o
> + amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o

This functionality is spread over too many files. Probably better to move this 
into amdgpu_ring_mux.c as well.

>
>   amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   }
>
> + amdgpu_ring_ib_begin(ring);
>   if (job && ring->funcs->init_cond_exec)
>   patch_offset = amdgpu_ring_init_cond_exec(ring);
>
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
>   ring->funcs->emit_wave_limit(ring, false);
>
> + amdgpu_ring_ib_end(ring);
>   amdgpu_ring_commit(ring);
>   return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
> new file mode 100644
> index ..121b1a4e0f04
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
> @@ -0,0 +1,91 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person
> +obtaining a
> + * copy of this software and associated documentation files (the
> +"Software"),
> + * to deal in the Software without restriction, including without
> +limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> +sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom
> +the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> +included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITH

RE: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v4)

2022-09-14 Thread Zhu, Jiadong
[AMD Official Use Only - General]

>So, between the function names of "amdgpu_ring_get_rptr_from_mux()" and 
>"amdgpu_ring_get_wptr_from_mux()",
>they are 96.6(6)% _the_same_ name to a human. How about 
>"amdgpu_ring_get_readp_from_mux()" and "amdgpu_ring_get_writep_from_mux()"?

I agree about the similarity between rptr and wptr. But the name 
amdgpu_ring_get_rptr_from_mux is aligned with amdgpu_ring_get_rptr, keeping the 
same style, which is easier to understand.
I would rather keep it for now, until we rename rptr/wptr to readp/writep in 
all the other places, shall we?

Thanks,
Jiadong

-Original Message-
From: Tuikov, Luben 
Sent: Thursday, September 15, 2022 12:49 AM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Grodzovsky, Andrey 
; Koenig, Christian 
Subject: Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v4)

It's customary to add a Cc: tag in the commit message on subsequent revisions 
of a patch, once a person has reviewed and commented on it--Christian, Andrey, 
me, so that subsequent patches' emails go out to those people automatically via 
a CC.

Inlined:

On 2022-09-13 05:05, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> The software ring is created to support priority context while there
> is only one hardware queue for gfx.
>
> Every software rings has its fence driver and could

"ring", singular.

> be used as an ordinary ring for the gpu_scheduler.

"GPU scheduler".

> Multiple software rings are binded to a real ring

The past tense of "bind" is "bound", not "binded".

> with the ring muxer. The packages committed on the software ring are
> copied to the real ring.

Use 79 column wrap setting in your editor, not 50 or 60.
Wrap at 79.

>
> v2: use array to store software ring entry.
> v3: remove unnecessary prints.
> v4: remove amdgpu_ring_sw_init/fini functions, using gtt for sw ring
> buffer for later dma copy optimization.
>
> Signed-off-by: Jiadong.Zhu 

Add Cc: tags before the Signed-off-by line.

> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182
> +++  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |
> 67 +++  drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  60 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  43 +
>  7 files changed, 360 insertions(+), 1 deletion(-)  create mode 100644
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 3e0e2eb7e235..85224bc81ce5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
>   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>   amdgpu_fw_attestation.o amdgpu_securedisplay.o \
> - amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
> + amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
> + amdgpu_sw_ring.o amdgpu_ring_mux.o
>
>  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 53526ffb2ce1..0de8e3cd0f1c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -33,6 +33,7 @@
>  #include "amdgpu_imu.h"
>  #include "soc15.h"
>  #include "amdgpu_ras.h"
> +#include "amdgpu_ring_mux.h"
>
>  /* GFX current status */
>  #define AMDGPU_GFX_NORMAL_MODE   0xL
> @@ -346,6 +347,8 @@ struct amdgpu_gfx {
>   struct amdgpu_gfx_ras   *ras;
>
>   boolis_poweron;
> +
> + struct amdgpu_ring_mux  muxer;

It doesn't align, possibly because TAB chars were used between "bool" and 
"is_poweron", and because TAB chars were used between "struct amdgpu_ring_mux" 
and "muxer".

>  };
>
>  #define amdgpu_gfx_get_gpu_clock_counter(adev)
> (adev)->gfx.funcs->get_gpu_clock_counter((adev))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 7d89a52091c0..fe33a683bfba 100644
> --- 

RE: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-13 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Thanks, Luben, for the review. I replied inline and will update the patch.

Thanks,
Jiadong

-Original Message-
From: Tuikov, Luben 
Sent: Tuesday, September 13, 2022 11:12 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray 
Subject: Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

Inlined:

On 2022-09-08 21:50, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> The software ring is created to support priority context while there
> is only one hardware queue for gfx.
>
> Every software rings has its fence driver and could be used as an
> ordinary ring for the gpu_scheduler.
> Multiple software rings are binded to a real ring with the ring muxer.
> The packages committed on the software ring are copied to the real
> ring.
>
> v2: use array to store software ring entry.
> v3: remove unnecessary prints.
>
> Signed-off-by: Jiadong.Zhu 
> ---
>  drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
>  7 files changed, 509 insertions(+), 1 deletion(-)  create mode 100644
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
>  create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 3e0e2eb7e235..85224bc81ce5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o amdgpu_nbio.o \
>   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>   amdgpu_fw_attestation.o amdgpu_securedisplay.o \
> - amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
> + amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
> + amdgpu_sw_ring.o amdgpu_ring_mux.o
>
>  amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index 53526ffb2ce1..0de8e3cd0f1c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -33,6 +33,7 @@
>  #include "amdgpu_imu.h"
>  #include "soc15.h"
>  #include "amdgpu_ras.h"
> +#include "amdgpu_ring_mux.h"
>
>  /* GFX current status */
>  #define AMDGPU_GFX_NORMAL_MODE   0xL
> @@ -346,6 +347,8 @@ struct amdgpu_gfx {
>   struct amdgpu_gfx_ras   *ras;
>
>   boolis_poweron;
> +
> + struct amdgpu_ring_mux  muxer;
>  };
>
>  #define amdgpu_gfx_get_gpu_clock_counter(adev)
> (adev)->gfx.funcs->get_gpu_clock_counter((adev))
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 7d89a52091c0..fe33a683bfba 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -278,6 +278,9 @@ struct amdgpu_ring {
>   boolis_mes_queue;
>   uint32_thw_queue_id;
>   struct amdgpu_mes_ctx_data *mes_ctx;
> +
> + boolis_sw_ring;
> +
>  };
>
>  #define amdgpu_ring_parse_cs(r, p, job, ib)
> ((r)->funcs->parse_cs((p), (job), (ib))) diff --git
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
> new file mode 100644
> index ..ea4a3c66119a
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
> @@ -0,0 +1,182 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person
> +obtaining a
> + * copy of this software and associated documentation files (the
> +"Software"),
> + * to deal in the Software without restriction, including without
> +limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> +sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom
> +the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permi

RE: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)

2022-09-12 Thread Zhu, Jiadong
[AMD Official Use Only - General]

-Original Message-
From: Grodzovsky, Andrey 
Sent: Tuesday, September 13, 2022 12:45 AM
To: Christian König ; Zhu, Jiadong 
; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray 
Subject: Re: [PATCH 1/4] drm/amdgpu: Introduce gfx software ring(v3)


On 2022-09-12 12:22, Christian König wrote:
> Am 12.09.22 um 17:34 schrieb Andrey Grodzovsky:
>> On 2022-09-12 09:27, Christian König wrote:
>>
>>> Am 12.09.22 um 15:22 schrieb Andrey Grodzovsky:
>>>>
>>>> On 2022-09-12 06:20, Christian König wrote:
>>>>> Am 09.09.22 um 18:45 schrieb Andrey Grodzovsky:
>>>>>>
>>>>>> On 2022-09-08 21:50, jiadong@amd.com wrote:
>>>>>>> From: "Jiadong.Zhu" 
>>>>>>>
>>>>>>> The software ring is created to support priority context while
>>>>>>> there is only one hardware queue for gfx.
>>>>>>>
>>>>>>> Every software rings has its fence driver and could be used as
>>>>>>> an ordinary ring for the gpu_scheduler.
>>>>>>> Multiple software rings are binded to a real ring with the ring
>>>>>>> muxer. The packages committed on the software ring are copied to
>>>>>>> the real ring.
>>>>>>>
>>>>>>> v2: use array to store software ring entry.
>>>>>>> v3: remove unnecessary prints.
>>>>>>>
>>>>>>> Signed-off-by: Jiadong.Zhu 
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/amd/amdgpu/Makefile  |   3 +-
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h  |   3 +
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 182
>>>>>>> +
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  67 ++
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  | 204
>>>>>>> +++
>>>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h  |  48 +
>>>>>>>   7 files changed, 509 insertions(+), 1 deletion(-)
>>>>>>>   create mode 100644
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c
>>>>>>>   create mode 100644
>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h
>>>>>>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c
>>>>>>>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.h
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>> index 3e0e2eb7e235..85224bc81ce5 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
>>>>>>> @@ -58,7 +58,8 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>>>>>>>   amdgpu_vm_sdma.o amdgpu_discovery.o amdgpu_ras_eeprom.o
>>>>>>> amdgpu_nbio.o \
>>>>>>>   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o
>>>>>>> amdgpu_rap.o \
>>>>>>>   amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>>>>>>> -amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
>>>>>>> +amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o
>>>>>>> +\
>>>>>>> +amdgpu_sw_ring.o amdgpu_ring_mux.o
>>>>>>> amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>>>>>>>   diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>>>>>> index 53526ffb2ce1..0de8e3cd0f1c 100644
>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>>>>>> @@ -33,6 +33,7 @@
>>>>>>>   #include "amdgpu_imu.h"
>>>>>>>   #include "soc15.h"
>>>>>>>   #include "amdgpu_ras.h"
>>>>>>> +#include "amdgpu_ring_mux.h"
>>>>>>> /* GFX current status */
>>>>>>>   #define AMDGPU_GFX_NORMAL_MODE 0x00000000L
>>>>>>> @@ -346,6 +347,8 @@ struct amdgpu_gfx {
>>>>

RE: [PATCH 4/4] drm/amdgpu: Implement OS triggered MCBP(v2)

2022-09-12 Thread Zhu, Jiadong
[AMD Official Use Only - General]

-Original Message-
From: Grodzovsky, Andrey 
Sent: Saturday, September 10, 2022 1:02 AM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray 
Subject: Re: [PATCH 4/4] drm/amdgpu: Implement OS triggered MCBP(v2)


On 2022-09-08 21:50, jiadong@amd.com wrote:
> From: "Jiadong.Zhu" 
>
> Trigger MCBP according to the priority of the software rings and the hw
> fence signaling condition.
>
> The muxer records the latest locations from the software ring, which
> are used to resubmit packets in preemption scenarios.
>
> v2: update comment style
>
> Signed-off-by: Jiadong.Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/Makefile  |   2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c   |   2 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c | 101 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h |  29 
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c |  12 ++
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   3 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.c | 163 ++-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring_mux.h |  16 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_sw_ring.c  |  26 +++
>   9 files changed, 351 insertions(+), 3 deletions(-)
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
>   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.h
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> b/drivers/gpu/drm/amd/amdgpu/Makefile
> index 85224bc81ce5..24c5aa19bbf2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> @@ -59,7 +59,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
>   amdgpu_umc.o smu_v11_0_i2c.o amdgpu_fru_eeprom.o amdgpu_rap.o \
>   amdgpu_fw_attestation.o amdgpu_securedisplay.o \
>   amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
> - amdgpu_sw_ring.o amdgpu_ring_mux.o
> + amdgpu_sw_ring.o amdgpu_ring_mux.o amdgpu_mcbp.o
>
>   amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> index 258cffe3c06a..af86d87e2f3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
> @@ -211,6 +211,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   }
>   }
>
> + amdgpu_ring_ib_begin(ring);
>   if (job && ring->funcs->init_cond_exec)
>   patch_offset = amdgpu_ring_init_cond_exec(ring);
>
> @@ -285,6 +286,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
> num_ibs,
>   ring->hw_prio == AMDGPU_GFX_PIPE_PRIO_HIGH)
>   ring->funcs->emit_wave_limit(ring, false);
>
> + amdgpu_ring_ib_end(ring);
>   amdgpu_ring_commit(ring);
>   return 0;
>   }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
> new file mode 100644
> index ..2a12101a7699
> --- /dev/null
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mcbp.c
> @@ -0,0 +1,101 @@
> +/*
> + * Copyright 2022 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person
> +obtaining a
> + * copy of this software and associated documentation files (the
> +"Software"),
> + * to deal in the Software without restriction, including without
> +limitation
> + * the rights to use, copy, modify, merge, publish, distribute,
> +sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom
> +the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be
> +included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
> +EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> +MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT
> +SHALL
> + * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM,
> +DAMAGES OR
> + * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
> +OTHERWISE,
> + * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
> +OR
> + * OTHER DEALINGS IN THE SOFTWARE.
> + *
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "amdgpu.h"
> +#include "amdgpu_mcbp.h"
> +#include "amdgpu_ring.h"
> +
> +/* trigger mcbp and find if we need resubmit */
> +int amdgpu_mcbp_trigger_preempt(struct

RE: [PATCH] drm/amdgpu: modify mcbp implement for gfx9(v3)

2022-08-12 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Christian,

The details as follows:

> 1. Use unmap_queue package to trigger preemption on gfx9
> Add trailing fence to track the preemption done.

On gfx9 there is no single packet that completes the mcbp request in a single
frame the way gfx10 does.
To send preemption on gfx9, kmd needs to:
1. emit a trailing fence on the gfx ring, without updating the wptr to cp.
2. emit a write_reg that resets mmCP_VMID_PREEMPT after the trailing fence.
3. send unmap_queue to the kiq ring with the field rb_wptr set to the offset
of the trailing fence on the gfx ring.

When cp fw receives the unmap_queue in mec, it will:
1. store mmCP_RB0_WPTR from rb_wptr to kick the GFX RB off.
2. write mmCP_VMID_PREEMPT to request preemption on all vmids, then wait
for mmCP_VMID_PREEMPT to become 0x0, indicating the preemption is complete.
3. the rest of the pipeline performs the preemption according to
mmCP_VMID_PREEMPT until it hits the trailing fence.
4. after the trailing fence is signaled, the write_reg that resets
mmCP_VMID_PREEMPT unblocks the unmap_queue packet to proceed.
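
As a rough sketch in C (illustrative only; emit_trailing_fence(),
emit_wreg() and kiq_unmap_queues() are hypothetical stand-ins for the
real helpers in the patches, i.e. the EXEC-flagged fence emission, the
ring's emit_wreg callback and gfx_v9_0_kiq_unmap_queues()):

static void gfx_v9_request_preemption_sketch(struct amdgpu_ring *gfx_ring,
                                             struct amdgpu_ring *kiq_ring)
{
        u64 rb_wptr;

        /* 1. trailing fence on the gfx ring; the new wptr is not yet
         * committed to cp */
        rb_wptr = emit_trailing_fence(gfx_ring);        /* hypothetical */

        /* 2. write_reg placed after the trailing fence; it executes
         * only once the fence signals, releasing the unmap_queue below */
        emit_wreg(gfx_ring, mmCP_VMID_PREEMPT, 0);      /* hypothetical */

        /* 3. unmap_queue on the kiq ring, carrying the offset of the
         * trailing fence as rb_wptr */
        kiq_unmap_queues(kiq_ring, gfx_ring,
                         PREEMPT_QUEUES_NO_UNMAP, rb_wptr); /* hypothetical */
}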

The use of rb_wptr in unmap_queue on gfx9 is taken from the cp_packages_rn doc:
UNMAP_QUEUES
DW | Bits | Field   | Description
4b | 19:0 | rb_wptr | If ((engine_sel = 4) and (action = 3)) then the
                      preempted GFX queue's new RB pointer.
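
In code, that DW4 encoding is what the gfx_v9_0_kiq_unmap_queues() hunks
quoted below change; a minimal sketch of the relevant write:

        /* DW4 of UNMAP_QUEUES for PREEMPT_QUEUES_NO_UNMAP: only bits
         * 19:0 carry the new RB pointer, hence masking the wptr with
         * the ring's buf_mask */
        amdgpu_ring_write(kiq_ring,
                          lower_32_bits(ring->wptr & ring->buf_mask));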

> 2. Modify emit_ce_meta emit_de_meta functions
> for the resumed ibs.
For preemption-enabled ibs, kmd adds a preamble ib (ce/de meta) to initialize
csa data before sending the main ib. The csa is used to save/restore ib
execution info when preemption/resubmission happens.
KMD is responsible for extracting the content from CSA during re-submission
of a previously preempted DMA frame.
The patch writes the csa data of resubmitted ibs using the csa of the
previously preempted ib.
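
In terms of the gfx_v9_0_ring_emit_ib_gfx() hunk quoted further below, the
resume decision reduces to roughly this (a sketch, not the verbatim driver
code):

        /* emit de meta in resume mode only for a resubmitted,
         * previously preempted ib on bare metal */
        bool resume = !amdgpu_sriov_vf(ring->adev) &&
                      (flags & AMDGPU_IB_PREEMPTED);

        gfx_v9_0_ring_emit_de_meta(ring, resume);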

Thanks,
Jiadong


-Original Message-
From: Christian König 
Sent: Friday, August 12, 2022 7:39 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH] drm/amdgpu: modify mcbp implement for gfx9(v3)

[CAUTION: External Email]

Am 11.08.22 um 05:19 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> 1. Use unmap_queue package to trigger preemption on gfx9
> Add trailing fence to track the preemption done.
> 2. Modify emit_ce_meta emit_de_meta functions
> for the resumed ibs.
>
> Signed-off-by: Jiadong.Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 161 ---
>   drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
>   3 files changed, 143 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 82c178a9033a..ca626f0ad7b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
>   #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
>   #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
>   #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)

> +#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)

Ok, that here needs much more explanation why you need it and how all this is 
supposed to work?

Regards,
Christian.

>
>   #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring,
> sched)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 5332899642dc..887021fd56aa 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
> *adev);
>   static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
>   struct amdgpu_cu_info *cu_info);
>   static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device
> *adev);
> -static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
> +static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> +resume);
>   static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
>   static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
> void *ras_error_status);
> @@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring
> *kiq_ring,
>
> PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>
>   if (action == PREEMPT_QUEUES_NO_UNMAP) {
> - amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
> - amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
> - amdgpu_ring_write(kiq_ring, seq);
> + amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
> ring->buf_mask));
> + amdgpu_ring_write(kiq_ring, 0);
> +  

RE: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

2022-08-10 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Christian,

Thank you for the reply, I will update the patch to fix the style issues.

The patch has several changes:
1. correct the unmap packet for mcbp, which is wrong in
gfx_v9_0_kiq_unmap_queues.
2. change the emitted ce/de meta data used for preempted ibs.
3. add the function gfx_v9_0_ring_preempt_ib used for the debugfs case.

Though part 3 may be removed in the future, the functions from parts 1 and 2
could still be used by some projects such as virtualization.

Thanks,
Jiadong


-Original Message-
From: Christian König 
Sent: Thursday, August 11, 2022 12:06 AM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

[CAUTION: External Email]

Hi, Jiadong,

first of all your patches have major style issues. Please use the checkpatch.pl 
script before sending those out.

Apart from that as discussed on our call on Monday MCBP is not something we 
will implement on Linux. So we will probably remove the existing debugfs test 
sooner or later.

Regards,
Christian.

Am 09.08.22 um 11:21 schrieb Zhu, Jiadong:
> [AMD Official Use Only - General]
>
> Hi,
>
> This patch corrects the mcbp packet for gfx9, which is the basic
> function used for debugfs.
> There is no logic about when to trigger mcbp.
> Shall we get this reviewed?
>
> Thanks,
> Jiadong
>
> -Original Message-
> From: Zhu, Jiadong 
> Sent: Tuesday, August 9, 2022 5:15 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Aaron ; Huang, Ray ;
> Zhu, Jiadong 
> Subject: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)
>
> From: "Jiadong.Zhu" 
>
> 1. Use unmap_queue package to trigger preemption on gfx9
> Add trailing fence to track the preemption done.
> 2. Modify emit_ce_meta emit_de_meta functions
> for the resumed ibs.
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 159 ---
>   drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
>   3 files changed, 141 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 82c178a9033a..ca626f0ad7b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
>   #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
>   #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
>   #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)
> +#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)
>
>   #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring,
> sched)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 5332899642dc..0b7cb4cf13c4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
> *adev);
>   static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
>  struct amdgpu_cu_info *cu_info);
>   static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device
> *adev);
> -static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
> +static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> +resume);
>   static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);  
> static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
> void *ras_error_status);
> @@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring
> *kiq_ring,
>
> PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>
>  if (action == PREEMPT_QUEUES_NO_UNMAP) {
> -   amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
> -   amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
> -   amdgpu_ring_write(kiq_ring, seq);
> +   amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
> ring->buf_mask));
> +   amdgpu_ring_write(kiq_ring, 0);
> +   amdgpu_ring_write(kiq_ring, 0);
> +
>  } else {
>  amdgpu_ring_write(kiq_ring, 0);
>  amdgpu_ring_write(kiq_ring, 0);
> @@ -5446,11 +5447,15 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct amdgpu_ring *ring,
>
>  control |= ib->length_dw | (vmid << 24);
>
> -   if (amdgpu_sriov_vf(ring->adev) && (ib->flags & 
> AMDGPU_IB_FLAG_PREEMPT)) {
> +   if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->

RE: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

2022-08-09 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi,

This patch corrects the mcbp packet for gfx9, which is the basic function
used for debugfs.
There is no logic about when to trigger mcbp.
Shall we get this reviewed?

Thanks,
Jiadong

-Original Message-
From: Zhu, Jiadong 
Sent: Tuesday, August 9, 2022 5:15 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Aaron ; Huang, Ray ; Zhu, 
Jiadong 
Subject: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9(v2)

From: "Jiadong.Zhu" 

1. Use unmap_queue package to trigger preemption on gfx9
   Add trailing fence to track the preemption done.
2. Modify emit_ce_meta emit_de_meta functions
   for the resumed ibs.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 159 ---
 drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
 3 files changed, 141 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 82c178a9033a..ca626f0ad7b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
 #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
 #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
 #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)
+#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)

 #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring, sched)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 5332899642dc..0b7cb4cf13c4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
*adev);
static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
struct amdgpu_cu_info *cu_info);
 static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device *adev); 
-static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
+static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
+resume);
static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
  void *ras_error_status);
@@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring 
*kiq_ring,

PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));

if (action == PREEMPT_QUEUES_NO_UNMAP) {
-   amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
-   amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
-   amdgpu_ring_write(kiq_ring, seq);
+   amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
ring->buf_mask));
+   amdgpu_ring_write(kiq_ring, 0);
+   amdgpu_ring_write(kiq_ring, 0);
+
} else {
amdgpu_ring_write(kiq_ring, 0);
amdgpu_ring_write(kiq_ring, 0);
@@ -5446,11 +5447,15 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct 
amdgpu_ring *ring,

control |= ib->length_dw | (vmid << 24);

-   if (amdgpu_sriov_vf(ring->adev) && (ib->flags & 
AMDGPU_IB_FLAG_PREEMPT)) {
+   if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags &
+AMDGPU_IB_FLAG_PREEMPT)) {
control |= INDIRECT_BUFFER_PRE_ENB(1);

+   if (flags & AMDGPU_IB_PREEMPTED)
+   control |= INDIRECT_BUFFER_PRE_RESUME(1);
+
if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
-   gfx_v9_0_ring_emit_de_meta(ring);
+   gfx_v9_0_ring_emit_de_meta(ring,
+(!amdgpu_sriov_vf(ring->adev) && flags & 
AMDGPU_IB_PREEMPTED) ?
+true : false);
}

amdgpu_ring_write(ring, header);
@@ -5505,6 +5510,7 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring 
*ring, u64 addr,
bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
bool writeback = flags & AMDGPU_FENCE_FLAG_TC_WB_ONLY;
+   bool exec = flags & AMDGPU_FENCE_FLAG_EXEC;

/* RELEASE_MEM - flush caches, send int */
amdgpu_ring_write(ring, PACKET3(PACKET3_RELEASE_MEM, 6));
@@ -5515,6 +5521,7 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring *ring, u64
addr,
   EOP_TC_WB_ACTION_EN |
   EOP_TC_MD_ACTION_EN)) |
 EVENT_TYPE(CACHE_FLUSH_AND_INV_TS_EVENT) |
+(exec ? EOP_EXEC : 0x0) |
 EVENT_INDEX(5)));
amdgpu_ring_write(ring, DATA_SEL(write64bit ? 2 : 1) | INT_SEL(int_sel 
? 2 : 

RE: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9

2022-07-19 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Christian,

There is an embedded project based on Xen. One of the guest vms, which runs
high-priority jobs, needs to preempt the other vm.
There is also work in other components, including umd and qemu. For kmd,
we just modify the mcbp-related functions to pass the unit test.

Thanks,
Jiadong

-Original Message-
From: Koenig, Christian 
Sent: Tuesday, July 19, 2022 9:59 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 1/2] drm/amdgpu: modify mcbp implement for gfx9

Well what's the background for this?

So far MCBP isn't a validated feature, we just added some debugfs interface for 
testing it.

Regards,
Christian.

Am 19.07.22 um 04:09 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> 1. Use unmap_queue package to trigger preemption on gfx9
> Add trailing fence to track the preemption done.
> 2. Modify emit_ce_meta emit_de_meta functions
> for the resumed ibs.
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |   1 +
>   drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c| 156 ---
>   drivers/gpu/drm/amd/amdgpu/soc15d.h  |   2 +
>   3 files changed, 138 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> index 82c178a9033a..ca626f0ad7b1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
> @@ -59,6 +59,7 @@ enum amdgpu_ring_priority_level {
>   #define AMDGPU_FENCE_FLAG_64BIT (1 << 0)
>   #define AMDGPU_FENCE_FLAG_INT   (1 << 1)
>   #define AMDGPU_FENCE_FLAG_TC_WB_ONLY(1 << 2)
> +#define AMDGPU_FENCE_FLAG_EXEC  (1 << 3)
>
>   #define to_amdgpu_ring(s) container_of((s), struct amdgpu_ring,
> sched)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 5332899642dc..e2c614441691 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -751,7 +751,7 @@ static void gfx_v9_0_set_rlc_funcs(struct amdgpu_device 
> *adev);
>   static int gfx_v9_0_get_cu_info(struct amdgpu_device *adev,
>   struct amdgpu_cu_info *cu_info);
>   static uint64_t gfx_v9_0_get_gpu_clock_counter(struct amdgpu_device
> *adev);
> -static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring);
> +static void gfx_v9_0_ring_emit_de_meta(struct amdgpu_ring *ring, bool
> +resume);
>   static u64 gfx_v9_0_ring_get_rptr_compute(struct amdgpu_ring *ring);
>   static void gfx_v9_0_query_ras_error_count(struct amdgpu_device *adev,
> void *ras_error_status);
> @@ -824,9 +824,10 @@ static void gfx_v9_0_kiq_unmap_queues(struct amdgpu_ring 
> *kiq_ring,
>   
> PACKET3_UNMAP_QUEUES_DOORBELL_OFFSET0(ring->doorbell_index));
>
>   if (action == PREEMPT_QUEUES_NO_UNMAP) {
> - amdgpu_ring_write(kiq_ring, lower_32_bits(gpu_addr));
> - amdgpu_ring_write(kiq_ring, upper_32_bits(gpu_addr));
> - amdgpu_ring_write(kiq_ring, seq);
> + amdgpu_ring_write(kiq_ring, lower_32_bits(ring->wptr & 
> ring->buf_mask));
> + amdgpu_ring_write(kiq_ring, 0);
> + amdgpu_ring_write(kiq_ring, 0);
> +
>   } else {
>   amdgpu_ring_write(kiq_ring, 0);
>   amdgpu_ring_write(kiq_ring, 0);
> @@ -5446,11 +5447,15 @@ static void gfx_v9_0_ring_emit_ib_gfx(struct
> amdgpu_ring *ring,
>
>   control |= ib->length_dw | (vmid << 24);
>
> - if (amdgpu_sriov_vf(ring->adev) && (ib->flags & 
> AMDGPU_IB_FLAG_PREEMPT)) {
> + if ((amdgpu_sriov_vf(ring->adev) || amdgpu_mcbp) && (ib->flags &
> +AMDGPU_IB_FLAG_PREEMPT)) {
>   control |= INDIRECT_BUFFER_PRE_ENB(1);
>
> + if (flags & AMDGPU_IB_PREEMPTED)
> + control |= INDIRECT_BUFFER_PRE_RESUME(1);
> +
>   if (!(ib->flags & AMDGPU_IB_FLAG_CE) && vmid)
> - gfx_v9_0_ring_emit_de_meta(ring);
> + gfx_v9_0_ring_emit_de_meta(ring,
> +  (!amdgpu_sriov_vf(ring->adev) && flags & 
> AMDGPU_IB_PREEMPTED) ?
> +true : false);
>   }
>
>   amdgpu_ring_write(ring, header);
> @@ -5505,6 +5510,7 @@ static void gfx_v9_0_ring_emit_fence(struct amdgpu_ring 
> *ring, u64 addr,
>   bool write64bit = flags & AMDGPU_FENCE_FLAG_64BIT;
>   bool int_sel = flags & AMDGPU_FENCE_FLAG_INT;
>   bool writeback = flags & AMDGPU_FENCE_FLAG_TC_WB_ONLY;
>

RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

2022-07-17 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Andrey,

Thank you for the clarification.
The refcount modified by amdgpu_fence_emit on my side was different.
After updating my code to pick up the patch 'drm/amdgpu: Follow up change to
previous drm scheduler change.', the "underflow" disappears.

My patch is not needed.


Thanks,
Jiadong


-Original Message-
From: Grodzovsky, Andrey 
Sent: Friday, July 15, 2022 11:43 PM
To: Zhu, Jiadong ; Christian König 
; amd-gfx@lists.freedesktop.org
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails


On 2022-07-15 05:28, Zhu, Jiadong wrote:
> [AMD Official Use Only - General]
>
> Updated some comments
>
> -Original Message-
> From: Zhu, Jiadong
> Sent: Friday, July 15, 2022 5:13 PM
> To: Christian König ;
> amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey
> 
> Cc: Huang, Ray ; Liu, Aaron 
> Subject: RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails
>
> Hi Christian,
>
> The resubmitted job in function amdgpu_ib_preempt_job_recovery returns the 
> same hw fence because of this commit:
>
> static void amdgpu_ib_preempt_job_recovery(struct drm_gpu_scheduler *sched) {
>  struct drm_sched_job *s_job;
>  struct dma_fence *fence;
>
>  spin_lock(&sched->job_list_lock);
>  list_for_each_entry(s_job, &sched->pending_list, list) {
>  fence = sched->ops->run_job(s_job);   //fence returned
> has the same address with swapped fences
>  dma_fence_put(fence);
>  }
>  spin_unlock(&sched->job_list_lock);
> }
>
>
>
> commit c530b02f39850a639b72d01ebbf7e5d745c60831
> Author: Jack Zhang 
> Date:   Wed May 12 15:06:35 2021 +0800
>
>  drm/amd/amdgpu embed hw_fence into amdgpu_job
>
>  Why: Previously hw fence is alloced separately with job.
>  It caused historical lifetime issues and corner cases.
>  The ideal situation is to take fence to manage both job
>  and fence's lifetime, and simplify the design of gpu-scheduler.
>
>  How:
>  We propose to embed hw_fence into amdgpu_job.
>  1. We cover the normal job submission by this method.
>  2. For ib_test, and submit without a parent job keep the
>  legacy way to create a hw fence separately.
>  v2:
>  use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
>  embedded in a job.
>  v3:
>  remove redundant variable ring in amdgpu_job
>  v4:
>  add tdr sequence support for this feature. Add a job_run_counter to
>  indicate whether this job is a resubmit job.
>  v5
>  add missing handling in amdgpu_fence_enable_signaling
>
>  Signed-off-by: Jingwen Chen 
>  Signed-off-by: Jack Zhang 
>  Reviewed-by: Andrey Grodzovsky 
>  Reviewed by: Monk Liu 
>  Signed-off-by: Alex Deucher 
>
>
> Thus the fence we swapped out is signaled and put twice in the following 2
> functions, and we get "refcount_t: underflow; use-after-free." errors later.
>
>  /* wait for jobs finished */
>  amdgpu_fence_wait_empty(ring); //wait on the resubmitted 
> fence which is signaled and put somewhere else. The refcount decreased by 1 
> after amdgpu_fence_wait_empty.
>
>  /* signal the old fences */
>  amdgpu_ib_preempt_signal_fences(fences, length);   //signal 
> and put the previous swapped fence, signal would return -22.
>
> Thanks,
> Jiadong


Did you have 'drm/amdgpu: Follow up change to previous drm scheduler change.' 
this commit in your branch while you encountered this problem ?
I don't see an underflow issue
for the preempted job when inspecting the code with this commit in mind -

amdgpu_fence_emit
 dma_fence_init 1
 dma_fence_get(fence) 2
 rcu_assign_pointer(*ptr, dma_fence_get(fence)) 3

drm_sched_main
 s_fence->parent = dma_fence_get(fence); 4
 dma_fence_put(fence); 3

amdgpu_ib_preempt_job_recovery
 amdgpu_fence_emit
 if (job && job->job_run_counter) -> dma_fence_get(fence); 4
 rcu_assign_pointer(*ptr, dma_fence_get(fence)); 5

 dma_fence_put(fence); 4

amdgpu_fence_wait_empty
 dma_fence_get_rcu(fence) 5
 dma_fence_put(fence) 4

amdgpu_fence_process (EOP interrupt for re-submission of preempted job)
 dma_fence_put 3

amdgpu_ib_preempt_signal_fences
 dma_fence_put 2

amdgpu_job_free_cb
 dma_fence_put(&job->hw_fence) 1

drm_sched_fence_release_scheduled
 dma_fence_put(fence->parent); 0

Also take a look here for reference -
https://drive.google.com/file/d/1yEoeW6OQC9WnwmzFW6NBLhFP_jD0xcHm/view

Andrey





Andrey


>
>
> -Original Message-
> From: Christian König 
> 

RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

2022-07-15 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Updated some comments

-Original Message-
From: Zhu, Jiadong
Sent: Friday, July 15, 2022 5:13 PM
To: Christian König ; 
amd-gfx@lists.freedesktop.org; Grodzovsky, Andrey 
Cc: Huang, Ray ; Liu, Aaron 
Subject: RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

Hi Christian,

The resubmitted job in function amdgpu_ib_preempt_job_recovery returns the same 
hw fence because of this commit:

static void amdgpu_ib_preempt_job_recovery(struct drm_gpu_scheduler *sched) {
struct drm_sched_job *s_job;
struct dma_fence *fence;

spin_lock(&sched->job_list_lock);
list_for_each_entry(s_job, &sched->pending_list, list) {
fence = sched->ops->run_job(s_job);   //fence returned has
the same address with swapped fences
dma_fence_put(fence);
}
spin_unlock(&sched->job_list_lock);
}



commit c530b02f39850a639b72d01ebbf7e5d745c60831
Author: Jack Zhang 
Date:   Wed May 12 15:06:35 2021 +0800

drm/amd/amdgpu embed hw_fence into amdgpu_job

Why: Previously hw fence is alloced separately with job.
It caused historical lifetime issues and corner cases.
The ideal situation is to take fence to manage both job
and fence's lifetime, and simplify the design of gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling

Signed-off-by: Jingwen Chen 
Signed-off-by: Jack Zhang 
Reviewed-by: Andrey Grodzovsky 
Reviewed by: Monk Liu 
Signed-off-by: Alex Deucher 


Thus the fence we swapped out is signaled and put twice in the following 2
functions, and we get "refcount_t: underflow; use-after-free." errors later.

/* wait for jobs finished */
amdgpu_fence_wait_empty(ring); //wait on the resubmitted fence 
which is signaled and put somewhere else. The refcount decreased by 1 after 
amdgpu_fence_wait_empty.

/* signal the old fences */
amdgpu_ib_preempt_signal_fences(fences, length);   //signal and 
put the previous swapped fence, signal would return -22.
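
To make the failure mode concrete, here is a toy illustration (not the
driver code; get_some_fence() is a hypothetical placeholder) of the
dma-fence refcounting rule that gets violated:

        struct dma_fence *f = get_some_fence(); /* hypothetical; refcount 1 */

        dma_fence_get(f);       /* refcount 2: take a reference we own */
        dma_fence_signal(f);    /* signaling does NOT drop a reference */
        dma_fence_put(f);       /* refcount 1: drop our reference */
        dma_fence_put(f);       /* put without a matching get: when the
                                 * real owner puts later, the refcount
                                 * underflows -> "refcount_t: underflow;
                                 * use-after-free" */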

Thanks,
Jiadong


-Original Message-
From: Christian König 
Sent: Friday, July 15, 2022 4:48 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org; 
Grodzovsky, Andrey 
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

[CAUTION: External Email]

Am 15.07.22 um 10:43 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> Dma_fence_signal returning non-zero indicates that the fence is
> signaled and put somewhere else.
> Skip dma_fence_put to make the fence refcount correct.

Well quite a big NAK on this.

Reference counting should be completely independent where a fence signals.

Andrey can you take a look at this as well?

Thanks,
Christian.

>
> Signed-off-by: Jiadong.Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index f4ed0785d523..93c1a5e83835 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1500,8 +1500,8 @@ static void amdgpu_ib_preempt_signal_fences(struct 
> dma_fence **fences,
>   fence = fences[i];
>   if (!fence)
>   continue;
> - dma_fence_signal(fence);
> - dma_fence_put(fence);
> + if (!dma_fence_signal(fence))
> + dma_fence_put(fence);
>   }
>   }
>



RE: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

2022-07-15 Thread Zhu, Jiadong
[AMD Official Use Only - General]

Hi Christian,

The resubmitted job in function amdgpu_ib_preempt_job_recovery returns the same 
hw fence because of this commit:

static void amdgpu_ib_preempt_job_recovery(struct drm_gpu_scheduler *sched)
{
struct drm_sched_job *s_job;
struct dma_fence *fence;

spin_lock(&sched->job_list_lock);
list_for_each_entry(s_job, &sched->pending_list, list) {
fence = sched->ops->run_job(s_job);   //fence returned has
the same address with swapped fences
dma_fence_put(fence);
}
spin_unlock(&sched->job_list_lock);
}



commit c530b02f39850a639b72d01ebbf7e5d745c60831
Author: Jack Zhang 
Date:   Wed May 12 15:06:35 2021 +0800

drm/amd/amdgpu embed hw_fence into amdgpu_job

Why: Previously hw fence is alloced separately with job.
It caused historical lifetime issues and corner cases.
The ideal situation is to take fence to manage both job
and fence's lifetime, and simplify the design of gpu-scheduler.

How:
We propose to embed hw_fence into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
v3:
remove redundant variable ring in amdgpu_job
v4:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling

Signed-off-by: Jingwen Chen 
Signed-off-by: Jack Zhang 
Reviewed-by: Andrey Grodzovsky 
Reviewed by: Monk Liu 
Signed-off-by: Alex Deucher 


Thus the fence we swapped out is signaled and put twice in the following 2
functions, and we get "refcount_t: underflow; use-after-free." errors later.

/* wait for jobs finished */
amdgpu_fence_wait_empty(ring); //signal and put the resubmitted
job's fence.

/* signal the old fences */
amdgpu_ib_preempt_signal_fences(fences, length);   //signal and 
put the previous swapped fence, signal would return -22.

Thanks,
Jiadong


-Original Message-
From: Christian König 
Sent: Friday, July 15, 2022 4:48 PM
To: Zhu, Jiadong ; amd-gfx@lists.freedesktop.org; 
Grodzovsky, Andrey 
Cc: Huang, Ray ; Liu, Aaron 
Subject: Re: [PATCH 3/3] drm/amdgpu: skip put fence if signal fails

[CAUTION: External Email]

Am 15.07.22 um 10:43 schrieb jiadong@amd.com:
> From: "Jiadong.Zhu" 
>
> Dma_fence_signal returning non-zero indicates that the fence is
> signaled and put somewhere else.
> Skip dma_fence_put to make the fence refcount correct.

Well quite a big NAK on this.

Reference counting should be completely independent where a fence signals.

Andrey can you take a look at this as well?

Thanks,
Christian.

>
> Signed-off-by: Jiadong.Zhu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index f4ed0785d523..93c1a5e83835 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -1500,8 +1500,8 @@ static void amdgpu_ib_preempt_signal_fences(struct 
> dma_fence **fences,
>   fence = fences[i];
>   if (!fence)
>   continue;
> - dma_fence_signal(fence);
> - dma_fence_put(fence);
> + if (!dma_fence_signal(fence))
> + dma_fence_put(fence);
>   }
>   }
>