[PATCH] drm/amdgpu: Remove writing GRBM_GFX_CNTL in RLCG interface under SRIOV

2023-01-30 Thread Yifan Zha
[Why]
Accessing GRBM_GFX_CNTL in full access time has risk when VF is doing MMIO 
attacking.
Therefore, VF writing GRBM_GFX_CNTL are blocked by L1 Policy.
For RLCG interface, RLCG use SCRATCH_REG2 which is copied from GRBM_GFX_CNTL.

[How]
Remove writing GRBM_GFX_CNTL in amdgpu_virt_rlcg_reg_rw.

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index f39391e03d46..0e05fa0001f6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -983,7 +983,6 @@ static u32 amdgpu_virt_rlcg_reg_rw(struct amdgpu_device 
*adev, u32 offset, u32 v
if (offset == reg_access_ctrl->grbm_cntl) {
/* if the target reg offset is grbm_cntl, write to scratch_reg2 
*/
writel(v, scratch_reg2);
-   writel(v, ((void __iomem *)adev->rmmio) + (offset * 4));
} else if (offset == reg_access_ctrl->grbm_idx) {
/* if the target reg offset is grbm_idx, write to scratch_reg3 
*/
writel(v, scratch_reg3);
-- 
2.25.1



RE: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Chen, Guchun
Hi Piccoli,

I agree with Alex's point, using ring->sched.name for such check is not a good 
way. BTW, can you please attach a full dmesg long in bad case to help me 
understand more?

Regards,
Guchun

-Original Message-
From: Alex Deucher  
Sent: Tuesday, January 31, 2023 6:30 AM
To: Guilherme G. Piccoli 
Cc: amd-gfx@lists.freedesktop.org; ker...@gpiccoli.net; Chen, Guchun 
; Pan, Xinhui ; 
dri-de...@lists.freedesktop.org; Tuikov, Luben ; 
Limonciello, Mario ; kernel-...@igalia.com; Deucher, 
Alexander ; Koenig, Christian 

Subject: Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched 
init/fini

On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli  
wrote:
>
> + Luben
>
> (sorry, missed that in the first submission).
>
> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> > Currently amdgpu calls drm_sched_fini() from the fence driver sw 
> > fini routine - such function is expected to be called only after the 
> > respective init function - drm_sched_init() - was executed successfully.
> >
> > Happens that we faced a driver probe failure in the Steam Deck 
> > recently, and the function drm_sched_fini() was called even without 
> > its counter-part had been previously called, causing the following oops:
> >
> > amdgpu: probe of :04:00.0 failed with error -110
> > BUG: kernel NULL pointer dereference, address: 0090 PGD 
> > 0 P4D 0
> > Oops: 0002 [#1] PREEMPT SMP NOPTI
> > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli 
> > #338 Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
> > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched] [...] Call Trace:
> >  
> >  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
> >  amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
> >  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
> >  devm_drm_dev_init_release+0x49/0x70
> >  [...]
> >
> > To prevent that, check if the drm_sched was properly initialized for 
> > a given ring before calling its fini counter-part.
> >
> > Notice ideally we'd use sched.ready for that; such field is set as 
> > the latest thing on drm_sched_init(). But amdgpu seems to "override" 
> > the meaning of such field - in the above oops for example, it was a 
> > GFX ring causing the crash, and the sched.ready field was set to 
> > true in the ring init routine, regardless of the state of the DRM 
> > scheduler. Hence, we ended-up using another sched field.
> >> > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence 
> >> > driver fini in s3 test (v2)")
> > Cc: Andrey Grodzovsky 
> > Cc: Guchun Chen 
> > Cc: Mario Limonciello 
> > Signed-off-by: Guilherme G. Piccoli 
> > ---
> >
> >
> > Hi folks, first of all thanks in advance for reviews / comments!
> > Notice that I've used the Fixes tag more in the sense to bring it to 
> > stable, I didn't find a good patch candidate that added the call to 
> > drm_sched_fini(), was reaching way too old commits...so
> > 067f44c8b459 seems a good candidate - or maybe not?
> >
> > Now, with regards sched.ready, spent a bit of time to figure what 
> > was happening...would be feasible maybe to stop using that to mark 
> > some kind ring status? I think it should be possible to add a flag 
> > to the ring structure for that, and free sched.ready from being 
> > manipulate by the amdgpu driver, what's your thoughts on that?

It's been a while, but IIRC, we used to have a ring->ready field in the driver 
which at some point got migrated out of the driver into the GPU scheduler and 
the driver side code never got cleaned up.  I think we should probably just 
drop the driver messing with that field and leave it up to the drm scheduler.

Alex


> >
> > I could try myself, but first of course I'd like to raise the 
> > "temperature" on this topic and check if somebody is already working 
> > on that.
> >
> > Cheers,
> >
> > Guilherme
> >
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 00444203220d..e154eb8241fb 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device 
> > *adev)
> >   if (!ring || !ring->fence_drv.initialized)
> >   continue;
> >
> > - if (!ring->no_scheduler)
> > + /*
> > +  * Notice we check for sched.name since there's some
> > +  * override on the meaning of sched.ready by amdgpu.
> > +  * The natural check would be sched.ready, which is
> > +  * set as drm_sched_init() finishes...
> > +  */
> > + if (!ring->no_scheduler && ring->sched.name)
> >   drm_sched_fini(>sched);
> >
> >   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)


Re: [PATCH v2 3/3] drm: Call vga_switcheroo_process_delayed_switch() in drm_lastclose

2023-01-30 Thread Lyude Paul
Acked-by: Lyude Paul 

On Thu, 2023-01-12 at 21:11 +0100, Thomas Zimmermann wrote:
> Several lastclose helpers call vga_switcheroo_process_delayed_switch().
> It's better to call the helper from drm_lastclose() after the kernel
> client's screen has been restored. This way, all drivers can benefit
> without having to implement their own lastclose helper. For drivers
> without vga-switcheroo, vga_switcheroo_process_delayed_switch() does
> nothing.
> 
> There was an earlier patchset to do something similar. [1]
> 
> v2:
>   * handle vga_switcheroo_client_fb_set() in a separate patch
>   * also update i915, nouveau and radeon
>   * remove unnecessary include statements
>   * update vga-switcheroo docs
> 
> Suggested-by: Alexander Deucher 
> Signed-off-by: Thomas Zimmermann 
> Reviewed-by: Daniel Vetter 
> Reviewed-by: Alex Deucher 
> Link: 
> https://lore.kernel.org/amd-gfx/20221020143603.563929-1-alexander.deuc...@amd.com/
>  # 1
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h |  1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  2 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 13 -
>  drivers/gpu/drm/drm_file.c  |  3 +++
>  drivers/gpu/drm/i915/i915_driver.c  | 25 ++---
>  drivers/gpu/drm/nouveau/nouveau_drm.c   |  1 -
>  drivers/gpu/drm/nouveau/nouveau_vga.c   |  7 ---
>  drivers/gpu/drm/nouveau/nouveau_vga.h   |  1 -
>  drivers/gpu/drm/radeon/radeon_drv.c |  2 +-
>  drivers/gpu/drm/radeon/radeon_drv.h |  1 -
>  drivers/gpu/drm/radeon/radeon_kms.c | 18 --
>  drivers/gpu/vga/vga_switcheroo.c|  4 ++--
>  12 files changed, 8 insertions(+), 70 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 63c921c55fb9..7120b9b6e580 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1330,7 +1330,6 @@ extern const int amdgpu_max_kms_ioctl;
>  
>  int amdgpu_driver_load_kms(struct amdgpu_device *adev, unsigned long flags);
>  void amdgpu_driver_unload_kms(struct drm_device *dev);
> -void amdgpu_driver_lastclose_kms(struct drm_device *dev);
>  int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file 
> *file_priv);
>  void amdgpu_driver_postclose_kms(struct drm_device *dev,
>struct drm_file *file_priv);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 1353ffd08988..783c1e284a22 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -34,7 +34,6 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -2785,7 +2784,6 @@ static const struct drm_driver amdgpu_kms_driver = {
>   DRIVER_SYNCOBJ_TIMELINE,
>   .open = amdgpu_driver_open_kms,
>   .postclose = amdgpu_driver_postclose_kms,
> - .lastclose = amdgpu_driver_lastclose_kms,
>   .ioctls = amdgpu_ioctls_kms,
>   .num_ioctls = ARRAY_SIZE(amdgpu_ioctls_kms),
>   .dumb_create = amdgpu_mode_dumb_create,
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> index 7aa7e52ca784..a37be02fb2fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> @@ -34,7 +34,6 @@
>  #include "amdgpu_vce.h"
>  #include "atom.h"
>  
> -#include 
>  #include 
>  #include 
>  #include 
> @@ -1104,18 +1103,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
> *data, struct drm_file *filp)
>  /*
>   * Outdated mess for old drm with Xorg being in charge (void function now).
>   */
> -/**
> - * amdgpu_driver_lastclose_kms - drm callback for last close
> - *
> - * @dev: drm dev pointer
> - *
> - * Switch vga_switcheroo state after last close (all asics).
> - */
> -void amdgpu_driver_lastclose_kms(struct drm_device *dev)
> -{
> - drm_fb_helper_lastclose(dev);
> - vga_switcheroo_process_delayed_switch();
> -}
>  
>  /**
>   * amdgpu_driver_open_kms - drm callback for open
> diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
> index a51ff8cee049..314c309db9a3 100644
> --- a/drivers/gpu/drm/drm_file.c
> +++ b/drivers/gpu/drm/drm_file.c
> @@ -38,6 +38,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -460,6 +461,8 @@ void drm_lastclose(struct drm_device * dev)
>   drm_legacy_dev_reinit(dev);
>  
>   drm_client_dev_restore(dev);
> +
> + vga_switcheroo_process_delayed_switch();
>  }
>  
>  /**
> diff --git a/drivers/gpu/drm/i915/i915_driver.c 
> b/drivers/gpu/drm/i915/i915_driver.c
> index 33e231b120c1..bf6ad8620970 100644
> --- a/drivers/gpu/drm/i915/i915_driver.c
> +++ b/drivers/gpu/drm/i915/i915_driver.c
> @@ -29,6 +29,7 @@
>  
>  #include 
>  #include 
> +#include  /* for FBINFO_STATE_ */
>  #include 
>  #include 
>  #include 
> @@ -37,7 +38,6 @@
>  #include 
>  #include 
>  #include 

Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Alex Deucher
On Mon, Jan 30, 2023 at 4:51 PM Guilherme G. Piccoli
 wrote:
>
> + Luben
>
> (sorry, missed that in the first submission).
>
> On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> > Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
> > routine - such function is expected to be called only after the
> > respective init function - drm_sched_init() - was executed successfully.
> >
> > Happens that we faced a driver probe failure in the Steam Deck
> > recently, and the function drm_sched_fini() was called even without
> > its counter-part had been previously called, causing the following oops:
> >
> > amdgpu: probe of :04:00.0 failed with error -110
> > BUG: kernel NULL pointer dereference, address: 0090
> > PGD 0 P4D 0
> > Oops: 0002 [#1] PREEMPT SMP NOPTI
> > CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
> > Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
> > RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
> > [...]
> > Call Trace:
> >  
> >  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
> >  amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
> >  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
> >  devm_drm_dev_init_release+0x49/0x70
> >  [...]
> >
> > To prevent that, check if the drm_sched was properly initialized for a
> > given ring before calling its fini counter-part.
> >
> > Notice ideally we'd use sched.ready for that; such field is set as the 
> > latest
> > thing on drm_sched_init(). But amdgpu seems to "override" the meaning of 
> > such
> > field - in the above oops for example, it was a GFX ring causing the crash, 
> > and
> > the sched.ready field was set to true in the ring init routine, regardless 
> > of
> > the state of the DRM scheduler. Hence, we ended-up using another sched 
> > field.
> >> > Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini 
> >> > in s3 test (v2)")
> > Cc: Andrey Grodzovsky 
> > Cc: Guchun Chen 
> > Cc: Mario Limonciello 
> > Signed-off-by: Guilherme G. Piccoli 
> > ---
> >
> >
> > Hi folks, first of all thanks in advance for reviews / comments!
> > Notice that I've used the Fixes tag more in the sense to bring it
> > to stable, I didn't find a good patch candidate that added the
> > call to drm_sched_fini(), was reaching way too old commits...so
> > 067f44c8b459 seems a good candidate - or maybe not?
> >
> > Now, with regards sched.ready, spent a bit of time to figure what
> > was happening...would be feasible maybe to stop using that to
> > mark some kind ring status? I think it should be possible to add a
> > flag to the ring structure for that, and free sched.ready from
> > being manipulate by the amdgpu driver, what's your thoughts on that?

It's been a while, but IIRC, we used to have a ring->ready field in
the driver which at some point got migrated out of the driver into the
GPU scheduler and the driver side code never got cleaned up.  I think
we should probably just drop the driver messing with that field and
leave it up to the drm scheduler.

Alex


> >
> > I could try myself, but first of course I'd like to raise the
> > "temperature" on this topic and check if somebody is already working
> > on that.
> >
> > Cheers,
> >
> > Guilherme
> >
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
> >  1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > index 00444203220d..e154eb8241fb 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> > @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device 
> > *adev)
> >   if (!ring || !ring->fence_drv.initialized)
> >   continue;
> >
> > - if (!ring->no_scheduler)
> > + /*
> > +  * Notice we check for sched.name since there's some
> > +  * override on the meaning of sched.ready by amdgpu.
> > +  * The natural check would be sched.ready, which is
> > +  * set as drm_sched_init() finishes...
> > +  */
> > + if (!ring->no_scheduler && ring->sched.name)
> >   drm_sched_fini(>sched);
> >
> >   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)


[PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Guilherme G. Piccoli
Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
routine - such function is expected to be called only after the
respective init function - drm_sched_init() - was executed successfully.

Happens that we faced a driver probe failure in the Steam Deck
recently, and the function drm_sched_fini() was called even without
its counter-part had been previously called, causing the following oops:

amdgpu: probe of :04:00.0 failed with error -110
BUG: kernel NULL pointer dereference, address: 0090
PGD 0 P4D 0
Oops: 0002 [#1] PREEMPT SMP NOPTI
CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
[...]
Call Trace:
 
 amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
 amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
 amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
 devm_drm_dev_init_release+0x49/0x70
 [...]

To prevent that, check if the drm_sched was properly initialized for a
given ring before calling its fini counter-part.

Notice ideally we'd use sched.ready for that; such field is set as the latest
thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
field - in the above oops for example, it was a GFX ring causing the crash, and
the sched.ready field was set to true in the ring init routine, regardless of
the state of the DRM scheduler. Hence, we ended-up using another sched field.

Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in s3 
test (v2)")
Cc: Andrey Grodzovsky 
Cc: Guchun Chen 
Cc: Mario Limonciello 
Signed-off-by: Guilherme G. Piccoli 
---


Hi folks, first of all thanks in advance for reviews / comments!
Notice that I've used the Fixes tag more in the sense to bring it
to stable, I didn't find a good patch candidate that added the
call to drm_sched_fini(), was reaching way too old commits...so
067f44c8b459 seems a good candidate - or maybe not?

Now, with regards sched.ready, spent a bit of time to figure what
was happening...would be feasible maybe to stop using that to
mark some kind ring status? I think it should be possible to add a
flag to the ring structure for that, and free sched.ready from
being manipulate by the amdgpu driver, what's your thoughts on that?

I could try myself, but first of course I'd like to raise the
"temperature" on this topic and check if somebody is already working
on that.

Cheers,

Guilherme


 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 00444203220d..e154eb8241fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device 
*adev)
if (!ring || !ring->fence_drv.initialized)
continue;
 
-   if (!ring->no_scheduler)
+   /*
+* Notice we check for sched.name since there's some
+* override on the meaning of sched.ready by amdgpu.
+* The natural check would be sched.ready, which is
+* set as drm_sched_init() finishes...
+*/
+   if (!ring->no_scheduler && ring->sched.name)
drm_sched_fini(>sched);
 
for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)
-- 
2.39.0



Re: [PATCH] drm/amdgpu/fence: Fix oops due to non-matching drm_sched init/fini

2023-01-30 Thread Guilherme G. Piccoli
+ Luben

(sorry, missed that in the first submission).

On 30/01/2023 18:45, Guilherme G. Piccoli wrote:
> Currently amdgpu calls drm_sched_fini() from the fence driver sw fini
> routine - such function is expected to be called only after the
> respective init function - drm_sched_init() - was executed successfully.
> 
> Happens that we faced a driver probe failure in the Steam Deck
> recently, and the function drm_sched_fini() was called even without
> its counter-part had been previously called, causing the following oops:
> 
> amdgpu: probe of :04:00.0 failed with error -110
> BUG: kernel NULL pointer dereference, address: 0090
> PGD 0 P4D 0
> Oops: 0002 [#1] PREEMPT SMP NOPTI
> CPU: 0 PID: 609 Comm: systemd-udevd Not tainted 6.2.0-rc3-gpiccoli #338
> Hardware name: Valve Jupiter/Jupiter, BIOS F7A0113 11/04/2022
> RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
> [...]
> Call Trace:
>  
>  amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu]
>  amdgpu_device_fini_sw+0x2b/0x3b0 [amdgpu]
>  amdgpu_driver_release_kms+0x16/0x30 [amdgpu]
>  devm_drm_dev_init_release+0x49/0x70
>  [...]
> 
> To prevent that, check if the drm_sched was properly initialized for a
> given ring before calling its fini counter-part.
> 
> Notice ideally we'd use sched.ready for that; such field is set as the latest
> thing on drm_sched_init(). But amdgpu seems to "override" the meaning of such
> field - in the above oops for example, it was a GFX ring causing the crash, 
> and
> the sched.ready field was set to true in the ring init routine, regardless of
> the state of the DRM scheduler. Hence, we ended-up using another sched field.
> 
> Fixes: 067f44c8b459 ("drm/amdgpu: avoid over-handle of fence driver fini in 
> s3 test (v2)")
> Cc: Andrey Grodzovsky 
> Cc: Guchun Chen 
> Cc: Mario Limonciello 
> Signed-off-by: Guilherme G. Piccoli 
> ---
> 
> 
> Hi folks, first of all thanks in advance for reviews / comments!
> Notice that I've used the Fixes tag more in the sense to bring it
> to stable, I didn't find a good patch candidate that added the
> call to drm_sched_fini(), was reaching way too old commits...so
> 067f44c8b459 seems a good candidate - or maybe not?
> 
> Now, with regards sched.ready, spent a bit of time to figure what
> was happening...would be feasible maybe to stop using that to
> mark some kind ring status? I think it should be possible to add a
> flag to the ring structure for that, and free sched.ready from
> being manipulate by the amdgpu driver, what's your thoughts on that?
> 
> I could try myself, but first of course I'd like to raise the
> "temperature" on this topic and check if somebody is already working
> on that.
> 
> Cheers,
> 
> Guilherme
> 
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> index 00444203220d..e154eb8241fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
> @@ -618,7 +618,13 @@ void amdgpu_fence_driver_sw_fini(struct amdgpu_device 
> *adev)
>   if (!ring || !ring->fence_drv.initialized)
>   continue;
>  
> - if (!ring->no_scheduler)
> + /*
> +  * Notice we check for sched.name since there's some
> +  * override on the meaning of sched.ready by amdgpu.
> +  * The natural check would be sched.ready, which is
> +  * set as drm_sched_init() finishes...
> +  */
> + if (!ring->no_scheduler && ring->sched.name)
>   drm_sched_fini(>sched);
>  
>   for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j)


Re: [PATCH 1/5] drm/amd/amdgpu revert "implement tdr advanced mode"

2023-01-30 Thread Luben Tuikov
The series is,

Acked-by: Luben Tuikov 

We don't want the kernel to be in the business of retrying client's
requests. Instead we want the kernel to provide a conduit for such
requests to be sent, executed by the GPU, and a result returned.
If the kernel cannot process requests for any reason, e.g. GPU
is being reset, or in recovery, or OOM, or there's a problem with
the request, etc., the driver should return the request back to
the client (user space), and let the client decide how to proceed.

Regards,
Luben

On 2022-10-26 11:35, Christian König wrote:
> This reverts commit e6c6338f393b74ac0b303d567bb918b44ae7ad75.
> 
> This feature basically re-submits one job after another to
> figure out which one was the one causing a hang.
> 
> This is obviously incompatible with gang-submit which requires
> that multiple jobs run at the same time. It's also absolutely
> not helpful to crash the hardware multiple times if a clean
> recovery is desired.
> 
> For testing and debugging environments we should rather disable
> recovery alltogether to be able to inspect the state with a hw
> debugger.
> 
> Additional to that the sw implementation is clearly buggy and causes
> reference count issues for the hardware fence.
> 
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 103 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c|   2 +-
>  drivers/gpu/drm/scheduler/sched_main.c |  58 ++--
>  include/drm/gpu_scheduler.h|   3 -
>  4 files changed, 10 insertions(+), 156 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 6f958603c8cc..d4584e577b51 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5070,94 +5070,6 @@ static int amdgpu_device_suspend_display_audio(struct 
> amdgpu_device *adev)
>   return 0;
>  }
>  
> -static void amdgpu_device_recheck_guilty_jobs(
> - struct amdgpu_device *adev, struct list_head *device_list_handle,
> - struct amdgpu_reset_context *reset_context)
> -{
> - int i, r = 0;
> -
> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> - struct amdgpu_ring *ring = adev->rings[i];
> - int ret = 0;
> - struct drm_sched_job *s_job;
> -
> - if (!ring || !ring->sched.thread)
> - continue;
> -
> - s_job = list_first_entry_or_null(>sched.pending_list,
> - struct drm_sched_job, list);
> - if (s_job == NULL)
> - continue;
> -
> - /* clear job's guilty and depend the folowing step to decide 
> the real one */
> - drm_sched_reset_karma(s_job);
> - drm_sched_resubmit_jobs_ext(>sched, 1);
> -
> - if (!s_job->s_fence->parent) {
> - DRM_WARN("Failed to get a HW fence for job!");
> - continue;
> - }
> -
> - ret = dma_fence_wait_timeout(s_job->s_fence->parent, false, 
> ring->sched.timeout);
> - if (ret == 0) { /* timeout */
> - DRM_ERROR("Found the real bad job! ring:%s, 
> job_id:%llx\n",
> - ring->sched.name, s_job->id);
> -
> -
> - amdgpu_fence_driver_isr_toggle(adev, true);
> -
> - /* Clear this failed job from fence array */
> - amdgpu_fence_driver_clear_job_fences(ring);
> -
> - amdgpu_fence_driver_isr_toggle(adev, false);
> -
> - /* Since the job won't signal and we go for
> -  * another resubmit drop this parent pointer
> -  */
> - dma_fence_put(s_job->s_fence->parent);
> - s_job->s_fence->parent = NULL;
> -
> - /* set guilty */
> - drm_sched_increase_karma(s_job);
> - amdgpu_reset_prepare_hwcontext(adev, reset_context);
> -retry:
> - /* do hw reset */
> - if (amdgpu_sriov_vf(adev)) {
> - amdgpu_virt_fini_data_exchange(adev);
> - r = amdgpu_device_reset_sriov(adev, false);
> - if (r)
> - adev->asic_reset_res = r;
> - } else {
> - clear_bit(AMDGPU_SKIP_HW_RESET,
> -   _context->flags);
> - r = amdgpu_do_asic_reset(device_list_handle,
> -  reset_context);
> - if (r && r == -EAGAIN)
> - goto retry;
> - }
> -
> - /*
> -  * add reset counter so that the following
> - 

[PATCH] drm/amd/display: Trivial swizzle-related code clean-ups

2023-01-30 Thread Guilherme G. Piccoli
This is a very trivial code clean-up related to commit 5468c36d6285
("drm/amd/display: Filter Invalid 420 Modes for HDMI TMDS"). This commit
added a validation on driver probe to prevent invalid TMDS modes, but one
of the fake properties (swizzle) ended-up causing a warning on driver
probe; was reported here: https://gitlab.freedesktop.org/drm/amd/-/issues/2264.

It was fixed by commit 105a8b8698e2 ("drm/amd/display: patch cases with
unknown plane state to prevent warning"), but the validation code had
a double variable assignment, which we hereby remove. Also, the fix relies
in the dcn2{0,1}patch_unknown_plane_state() callbacks, so while at it we
took the opportunity to perform a small code clean-up in such routines.

Cc: Aurabindo Pillai 
Cc: Daniel Wheeler 
Cc: Fangzhi Zuo 
Cc: Harry Wentland 
Cc: Leo Li 
Cc: Mark Broadworth 
Cc: Melissa Wen 
Cc: Rodrigo Siqueira 
Cc: Sung Joon Kim 
Cc: Swapnil Patel 
Signed-off-by: Guilherme G. Piccoli 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 1 -
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 8 ++--
 drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c | 6 ++
 3 files changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 86a2f7f58550..e71e94663d14 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6336,7 +6336,6 @@ static enum dc_status 
dm_validate_stream_and_context(struct dc *dc,
dc_plane_state->plane_size.surface_size.width  = stream->src.width;
dc_plane_state->plane_size.chroma_size.height  = stream->src.height;
dc_plane_state->plane_size.chroma_size.width   = stream->src.width;
-   dc_plane_state->tiling_info.gfx9.swizzle =  DC_SW_UNKNOWN;
dc_plane_state->format = SURFACE_PIXEL_FORMAT_GRPH_ARGB;
dc_plane_state->tiling_info.gfx9.swizzle = DC_SW_UNKNOWN;
dc_plane_state->rotation = ROTATION_ANGLE_0;
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
index 531f405d2554..3af24ef9cb2d 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c
@@ -2225,14 +2225,10 @@ enum dc_status dcn20_patch_unknown_plane_state(struct 
dc_plane_state *plane_stat
enum surface_pixel_format surf_pix_format = plane_state->format;
unsigned int bpp = resource_pixel_format_to_bpp(surf_pix_format);
 
-   enum swizzle_mode_values swizzle = DC_SW_LINEAR;
-
+   plane_state->tiling_info.gfx9.swizzle = DC_SW_64KB_S;
if (bpp == 64)
-   swizzle = DC_SW_64KB_D;
-   else
-   swizzle = DC_SW_64KB_S;
+   plane_state->tiling_info.gfx9.swizzle = DC_SW_64KB_D;
 
-   plane_state->tiling_info.gfx9.swizzle = swizzle;
return DC_OK;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c 
b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
index fbcf0afeae0d..8f9244fe5c86 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn21/dcn21_resource.c
@@ -1393,15 +1393,13 @@ static uint32_t read_pipe_fuses(struct dc_context *ctx)
 
 static enum dc_status dcn21_patch_unknown_plane_state(struct dc_plane_state 
*plane_state)
 {
-   enum dc_status result = DC_OK;
-
if (plane_state->ctx->dc->debug.disable_dcc == DCC_ENABLE) {
plane_state->dcc.enable = 1;
/* align to our worst case block width */
plane_state->dcc.meta_pitch = ((plane_state->src_rect.width + 
1023) / 1024) * 1024;
}
-   result = dcn20_patch_unknown_plane_state(plane_state);
-   return result;
+
+   return dcn20_patch_unknown_plane_state(plane_state);
 }
 
 static const struct resource_funcs dcn21_res_pool_funcs = {
-- 
2.39.0



[linux-next:master] BUILD REGRESSION ae0c77e1bc6963c67c6c09e8c72959fcb1ed8d5f

2023-01-30 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: ae0c77e1bc6963c67c6c09e8c72959fcb1ed8d5f  Add linux-next specific 
files for 20230130

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202301230743.xnut0zvc-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202301300743.bp7dpazv-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202301301722.6fghrppl-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202301301801.y5o08tqx-...@intel.com
https://lore.kernel.org/oe-kbuild-all/202301302110.metnwkbd-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

Documentation/riscv/uabi.rst:24: WARNING: Enumerated list ends without a blank 
line; unexpected unindent.
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] 
undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] 
undefined!
arch/arm64/kvm/arm.c:2211: warning: expecting prototype for Initialize Hyp(). 
Prototype was for kvm_arm_init() instead
drivers/clk/qcom/gcc-sa8775p.c:313:32: warning: unused variable 
'gcc_parent_map_10' [-Wunused-const-variable]
drivers/clk/qcom/gcc-sa8775p.c:318:37: warning: unused variable 
'gcc_parent_data_10' [-Wunused-const-variable]
drivers/clk/qcom/gcc-sa8775p.c:333:32: warning: unused variable 
'gcc_parent_map_12' [-Wunused-const-variable]
drivers/clk/qcom/gcc-sa8775p.c:338:37: warning: unused variable 
'gcc_parent_data_12' [-Wunused-const-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/accessories/link_dp_trace.c:148:6:
 warning: no previous prototype for 'link_dp_trace_set_edp_power_timestamp' 
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/accessories/link_dp_trace.c:158:10:
 warning: no previous prototype for 'link_dp_trace_get_edp_poweron_timestamp' 
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/accessories/link_dp_trace.c:163:10:
 warning: no previous prototype for 'link_dp_trace_get_edp_poweroff_timestamp' 
[-Wmissing-prototypes]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:1295:32:
 warning: variable 'result_write_min_hblank' set but not used 
[-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_capability.c:279:42:
 warning: variable 'ds_port' set but not used [-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dp_training.c:1585:38:
 warning: variable 'result' set but not used [-Wunused-but-set-variable]
kismet: WARNING: unmet direct dependencies detected for VFIO_MDEV when selected 
by SAMPLE_VFIO_MDEV_MTTY
libbpf: failed to find '.BTF' ELF section in vmlinux

Unverified Error/Warning (likely false positive, please contact us if 
interested):

drivers/nvmem/imx-ocotp.c:599:21: sparse: sparse: symbol 'imx_ocotp_layout' was 
not declared. Should it be static?
drivers/nvmem/layouts/sl28vpd.c:143:21: sparse: sparse: symbol 'sl28vpd_layout' 
was not declared. Should it be static?
drivers/thermal/qcom/tsens-v0_1.c:106:40: sparse: sparse: symbol 
'tsens_9607_nvmem' was not declared. Should it be static?
drivers/thermal/qcom/tsens-v0_1.c:26:40: sparse: sparse: symbol 
'tsens_8916_nvmem' was not declared. Should it be static?
drivers/thermal/qcom/tsens-v0_1.c:42:40: sparse: sparse: symbol 
'tsens_8939_nvmem' was not declared. Should it be static?
drivers/thermal/qcom/tsens-v0_1.c:62:40: sparse: sparse: symbol 
'tsens_8974_nvmem' was not declared. Should it be static?
drivers/thermal/qcom/tsens-v0_1.c:84:40: sparse: sparse: symbol 
'tsens_8974_backup_nvmem' was not declared. Should it be static?
drivers/thermal/qcom/tsens-v1.c:24:40: sparse: sparse: symbol 
'tsens_qcs404_nvmem' was not declared. Should it be static?
drivers/thermal/qcom/tsens-v1.c:45:40: sparse: sparse: symbol 
'tsens_8976_nvmem' was not declared. Should it be static?
pahole: .tmp_vmlinux.btf: No such file or directory

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-accessories-link_dp_trace.c:warning:no-previous-prototype-for-link_dp_trace_get_edp_poweroff_timestamp
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-accessories-link_dp_trace.c:warning:no-previous-prototype-for-link_dp_trace_get_edp_poweron_timestamp
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-accessories-link_dp_trace.c:warning:no-previous-prototype-for-link_dp_trace_set_edp_power_timestamp
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-ds_port-set-but-not-used
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_capability.c:warning:variable-result_write_min_hblank-set-but-not-used
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-link-protocols-link_dp_training.c:warning:variable-result-set-but-not-used
|-- alpha-randconfig-r016-20230130
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display

Re: [PATCH] drm/amd: Fix initialization for nbio 4.3.0

2023-01-30 Thread Alex Deucher
On Mon, Jan 30, 2023 at 11:16 AM Mario Limonciello
 wrote:
>
> A mistake has been made on some boards with NBIO 4.3.0 where some
> NBIO registers aren't properly set by the hardware.
>
> Ensure that they're set during initialization.
>
> Cc: Natikar Basavaraj 
> Tested-by: Satyanarayana ReddyTVN 
> Tested-by: Rutvij Gajjar 
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c 
> b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
> index 15eb3658d70e6..4b1c6946a60f3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
> @@ -337,6 +337,13 @@ const struct nbio_hdp_flush_reg nbio_v4_3_hdp_flush_reg 
> = {
>
>  static void nbio_v4_3_init_registers(struct amdgpu_device *adev)
>  {
> +   if (adev->ip_versions[NBIO_HWIP][0] == IP_VERSION(4, 3, 0)) {
> +   uint32_t data;
> +
> +   data = RREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF2_STRAP2);
> +   data &= 
> ~RCC_DEV0_EPF2_STRAP2__STRAP_NO_SOFT_RESET_DEV0_F2_MASK;
> +   WREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF2_STRAP2, data);
> +   }
> return;

You can drop the return here.  With that, the patch is:
Reviewed-by: Alex Deucher 

>  }
>
> --
> 2.34.1
>


[PATCH] drm/amd: Fix initialization for nbio 4.3.0

2023-01-30 Thread Mario Limonciello
A mistake has been made on some boards with NBIO 4.3.0 where some
NBIO registers aren't properly set by the hardware.

Ensure that they're set during initialization.

Cc: Natikar Basavaraj 
Tested-by: Satyanarayana ReddyTVN 
Tested-by: Rutvij Gajjar 
Signed-off-by: Mario Limonciello 
---
 drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c 
b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
index 15eb3658d70e6..4b1c6946a60f3 100644
--- a/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/nbio_v4_3.c
@@ -337,6 +337,13 @@ const struct nbio_hdp_flush_reg nbio_v4_3_hdp_flush_reg = {
 
 static void nbio_v4_3_init_registers(struct amdgpu_device *adev)
 {
+   if (adev->ip_versions[NBIO_HWIP][0] == IP_VERSION(4, 3, 0)) {
+   uint32_t data;
+
+   data = RREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF2_STRAP2);
+   data &= ~RCC_DEV0_EPF2_STRAP2__STRAP_NO_SOFT_RESET_DEV0_F2_MASK;
+   WREG32_SOC15(NBIO, 0, regRCC_DEV0_EPF2_STRAP2, data);
+   }
return;
 }
 
-- 
2.34.1



RE: [PATCH] drm/amd: Allow s0ix without BIOS support

2023-01-30 Thread Limonciello, Mario
[Public]



> -Original Message-
> From: Rafael Ávila de Espíndola 
> Sent: Monday, January 30, 2023 08:08
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: Limonciello, Mario 
> Subject: Re: [PATCH] drm/amd: Allow s0ix without BIOS support
> 
> BTW, to which git repo this gets added first? I took a look at
> git://anongit.freedesktop.org/drm-tip, but it is not there.
> 
> Thanks,
> Rafael

Hi,

It will first show up in amd-staging-drm-next here:

https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next

It hasn't been refreshed for things accepted this last week yet, but it should
show up this week some time.

Thanks,

> 
> Mario Limonciello  writes:
> 
> > We guard the suspend entry code from running unless we have proper
> > BIOS support for either S3 mode or s0ix mode.
> >
> > If a user's system doesn't support either of these modes the kernel
> > still does offer s2idle in `/sys/power/mem_sleep` so there is an
> > expectation from users that it works even if the power consumption
> > remains very high.
> >
> > Rafael Ávila de Espíndola reports that a system of his has a
> > non-functional graphics stack after resuming.  That system doesn't
> > support S3 and the FADT doesn't indicate support for low power idle.
> >
> > Through some experimentation it was concluded that even without the
> > hardware s0i3 support provided by the amd_pmc driver the power
> > consumption over suspend is decreased by running amdgpu's s0ix
> > suspend routine.
> >
> > The numbers over suspend showed:
> > * No patch: 9.2W
> > * Skip amdgpu suspend entirely: 10.5W
> > * Run amdgpu s0ix routine: 7.7W
> >
> > As this does improve the power, remove some of the guard rails in
> > `amdgpu_acpi.c` for only running s0ix suspend routines in the right
> > circumstances.
> >
> > However if this turns out to cause regressions for anyone, we should
> > revert this change and instead opt for skipping suspend/resume routines
> > entirely or try to fix the underlying behavior that makes graphics fail
> > after resume without underlying platform support.
> >
> > Reported-by: Rafael Ávila de Espíndola 
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 57b5e11446c65..fa7375b97fd47 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -1079,20 +1079,16 @@ bool amdgpu_acpi_is_s0ix_active(struct
> amdgpu_device *adev)
> >  * S0ix even though the system is suspending to idle, so return false
> >  * in that case.
> >  */
> > -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as BIOS has
> not been configured for suspend-to-idle.\n"
> >   "To use suspend-to-idle change the sleep mode in
> BIOS setup.\n");
> > -   return false;
> > -   }
> >
> >  #if !IS_ENABLED(CONFIG_AMD_PMC)
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as the kernel has not
> been compiled with CONFIG_AMD_PMC.\n");
> > -   return false;
> > -#else
> > -   return true;
> >  #endif /* CONFIG_AMD_PMC */
> > +   return true;
> >  }
> >
> >  #endif /* CONFIG_SUSPEND */
> > --
> > 2.25.1


Re: [PATCH] drm/amd: Allow s0ix without BIOS support

2023-01-30 Thread Rafael Ávila de Espíndola
BTW, to which git repo this gets added first? I took a look at
git://anongit.freedesktop.org/drm-tip, but it is not there.

Thanks,
Rafael

Mario Limonciello  writes:

> We guard the suspend entry code from running unless we have proper
> BIOS support for either S3 mode or s0ix mode.
>
> If a user's system doesn't support either of these modes the kernel
> still does offer s2idle in `/sys/power/mem_sleep` so there is an
> expectation from users that it works even if the power consumption
> remains very high.
>
> Rafael Ávila de Espíndola reports that a system of his has a
> non-functional graphics stack after resuming.  That system doesn't
> support S3 and the FADT doesn't indicate support for low power idle.
>
> Through some experimentation it was concluded that even without the
> hardware s0i3 support provided by the amd_pmc driver the power
> consumption over suspend is decreased by running amdgpu's s0ix
> suspend routine.
>
> The numbers over suspend showed:
> * No patch: 9.2W
> * Skip amdgpu suspend entirely: 10.5W
> * Run amdgpu s0ix routine: 7.7W
>
> As this does improve the power, remove some of the guard rails in
> `amdgpu_acpi.c` for only running s0ix suspend routines in the right
> circumstances.
>
> However if this turns out to cause regressions for anyone, we should
> revert this change and instead opt for skipping suspend/resume routines
> entirely or try to fix the underlying behavior that makes graphics fail
> after resume without underlying platform support.
>
> Reported-by: Rafael Ávila de Espíndola 
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 57b5e11446c65..fa7375b97fd47 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -1079,20 +1079,16 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device 
> *adev)
>* S0ix even though the system is suspending to idle, so return false
>* in that case.
>*/
> - if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> + if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
>   dev_warn_once(adev->dev,
> "Power consumption will be higher as BIOS has not 
> been configured for suspend-to-idle.\n"
> "To use suspend-to-idle change the sleep mode in 
> BIOS setup.\n");
> - return false;
> - }
>  
>  #if !IS_ENABLED(CONFIG_AMD_PMC)
>   dev_warn_once(adev->dev,
> "Power consumption will be higher as the kernel has not 
> been compiled with CONFIG_AMD_PMC.\n");
> - return false;
> -#else
> - return true;
>  #endif /* CONFIG_AMD_PMC */
> + return true;
>  }
>  
>  #endif /* CONFIG_SUSPEND */
> -- 
> 2.25.1


Re: [PATCH v3 01/10] drm/client: Test for connectors before sending hotplug event

2023-01-30 Thread Thomas Zimmermann

Hi

Am 27.01.23 um 19:02 schrieb Simon Ser:

On Wednesday, January 25th, 2023 at 21:04, Thomas Zimmermann 
 wrote:


Not having connectors indicates a driver bug.


Is it? What if all connectors are of the DP-MST type, ie. they are
created on-the-fly?


My commit message was nonsense. I even write this here that having no 
connectors is legitimate.


Best regards
Thomas


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


[PATCH] drm/amd: Optimize some memory initializations

2023-01-30 Thread Christophe JAILLET
Instead of zeroing some memory and then copying data in part or all of it,
use memcpy_and_pad().
This avoids writing some memory twice and should save a few cycles.

Signed-off-by: Christophe JAILLET 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  | 11 ---
 drivers/gpu/drm/amd/amdgpu/psp_v13_0.c   |  8 ++--
 drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c |  8 ++--
 3 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index a8391f269cd0..5e69693a5cc4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -613,9 +613,8 @@ psp_cmd_submit_buf(struct psp_context *psp,
if (!drm_dev_enter(adev_to_drm(psp->adev), ))
return 0;
 
-   memset(psp->cmd_buf_mem, 0, PSP_CMD_BUFFER_SIZE);
-
-   memcpy(psp->cmd_buf_mem, cmd, sizeof(struct psp_gfx_cmd_resp));
+   memcpy_and_pad(psp->cmd_buf_mem, PSP_CMD_BUFFER_SIZE, cmd,
+  sizeof(struct psp_gfx_cmd_resp), 0);
 
index = atomic_inc_return(>fence_value);
ret = psp_ring_cmd_submit(psp, psp->cmd_buf_mc_addr, fence_mc_addr, 
index);
@@ -947,8 +946,7 @@ static int psp_rl_load(struct amdgpu_device *adev)
 
cmd = acquire_psp_cmd_buf(psp);
 
-   memset(psp->fw_pri_buf, 0, PSP_1_MEG);
-   memcpy(psp->fw_pri_buf, psp->rl.start_addr, psp->rl.size_bytes);
+   memcpy_and_pad(psp->fw_pri_buf, PSP_1_MEG, psp->rl.start_addr, 
psp->rl.size_bytes, 0);
 
cmd->cmd_id = GFX_CMD_ID_LOAD_IP_FW;
cmd->cmd.cmd_load_ip_fw.fw_phy_addr_lo = 
lower_32_bits(psp->fw_pri_mc_addr);
@@ -3479,8 +3477,7 @@ void psp_copy_fw(struct psp_context *psp, uint8_t 
*start_addr, uint32_t bin_size
if (!drm_dev_enter(adev_to_drm(psp->adev), ))
return;
 
-   memset(psp->fw_pri_buf, 0, PSP_1_MEG);
-   memcpy(psp->fw_pri_buf, start_addr, bin_size);
+   memcpy_and_pad(psp->fw_pri_buf, PSP_1_MEG, start_addr, bin_size, 0);
 
drm_dev_exit(idx);
 }
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
index d62fcc77af95..79733ec4ffab 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0.c
@@ -168,10 +168,8 @@ static int psp_v13_0_bootloader_load_component(struct 
psp_context  *psp,
if (ret)
return ret;
 
-   memset(psp->fw_pri_buf, 0, PSP_1_MEG);
-
/* Copy PSP KDB binary to memory */
-   memcpy(psp->fw_pri_buf, bin_desc->start_addr, bin_desc->size_bytes);
+   memcpy_and_pad(psp->fw_pri_buf, PSP_1_MEG, bin_desc->start_addr, 
bin_desc->size_bytes, 0);
 
/* Provide the PSP KDB to bootloader */
WREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_36,
@@ -237,10 +235,8 @@ static int psp_v13_0_bootloader_load_sos(struct 
psp_context *psp)
if (ret)
return ret;
 
-   memset(psp->fw_pri_buf, 0, PSP_1_MEG);
-
/* Copy Secure OS binary to PSP memory */
-   memcpy(psp->fw_pri_buf, psp->sos.start_addr, psp->sos.size_bytes);
+   memcpy_and_pad(psp->fw_pri_buf, PSP_1_MEG, psp->sos.start_addr, 
psp->sos.size_bytes, 0);
 
/* Provide the PSP secure OS to bootloader */
WREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_36,
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c
index d5ba58eba3e2..c73415b09e85 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v13_0_4.c
@@ -107,10 +107,8 @@ static int psp_v13_0_4_bootloader_load_component(struct 
psp_context*psp,
if (ret)
return ret;
 
-   memset(psp->fw_pri_buf, 0, PSP_1_MEG);
-
/* Copy PSP KDB binary to memory */
-   memcpy(psp->fw_pri_buf, bin_desc->start_addr, bin_desc->size_bytes);
+   memcpy_and_pad(psp->fw_pri_buf, PSP_1_MEG, bin_desc->start_addr, 
bin_desc->size_bytes, 0);
 
/* Provide the PSP KDB to bootloader */
WREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_36,
@@ -170,10 +168,8 @@ static int psp_v13_0_4_bootloader_load_sos(struct 
psp_context *psp)
if (ret)
return ret;
 
-   memset(psp->fw_pri_buf, 0, PSP_1_MEG);
-
/* Copy Secure OS binary to PSP memory */
-   memcpy(psp->fw_pri_buf, psp->sos.start_addr, psp->sos.size_bytes);
+   memcpy_and_pad(psp->fw_pri_buf, PSP_1_MEG, psp->sos.start_addr, 
psp->sos.size_bytes, 0);
 
/* Provide the PSP secure OS to bootloader */
WREG32_SOC15(MP0, 0, regMP0_SMN_C2PMSG_36,
-- 
2.34.1



Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-30 Thread Greg KH
On Fri, Jan 27, 2023 at 03:02:41PM +, Limonciello, Mario wrote:
> [Public]
> 
> 
> 
> > -Original Message-
> > From: Linux kernel regression tracking (Thorsten Leemhuis)
> > 
> > Sent: Friday, January 27, 2023 03:15
> > To: Greg KH ; Limonciello, Mario
> > 
> > Cc: dri-de...@lists.freedesktop.org; sta...@vger.kernel.org;
> > stanislav.lisovs...@intel.com; Zuo, Jerry ; amd-
> > g...@lists.freedesktop.org; Lin, Wayne ; Guenter
> > Roeck ; bske...@redhat.com
> > Subject: Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into
> > the atomic state"
> > 
> > On 27.01.23 08:39, Greg KH wrote:
> > > On Fri, Jan 20, 2023 at 11:51:04AM -0600, Limonciello, Mario wrote:
> > >> On 1/20/2023 11:46, Guenter Roeck wrote:
> > >>> On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:
> >  This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.
> > 
> >  [Why]
> >  Changes cause regression on amdgpu mst.
> >  E.g.
> >  In fill_dc_mst_payload_table_from_drm(), amdgpu expects to
> > add/remove payload
> >  one by one and call fill_dc_mst_payload_table_from_drm() to update
> > the HW
> >  maintained payload table. But previous change tries to go through all
> > the
> >  payloads in mst_state and update amdpug hw maintained table in once
> > everytime
> >  driver only tries to add/remove a specific payload stream only. The
> > newly
> >  design idea conflicts with the implementation in amdgpu nowadays.
> > 
> >  [How]
> >  Revert this patch first. After addressing all regression problems 
> >  caused
> > by
> >  this previous patch, will add it back and adjust it.
> > >>>
> > >>> Has there been any progress on this revert, or on fixing the underlying
> > >>> problem ?
> > >>>
> > >>> Thanks,
> > >>> Guenter
> > >>
> > >> Hi Guenter,
> > >>
> > >> Wayne is OOO for CNY, but let me update you.
> > >>
> > >> Harry has sent out this series which is a collection of proper fixes.
> > >> https://patchwork.freedesktop.org/series/113125/
> > >>
> > >> Once that's reviewed and accepted, 4 of them are applicable for 6.1.
> > >
> > > Any hint on when those will be reviewed and accepted?  patchwork
> > doesn't
> > > show any activity on them, or at least I can't figure it out...
> > 
> > I didn't look closer (hence please correct me if I'm wrong), but the
> > core changes afaics are in the DRM pull airlied send a few hours ago to
> > Linus (note the "amdgpu […] DP MST fixes" line):
> > 
> > https://lore.kernel.org/all/CAPM%3D9tzuu4xnx6T5v7sKsK%2BA5HEaPOc1ie
> > myznsyqzgztj%3d...@mail.gmail.com/
> 
> That's right.  There are 4 commits in that PR with the appropriate stable tags
> that should fix the majority of the MST issues introduced in 6.1 by 
> 4d07b0bc40340
> ("drm/display/dp_mst: Move all payload info into the atomic state"):
> 
>   drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count 
> assignments
>   drm/amdgpu/display/mst: limit payload to be updated one by one
>   drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPD
>   drm/display/dp_mst: Correct the kref of port.
> 
> There will be follow ups for any remaining corner cases.

Great, thanks for this, all are now queued up in the 6.1.y queue.

greg k-h


[PATCH] drm/amdgpu/display: remove duplicate include header in files

2023-01-30 Thread ye.xingchen
From: ye xingchen 

opp.h is included more than once.

Signed-off-by: ye xingchen 
---
 drivers/gpu/drm/amd/display/dc/core/dc_link.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
index 6475664baa8a..1a2ab934b4bd 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link.c
@@ -46,7 +46,6 @@
 #include "dpcd_defs.h"
 #include "dmcu.h"
 #include "dsc.h"
-#include "opp.h"
 #include "hw/clk_mgr.h"
 #include "dce/dmub_psr.h"
 #include "dmub/dmub_srv.h"
-- 
2.25.1


[PATCH] drm/amdgpu: Fix a typo ("boradcast")

2023-01-30 Thread Jonathan Neuschäfer
Spell it as "broadcast".

Signed-off-by: Jonathan Neuschäfer 
---
 drivers/gpu/drm/amd/amdgpu/df_v1_7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v1_7.c 
b/drivers/gpu/drm/amd/amdgpu/df_v1_7.c
index b991609f46c10..5dfab8021 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v1_7.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v1_7.c
@@ -94,7 +94,7 @@ static void df_v1_7_update_medium_grain_clock_gating(struct 
amdgpu_device *adev,
WREG32_SOC15(DF, 0, mmDF_PIE_AON0_DfGlobalClkGater, tmp);
}

-   /* Exit boradcast mode */
+   /* Exit broadcast mode */
adev->df.funcs->enable_broadcast_mode(adev, false);
 }

--
2.39.0