from:"Limonciello, Mario"

Re: [PATCH v2] drm/client: Detect when ACPI lid is closed during initialization

2024-05-29 Thread Limonciello, Mario





Also a direct acpi_lid_open() call seems a bit iffy. But I guess if
someone needs this to work on non-ACPI system they get to figure out
how to abstract it better. acpi_lid_open() does seem to return != 0
when ACPI is not supported, so at least it would err on the side
of enabling everything.


Thanks. I was going to comment, but you got it first. I think a proper
implementation should check for SW_LID input device instead of simply
using acpi_lid_open(). This will handle the issue for other,
non-ACPI-based laptops.



Can you suggest how this would actually work?  AFAICT the only way to 
discover if input devices support SW_LID would be to iterate all the 
input devices in the kernel and look for whether ->swbit has SW_LID set.


This then turns into a dependency problem of whether any of a myriad of 
drivers have started to report SW_LID.  It's also a state machine 
problem because other drivers can be unloaded at will.


And then what do you do if more than one sets SW_LID?

IOW - a lot of complexity for a non-ACPI system.  Does such a problem 
exist in non-ACPI systems?
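
To make the concern concrete, here is a minimal userspace sketch of the scan described above. Everything in it is an assumption for illustration: `struct fake_input_dev`, `count_lid_switches()` and `lid_probably_open()` are hypothetical stand-ins for walking the kernel's input devices, and `FAKE_SW_LID` only models the idea of an SW_LID bit in `->swbit`. The scan itself is simple; the awkward part is that it can find zero, one or several lid switches, and the answer is only valid at the moment it was taken.

```c
#include <stddef.h>

/* Hypothetical stand-in for the SW_LID bit index; not the kernel header. */
#define FAKE_SW_LID 0

/* Hypothetical, simplified model of an input device. */
struct fake_input_dev {
	const char *name;
	unsigned long swbit;	/* bitmap of switches the device supports */
	unsigned long sw;	/* bitmap of current state (1 = lid closed) */
};

/* Count how many devices advertise a lid switch at all. */
size_t count_lid_switches(const struct fake_input_dev *devs, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (devs[i].swbit & (1UL << FAKE_SW_LID))
			count++;
	return count;
}

/*
 * Return 1 if the lid should be treated as open. With no lid switch, or
 * with several that disagree, err on the side of "open" so that displays
 * stay enabled.
 */
int lid_probably_open(const struct fake_input_dev *devs, size_t n)
{
	size_t i, total = 0, closed = 0;

	for (i = 0; i < n; i++) {
		if (!(devs[i].swbit & (1UL << FAKE_SW_LID)))
			continue;
		total++;
		if (devs[i].sw & (1UL << FAKE_SW_LID))
			closed++;
	}
	/* Only report "closed" when every lid switch found agrees. */
	return !(total > 0 && closed == total);
}
```

The "all must agree" rule is just one possible tie-break for the more-than-one case; the sketch does nothing about drivers loading or unloading after the scan.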

Re: [PATCH] drm/amdgpu: fix Kconfig for ISP v2

2024-05-16 Thread Limonciello, Mario





On 5/14/2024 4:28 PM, Alex Deucher wrote:

Add new config option and set proper dependencies for ISP.

v2: add missed guards, drop separate Kconfig

Signed-off-by: Alex Deucher 


I have one optional remark regarding headers, but otherwise it looks 
fine by me.  Feel free to ignore it or squash it in while committing.


Reviewed-by: Mario Limonciello 


Cc: Pratap Nirujogi 



---
  drivers/gpu/drm/amd/amdgpu/Kconfig| 11 +++
  drivers/gpu/drm/amd/amdgpu/Makefile   |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  4 
  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c |  6 ++
  4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig 
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index 22d88f8ef5279..0cd9d29394072 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -70,6 +70,17 @@ config DRM_AMDGPU_USERPTR
  This option selects CONFIG_HMM and CONFIG_HMM_MIRROR if it
  isn't already selected to enabled full userptr support.
  
+config DRM_AMD_ISP

+   bool "Enable AMD Image Signal Processor IP support"
+   depends on DRM_AMDGPU
+   select MFD_CORE
+   select PM_GENERIC_DOMAINS if PM
+   help
+   Choose this option to enable ISP IP support for AMD SOCs.
+   This adds the ISP (Image Signal Processor) IP driver and wires
+   it up into the amdgpu driver.  It is required for cameras
+   on APUs which utilize mipi cameras.
+
  config DRM_AMDGPU_WERROR
bool "Force the compiler to throw an error instead of a warning when 
compiling"
depends on DRM_AMDGPU
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 12ba76025cb7c..c95ec19a38264 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -325,6 +325,8 @@ amdgpu-y += $(AMD_DISPLAY_FILES)
  endif
  
  # add isp block

+ifneq ($(CONFIG_DRM_AMD_ISP),)
  amdgpu-y += amdgpu_isp.o
+endif
  
  obj-$(CONFIG_DRM_AMDGPU)+= amdgpu.o

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 846c3550fbda8..936ed3c10c884 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -113,7 +113,9 @@
  #include "amdgpu_seq64.h"
  #include "amdgpu_reg_state.h"
  #include "amdgpu_umsch_mm.h"
+#if defined(CONFIG_DRM_AMD_ISP)
  #include "amdgpu_isp.h"
+#endif
  
  #define MAX_GPU_INSTANCE		64
  
@@ -1049,8 +1051,10 @@ struct amdgpu_device {

/* display related functionality */
struct amdgpu_display_manager dm;
  
+#if defined(CONFIG_DRM_AMD_ISP)

/* isp */
struct amdgpu_isp   isp;
+#endif
  
  	/* mes */

boolenable_mes;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 378d5a5cda917..1bab8dd37d621 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -107,7 +107,9 @@
  #include "jpeg_v5_0_0.h"
  
  #include "amdgpu_vpe.h"

+#if defined(CONFIG_DRM_AMD_ISP)
  #include "amdgpu_isp.h"
+#endif


IMO including this header is probably relatively safe no matter if you 
have ISP enabled or not, no?


  
  #define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"

  MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);
@@ -712,10 +714,12 @@ static void 
amdgpu_discovery_read_from_harvest_table(struct amdgpu_device *adev,
adev->sdma.sdma_mask &=
~(1U << harvest_info->list[i].number_instance);
break;
+#if defined(CONFIG_DRM_AMD_ISP)
case ISP_HWID:
adev->isp.harvest_config |=
~(1U << harvest_info->list[i].number_instance);
break;
+#endif
default:
break;
}
@@ -2402,6 +2406,7 @@ static int amdgpu_discovery_set_umsch_mm_ip_blocks(struct 
amdgpu_device *adev)
  
  static int amdgpu_discovery_set_isp_ip_blocks(struct amdgpu_device *adev)

  {
+#if defined(CONFIG_DRM_AMD_ISP)
switch (amdgpu_ip_version(adev, ISP_HWIP, 0)) {
case IP_VERSION(4, 1, 0):
case IP_VERSION(4, 1, 1):
@@ -2410,6 +2415,7 @@ static int amdgpu_discovery_set_isp_ip_blocks(struct 
amdgpu_device *adev)
default:
break;
}
+#endif
  
  	return 0;

  }
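
On the header remark: a common kernel convention is for a header to be safe to include unconditionally by providing empty inline stubs when the feature is compiled out. Below is a minimal sketch of that convention, assuming a hypothetical `CONFIG_DEMO_ISP` symbol and hypothetical `demo_*` names; none of it is taken from amdgpu_isp.h. It only covers the `#include` and the call site; a struct member or a `case` label that depends on the feature would still need its own handling.

```c
/* --- what a hypothetical demo_isp.h could look like --- */
struct demo_device {
	int isp_blocks;		/* number of ISP IP blocks registered */
};

#ifdef CONFIG_DEMO_ISP
/* Real implementation, only built when the option is enabled. */
static inline int demo_isp_set_ip_blocks(struct demo_device *dev)
{
	dev->isp_blocks++;
	return 0;
}
#else
/* Stub: callers compile unchanged and the call becomes a no-op. */
static inline int demo_isp_set_ip_blocks(struct demo_device *dev)
{
	(void)dev;
	return 0;
}
#endif

/* --- caller: no #ifdef around the include or around the call --- */
int demo_discovery_set_ip_blocks(struct demo_device *dev)
{
	return demo_isp_set_ip_blocks(dev);
}
```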

Re: [PATCH] drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2

2024-05-12 Thread Limonciello, Mario





On 5/10/2024 4:24 AM, Jani Nikula wrote:

On Fri, 10 May 2024, "Lin, Wayne"  wrote:

[Public]


-----Original Message-----
From: Limonciello, Mario 
Sent: Friday, May 10, 2024 3:18 AM
To: Linux regressions mailing list ; Wentland, 
Harry
; Lin, Wayne 
Cc: ly...@redhat.com; imre.d...@intel.com; Leon Weiß ; sta...@vger.kernel.org; dri-de...@lists.freedesktop.org; amd-
g...@lists.freedesktop.org; intel-...@lists.freedesktop.org
Subject: Re: [PATCH] drm/mst: Fix NULL pointer dereference at
drm_dp_add_payload_part2

On 5/9/2024 07:43, Linux regression tracking (Thorsten Leemhuis) wrote:

On 18.04.24 21:43, Harry Wentland wrote:

On 2024-03-07 01:29, Wayne Lin wrote:

[Why]
Commit:
- commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
allocation/removement") accidently overwrite the commit
- commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in
drm_dp_add_payload_part2") which cause regression.

[How]
Recover the original NULL fix and remove the unnecessary input
parameter 'state' for drm_dp_add_payload_part2().

Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload
allocation/removement")
Reported-by: Leon Weiß 
Link:
https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.c
a...@ruhr-uni-bochum.de/
Cc: ly...@redhat.com
Cc: imre.d...@intel.com
Cc: sta...@vger.kernel.org
Cc: regressi...@lists.linux.dev
Signed-off-by: Wayne Lin 


I haven't been deep in MST code in a while but this all looks pretty
straightforward and good.

Reviewed-by: Harry Wentland 


Hmmm, that was three weeks ago, but it seems since then nothing
happened to fix the linked regression through this or some other
patch. Is there a reason? The build failure report from the CI maybe?


It touches files outside of amd but only has an ack from AMD.  I think we
/probably/ want an ack from i915 and nouveau to take it through.


Thanks, Mario!

Hi Thorsten,
Yeah, like what Mario said. Would also like to have ack from i915 and nouveau.


It usually works better if you Cc the folks you want an ack from! ;)

Acked-by: Jani Nikula 



Thanks! Can someone with commit permissions take this to drm-misc?

RE: [PATCH 2/3] drm/amd/amdgpu: Add ISP driver support

2024-05-09 Thread Limonciello, Mario

[AMD Official Use Only - General]

Thanks, I'll take a look.

In general make sure that you prefix new versions with "PATCH v2", "PATCH v3",
etc. and include a changelog below the cutlist or in the cover letter.

There is a --subject-prefix argument for git send-email you can use.

It's okay this time, but if you end up spinning again for a v3 do that.

> -----Original Message-----
> From: Nirujogi, Pratap 
> Sent: Thursday, May 9, 2024 12:19
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander ; Chan, Benjamin
> (Koon Pan) ; Li, King ; Du,
> Bin 
> Subject: RE: [PATCH 2/3] drm/amd/amdgpu: Add ISP driver support
>
> [AMD Official Use Only - General]
>
> Hi Mario,
>
> I addressed the comments, submitted the updated patchset, please review
> and let us know your comments.
>
> Thanks,
> Pratap
>
> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Thursday, May 9, 2024 10:15 AM
> To: Nirujogi, Pratap ; amd-
> g...@lists.freedesktop.org
> Cc: Deucher, Alexander ; Chan, Benjamin
> (Koon Pan) ; Li, King ; Du,
> Bin 
> Subject: Re: [PATCH 2/3] drm/amd/amdgpu: Add ISP driver support
>
> On 5/8/2024 09:50, Pratap Nirujogi wrote:
> > Add the isp driver in amdgpu to support ISP device on the APUs that
> > supports ISP IP block. ISP hw block is used for camera front-end, pre
> > and post processing operations.
> >
> > Signed-off-by: Pratap Nirujogi 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |   4 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c   | 298
> ++
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_isp.h   |  54 
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c   |   3 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c |   5 +
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.h |   1 +
> >   7 files changed, 368 insertions(+)
> >   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
> >   create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_isp.h
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile
> > b/drivers/gpu/drm/amd/amdgpu/Makefile
> > index de7b76327f5b..12ba76025cb7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/Makefile
> > +++ b/drivers/gpu/drm/amd/amdgpu/Makefile
> > @@ -324,4 +324,7 @@ amdgpu-y += $(AMD_DISPLAY_FILES)
> >
> >   endif
> >
> > +# add isp block
> > +amdgpu-y += amdgpu_isp.o
> > +
> >   obj-$(CONFIG_DRM_AMDGPU)+= amdgpu.o
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index eb60d28a3a13..6d7f9ef53269 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -112,6 +112,7 @@
> >   #include "amdgpu_xcp.h"
> >   #include "amdgpu_seq64.h"
> >   #include "amdgpu_reg_state.h"
> > +#include "amdgpu_isp.h"
> >
> >   #define MAX_GPU_INSTANCE64
> >
> > @@ -1045,6 +1046,9 @@ struct amdgpu_device {
> >   /* display related functionality */
> >   struct amdgpu_display_manager dm;
> >
> > + /* isp */
> > + struct amdgpu_isp   isp;
> > +
> >   /* mes */
> >   boolenable_mes;
> >   boolenable_mes_kiq;
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
> > new file mode 100644
> > index ..dcc01a339a43
> > --- /dev/null
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c
> > @@ -0,0 +1,298 @@
> > +/* SPDX-License-Identifier: MIT */
> > +/*
> > + * Copyright (C) 2024 Advanced Micro Devices, Inc. All rights reserved.
> > + * All Rights Reserved.
> > + *
> > + * Permission is hereby granted, free of charge, to any person
> > +obtaining a
> > + * copy of this software and associated documentation files (the
> > + * "Software"), to deal in the Software without restriction,
> > +including
> > + * without limitation the rights to use, copy, modify, merge,
> > +publish,
> > + * distribute, sub license, and/or sell copies of the Software, and
> > +to
> > + * permit persons to whom the Software is furnished to do so, subject
> > +to
> > + * the following conditions:
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY
> KIND,
> > +EXPRESS OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
> > +MERCHANTABILITY,
> > +

RE: [PATCH] drm/amd: Disable ASPM for VI w/ all Intel systems

2023-10-23 Thread Limonciello, Mario

[Public]

> -----Original Message-----
> From: Deucher, Alexander 
> Sent: Monday, October 23, 2023 09:22
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: Limonciello, Mario ;
> paolo.gent...@canonical.com
> Subject: RE: [PATCH] drm/amd: Disable ASPM for VI w/ all Intel systems
>
> [Public]
>
> > -----Original Message-----
> > From: amd-gfx  On Behalf Of
> Mario
> > Limonciello
> > Sent: Monday, October 23, 2023 9:45 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Limonciello, Mario ;
> > paolo.gent...@canonical.com
> > Subject: [PATCH] drm/amd: Disable ASPM for VI w/ all Intel systems
> >
> > Originally we were quirking ASPM disabled specifically for VI when used with
> > Alder Lake, but it appears to have problems with Rocket Lake as well.
> >
> > Like we've done in the case of dpm for newer platforms, disable ASPM for all
> > Intel systems.
> >
> > Cc: sta...@vger.kernel.org # 5.15+
> > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > Reported-and-tested-by: Paolo Gentili 
> > Closes: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2036742
> > Signed-off-by: Mario Limonciello 
>
> Reviewed-by: Alex Deucher 
>
> As a follow on, we probably want to apply this to all of the program_aspm()
> functions for each asic family.
>

Yeah; I had that thought too but wanted to have a narrow patch for fixes and 
stable first.
I will merge and send a follow up for that.

> Alex
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/vi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> > b/drivers/gpu/drm/amd/amdgpu/vi.c index 6a8494f98d3e..fe8ba9e9837b
> > 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> > @@ -1124,7 +1124,7 @@ static void vi_program_aspm(struct
> > amdgpu_device *adev)
> >   bool bL1SS = false;
> >   bool bClkReqSupport = true;
> >
> > - if (!amdgpu_device_should_use_aspm(adev) ||
> > !amdgpu_device_aspm_support_quirk())
> > + if (!amdgpu_device_should_use_aspm(adev) ||
> > +!amdgpu_device_pcie_dynamic_switching_supported())
> >   return;
> >
> >   if (adev->flags & AMD_IS_APU ||
> > --
> > 2.34.1
>

RE: [PATCH 13/16] drm/amd/display: Don't set dpms_off for seamless boot

2023-10-04 Thread Limonciello, Mario

[AMD Official Use Only - General]

> From: Daniel Miess 
>
> [Why]
> eDPs fail to light up with seamless boot enabled
>
> [How]
> When seamless boot is enabled don't configure dpms_off
> in disable_vbios_mode_if_required.
>
> Reviewed-by: Charlene Liu 
> Cc: Mario Limonciello 
> Cc: Alex Deucher 
> Cc: sta...@vger.kernel.org
> Acked-by: Tom Chung 
> Signed-off-by: Daniel Miess 
> ---
>  drivers/gpu/drm/amd/display/dc/core/dc.c | 3 +++
>  1 file changed, 3 insertions(+)

Feifei,

Can you recheck seamless boot on DCN3.2 after this lands into 
amd-staging-drm-next?
If it works, we may remove the check to only apply it to APUs.

>
> diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
> b/drivers/gpu/drm/amd/display/dc/core/dc.c
> index bd4834f921c1..88d41bf6d53a 100644
> --- a/drivers/gpu/drm/amd/display/dc/core/dc.c
> +++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
> @@ -1230,6 +1230,9 @@ static void disable_vbios_mode_if_required(
>   if (stream == NULL)
>   continue;
>
> + if (stream->apply_seamless_boot_optimization)
> + continue;
> +
>   // only looking for first odm pipe
>   if (pipe->prev_odm_pipe)
>   continue;
> --
> 2.25.1

Re: [V9 1/9] drivers core: Add support for Wifi band RF mitigations

2023-08-21 Thread Limonciello, Mario





On 8/19/2023 5:50 AM, Greg KH wrote:

On Fri, Aug 18, 2023 at 05:49:14PM -0500, Limonciello, Mario wrote:



On 8/18/2023 4:24 PM, Greg KH wrote:

On Fri, Aug 18, 2023 at 11:26:11AM +0800, Evan Quan wrote:

   drivers/base/Makefile |   1 +
   drivers/base/wbrf.c   | 280 ++


Why is a wifi-specific thing going into drivers/base/?

confused,

greg k-h


The original problem statement was at a high level 'there can be
interference between different devices operating at high frequencies'. The
original patches introduced some ACPI library code that enabled a mitigation
for this interference between mac80211 devices and amdgpu devices.

Andrew Lunn wanted to see something more generic, so the series has morphed
into base code for things to advertise frequencies in use and other things
to listen to frequencies in use and react.

The idea is supposed to be that if the platform knows that these mitigations
are needed then the producers send the frequencies in use, consumers react
to them.  The AMD implementation of getting this info from the platform
plugs into the base code (patch 2).

If users don't want this behavior they can turn it off on kernel command
line.

If the platform doesn't know mitigations are needed but user wants to turn
them on anyway they can turn it on kernel command line.


That's all fine, I don't object to that at all.  But bus/device-specific
stuff should NOT be in drivers/base/ if at all possible (yes, we do have
some exceptions with hypervisor.c and memory and cpu stuff) but for a
frequency thing like this, why can't it live with the other
wifi/frequency code in drivers/net/wireless/?

In other words, what's the benefit to having me be the maintainer of
this, someone who knows nothing about this subsystem, other than you
passing off that work to me?  :)

thanks,

greg k-h


The reason drivers/base was proposed was because although the initial 
implementation is for producers from mac80211, Andrew pointed out that 
many other things can technically be producers and cause interference 
if not properly shielded.

So making it part of base sets up the policy that anything that 
"can" produce certain problematic harmonics can participate.


Whether or not other devices /will/ use this is another question though.
You need deep platform knowledge and proper equipment to diagnose a 
problem and conclude it can be helped with this kind of mitigation.


So I wonder if the right answer is to put it in drivers/net/wireless 
initially and if we come up with a need later for non wifi producers we 
can discuss moving it at that time.

Re: [V9 1/9] drivers core: Add support for Wifi band RF mitigations

2023-08-18 Thread Limonciello, Mario





On 8/18/2023 4:24 PM, Greg KH wrote:

On Fri, Aug 18, 2023 at 11:26:11AM +0800, Evan Quan wrote:

  drivers/base/Makefile |   1 +
  drivers/base/wbrf.c   | 280 ++


Why is a wifi-specific thing going into drivers/base/?

confused,

greg k-h


The original problem statement was at a high level 'there can be 
interference between different devices operating at high frequencies'. 
The original patches introduced some ACPI library code that enabled a 
mitigation for this interference between mac80211 devices and amdgpu devices.


Andrew Lunn wanted to see something more generic, so the series has 
morphed into base code for things to advertise frequencies in use and 
other things to listen to frequencies in use and react.


The idea is supposed to be that if the platform knows that these 
mitigations are needed then the producers send the frequencies in use, 
consumers react to them.  The AMD implementation of getting this info 
from the platform plugs into the base code (patch 2).


If users don't want this behavior they can turn it off on kernel command 
line.


If the platform doesn't know mitigations are needed but user wants to 
turn them on anyway they can turn it on kernel command line.
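
The policy described above boils down to a platform hint plus a user override. A minimal sketch, assuming a hypothetical tri-state `enum wbrf_override` and a hypothetical helper `wbrf_mitigations_enabled()`; the real series may name and structure this differently:

```c
/* Hypothetical tri-state for a kernel command line override. */
enum wbrf_override {
	WBRF_OVERRIDE_NONE,	/* no user preference: follow the platform */
	WBRF_OVERRIDE_OFF,	/* user turned the mitigations off */
	WBRF_OVERRIDE_ON,	/* user forced the mitigations on */
};

/*
 * Return 1 if producers should report frequencies and consumers should
 * react to them. The user override always wins over the platform hint.
 */
int wbrf_mitigations_enabled(int platform_needs_wbrf, enum wbrf_override ov)
{
	switch (ov) {
	case WBRF_OVERRIDE_OFF:
		return 0;
	case WBRF_OVERRIDE_ON:
		return 1;
	default:
		return platform_needs_wbrf != 0;
	}
}
```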

Re: [PATCH] drm/amd/display: Don't show stack trace for missing eDP

2023-07-31 Thread Limonciello, Mario





On 7/31/2023 11:15 AM, Hamza Mahfooz wrote:

On 7/31/23 12:08, Mario Limonciello wrote:

Some systems are only connected by HDMI or DP, so warning related to
missing eDP is unnecessary.  Downgrade to debug instead.

Cc: Hamza Mahfooz 
Fixes: 6d9b6dceaa51 ("drm/amd/display: only warn once in 
dce110_edp_wait_for_hpd_ready()")

Reported-and-tested-by: mastan.katraga...@amd.com
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git 
a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c

index 20d4d08a6a2f3..3ce3d3367b288 100644
--- a/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce110/dce110_hw_sequencer.c
@@ -777,7 +777,8 @@ void dce110_edp_wait_for_hpd_ready(
  dal_gpio_destroy_irq();
  /* ensure that the panel is detected */
-    ASSERT(edp_hpd_high);
+    if (!edp_hpd_high)
+    BREAK_TO_DEBUGGER();


Can you print a message using DC_LOG_DC() here instead?


Sure, will respin it.




  }
  void dce110_edp_power_control(

Re: [PATCH -next] drm/amdgpu: Remove a lot of unnecessary ternary operators

2023-07-31 Thread Limonciello, Mario





On 7/31/2023 8:26 AM, Ruan Jinjie wrote:

Ther are many ternary operators, the true or false judgement
of which is unnecessary in C language semantics.

s/Ther/There/

Unnecessary; sure.  But don't they improve the readability quite a bit?
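
For reference, the two forms under discussion are semantically identical in C: a comparison already yields 0 or 1, and converting that to `bool` is well defined. A small standalone sketch with hypothetical names (`demo_cg_state` and both helpers are made up for illustration), so any difference is purely one of readability:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a clockgating state enum. */
enum demo_cg_state {
	DEMO_CG_STATE_UNGATE = 0,
	DEMO_CG_STATE_GATE,
};

/* The style the patch removes. */
bool enable_with_ternary(enum demo_cg_state state)
{
	return (state == DEMO_CG_STATE_GATE) ? true : false;
}

/* The style the patch introduces. */
bool enable_without_ternary(enum demo_cg_state state)
{
	return state == DEMO_CG_STATE_GATE;
}
```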



Signed-off-by: Ruan Jinjie 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c|  2 +-
  .../drm/amd/display/dc/dce/dce_link_encoder.c  |  4 +---
  .../drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c|  6 +++---
  .../amd/pm/powerplay/hwmgr/smu7_powertune.c|  2 +-
  .../drm/amd/pm/powerplay/hwmgr/vega10_hwmgr.c  | 18 +++---
  .../amd/pm/powerplay/smumgr/polaris10_smumgr.c |  2 +-
  .../drm/amd/pm/powerplay/smumgr/vegam_smumgr.c |  7 +++
  13 files changed, 23 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
index b582b83c4984..38ccec913f00 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
@@ -460,7 +460,7 @@ bool amdgpu_get_bios(struct amdgpu_device *adev)
return false;
  
  success:

-   adev->is_atom_fw = (adev->asic_type >= CHIP_VEGA10) ? true : false;
+   adev->is_atom_fw = adev->asic_type >= CHIP_VEGA10;
return true;
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c

index 79791379fc2b..df4440c21bbf 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c
@@ -479,7 +479,7 @@ static int jpeg_v3_0_set_clockgating_state(void *handle,
  enum amd_clockgating_state state)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   bool enable = (state == AMD_CG_STATE_GATE) ? true : false;
+   bool enable = state == AMD_CG_STATE_GATE;
  
  	if (enable) {

if (!jpeg_v3_0_is_idle(handle))
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
index a707d407fbd0..3eb3dcd56b57 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c
@@ -626,7 +626,7 @@ static int jpeg_v4_0_set_clockgating_state(void *handle,
  enum amd_clockgating_state state)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   bool enable = (state == AMD_CG_STATE_GATE) ? true : false;
+   bool enable = state == AMD_CG_STATE_GATE;
  
  	if (enable) {

if (!jpeg_v4_0_is_idle(handle))
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
index ce2b22f7e4e4..153731d6ce8b 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
@@ -785,7 +785,7 @@ static int jpeg_v4_0_3_set_clockgating_state(void *handle,
  enum amd_clockgating_state state)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   bool enable = (state == AMD_CG_STATE_GATE) ? true : false;
+   bool enable = state == AMD_CG_STATE_GATE;
int i;
  
  	for (i = 0; i < adev->jpeg.num_jpeg_inst; ++i) {

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
index b76ba21b5a89..9b662b105cc1 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c
@@ -2095,7 +2095,7 @@ static int vcn_v3_0_set_clockgating_state(void *handle,
  enum amd_clockgating_state state)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   bool enable = (state == AMD_CG_STATE_GATE) ? true : false;
+   bool enable = state == AMD_CG_STATE_GATE;
int i;
  
  	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
index 6089c7deba8a..7c486745bece 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c
@@ -1918,7 +1918,7 @@ static int vcn_v4_0_wait_for_idle(void *handle)
  static int vcn_v4_0_set_clockgating_state(void *handle, enum 
amd_clockgating_state state)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
-   bool enable = (state == AMD_CG_STATE_GATE) ? true : false;
+   bool enable = state == AMD_CG_STATE_GATE;
int i;
  
  	for (i = 0; i < adev->vcn.num_vcn_inst; ++i) {

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
index 550ac040b4be..e62472e6e7b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c
+++

Re: [PATCH V7 4/9] wifi: mac80211: Add support for ACPI WBRF

2023-07-24 Thread Limonciello, Mario


On 7/24/2023 04:22, Andrew Lunn wrote:

@@ -1395,6 +1395,8 @@ int ieee80211_register_hw(struct ieee80211_hw *hw)
debugfs_hw_add(local);
rate_control_add_debugfs(local);
  
+	ieee80211_check_wbrf_support(local);

+
rtnl_lock();
wiphy_lock(hw->wiphy);
  



+void ieee80211_check_wbrf_support(struct ieee80211_local *local)
+{
+   struct wiphy *wiphy = local->hw.wiphy;
+   struct device *dev;
+
+   if (!wiphy)
+   return;
+
+   dev = wiphy->dev.parent;
+   if (!dev)
+   return;
+
+   local->wbrf_supported = wbrf_supported_producer(dev);
+   dev_dbg(dev, "WBRF is %s supported\n",
+   local->wbrf_supported ? "" : "not");
+}


This seems wrong. wbrf_supported_producer() is about "Should this
device report the frequencies it is using?" The answer to that depends
on a combination of: Are there consumers registered with the core, and
is the policy set so WBRF should take actions.

The problem here is, you have no idea of the probe order. It could be
this device probes before others, so wbrf_supported_producer() reports
false, but a few seconds later would report true, once other devices
have probed.

It should be an inexpensive call into the core, so can be made every
time the channel changes. All the core needs to do is check if the
list of consumers is empty, and if not, check a Boolean policy value.

  Andrew


No, it's not a combination of whether consumers are registered with the 
core.  If a consumer probes later it needs to know the current in use 
frequencies too.


The reason is because of this sequence of events:
1) Producer probes.
2) Producer selects a frequency.
3) Consumer probes.
4) Producer stays at same frequency.

If the producer doesn't notify the frequency because a consumer isn't 
yet loaded then the consumer won't be able to get the current frequency.
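
The sequence above can be sketched as a tiny model. All names here (`struct demo_wbrf_core`, `demo_producer_add_band()`, `demo_consumer_register()`) are hypothetical and not the API from the series; the point is only that the core records what producers report even while no consumer exists, so a consumer that probes later can read the current state.

```c
#include <stddef.h>

#define DEMO_MAX_BANDS 4

/* Hypothetical frequency band, in MHz to keep the numbers small. */
struct demo_band {
	unsigned int start_mhz;
	unsigned int end_mhz;
};

/* Hypothetical core state shared by producers and consumers. */
struct demo_wbrf_core {
	struct demo_band in_use[DEMO_MAX_BANDS];
	size_t n_in_use;
	size_t n_consumers;
};

/* Producer side: always record the band, whether or not anyone listens. */
int demo_producer_add_band(struct demo_wbrf_core *core, struct demo_band band)
{
	if (core->n_in_use >= DEMO_MAX_BANDS)
		return -1;
	core->in_use[core->n_in_use++] = band;
	return 0;
}

/*
 * Consumer side: on registration, copy out the bands already in use.
 * Returns the number of bands written to @out.
 */
size_t demo_consumer_register(struct demo_wbrf_core *core,
			      struct demo_band *out, size_t out_len)
{
	size_t i, n = core->n_in_use < out_len ? core->n_in_use : out_len;

	core->n_consumers++;
	for (i = 0; i < n; i++)
		out[i] = core->in_use[i];
	return n;
}
```

If the producer skipped step 2 because `n_consumers` was zero, the consumer registering in step 3 would get an empty list and nothing in step 4 would ever correct it.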

RE: [PATCH] drm/amd: Fix an error handling mistake in psp_sw_init()

2023-07-18 Thread Limonciello, Mario

[Public]

> -----Original Message-----
> From: Limonciello, Mario 
> Sent: Thursday, July 13, 2023 00:15
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario 
> Subject: [PATCH] drm/amd: Fix an error handling mistake in psp_sw_init()
>
> If the second call to amdgpu_bo_create_kernel() fails, the memory
> allocated from the first call should be cleared.  If the third call
> fails, the memory from the second call should be cleared.
>
> Fixes: b95b5391684b ("drm/amdgpu/psp: move PSP memory alloc from
> hw_init to sw_init")
> Signed-off-by: Mario Limonciello 

Ping on this one.

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 1b4d5f04d968..6ffc1a640d2d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -491,11 +491,11 @@ static int psp_sw_init(void *handle)
>   return 0;
>
>  failed2:
> - amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> -   &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> -failed1:
>   amdgpu_bo_free_kernel(&psp->fence_buf_bo,
>     &psp->fence_buf_mc_addr, &psp->fence_buf);
> +failed1:
> + amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> +   &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
>   return ret;
>  }
>
> --
> 2.34.1
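
The rule the patch restores is the usual goto unwinding order: each label frees what was allocated before the step that failed, in reverse order of allocation. A self-contained userspace sketch follows; `demo_sw_init()`, the `demo_ctx` fields and the `demo_fail_at` failure hook are hypothetical and only model three sequential allocations.

```c
#include <stdlib.h>

/* Hypothetical test hook: which allocation (1..3) should fail; 0 = none. */
int demo_fail_at;
/* Number of buffers currently allocated, to make leaks observable. */
int demo_live_buffers;

static void *demo_alloc(int step)
{
	void *p;

	if (step == demo_fail_at)
		return NULL;
	p = malloc(16);
	if (p)
		demo_live_buffers++;
	return p;
}

static void demo_free(void **p)
{
	if (*p) {
		free(*p);
		*p = NULL;
		demo_live_buffers--;
	}
}

struct demo_ctx {
	void *fw_pri;	/* first allocation */
	void *fence;	/* second allocation */
	void *cmd;	/* third allocation */
};

int demo_sw_init(struct demo_ctx *ctx)
{
	ctx->fw_pri = demo_alloc(1);
	if (!ctx->fw_pri)
		return -1;

	ctx->fence = demo_alloc(2);
	if (!ctx->fence)
		goto failed1;	/* only the first buffer exists */

	ctx->cmd = demo_alloc(3);
	if (!ctx->cmd)
		goto failed2;	/* first and second buffers exist */

	return 0;

failed2:
	demo_free(&ctx->fence);
failed1:
	demo_free(&ctx->fw_pri);
	return -1;
}
```

With the labels swapped, a failure of the second allocation would try to free the second buffer and leak the first.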

Re: [1/2] drm/amdgpu: fix slab-out-of-bounds issue in amdgpu_vm_pt_create

2023-07-16 Thread Limonciello, Mario




On 7/14/2023 6:05 AM, Guchun Chen wrote:

Recent code set xcp_id stored from file private data when opening
device to amdgpu bo for accounting memory usage etc, but not all
VMs are attached to this fpriv structure like the vm cases in
amdgpu_mes_self_test, otherwise, KASAN will complain below out
of bound access. And more importantly, VM code should not touch
fpriv structure, so drop fpriv code handling from amdgpu_vm_pt.

[   77.292314] BUG: KASAN: slab-out-of-bounds in 
amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.293845] Read of size 4 at addr 888102c48a48 by task modprobe/1069
[   77.294146] Call Trace:
[   77.294178]  
[   77.294208]  dump_stack_lvl+0x49/0x63
[   77.294260]  print_report+0x16f/0x4a6
[   77.294307]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.295979]  ? kasan_complete_mode_report_info+0x3c/0x200
[   77.296057]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.297556]  kasan_report+0xb4/0x130
[   77.297609]  ? amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.299202]  __asan_load4+0x6f/0x90
[   77.299272]  amdgpu_vm_pt_create+0x17e/0x4b0 [amdgpu]
[   77.300796]  ? amdgpu_init+0x6e/0x1000 [amdgpu]
[   77.30]  ? amdgpu_vm_pt_clear+0x750/0x750 [amdgpu]
[   77.303721]  ? preempt_count_sub+0x18/0xc0
[   77.303786]  amdgpu_vm_init+0x39e/0x870 [amdgpu]
[   77.305186]  ? amdgpu_vm_wait_idle+0x90/0x90 [amdgpu]
[   77.306683]  ? kasan_set_track+0x25/0x30
[   77.306737]  ? kasan_save_alloc_info+0x1b/0x30
[   77.306795]  ? __kasan_kmalloc+0x87/0xa0
[   77.306852]  amdgpu_mes_self_test+0x169/0x620 [amdgpu]

Fixes: ffc6deb773f7 ("drm/amdkfd: Store xcp partition id to amdgpu bo")
Signed-off-by: Guchun Chen 
Reviewed-by: Christian König 


This bug was also reported in Gitlab.  Here's some more tags.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2686
Tested-by: Mikhail Gavrilov
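
For readers following along, here is one plausible way this class of out-of-bounds read arises, and why passing the id as a parameter avoids it. The sketch is an assumption for illustration, not a copy of the driver code: `demo_fpriv`, `demo_vm` and both helpers are hypothetical. Deriving the outer structure from an embedded member is only valid when the member really is embedded; a standalone VM has no outer structure to find.

```c
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of(), for illustration. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct demo_vm {
	int xcp_id;
};

struct demo_fpriv {
	int xcp_id;
	struct demo_vm vm;	/* VM embedded in per-file private data */
};

/*
 * Fragile: only valid if @vm is the member of a struct demo_fpriv.
 * For a standalone VM this would read outside the allocation, so it
 * must never be called on one. Shown for contrast only.
 */
int demo_xcp_id_from_vm(struct demo_vm *vm)
{
	return demo_container_of(vm, struct demo_fpriv, vm)->xcp_id;
}

/* Robust: the caller that knows the id passes it in explicitly. */
void demo_vm_init(struct demo_vm *vm, int xcp_id)
{
	vm->xcp_id = xcp_id;
}
```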


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c   |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  5 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h|  5 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 11 ++-
  5 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 85a0d5f419c8..9a5aa4318cad 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1232,7 +1232,7 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct 
drm_file *file_priv)
pasid = 0;
}
  
-	r = amdgpu_vm_init(adev, &fpriv->vm);
+   r = amdgpu_vm_init(adev, &fpriv->vm, fpriv->xcp_id);
if (r)
goto error_pasid;
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c

index e9091ebfe230..cac1d1b227f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1382,7 +1382,7 @@ int amdgpu_mes_self_test(struct amdgpu_device *adev)
goto error_pasid;
}
  
-	r = amdgpu_vm_init(adev, vm);

+   r = amdgpu_vm_init(adev, vm, 0);
if (r) {
DRM_ERROR("failed to initialize vm\n");
goto error_pasid;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 32adc31c093d..74380b21e7a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2121,13 +2121,14 @@ long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long 
timeout)
   *
   * @adev: amdgpu_device pointer
   * @vm: requested vm
+ * @xcp_id: GPU partition selection id
   *
   * Init @vm fields.
   *
   * Returns:
   * 0 for success, error for failure.
   */
-int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
+int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t 
xcp_id)
  {
struct amdgpu_bo *root_bo;
struct amdgpu_bo_vm *root;
@@ -2177,7 +2178,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct 
amdgpu_vm *vm)
vm->evicting = false;
  
  	r = amdgpu_vm_pt_create(adev, vm, adev->vm_manager.root_level,

-   false, &root);
+   false, &root, xcp_id);
if (r)
goto error_free_delayed;
root_bo = &root->bo;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 88ee4507f6b6..bca258c38919 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -398,7 +398,7 @@ int amdgpu_vm_set_pasid(struct amdgpu_device *adev, struct 
amdgpu_vm *vm,
u32 pasid);
  
  long amdgpu_vm_wait_idle(struct amdgpu_vm *vm, long timeout);

-int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm);
+int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm, int32_t 
xcp_id);
  int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm);
  void

RE: [PATCH 4/4] drm/amd: Drop amdgpu_device_aspm_support_quirk()

2023-07-11 Thread Limonciello, Mario

[AMD Official Use Only - General]

OK, will pick up 1/2/3 and continue to think about 4.
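
For anyone following along, here is a rough user-space sketch of the gating that patch 4 proposes, so the concern quoted below is easier to see. The struct and function names are made up for illustration; only the two-step check mirrors the quoted diff.

```c
#include <stdbool.h>

/* Hypothetical stand-in for driver/host state; not a kernel type. */
struct mock_host {
	bool aspm_enabled;       /* models amdgpu_device_should_use_aspm() */
	bool dynamic_switching;  /* models amdgpu_device_pcie_dynamic_switching_supported() */
};

/* Returns true when ASPM programming would proceed with patch 4 applied. */
static bool mock_would_program_aspm(const struct mock_host *h)
{
	if (!h->aspm_enabled)
		return false;

	/* This second check ties ASPM to dynamic speed switching support. */
	if (!h->dynamic_switching)
		return false;

	return true;
}
```

In this model a host with ASPM enabled but no dynamic switching never gets ASPM programmed, which is the coupling being questioned.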

> -Original Message-
> From: Quan, Evan 
> Sent: Sunday, July 9, 2023 20:54
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Subject: RE: [PATCH 4/4] drm/amd: Drop
> amdgpu_device_aspm_support_quirk()
>
> [AMD Official Use Only - General]
>
> Patch1, 2, 3 are reviewed-by: Evan Quan 
>
> For patch 4, the approach does not seem quite right (at least the naming).
> Although ASPM is the prerequisite for the PCIe/LCLK DPM features, the
> changes involved here are really about disabling the ASPM feature.
> I mean that even if PCIe dynamic lane/speed switching is not supported, the
> ASPM feature can still be enabled.
> So using "amdgpu_device_pcie_dynamic_switching_supported" to determine
> whether the ASPM feature can be enabled does not seem proper.
>
> Evan
> > -Original Message-
> > From: Limonciello, Mario 
> > Sent: Saturday, July 8, 2023 10:26 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Quan, Evan ; Limonciello, Mario
> > 
> > Subject: [PATCH 4/4] drm/amd: Drop
> amdgpu_device_aspm_support_quirk()
> >
> > NV and VI currently set up a quirk to not enable ASPM on Alder Lake
> > systems, but the issue appears to be tied to hosts without support
> > for dynamic speed switching. Migrate both of these over to use
> > amdgpu_device_pcie_dynamic_switching_supported() instead and drop
> > amdgpu_device_aspm_support_quirk().
> >
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 -
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 11 ---
> >  drivers/gpu/drm/amd/amdgpu/nv.c|  5 -
> >  drivers/gpu/drm/amd/amdgpu/vi.c|  5 -
> >  4 files changed, 8 insertions(+), 14 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index 813713f42d5e..6ecf42c4c970 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -1315,7 +1315,6 @@ int amdgpu_device_pci_reset(struct
> > amdgpu_device *adev);
> >  bool amdgpu_device_need_post(struct amdgpu_device *adev);
> >  bool amdgpu_device_pcie_dynamic_switching_supported(void);
> >  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
> > -bool amdgpu_device_aspm_support_quirk(void);
> >
> >  void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64
> > num_bytes,
> > u64 num_vis_bytes);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7314529553f6..a9e757f899f2 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -1505,17 +1505,6 @@ bool amdgpu_device_should_use_aspm(struct
> > amdgpu_device *adev)
> >   return pcie_aspm_enabled(adev->pdev);
> >  }
> >
> > -bool amdgpu_device_aspm_support_quirk(void)
> > -{
> > -#if IS_ENABLED(CONFIG_X86)
> > - struct cpuinfo_x86 *c = &cpu_data(0);
> > -
> > - return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> > -#else
> > - return true;
> > -#endif
> > -}
> > -
> >  /* if we get transitioned to only one device, take VGA back */
> >  /**
> >   * amdgpu_device_vga_set_decode - enable/disable vga decode
> > diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
> > b/drivers/gpu/drm/amd/amdgpu/nv.c
> > index 51523b27a186..71bc5b2f36cf 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> > @@ -527,7 +527,10 @@ static int nv_set_vce_clocks(struct amdgpu_device
> > *adev, u32 evclk, u32 ecclk)
> >
> >  static void nv_program_aspm(struct amdgpu_device *adev)
> >  {
> > - if (!amdgpu_device_should_use_aspm(adev)
> > || !amdgpu_device_aspm_support_quirk())
> > + if (!amdgpu_device_should_use_aspm(adev))
> > + return;
> > +
> > + if (!amdgpu_device_pcie_dynamic_switching_supported())
> >   return;
> >
> >   if (!(adev->flags & AMD_IS_APU) &&
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> > b/drivers/gpu/drm/amd/amdgpu/vi.c
> > index 6a8494f98d3e..f44c78e69b7f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> > @@ -1124,7 +1124,10 @@ static void vi_program_aspm(struct
> > amdgpu_device *adev)
> >   bool bL1SS = false;
> >   bool bClkReqSupport = true;
> >
> > - if (!amdgpu_device_should_use_aspm(adev)
> > || !amdgpu_device_aspm_support_quirk())
> > + if (!amdgpu_device_should_use_aspm(adev))
> > + return;
> > +
> > + if (!amdgpu_device_pcie_dynamic_switching_supported())
> >   return;
> >
> >   if (adev->flags & AMD_IS_APU ||
> > --
> > 2.34.1
>

Re: [PATCH] drm/client: Send hotplug event after registering a client

2023-07-10 Thread Limonciello, Mario


+regressions
On 7/10/2023 04:58, Thomas Zimmermann wrote:

Hi

Am 10.07.23 um 11:52 schrieb Javier Martinez Canillas:

Thomas Zimmermann  writes:

Hello Thomas,


Generate a hotplug event after registering a client to allow the
client to configure its display. Remove the hotplug calls from the
existing clients for fbdev emulation. This change fixes a concurrency
bug between registering a client and receiving events from the DRM
core. The bug is present in the fbdev emulation of all drivers.

The fbdev emulation currently generates a hotplug event before
registering the client to the device. For each new output, the DRM
core sends an additional hotplug event to each registered client.

If the DRM core detects first output between sending the artificial
hotplug and registering the device, the output's hotplug event gets
lost. If this is the first output, the fbdev console display remains
dark. This has been observed with amdgpu and fbdev-generic.

Fix this by adding hotplug generation directly to the client's
register helper drm_client_register(). Registering the client and
receiving events are serialized by struct drm_device.clientlist_mutex.
So an output is either configured by the initial hotplug event, or
the client has already been registered.
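
The ordering problem described above can be sketched with a small single-threaded model. This is only an illustration under simplifying assumptions: every name here is hypothetical, there is no real locking, and the "window" is simulated by calling the detection step at a chosen point. In the kernel the new ordering relies on struct drm_device.clientlist_mutex, as stated above.

```c
#include <stdbool.h>

/* Hypothetical model of a device with a single client. */
struct model_dev {
	int  outputs;            /* outputs the core has detected so far */
	bool client_registered;  /* whether the client is on the client list */
	int  client_outputs;     /* outputs the client has configured */
};

/* The client reacts to a hotplug event by configuring all known outputs. */
static void model_client_hotplug(struct model_dev *d)
{
	d->client_outputs = d->outputs;
}

/* The core detects a new output and notifies registered clients only. */
static void model_core_detect_output(struct model_dev *d)
{
	d->outputs++;
	if (d->client_registered)
		model_client_hotplug(d);
}

/* Old ordering: artificial hotplug first, registration afterwards. */
static int model_old_order(bool output_arrives_in_window)
{
	struct model_dev d = {0};

	model_client_hotplug(&d);
	if (output_arrives_in_window)
		model_core_detect_output(&d); /* lost: no client registered yet */
	d.client_registered = true;
	return d.client_outputs;
}

/* New ordering: registration and the initial hotplug happen as one step. */
static int model_new_order(bool output_arrives_before)
{
	struct model_dev d = {0};

	if (output_arrives_before)
		model_core_detect_output(&d);
	d.client_registered = true;
	model_client_hotplug(&d);
	if (!output_arrives_before)
		model_core_detect_output(&d);
	return d.client_outputs;
}
```

With the old ordering, an output that shows up inside the window leaves the client with nothing configured; with the new ordering the output is picked up whether it arrives before or after registration.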

The bug was originally added in commit 6e3f17ee73f7 ("drm/fb-helper:
generic: Call drm_client_add() after setup is done"), in which adding
a client and receiving a hotplug event switched order. It was hidden,
as most hardware and drivers have at least one static output configured.
Other drivers didn't use the internal DRM client or still had struct
drm_mode_config_funcs.output_poll_changed set. That callback handled
hotplug events as well. After not setting the callback in amdgpu in
commit 0e3172bac3f4 ("drm/amdgpu: Don't set struct
drm_driver.output_poll_changed"), amdgpu did not show a framebuffer
console if output events got lost. The bug got copy-pasted from
fbdev-generic into the other fbdev emulation.

Reported-by: Moritz Duge 
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2649


Aren't you missing a Fixes: for 0e3172bac3f4 too? Since that's the commit
that unmasked the bug for amdgpu, IMO that is the most important to list.


Well, OK.



Fixes: 6e3f17ee73f7 ("drm/fb-helper: generic: Call drm_client_add() after setup is done")
Fixes: 8ab59da26bc0 ("drm/fb-helper: Move generic fbdev emulation into separate source file")
Fixes: b79fe9abd58b ("drm/fbdev-dma: Implement fbdev emulation for GEM DMA helpers")
Fixes: 63c381552f69 ("drm/armada: Implement fbdev emulation as in-kernel client")
Fixes: 49953b70e7d3 ("drm/exynos: Implement fbdev emulation as in-kernel client")
Fixes: 8f1aaccb04b7 ("drm/gma500: Implement client-based fbdev emulation")
Fixes: 940b869c2f2f ("drm/msm: Implement fbdev emulation as in-kernel client")
Fixes: 9e69bcd88e45 ("drm/omapdrm: Implement fbdev emulation as in-kernel client")
Fixes: e317a69fe891 ("drm/radeon: Implement client-based fbdev emulation")
Fixes: 71ec16f45ef8 ("drm/tegra: Implement fbdev emulation as in-kernel client")

Signed-off-by: Thomas Zimmermann 
Tested-by: Moritz Duge 
Tested-by: Torsten Krah 
Tested-by: Paul Schyska 
Cc: Daniel Vetter 
Cc: David Airlie 
Cc: Noralf Trønnes 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: Javier Martinez Canillas 
Cc: Russell King 
Cc: Inki Dae 
Cc: Seung-Woo Kim 
Cc: Kyungmin Park 
Cc: Krzysztof Kozlowski 
Cc: Patrik Jakobsson 
Cc: Rob Clark 
Cc: Abhinav Kumar 
Cc: Dmitry Baryshkov 
Cc: Tomi Valkeinen 
Cc: Alex Deucher 
Cc: "Christian König" 
Cc: "Pan, Xinhui" 
Cc: Thierry Reding 
Cc: Mikko Perttunen 
Cc: dri-de...@lists.freedesktop.org
Cc: linux-ker...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-samsung-...@vger.kernel.org
Cc: linux-arm-...@vger.kernel.org
Cc: freedr...@lists.freedesktop.org
Cc: amd-gfx@lists.freedesktop.org
Cc: linux-te...@vger.kernel.org
Cc: dri-de...@lists.freedesktop.org
Cc:  # v5.2+


While it's true that the bug was introduced by commit 6e3f17ee73f7 and that
landed in v5.2, I wonder if this patch could even be applied to such older
Linux versions. Probably in practice it would be at most backported to
v6.2, which is the release that exposed the bug for the amdgpu driver.


No idea. The fix looks simple enough, but a lot has changed in the 
surrounding code.




Actually it needs to go to at least 6.1.y.

Moritz found it in 6.1.35 (not present in 6.1.34).



Best regards
Thomas



Your explanation makes sense to me and the patch looks good.

Reviewed-by: Javier Martinez Canillas

RE: [RFC 1/2] drm/amd: Extend Intel ASPM quirk to all dGPUs

2023-07-06 Thread Limonciello, Mario

[AMD Official Use Only - General]

Thanks.  I'm going to leave this series as the backup option, have another idea
that I'll have Koba try first.
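
On the default clause nitpick quoted below, one possible shape for the quirk helper with an explicit default is sketched here. It is a standalone illustration, not the patch itself: the function takes family and model as plain arguments instead of reading cpu_data, and MOCK_MODEL_ALDERLAKE is a placeholder standing in for INTEL_FAM6_ALDERLAKE.

```c
#include <stdbool.h>

/* Placeholder standing in for INTEL_FAM6_ALDERLAKE. */
#define MOCK_MODEL_ALDERLAKE 0x97

/* Returns false when ASPM should be avoided on the given host CPU. */
static bool mock_aspm_support_quirk(unsigned int family, unsigned int model)
{
	if (family != 6)
		return true;

	switch (model) {
	case MOCK_MODEL_ALDERLAKE:
		return false;
	default:
		/* Hosts without a reported problem keep ASPM available. */
		return true;
	}
}
```

Whether the missing default actually triggers a warning depends on the compiler flags in use, so treat this as one way to make the fall-through case explicit.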

> -Original Message-
> From: Quan, Evan 
> Sent: Wednesday, July 5, 2023 20:04
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: koba...@canonical.com; Limonciello, Mario
> 
> Subject: RE: [RFC 1/2] drm/amd: Extend Intel ASPM quirk to all dGPUs
>
> [AMD Official Use Only - General]
>
> One small nitpick:
> It seems there is missing a default clause for the switch statement.
> Will that hit the compile warning about "a switch statement must have a
> default clause"?
> With that checked, the series is reviewed-by: Evan Quan
> 
>
> Evan
> > -Original Message-
> > From: amd-gfx  On Behalf Of
> Mario
> > Limonciello
> > Sent: Thursday, July 6, 2023 2:07 AM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: koba...@canonical.com; Limonciello, Mario
> > 
> > Subject: [RFC 1/2] drm/amd: Extend Intel ASPM quirk to all dGPUs
> >
> > More failures are reported across additional products and so it seems
> > unless we have a handle on the fundamental ASPM incompatibilities with
> > Intel host and AMD dGPU, we should not allow them on problematic hosts.
> >
> > Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 -
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 39 +++-
> --
> > 
> >  drivers/gpu/drm/amd/amdgpu/nv.c|  2 +-
> >  drivers/gpu/drm/amd/amdgpu/vi.c|  2 +-
> >  4 files changed, 29 insertions(+), 15 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > index dc4dc1446a19..294a549e7499 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> > @@ -1314,7 +1314,6 @@ void amdgpu_device_pci_config_reset(struct
> > amdgpu_device *adev);
> >  int amdgpu_device_pci_reset(struct amdgpu_device *adev);
> >  bool amdgpu_device_need_post(struct amdgpu_device *adev);
> >  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
> > -bool amdgpu_device_aspm_support_quirk(void);
> >
> >  void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64
> > num_bytes,
> > u64 num_vis_bytes);
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > index 7f069e1731fe..ef22a0a6065e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> > @@ -1458,6 +1458,30 @@ bool amdgpu_device_need_post(struct
> > amdgpu_device *adev)
> >   return true;
> >  }
> >
> > +static bool amdgpu_device_aspm_support_quirk(void)
> > +{
> > +#if IS_ENABLED(CONFIG_X86)
> > + struct cpuinfo_x86 *c = &cpu_data(0);
> > +
> > + if (c->x86 != 6)
> > + return true;
> > +
> > + switch (c->x86_model) {
> > + /* Problems reported for Alder Lake
> > +  * Volcanic Islands:
> > +  *   https://gitlab.freedesktop.org/drm/amd/-/issues/1885
> > +  *   e02fe3bc7aba2 ("drm/amdgpu: vi: disable ASPM on Intel Alder
> > Lake based systems")
> > +  * Navi 1x cards:
> > +  *   https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> > +  *   c08c079692da0 ("drm/amdgpu/nv: Apply ASPM quirk on Intel
> > ADL + AMD Navi")
> > +  */
> > + case INTEL_FAM6_ALDERLAKE:
> > + return false;
> > + }
> > +#endif
> > + return true;
> > +}
> > +
> >  /**
> >   * amdgpu_device_should_use_aspm - check if the device should program
> > ASPM
> >   *
> > @@ -1480,18 +1504,9 @@ bool amdgpu_device_should_use_aspm(struct
> > amdgpu_device *adev)
> >   default:
> >   return false;
> >   }
> > - return pcie_aspm_enabled(adev->pdev);
> > -}
> > -
> > -bool amdgpu_device_aspm_support_quirk(void)
> > -{
> > -#if IS_ENABLED(CONFIG_X86)
> > - struct cpuinfo_x86 *c = &cpu_data(0);
> > -
> > - return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> > -#else
> > - return true;
> > -#endif
> > + if (!pcie_aspm_enabled(adev->pdev))
> > + return false;
> > + return amdgpu_device_aspm_support_quirk();

Re: [PATCH V5 1/9] drivers core: Add support for Wifi band RF mitigations

2023-06-30 Thread Limonciello, Mario


On 6/30/2023 05:32, Evan Quan wrote:

Due to electrical and mechanical constraints in certain platform designs,
relatively high-powered harmonics of the (G-)DDR memory clocks may
interfere with the local radio module frequency bands used by Wifi 6/6e/7.

To mitigate this, AMD has introduced a mechanism that devices can use to
notify active use of particular frequencies so that other devices can make
relative internal adjustments as necessary to avoid this resonance.

For a device to support this, the expected flow for device drivers or
subsystems is:

Drivers/subsystems contributing frequencies:

1) During probe, check `wbrf_supported_producer` to see if WBRF is supported
for the device.
2) If adding frequencies, then call `wbrf_add_exclusion` with the
start and end ranges of the frequencies.
3) If removing frequencies, then call `wbrf_remove_exclusion` with
start and end ranges of the frequencies.

Drivers/subsystems responding to frequencies:

1) During probe, check `wbrf_supported_consumer` to see if WBRF is supported
for the device.
2) Call the `wbrf_retrieve_exclusions` to retrieve the current
exclusions on receiving an ACPI notification for a new frequency
change.
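
To make the producer/consumer flow above more concrete, here is a rough user-space mock. Everything in it is an assumption for illustration: the `mock_` names, the signatures, the pool size, and the removal behaviour are not taken from the patch. It only models the idea that identical ranges share one reference-counted slot, and it tracks free slots by reference count rather than by zeroed start/end.

```c
#include <errno.h>
#include <stddef.h>

#define MOCK_MAX_BANDS 4 /* hypothetical pool size */

struct mock_band {
	unsigned long start;
	unsigned long end;
};

static struct mock_band mock_pool[MOCK_MAX_BANDS];
static unsigned int mock_ref[MOCK_MAX_BANDS];

/* Producer side: add one range; identical ranges share a slot. */
static int mock_add_exclusion(unsigned long start, unsigned long end)
{
	size_t j;

	for (j = 0; j < MOCK_MAX_BANDS; j++) {
		if (mock_ref[j] && mock_pool[j].start == start &&
		    mock_pool[j].end == end) {
			mock_ref[j]++;
			return 0;
		}
	}

	for (j = 0; j < MOCK_MAX_BANDS; j++) {
		if (!mock_ref[j]) {
			mock_pool[j].start = start;
			mock_pool[j].end = end;
			mock_ref[j] = 1;
			return 0;
		}
	}

	return -ENOSPC;
}

/* Producer side: drop one reference; free the slot on the last one. */
static int mock_remove_exclusion(unsigned long start, unsigned long end)
{
	size_t j;

	for (j = 0; j < MOCK_MAX_BANDS; j++) {
		if (mock_ref[j] && mock_pool[j].start == start &&
		    mock_pool[j].end == end) {
			if (--mock_ref[j] == 0) {
				mock_pool[j].start = 0;
				mock_pool[j].end = 0;
			}
			return 0;
		}
	}

	return -ENOENT;
}

/* Consumer side: copy the active ranges out; returns how many were copied. */
static size_t mock_retrieve_exclusions(struct mock_band *out, size_t max)
{
	size_t j, n = 0;

	for (j = 0; j < MOCK_MAX_BANDS && n < max; j++) {
		if (mock_ref[j])
			out[n++] = mock_pool[j];
	}

	return n;
}
```

A consumer would call the retrieve helper after each notification and adjust its clocks based on whatever ranges are still active.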

Co-developed-by: Mario Limonciello 
Signed-off-by: Mario Limonciello 
Co-developed-by: Evan Quan 
Signed-off-by: Evan Quan 
--
v4->v5:
   - promote this to be a more generic solution with input argument taking
 `struct device` and provide better scalability to support non-ACPI
 scenarios(Andrew)
   - update the APIs naming and some other minor fixes(Rafael)
---
  drivers/base/Kconfig  |   8 ++
  drivers/base/Makefile |   1 +
  drivers/base/wbrf.c   | 227 ++
  include/linux/wbrf.h  |  65 
  4 files changed, 301 insertions(+)
  create mode 100644 drivers/base/wbrf.c
  create mode 100644 include/linux/wbrf.h

diff --git a/drivers/base/Kconfig b/drivers/base/Kconfig
index 2b8fd6bb7da0..5b441017b225 100644
--- a/drivers/base/Kconfig
+++ b/drivers/base/Kconfig
@@ -242,4 +242,12 @@ config FW_DEVLINK_SYNC_STATE_TIMEOUT
  command line option on every system/board your kernel is expected to
  work on.
  
+config WBRF

+   bool "Wifi band RF mitigation mechanism"
+   default n
+   help
+ Wifi band RF mitigation mechanism allows multiple drivers from
+ different domains to notify the frequencies in use so that hardware
+ can be reconfigured to avoid harmonic conflicts.
+
  endmenu
diff --git a/drivers/base/Makefile b/drivers/base/Makefile
index 3079bfe53d04..c844f68a6830 100644
--- a/drivers/base/Makefile
+++ b/drivers/base/Makefile
@@ -26,6 +26,7 @@ obj-$(CONFIG_GENERIC_MSI_IRQ) += platform-msi.o
  obj-$(CONFIG_GENERIC_ARCH_TOPOLOGY) += arch_topology.o
  obj-$(CONFIG_GENERIC_ARCH_NUMA) += arch_numa.o
  obj-$(CONFIG_ACPI) += physical_location.o
+obj-$(CONFIG_WBRF) += wbrf.o
  
  obj-y			+= test/
  
diff --git a/drivers/base/wbrf.c b/drivers/base/wbrf.c

new file mode 100644
index ..2163a8ec8a9a
--- /dev/null
+++ b/drivers/base/wbrf.c
@@ -0,0 +1,227 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Wifi Band Exclusion Interface
+ * Copyright (C) 2023 Advanced Micro Devices
+ *
+ */
+
+#include <linux/wbrf.h>
+
+static BLOCKING_NOTIFIER_HEAD(wbrf_chain_head);
+static DEFINE_MUTEX(wbrf_mutex);
+static struct exclusion_range_pool wbrf_pool;
+
+static int _wbrf_add_exclusion_ranges(struct wbrf_ranges_in *in)
+{
+   int i, j;
+
+   for (i = 0; i < ARRAY_SIZE(in->band_list); i++) {
+   if (!in->band_list[i].start &&
+   !in->band_list[i].end)
+   continue;
+
+   for (j = 0; j < ARRAY_SIZE(wbrf_pool.band_list); j++) {
+   if (wbrf_pool.band_list[j].start == in->band_list[i].start &&
+   wbrf_pool.band_list[j].end == in->band_list[i].end) {
+   wbrf_pool.ref_counter[j]++;
+   break;
+   }
+   }
+   if (j < ARRAY_SIZE(wbrf_pool.band_list))
+   continue;
+
+   for (j = 0; j < ARRAY_SIZE(wbrf_pool.band_list); j++) {
+   if (!wbrf_pool.band_list[j].start &&
+   !wbrf_pool.band_list[j].end) {
+   wbrf_pool.band_list[j].start = in->band_list[i].start;
+   wbrf_pool.band_list[j].end = in->band_list[i].end;
+   wbrf_pool.ref_counter[j] = 1;
+   break;
+   }
+   }
+   if (j >= ARRAY_SIZE(wbrf_pool.band_list))
+   return -ENOSPC;
+   }
+
+   return 0;
+}
+
+static int _wbrf_remove_exclusion_ranges(struct wbrf_ranges_in *in)
+{
+   int i, j;
+
+   for (i = 0; i < ARRAY_SIZE(in->band_list); i++) {
+   if (!in->band_list[i].start &&
+

RE: [PATCH v7 6/8] PCI/VGA: Introduce is_boot_device function callback to vga_client_register

2023-06-29 Thread Limonciello, Mario

[Public]

> -Original Message-
> From: 15330273...@189.cn <15330273...@189.cn>
> Sent: Thursday, June 29, 2023 12:00 PM
> To: Bjorn Helgaas ; Sui Jingfeng
> 
> Cc: Bjorn Helgaas ; linux-fb...@vger.kernel.org;
> Cornelia Huck ; Karol Herbst ;
> nouv...@lists.freedesktop.org; Joonas Lahtinen
> ; dri-de...@lists.freedesktop.org; Chai,
> Thomas ; Limonciello, Mario
> ; Gao, Likun ; David
> Airlie ; Ville Syrjala ; Yi 
> Liu
> ; k...@vger.kernel.org; amd-gfx@lists.freedesktop.org;
> Jason Gunthorpe ; Ben Skeggs ; linux-
> p...@vger.kernel.org; Kevin Tian ; Lazar, Lijo
> ; Thomas Zimmermann ;
> Zhang, Bokun ; intel-...@lists.freedesktop.org;
> Maarten Lankhorst ; Jani Nikula
> ; Alex Williamson
> ; Abhishek Sahu ;
> Maxime Ripard ; Rodrigo Vivi ;
> Tvrtko Ursulin ; Yishai Hadas
> ; Pan, Xinhui ; linux-
> ker...@vger.kernel.org; Daniel Vetter ; Deucher, Alexander
> ; Koenig, Christian
> ; Zhang, Hawking 
> Subject: Re: [PATCH v7 6/8] PCI/VGA: Introduce is_boot_device function
> callback to vga_client_register
>
> Hi,
>
> On 2023/6/29 23:54, Bjorn Helgaas wrote:
> > On Thu, Jun 22, 2023 at 01:08:15PM +0800, Sui Jingfeng wrote:
> >> Hi,
> >>
> >>
> >> A nouveau developer (Lyude) from Red Hat sent me an R-B.
> >>
> >> Thanks to the developers of the nouveau project.
> >>
> >>
> >> Please allow me to add a link[1] here.
> >>
> >>
> >> [1]
> https://lore.kernel.org/all/0afadc69f99a36bc9d03ecf54ff25859dbc10e28.ca
> m...@redhat.com/
> > 1) Thanks for this.  If you post another version of this series,
> > please pick up Lyude's Reviewed-by and include it in the relevant
> > patches (as long as you haven't made significant changes to the
> > code Lyude reviewed).
>
> Yes, no significant changes. I just fixed a typo.
>
> I would also like to add support for other DRM drivers.
>
> But I think this deserves another patch.
>
> >   Whoever applies this should automatically
> > pick up Reviewed-by/Ack/etc that are replies to the version being
> > applied, but they won't go through previous revisions to find them.
> >
> > 2) Please mention the commit to which the series applies.  I tried to
> > apply this on v6.4-rc1, but it doesn't apply cleanly.
>
> Since I'm a graphics driver developer, I'm using drm-tip.
>
> I have just pulled, and it still applies cleanly on drm-tip.
>
> > 3) Thanks for including cover letters in your postings.  Please
> > include a little changelog in the cover letter so we know what
> > changed between v6 and v7, etc.
>
> There is no change between v6 and v7.
>
> It seems that the mailbox doesn't allow me to send too
> many mails a day,
>
> so some of the patches failed to be delivered because I was out of quota.
>
>
> > 4) Right now we're in the middle of the v6.5 merge window, so new
> > content, e.g., this series, is too late for v6.5.  Most
> > maintainers, including me, wait to merge new content until the
> > merge window closes and a new -rc1 is tagged.  This merge window
> > should close on July 9, and people will start merging content for
> > v6.6, typically based on v6.5-rc1.
>
> I'm wondering:
>
> Will you merge all of the patches in this series (e.g. including
> the patches for drm/amdgpu (7/8) and drm/radeon (8/8))?
>
> Or just part of them?
>
> I don't know, because my patches seem to span different subsystems of the
> Linux kernel.
>
> A developer for AMDGPU (Mario) also gave me an R-B for
> patch 0002 of this series.
>
> So, at least PATCH-0001, PATCH-0002, PATCH-0003, PATCH-0004, and PATCH-0006
> are already OK (they have a Reviewed-by).
>
> Those 5 patches are already qualified to be merged, I think.

I think what you can do is pick up all the tags in your next version.  Once the
whole series has tags we can discuss how it merges.

>
> I mean that if you could merge those 5 patches first, then there is no need
> to send another version again.
>
> I will refine the rest of the patches with more details and description.
>
> I'm afraid of making too much noise.
>
> > Bjorn

RE: [PATCH] Revert "drm/amd/display: edp do not add non-edid timings"

2023-06-26 Thread Limonciello, Mario

[Public]

> -Original Message-
> From: Limonciello, Mario
> Sent: Monday, June 26, 2023 12:45 PM
> To: Hersen Wu ; amd-gfx@lists.freedesktop.org;
> Wentland, Harry 
> Cc: Wu, Hersen 
> Subject: RE: [PATCH] Revert "drm/amd/display: edp do not add non-edid
> timings"
>
> > This change causes a regression when eDP and an external display are in mirror
> > mode. When the external display supports a lower resolution than eDP, using eDP
> > timing to drive the external display may cause corruption on the external
> > display.
> >
> > This reverts commit aa9704d5127f06c9ffedb0480d2788b87fecedfb.

One more thing - although this is the correct hash for ASDN, this merged
into Linus' tree as e749dd10e5f292061ad63d2b030194bf7d7d452c.

As this has to go back to stable trees properly, I think the hash should
reflect what's in Linus' tree instead of what's in ASDN.

> >
> > Signed-off-by: Hersen Wu 
>
> The original commit was CC'd to stable, so this needs to go to stable too.
>
> Here are some tags to pick up when merging.
>
> Cc: sta...@vger.kernel.org
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2655
> Reviewed-by: Mario Limonciello 
>
> > ---
> >  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +---
> >  1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > index a46b8b47b756..073bf00c6fdc 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > @@ -7258,13 +7258,7 @@ static int
> > amdgpu_dm_connector_get_modes(struct drm_connector *connector)
> > drm_add_modes_noedid(connector, 1920,
> > 1080);
> > } else {
> > amdgpu_dm_connector_ddc_get_modes(connector, edid);
> > -   /* most eDP supports only timings from its edid,
> > -* usually only detailed timings are available
> > -* from eDP edid. timings which are not from edid
> > -* may damage eDP
> > -*/
> > -   if (connector->connector_type !=
> > DRM_MODE_CONNECTOR_eDP)
> > -
> > amdgpu_dm_connector_add_common_modes(encoder, connector);
> > +   amdgpu_dm_connector_add_common_modes(encoder,
> > connector);
> > amdgpu_dm_connector_add_freesync_modes(connector,
> > edid);
> > }
> > amdgpu_dm_fbc_init(connector);
> > --
> > 2.25.1

RE: [PATCH] Revert "drm/amd/display: edp do not add non-edid timings"

2023-06-26 Thread Limonciello, Mario

[Public]

> This change causes a regression when eDP and an external display are in mirror
> mode. When the external display supports a lower resolution than eDP, using eDP
> timing to drive the external display may cause corruption on the external
> display.
>
> This reverts commit aa9704d5127f06c9ffedb0480d2788b87fecedfb.
>
> Signed-off-by: Hersen Wu 

The original commit was CC'd to stable, so this needs to go to stable too.

Here are some tags to pick up when merging.

Cc: sta...@vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2655
Reviewed-by: Mario Limonciello 

> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index a46b8b47b756..073bf00c6fdc 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -7258,13 +7258,7 @@ static int
> amdgpu_dm_connector_get_modes(struct drm_connector *connector)
>   drm_add_modes_noedid(connector, 1920,
> 1080);
>   } else {
>   amdgpu_dm_connector_ddc_get_modes(connector, edid);
> - /* most eDP supports only timings from its edid,
> -  * usually only detailed timings are available
> -  * from eDP edid. timings which are not from edid
> -  * may damage eDP
> -  */
> - if (connector->connector_type !=
> DRM_MODE_CONNECTOR_eDP)
> -
>   amdgpu_dm_connector_add_common_modes(encoder, connector);
> + amdgpu_dm_connector_add_common_modes(encoder,
> connector);
>   amdgpu_dm_connector_add_freesync_modes(connector,
> edid);
>   }
>   amdgpu_dm_fbc_init(connector);
> --
> 2.25.1

Re: [PATCH] drm/amd: Fix a documentation warning about excess parameters

2023-06-26 Thread Limonciello, Mario




On 6/26/2023 10:05 AM, Alex Deucher wrote:

On Mon, Jun 26, 2023 at 11:00 AM Mario Limonciello
 wrote:

`pcie_index` and `pcie_data` aren't used by
amdgpu_device_indirect_wreg() since commit 65ba96e91b68
("drm/amdgpu: Move to common indirect reg access helper") but
the documentation wasn't updated. This causes a warning while
building documentation.

Fixes: 65ba96e91b68 ("drm/amdgpu: Move to common indirect reg access helper")
Signed-off-by: Mario Limonciello 

Reviewed-by: Alex Deucher 

It turns out that the exact same patch already landed in ASDN as:

fbdfbe84aaf4 ("drm/amdgpu: Fix up kdoc in amdgpu_device.c")

and I missed this.  Sorry for that.
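
For context on the warning itself: kernel-doc complains when a comment documents an @name that is not in the prototype. A small standalone sketch of a comment that matches its prototype is below. The function, its register array, and the bounds check are all made up for illustration and are not the amdgpu helper.

```c
#include <stdint.h>

#define MOCK_REG_COUNT 16 /* hypothetical size of the mock register file */

static uint32_t mock_regs[MOCK_REG_COUNT];

/**
 * mock_indirect_wreg - write a mock indirect register (illustration only)
 *
 * @reg_addr: indirect register offset
 * @reg_data: indirect register data
 *
 * Each @name above corresponds to a parameter in the prototype below.
 * Documenting a name that is not a parameter is what produces the
 * excess parameter warning when the documentation is built.
 */
static void mock_indirect_wreg(uint32_t reg_addr, uint32_t reg_data)
{
	/* Ignore writes outside the mock register file. */
	if (reg_addr < MOCK_REG_COUNT)
		mock_regs[reg_addr] = reg_data;
}
```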




---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 
  1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 65fe0f3488679..a3dae8ffbdb10 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -747,8 +747,6 @@ u64 amdgpu_device_indirect_rreg64(struct amdgpu_device 
*adev,
   * amdgpu_device_indirect_wreg - write an indirect register address
   *
   * @adev: amdgpu_device pointer
- * @pcie_index: mmio register offset
- * @pcie_data: mmio register offset
   * @reg_addr: indirect register offset
   * @reg_data: indirect register data
   *
@@ -778,8 +776,6 @@ void amdgpu_device_indirect_wreg(struct amdgpu_device *adev,
   * amdgpu_device_indirect_wreg64 - write a 64bits indirect register address
   *
   * @adev: amdgpu_device pointer
- * @pcie_index: mmio register offset
- * @pcie_data: mmio register offset
   * @reg_addr: indirect register offset
   * @reg_data: indirect register data
   *
--
2.34.1

RE: [PATCH 2/4] drm/amd/display: Set minimum requirement for using PSR-SU on Rembrandt

2023-06-23 Thread Limonciello, Mario

[AMD Official Use Only - General]

> -Original Message-
> From: Li, Sun peng (Leo) 
> Sent: Friday, June 23, 2023 2:27 PM
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: Lin, Tsung-hua (Ryan) ; Rossi, Marc
> ; Wang, Sean ; Mahfooz,
> Hamza 
> Subject: Re: [PATCH 2/4] drm/amd/display: Set minimum requirement for
> using PSR-SU on Rembrandt
>
>
>
>
> On 6/22/23 14:25, Mario Limonciello wrote:
> > A number of Parade TCONs are causing system hangs when utilized with
> > older DMUB firmware and PSR-SU. Some changes have been introduced into
> > DMUB firmware to add resilience against these failures.
> >
> > Don't allow running PSR-SU unless on the newer firmware.
> >
> > Cc: Sean Wang 
> > Cc: Marc Rossi 
> > Cc: Hamza Mahfooz 
> > Cc: Tsung-hua (Ryan) Lin 
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2443
> > Signed-off-by: Mario Limonciello 
> > ---
> >   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c |  3 ++-
> >   drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c  |  7 +++
> >   drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h  |  1 +
> >   drivers/gpu/drm/amd/display/dmub/dmub_srv.h   |  2 ++
> >   drivers/gpu/drm/amd/display/dmub/src/dmub_dcn31.c |  5 +
> >   drivers/gpu/drm/amd/display/dmub/src/dmub_dcn31.h |  2 ++
> >   drivers/gpu/drm/amd/display/dmub/src/dmub_srv.c   | 10 ++
> >   7 files changed, 25 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> > index d647f68fd563..4f61d4f257cd 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_psr.c
> > @@ -24,6 +24,7 @@
> >*/
> >
> >   #include "amdgpu_dm_psr.h"
> > +#include "dc_dmub_srv.h"
> >   #include "dc.h"
> >   #include "dm_helpers.h"
> >   #include "amdgpu_dm.h"
> > @@ -50,7 +51,7 @@ static bool link_supports_psrsu(struct dc_link *link)
> > !link->dpcd_caps.psr_info.psr2_su_y_granularity_cap)
> > return false;
> >
> > -   return true;
> > +   return dc_dmub_check_min_version(dc->ctx->dmub_srv->dmub);
> >   }
> >
> >   /*
> > diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
> b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
> > index c52c40b16387..c753c6f30dd7 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
> > +++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.c
> > @@ -1011,3 +1011,10 @@ void dc_send_update_cursor_info_to_dmu(
> > dm_execute_dmub_cmd_list(pCtx->stream->ctx, 2, cmd,
> DM_DMUB_WAIT_TYPE_WAIT);
> > }
> >   }
> > +
> > +bool dc_dmub_check_min_version(struct dmub_srv *srv)
> > +{
> > +   if (!srv->hw_funcs.is_psrsu_supported)
> > +   return true;
> > +   return srv->hw_funcs.is_psrsu_supported(srv);
> > +}
> > diff --git a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
> b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
> > index a5196a9292b3..099f94b6107c 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
> > +++ b/drivers/gpu/drm/amd/display/dc/dc_dmub_srv.h
> > @@ -86,4 +86,5 @@ void dc_dmub_setup_subvp_dmub_command(struct
> dc *dc, struct dc_state *context, b
> >   void dc_dmub_srv_log_diagnostic_data(struct dc_dmub_srv
> *dc_dmub_srv);
> >
> >   void dc_send_update_cursor_info_to_dmu(struct pipe_ctx *pCtx, uint8_t
> pipe_idx);
> > +bool dc_dmub_check_min_version(struct dmub_srv *srv);
> >   #endif /* _DMUB_DC_SRV_H_ */
> > diff --git a/drivers/gpu/drm/amd/display/dmub/dmub_srv.h
> b/drivers/gpu/drm/amd/display/dmub/dmub_srv.h
> > index 2a66a305679a..4585e0419da6 100644
> > --- a/drivers/gpu/drm/amd/display/dmub/dmub_srv.h
> > +++ b/drivers/gpu/drm/amd/display/dmub/dmub_srv.h
> > @@ -367,6 +367,8 @@ struct dmub_srv_hw_funcs {
> >
> > bool (*is_supported)(struct dmub_srv *dmub);
> >
> > +   bool (*is_psrsu_supported)(struct dmub_srv *dmub);
> > +
> > bool (*is_hw_init)(struct dmub_srv *dmub);
> >
> > void (*enable_dmub_boot_options)(struct dmub_srv *dmub,
> > diff --git a/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn31.c
> b/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn31.c
> > index ebf7aeec4029..c8445d474107 100644
> > --- a/drivers/gpu/drm/amd/display/dmub/src/dmub_dcn31.c
> > +++ b/drivers/gpu/drm/amd/display/dmub/src/dmub_d

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-23 Thread Limonciello, Mario




On 6/23/2023 11:28 AM, Rafael J. Wysocki wrote:

On Fri, Jun 23, 2023 at 5:57 PM Limonciello, Mario
 wrote:


On 6/23/2023 9:52 AM, Rafael J. Wysocki wrote:

On Wed, Jun 21, 2023 at 7:47 AM Evan Quan  wrote:

From: Mario Limonciello 

Due to electrical and mechanical constraints in certain platform designs
there may be likely interference of relatively high-powered harmonics of
the (G-)DDR memory clocks with local radio module frequency bands used
by Wifi 6/6e/7.

To mitigate this, AMD has introduced an ACPI based mechanism that
devices can use to notify active use of particular frequencies so
that devices can make relative internal adjustments as necessary
to avoid this resonance.

In order for a device to support this, the expected flow for device
driver or subsystems:

Drivers/subsystems contributing frequencies:

1) During probe, check `wbrf_supported_producer` to see if WBRF supported

The prefix should be acpi_wbrf_ or acpi_amd_wbrf_ even, so it is clear
that this uses ACPI and is AMD-specific.

I guess if we end up with an intermediary library approach
wbrf_supported_producer makes sense and that could call acpi_wbrf_*.

But with no intermediate library your suggestion makes sense.

I would prefer not to make it acpi_amd as there is no reason that
this exact same problem couldn't happen on a
Wifi 6e + Intel SOC + AMD dGPU design too and OEMs could use the
same mitigation mechanism as Wifi6e + AMD SOC + AMD dGPU too.

The mitigation mechanism might be the same, but the AML interface very
well may be different.



Right.  I suppose right now we should keep it prefixed as "amd",
and if it later is promoted as a standard it can be renamed.




My point is that this particular interface is AMD-specific ATM and I'm
not aware of any plans to make it "standard" in some way.



Yeah; this implementation is currently AMD specific AML, but I
expect the exact same AML would be delivered to OEMs using the
dGPUs.




Also if the given interface is specified somewhere, it would be good
to have a pointer to that place.



It's a code first implementation.  I'm discussing with the
owners when they will release it.





Whether or not there needs to be an intermediate library wrapped
around this is a different matter.

IMO individual drivers should not be expected to use this interface
directly, as that would add to boilerplate code and overall bloat.


The thing is the ACPI method is not a platform method.  It's
a function of the device (_DSM).

The reason for having acpi_wbrf.c in the first place is to
avoid the boilerplate of the _DSM implementation across multiple
drivers.



Also whoever uses it, would first need to check if the device in
question has an ACPI companion.



Which comes back to Andrew's point.
Either we:

Have a generic wbrf_ helper that takes struct *device and
internally checks if there is an ACPI companion and support.

or

Do the check for support in mac80211 + applicable drivers
and only call the AMD WBRF ACPI method in those drivers in
those cases.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-23 Thread Limonciello, Mario




On 6/23/2023 9:52 AM, Rafael J. Wysocki wrote:

On Wed, Jun 21, 2023 at 7:47 AM Evan Quan  wrote:

From: Mario Limonciello 

Due to electrical and mechanical constraints in certain platform designs
there may be likely interference of relatively high-powered harmonics of
the (G-)DDR memory clocks with local radio module frequency bands used
by Wifi 6/6e/7.

To mitigate this, AMD has introduced an ACPI based mechanism that
devices can use to notify active use of particular frequencies so
that devices can make relative internal adjustments as necessary
to avoid this resonance.

In order for a device to support this, the expected flow for device
driver or subsystems:

Drivers/subsystems contributing frequencies:

1) During probe, check `wbrf_supported_producer` to see if WBRF supported

The prefix should be acpi_wbrf_ or acpi_amd_wbrf_ even, so it is clear
that this uses ACPI and is AMD-specific.


I guess if we end up with an intermediary library approach
wbrf_supported_producer makes sense and that could call acpi_wbrf_*.

But with no intermediate library your suggestion makes sense.

I would prefer not to make it acpi_amd as there is no reason that
this exact same problem couldn't happen on a
Wifi 6e + Intel SOC + AMD dGPU design too and OEMs could use the
same mitigation mechanism as Wifi6e + AMD SOC + AMD dGPU too.



Whether or not there needs to be an intermediate library wrapped
around this is a different matter.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-22 Thread Limonciello, Mario




On 6/21/2023 8:55 PM, Andrew Lunn wrote:

Honestly I'm not sure though we need this complexity right now? I mean,
it'd be really easy to replace the calls in mac80211 with some other
more generalised calls in the future?

You need some really deep platform/hardware level knowledge and
involvement to do this, so I don't think it's something that someone
will come up with very easily for a DT-based platform...

What is this API about?

It is a struct device saying, i'm badly designed and make a mess of the
following frequency bands. Optionally, if you ask me nicely, i might
be able to tweak what i'm doing to avoid interfering with you.

And it is about a struct device saying, i'm using this particular
frequency. If you can reduce the noise you make, i would be thankful.

Hey now - you're making assumptions about what's badly designed.

At its core, the issue here that prompts all of this is a
"platform" issue with the tiny Z heights laptops these days
strive for causing implied limitations for shielding.

Independently both components work just fine.



The one generating the noise could be anything. The PWM driving my
laptop display back light?, What is being interfered with?  The 3.5mm
audio jack?

How much deep system knowledge is needed to call pwm_set_state() to
move the base frequency up above 20 kHz so only my dog will hear it?
But at the cost of a loss of efficiency and my battery going flatter
faster?

Is the DDR memory really the only badly designed component, when you
think of the range of systems Linux is used on from HPC to tiny
embedded systems?

Ideally we want any sort of receiver with a low noise amplifier to
just unconditionally use this API to let rest of the system know about
it. And ideally we want anything which is a source of noise to declare
itself. What happens after that should be up to the struct device
causing the interference.

I do get your point here - but the problem with a PWM on your
laptop display interfering with the 3.5mm audio jack would
likely be localized to your specific model.

If you have the 16" version of the laptop and I have the 13"
version I might have the 3.5mm audio jack in another location,
that is better shielded and so making that assumption that we
both have the same components so need to make changes could be
totally wrong.

If you have EVERYTHING with an amplifier advertising frequencies
in use without any extra information about the location of the
component or the impacts that component can have you're going
to have a useless interface that is just a bunch of garbage data.

I really think the application of this type of software
mitigation should be reserved for system designers that made
those design decisions and know they are going to run into problems.


Mario did say:

   The way that WBRF has been architected, it's intended to be able to
   scale to any type of device pair that has harmonic issues.

Andrew

The types of things that we envisioned were high frequency devices
with larger power emissions. For example WWAN or USB4 devices.
These fit well into the ACPI device model.

When Evan gets back from holiday I'll discuss with him the ideas from
this thread.

However before then I would really appreciate if Rafael can provide
some comments on patch 1 as it stands today.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-21 Thread Limonciello, Mario


So if we go down this path of CONFIG_WBRF and CONFIG_WBRF_ACPI, another
question would be where should the new "wbrf.c" be stored?  The ACPI only
version most certainly made sense in drivers/acpi/wbrf.c, but a generic
version that only has an ACPI implementation right now not so much.

On 6/21/2023 1:30 PM, Andrew Lunn wrote:

And consumer would need to call it, but only if CONFIG_WBRF_ACPI isn't set.

Why? How is ACPI special that it does not need notifiers?

The ACPI core does have notifiers that are used, but they don't work the same.
If you look at patch 4, you'll see amdgpu registers and unregisters using
both

acpi_install_notify_handler()
and
acpi_remove_notify_handler()

If we supported both ACPI notifications and non-ACPI notifications
all consumers would have to have support to register and use both types.




I don't see why it couldn't be a DT/ACPI hybrid solution for ARM64.

As said somewhere else, nobody does hybrid. In fact, turn it
around. Why not implement all this in DT, and make X86 hybrid? That
will make arm, powerpc, risc-v and mips much simpler :-)

Andrew

Doesn't coreboot do something hybrid with device tree?  I thought they
generate their ACPI tables from a combination of DT and some static ASL.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-21 Thread Limonciello, Mario




On 6/21/2023 12:26 PM, Andrew Lunn wrote:

I think what you're asking for is another layer of indirection
like CONFIG_WBRF in addition to CONFIG_ACPI_WBRF.

Producers would call functions like wbrf_supported_producer()
where the source file is not guarded behind CONFIG_ACPI_WBRF,
but instead by CONFIG_WBRF and locally use CONFIG_ACPI_WBRF within
it.  So a producer could look like this:

bool wbrf_supported_producer(struct device *dev)
{
#ifdef CONFIG_ACPI_WBRF
     struct acpi_device *adev = ACPI_COMPANION(dev);

     if (adev)
          return check_acpi_wbrf(adev->handle,
                                 WBRF_REVISION,
                                 1ULL << WBRF_RECORD);
#endif
     /* bool return type: -ENODEV would be truthy, so report "unsupported" */
     return false;
}
EXPORT_SYMBOL_GPL(wbrf_supported_producer);

And then adding/removing could look something like this:

int wbrf_add_exclusion(struct device *dev,
                       struct wbrf_ranges_in *in)
{
#ifdef CONFIG_ACPI_WBRF
     struct acpi_device *adev = ACPI_COMPANION(dev);

     if (adev)
          return wbrf_record(adev, WBRF_RECORD_ADD, in);
#endif
     return -ENODEV;
}
EXPORT_SYMBOL_GPL(wbrf_add_exclusion);

int wbrf_remove_exclusion(struct device *dev,
                          struct wbrf_ranges_in *in)
{
#ifdef CONFIG_ACPI_WBRF
     struct acpi_device *adev = ACPI_COMPANION(dev);

     if (adev)
          return wbrf_record(adev, WBRF_RECORD_REMOVE, in);
#endif
     return -ENODEV;
}
EXPORT_SYMBOL_GPL(wbrf_remove_exclusion);

Yes, this looks a lot better.

But what about notifications?
Once you implement this it gets a lot more complex and the driver consumers
would need to know more about the kernel's implementation.  For example,
consumers need a notifier block like:

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index e3e2e6e3b485..146fe3c43343 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1066,6 +1066,8 @@ struct amdgpu_device {

    bool    job_hang;
    bool    dc_enabled;
+
+   struct notifier_block   wbrf_notifier;
 };

 static inline struct amdgpu_device *drm_to_adev(struct drm_device *ddev)

And then would need matching notifier functions like:

static int amdgpu_wbrf_frequencies_notifier(struct notifier_block *nb,
    unsigned long action, void *_arg)

And we'd need to set up a chain to be used in this case in the WBRF code:

static BLOCKING_NOTIFIER_HEAD(wbrf_chain_head);

int wbrf_register_notifier(struct notifier_block *nb)
{
    return blocking_notifier_chain_register(&wbrf_chain_head, nb);
}
EXPORT_SYMBOL_GPL(wbrf_register_notifier);

int wbrf_unregister_notifier(struct notifier_block *nb)
{
    return blocking_notifier_chain_unregister(&wbrf_chain_head, nb);
}
EXPORT_SYMBOL_GPL(wbrf_unregister_notifier);

And consumer would need to call it, but only if CONFIG_WBRF_ACPI isn't set.

Add/remove functions can easily call something like:

blocking_notifier_call_chain(&wbrf_chain_head, action, data);
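
To make the moving parts concrete, here is a minimal userspace model of such a chain. It is only a sketch: every `model_` name is hypothetical, and the kernel's blocking notifier API additionally handles locking and priorities, which this model leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of a notifier chain; not the kernel API. */
struct model_notifier {
	int (*call)(struct model_notifier *nb, unsigned long action, void *data);
	struct model_notifier *next;
};

/* Head of the singly linked chain of registered consumers. */
static struct model_notifier *model_wbrf_chain;

/* Example consumer state: the last action it was notified about. */
unsigned long model_last_action;

int model_consumer_cb(struct model_notifier *nb, unsigned long action, void *data)
{
	(void)nb;
	(void)data;
	model_last_action = action;
	return 0;
}

void model_wbrf_register(struct model_notifier *nb)
{
	assert(nb != NULL && nb->call != NULL);
	nb->next = model_wbrf_chain;
	model_wbrf_chain = nb;
}

void model_wbrf_unregister(struct model_notifier *nb)
{
	struct model_notifier **p;

	/* Walk the links so the head and inner nodes are unlinked alike. */
	for (p = &model_wbrf_chain; *p; p = &(*p)->next) {
		if (*p == nb) {
			*p = nb->next;
			nb->next = NULL;
			return;
		}
	}
}

/* Called from the add/remove paths; returns how many consumers ran. */
int model_wbrf_call_chain(unsigned long action, void *data)
{
	struct model_notifier *nb;
	int called = 0;

	for (nb = model_wbrf_chain; nb; nb = nb->next) {
		nb->call(nb, action, data);
		called++;
	}
	return called;
}
```

Even in this reduced form, each consumer has to carry a notifier block, a callback, and matching register/unregister calls, which is the extra surface area being discussed.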

With all of this complexity and (effectively) dead code for ACPI vs non-ACPI
path I really have to ask why wouldn't a non-AMD implementation be able to
do this as ACPI?

I don't see why it couldn't be a DT/ACPI hybrid solution for ARM64.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-21 Thread Limonciello, Mario




On 6/21/2023 11:52 AM, Andrew Lunn wrote:

On Wed, Jun 21, 2023 at 11:15:00AM -0500, Limonciello, Mario wrote:

On 6/21/2023 10:39 AM, Johannes Berg wrote:

On Wed, 2023-06-21 at 17:36 +0200, Andrew Lunn wrote:

On Wed, Jun 21, 2023 at 01:45:56PM +0800, Evan Quan wrote:

From: Mario Limonciello 

Due to electrical and mechanical constraints in certain platform designs
there may be likely interference of relatively high-powered harmonics of
the (G-)DDR memory clocks with local radio module frequency bands used
by Wifi 6/6e/7.

To mitigate this, AMD has introduced an ACPI based mechanism that
devices can use to notify active use of particular frequencies so
that devices can make relative internal adjustments as necessary
to avoid this resonance.

Do only ACPI based systems have:

 interference of relatively high-powered harmonics of the (G-)DDR
 memory clocks with local radio module frequency bands used by
 Wifi 6/6e/7."

Could Device Tree based systems not experience this problem?

They could, of course, but they'd need some other driver to change
_something_ in the system? I don't even know what this is doing
precisely under the hood in the ACPI BIOS, perhaps it adjusts the DDR
memory clock frequency in response to WiFi using a frequency that will
cause interference with harmonics.

The way that WBRF has been architected, it's intended to be able
to scale to any type of device pair that has harmonic issues.

So you set out to make something generic...


In the first use (Wifi 6e + specific AMD dGPUs) that matches this
series BIOS has the following purposes:

1) The existence of _DSM indicates that the system may not have
adequate shielding and should be using these mitigations.

2) Notification mechanism of frequency use.

For the first problematic devices we *could* have done notifications
entirely in native Linux kernel code with notifier chains.
However that still means you need a hint from the platform that the
functionality is needed like a _DSD bit.

It's also done this way so that AML could do some of the notifications
directly to applicable devices in the future without needing "consumer"
driver participation.

And then tie it very closely to ACPI.

Now, you are AMD, i get that ACPI is what you have. But i think as
kernel Maintainers, we need to consider that ACPI is not the only
thing used. Do we want the APIs to be agnostic? I think APIs used by
drivers should be agnostic.

   Andrew

I think what you're asking for is another layer of indirection
like CONFIG_WBRF in addition to CONFIG_ACPI_WBRF.

Producers would call functions like wbrf_supported_producer()
where the source file is not guarded behind CONFIG_ACPI_WBRF,
but instead by CONFIG_WBRF and locally use CONFIG_ACPI_WBRF within
it.  So a producer could look like this:

bool wbrf_supported_producer(struct device *dev)
{
#ifdef CONFIG_ACPI_WBRF
    struct acpi_device *adev = ACPI_COMPANION(dev);

    if (adev)
        return check_acpi_wbrf(adev->handle,
                               WBRF_REVISION,
                               1ULL << WBRF_RECORD);
#endif
    /* bool return type: -ENODEV would be truthy, so report "unsupported" */
    return false;
}
EXPORT_SYMBOL_GPL(wbrf_supported_producer);

And then adding/removing could look something like this:

int wbrf_add_exclusion(struct device *dev,
                       struct wbrf_ranges_in *in)
{
#ifdef CONFIG_ACPI_WBRF
    struct acpi_device *adev = ACPI_COMPANION(dev);

    if (adev)
        return wbrf_record(adev, WBRF_RECORD_ADD, in);
#endif
    return -ENODEV;
}
EXPORT_SYMBOL_GPL(wbrf_add_exclusion);

int wbrf_remove_exclusion(struct device *dev,
                          struct wbrf_ranges_in *in)
{
#ifdef CONFIG_ACPI_WBRF
    struct acpi_device *adev = ACPI_COMPANION(dev);

    if (adev)
        return wbrf_record(adev, WBRF_RECORD_REMOVE, in);
#endif
    return -ENODEV;
}
EXPORT_SYMBOL_GPL(wbrf_remove_exclusion);

This would allow anyone interested in making a non-ACPI implementation
to slide it into those functions.

How does that sound?

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-21 Thread Limonciello, Mario




On 6/21/2023 11:31 AM, Andrew Lunn wrote:

I think there are enough details for this to happen. It's done
so that either the AML can natively behave as a consumer or a
driver can behave as a consumer.

+/**
+ * APIs needed by drivers/subsystems for contributing frequencies:
+ * During probe, check `wbrf_supported_producer` to see if WBRF is supported.
+ * If adding frequencies, then call `wbrf_add_exclusion` with the
+ * start and end points specified for the frequency ranges added.
+ * If removing frequencies, then call `wbrf_remove_exclusion` with
+ * start and end points specified for the frequency ranges added.
+ */
+bool wbrf_supported_producer(struct acpi_device *adev);
+int wbrf_add_exclusion(struct acpi_device *adev,
+  struct wbrf_ranges_in *in);
+int wbrf_remove_exclusion(struct acpi_device *adev,
+ struct wbrf_ranges_in *in);
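
As a rough illustration of the producer flow described in the comment above, the three steps can be modelled in plain userspace C. This is a sketch under stated assumptions: the `model_` types and functions are hypothetical stand-ins, and the real `struct wbrf_ranges_in` layout is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one frequency range. */
struct model_range {
	uint64_t start_hz;
	uint64_t end_hz;
};

struct model_producer {
	bool wbrf_supported;	/* cached at probe time */
	int active_ranges;	/* ranges currently reported */
};

/* Step 1: at probe, remember whether the platform offers WBRF. */
void model_probe(struct model_producer *p, bool platform_has_wbrf)
{
	assert(p != NULL);
	p->wbrf_supported = platform_has_wbrf;
	p->active_ranges = 0;
}

/* Step 2: report a range when the radio starts using it. */
int model_add_exclusion(struct model_producer *p, const struct model_range *r)
{
	if (!p->wbrf_supported)
		return 0;	/* nothing to do on this platform */
	if (r->start_hz >= r->end_hz)
		return -1;	/* reject an empty or inverted range */
	p->active_ranges++;
	return 0;
}

/* Step 3: withdraw the range when the radio stops using it. */
int model_remove_exclusion(struct model_producer *p, const struct model_range *r)
{
	if (!p->wbrf_supported)
		return 0;
	if (r->start_hz >= r->end_hz || p->active_ranges == 0)
		return -1;	/* bad range, or nothing left to remove */
	p->active_ranges--;
	return 0;
}
```

Caching the probe-time answer keeps the add and remove paths cheap on platforms without WBRF, where both calls simply succeed without doing anything.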

Could struct device be used here, to make the API agnostic to where
the information is coming from? That would then allow somebody in the
future to implement a device tree based information provider.

That does make sense, and it wouldn't even be that much harder if we
assume in a given platform there's only one provider

That seems like a very reasonable assumption. It is theoretically
possible to build an ACPI + DT hybrid, but i've never seen it actually
done.

If an ARM64 ACPI BIOS could implement this, then i would guess the low
level bits would be solved, i guess jumping into the EL1
firmware. Putting DT on top instead should not be too hard.

Andrew

To make life easier I'll ask whether we can include snippets of
the matching ASL for this first implementation as part of the
public ACPI spec that matches this code when we release it.

So it sounds like you are pretty open about this, there should be
enough information for independent implementations. So please do make
the APIs between the providers and the consumers abstract, struct
device, not an ACPI object.

Andrew

Think a little more about what a non-ACPI implementation
would look like:

1) Would producers and consumers still need you to set CONFIG_ACPI_WBRF?
2) How would you indicate you need WBRF support?
3) How would notifications from one device to another work?

I don't think those are trivial problems that can be solved by
just making the pointer 'struct device' particularly as with the
ACPI implementation consumers are expecting the notification from
ACPI.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-21 Thread Limonciello, Mario




On 6/21/2023 11:14 AM, Andrew Lunn wrote:

Do only ACPI based systems have:

interference of relatively high-powered harmonics of the (G-)DDR
memory clocks with local radio module frequency bands used by
Wifi 6/6e/7."

Could Device Tree based systems not experience this problem?

They could, of course, but they'd need some other driver to change
_something_ in the system? I don't even know what this is doing
precisely under the hood in the ACPI BIOS

If you don't know what it is actually doing, it suggests the API is
not very well defined. Are there even enough details that an ARM64 ACPI
BIOS could implement this?

I think there are enough details for this to happen. It's done
so that either the AML can natively behave as a consumer or a
driver can behave as a consumer.

+/**
+ * APIs needed by drivers/subsystems for contributing frequencies:
+ * During probe, check `wbrf_supported_producer` to see if WBRF is supported.
+ * If adding frequencies, then call `wbrf_add_exclusion` with the
+ * start and end points specified for the frequency ranges added.
+ * If removing frequencies, then call `wbrf_remove_exclusion` with
+ * start and end points specified for the frequency ranges added.
+ */
+bool wbrf_supported_producer(struct acpi_device *adev);
+int wbrf_add_exclusion(struct acpi_device *adev,
+  struct wbrf_ranges_in *in);
+int wbrf_remove_exclusion(struct acpi_device *adev,
+ struct wbrf_ranges_in *in);

Could struct device be used here, to make the API agnostic to where
the information is coming from? That would then allow somebody in the
future to implement a device tree based information provider.

That does make sense, and it wouldn't even be that much harder if we
assume in a given platform there's only one provider

That seems like a very reasonable assumption. It is theoretically
possible to build an ACPI + DT hybrid, but i've never seen it actually
done.

If an ARM64 ACPI BIOS could implement this, then i would guess the low
level bits would be solved, i guess jumping into the EL1
firmware. Putting DT on top instead should not be too hard.

Andrew


To make life easier I'll ask whether we can include snippets of
the matching ASL for this first implementation as part of the
public ACPI spec that matches this code when we release it.

Re: [PATCH V4 1/8] drivers/acpi: Add support for Wifi band RF mitigations

2023-06-21 Thread Limonciello, Mario




On 6/21/2023 10:39 AM, Johannes Berg wrote:

On Wed, 2023-06-21 at 17:36 +0200, Andrew Lunn wrote:

On Wed, Jun 21, 2023 at 01:45:56PM +0800, Evan Quan wrote:

From: Mario Limonciello 

Due to electrical and mechanical constraints in certain platform designs
there may be likely interference of relatively high-powered harmonics of
the (G-)DDR memory clocks with local radio module frequency bands used
by Wifi 6/6e/7.

To mitigate this, AMD has introduced an ACPI based mechanism that
devices can use to notify active use of particular frequencies so
that devices can make relative internal adjustments as necessary
to avoid this resonance.

Do only ACPI based systems have:

interference of relatively high-powered harmonics of the (G-)DDR
memory clocks with local radio module frequency bands used by
Wifi 6/6e/7."

Could Device Tree based systems not experience this problem?

They could, of course, but they'd need some other driver to change
_something_ in the system? I don't even know what this is doing
precisely under the hood in the ACPI BIOS, perhaps it adjusts the DDR
memory clock frequency in response to WiFi using a frequency that will
cause interference with harmonics.


The way that WBRF has been architected, it's intended to be able
to scale to any type of device pair that has harmonic issues.

In the first use (Wifi 6e + specific AMD dGPUs) that matches this
series BIOS has the following purposes:

1) The existence of _DSM indicates that the system may not have
adequate shielding and should be using these mitigations.

2) Notification mechanism of frequency use.

For the first problematic devices we *could* have done notifications
entirely in native Linux kernel code with notifier chains.
However that still means you need a hint from the platform that the
functionality is needed like a _DSD bit.

It's also done this way so that AML could do some of the notifications
directly to applicable devices in the future without needing "consumer"
driver participation.


+/**
+ * APIs needed by drivers/subsystems for contributing frequencies:
+ * During probe, check `wbrf_supported_producer` to see if WBRF is supported.
+ * If adding frequencies, then call `wbrf_add_exclusion` with the
+ * start and end points specified for the frequency ranges added.
+ * If removing frequencies, then call `wbrf_remove_exclusion` with
+ * start and end points specified for the frequency ranges added.
+ */
+bool wbrf_supported_producer(struct acpi_device *adev);
+int wbrf_add_exclusion(struct acpi_device *adev,
+  struct wbrf_ranges_in *in);
+int wbrf_remove_exclusion(struct acpi_device *adev,
+ struct wbrf_ranges_in *in);

Could struct device be used here, to make the API agnostic to where
the information is coming from? That would then allow somebody in the
future to implement a device tree based information provider.

That does make sense, and it wouldn't even be that much harder if we
assume in a given platform there's only one provider - but once you go
beyond that these would need to call function pointers I guess? Though
that could be left for "future improvement" too.

johannes


There's more to it than just sending in the frequency that is
added or removed.  The notification path comes from ACPI as well.

This first implementation only has one provider and consumer
but yes, we envision that there could be multiple of each party
and that AML may be the mechanism for some consumers to react.
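
The "function pointers" idea raised above could look roughly like the following userspace sketch. It assumes a single provider bound per platform; all `model_` names are hypothetical, and the "dt" backend is purely illustrative since only an ACPI implementation exists in this series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical provider interface behind a generic WBRF entry point. */
struct model_wbrf_ops {
	const char *name;
	int (*add)(unsigned long start_khz, unsigned long end_khz);
};

/* Stand-in for the ACPI-backed provider. */
int model_acpi_add(unsigned long start_khz, unsigned long end_khz)
{
	return start_khz < end_khz ? 0 : -EINVAL;
}

/* Stand-in for a device tree backed provider (does not exist today). */
int model_dt_add(unsigned long start_khz, unsigned long end_khz)
{
	return start_khz < end_khz ? 0 : -EINVAL;
}

const struct model_wbrf_ops model_acpi_ops = { "acpi", model_acpi_add };
const struct model_wbrf_ops model_dt_ops = { "dt", model_dt_add };

/* With one provider per platform, a single pointer is enough. */
static const struct model_wbrf_ops *model_provider;

void model_set_provider(const struct model_wbrf_ops *ops)
{
	model_provider = ops;
}

/* Generic entry point used by producers, whatever the backend is. */
int model_wbrf_add_exclusion(unsigned long start_khz, unsigned long end_khz)
{
	if (!model_provider)
		return -ENODEV;
	assert(model_provider->add != NULL);
	return model_provider->add(start_khz, end_khz);
}
```

Supporting several providers at once would turn the single pointer into a list, and the notification path back to consumers would still need its own design.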

Re: [PATCH V4 3/8] wifi: mac80211: Add support for ACPI WBRF

2023-06-21 Thread Limonciello, Mario




On 6/21/2023 5:22 AM, Johannes Berg wrote:

On Wed, 2023-06-21 at 13:45 +0800, Evan Quan wrote:

To support AMD's WBRF interference mitigation mechanism, Wifi adapters
utilized in the system must register the frequencies in use (or unregister
those frequencies no longer used) via the dedicated ACPI calls, so that
other drivers responding to the frequencies can take proper actions to
mitigate possible interference.

To make the WBRF feature functional, the kernel needs to be configured with
CONFIG_ACPI_WBRF and the platform must be equipped with WBRF support (from
BIOS and drivers).

Signed-off-by: Mario Limonciello 
Co-developed-by: Evan Quan 
Signed-off-by: Evan Quan 

I was going to say this looks good ... but still have a few nits, sorry.

But then the next question anyway is how we merge this? The wifi parts
sort of depend on the first patch, although technically I guess I could
merge them since it's all hidden behind the CONFIG_ symbol, assuming you
get that in via some other tree it can combine upstream.

I'd also say you can merge those parts elsewhere but I'm planning to
also land some locking rework that I've been working on, so it will
probably conflict somewhere.

Since it's all gated by CONFIG_ACPI_WBRF for each subsystem that it touches,
my take is that we should merge like this:

1) Get A-b/R-b on patch 1 (ACPI patch) from Rafael.
2) Merge mac80211 bits through WLAN trees
3) Merge AMDGPU bits *and* ACPI bits through amd-staging-drm-next,
followed by the drm tree


Since WLAN and AMDGPU bits are using the exported ACPI functions from
patch 1, we need to make sure that it is accepted and won't change
interface before merging other bits.

Everything can come together in the upstream tree and the bots
will be able to test linux-next as well this way.

By bringing the ACPI bits through amd-staging-drm-next we can also enable
the new Kconfig option in AMD's CI system to make sure that all the amdgpu
bits go through CI testing even before they hit linux-next.



+++ b/net/mac80211/chan.c
@@ -506,11 +506,16 @@ static void _ieee80211_change_chanctx(struct ieee80211_local *local,
 
 	WARN_ON(!cfg80211_chandef_compatible(&ctx->conf.def, chandef));
 
+	ieee80211_remove_wbrf(local, &ctx->conf.def);
+
 	ctx->conf.def = *chandef;
 
 	/* check if min chanctx also changed */
 	changed = IEEE80211_CHANCTX_CHANGE_WIDTH |
 		  _ieee80211_recalc_chanctx_min_def(local, ctx, rsvd_for);
+
+	ieee80211_add_wbrf(local, &ctx->conf.def);

You ignore the return value here.



@@ -668,6 +673,10 @@ static int ieee80211_add_chanctx(struct ieee80211_local *local,
 	lockdep_assert_held(&local->mtx);
 	lockdep_assert_held(&local->chanctx_mtx);
 
+	err = ieee80211_add_wbrf(local, &ctx->conf.def);
+	if (err)
+		return err;

But not here.

In the code, there are basically two error paths:


+int ieee80211_add_wbrf(struct ieee80211_local *local,
+		       struct cfg80211_chan_def *chandef)
+{
+	struct device *dev = local->hw.wiphy->dev.parent;
+	struct wbrf_ranges_in ranges_in = {0};
+	int ret;
+
+	if (!local->wbrf_supported)
+		return 0;
+
+	ret = wbrf_get_ranges_from_chandef(chandef, &ranges_in);
+	if (ret)
+		return ret;

This really won't fail, just if the bandwidth calculation was bad, but
that's an internal error that WARNs anyway and we can ignore it.


+	return wbrf_add_exclusion(ACPI_COMPANION(dev), &ranges_in);

This I find a bit confusing, why do we even propagate the error? If the
platform has some issue with it, should we really fail the connection?


I think it seems better to me to just make this void, and have it be
only a notification interface?

johannes
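
A notification-only variant along the lines suggested above might look like this userspace sketch. The `model_` names are hypothetical stand-ins for the mac80211 state and the platform call; the point is only that a platform failure is recorded and never returned to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the mac80211 local state. */
struct model_local {
	bool wbrf_supported;
	int platform_errors;	/* failures noted but not propagated */
};

/* Stand-in for the platform call; fails when told to. */
int model_platform_add(bool should_fail)
{
	return should_fail ? -1 : 0;
}

/*
 * Notification-only variant: the caller never sees a platform failure,
 * so a firmware problem cannot block the channel change.
 */
void model_add_wbrf(struct model_local *local, bool platform_fails)
{
	assert(local != NULL);

	if (!local->wbrf_supported)
		return;

	if (model_platform_add(platform_fails))
		local->platform_errors++;
}
```

With a void interface, the asymmetry between the two call sites in chan.c disappears because neither has a return value to check.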

RE: [PATCH] drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics

2023-06-20 Thread Limonciello, Mario

[Public]

You've got an A-b from Evan already on this.  It looks fine to me too.

Reviewed-by: Mario Limonciello 

> -----Original Message-----
> From: Yang, WenYou 
> Sent: Sunday, June 11, 2023 12:53 AM
> To: Yang, WenYou ; Deucher, Alexander
> ; Limonciello, Mario
> ; Koenig, Christian
> ; Pan, Xinhui ; Quan,
> Evan 
> Cc: Yuan, Perry ; Liang, Richard qi
> ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; linux-ker...@vger.kernel.org
> Subject: RE: [PATCH] drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to
> acquire gpu_metrics
>
> [AMD Official Use Only - General]
>
> Any comments?
>
> > -----Original Message-----
> > From: Wenyou Yang 
> > Sent: Thursday, June 1, 2023 9:38 AM
> > To: Deucher, Alexander ; Limonciello, Mario
> > ; Koenig, Christian
> ;
> > Pan, Xinhui ; Quan, Evan 
> > Cc: Yuan, Perry ; Liang, Richard qi
> > ; amd-gfx@lists.freedesktop.org; dri-
> > de...@lists.freedesktop.org; linux-ker...@vger.kernel.org; Yang, WenYou
> > 
> > Subject: [PATCH] drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to
> acquire
> > gpu_metrics
> >
> > The voltage and current info should be available from the gpu_metrics
> > interface, but gpu_metrics_v2_3 doesn't contain them. To stay backward
> > compatible, add a new gpu_metrics_v2_4 structure.
> >
> > Acked-by: Evan Quan 
> > Signed-off-by: Wenyou Yang 
> > ---
> >  .../gpu/drm/amd/include/kgd_pp_interface.h|  69 +++
> >  .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 109
> -
> > -
> >  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c|   3 +
> >  3 files changed, 172 insertions(+), 9 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > index 9f542f6e19ed..0f37dafafcf9 100644
> > --- a/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > +++ b/drivers/gpu/drm/amd/include/kgd_pp_interface.h
> > @@ -892,4 +892,73 @@ struct gpu_metrics_v2_3 {
> >   uint16_t average_temperature_core[8]; // average CPU core temperature on APUs
> >   uint16_t average_temperature_l3[2];
> >  };
> > +
> > +struct gpu_metrics_v2_4 {
> > + struct metrics_table_header common_header;
> > +
> > + /* Temperature */
> > + uint16_t temperature_gfx;
> > + uint16_t temperature_soc;
> > + uint16_t temperature_core[8];
> > + uint16_t temperature_l3[2];
> > +
> > + /* Utilization */
> > + uint16_t average_gfx_activity;
> > + uint16_t average_mm_activity;
> > +
> > + /* Driver attached timestamp (in ns) */
> > + uint64_t system_clock_counter;
> > +
> > + /* Power/Energy */
> > + uint16_t average_socket_power;
> > + uint16_t average_cpu_power;
> > + uint16_t average_soc_power;
> > + uint16_t average_gfx_power;
> > + uint16_t average_core_power[8];
> > +
> > + /* Average clocks */
> > + uint16_t average_gfxclk_frequency;
> > + uint16_t average_socclk_frequency;
> > + uint16_t average_uclk_frequency;
> > + uint16_t average_fclk_frequency;
> > + uint16_t average_vclk_frequency;
> > + uint16_t average_dclk_frequency;
> > +
> > + /* Current clocks */
> > + uint16_t current_gfxclk;
> > + uint16_t current_socclk;
> > + uint16_t current_uclk;
> > + uint16_t current_fclk;
> > + uint16_t current_vclk;
> > + uint16_t current_dclk;
> > + uint16_t current_coreclk[8];
> > + uint16_t current_l3clk[2];
> > +
> > + /* Throttle status (ASIC dependent) */
> > + uint32_t throttle_status;
> > +
> > + /* Fans */
> > + uint16_t fan_pwm;
> > +
> > + uint16_t padding[3];
> > +
> > + /* Throttle status (ASIC independent) */
> > +

Re: [PATCH v6 2/8] PCI/VGA: Deal only with VGA class devices

2023-06-19 Thread Limonciello, Mario




On 6/12/2023 2:25 PM, Sui Jingfeng wrote:

From: Sui Jingfeng 

Deal only with VGA devices (pdev->class == 0x0300), so replace the
pci_get_subsys() function with pci_get_class(). Filter out the devices that
are not VGA class (pdev->class != 0x0300). There is no need to process
non-display PCI devices.

Signed-off-by: Sui Jingfeng 
---

This also means that deleting a PCI device no longer needs
to walk the list.

Reviewed-by: Mario Limonciello 


  drivers/pci/vgaarb.c | 22 --
  1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/pci/vgaarb.c b/drivers/pci/vgaarb.c
index c1bc6c983932..22a505e877dc 100644
--- a/drivers/pci/vgaarb.c
+++ b/drivers/pci/vgaarb.c
@@ -754,10 +754,6 @@ static bool vga_arbiter_add_pci_device(struct pci_dev *pdev)
 	struct pci_dev *bridge;
 	u16 cmd;
 
-	/* Only deal with VGA class devices */
-	if ((pdev->class >> 8) != PCI_CLASS_DISPLAY_VGA)
-		return false;
-
 	/* Allocate structure */
 	vgadev = kzalloc(sizeof(struct vga_device), GFP_KERNEL);
 	if (vgadev == NULL) {
@@ -1500,7 +1496,9 @@ static int pci_notify(struct notifier_block *nb, unsigned long action,
 	struct pci_dev *pdev = to_pci_dev(dev);
 	bool notify = false;
 
-	vgaarb_dbg(dev, "%s\n", __func__);
+	/* Only deal with VGA class devices */
+	if (pdev->class != PCI_CLASS_DISPLAY_VGA << 8)
+		return 0;
 
 	/* For now we're only intereted in devices added and removed. I didn't
 	 * test this thing here, so someone needs to double check for the
@@ -1510,6 +1508,8 @@ static int pci_notify(struct notifier_block *nb, unsigned long action,
 	else if (action == BUS_NOTIFY_DEL_DEVICE)
 		notify = vga_arbiter_del_pci_device(pdev);
 
+	vgaarb_dbg(dev, "%s: action = %lu\n", __func__, action);
+
 	if (notify)
 		vga_arbiter_notify_clients();
 	return 0;
@@ -1534,8 +1534,8 @@ static struct miscdevice vga_arb_device = {
 
 static int __init vga_arb_device_init(void)
 {
+	struct pci_dev *pdev = NULL;
 	int rc;
-	struct pci_dev *pdev;
 
 	rc = misc_register(&vga_arb_device);
 	if (rc < 0)
@@ -1545,11 +1545,13 @@ static int __init vga_arb_device_init(void)
 
 	/* We add all PCI devices satisfying VGA class in the arbiter by
 	 * default */
-	pdev = NULL;
-	while ((pdev =
-		pci_get_subsys(PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
-			       PCI_ANY_ID, pdev)) != NULL)
+	while (1) {
+		pdev = pci_get_class(PCI_CLASS_DISPLAY_VGA << 8, pdev);
+		if (!pdev)
+			break;
+
 		vga_arbiter_add_pci_device(pdev);
+	}
 
 	pr_info("loaded\n");
 	return rc;
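
One detail in the patch above may be worth spelling out: the removed check compared (pdev->class >> 8) against PCI_CLASS_DISPLAY_VGA, which ignores the low programming-interface byte, while the new checks compare the full 24-bit class value. The following is only a userspace sketch of that difference, not kernel code; the helper names are made up for illustration, and the 0x0300 value of PCI_CLASS_DISPLAY_VGA is restated locally as an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value: base class 0x03 (display), sub-class 0x00 (VGA). */
#define PCI_CLASS_DISPLAY_VGA 0x0300

/*
 * Shape of the removed check: drop the low programming-interface byte
 * and compare only base class and sub-class.
 * (Hypothetical helper name, for illustration only.)
 */
static bool is_vga_ignoring_prog_if(uint32_t class_code)
{
	return (class_code >> 8) == PCI_CLASS_DISPLAY_VGA;
}

/*
 * Shape of the added check: the whole 24-bit class value must match,
 * so the programming-interface byte has to be 0x00 as well.
 * (Hypothetical helper name, for illustration only.)
 */
static bool is_vga_exact(uint32_t class_code)
{
	return class_code == (PCI_CLASS_DISPLAY_VGA << 8);
}
```

For a class value of 0x030000 both helpers agree. For a value such as 0x030001 (same base class and sub-class, non-zero programming interface) only the first one matches, so a device reporting that would be handled differently before and after the patch, if I read it correctly.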

RE: [PATCH 2/2] drm/amd: Tighten permissions on VBIOS flashing attributes

2023-06-07 Thread Limonciello, Mario

[AMD Official Use Only - General]

> -Original Message-
> From: Limonciello, Mario 
> Sent: Wednesday, June 7, 2023 1:53 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario 
> Subject: [PATCH 2/2] drm/amd: Tighten permissions on VBIOS flashing
> attributes
>
> Non-root users shouldn't be able to try to trigger a VBIOS flash
> or query the flashing status.  This should be reserved for users with the
> appropriate permissions.
>
> Fixes: 8424f2ccb3c0 ("drm/amdgpu/psp: Add vbflash sysfs interface
> support")
> Reviewed-by: Alex Deucher 
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index 8c60db176119..488d5b7ab97c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -3671,13 +3671,13 @@ static ssize_t
> amdgpu_psp_vbflash_status(struct device *dev,
>  }
>
>  static const struct bin_attribute psp_vbflash_bin_attr = {
> - .attr = {.name = "psp_vbflash", .mode = 0664},
> + .attr = {.name = "psp_vbflash", .mode = 0220},

I noticed a mistake with this; it should be 0660.

If there's no other feedback, I'll correct it when committing.

>   .size = 0,
>   .write = amdgpu_psp_vbflash_write,
>   .read = amdgpu_psp_vbflash_read,
>  };
>
> -static DEVICE_ATTR(psp_vbflash_status, 0444, amdgpu_psp_vbflash_status,
> NULL);
> +static DEVICE_ATTR(psp_vbflash_status, 0440, amdgpu_psp_vbflash_status,
> NULL);
>
>  int amdgpu_psp_sysfs_init(struct amdgpu_device *adev)
>  {
> --
> 2.34.1

RE: [PATCH] drm/amd: Check that a system is a NUMA system before looking for SRAT

2023-06-05 Thread Limonciello, Mario

[Public]

> On 2023-06-02 08:18, Mario Limonciello wrote:
> > It's pointless on laptops to look for the SRAT table as these are not
> > NUMA.  Check the number of possible nodes is > 1 to decide whether to
> > look for SRAT.
> >
> > Suggested-by: Felix Kuehling 
> > Signed-off-by: Mario Limonciello 
>
> I think we discussed this a while ago and I don't remember the exact
> issue that was meant to fix. Was just to get rid of an irritating
> warning in the kernel log? Anyway, the patch looks good to me.

Yeah I forgot all about sending out the fix until I noticed it again recently.

>
> Reviewed-by: Felix Kuehling 

Thanks!

>
>
> > ---
> >   drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 3 ++-
> >   1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > index 950af6820153..3dcd8f8bc98e 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_crat.c
> > @@ -2041,7 +2041,8 @@ static int kfd_fill_gpu_direct_io_link_to_cpu(int
> *avail_size,
> > sub_type_hdr->proximity_domain_from = proximity_domain;
> >
> >   #ifdef CONFIG_ACPI_NUMA
> > -   if (kdev->adev->pdev->dev.numa_node == NUMA_NO_NODE)
> > +   if (kdev->adev->pdev->dev.numa_node == NUMA_NO_NODE &&
> > +   num_possible_nodes() > 1)
> > kfd_find_numa_node_in_srat(kdev);
> >   #endif
> >   #ifdef CONFIG_NUMA

Re: drm/amd: Drop messages in init for radeon, amdgpu

2023-06-05 Thread Limonciello, Mario




On 6/5/2023 9:28 AM, Alex Deucher wrote:

Since there is overlap in supported devices, both
modules load, but only one will bind to a particular
device depending on the user's configuration.  Drop
the message in the module init function as this can
be confusing to users.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2608
Signed-off-by: Alex Deucher 

Reviewed-by: Mario Limonciello 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 1 -
  drivers/gpu/drm/radeon/radeon_drv.c | 1 -
  2 files changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 7eda4f039224..94509b76fa6c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -3065,7 +3065,6 @@ static int __init amdgpu_init(void)
 	if (r)
 		goto error_fence;
 
-	DRM_INFO("amdgpu kernel modesetting enabled.\n");
 	amdgpu_register_atpx_handler();
 	amdgpu_acpi_detect();
 
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index e4374814f0ef..16b9eab90185 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -634,7 +634,6 @@ static int __init radeon_module_init(void)
 	if (radeon_modeset == 0)
 		return -EINVAL;
 
-	DRM_INFO("radeon kernel modesetting enabled.\n");
 	radeon_register_atpx_handler();
 
 	return pci_register_driver(&radeon_kms_pci_driver);

RE: [PATCH v2 1/2] drm/amd: Disallow s0ix without BIOS support again

2023-06-01 Thread Limonciello, Mario

[AMD Official Use Only - General]

> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, June 1, 2023 11:15 AM
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> 
> Subject: Re: [PATCH v2 1/2] drm/amd: Disallow s0ix without BIOS support
> again
>
> On Thu, Jun 1, 2023 at 11:33 AM Limonciello, Mario
>  wrote:
> >
> > [AMD Official Use Only - General]
> >
> > > -Original Message-
> > > From: Alex Deucher 
> > > Sent: Wednesday, May 31, 2023 10:22 PM
> > > To: Limonciello, Mario 
> > > Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> > > 
> > > Subject: Re: [PATCH v2 1/2] drm/amd: Disallow s0ix without BIOS support
> > > again
> > >
> > > On Wed, May 31, 2023 at 9:26 AM Alex Deucher
> 
> > > wrote:
> > > >
> > > > On Tue, May 30, 2023 at 6:34 PM Mario Limonciello
> > > >  wrote:
> > > > >
> > > > > commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> > > showed
> > > > > improvements to power consumption over suspend when s0ix wasn't
> > > enabled in
> > > > > BIOS and the system didn't support S3.
> > > > >
> > > > > This patch however was misguided because the reason the system
> didn't
> > > > > support S3 was because SMT was disabled in OEM BIOS setup.
> > > > > This prevented the BIOS from allowing S3.
> > > > >
> > > > > Also allowing GPUs to use the s2idle path actually causes problems if
> > > > > they're invoked on systems that may not support s2idle in the platform
> > > > > firmware. `systemd` has a tendency to try to use `s2idle` if `deep` 
> > > > > fails
> > > > > for any reason, which could lead to unexpected flows.
> > > > >
> > > > > The original commit also fixed a problem during resume from suspend
> to
> > > idle
> > > > > without hardware support, but this is no longer necessary with commit
> > > > > ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than
> Raven")
> > > > >
> > > > > Revert commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS
> > > support")
> > > > > to make it match the expected behavior again.
> > > > >
> > > > > Cc: Rafael Ávila de Espíndola 
> > > > > Link:
> > >
> https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu
> > > /amdgpu_acpi.c#L1060
> > > > > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
> > > > > Signed-off-by: Mario Limonciello 
> > > >
> > > > Patch 1 is:
> > > > Reviewed-by: Alex Deucher 
> > > > Patch 2 seems a bit much, but I could be convinced if you think it
> > > > will actually help more than a warn would.  Users already assume warn
> > > > is a kernel crash.  I'm not sure the average user makes a distinction
> > > > between warn and err.
> > > >
> > >
> > > You'll need to revert d2a197a45daacd ("drm/amd: Only run s3 or s0ix if
> > > system is configured properly") as well, otherwise, we'll break
> > > runtime pm.
> > >
> >
> > Can you elaborate more on your thought process?  d2a197a45daacd was
> added in 5.18
> > and cf488dcd0ab7 was added in 6.3.  I can't imagine runtime PM is broken
> the whole time
> > on dGPUs.
>
> I tested this patch yesterday and it broke runtime pm because
> amdgpu_pmops_prepare() returned 1.  I haven't delved into what
> condition broke.  Reverting this patch restored runtime pm.  This is a
> Threadripper box that only supports S3.  The dGPUs were polaris and
> navi2x.
>

But runtime_suspend isn't supposed to run the prepare() callback AFAICT.
SMART_PREPARE is only used for system wide suspend/resume.

> Alex
>
>
> >
> > > Alex
> > >
> > > > Alex
> > > >
> > > > > ---
> > > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
> > > > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > > > >
> > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > > > index aeeec211861c..e1b01554e323 100644
> > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > > > +++

RE: [PATCH v2 1/2] drm/amd: Disallow s0ix without BIOS support again

2023-06-01 Thread Limonciello, Mario

[AMD Official Use Only - General]

> -Original Message-
> From: Alex Deucher 
> Sent: Wednesday, May 31, 2023 10:22 PM
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> 
> Subject: Re: [PATCH v2 1/2] drm/amd: Disallow s0ix without BIOS support
> again
>
> On Wed, May 31, 2023 at 9:26 AM Alex Deucher 
> wrote:
> >
> > On Tue, May 30, 2023 at 6:34 PM Mario Limonciello
> >  wrote:
> > >
> > > commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> showed
> > > improvements to power consumption over suspend when s0ix wasn't
> enabled in
> > > BIOS and the system didn't support S3.
> > >
> > > This patch however was misguided because the reason the system didn't
> > > support S3 was because SMT was disabled in OEM BIOS setup.
> > > This prevented the BIOS from allowing S3.
> > >
> > > Also allowing GPUs to use the s2idle path actually causes problems if
> > > they're invoked on systems that may not support s2idle in the platform
> > > firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
> > > for any reason, which could lead to unexpected flows.
> > >
> > > The original commit also fixed a problem during resume from suspend to
> idle
> > > without hardware support, but this is no longer necessary with commit
> > > ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than Raven")
> > >
> > > Revert commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS
> support")
> > > to make it match the expected behavior again.
> > >
> > > Cc: Rafael Ávila de Espíndola 
> > > Link:
> https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu
> /amdgpu_acpi.c#L1060
> > > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
> > > Signed-off-by: Mario Limonciello 
> >
> > Patch 1 is:
> > Reviewed-by: Alex Deucher 
> > Patch 2 seems a bit much, but I could be convinced if you think it
> > will actually help more than a warn would.  Users already assume warn
> > is a kernel crash.  I'm not sure the average user makes a distinction
> > between warn and err.
> >
>
> You'll need to revert d2a197a45daacd ("drm/amd: Only run s3 or s0ix if
> system is configured properly") as well, otherwise, we'll break
> runtime pm.
>

Can you elaborate more on your thought process?  d2a197a45daacd was added in
5.18 and cf488dcd0ab7 was added in 6.3.  I can't imagine runtime PM has been
broken the whole time on dGPUs.

> Alex
>
> > Alex
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
> > >  1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > index aeeec211861c..e1b01554e323 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > @@ -1092,16 +1092,20 @@ bool amdgpu_acpi_is_s0ix_active(struct
> amdgpu_device *adev)
> > >  * S0ix even though the system is suspending to idle, so return 
> > > false
> > >  * in that case.
> > >  */
> > > -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> > > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> > > dev_warn_once(adev->dev,
> > >   "Power consumption will be higher as BIOS 
> > > has not been
> configured for suspend-to-idle.\n"
> > >   "To use suspend-to-idle change the sleep 
> > > mode in BIOS
> setup.\n");
> > > +   return false;
> > > +   }
> > >
> > >  #if !IS_ENABLED(CONFIG_AMD_PMC)
> > > dev_warn_once(adev->dev,
> > >   "Power consumption will be higher as the kernel has 
> > > not been
> compiled with CONFIG_AMD_PMC.\n");
> > > -#endif /* CONFIG_AMD_PMC */
> > > +   return false;
> > > +#else
> > > return true;
> > > +#endif /* CONFIG_AMD_PMC */
> > >  }
> > >
> > >  #endif /* CONFIG_SUSPEND */
> > > --
> > > 2.34.1
> > >

Re: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again

2023-05-30 Thread Limonciello, Mario




On 5/30/2023 4:34 PM, Alex Deucher wrote:

On Tue, May 30, 2023 at 2:19 PM Limonciello, Mario
 wrote:

[AMD Official Use Only - General]


-Original Message-
From: Alex Deucher 
Sent: Tuesday, May 30, 2023 1:16 PM
To: Limonciello, Mario 
Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola

Subject: Re: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again

On Tue, May 30, 2023 at 1:53 PM Mario Limonciello
 wrote:

commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")

showed

improvements to power consumption over suspend when s0ix wasn't

enabled in

BIOS and the system didn't support S3.

This patch however was misguided because the reason the system didn't
support S3 was because SMT was disabled in OEM BIOS setup.
This prevented the BIOS from allowing S3.

Also allowing GPUs to use the s2idle path actually causes problems if
they're invoked on systems that may not support s2idle in the platform
firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
for any reason, which could lead to unexpected flows.

To make this the behavior discoverable and expected, revert commit
cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") and offer
a message if SMT appears to be disabled.

Cc: Rafael Ávila de Espíndola 
Link:

https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu
/amdgpu_acpi.c#L1060

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 16 ++--
  1 file changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c

b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c

index 3a6b2e2089f6..a3523d03d769 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -28,6 +28,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  #include 
  #include 
@@ -1473,6 +1474,13 @@ void amdgpu_acpi_release(void)
   */
  bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev)
  {
+#ifdef CONFIG_X86
+   if (!sched_smt_active()) {
+   dev_warn_once(adev->dev,
+ "SMT is disabled by the BIOS.\n"
+ "To use suspend-to-ram enable SMT in BIOS 
setup.\n");
+   }
+#endif

Will this generate a spurious warning on platforms that are natively non-SMT?

Yeah; it could.  I'm not sure how we can reliably detect this.  I thought about
looking for the 'ht' flag, but that probably wouldn't work for this case.

Are there AMD Zen CPUs or APUs that are non-SMT?  Could gate the
sched_smt_active() check to only run when it's an AMD x86 Zen SoC.

Some of the more budget-conscious Athlon parts don't have SMT IIRC.

Alex

In that case, I think the best solution is to just revert cf488dcd0ab7.



Alex


 return !(adev->flags & AMD_IS_APU) ||
 (pm_suspend_target_state == PM_SUSPEND_MEM);
  }
@@ -1499,16 +1507,20 @@ bool amdgpu_acpi_is_s0ix_active(struct

amdgpu_device *adev)

  * S0ix even though the system is suspending to idle, so return false
  * in that case.
  */
-   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
+   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
 dev_warn_once(adev->dev,
   "Power consumption will be higher as BIOS has 
not been

configured for suspend-to-idle.\n"

   "To use suspend-to-idle change the sleep mode in 
BIOS

setup.\n");

+   return false;
+   }

  #if !IS_ENABLED(CONFIG_AMD_PMC)
 dev_warn_once(adev->dev,
   "Power consumption will be higher as the kernel has not 
been

compiled with CONFIG_AMD_PMC.\n");

-#endif /* CONFIG_AMD_PMC */
+   return false;
+#else
 return true;
+#endif /* CONFIG_AMD_PMC */
  }

  #endif /* CONFIG_SUSPEND */
--
2.34.1

RE: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again

2023-05-30 Thread Limonciello, Mario

[AMD Official Use Only - General]

> -Original Message-
> From: Limonciello, Mario
> Sent: Tuesday, May 30, 2023 1:38 PM
> To: Rafael Ávila de Espíndola ; Alex Deucher
> 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again
>
> > As far as I know the "no S3 if SMT off" is just an oddity of the
> > particular BIOS I got on the "B550I AORUS PRO AX".
>
> In that case, maybe the message should be downgraded to INFO, and
> only shown in the case that s3 is not supported on APUs.  This will
> narrow it quite a bit then.

Here's my proposal to narrow it down better.

bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev)
{
	/* dGPUs always go to S3 */
	if (!(adev->flags & AMD_IS_APU))
		return true;
	/* the kernel has found support for S3 and user selected it */
	if (pm_suspend_target_state == PM_SUSPEND_MEM)
		return true;
#ifdef CONFIG_X86
	if (boot_cpu_has(X86_FEATURE_ZEN) && !sched_smt_active()) {
		dev_info_once(adev->dev,
			      "SMT is disabled (possibly by the BIOS).\n"
			      "To use suspend-to-ram enable SMT in BIOS setup.\n");
	}
#endif
	return false;
}
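
The same decision flow can be written as a pure function so it can be exercised outside the kernel. This is a sketch only: the struct and function names are hypothetical, and each bool field stands in for the kernel expression named in its comment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the state the proposal above reads. */
struct s3_inputs {
	bool is_apu;        /* adev->flags & AMD_IS_APU */
	bool target_is_mem; /* pm_suspend_target_state == PM_SUSPEND_MEM */
	bool is_zen;        /* boot_cpu_has(X86_FEATURE_ZEN) */
	bool smt_active;    /* sched_smt_active() */
};

struct s3_result {
	bool s3_active; /* value the function would return */
	bool hint_smt;  /* whether the "enable SMT" info message would print */
};

/* Mirror of the proposed flow, with the message reduced to a flag. */
static struct s3_result s3_decide(const struct s3_inputs *in)
{
	struct s3_result r = { false, false };

	/* dGPUs always go to S3 */
	if (!in->is_apu) {
		r.s3_active = true;
		return r;
	}
	/* S3 is supported and the user selected it */
	if (in->target_is_mem) {
		r.s3_active = true;
		return r;
	}
	/* APU not going to S3: only hint on Zen parts with SMT off */
	r.hint_smt = in->is_zen && !in->smt_active;
	return r;
}
```

The point of the reordering is that the hint can only fire for an APU that is not suspending to S3, so dGPU systems and non-Zen systems never see it. A Zen part that has no SMT at all would presumably still get the hint.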

>
> >
> > Also, what has changed that would prevent the same issue I was hitting
> > before?:
> >
> > https://gitlab.freedesktop.org/drm/amd/-/issues/2364#note_1735422
> >
>
> This commit in 6.3:
> ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than Raven")
>
> > Cheers,
> > Rafael
> >
> > "Limonciello, Mario"  writes:
> >
> > > [AMD Official Use Only - General]
> > >
> > >> -Original Message-
> > >> From: Alex Deucher 
> > >> Sent: Tuesday, May 30, 2023 1:16 PM
> > >> To: Limonciello, Mario 
> > >> Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> > >> 
> > >> Subject: Re: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support
> > again
> > >>
> > >> On Tue, May 30, 2023 at 1:53 PM Mario Limonciello
> > >>  wrote:
> > >> >
> > >> > commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> > >> showed
> > >> > improvements to power consumption over suspend when s0ix wasn't
> > >> enabled in
> > >> > BIOS and the system didn't support S3.
> > >> >
> > >> > This patch however was misguided because the reason the system
> didn't
> > >> > support S3 was because SMT was disabled in OEM BIOS setup.
> > >> > This prevented the BIOS from allowing S3.
> > >> >
> > >> > Also allowing GPUs to use the s2idle path actually causes problems if
> > >> > they're invoked on systems that may not support s2idle in the platform
> > >> > firmware. `systemd` has a tendency to try to use `s2idle` if `deep` 
> > >> > fails
> > >> > for any reason, which could lead to unexpected flows.
> > >> >
> > >> > To make this the behavior discoverable and expected, revert commit
> > >> > cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") and offer
> > >> > a message if SMT appears to be disabled.
> > >> >
> > >> > Cc: Rafael Ávila de Espíndola 
> > >> > Link:
> > >>
> >
> https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu
> > >> /amdgpu_acpi.c#L1060
> > >> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
> > >> > Signed-off-by: Mario Limonciello 
> > >> > ---
> > >> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 16
> ++--
> > >> >  1 file changed, 14 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > >> > index 3a6b2e2089f6..a3523d03d769 100644
> > >> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > >> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > >> > @@ -28,6 +28,7 @@
> > >> >  #include 
> > >> >  #include 
> > >> >  #include 
> > >> > +#include 
> > >> >  #include 
> > >> >  #include 
> > >> >  #include 
> > >> > @@ -1473,6 +1474,1

RE: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again

2023-05-30 Thread Limonciello, Mario

[AMD Official Use Only - General]

> As far as I know the "no S3 if SMT off" is just an oddity of the
> particular BIOS I got on the "B550I AORUS PRO AX".

In that case, maybe the message should be downgraded to INFO, and
only shown in the case that s3 is not supported on APUs.  This will
narrow it quite a bit then.

>
> Also, what has changed that would prevent the same issue I was hitting
> before?:
>
> https://gitlab.freedesktop.org/drm/amd/-/issues/2364#note_1735422
>

This commit in 6.3:
ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than Raven")

> Cheers,
> Rafael
>
> "Limonciello, Mario"  writes:
>
> > [AMD Official Use Only - General]
> >
> >> -----Original Message-
> >> From: Alex Deucher 
> >> Sent: Tuesday, May 30, 2023 1:16 PM
> >> To: Limonciello, Mario 
> >> Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> >> 
> >> Subject: Re: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support
> again
> >>
> >> On Tue, May 30, 2023 at 1:53 PM Mario Limonciello
> >>  wrote:
> >> >
> >> > commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> >> showed
> >> > improvements to power consumption over suspend when s0ix wasn't
> >> enabled in
> >> > BIOS and the system didn't support S3.
> >> >
> >> > This patch however was misguided because the reason the system didn't
> >> > support S3 was because SMT was disabled in OEM BIOS setup.
> >> > This prevented the BIOS from allowing S3.
> >> >
> >> > Also allowing GPUs to use the s2idle path actually causes problems if
> >> > they're invoked on systems that may not support s2idle in the platform
> >> > firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
> >> > for any reason, which could lead to unexpected flows.
> >> >
> >> > To make this the behavior discoverable and expected, revert commit
> >> > cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") and offer
> >> > a message if SMT appears to be disabled.
> >> >
> >> > Cc: Rafael Ávila de Espíndola 
> >> > Link:
> >>
> https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu
> >> /amdgpu_acpi.c#L1060
> >> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
> >> > Signed-off-by: Mario Limonciello 
> >> > ---
> >> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 16 ++--
> >> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> >> > index 3a6b2e2089f6..a3523d03d769 100644
> >> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> >> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> >> > @@ -28,6 +28,7 @@
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > +#include 
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > @@ -1473,6 +1474,13 @@ void amdgpu_acpi_release(void)
> >> >   */
> >> >  bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev)
> >> >  {
> >> > +#ifdef CONFIG_X86
> >> > +   if (!sched_smt_active()) {
> >> > +   dev_warn_once(adev->dev,
> >> > + "SMT is disabled by the BIOS.\n"
> >> > + "To use suspend-to-ram enable SMT in BIOS 
> >> > setup.\n");
> >> > +   }
> >> > +#endif
> >>
> >> Will this generate a spurious warning on platforms that are natively non-
> SMT?
> >
> > Yeah; it could.  I'm not sure how we can reliably detect this.  I thought 
> > about
> looking for
> > the 'ht' flag, but that probably wouldn't work for this case.
> >
> > Are there AMD Zen CPUs or APUs that are non-SMT?  Could gate the
> sched_smt_active()
> > check to only run when it's an AMD x86 Zen SoC.
> >
> >>
> >> Alex
> >>
> >> > return !(adev->flags & AMD_IS_APU) ||
> >> > (pm_suspend_target_state == PM_SUSPEND_MEM);
> >> >  }
> >> > @@ -1499,16 +1507,20 @@ bool amdgpu_acpi_is_s0ix_active(struct
> >> amdgpu_device *adev)
> >> >  * S0ix even though the system is suspending to idle, so return 
> >> > false
> >> >  * in that case.
> >> >  */
> >> > -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> >> > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> >> > dev_warn_once(adev->dev,
> >> >   "Power consumption will be higher as BIOS 
> >> > has not been
> >> configured for suspend-to-idle.\n"
> >> >   "To use suspend-to-idle change the sleep 
> >> > mode in BIOS
> >> setup.\n");
> >> > +   return false;
> >> > +   }
> >> >
> >> >  #if !IS_ENABLED(CONFIG_AMD_PMC)
> >> > dev_warn_once(adev->dev,
> >> >   "Power consumption will be higher as the kernel 
> >> > has not been
> >> compiled with CONFIG_AMD_PMC.\n");
> >> > -#endif /* CONFIG_AMD_PMC */
> >> > +   return false;
> >> > +#else
> >> > return true;
> >> > +#endif /* CONFIG_AMD_PMC */
> >> >  }
> >> >
> >> >  #endif /* CONFIG_SUSPEND */
> >> > --
> >> > 2.34.1
> >> >

RE: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again

2023-05-30 Thread Limonciello, Mario

[AMD Official Use Only - General]

> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, May 30, 2023 1:16 PM
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> 
> Subject: Re: [PATCH 1/2] drm/amd: Disallow s0ix without BIOS support again
>
> On Tue, May 30, 2023 at 1:53 PM Mario Limonciello
>  wrote:
> >
> > commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> showed
> > improvements to power consumption over suspend when s0ix wasn't
> enabled in
> > BIOS and the system didn't support S3.
> >
> > This patch however was misguided because the reason the system didn't
> > support S3 was because SMT was disabled in OEM BIOS setup.
> > This prevented the BIOS from allowing S3.
> >
> > Also allowing GPUs to use the s2idle path actually causes problems if
> > they're invoked on systems that may not support s2idle in the platform
> > firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails
> > for any reason, which could lead to unexpected flows.
> >
> > To make this the behavior discoverable and expected, revert commit
> > cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") and offer
> > a message if SMT appears to be disabled.
> >
> > Cc: Rafael Ávila de Espíndola 
> > Link:
> https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu
> /amdgpu_acpi.c#L1060
> > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 16 ++--
> >  1 file changed, 14 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 3a6b2e2089f6..a3523d03d769 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -28,6 +28,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -1473,6 +1474,13 @@ void amdgpu_acpi_release(void)
> >   */
> >  bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev)
> >  {
> > +#ifdef CONFIG_X86
> > +   if (!sched_smt_active()) {
> > +   dev_warn_once(adev->dev,
> > + "SMT is disabled by the BIOS.\n"
> > + "To use suspend-to-ram enable SMT in BIOS setup.\n");
> > +   }
> > +#endif
>
> Will this generate a spurious warning on platforms that are natively non-SMT?

Yeah; it could.  I'm not sure how we can reliably detect this.  I thought about
looking for the 'ht' flag, but that probably wouldn't work for this case.

Are there AMD Zen CPUs or APUs that are non-SMT?  Could gate the
sched_smt_active() check to only run when it's an AMD x86 Zen SoC.
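
The gating idea above can be sketched in userspace C. This is only a model of the decision, not kernel code: `struct cpu_info` and its fields are hypothetical stand-ins for whatever vendor/family check and SMT-capability query the driver would actually use, and `smt_active` models the result of sched_smt_active().

```c
#include <stdbool.h>

/* Hypothetical stand-ins for facts the kernel would have to query. */
struct cpu_info {
	bool is_amd_zen;   /* assumption: result of some vendor/family check */
	bool smt_capable;  /* assumption: the part implements SMT at all */
	bool smt_active;   /* models the return value of sched_smt_active() */
};

/*
 * Warn only when the CPU could run SMT but currently does not.
 * Parts that are natively non-SMT never trigger the warning.
 */
bool should_warn_smt_disabled(const struct cpu_info *cpu)
{
	if (!cpu->is_amd_zen)
		return false;
	if (!cpu->smt_capable)
		return false;
	return !cpu->smt_active;
}
```

The open question from the thread remains: the sketch assumes an `smt_capable` fact is available, which is exactly the part that is hard to detect reliably.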

>
> Alex
>
> > return !(adev->flags & AMD_IS_APU) ||
> > (pm_suspend_target_state == PM_SUSPEND_MEM);
> >  }
> > @@ -1499,16 +1507,20 @@ bool amdgpu_acpi_is_s0ix_active(struct amdgpu_device *adev)
> >  * S0ix even though the system is suspending to idle, so return false
> >  * in that case.
> >  */
> > -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as BIOS has not been configured for suspend-to-idle.\n"
> >   "To use suspend-to-idle change the sleep mode in BIOS setup.\n");
> > +   return false;
> > +   }
> >
> >  #if !IS_ENABLED(CONFIG_AMD_PMC)
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as the kernel has not been compiled with CONFIG_AMD_PMC.\n");
> > -#endif /* CONFIG_AMD_PMC */
> > +   return false;
> > +#else
> > return true;
> > +#endif /* CONFIG_AMD_PMC */
> >  }
> >
> >  #endif /* CONFIG_SUSPEND */
> > --
> > 2.34.1
> >
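
For readers skimming the second hunk: after this patch the s0ix check returns false on either missing prerequisite instead of only warning. A minimal userspace sketch of just those two conditions follows. The names `FADT_LOW_POWER_S0` and `s0ix_usable()` are illustrative only; the real function tests `acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0` and `IS_ENABLED(CONFIG_AMD_PMC)`, and performs earlier checks that the hunk does not show.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag bit standing in for ACPI_FADT_LOW_POWER_S0. */
#define FADT_LOW_POWER_S0 (1u << 21)

/*
 * Models only the two conditions visible in the hunk:
 * the firmware must advertise low-power S0, and the kernel
 * must have been built with the AMD PMC driver.
 */
bool s0ix_usable(uint32_t fadt_flags, bool amd_pmc_built)
{
	if (!(fadt_flags & FADT_LOW_POWER_S0))
		return false; /* BIOS not configured for suspend-to-idle */
	if (!amd_pmc_built)
		return false; /* kernel lacks CONFIG_AMD_PMC */
	return true;
}
```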

Re: [PATCH 0/9] Support Wifi RFI interference mitigation feature

2023-05-30 Thread Limonciello, Mario




On 5/30/2023 1:22 AM, Felix Fietkau wrote:

On 30.05.23 04:42, Evan Quan wrote:
Due to electrical and mechanical constraints in certain platform designs there may
be likely interference of relatively high-powered harmonics of the (G-)DDR memory
clocks with local radio module frequency bands used by Wifi 6/6e/7. To mitigate
possible RFI interference producers can advertise the frequencies in use and
consumers can use this information to avoid using these frequencies for
sensitive features.

The whole patch set is based on 6.4-rc3. A brief introduction to the patches follows:

Patch1: Core ACPI interfaces needed to support WBRF feature.
Patch2 - 4: Enable WBRF support for some Mediatek and Qualcomm wifi drivers.
Patch5 - 9: Enable WBRF support for AMD graphics driver.

Anson Tsao (1):
   wifi: ath11k: Add support to the Qualcomm ath11k for ACPI WBRF

Evan Quan (6):
   wifi: ath12k: Add support to the Qualcomm ath12k for ACPI WBRF
   drm/amd/pm: update driver_if and ppsmc headers for coming wbrf feature
   drm/amd/pm: setup the framework to support Wifi RFI mitigation feature
   drm/amd/pm: add flood detection for wbrf events
   drm/amd/pm: enable Wifi RFI mitigation feature support for SMU13.0.0
   drm/amd/pm: enable Wifi RFI mitigation feature support for SMU13.0.7

Mario Limonciello (2):
   drivers/acpi: Add support for Wifi band RF mitigations
   mt76: Add support to the Mediatek MT7921 for ACPI WBRF
Wouldn't it make more sense to put this in mac80211 or cfg80211 
instead of duplicating the logic in different drivers?


- Felix

I think it's generally a sensible proposal, but there are a few things
that need to be agreed upon to find the right places for everything.

1) The actual notifying: would it make sense to put it directly into
these functions?

ieee80211_add_chanctx / ieee80211_del_chanctx

2) Where should the WBRF support detection happen?

wbrf_supported_producer needs to have an argument of the ACPI companion
for the device.

At what level *should* the ACPI device be found?
Should that still be individual drivers calling a mac80211 helper
function to indicate they're opting in?
Or should there be some CONFIG_ACPI_WBRF gated helper as part of a
driver registration?
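
As background for the producer/consumer split discussed in this thread, here is a rough userspace sketch of the consumer side: given the bands a producer has advertised as in use, test whether a candidate frequency (for example a clock harmonic) falls inside one of them. `struct freq_range` and `freq_in_use()` are hypothetical names for illustration and are not the interfaces proposed in the series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one advertised band, inclusive on both ends. */
struct freq_range {
	uint64_t start_hz;
	uint64_t end_hz;
};

/* Return true if freq_hz lies inside any advertised band. */
bool freq_in_use(const struct freq_range *bands, size_t n, uint64_t freq_hz)
{
	for (size_t i = 0; i < n; i++) {
		if (freq_hz >= bands[i].start_hz && freq_hz <= bands[i].end_hz)
			return true;
	}
	return false;
}
```

Wherever the notification ends up living (individual drivers or a mac80211/cfg80211 helper), the consumer only needs the resulting list of bands, which is one argument for keeping the producer side in shared code.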

Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry

2023-05-17 Thread Limonciello, Mario




I think we replaced this with golden timestamp value which doesn't 
require GFX register access.


Ah yes; through

5591a051b86b ("drm/amdgpu: refine get gpu clock counter method")

This wasn't part of the kernel this was originally reported on.

I suspect this would significantly decrease the likelihood of it 
occurring. I'll confirm it.
I do think that patches 1/2 still make sense though because gfxoff can 
be triggered other ways too.



Confirmed that by adding:

5591a051b86b ("drm/amdgpu: refine get gpu clock counter method")
and
ea27ee2bea6b ("drm/amdgpu/gfx11: update gpu_clock_counter logic")
the original issue goes away.

I will still refine my patches and send a v3 up though as GFXOFF can be 
triggered other ways by userspace and we should avoid this bug.


@Alex:

Can you please queue up ea27ee2bea6b for this week's fixes and include 
the tags:


Cc: sta...@vger.kernel.org # 6.1.y: 5591a051b86b: drm/amdgpu: refine get 
gpu clock counter method
Cc: sta...@vger.kernel.org # 6.2.y: 5591a051b86b: drm/amdgpu: refine get 
gpu clock counter method
Cc: sta...@vger.kernel.org # 6.3.y: 5591a051b86b: drm/amdgpu: refine get 
gpu clock counter method



Here are the function calls with the patched kernel:

[   32.720456] amdgpu :c2:00.0: amdgpu: set GFX off state to enabled, count:1
[   32.720457] amdgpu :c2:00.0: amdgpu: broke gfx_off_mutex for gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], adev->gfx.gfx_off_state is 0
[   32.760475] PM: suspend entry (s2idle)
[   32.768996] Filesystems sync: 0.008 seconds
[   32.769310] Freezing user space processes
[   32.776527] Freezing user space processes completed (elapsed 0.007 seconds)
[   32.776530] OOM killer disabled.
[   32.776531] Freezing remaining freezable tasks
[   32.777528] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[   32.777531] printk: Suspending console(s) (use no_console_suspend to debug)
[   32.817853] amdgpu :c2:00.0: amdgpu: Delayed work to enable gfxoff
[   32.817857] amdgpu :c2:00.0: amdgpu: amdgpu_dpm_set_powergating_by_smu by amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.818142] amdgpu :c2:00.0: amdgpu: broke pm.mutex for amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.852099] amdgpu :c2:00.0: amdgpu: smu_suspend: suspend called
[   32.852101] amdgpu :c2:00.0: amdgpu: smu_disable_dpms: called

Without patch 1 the delayed work doesn't get called on entry ever.


Can we remove this code also as there is a flush anyway with patch 1?


Sure.  Do you think it should go into patch 1 or on its own?



Preferably in patch 1 itself as it explains why it was removed.

OK.


Also, is there a need to call GFXOFF forcefully on S0ix suspend 
(any chance that gfxoff is not scheduled)?


If using "echo mem | sudo tee /sys/power/state" I've confirmed that 
it's already in GFXOFF.  I don't think this case should happen.

2) RLC is never stopped on GFX 10 or greater.


System was hanging before this series.

Patch 3 "alone" matches this behavior as described above to skip 
RLC suspend but two problems happen:


1) GFXOFF workqueue doesn't get flushed and so driver's request 
for GFXOFF can happen at wrong time.


2) If suspend entry happens before GFXOFF is really asserted lots 
of errors on resume. IE:




Is patch 3 really required?  Does it make any difference?


No; patch 3 isn't really required with patches 1 and 2.



My preference is to drop patch 3 and not to have an additional place 
of in_s0ix check.

OK.


Thanks,
Lijo

Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry

2023-05-16 Thread Limonciello, Mario




On 5/17/2023 12:26 AM, Lazar, Lijo wrote:



On 5/17/2023 10:46 AM, Limonciello, Mario wrote:


On 5/17/2023 12:07 AM, Lazar, Lijo wrote:



On 5/17/2023 10:25 AM, Limonciello, Mario wrote:


On 5/16/2023 11:43 PM, Lazar, Lijo wrote:



On 5/17/2023 5:04 AM, Mario Limonciello wrote:

DCN 3.1.4 will hang occasionally on s2idle entry, but only if running
Wayland and only when using `systemctl suspend`, not
`echo mem | tee /sys/power/state`.


This happens because using `systemctl suspend` will cause the screen
to lock right before writing mem into /sys/power/state.



A couple of things on this since this mentions systemctl suspend -

1) If in s2idle, it's supposed to immediately signal and not 
schedule delayed work


3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete 
gfxoff allow signal during suspend without delay


It looks like dead code to me now actually.

amdgpu_device_set_pg_state() skips GFX, so gfxoff control never 
gets called as part of suspend path.




Ok, that means schedule happened sometime before. 
To come up with these patches I had a test kernel with extra prints 
that showed the function call orders.


With systemctl suspend there is a call to 
gfx_v11_0_get_gpu_clock_counter() from userspace IOCTL that triggers 
all this behavior. 


I think we replaced this with golden timestamp value which doesn't 
require GFX register access.


Ah yes; through

5591a051b86b ("drm/amdgpu: refine get gpu clock counter method")

This wasn't part of the kernel this was originally reported on.

I suspect this would significantly decrease the likelihood of it 
occurring.  I'll confirm it.
I do think that patches 1/2 still make sense though because gfxoff can 
be triggered other ways too.




Here are the function calls with the patched kernel:

[   32.720456] amdgpu :c2:00.0: amdgpu: set GFX off state to enabled, count:1
[   32.720457] amdgpu :c2:00.0: amdgpu: broke gfx_off_mutex for gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], adev->gfx.gfx_off_state is 0
[   32.760475] PM: suspend entry (s2idle)
[   32.768996] Filesystems sync: 0.008 seconds
[   32.769310] Freezing user space processes
[   32.776527] Freezing user space processes completed (elapsed 0.007 seconds)
[   32.776530] OOM killer disabled.
[   32.776531] Freezing remaining freezable tasks
[   32.777528] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[   32.777531] printk: Suspending console(s) (use no_console_suspend to debug)
[   32.817853] amdgpu :c2:00.0: amdgpu: Delayed work to enable gfxoff
[   32.817857] amdgpu :c2:00.0: amdgpu: amdgpu_dpm_set_powergating_by_smu by amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.818142] amdgpu :c2:00.0: amdgpu: broke pm.mutex for amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.852099] amdgpu :c2:00.0: amdgpu: smu_suspend: suspend called
[   32.852101] amdgpu :c2:00.0: amdgpu: smu_disable_dpms: called

Without patch 1 the delayed work doesn't get called on entry ever.


Can we remove this code also as there is a flush anyway with patch 1?


Sure.  Do you think it should go into patch 1 or on its own?



Preferably in patch 1 itself as it explains why it was removed.

OK.


Also, is there a need to call GFXOFF forcefully on S0ix suspend (any 
chance that gfxoff is not scheduled)?


If using "echo mem | sudo tee /sys/power/state" I've confirmed that 
it's already in GFXOFF.  I don't think this case should happen.

2) RLC is never stopped on GFX 10 or greater.


System was hanging before this series.

Patch 3 "alone" matches this behavior as described above to skip 
RLC suspend but two problems happen:


1) GFXOFF workqueue doesn't get flushed and so driver's request for 
GFXOFF can happen at wrong time.


2) If suspend entry happens before GFXOFF is really asserted lots 
of errors on resume. IE:




Is patch 3 really required?  Does it make any difference?


No; patch 3 isn't really required with patches 1 and 2.



My preference is to drop patch 3 and not to have an additional place 
of in_s0ix check.

OK.


Thanks,
Lijo

Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry

2023-05-16 Thread Limonciello, Mario




On 5/17/2023 12:07 AM, Lazar, Lijo wrote:



On 5/17/2023 10:25 AM, Limonciello, Mario wrote:


On 5/16/2023 11:43 PM, Lazar, Lijo wrote:



On 5/17/2023 5:04 AM, Mario Limonciello wrote:

DCN 3.1.4 will hang occasionally on s2idle entry, but only if running
Wayland and only when using `systemctl suspend`, not
`echo mem | tee /sys/power/state`.

This happens because using `systemctl suspend` will cause the screen
to lock right before writing mem into /sys/power/state.



A couple of things on this since this mentions systemctl suspend -

1) If in s2idle, it's supposed to immediately signal and not 
schedule delayed work


3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
allow signal during suspend without delay


It looks like dead code to me now actually.

amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
called as part of suspend path.




Ok, that means schedule happened sometime before. 
To come up with these patches I had a test kernel with extra prints that 
showed the function call orders.


With systemctl suspend there is a call to
gfx_v11_0_get_gpu_clock_counter() from a userspace IOCTL that triggers all
this behavior.  Here are the function calls with the patched kernel:

[   32.720456] amdgpu :c2:00.0: amdgpu: set GFX off state to enabled, count:1
[   32.720457] amdgpu :c2:00.0: amdgpu: broke gfx_off_mutex for gfx_v11_0_get_gpu_clock_counter+0xa8/0xf0 [amdgpu], adev->gfx.gfx_off_state is 0
[   32.760475] PM: suspend entry (s2idle)
[   32.768996] Filesystems sync: 0.008 seconds
[   32.769310] Freezing user space processes
[   32.776527] Freezing user space processes completed (elapsed 0.007 seconds)
[   32.776530] OOM killer disabled.
[   32.776531] Freezing remaining freezable tasks
[   32.777528] Freezing remaining freezable tasks completed (elapsed 0.000 seconds)
[   32.777531] printk: Suspending console(s) (use no_console_suspend to debug)
[   32.817853] amdgpu :c2:00.0: amdgpu: Delayed work to enable gfxoff
[   32.817857] amdgpu :c2:00.0: amdgpu: amdgpu_dpm_set_powergating_by_smu by amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.818142] amdgpu :c2:00.0: amdgpu: broke pm.mutex for amdgpu_device_delay_enable_gfx_off.cold+0x29/0x46 [amdgpu]
[   32.852099] amdgpu :c2:00.0: amdgpu: smu_suspend: suspend called
[   32.852101] amdgpu :c2:00.0: amdgpu: smu_disable_dpms: called

Without patch 1 the delayed work doesn't get called on entry ever.


Can we remove this code also as there is a flush anyway with patch 1?


Sure.  Do you think it should go into patch 1 or on its own?

Also, is there a need to call GFXOFF forcefully on S0ix suspend (any 
chance that gfxoff is not scheduled)?


If using "echo mem | sudo tee /sys/power/state" I've confirmed that it's 
already in GFXOFF.  I don't think this case should happen.

2) RLC is never stopped on GFX 10 or greater.


System was hanging before this series.

Patch 3 "alone" matches this behavior as described above to skip RLC 
suspend but two problems happen:


1) GFXOFF workqueue doesn't get flushed and so driver's request for 
GFXOFF can happen at wrong time.


2) If suspend entry happens before GFXOFF is really asserted lots of 
errors on resume. IE:




Is patch 3 really required?  Does it make any difference?


No; patch 3 isn't really required with patches 1 and 2.


Thanks,
Lijo


[   63.095227] [drm] Fence fallback timer expired on ring sdma0
[   63.098360] [drm] ring gfx_32772.1.1 was added
[   63.099439] [drm] ring compute_32772.2.2 was added
[   63.100460] [drm] ring sdma_32772.3.3 was added
[   63.100504] [drm] ring gfx_32772.1.1 test pass
[   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
[   63.607234] [drm] ring gfx_32772.1.1 ib test pass
[   63.608964] [drm] ring compute_32772.2.2 test pass
[   64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
[   64.119219] [drm] ring compute_32772.2.2 ib test pass
[   64.121364] [drm] ring sdma_32772.3.3 test pass
[   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
[   64.631465] [drm] ring sdma_32772.3.3 ib test pass
[   65.143184] [drm] Fence fallback timer expired on ring sdma0


Wondering if the code hides something else because of the timing.
Thanks,
Lijo


This causes a delayed GFXOFF entry to be scheduled right before s2idle
entry.  If the workqueue doesn't get processed before the RLC is turned
off the system is hung. Even if the workqueue *does* get processed, there
is a race between the APU microcontrollers and driver for whether GFX
is actually powered off when RLC is turned off.

To avoid this issue, flush the workqueue on s2idle entry and ensure that
GFX is really in GFXOFF before any sensitive register accesses occur.

Mario Limonciello (3):
   drm/amd: Flush any delayed gfxoff on suspend entry
   drm/amd: Poll for GFX core to be off
   drm/amd: Skip RLC suspend for s0ix on

Re: [PATCH v2 0/3] Fix DCN 3.1.4 hangs on s2idle entry

2023-05-16 Thread Limonciello, Mario



On 5/16/2023 11:43 PM, Lazar, Lijo wrote:



On 5/17/2023 5:04 AM, Mario Limonciello wrote:

DCN 3.1.4 will hang occasionally on s2idle entry, but only if running
Wayland and only when using `systemctl suspend`, not
`echo mem | tee /sys/power/state`.

This happens because using `systemctl suspend` will cause the screen
to lock right before writing mem into /sys/power/state.



A couple of things on this since this mentions systemctl suspend -

1) If in s2idle, it's supposed to immediately signal and not schedule 
delayed work


3964b0c2e843334858da99db881859faa4df241d drm/amdgpu: complete gfxoff 
allow signal during suspend without delay


It looks like dead code to me now actually.

amdgpu_device_set_pg_state() skips GFX, so gfxoff control never gets 
called as part of suspend path.




2) RLC is never stopped on GFX 10 or greater.


System was hanging before this series.

Patch 3 "alone" matches this behavior as described above to skip RLC 
suspend but two problems happen:


1) GFXOFF workqueue doesn't get flushed and so driver's request for 
GFXOFF can happen at wrong time.


2) If suspend entry happens before GFXOFF is really asserted lots of 
errors on resume. IE:


[   63.095227] [drm] Fence fallback timer expired on ring sdma0
[   63.098360] [drm] ring gfx_32772.1.1 was added
[   63.099439] [drm] ring compute_32772.2.2 was added
[   63.100460] [drm] ring sdma_32772.3.3 was added
[   63.100504] [drm] ring gfx_32772.1.1 test pass
[   63.607166] [drm] Fence fallback timer expired on ring gfx_32772.1.1
[   63.607234] [drm] ring gfx_32772.1.1 ib test pass
[   63.608964] [drm] ring compute_32772.2.2 test pass
[   64.119173] [drm] Fence fallback timer expired on ring compute_32772.2.2
[   64.119219] [drm] ring compute_32772.2.2 ib test pass
[   64.121364] [drm] ring sdma_32772.3.3 test pass
[   64.631422] [drm] Fence fallback timer expired on ring sdma_32772.3.3
[   64.631465] [drm] ring sdma_32772.3.3 ib test pass
[   65.143184] [drm] Fence fallback timer expired on ring sdma0


Wondering if the code hides something else because of the timing.
Thanks,
Lijo


This causes a delayed GFXOFF entry to be scheduled right before s2idle
entry.  If the workqueue doesn't get processed before the RLC is turned
off the system is hung. Even if the workqueue *does* get processed, there
is a race between the APU microcontrollers and driver for whether GFX
is actually powered off when RLC is turned off.

To avoid this issue, flush the workqueue on s2idle entry and ensure that
GFX is really in GFXOFF before any sensitive register accesses occur.

Mario Limonciello (3):
   drm/amd: Flush any delayed gfxoff on suspend entry
   drm/amd: Poll for GFX core to be off
   drm/amd: Skip RLC suspend for s0ix on PSP 13.0.4 and 13.0.11

  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 25 ++
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 18 
  drivers/gpu/drm/amd/include/amd_shared.h   |  1 +
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c  |  4 ++--
  4 files changed, 46 insertions(+), 2 deletions(-)
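
The two-step approach described in the cover letter (flush the pending GFXOFF work, then poll until the hardware really reports off) can be modelled in userspace C as below. This is a sketch under stated assumptions, not the code from the patches: `struct gfx_model`, `gfx_is_off()` and `suspend_prepare()` are hypothetical, and the "hardware" is simulated by a countdown.

```c
#include <stdbool.h>

/* Hypothetical model of the driver and hardware state involved. */
struct gfx_model {
	bool gfxoff_work_pending; /* a delayed GFXOFF request is still queued */
	int polls_until_off;      /* simulated latency before GFX reports off */
};

/* Step 1: run the queued request now instead of waiting for its timer. */
void flush_gfxoff_work(struct gfx_model *g)
{
	g->gfxoff_work_pending = false;
}

/* Simulated status read: reports off only after the countdown expires. */
bool gfx_is_off(struct gfx_model *g)
{
	if (g->polls_until_off > 0) {
		g->polls_until_off--;
		return false;
	}
	return true;
}

/*
 * Step 2: poll with a bound so a stuck core cannot hang suspend forever.
 * Returns 0 once GFX is off, -1 if it never got there within max_polls.
 */
int suspend_prepare(struct gfx_model *g, int max_polls)
{
	flush_gfxoff_work(g);
	for (int i = 0; i < max_polls; i++) {
		if (gfx_is_off(g))
			return 0;
	}
	return -1;
}
```

The ordering matters: polling without the flush could wait on a request that has not been issued yet, which is the first failure mode described above.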

Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Limonciello, Mario




On 5/16/2023 4:57 PM, Alex Deucher wrote:

On Tue, May 16, 2023 at 5:50 PM Limonciello, Mario  wrote:


On 5/16/2023 4:39 PM, Alex Deucher wrote:

On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
 wrote:

On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
Prevent this case from ever happening.

Tested-by: Juan Martinez 
Signed-off-by: Mario Limonciello 
---
   drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dcbdb2641086..f1f879d9ed8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
*adev)
   {
  u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);

+   if (!adev->gfx.gfx_off_state) {
+   dev_err(adev->dev, "GFX is not in GFXOFF\n");
+   return;
+   }

This should move up before the RREG above?  Also, I think it would be
cleaner to just not mess with the RLC in S0i3.  Can we just return
early in smu_disable_dpms() for the APU case?  All of the DPM features
are controlled by the SMU so that function is mostly a nop on APUs
anyway.

Alex

That was what the original attempt did when we first identified this issue.
Unfortunately though just skipping RLC (without patches 1 and 2) means
that GFXOFF still either doesn't get toggled at suspend entry or isn't fully
off at suspend entry.

This leads to the graphics core behaving erratically upon resume.

So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip
RLC for APU.

Sure.

OK, let me double check RLC skip and I'll send out a v2.

I wonder if we need something similar as patch 2 for other APUs?
I expect patch 1 "alone" to help Renoir and Cezanne hitting a similar 
circumstance.

For Rembrandt and Mendocino, they don't have IMU, so what would you poll?


Thinking out loud here, I wonder if we shouldn't just return early in
the top level suspend/resume functions for S0i3.


I think this can make sense for GFX10 and GFX11 maybe, but as it's already
bifurcated I think it's probably better to do case by case basis.
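
Alex's first review point (do the GFXOFF check before the register read, not after) can be illustrated with a small userspace model. Everything here is hypothetical scaffolding: `struct dev_model` stands in for the device, `read_rlc_cntl()` for the RREG32_SOC15() access, and bit 0 for an enable bit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical device model; reg_reads counts simulated register reads. */
struct dev_model {
	bool gfx_off_state;
	uint32_t rlc_cntl;
	unsigned int reg_reads;
};

/* Stand-in for the register read done at the top of the real function. */
uint32_t read_rlc_cntl(struct dev_model *d)
{
	d->reg_reads++;
	return d->rlc_cntl;
}

/* Returns true if the RLC was stopped, false if the guard refused. */
bool rlc_stop(struct dev_model *d)
{
	uint32_t tmp;

	/* Guard first, so no register is touched while GFX is not off. */
	if (!d->gfx_off_state)
		return false;

	tmp = read_rlc_cntl(d);
	d->rlc_cntl = tmp & ~1u; /* clear the (hypothetical) enable bit */
	return true;
}
```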

Re: [PATCH 3/3] drm/amd: Add safety check to make sure RLC is only turned off while in GFXOFF

2023-05-16 Thread Limonciello, Mario




On 5/16/2023 4:39 PM, Alex Deucher wrote:

On Tue, May 16, 2023 at 2:15 PM Mario Limonciello
 wrote:

On GFX11 if RLC is stopped when not in GFXOFF the system will hang.
Prevent this case from ever happening.

Tested-by: Juan Martinez 
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index dcbdb2641086..f1f879d9ed8d 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1766,6 +1766,10 @@ static void gfx_v11_0_rlc_stop(struct amdgpu_device 
*adev)
  {
 u32 tmp = RREG32_SOC15(GC, 0, regRLC_CNTL);

+   if (!adev->gfx.gfx_off_state) {
+   dev_err(adev->dev, "GFX is not in GFXOFF\n");
+   return;
+   }

This should move up before the RREG above?  Also, I think it would be
cleaner to just not mess with the RLC in S0i3.  Can we just return
early in smu_disable_dpms() for the APU case?  All of the DPM features
are controlled by the SMU so that function is mostly a nop on APUs
anyway.

Alex

That was what the original attempt did when we first identified this issue.
Unfortunately though just skipping RLC (without patches 1 and 2) means
that GFXOFF still either doesn't get toggled at suspend entry or isn't fully
off at suspend entry.

This leads to the graphics core behaving erratically upon resume.

So if you're OK with patches 1 and 2, I'll adjust patch 3 to also skip
RLC for APU.

Re: drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()

2023-05-03 Thread Limonciello, Mario




On 5/2/2023 11:51 AM, Hamza Mahfooz wrote:

As mentioned in commit 9128e6babf10 ("drm/amdgpu: fix
amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit c094b8923bdd
("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"), it
is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it
from gmc_v9_0_hw_fini().

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522
Fixes: 3029c855d79f ("drm/amdgpu: Fix desktop freezed after gpu-reset")
Signed-off-by: Hamza Mahfooz 


Reviewed-by: Mario Limonciello 


---
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 1 -
  1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 290804a06e05..6ae5cee9b64b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -1999,7 +1999,6 @@ static int gmc_v9_0_hw_fini(void *handle)
if (adev->mmhub.funcs->update_power_gating)
adev->mmhub.funcs->update_power_gating(adev, false);
  
-	amdgpu_irq_put(adev, &adev->gmc.ecc_irq, 0);
 	amdgpu_irq_put(adev, &adev->gmc.vm_fault, 0);
  
  	return 0;

RE: [PATCH] drm/amd/pm: conditionally disable pcie lane switching for some sienna_cichlid SKUs

2023-04-21 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Quan, Evan 
> Sent: Friday, April 21, 2023 02:29
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Limonciello, Mario
> ; Quan, Evan 
> Subject: [PATCH] drm/amd/pm: conditionally disable pcie lane switching for
> some sienna_cichlid SKUs
> 
> Disable the pcie lane switching for some sienna_cichlid SKUs since it
> might not work well on some platforms.
> 
> Signed-off-by: Evan Quan 
> Change-Id: Iea9ceaa146c8706768ee077c10e5d33bce9bc1c2

You can drop the Gerrit Change-Id here

> ---
>  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 92 +++
>  1 file changed, 74 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> index 4b91cdc3eaa0..e7223513e384 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
> @@ -2067,33 +2067,94 @@ static int
> sienna_cichlid_display_disable_memory_clock_switch(struct smu_context
>   return ret;
>  }
> 
> +static void sienna_cichlid_get_override_pcie_settings(struct smu_context *smu,
> +   uint32_t *gen_speed_override,
> +   uint32_t *lane_width_override)
> +{
> + struct amdgpu_device *adev = smu->adev;
> +
> + *gen_speed_override = 0xff;
> + *lane_width_override = 0xff;
> +
> + switch (adev->pdev->device) {
> + case 0x73A0:
> + case 0x73A1:
> + case 0x73A2:
> + case 0x73A3:
> + case 0x73AB:
> + case 0x73AE:
> + /* Bit 7:0: PCIE lane width, 1 to 7 corresponds is x1 to x32 */
> + *lane_width_override = 6;
> + break;
> + case 0x73E0:
> + case 0x73E1:
> + case 0x73E3:
> + *lane_width_override = 4;
> + break;
> + case 0x7420:
> + case 0x7421:
> + case 0x7422:
> + case 0x7423:
> + case 0x7424:
> + *lane_width_override = 3;
> + break;
> + default:
> + break;
> + }
> +}
> +
> +#define MAX(a, b)((a) > (b) ? (a) : (b))
> +
>  static int sienna_cichlid_update_pcie_parameters(struct smu_context *smu,
>uint32_t pcie_gen_cap,
>uint32_t pcie_width_cap)
>  {
>   struct smu_11_0_dpm_context *dpm_context = smu->smu_dpm.dpm_context;
> -
> - uint32_t smu_pcie_arg;
> + struct smu_11_0_pcie_table *pcie_table = &dpm_context->dpm_tables.pcie_table;
> + uint32_t gen_speed_override, lane_width_override;
>   uint8_t *table_member1, *table_member2;
> + uint32_t min_gen_speed, max_gen_speed;
> + uint32_t min_lane_width, max_lane_width;
> + uint32_t smu_pcie_arg;
>   int ret, i;
> 
>   GET_PPTABLE_MEMBER(PcieGenSpeed, &table_member1);
>   GET_PPTABLE_MEMBER(PcieLaneCount, &table_member2);
> 
> - /* lclk dpm table setup */
> - for (i = 0; i < MAX_PCIE_CONF; i++) {
> - dpm_context->dpm_tables.pcie_table.pcie_gen[i] = table_member1[i];
> - dpm_context->dpm_tables.pcie_table.pcie_lane[i] = table_member2[i];
> + sienna_cichlid_get_override_pcie_settings(smu,
> +   &gen_speed_override,
> +   &lane_width_override);
> +
> + /* PCIE gen speed override */
> + if (gen_speed_override != 0xff) {
> + min_gen_speed = MIN(pcie_gen_cap, gen_speed_override);
> + max_gen_speed = MIN(pcie_gen_cap, gen_speed_override);
> + } else {
> + min_gen_speed = MAX(0, table_member1[0]);
> + max_gen_speed = MIN(pcie_gen_cap, table_member1[1]);
> + min_gen_speed = min_gen_speed > max_gen_speed ?
> + max_gen_speed : min_gen_speed;
>   }
> + pcie_table->pcie_gen[0] = min_gen_speed;
> + pcie_table->pcie_gen[1] = max_gen_speed;
> +
> + /* PCIE lane width override */
> + if (lane_width_override != 0xff) {
> + min_lane_width = MIN(pcie_width_cap, lane_width_override);
> + max_lane_width = MIN(pcie_width_cap, lane_width_override);
> + } else {
> + min_lane_width = MAX(1, table_member2[0]);
> + max_lane_width = MIN(pcie_width_cap, table_member2[1]);
> + min_lane_width = min_lane_width > max_lane_width ?
> +  max_lane_width : min_lane_width;
> + }
> + p
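
The min/max selection in the quoted hunk follows one pattern for both gen speed and lane width: with an override, pin both ends to the override clamped by the platform cap; without one, take the table values, clamp the top to the cap, and make sure the bottom does not exceed the top. A standalone sketch of that pattern follows. The names `pick_range()`, `struct level_range` and `NO_OVERRIDE` are illustrative and not from the patch, which uses the literal 0xff and open-coded MIN/MAX.

```c
#include <stdint.h>

/* Sentinel meaning "no override for this device", 0xff in the patch. */
#define NO_OVERRIDE 0xffu

struct level_range {
	uint32_t min;
	uint32_t max;
};

uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }
uint32_t max_u32(uint32_t a, uint32_t b) { return a > b ? a : b; }

/*
 * cap:        what the platform supports
 * override:   per-device override, or NO_OVERRIDE
 * table_min,
 * table_max:  values from the power-play table
 * floor:      lowest legal level (0 for gen speed, 1 for lane width)
 */
struct level_range pick_range(uint32_t cap, uint32_t override,
			      uint32_t table_min, uint32_t table_max,
			      uint32_t floor)
{
	struct level_range r;

	if (override != NO_OVERRIDE) {
		/* Pinning min == max is what disables switching. */
		r.min = r.max = min_u32(cap, override);
	} else {
		r.min = max_u32(floor, table_min);
		r.max = min_u32(cap, table_max);
		if (r.min > r.max)
			r.min = r.max;
	}
	return r;
}
```

With an override in place the range collapses to a single level, which is how the patch disables lane switching on the listed device IDs.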

Re: drm/amdgpu: skip kfd-iommu suspend/resume for S0ix

2023-04-12 Thread Limonciello, Mario


On 4/5/2023 06:29, Aaron Liu wrote:

GFX is in gfxoff mode during s0ix so we shouldn't need to
actually execute kfd_iommu_suspend/kfd_iommu_resume operation.

Signed-off-by: Aaron Liu 
Acked-by: Alex Deucher 

Reviewed-by: Mario Limonciello 

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3b6b85d9e0be..5094be94fa06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3304,9 +3304,11 @@ static int amdgpu_device_ip_resume(struct amdgpu_device 
*adev)
  {
int r;
  
-	r = amdgpu_amdkfd_resume_iommu(adev);

-   if (r)
-   return r;
+   if (!adev->in_s0ix) {
+   r = amdgpu_amdkfd_resume_iommu(adev);
+   if (r)
+   return r;
+   }
  
  	r = amdgpu_device_ip_resume_phase1(adev);

if (r)

Re: [PATCH v6 0/2] Send message to PMFW when SMT changes

2023-04-07 Thread Limonciello, Mario


On 4/7/2023 00:38, Wenyou Yang wrote:

When the SMT state changes on the fly, send a message to the PMFW
to notify it that the SMT status has changed.

Changes in v6
1./ Update last_smt_active only when smu_set_cpu_smt_enable()
returns successfully.
2./ Use smu->adev->pm.fw_version to check smu version, if it is not
assigned, get the smu version and assign it.
3./ Remove the redundant error message print.

Changes in v5
1./ Add a new vangogh_fini_smc_tables() to accommodate the timer fini
and smu_v11_0_fini_smc_tables().
2./ Move the version check of SMU version before initializing the timer.

Changes in v4
1./ Since we didn't find a good solution to handle the case of
manually offlining all the SMT siblings using
/sys/devices/system/cpu/cpu*/online to disable or enable SMT,
come up with a new solution: add a timer to poll the SMT state
periodically; if the SMT state has changed, invoke
the interface to notify the PMFW.
2./ Move the generic code to smu_cmn.c.
3./ Add PMFW version check for this feature.

Changes in v3
1./ Because it is only required for Vangogh, move registering notifier
to vangogh_ppt.c, then remove the patch 2, and the number of patches
decreased to 2.

Changes in v2:
1/. Embed the smt notifier callback into "struct smu_context" structure.
2/. Correct the PPSMC_Message_Count value.
3/. Improve several code styles and others.

Wenyou Yang (2):
   drm/amd/pm: Add support to check SMT state periodically
   drm/amd/pm/vangogh: Send SMT enable message to PMFW

  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  8 
  .../pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h|  3 +-
  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 +-
  .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 32 +-
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 44 +++
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h|  5 +++
  6 files changed, 92 insertions(+), 3 deletions(-)



For series:

Reviewed-by: Mario Limonciello

Re: [PATCH v5 1/2] drm/amd/pm: Add support to check SMT state periodically

2023-04-06 Thread Limonciello, Mario


On 4/6/2023 07:45, Wenyou Yang wrote:

Add a timer to poll the SMT state periodically, if the SMT state
is changed, invoke the interface to notify the PMFW.

Signed-off-by: Wenyou Yang 
---
  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  8 
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c| 44 +++
  drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h|  5 +++
  3 files changed, 57 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 09469c750a96..fc571c122e87 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -566,6 +566,9 @@ struct smu_context
  
  	struct firmware pptable_firmware;
  
+	bool last_smt_active;

+   struct timer_list smt_timer;
+
u32 param_reg;
u32 msg_reg;
u32 resp_reg;
@@ -1354,6 +1357,11 @@ struct pptable_funcs {
 * @init_pptable_microcode: Prepare the pptable microcode to upload via 
PSP
 */
int (*init_pptable_microcode)(struct smu_context *smu);
+
+   /**
+* @set_cpu_smt_enable: Set the CPU SMT status.
+*/
+   int (*set_cpu_smt_enable)(struct smu_context *smu, bool smt_enable);
  };
  
  typedef enum {

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
index 3ecb900e6ecd..b0e0c6664ac3 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.c
@@ -26,6 +26,7 @@
  #include "amdgpu_smu.h"
  #include "smu_cmn.h"
  #include "soc15_common.h"
+#include 
  
  /*

   * DO NOT use these for err/warn/info/debug messages.
@@ -1058,3 +1059,46 @@ bool smu_cmn_is_audio_func_enabled(struct amdgpu_device 
*adev)
  
  	return snd_driver_loaded;

  }
+
+#define TIME_INTERVAL  200
+
+static int smu_set_cpu_smt_enable(struct smu_context *smu, bool enable)
+{
+   int ret = -EINVAL;
+
+   if (smu->ppt_funcs && smu->ppt_funcs->set_cpu_smt_enable)
+   ret = smu->ppt_funcs->set_cpu_smt_enable(smu, enable);
+
+   return ret;
+}
+
+static void smu_smt_timer_callback(struct timer_list *timer)
+{
+   struct smu_context *smu = container_of(timer,
+  struct smu_context, smt_timer);
+   bool smt_active;
+
+   smt_active = sched_smt_active();
+   if (smt_active != smu->last_smt_active) {
+   smu->last_smt_active = smt_active;
+   smu_set_cpu_smt_enable(smu, smt_active);


You're ignoring the return value of smu_set_cpu_smt_enable().  If the 
message fails to send, smu->last_smt_active will hold the wrong value 
and the message will never be sent again while in this SMT state, even 
though the timer fires again.

I think you should do it like this:

	if (!smu_set_cpu_smt_enable(smu, smt_active))
		smu->last_smt_active = smt_active;
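
A minimal userspace sketch of that ordering, assuming hypothetical names 
(poll_state, poll_tick, notify_ok, notify_fail) that are not part of the 
patch.  It only illustrates why the cached state should be updated after a 
successful send: a failed send leaves the cache stale, so the next tick 
tries again.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct smu_context. */
struct poll_state {
	bool last_smt_active;	/* last state that was successfully reported */
	int sends_attempted;	/* number of notification attempts so far */
};

/* Hypothetical sender: returns 0 on success, non-zero on failure. */
typedef int (*notify_fn)(bool smt_active);

/*
 * One poll tick.  The cached state is only updated once the notification
 * succeeded, so a failed send is retried on the next tick.
 */
static void poll_tick(struct poll_state *st, bool smt_active, notify_fn notify)
{
	if (smt_active == st->last_smt_active)
		return;

	st->sends_attempted++;
	if (!notify(smt_active))
		st->last_smt_active = smt_active;
}

/* Two fake senders used to exercise both paths. */
static int notify_fail(bool smt_active) { (void)smt_active; return -1; }
static int notify_ok(bool smt_active) { (void)smt_active; return 0; }
```

With the original ordering (cache first, send second), the second tick 
would see no difference and never retry.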


+   }
+
+   mod_timer(timer, jiffies + msecs_to_jiffies(TIME_INTERVAL));
+}
+
+void smu_smt_timer_init(struct smu_context *smu)
+{
+   struct timer_list *timer = &smu->smt_timer;
+
+   smu->last_smt_active = sched_smt_active();
+
+   timer_setup(timer, smu_smt_timer_callback, 0);
+
+   mod_timer(timer, jiffies + msecs_to_jiffies(TIME_INTERVAL));
+}
+
+void smu_smt_timer_fini(struct smu_context *smu)
+{
+   del_timer(&smu->smt_timer);
+}
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h 
b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h
index d7cd358a53bd..928dd9e30d83 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu_cmn.h
@@ -127,5 +127,10 @@ static inline void smu_cmn_get_sysfs_buf(char **buf, int 
*offset)
  
  bool smu_cmn_is_audio_func_enabled(struct amdgpu_device *adev);
  
+void smu_smt_timer_init(struct smu_context *smu);

+
  #endif
+
+void smu_smt_timer_fini(struct smu_context *smu);
+
  #endif

Re: [PATCH v5 2/2] drm/amd/pm/vangogh: Send SMT enable message to PMFW

2023-04-06 Thread Limonciello, Mario


On 4/6/2023 07:45, Wenyou Yang wrote:

When the SMT state is changed on the fly, send the SMT enable
message to the PMFW to notify it that the SMT state changed.

Add the support to send PPSMC_MSG_SetCClkSMTEnable(0x58) message
to the PMFW for Vangogh.

Signed-off-by: Wenyou Yang 
---
  .../pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h|  3 +-
  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 +-
  .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 32 ++-
  3 files changed, 35 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
index 7471e2df2828..a6bfa1912c42 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
@@ -111,7 +111,8 @@
  #define PPSMC_MSG_GetGfxOffStatus0x50
  #define PPSMC_MSG_GetGfxOffEntryCount0x51
  #define PPSMC_MSG_LogGfxOffResidency 0x52
-#define PPSMC_Message_Count0x53
+#define PPSMC_MSG_SetCClkSMTEnable0x58
+#define PPSMC_Message_Count0x59
  
  //Argument for PPSMC_MSG_GfxDeviceDriverReset

  enum {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 297b70b9388f..820812d910bf 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -245,7 +245,8 @@
__SMU_DUMMY_MAP(AllowGpo),  \
__SMU_DUMMY_MAP(Mode2Reset),\
__SMU_DUMMY_MAP(RequestI2cTransaction), \
-   __SMU_DUMMY_MAP(GetMetricsTable),
+   __SMU_DUMMY_MAP(GetMetricsTable), \
+   __SMU_DUMMY_MAP(SetCClkSMTEnable),
  
  #undef __SMU_DUMMY_MAP

  #define __SMU_DUMMY_MAP(type) SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 7433dcaa16e0..ca1ff97f3353 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -141,6 +141,7 @@ static struct cmn2asic_msg_mapping 
vangogh_message_map[SMU_MSG_MAX_COUNT] = {
MSG_MAP(GetGfxOffStatus,PPSMC_MSG_GetGfxOffStatus,  
0),
MSG_MAP(GetGfxOffEntryCount,
PPSMC_MSG_GetGfxOffEntryCount,  0),
MSG_MAP(LogGfxOffResidency, 
PPSMC_MSG_LogGfxOffResidency,   0),
+   MSG_MAP(SetCClkSMTEnable,   PPSMC_MSG_SetCClkSMTEnable, 
0),
  };
  
  static struct cmn2asic_mapping vangogh_feature_mask_map[SMU_FEATURE_COUNT] = {

@@ -460,6 +461,7 @@ static int vangogh_allocate_dpm_context(struct smu_context 
*smu)
  
  static int vangogh_init_smc_tables(struct smu_context *smu)

  {
+   uint32_t smu_version;
int ret = 0;
  
  	ret = vangogh_tables_init(smu);

@@ -477,9 +479,24 @@ static int vangogh_init_smc_tables(struct smu_context *smu)
smu->cpu_core_num = 4;
  #endif
  
+	ret = smu_cmn_get_smc_version(smu, NULL, &smu_version);

+   if (ret)
+   return ret;
+
+   if (smu_version >= 0x063F0600)


AFAICT the value has already been looked up and you can instead use:

smu->adev->pm.fw_version >= 0x063F0600


+   smu_smt_timer_init(smu);
+
return smu_v11_0_init_smc_tables(smu);
  }
  
+static int vangogh_fini_smc_tables(struct smu_context *smu)

+{
+   smu_smt_timer_fini(smu);


Shouldn't this timer only be deleted if the following holds?

smu->adev->pm.fw_version >= 0x063F0600
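
A rough userspace sketch of keeping setup and teardown symmetric.  The 
struct and function names (sketch_ctx, sketch_init, sketch_fini) are 
hypothetical; only the 0x063F0600 threshold comes from this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MIN_FW_VERSION 0x063F0600u	/* threshold quoted in this review */

/* Hypothetical stand-in for the driver context. */
struct sketch_ctx {
	uint32_t fw_version;
	bool timer_armed;
};

static bool sketch_supports_smt_msg(const struct sketch_ctx *c)
{
	return c->fw_version >= SKETCH_MIN_FW_VERSION;
}

/* Arm the timer only when the firmware is new enough. */
static void sketch_init(struct sketch_ctx *c)
{
	if (sketch_supports_smt_msg(c))
		c->timer_armed = true;
}

/*
 * Tear down under the same condition used at init time.
 * Returns 1 if a timer was torn down, 0 if there was nothing to do.
 */
static int sketch_fini(struct sketch_ctx *c)
{
	if (!sketch_supports_smt_msg(c))
		return 0;

	c->timer_armed = false;
	return 1;
}
```

Sharing one predicate between both paths avoids touching a timer that was 
never set up.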


+   smu_v11_0_fini_smc_tables(smu);
+
+   return 0;
+}
+
  static int vangogh_dpm_set_vcn_enable(struct smu_context *smu, bool enable)
  {
int ret = 0;
@@ -2428,12 +2445,24 @@ static u32 vangogh_get_gfxoff_entrycount(struct 
smu_context *smu, uint64_t *entr
return ret;
  }
  
+static int vangogh_set_cpu_smt_enable(struct smu_context *smu, bool enable)

+{
+   int ret;
+
+   ret = smu_cmn_send_smc_msg_with_param(smu, SMU_MSG_SetCClkSMTEnable,
+ enable ? 1 : 0, NULL);
+   if (ret)
+   dev_err(smu->adev->dev, "Set CPU SMT state failed!\n");


Given this is going to be triggered by a timer, it would be best to make 
this a rate-limited message to avoid flooding the logs.
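
In the kernel this would normally mean switching to dev_err_ratelimited().  
To illustrate the idea only, here is a small userspace sketch of a 
window-based limiter; the struct and function names are hypothetical and 
this is not how the kernel implements it.

```c
#include <stdbool.h>

/* Hypothetical limiter: allow at most `burst` messages per `interval` ticks. */
struct sketch_ratelimit {
	unsigned long interval;		/* window length in ticks */
	unsigned int burst;		/* messages allowed per window */
	unsigned long window_start;	/* tick at which the current window began */
	unsigned int printed;		/* messages emitted in the current window */
};

/* Returns true if a message may be emitted at time `now`. */
static bool sketch_ratelimit_allow(struct sketch_ratelimit *rl, unsigned long now)
{
	/* Start a fresh window once the previous one has expired. */
	if (now - rl->window_start >= rl->interval) {
		rl->window_start = now;
		rl->printed = 0;
	}

	if (rl->printed >= rl->burst)
		return false;

	rl->printed++;
	return true;
}
```

With a 200 ms poll period, a persistent failure would otherwise log five 
times per second.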



+
+   return ret;
+}
+
  static const struct pptable_funcs vangogh_ppt_funcs = {
  
  	.check_fw_status = smu_v11_0_check_fw_status,

.check_fw_version = smu_v11_0_check_fw_version,
.init_smc_tables = vangogh_init_smc_tables,
-   .fini_smc_tables = smu_v11_0_fini_smc_tables,
+   .fini_smc_tables = vangogh_fini_smc_tables,
.init_power = smu_v11_0_init_power,
.fini_power = smu_v11_0_fini_power,
.register_irq_handler =

Re: [PATCH] drm/amdgpu: skip kfd-iommu suspend/resume for S0ix

2023-04-05 Thread Limonciello, Mario


On 4/5/2023 06:29, Liu, Aaron wrote:

GFX is in gfxoff mode during s0ix so we shouldn't need to
actually execute kfd_iommu_suspend/kfd_iommu_resume operation.

Signed-off-by: Aaron Liu 
---

Probably should add to this patch:

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2449


  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3b6b85d9e0be..5094be94fa06 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3304,9 +3304,11 @@ static int amdgpu_device_ip_resume(struct amdgpu_device 
*adev)
  {
 int r;

-   r = amdgpu_amdkfd_resume_iommu(adev);
-   if (r)
-   return r;
+   if (!adev->in_s0ix) {
+   r = amdgpu_amdkfd_resume_iommu(adev);
+   if (r)
+   return r;
+   }

 r = amdgpu_device_ip_resume_phase1(adev);
 if (r)
--
2.39.0

RE: [PATCH v3] drm/amd/amdgpu: Drop the hang limit parameter

2023-04-05 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: SHANMUGAM, SRINIVASAN
> 
> Sent: Wednesday, April 5, 2023 10:24
> To: Koenig, Christian ; Deucher, Alexander
> ; Limonciello, Mario
> ; Russell, Kent 
> Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> 
> Subject: [PATCH v3] drm/amd/amdgpu: Drop the hang limit parameter
> 
> The driver doesn't resubmit jobs on hangs any more, hence drop
> the hang limit parameter - amdgpu_job_hang_limit, wherever it is used.
> 
> Suggested-by: Christian König 
> Cc: Alex Deucher 
> Cc: Mario Limonciello 
> Cc: Kent Russell 
> Signed-off-by: Srinivasan Shanmugam 

Reviewed-by: Mario Limonciello 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 8 
>  3 files changed, 1 insertion(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index bbac4239ceb3..35a0474ccdb0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -186,7 +186,6 @@ extern char *amdgpu_disable_cu;
>  extern char *amdgpu_virtual_display;
>  extern uint amdgpu_pp_feature_mask;
>  extern uint amdgpu_force_long_training;
> -extern int amdgpu_job_hang_limit;
>  extern int amdgpu_lbpw;
>  extern int amdgpu_compute_multipipe;
>  extern int amdgpu_gpu_recovery;
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 3b6b85d9e0be..051b9e231cf4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2364,7 +2364,7 @@ static int amdgpu_device_init_schedulers(struct
> amdgpu_device *adev)
>   }
> 
>   r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
> -ring->num_hw_submission,
> amdgpu_job_hang_limit,
> +ring->num_hw_submission, 0,
>  timeout, adev->reset_domain->wq,
>  ring->sched_score, ring->name,
>  adev->dev);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index e652ffb2c68e..03e928123d71 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -158,7 +158,6 @@ char *amdgpu_virtual_display;
>   */
>  uint amdgpu_pp_feature_mask = 0xfff7bfff;
>  uint amdgpu_force_long_training;
> -int amdgpu_job_hang_limit;
>  int amdgpu_lbpw = -1;
>  int amdgpu_compute_multipipe = -1;
>  int amdgpu_gpu_recovery = -1; /* auto */
> @@ -521,13 +520,6 @@ MODULE_PARM_DESC(virtual_display,
>"Enable virtual display feature (the virtual_display will be 
> set
> like :xx:xx.x,x;:xx:xx.x,x)");
>  module_param_named(virtual_display, amdgpu_virtual_display, charp,
> 0444);
> 
> -/**
> - * DOC: job_hang_limit (int)
> - * Set how much time allow a job hang and not drop it. The default is 0.
> - */
> -MODULE_PARM_DESC(job_hang_limit, "how much time allow a job hang
> and not drop it (default 0)");
> -module_param_named(job_hang_limit, amdgpu_job_hang_limit, int
> ,0444);
> -
>  /**
>   * DOC: lbpw (int)
>   * Override Load Balancing Per Watt (LBPW) support (1 = enable, 0 = disable).
> The default is -1 (auto, enabled).
> --
> 2.25.1

RE: [PATCH] drm/amdgpu: allow more APUs to do mode2 reset when go to S4

2023-03-30 Thread Limonciello, Mario

[AMD Official Use Only - General]

Also, in the commit message, move the issue link into a Link: tag.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2483

> -Original Message-
> From: Zhang, Yifan 
> Sent: Thursday, March 30, 2023 07:29
> To: Huang, Tim ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Limonciello, Mario
> ; Yuan, Perry ; Du,
> Xiaojian ; Ma, Li 
> Subject: RE: [PATCH] drm/amdgpu: allow more APUs to do mode2 reset
> when go to S4
> 
> [AMD Official Use Only - General]
> 
> Please add a Fixes tag:
> 
> Fixes: 2bedd3f21b30 drm/amdgpu: skip ASIC reset for APUs when go to S4
> 
> in your patch.
> 
> 
> -Original Message-
> From: Huang, Tim 
> Sent: Thursday, March 30, 2023 10:33 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Limonciello, Mario
> ; Zhang, Yifan ;
> Yuan, Perry ; Du, Xiaojian ;
> Ma, Li ; Huang, Tim 
> Subject: [PATCH] drm/amdgpu: allow more APUs to do mode2 reset when
> go to S4
> 
> Skip mode2 reset only for IMU enabled APUs when do S4.
> 
> This patch is to fix the regression issue
> https://gitlab.freedesktop.org/drm/amd/-/issues/2483
> It is generated by patch "2bedd3f21b30 drm/amdgpu: skip ASIC reset for
> APUs when go to S4".
> 
> Signed-off-by: Tim Huang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index 0f7cd3e8e00b..edaf3ded4a04 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -981,7 +981,12 @@ static bool amdgpu_atcs_pci_probe_handle(struct
> pci_dev *pdev)
>   */
>  bool amdgpu_acpi_should_gpu_reset(struct amdgpu_device *adev)  {
> - if (adev->flags & AMD_IS_APU)
> + if ((adev->flags & AMD_IS_APU) &&
> + adev->gfx.imu.funcs) /* Not need to do mode2 reset for IMU
> enabled APUs */
> + return false;
> +
> + if ((adev->flags & AMD_IS_APU) &&
> + amdgpu_acpi_is_s3_active(adev))
>   return false;
> 
>   if (amdgpu_sriov_vf(adev))
> --
> 2.25.1

RE: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

2023-03-24 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, March 23, 2023 21:29
> To: Limonciello, Mario ; Yang, WenYou
> ; Deucher, Alexander
> ; Koenig, Christian
> ; Pan, Xinhui 
> Cc: Li, Ying ; Liu, Kun ; Liang,
> Richard qi ; amd-gfx@lists.freedesktop.org
> Subject: Re: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw
> 
> 
> 
> On 3/23/2023 11:36 PM, Limonciello, Mario wrote:
> > On 3/23/2023 12:41, Limonciello, Mario wrote:
> >> On 3/22/2023 00:48, Wenyou Yang wrote:
> >>> When the CPU SMT status changes on the fly, send the SMT-enable
> >>> message to pmfw to notify it that the SMT status changed.
> >>>
> >>> Signed-off-by: Wenyou Yang 
> >>> ---
> >>>   drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 41
> +++
> >>>   drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++
> >>>   2 files changed, 46 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> index b5d64749990e..5cd85a9d149d 100644
> >>> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> >>> @@ -22,6 +22,7 @@
> >>>   #define SWSMU_CODE_LAYER_L1
> >>> +#include 
> >>>   #include 
> >>>   #include 
> >>> @@ -69,6 +70,14 @@ static int smu_set_fan_speed_rpm(void *handle,
> >>> uint32_t speed);
> >>>   static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
> >>>   static int smu_set_mp1_state(void *handle, enum pp_mp1_state
> >>> mp1_state);
> >>> +static int smt_notifier_callback(struct notifier_block *nb, unsigned
> >>> long action, void *data);
> >>> +
> >>> +extern struct raw_notifier_head smt_notifier_head;
> >>> +
> >>> +static struct notifier_block smt_notifier = {
> >>> +    .notifier_call = smt_notifier_callback,
> >>> +};
> >>> +
> >>>   static int smu_sys_get_pp_feature_mask(void *handle,
> >>>  char *buf)
> >>>   {
> >>> @@ -625,6 +634,8 @@ static int smu_set_funcs(struct amdgpu_device
> *adev)
> >>>   return 0;
> >>>   }
> >>> +static struct smu_context *current_smu;
> >>> +
> >>>   static int smu_early_init(void *handle)
> >>>   {
> >>>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >>> @@ -645,6 +656,7 @@ static int smu_early_init(void *handle)
> >>>   mutex_init(&smu->message_lock);
> >>>   adev->powerplay.pp_handle = smu;
> >>> +    current_smu = smu;
> >
> > Although this series is intended for the Van Gogh case right now, I
> > don't think this would scale well for multiple GPUs in a system.
> >
> > I think that instead you may want to move the notifier callback to be a
> > level "higher" in amdgpu.  Perhaps amdgpu_device.c?  Then when that
> > notifier call is received you'll want to walk through the PCI device
> > space to find any GPUs that are bound to amdgpu and use a series of
> > wrappers/calls that end up calling smu_set_cpu_smt_enable with the
> > appropriate arguments.
> >
> 
> This is not required when the notifier is registered only within Vangogh
> ppt function. Then follow Evan's suggestion of keeping the notifier
> block inside smu. From the notifier block, it can find the smu block and
> then call cpu_smt_enable/disable. That way notifier callback comes only
> once even with multiple dGPUs + Vangogh and processed for the
> corresponding smu.
> 
> This notifier doesn't need to be registered for platforms only with
> dGPUs or APUs which don't need this.

They don't right now, but I was thinking how this could scale to other
APUs or dGPUs if they are interested in adding support for this message
too.
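
For reference, a userspace sketch of the "notifier block inside smu" idea 
discussed above: the callback recovers its owning context from the embedded 
member instead of a single global pointer.  All names here 
(sketch_notifier, sketch_smu, sketch_container_of) are hypothetical 
stand-ins, not the kernel's notifier API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace equivalent of the kernel's container_of() idiom. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, simplified notifier block. */
struct sketch_notifier {
	int (*call)(struct sketch_notifier *nb, unsigned long action);
};

/* Hypothetical per-device context with its own embedded notifier. */
struct sketch_smu {
	int id;
	bool smt_enabled;
	struct sketch_notifier smt_nb;
};

/* The callback finds its own context, so several devices can coexist. */
static int sketch_smt_cb(struct sketch_notifier *nb, unsigned long action)
{
	struct sketch_smu *smu = sketch_container_of(nb, struct sketch_smu, smt_nb);

	smu->smt_enabled = (action != 0);
	return smu->id;
}
```

Each device that cares registers its own block; devices that do not need 
the message simply never register one.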

> 
> Thanks,
> Lijo
> 
> >
> >>>   adev->powerplay.pp_funcs = &swsmu_pm_funcs;
> >>>   r = smu_set_funcs(adev);
> >>> @@ -1105,6 +1117,8 @@ static int smu_sw_init(void *handle)
> >>>   if (!smu->ppt_funcs->get_fan_control_mode)
> >>>   smu->adev->pm.no_fan = true;
> >>> +    raw_notifier_chain_register(&smt_notifier_head, &smt_notifier);
> >>> +
> >>>   return 0;
> >>>   }
> >>> @@ -1122,6 +1136,8 @@ static int smu_sw_fini(void *handle)

Re: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

2023-03-23 Thread Limonciello, Mario


On 3/23/2023 12:41, Limonciello, Mario wrote:

On 3/22/2023 00:48, Wenyou Yang wrote:

When the CPU SMT status changes on the fly, send the SMT-enable
message to pmfw to notify it that the SMT status changed.

Signed-off-by: Wenyou Yang 
---
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 41 +++
  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++
  2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c

index b5d64749990e..5cd85a9d149d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -22,6 +22,7 @@
  #define SWSMU_CODE_LAYER_L1
+#include 
  #include 
  #include 
@@ -69,6 +70,14 @@ static int smu_set_fan_speed_rpm(void *handle, 
uint32_t speed);

  static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
  static int smu_set_mp1_state(void *handle, enum pp_mp1_state 
mp1_state);
+static int smt_notifier_callback(struct notifier_block *nb, unsigned 
long action, void *data);

+
+extern struct raw_notifier_head smt_notifier_head;
+
+static struct notifier_block smt_notifier = {
+    .notifier_call = smt_notifier_callback,
+};
+
  static int smu_sys_get_pp_feature_mask(void *handle,
 char *buf)
  {
@@ -625,6 +634,8 @@ static int smu_set_funcs(struct amdgpu_device *adev)
  return 0;
  }
+static struct smu_context *current_smu;
+
  static int smu_early_init(void *handle)
  {
  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -645,6 +656,7 @@ static int smu_early_init(void *handle)
  mutex_init(&smu->message_lock);
  adev->powerplay.pp_handle = smu;
+    current_smu = smu;


Although this series is intended for the Van Gogh case right now, I 
don't think this would scale well for multiple GPUs in a system.

I think that instead you may want to move the notifier callback to be a 
level "higher" in amdgpu.  Perhaps amdgpu_device.c?  Then when that 
notifier call is received you'll want to walk through the PCI device 
space to find any GPUs that are bound to amdgpu and use a series of 
wrappers/calls that end up calling smu_set_cpu_smt_enable with the 
appropriate arguments.




  adev->powerplay.pp_funcs = &swsmu_pm_funcs;
  r = smu_set_funcs(adev);
@@ -1105,6 +1117,8 @@ static int smu_sw_init(void *handle)
  if (!smu->ppt_funcs->get_fan_control_mode)
  smu->adev->pm.no_fan = true;
+    raw_notifier_chain_register(&smt_notifier_head, &smt_notifier);
+
  return 0;
  }
@@ -1122,6 +1136,8 @@ static int smu_sw_fini(void *handle)
  smu_fini_microcode(smu);
+    raw_notifier_chain_unregister(&smt_notifier_head, &smt_notifier);
+
  return 0;
  }
@@ -3241,3 +3257,28 @@ int smu_send_hbm_bad_channel_flag(struct 
smu_context *smu, uint32_t size)

  return ret;
  }
+
+static int smu_set_cpu_smt_enable(struct smu_context *smu, bool enable)
+{
+    int ret = -EINVAL;
+
+    if (smu->ppt_funcs && smu->ppt_funcs->set_cpu_smt_enable)
+    ret = smu->ppt_funcs->set_cpu_smt_enable(smu, enable);
+
+    return ret;
+}
+
+static int smt_notifier_callback(struct notifier_block *nb,
+ unsigned long action, void *data)
+{
+    struct smu_context *smu = current_smu;
+    int ret = NOTIFY_OK;


This initialization is pointless, it's clobbered in the next line.


+
+    ret = (action == SMT_ENABLED) ?
+    smu_set_cpu_smt_enable(smu, true) :
+    smu_set_cpu_smt_enable(smu, false);


How about this instead, it should be more readable:

 ret = smu_set_cpu_smt_enable(smu, action == SMT_ENABLED);


+    if (ret)
+    ret = NOTIFY_BAD;
+
+    return ret;


How about instead:

 dev_dbg(adev->dev, "failed to %sable SMT: %d\n", action == 
SMT_ENABLED ? "en" : "dis", ret);


 return ret ? NOTIFY_BAD : NOTIFY_OK;


+}
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h

index 09469c750a96..7c6594bba796 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1354,6 +1354,11 @@ struct pptable_funcs {
   * @init_pptable_microcode: Prepare the pptable microcode to 
upload via PSP

   */
  int (*init_pptable_microcode)(struct smu_context *smu);
+
+    /**
+ * @set_cpu_smt_enable: Set the CPU SMT status
+ */
+    int (*set_cpu_smt_enable)(struct smu_context *smu, bool enable);
  };
  typedef enum {

Re: [v1,2/3] drm/amd/pm: send the SMT-enable message to pmfw

2023-03-23 Thread Limonciello, Mario


On 3/22/2023 00:48, Wenyou Yang wrote:

When the CPU SMT status changes on the fly, send the SMT-enable
message to pmfw to notify it that the SMT status changed.

Signed-off-by: Wenyou Yang 
---
  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 41 +++
  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  5 +++
  2 files changed, 46 insertions(+)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index b5d64749990e..5cd85a9d149d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -22,6 +22,7 @@
  
  #define SWSMU_CODE_LAYER_L1
  
+#include 

  #include 
  #include 
  
@@ -69,6 +70,14 @@ static int smu_set_fan_speed_rpm(void *handle, uint32_t speed);

  static int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
  static int smu_set_mp1_state(void *handle, enum pp_mp1_state mp1_state);
  
+static int smt_notifier_callback(struct notifier_block *nb, unsigned long action, void *data);

+
+extern struct raw_notifier_head smt_notifier_head;
+
+static struct notifier_block smt_notifier = {
+   .notifier_call = smt_notifier_callback,
+};
+
  static int smu_sys_get_pp_feature_mask(void *handle,
   char *buf)
  {
@@ -625,6 +634,8 @@ static int smu_set_funcs(struct amdgpu_device *adev)
return 0;
  }
  
+static struct smu_context *current_smu;

+
  static int smu_early_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
@@ -645,6 +656,7 @@ static int smu_early_init(void *handle)
mutex_init(&smu->message_lock);
  
  	adev->powerplay.pp_handle = smu;

+   current_smu = smu;
adev->powerplay.pp_funcs = &swsmu_pm_funcs;
  
  	r = smu_set_funcs(adev);

@@ -1105,6 +1117,8 @@ static int smu_sw_init(void *handle)
if (!smu->ppt_funcs->get_fan_control_mode)
smu->adev->pm.no_fan = true;
  
+	raw_notifier_chain_register(&smt_notifier_head, &smt_notifier);

+
return 0;
  }
  
@@ -1122,6 +1136,8 @@ static int smu_sw_fini(void *handle)
  
  	smu_fini_microcode(smu);
  
+	raw_notifier_chain_unregister(&smt_notifier_head, &smt_notifier);

+
return 0;
  }
  
@@ -3241,3 +3257,28 @@ int smu_send_hbm_bad_channel_flag(struct smu_context *smu, uint32_t size)
  
  	return ret;

  }
+
+static int smu_set_cpu_smt_enable(struct smu_context *smu, bool enable)
+{
+   int ret = -EINVAL;
+
+   if (smu->ppt_funcs && smu->ppt_funcs->set_cpu_smt_enable)
+   ret = smu->ppt_funcs->set_cpu_smt_enable(smu, enable);
+
+   return ret;
+}
+
+static int smt_notifier_callback(struct notifier_block *nb,
+unsigned long action, void *data)
+{
+   struct smu_context *smu = current_smu;
+   int ret = NOTIFY_OK;


This initialization is pointless, it's clobbered in the next line.


+
+   ret = (action == SMT_ENABLED) ?
+   smu_set_cpu_smt_enable(smu, true) :
+   smu_set_cpu_smt_enable(smu, false);


How about this instead, it should be more readable:

ret = smu_set_cpu_smt_enable(smu, action == SMT_ENABLED);


+   if (ret)
+   ret = NOTIFY_BAD;
+
+   return ret;


How about instead:

	dev_dbg(adev->dev, "failed to %sable SMT: %d\n", action == SMT_ENABLED 
? "en" : "dis", ret);


return ret ? NOTIFY_BAD : NOTIFY_OK;
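
A small userspace sketch of the simplified shape suggested above.  The 
SKETCH_* constants and the fake setter are hypothetical placeholders with 
illustrative values; the kernel defines its own NOTIFY_* and SMT_* values.

```c
#include <stdbool.h>

/* Illustrative values only; not the kernel's definitions. */
enum { SKETCH_NOTIFY_OK = 1, SKETCH_NOTIFY_BAD = 2 };
enum { SKETCH_SMT_DISABLED = 0, SKETCH_SMT_ENABLED = 1 };

static bool sketch_last_requested;	/* records what the callback asked for */
static int sketch_fake_result;		/* result the fake setter will return */

/* Hypothetical stand-in for smu_set_cpu_smt_enable(). */
static int sketch_set_smt(bool enable)
{
	sketch_last_requested = enable;
	return sketch_fake_result;
}

/* Pass the comparison straight through and map the result once. */
static int sketch_callback(unsigned long action)
{
	int ret = sketch_set_smt(action == SKETCH_SMT_ENABLED);

	return ret ? SKETCH_NOTIFY_BAD : SKETCH_NOTIFY_OK;
}
```

This drops both the dead initializer and the duplicated call in the 
ternary.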


+}
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 09469c750a96..7c6594bba796 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1354,6 +1354,11 @@ struct pptable_funcs {
 * @init_pptable_microcode: Prepare the pptable microcode to upload via 
PSP
 */
int (*init_pptable_microcode)(struct smu_context *smu);
+
+   /**
+* @set_cpu_smt_enable: Set the CPU SMT status
+*/
+   int (*set_cpu_smt_enable)(struct smu_context *smu, bool enable);
  };
  
  typedef enum {

Re: [v1,3/3] drm/amd/pm: vangogh: support to send SMT enable message

2023-03-23 Thread Limonciello, Mario


On 3/22/2023 00:48, Wenyou Yang wrote:

Add support for sending the PPSMC_MSG_SetCClkSMTEnable (0x58) message
to pmfw for vangogh.

Signed-off-by: Wenyou Yang 
---
  .../pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h|  3 ++-
  drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h  |  3 ++-
  .../gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c  | 19 +++
  3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
index 7471e2df2828..2b182dbc6f9c 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v11_5_ppsmc.h
@@ -111,7 +111,8 @@
  #define PPSMC_MSG_GetGfxOffStatus0x50
  #define PPSMC_MSG_GetGfxOffEntryCount0x51
  #define PPSMC_MSG_LogGfxOffResidency 0x52
-#define PPSMC_Message_Count0x53
+#define PPSMC_MSG_SetCClkSMTEnable0x58
+#define PPSMC_Message_Count0x54


It doesn't make sense for PPSMC_Message_Count to be smaller than the 
largest message ID.  This should be:


#define PPSMC_Message_Count 0x59
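
As an illustration, a compile-time check can catch this kind of mismatch.  
The three defines are copied from this thread; the static_assert and the 
sketch_msg_id_is_valid() helper are hypothetical additions, not part of the 
patch.

```c
#include <assert.h>	/* C11 static_assert */

#define PPSMC_MSG_LogGfxOffResidency	0x52
#define PPSMC_MSG_SetCClkSMTEnable	0x58
#define PPSMC_Message_Count		0x59

/* Fails the build if the count is not above the highest message ID. */
static_assert(PPSMC_MSG_SetCClkSMTEnable < PPSMC_Message_Count,
	      "PPSMC_Message_Count must be greater than the largest message ID");

/* Hypothetical bounds check that any user of the count would rely on. */
static int sketch_msg_id_is_valid(unsigned int id)
{
	return id < PPSMC_Message_Count;
}
```

With the count left at 0x54, the new 0x58 message would fail such a bounds 
check.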

  
  //Argument for PPSMC_MSG_GfxDeviceDriverReset

  enum {
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
index 297b70b9388f..820812d910bf 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h
@@ -245,7 +245,8 @@
__SMU_DUMMY_MAP(AllowGpo),  \
__SMU_DUMMY_MAP(Mode2Reset),\
__SMU_DUMMY_MAP(RequestI2cTransaction), \
-   __SMU_DUMMY_MAP(GetMetricsTable),
+   __SMU_DUMMY_MAP(GetMetricsTable), \
+   __SMU_DUMMY_MAP(SetCClkSMTEnable),
  
  #undef __SMU_DUMMY_MAP

  #define __SMU_DUMMY_MAP(type) SMU_MSG_##type
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
index 7433dcaa16e0..f0eeb42df96b 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c
@@ -141,6 +141,7 @@ static struct cmn2asic_msg_mapping 
vangogh_message_map[SMU_MSG_MAX_COUNT] = {
MSG_MAP(GetGfxOffStatus,PPSMC_MSG_GetGfxOffStatus,  
0),
MSG_MAP(GetGfxOffEntryCount,
PPSMC_MSG_GetGfxOffEntryCount,  0),
MSG_MAP(LogGfxOffResidency, 
PPSMC_MSG_LogGfxOffResidency,   0),
+   MSG_MAP(SetCClkSMTEnable,   PPSMC_MSG_SetCClkSMTEnable, 
0),
  };
  
  static struct cmn2asic_mapping vangogh_feature_mask_map[SMU_FEATURE_COUNT] = {

@@ -2428,6 +2429,23 @@ static u32 vangogh_get_gfxoff_entrycount(struct 
smu_context *smu, uint64_t *entr
return ret;
  }
  
+static int vangogh_set_cpu_smt_enable(struct smu_context *smu, bool enable)

+{
+   int ret = 0;
+
+   if (enable) {
+   ret = smu_cmn_send_smc_msg_with_param(smu,
+ SMU_MSG_SetCClkSMTEnable,
+ 1, NULL);
+   } else {
+   ret = smu_cmn_send_smc_msg_with_param(smu,
+ SMU_MSG_SetCClkSMTEnable,
+ 0, NULL);
+   }
+
+   return ret;
+}
+
  static const struct pptable_funcs vangogh_ppt_funcs = {
  
  	.check_fw_status = smu_v11_0_check_fw_status,

@@ -2474,6 +2492,7 @@ static const struct pptable_funcs vangogh_ppt_funcs = {
.get_power_limit = vangogh_get_power_limit,
.set_power_limit = vangogh_set_power_limit,
.get_vbios_bootup_values = smu_v11_0_get_vbios_bootup_values,
+   .set_cpu_smt_enable = vangogh_set_cpu_smt_enable,
  };
  
  void vangogh_set_ppt_funcs(struct smu_context *smu)

RE: [PATCH] drm/amd/amdgpu: Fix logic bug in fatal error handling

2023-03-23 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: SHANMUGAM, SRINIVASAN
> 
> Sent: Thursday, March 23, 2023 07:37
> To: Limonciello, Mario ; Koenig, Christian
> ; Deucher, Alexander
> ; Li, Candice ;
> Zhang, Hawking 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/amd/amdgpu: Fix logic bug in fatal error handling
> 
> [Public]
> 
> Hi Mario,
> 
> Thanks for your comments, it was on " origin/amd-staging-drm-next"
> 

Oh, it's a newer change that just landed; I needed to update my local tree, 
thanks.

Fixes: 5778b47626b51 ("drm/amdgpu: Add fatal error handling in nbio v4_3")
Reviewed-by: Mario Limonciello 

> 
> Best Regards,
> Srini
> -Original Message-
> From: Limonciello, Mario 
> Sent: Thursday, March 23, 2023 6:03 PM
> To: SHANMUGAM, SRINIVASAN ;
> Koenig, Christian ; Deucher, Alexander
> ; Li, Candice ;
> Zhang, Hawking 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/amd/amdgpu: Fix logic bug in fatal error handling
> 
> [Public]
> 
> 
> 
> > -Original Message-
> > From: SHANMUGAM, SRINIVASAN
> > 
> > Sent: Thursday, March 23, 2023 07:32
> > To: Koenig, Christian ; Deucher, Alexander
> > ; Limonciello, Mario
> > ; Li, Candice ;
> Zhang,
> > Hawking 
> > Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> > 
> > Subject: [PATCH] drm/amd/amdgpu: Fix logic bug in fatal error handling
> >
> > CC  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.o
> > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2567:28: error: bitwise or
> > with non-zero value always evaluates to true
> > > [-Werror,-Wtautological-bitwise-compare]
> >   if (adev->ras_hw_enabled | AMDGPU_RAS_BLOCK__DF)
> >   ~^~
> >
> > Presumably the author intended to test if AMDGPU_RAS_BLOCK__DF bit
> was
> > set if ras is enabled, so that's what I'm changing the code to.
> > Hopefully to do the right thing.
> >
> > Cc: Christian König 
> > Cc: Alex Deucher 
> > Cc: Mario Limonciello 
> > Cc: Hawking Zhang 
> > Cc: Candice Li 
> > Signed-off-by: Srinivasan Shanmugam 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > index 5b17790218811..fac45f98145d8 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> > @@ -2564,7 +2564,7 @@ int amdgpu_ras_init(struct amdgpu_device
> *adev)
> > > adev->nbio.ras = &nbio_v7_4_ras;
> > break;
> > case IP_VERSION(4, 3, 0):
> > -   if (adev->ras_hw_enabled | AMDGPU_RAS_BLOCK__DF)
> > +   if (adev->ras_hw_enabled & AMDGPU_RAS_BLOCK__DF)
> > /* unlike other generation of nbio ras,
> >  * nbio v4_3 only support fatal error interrupt
> >  * to inform software that DF is freezed due to
> > --
> > 2.25.1
> 
> This change generally makes sense for what you showed above, but what
> tree is this against?  That doesn't look like amd-staging-drm-next, Linus'
> tree, or even some recent tags.

RE: [PATCH] drm/amd/amdgpu: Fix logic bug in fatal error handling

2023-03-23 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: SHANMUGAM, SRINIVASAN
> 
> Sent: Thursday, March 23, 2023 07:32
> To: Koenig, Christian ; Deucher, Alexander
> ; Limonciello, Mario
> ; Li, Candice ; Zhang,
> Hawking 
> Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> 
> Subject: [PATCH] drm/amd/amdgpu: Fix logic bug in fatal error handling
> 
> CC  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.o
> drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2567:28: error: bitwise or with
> non-zero value always evaluates to true
> [-Werror,-Wtautological-bitwise-compare]
>   if (adev->ras_hw_enabled | AMDGPU_RAS_BLOCK__DF)
>   ~^~
> 
> Presumably the author intended to test if AMDGPU_RAS_BLOCK__DF
> bit was set if ras is enabled, so that's what I'm changing the
> code to. Hopefully to do the right thing.
> 
> Cc: Christian König 
> Cc: Alex Deucher 
> Cc: Mario Limonciello 
> Cc: Hawking Zhang 
> Cc: Candice Li 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 5b17790218811..fac45f98145d8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2564,7 +2564,7 @@ int amdgpu_ras_init(struct amdgpu_device *adev)
>   adev->nbio.ras = &nbio_v7_4_ras;
>   break;
>   case IP_VERSION(4, 3, 0):
> - if (adev->ras_hw_enabled | AMDGPU_RAS_BLOCK__DF)
> + if (adev->ras_hw_enabled & AMDGPU_RAS_BLOCK__DF)
>   /* unlike other generation of nbio ras,
>* nbio v4_3 only support fatal error interrupt
>* to inform software that DF is freezed due to
> --
> 2.25.1

This change generally makes sense for what you showed above, but what tree
is this against?  That doesn't look like amd-staging-drm-next, Linus' tree, or
even some recent tags.
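
For readers following the compiler warning quoted above: OR-ing any value with a non-zero mask can never produce zero, so the original condition was always true, while AND tests whether the bit is actually set. A minimal sketch of the difference, assuming a hypothetical `EXAMPLE_DF_MASK` rather than the driver's real `AMDGPU_RAS_BLOCK__DF` definition:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask for illustration only; not the driver's real value. */
#define EXAMPLE_DF_MASK (1u << 3)

/* Buggy form: with a non-zero mask the OR result is never zero. */
static bool example_check_with_or(uint32_t hw_enabled, uint32_t mask)
{
	return (hw_enabled | mask) != 0;
}

/* Intended form: true only when the mask bit is set in hw_enabled. */
static bool example_check_with_and(uint32_t hw_enabled, uint32_t mask)
{
	return (hw_enabled & mask) != 0;
}
```

With `hw_enabled == 0`, the OR form still reports the block as enabled, which is the logic bug the patch addresses.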

RE: [PATCH v2] drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD Navi

2023-03-20 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Kai-Heng Feng 
> Sent: Wednesday, March 15, 2023 07:07
> To: Deucher, Alexander ; Koenig, Christian
> ; Pan, Xinhui 
> Cc: Kai-Heng Feng ; David Airlie
> ; Daniel Vetter ; Zhang, Hawking
> ; Gao, Likun ; Kuehling,
> Felix ; Zhao, Victor ;
> Xiao, Jack ; Quan, Evan ;
> Limonciello, Mario ; Lazar, Lijo
> ; Chai, Thomas ; Andrey
> Grodzovsky ; Somalapuram, Amaranath
> ; Zhang, Bokun
> ; Liu, Leo ; Gopalakrishnan,
> Veerabadhran (Veera) ; Gong,
> Richard ; Feng, Kenneth
> ; Jiansong Chen ;
> amd-gfx@lists.freedesktop.org; dri-de...@lists.freedesktop.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH v2] drm/amdgpu/nv: Apply ASPM quirk on Intel ADL + AMD
> Navi
> 
> S2idle resume freeze can be observed on Intel ADL + AMD WX5500. This is
> caused by commit 0064b0ce85bb ("drm/amd/pm: enable ASPM by default").
> 
> The root cause is still not clear for now.
> 
> So extend and apply the ASPM quirk from commit e02fe3bc7aba
> ("drm/amdgpu: vi: disable ASPM on Intel Alder Lake based systems"), to
> workaround the issue on Navi cards too.
> 
> Fixes: 0064b0ce85bb ("drm/amd/pm: enable ASPM by default")
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2458
> Reviewed-by: Alex Deucher 
> Signed-off-by: Kai-Heng Feng 

Reviewed-by: Mario Limonciello 

I've applied this to amd-staging-drm-next, thanks!
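
For reference, the decision the quoted patch makes can be sketched in plain C. This is only an illustration under stated assumptions: `struct example_cpu`, `EXAMPLE_MODEL_ALDERLAKE` (and its value), and the `aspm_enabled` flag standing in for the result of `amdgpu_device_should_use_aspm()` are all hypothetical, not the kernel's definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU fields the quoted patch reads. */
struct example_cpu {
	unsigned int family;
	unsigned int model;
};

/* Assumed model number for Alder Lake; illustration only. */
#define EXAMPLE_MODEL_ALDERLAKE 0x97

/* Mirrors the quirk: ASPM is treated as unsupported only on family 6 Alder Lake. */
static bool example_aspm_support_quirk(const struct example_cpu *c)
{
	return !(c->family == 6 && c->model == EXAMPLE_MODEL_ALDERLAKE);
}

/* Mirrors the guard in nv_program_aspm(): both conditions must hold. */
static bool example_should_program_aspm(bool aspm_enabled,
					const struct example_cpu *c)
{
	return aspm_enabled && example_aspm_support_quirk(c);
}
```

Because the quirk takes no device argument, the same helper can be shared by the VI and Navi paths, which is what the patch does.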

> ---
> v2:
>  - Rename the quirk function.
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
>  drivers/gpu/drm/amd/amdgpu/nv.c|  2 +-
>  drivers/gpu/drm/amd/amdgpu/vi.c| 17 +
>  4 files changed, 18 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 164141bc8b4a..5f3b139c1f99 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1272,6 +1272,7 @@ void amdgpu_device_pci_config_reset(struct
> amdgpu_device *adev);
>  int amdgpu_device_pci_reset(struct amdgpu_device *adev);
>  bool amdgpu_device_need_post(struct amdgpu_device *adev);
>  bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
> +bool amdgpu_device_aspm_support_quirk(void);
> 
>  void amdgpu_cs_report_moved_bytes(struct amdgpu_device *adev, u64
> num_bytes,
> u64 num_vis_bytes);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index c4a4e2fe6681..05a34ff79e78 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -80,6 +80,10 @@
> 
>  #include 
> 
> +#if IS_ENABLED(CONFIG_X86)
> +#include <asm/intel-family.h>
> +#endif
> +
>  MODULE_FIRMWARE("amdgpu/vega10_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
>  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
> @@ -1356,6 +1360,17 @@ bool amdgpu_device_should_use_aspm(struct
> amdgpu_device *adev)
>   return pcie_aspm_enabled(adev->pdev);
>  }
> 
> +bool amdgpu_device_aspm_support_quirk(void)
> +{
> +#if IS_ENABLED(CONFIG_X86)
> + struct cpuinfo_x86 *c = &cpu_data(0);
> +
> + return !(c->x86 == 6 && c->x86_model == INTEL_FAM6_ALDERLAKE);
> +#else
> + return true;
> +#endif
> +}
> +
>  /* if we get transitioned to only one device, take VGA back */
>  /**
>   * amdgpu_device_vga_set_decode - enable/disable vga decode
> diff --git a/drivers/gpu/drm/amd/amdgpu/nv.c
> b/drivers/gpu/drm/amd/amdgpu/nv.c
> index 855d390c41de..26733263913e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/nv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/nv.c
> @@ -578,7 +578,7 @@ static void nv_pcie_gen3_enable(struct
> amdgpu_device *adev)
> 
>  static void nv_program_aspm(struct amdgpu_device *adev)
>  {
> - if (!amdgpu_device_should_use_aspm(adev))
> + if (!amdgpu_device_should_use_aspm(adev) ||
> !amdgpu_device_aspm_support_quirk())
>   return;
> 
>   if (!(adev->flags & AMD_IS_APU) &&
> diff --git a/drivers/gpu/drm/amd/amdgpu/vi.c
> b/drivers/gpu/drm/amd/amdgpu/vi.c
> index 12ef782eb478..ceab8783575c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vi.c
> @@ -81,10 +81,6 @@
>  #include "mxgpu_vi.h"
>  #include "amdgpu_dm.h"
> 
> -#if IS_ENABLED(CONFIG_X86)
> -#include <asm/intel-family.h>
> -#endif
> -
>  #define ixPCIE_LC_L1_PM_SUBSTATE 0x100100C6
>  #define
> PCIE_LC_L1_PM_SUBSTATE__LC_L1_SUBSTATES_OVERRIDE_EN_MASK
>   0x0001L
>  #define PCIE_LC_L1_

Re: [v2] drm/amd/pm: Fix sienna cichlid incorrect OD volage after resume

2023-03-06 Thread Limonciello, Mario


On 3/4/2023 17:44, Błażej Szczygieł wrote:

Always setup overdrive tables after resume. Preserve only some
user-defined settings in user_overdrive_table if they're set.

Copy restored user_overdrive_table into od_table to get correct
values.

Signed-off-by: Błażej Szczygieł 
---
  .../amd/pm/swsmu/smu11/sienna_cichlid_ppt.c   | 43 ++-
  1 file changed, 33 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c 
b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
index 697e98a0a20a..75f18681e984 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/sienna_cichlid_ppt.c
@@ -2143,16 +2143,9 @@ static int sienna_cichlid_set_default_od_settings(struct 
smu_context *smu)
(OverDriveTable_t *)smu->smu_table.boot_overdrive_table;
OverDriveTable_t *user_od_table =
(OverDriveTable_t *)smu->smu_table.user_overdrive_table;
+   OverDriveTable_t user_od_table_bak;
int ret = 0;
  
-	/*

-* For S3/S4/Runpm resume, no need to setup those overdrive tables 
again as
-*   - either they already have the default OD settings got during cold 
bootup
-*   - or they have some user customized OD settings which cannot be 
overwritten
-*/
-   if (smu->adev->in_suspend)
-   return 0;
-
ret = smu_cmn_update_table(smu, SMU_TABLE_OVERDRIVE,
   0, (void *)boot_od_table, false);
if (ret) {
@@ -2163,7 +2156,23 @@ static int sienna_cichlid_set_default_od_settings(struct 
smu_context *smu)
sienna_cichlid_dump_od_table(smu, boot_od_table);
  
  	memcpy(od_table, boot_od_table, sizeof(OverDriveTable_t));

-   memcpy(user_od_table, boot_od_table, sizeof(OverDriveTable_t));
+
+   /*
+* For S3/S4/Runpm resume, we need to setup those overdrive tables 
again,
+* but we have to preserve user defined values in "user_od_table".
+*/
+   if (!smu->adev->in_suspend) {
+   memcpy(user_od_table, boot_od_table, sizeof(OverDriveTable_t));
+   smu->user_dpm_profile.user_od = false;
+   } else if (smu->user_dpm_profile.user_od) {
+   memcpy(&user_od_table_bak, user_od_table, sizeof(OverDriveTable_t));
+   memcpy(user_od_table, boot_od_table, sizeof(OverDriveTable_t));
+   user_od_table->GfxclkFmin = user_od_table_bak.GfxclkFmin;
+   user_od_table->GfxclkFmax = user_od_table_bak.GfxclkFmax;
+   user_od_table->UclkFmin = user_od_table_bak.UclkFmin;
+   user_od_table->UclkFmax = user_od_table_bak.UclkFmax;
+   user_od_table->VddGfxOffset = user_od_table_bak.VddGfxOffset;
+   }
  
  	return 0;

  }
@@ -2373,6 +2382,20 @@ static int sienna_cichlid_od_edit_dpm_table(struct 
smu_context *smu,
return ret;
  }
  
+static int sienna_cichlid_restore_user_od_settings(struct smu_context *smu)

+{
+   struct smu_table_context *table_context = &smu->smu_table;
+   OverDriveTable_t *od_table = table_context->overdrive_table;
+   OverDriveTable_t *user_od_table = table_context->user_overdrive_table;
+   int res;
+
+   res = smu_v11_0_restore_user_od_settings(smu);
+   if (res == 0)
+   memcpy(od_table, user_od_table, sizeof(OverDriveTable_t));
+
+   return res;
+}
+
  static int sienna_cichlid_run_btc(struct smu_context *smu)
  {
int res;
@@ -4400,7 +4423,7 @@ static const struct pptable_funcs 
sienna_cichlid_ppt_funcs = {
.set_soft_freq_limited_range = smu_v11_0_set_soft_freq_limited_range,
.set_default_od_settings = sienna_cichlid_set_default_od_settings,
.od_edit_dpm_table = sienna_cichlid_od_edit_dpm_table,
-   .restore_user_od_settings = smu_v11_0_restore_user_od_settings,
+   .restore_user_od_settings = sienna_cichlid_restore_user_od_settings,


Rather than introducing a new static function, perhaps it would be better
to just change 'smu_v11_0_restore_user_od_settings'.

That would also cover this issue if it occurs on Navi10.


.run_btc = sienna_cichlid_run_btc,
.set_power_source = smu_v11_0_set_power_source,
.get_pp_feature_mask = smu_cmn_get_pp_feature_mask,
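
The table handling in the patch above can be sketched outside the driver. This is a rough illustration, assuming a reduced hypothetical `struct example_od_table` (the real `OverDriveTable_t` has more fields) and a helper name that does not exist in the kernel:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical, reduced overdrive table for illustration. */
struct example_od_table {
	int gfxclk_fmin;
	int gfxclk_fmax;
	int uclk_fmin;
	int uclk_fmax;
	int vdd_gfx_offset;
	int other_setting; /* stands in for fields that are not preserved */
};

/*
 * Rebuild the user table from boot defaults. On resume with user
 * overrides present, keep only the user-tunable fields.
 * Returns the new value of the "user overrides present" flag.
 */
static bool example_reset_user_od(struct example_od_table *user,
				  const struct example_od_table *boot,
				  bool in_suspend, bool user_od)
{
	if (!in_suspend) {
		/* Cold boot: the user table starts as a copy of the defaults. */
		memcpy(user, boot, sizeof(*user));
		return false;
	}

	if (user_od) {
		struct example_od_table bak;

		/* Back up, reset to defaults, then restore the tunables. */
		memcpy(&bak, user, sizeof(bak));
		memcpy(user, boot, sizeof(*user));
		user->gfxclk_fmin = bak.gfxclk_fmin;
		user->gfxclk_fmax = bak.gfxclk_fmax;
		user->uclk_fmin = bak.uclk_fmin;
		user->uclk_fmax = bak.uclk_fmax;
		user->vdd_gfx_offset = bak.vdd_gfx_offset;
	}

	return user_od;
}
```

The backup copy is what lets fields outside the preserved set fall back to the boot defaults on resume.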

RE: [PATCH] drm/amd: Fix initialization mistake for NBIO 7.3.0

2023-03-02 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Thomas Glanzmann 
> Sent: Thursday, March 2, 2023 14:17
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org; Natikar, Basavaraj
> 
> Subject: Re: [PATCH] drm/amd: Fix initialization mistake for NBIO 7.3.0
> 
> Hello Mario,
> 
> * Mario Limonciello  [2023-03-02 18:27]:
> > The same strapping initialization issue that happened on NBIO 7.5.1
> > appears to be happening on NBIO 7.3.0.
> > Apply the same fix to 7.3.0 as well.
> 
> > Note: This workaround relies upon the integrated GPU being enabled
> > in BIOS. If the integrated GPU is disabled in BIOS a different
> > workaround will be required.
> 
> > Reported-by: Thomas Glanzmann 
> > Cc: Basavaraj Natikar 
> > Link: https://lore.kernel.org/linux-
> usb/y%2fz9gdhjpyf2r...@glanzmann.de/T/#u
> > Signed-off-by: Mario Limonciello 
> 
> Tested-by: Thomas Glanzmann 
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c | 14 +-
> >  1 file changed, 9 insertions(+), 5 deletions(-)
> 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
> b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
> > index 4b0d563c6522c..4ef1fa4603c8e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/nbio_v7_2.c
> > @@ -382,11 +382,6 @@ static void nbio_v7_2_init_registers(struct
> amdgpu_device *adev)
> > if (def != data)
> > WREG32_PCIE_PORT(SOC15_REG_OFFSET(NBIO, 0,
> regBIF1_PCIE_MST_CTRL_3), data);
> > break;
> > -   case IP_VERSION(7, 5, 1):
> > -   data = RREG32_SOC15(NBIO, 0,
> regRCC_DEV2_EPF0_STRAP2);
> > -   data &=
> ~RCC_DEV2_EPF0_STRAP2__STRAP_NO_SOFT_RESET_DEV2_F0_MASK;
> > -   WREG32_SOC15(NBIO, 0, regRCC_DEV2_EPF0_STRAP2,
> data);
> > -   fallthrough;
> > default:
> > def = data = RREG32_PCIE_PORT(SOC15_REG_OFFSET(NBIO,
> 0, regPCIE_CONFIG_CNTL));
> > data = REG_SET_FIELD(data, PCIE_CONFIG_CNTL,
> > @@ -399,6 +394,15 @@ static void nbio_v7_2_init_registers(struct
> amdgpu_device *adev)
> > break;
> > }
> 
> My tree did not have the above hunk, so I only applied the second hunk.

Yeah, the hunk it changes is on its way to 6.3 right now.  Given the good
test results, we probably want to take this back to stable as well once they
both land.

> 
> I replugged my mouse and keyboard several times and no longer have any
> issues.
> 
> Find output of dmesg; lsusb; lspci; dmidecode; lscpu here:
> 
> https://tg.st/u/498cb495b307353870e4dbba901a9c7aa58b89d918f54fc73f014f
> 9a4778cc2a.txt
> 
> > +   switch (adev->ip_versions[NBIO_HWIP][0]) {
> > +   case IP_VERSION(7, 3, 0):
> > +   case IP_VERSION(7, 5, 1):
> > +   data = RREG32_SOC15(NBIO, 0,
> regRCC_DEV2_EPF0_STRAP2);
> > +   data &=
> ~RCC_DEV2_EPF0_STRAP2__STRAP_NO_SOFT_RESET_DEV2_F0_MASK;
> > +   WREG32_SOC15(NBIO, 0, regRCC_DEV2_EPF0_STRAP2,
> data);
> > +   break;
> > +   }
> > +
> > if (amdgpu_sriov_vf(adev))
> > adev->rmmio_remap.reg_offset =
> SOC15_REG_OFFSET(NBIO, 0,
> >
>   regBIF_BX_PF0_HDP_MEM_COHERENCY_FLUSH_CNTL) << 2;
> 
> Thank you for the workaround.

Sure, thanks for reporting it.
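
As an illustration of the pattern in the second hunk, here is a minimal standalone sketch: select the affected NBIO versions with a switch, then clear one strap bit with a read-modify-write. The version packing in `EXAMPLE_IP_VERSION` and the bit position in `EXAMPLE_STRAP_NO_SOFT_RESET_MASK` are assumptions made for the example, not the driver's register definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing of major/minor/revision into one integer. */
#define EXAMPLE_IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical bit position for the "no soft reset" strap. */
#define EXAMPLE_STRAP_NO_SOFT_RESET_MASK (1u << 7)

/* Only the versions named in the patch get the workaround. */
static bool example_needs_strap_fix(uint32_t nbio_version)
{
	switch (nbio_version) {
	case EXAMPLE_IP_VERSION(7, 3, 0):
	case EXAMPLE_IP_VERSION(7, 5, 1):
		return true;
	default:
		return false;
	}
}

/* Read-modify-write on a register value: clear the strap bit where needed. */
static uint32_t example_fix_strap(uint32_t nbio_version, uint32_t strap2)
{
	if (example_needs_strap_fix(nbio_version))
		strap2 &= ~EXAMPLE_STRAP_NO_SOFT_RESET_MASK;
	return strap2;
}
```

Moving the workaround into its own switch, as the patch does, keeps it independent of the fallthrough chain in the earlier switch.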

RE: [PATCH 2/3] drm/amd: Use runtime suspend in lieu regular suspend on supported dGPUs

2023-02-21 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Tuesday, February 21, 2023 07:20
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: Peter Kopec 
> Subject: Re: [PATCH 2/3] drm/amd: Use runtime suspend in lieu regular suspend
> on supported dGPUs
> 
> 
> 
> On 2/21/2023 1:46 AM, Mario Limonciello wrote:
> > The PMFW on dGPUs that support BACO will transition them in and out
> > of BACO when video/audio move in out of D3/D0.
> >
> > On the Linux side users can configure what sleep mode to use in
> > `/sys/power/mem_sleep`, but if the host hardware doesn't cut the
> > power rails during this state then calling suspend from Linux may
> > cause a mismatch of behavior.
> >
> > To avoid this, only run the runtime suspend and resume callbacks
> > when the dGPU supports BACO or BOCO and the smart flags didn't return
> > to skip these stages (because already runtime suspended).
> >
> > Cc: Peter Kopec 
> > Signed-off-by: Mario Limonciello 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++--
> >   1 file changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > index c3d3a042946d..fdc1cbf8ad10 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > @@ -2418,8 +2418,11 @@ static int amdgpu_pmops_suspend(struct device
> *dev)
> > adev->in_s0ix = true;
> > else if (amdgpu_acpi_is_s3_active(adev))
> > adev->in_s3 = true;
> > -   if (!adev->in_s0ix && !adev->in_s3)
> > +   if (!adev->in_s0ix && !adev->in_s3) {
> > +   pm_runtime_mark_last_busy(dev);
> > +   pm_runtime_autosuspend(dev);
> 
> This is asking the device to be suspended (from a suspend call and that
> sounds weird).  

I had convinced myself that it was necessary from reading documentation,
but re-reading I believe it should not be necessary if smart suspend is used.

If I drop this patch, the PMFW should still transition it when the video
turns off.

> Runtime pm handler will assume D3cold scenario and
> explicitly request BACO entry. Wondering what would happen if the
> platform doesn't put it in D3cold under s2Idle for dGPUs (BACO/BOCO).
> 

Higher power consumption, I expect.

> Thanks,
> Lijo
> 
> > return 0;
> > +   }
> > return amdgpu_device_suspend(drm_dev, true);
> >   }
> >
> > @@ -2440,8 +2443,10 @@ static int amdgpu_pmops_resume(struct device
> *dev)
> > struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > int r;
> >
> > -   if (!adev->in_s0ix && !adev->in_s3)
> > +   if (!adev->in_s0ix && !adev->in_s3) {
> > +   pm_runtime_resume(dev);
> > return 0;
> > +   }
> >
> > /* Avoids registers access if device is physically gone */
> > if (!pci_device_is_present(adev->pdev))

Re: [PATCH 3/3] drm/amd: Don't always set s3 for dGPUs in all sleep modes

2023-02-21 Thread Limonciello, Mario


On 2/21/2023 07:34, Lazar, Lijo wrote:



On 2/21/2023 6:57 PM, Mario Limonciello wrote:

On 2/21/23 07:25, Lazar, Lijo wrote:



On 2/21/2023 1:46 AM, Mario Limonciello wrote:

dGPUs that will be using BACO or BOCO shouldn't be put into S3
when the system is being put into s2idle.

Cc: Peter Kopec 
Signed-off-by: Mario Limonciello 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c

index 25e902077caf..5c69116bc883 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
@@ -1038,8 +1038,13 @@ void amdgpu_acpi_detect(void)
   */
  bool amdgpu_acpi_is_s3_active(struct amdgpu_device *adev)
  {
-    return !(adev->flags & AMD_IS_APU) ||
-    (pm_suspend_target_state == PM_SUSPEND_MEM);
+    if (pm_suspend_target_state == PM_SUSPEND_MEM)
+    return true;
+    if (adev->flags & AMD_IS_APU)
+    return false;


What is the expected path of APUs which don't support S2idle?


They should stay powered on and not run any suspend code.
Since they don't support BACO or BOCO, I expect the call to enter
autosuspend to be a no-op for them.


This was shown to improve power consumption for such cases (a reporter 
actually measured it).
To clarify on this - someone tried s2idle on an APU which doesn't 
support it (no FW S0ix support/PMC driver support) and the power 
consumption is better for the APU. Is it because the peripherals went 
idle now, but in earlier path APU prevented S2idle entry altogether?




I double checked and realized I misspoke - it's not that they don't run
any suspend code, but that they handle the s0ix flow even without
underlying hardware support.


https://gitlab.freedesktop.org/agd5f/linux/-/commit/9cdb69924f545fdc3086bc8b085dad8146057141

So the path for them doesn't change in this series.


Thanks,
Lijo



Thanks,
Lijo


+    return !amdgpu_device_supports_baco(&adev->ddev) &&
+    !amdgpu_device_supports_boco(&adev->ddev);
+
  }
  /**
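
The reworked check in this patch reduces to a small truth table. A minimal sketch, assuming a hypothetical `enum example_sleep_target` and boolean inputs in place of the device flags and the BACO/BOCO capability helpers:

```c
#include <stdbool.h>

/* Hypothetical reduction of the sleep targets discussed in the thread. */
enum example_sleep_target {
	EXAMPLE_SUSPEND_TO_IDLE, /* s2idle */
	EXAMPLE_SUSPEND_MEM,     /* S3 */
};

/* Mirrors the order of checks in the patched amdgpu_acpi_is_s3_active(). */
static bool example_is_s3_active(enum example_sleep_target target,
				 bool is_apu, bool has_baco, bool has_boco)
{
	/* A real S3 request always takes the S3 path. */
	if (target == EXAMPLE_SUSPEND_MEM)
		return true;
	/* APUs under s2idle are handled by the s0ix path instead. */
	if (is_apu)
		return false;
	/* dGPUs under s2idle use S3 routines only without BACO/BOCO. */
	return !has_baco && !has_boco;
}
```

Before the patch, any dGPU returned true regardless of the sleep target; the sketch shows that a BACO- or BOCO-capable dGPU under s2idle now returns false.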

RE: [PATCH v2] drm/amd: Don't allow s0ix on APUs older than Raven

2023-02-20 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Alex Deucher 
> Sent: Monday, February 20, 2023 11:10
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> 
> Subject: Re: [PATCH v2] drm/amd: Don't allow s0ix on APUs older than Raven
> 
> On Mon, Feb 20, 2023 at 11:56 AM Limonciello, Mario
>  wrote:
> >
> > [Public]
> >
> >
> >
> > > -Original Message-
> > > From: Limonciello, Mario 
> > > Sent: Monday, February 13, 2023 15:11
> > > To: amd-gfx@lists.freedesktop.org
> > > Cc: Limonciello, Mario ; Deucher,
> Alexander
> > > 
> > > Subject: [PATCH v2] drm/amd: Don't allow s0ix on APUs older than Raven
> > >
> > > APUs before Raven didn't support s0ix.  As we just relieved some
> > > of the safety checks for s0ix to improve power consumption on
> > > APUs that support it but that are missing BIOS support a new
> > > blind spot was introduced that a user could "try" to run s0ix.
> > >
> > > Plug this hole so that if users try to run s0ix on anything older
> > > than Raven it will just skip suspend of the GPU.
> > >
> > > Fixes: cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> > > Suggested-by: Alexander Deucher 
> > > Signed-off-by: Mario Limonciello 
> > > ---
> > > v1->v2:
> > >  * Don't run any suspend code or resume code in this case
> >
> > Any feedback for this patch?
> 
> Reviewed-by: Alex Deucher 
> 

Thanks.

> I think for S0ix and dGPUs, we probably need some additional work as
> well.  If the user tries s2idle and the platform doesn't actually
> support s0ix (i.e., doesn't actually turn off the power rails), we
> should be using the runtime suspend routines for BACO/BOCO rather than
> the S3 suspend routines.

OK - I'll review the framework code for that case and see what makes sense.

> 
> Alex
> 
> 
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 +++
> > >  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 7 ++-
> > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > index fa7375b97fd47..25e902077caf6 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > > @@ -1073,6 +1073,9 @@ bool amdgpu_acpi_is_s0ix_active(struct
> > > amdgpu_device *adev)
> > >   (pm_suspend_target_state != PM_SUSPEND_TO_IDLE))
> > >   return false;
> > >
> > > + if (adev->asic_type < CHIP_RAVEN)
> > > + return false;
> > > +
> > >   /*
> > >* If ACPI_FADT_LOW_POWER_S0 is not set in the FADT, it is
> > > generally
> > >* risky to do any special firmware-related preparations for 
> > > entering
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > index 6c2fe50b528e0..1f6d93dc3d72b 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> > > @@ -2414,8 +2414,10 @@ static int amdgpu_pmops_suspend(struct
> device
> > > *dev)
> > >
> > >   if (amdgpu_acpi_is_s0ix_active(adev))
> > >   adev->in_s0ix = true;
> > > - else
> > > + else if (amdgpu_acpi_is_s3_active(adev))
> > >   adev->in_s3 = true;
> > > + if (!adev->in_s0ix && !adev->in_s3)
> > > + return 0;
> > >   return amdgpu_device_suspend(drm_dev, true);
> > >  }
> > >
> > > @@ -2436,6 +2438,9 @@ static int amdgpu_pmops_resume(struct device
> > > *dev)
> > >   struct amdgpu_device *adev = drm_to_adev(drm_dev);
> > >   int r;
> > >
> > > + if (!adev->in_s0ix && !adev->in_s3)
> > > + return 0;
> > > +
> > >   /* Avoids registers access if device is physically gone */
> > >   if (!pci_device_is_present(adev->pdev))
> > >   adev->no_hw_access = true;
> > > --
> > > 2.25.1

RE: [PATCH v2] drm/amd: Don't allow s0ix on APUs older than Raven

2023-02-20 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Limonciello, Mario 
> Sent: Monday, February 13, 2023 15:11
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario ; Deucher, Alexander
> 
> Subject: [PATCH v2] drm/amd: Don't allow s0ix on APUs older than Raven
> 
> APUs before Raven didn't support s0ix.  As we just relieved some
> of the safety checks for s0ix to improve power consumption on
> APUs that support it but that are missing BIOS support a new
> blind spot was introduced that a user could "try" to run s0ix.
> 
> Plug this hole so that if users try to run s0ix on anything older
> than Raven it will just skip suspend of the GPU.
> 
> Fixes: cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> Suggested-by: Alexander Deucher 
> Signed-off-by: Mario Limonciello 
> ---
> v1->v2:
>  * Don't run any suspend code or resume code in this case

Any feedback for this patch?

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 7 ++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index fa7375b97fd47..25e902077caf6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -1073,6 +1073,9 @@ bool amdgpu_acpi_is_s0ix_active(struct
> amdgpu_device *adev)
>   (pm_suspend_target_state != PM_SUSPEND_TO_IDLE))
>   return false;
> 
> + if (adev->asic_type < CHIP_RAVEN)
> + return false;
> +
>   /*
>* If ACPI_FADT_LOW_POWER_S0 is not set in the FADT, it is
> generally
>* risky to do any special firmware-related preparations for entering
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6c2fe50b528e0..1f6d93dc3d72b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2414,8 +2414,10 @@ static int amdgpu_pmops_suspend(struct device
> *dev)
> 
>   if (amdgpu_acpi_is_s0ix_active(adev))
>   adev->in_s0ix = true;
> - else
> + else if (amdgpu_acpi_is_s3_active(adev))
>   adev->in_s3 = true;
> + if (!adev->in_s0ix && !adev->in_s3)
> + return 0;
>   return amdgpu_device_suspend(drm_dev, true);
>  }
> 
> @@ -2436,6 +2438,9 @@ static int amdgpu_pmops_resume(struct device
> *dev)
>   struct amdgpu_device *adev = drm_to_adev(drm_dev);
>   int r;
> 
> + if (!adev->in_s0ix && !adev->in_s3)
> + return 0;
> +
>   /* Avoids registers access if device is physically gone */
>   if (!pci_device_is_present(adev->pdev))
>   adev->no_hw_access = true;
> --
> 2.25.1

RE: [PATCH] drm/amd: Don't allow s0ix on APUs older than Raven

2023-02-10 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Limonciello, Mario 
> Sent: Friday, February 10, 2023 14:47
> To: amd-gfx@lists.freedesktop.org
> Cc: Limonciello, Mario ; Deucher, Alexander
> 
> Subject: [PATCH] drm/amd: Don't allow s0ix on APUs older than Raven
> 
> APUs before Raven didn't support s0ix.  As we just relieved some
> of the safety checks for s0ix to improve power consumption on
> APUs that support it but that are missing BIOS support a new
> blind spot was introduced that a user could "try" to run s0ix.
> 
> Plug this hole so that if users try to run s0ix on anything older
> than Raven it will just skip suspend of the GPU.
> 
> Fixes: cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support")
> Suggested-by: Alexander Deucher 
> Signed-off-by: Mario Limonciello 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 3 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c  | 5 -
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> index fa7375b97fd47..25e902077caf6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> @@ -1073,6 +1073,9 @@ bool amdgpu_acpi_is_s0ix_active(struct
> amdgpu_device *adev)
>   (pm_suspend_target_state != PM_SUSPEND_TO_IDLE))
>   return false;
> 
> + if (adev->asic_type < CHIP_RAVEN)
> + return false;
> +
>   /*
>* If ACPI_FADT_LOW_POWER_S0 is not set in the FADT, it is
> generally
>* risky to do any special firmware-related preparations for entering
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index 6c2fe50b528e0..98f8d9873cd84 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -2414,7 +2414,7 @@ static int amdgpu_pmops_suspend(struct device
> *dev)
> 
>   if (amdgpu_acpi_is_s0ix_active(adev))
>   adev->in_s0ix = true;
> - else
> + else if (amdgpu_acpi_is_s3_active(adev))
>   adev->in_s3 = true;

Relooking at this, I wonder if it actually needs this too:


if (!adev->in_s0ix && !adev->in_s3)
return 0;


>   return amdgpu_device_suspend(drm_dev, true);
>  }
> @@ -2436,6 +2436,9 @@ static int amdgpu_pmops_resume(struct device
> *dev)
>   struct amdgpu_device *adev = drm_to_adev(drm_dev);
>   int r;
> 
> + if (!adev->in_s0ix && !adev->in_s3)
> + return 0;
> +
>   /* Avoids registers access if device is physically gone */
>   if (!pci_device_is_present(adev->pdev))
>   adev->no_hw_access = true;
> --
> 2.25.1
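
To show why the early return is wanted on both sides, here is a minimal sketch of the suspend/resume pairing. `struct example_dev` and its counters are hypothetical, and clearing the flags at the end of resume is an assumption about driver behavior that the quoted hunks do not show:

```c
#include <stdbool.h>

/* Hypothetical device state; counters stand in for the real routines. */
struct example_dev {
	bool in_s0ix;
	bool in_s3;
	int suspend_calls;
	int resume_calls;
};

static int example_pmops_suspend(struct example_dev *d,
				 bool s0ix_active, bool s3_active)
{
	if (s0ix_active)
		d->in_s0ix = true;
	else if (s3_active)
		d->in_s3 = true;
	/* Neither mode applies: skip the GPU suspend entirely. */
	if (!d->in_s0ix && !d->in_s3)
		return 0;
	d->suspend_calls++;
	return 0;
}

static int example_pmops_resume(struct example_dev *d)
{
	/* Nothing was suspended, so there is nothing to resume. */
	if (!d->in_s0ix && !d->in_s3)
		return 0;
	d->resume_calls++;
	/* Assumed: the flags are cleared once resume has finished. */
	d->in_s0ix = false;
	d->in_s3 = false;
	return 0;
}
```

Without the check in suspend, a device with neither flag set would still run the suspend routine while resume skips its counterpart, leaving the two paths unbalanced.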

RE: [PATCH] drm/amd: Allow s0ix without BIOS support

2023-02-02 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Alex Deucher 
> Sent: Wednesday, February 1, 2023 21:49
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org; Rafael Ávila de Espíndola
> 
> Subject: Re: [PATCH] drm/amd: Allow s0ix without BIOS support
> 
> On Wed, Jan 25, 2023 at 1:33 PM Mario Limonciello
>  wrote:
> >
> > We guard the suspend entry code from running unless we have proper
> > BIOS support for either S3 mode or s0ix mode.
> >
> > If a user's system doesn't support either of these modes the kernel
> > still does offer s2idle in `/sys/power/mem_sleep` so there is an
> > expectation from users that it works even if the power consumption
> > remains very high.
> >
> > Rafael Ávila de Espíndola reports that a system of his has a
> > non-functional graphics stack after resuming.  That system doesn't
> > support S3 and the FADT doesn't indicate support for low power idle.
> >
> > Through some experimentation it was concluded that even without the
> > hardware s0i3 support provided by the amd_pmc driver the power
> > consumption over suspend is decreased by running amdgpu's s0ix
> > suspend routine.
> >
> > The numbers over suspend showed:
> > * No patch: 9.2W
> > * Skip amdgpu suspend entirely: 10.5W
> > * Run amdgpu s0ix routine: 7.7W
> >
> > As this does improve the power, remove some of the guard rails in
> > `amdgpu_acpi.c` for only running s0ix suspend routines in the right
> > circumstances.
> >
> > However if this turns out to cause regressions for anyone, we should
> > revert this change and instead opt for skipping suspend/resume routines
> > entirely or try to fix the underlying behavior that makes graphics fail
> > after resume without underlying platform support.
> >
> > Reported-by: Rafael Ávila de Espíndola 
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 57b5e11446c65..fa7375b97fd47 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -1079,20 +1079,16 @@ bool amdgpu_acpi_is_s0ix_active(struct
> amdgpu_device *adev)
> >  * S0ix even though the system is suspending to idle, so return 
> > false
> >  * in that case.
> >  */
> > -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as BIOS has 
> > not been
> configured for suspend-to-idle.\n"
> >   "To use suspend-to-idle change the sleep mode 
> > in BIOS
> setup.\n");
> > -   return false;
> 
> Thinking about this more, I think we may need to check the asic type
> here.  Pre-Raven APUs didn't support S0ix at all so this may break
> them if they have any checks that use amdgpu_acpi_is_s0ix_active() in
> their code paths.

For them what should we be doing when they try to do s2idle though?
S3 path?  Or nothing?

> 
> Alex
> 
> 
> > -   }
> >
> >  #if !IS_ENABLED(CONFIG_AMD_PMC)
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as the kernel has 
> > not been
> compiled with CONFIG_AMD_PMC.\n");
> > -   return false;
> > -#else
> > -   return true;
> >  #endif /* CONFIG_AMD_PMC */
> > +   return true;
> >  }
> >
> >  #endif /* CONFIG_SUSPEND */
> > --
> > 2.25.1
> >

RE: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-02-01 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Greg KH 
> Sent: Sunday, January 29, 2023 07:32
> To: Limonciello, Mario 
> Cc: Linux regressions mailing list ; dri-
> de...@lists.freedesktop.org; sta...@vger.kernel.org;
> stanislav.lisovs...@intel.com; Zuo, Jerry ; amd-
> g...@lists.freedesktop.org; Lin, Wayne ; Guenter
> Roeck ; bske...@redhat.com
> Subject: Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into
> the atomic state"
> 
> On Fri, Jan 27, 2023 at 03:02:41PM +, Limonciello, Mario wrote:
> > [Public]
> >
> >
> >
> > > -Original Message-
> > > From: Linux kernel regression tracking (Thorsten Leemhuis)
> > > 
> > > Sent: Friday, January 27, 2023 03:15
> > > To: Greg KH ; Limonciello, Mario
> > > 
> > > Cc: dri-de...@lists.freedesktop.org; sta...@vger.kernel.org;
> > > stanislav.lisovs...@intel.com; Zuo, Jerry ; amd-
> > > g...@lists.freedesktop.org; Lin, Wayne ; Guenter
> > > Roeck ; bske...@redhat.com
> > > Subject: Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info
> into
> > > the atomic state"
> > >
> > > On 27.01.23 08:39, Greg KH wrote:
> > > > On Fri, Jan 20, 2023 at 11:51:04AM -0600, Limonciello, Mario wrote:
> > > >> On 1/20/2023 11:46, Guenter Roeck wrote:
> > > >>> On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:
> > > >>>> This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.
> > > >>>>
> > > >>>> [Why]
> > > >>>> Changes cause regression on amdgpu mst.
> > > >>>> E.g.
> > > >>>> In fill_dc_mst_payload_table_from_drm(), amdgpu expects to
> > > add/remove payload
> > > >>>> one by one and call fill_dc_mst_payload_table_from_drm() to
> update
> > > the HW
> > > >>>> maintained payload table. But previous change tries to go through
> all
> > > the
> > > >>>> payloads in mst_state and update amdpug hw maintained table in
> once
> > > everytime
> > > >>>> driver only tries to add/remove a specific payload stream only. The
> > > newly
> > > >>>> design idea conflicts with the implementation in amdgpu nowadays.
> > > >>>>
> > > >>>> [How]
> > > >>>> Revert this patch first. After addressing all regression problems
> caused
> > > by
> > > >>>> this previous patch, will add it back and adjust it.
> > > >>>
> > > >>> Has there been any progress on this revert, or on fixing the
> underlying
> > > >>> problem ?
> > > >>>
> > > >>> Thanks,
> > > >>> Guenter
> > > >>
> > > >> Hi Guenter,
> > > >>
> > > >> Wayne is OOO for CNY, but let me update you.
> > > >>
> > > >> Harry has sent out this series which is a collection of proper fixes.
> > > >> https://patchwork.freedesktop.org/series/113125/
> > > >>
> > > >> Once that's reviewed and accepted, 4 of them are applicable for 6.1.
> > > >
> > > > Any hint on when those will be reviewed and accepted?  patchwork
> > > doesn't
> > > > show any activity on them, or at least I can't figure it out...
> > >
> > > I didn't look closer (hence please correct me if I'm wrong), but the
> > > core changes afaics are in the DRM pull airlied send a few hours ago to
> > > Linus (note the "amdgpu […] DP MST fixes" line):
> > >
> > >
> https://lore.kernel.org/all/CAPM%3D9tzuu4xnx6T5v7sKsK%2BA5HEaPOc1ie
> > > myznsyqzgztj%3d...@mail.gmail.com/
> >
> > That's right.  There are 4 commits in that PR with the appropriate stable 
> > tags
> > that should fix the majority of the MST issues introduced in 6.1 by
> 4d07b0bc40340
> > ("drm/display/dp_mst: Move all payload info into the atomic state"):
> >
> >   drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count
> assignments
> >   drm/amdgpu/display/mst: limit payload to be updated one by one
> >   drm/amdgpu/display/mst: update mst_mgr relevant variable when long
> HPD
> >   drm/display/dp_mst: Correct the kref of port.
> >
> > There will be follow ups for any remaining corner cases.
> 
> Great, thanks for this, all are now queued up in the 6.1.y queue.
> 
> greg k-h

Greg,

My apologies if this has been covered elsewhere and I missed it but I was
wondering if there was a decision made for whether 6.1.y will be an LTS kernel
release or not?

RE: [PATCH] drm/amd: Allow s0ix without BIOS support

2023-01-30 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Rafael Ávila de Espíndola 
> Sent: Monday, January 30, 2023 08:08
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org
> Cc: Limonciello, Mario 
> Subject: Re: [PATCH] drm/amd: Allow s0ix without BIOS support
> 
> BTW, to which git repo this gets added first? I took a look at
> git://anongit.freedesktop.org/drm-tip, but it is not there.
> 
> Thanks,
> Rafael

Hi,

It will first show up in amd-staging-drm-next here:

https://gitlab.freedesktop.org/agd5f/linux/-/commits/amd-staging-drm-next

It hasn't been refreshed for things accepted this last week yet, but it should
show up this week some time.

Thanks,

> 
> Mario Limonciello  writes:
> 
> > We guard the suspend entry code from running unless we have proper
> > BIOS support for either S3 mode or s0ix mode.
> >
> > If a user's system doesn't support either of these modes the kernel
> > still does offer s2idle in `/sys/power/mem_sleep` so there is an
> > expectation from users that it works even if the power consumption
> > remains very high.
> >
> > Rafael Ávila de Espíndola reports that a system of his has a
> > non-functional graphics stack after resuming.  That system doesn't
> > support S3 and the FADT doesn't indicate support for low power idle.
> >
> > Through some experimentation it was concluded that even without the
> > hardware s0i3 support provided by the amd_pmc driver the power
> > consumption over suspend is decreased by running amdgpu's s0ix
> > suspend routine.
> >
> > The numbers over suspend showed:
> > * No patch: 9.2W
> > * Skip amdgpu suspend entirely: 10.5W
> > * Run amdgpu s0ix routine: 7.7W
> >
> > As this does improve the power, remove some of the guard rails in
> > `amdgpu_acpi.c` for only running s0ix suspend routines in the right
> > circumstances.
> >
> > However if this turns out to cause regressions for anyone, we should
> > revert this change and instead opt for skipping suspend/resume routines
> > entirely or try to fix the underlying behavior that makes graphics fail
> > after resume without underlying platform support.
> >
> > Reported-by: Rafael Ávila de Espíndola 
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2364
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 8 ++--
> >  1 file changed, 2 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > index 57b5e11446c65..fa7375b97fd47 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c
> > @@ -1079,20 +1079,16 @@ bool amdgpu_acpi_is_s0ix_active(struct
> amdgpu_device *adev)
> >  * S0ix even though the system is suspending to idle, so return false
> >  * in that case.
> >  */
> > -   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0)) {
> > +   if (!(acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0))
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as BIOS has
> not been configured for suspend-to-idle.\n"
> >   "To use suspend-to-idle change the sleep mode in
> BIOS setup.\n");
> > -   return false;
> > -   }
> >
> >  #if !IS_ENABLED(CONFIG_AMD_PMC)
> > dev_warn_once(adev->dev,
> >   "Power consumption will be higher as the kernel has not
> been compiled with CONFIG_AMD_PMC.\n");
> > -   return false;
> > -#else
> > -   return true;
> >  #endif /* CONFIG_AMD_PMC */
> > +   return true;
> >  }
> >
> >  #endif /* CONFIG_SUSPEND */
> > --
> > 2.25.1
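
To make the behaviour change in the quoted hunk easier to see, here is a
rough userspace model of just that hunk.  It is only a sketch: the struct
and function names are made up for illustration, and any checks in
amdgpu_acpi_is_s0ix_active() that sit above the hunk are not shown in the
diff and are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two conditions touched by the hunk. */
struct s0ix_inputs {
	bool fadt_low_power_s0;	/* FADT advertises ACPI_FADT_LOW_POWER_S0 */
	bool amd_pmc_enabled;	/* kernel built with CONFIG_AMD_PMC */
};

/* Before the patch: either missing condition makes the function return false. */
static bool s0ix_active_before(const struct s0ix_inputs *in)
{
	assert(in);
	if (!in->fadt_low_power_s0)
		return false;
	if (!in->amd_pmc_enabled)
		return false;
	return true;
}

/*
 * After the patch: the same conditions only produce a warning each,
 * and the s0ix suspend routine is used regardless.
 */
static bool s0ix_active_after(const struct s0ix_inputs *in,
			      unsigned int *warnings)
{
	assert(in && warnings);
	*warnings = 0;
	if (!in->fadt_low_power_s0)
		(*warnings)++;
	if (!in->amd_pmc_enabled)
		(*warnings)++;
	return true;
}
```

On a system like the one in the report (no low power idle flag in the
FADT), the "before" variant returns false, while the "after" variant warns
and returns true.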

RE: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-27 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Linux kernel regression tracking (Thorsten Leemhuis)
> 
> Sent: Friday, January 27, 2023 03:15
> To: Greg KH ; Limonciello, Mario
> 
> Cc: dri-de...@lists.freedesktop.org; sta...@vger.kernel.org;
> stanislav.lisovs...@intel.com; Zuo, Jerry ; amd-
> g...@lists.freedesktop.org; Lin, Wayne ; Guenter
> Roeck ; bske...@redhat.com
> Subject: Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into
> the atomic state"
> 
> On 27.01.23 08:39, Greg KH wrote:
> > On Fri, Jan 20, 2023 at 11:51:04AM -0600, Limonciello, Mario wrote:
> >> On 1/20/2023 11:46, Guenter Roeck wrote:
> >>> On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:
> >>>> This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.
> >>>>
> >>>> [Why]
> >>>> Changes cause regression on amdgpu mst.
> >>>> E.g.
> >>>> In fill_dc_mst_payload_table_from_drm(), amdgpu expects to
> add/remove payload
> >>>> one by one and call fill_dc_mst_payload_table_from_drm() to update
> the HW
> >>>> maintained payload table. But previous change tries to go through all
> the
> >>>> payloads in mst_state and update amdpug hw maintained table in once
> everytime
> >>>> driver only tries to add/remove a specific payload stream only. The
> newly
> >>>> design idea conflicts with the implementation in amdgpu nowadays.
> >>>>
> >>>> [How]
> >>>> Revert this patch first. After addressing all regression problems caused
> by
> >>>> this previous patch, will add it back and adjust it.
> >>>
> >>> Has there been any progress on this revert, or on fixing the underlying
> >>> problem ?
> >>>
> >>> Thanks,
> >>> Guenter
> >>
> >> Hi Guenter,
> >>
> >> Wayne is OOO for CNY, but let me update you.
> >>
> >> Harry has sent out this series which is a collection of proper fixes.
> >> https://patchwork.freedesktop.org/series/113125/
> >>
> >> Once that's reviewed and accepted, 4 of them are applicable for 6.1.
> >
> > Any hint on when those will be reviewed and accepted?  patchwork
> doesn't
> > show any activity on them, or at least I can't figure it out...
> 
> I didn't look closer (hence please correct me if I'm wrong), but the
> core changes afaics are in the DRM pull airlied send a few hours ago to
> Linus (note the "amdgpu […] DP MST fixes" line):
> 
> https://lore.kernel.org/all/CAPM%3D9tzuu4xnx6T5v7sKsK%2BA5HEaPOc1ie
> myznsyqzgztj%3d...@mail.gmail.com/

That's right.  There are 4 commits in that PR with the appropriate stable tags
that should fix the majority of the MST issues introduced in 6.1 by 4d07b0bc40340
("drm/display/dp_mst: Move all payload info into the atomic state"):

  drm/amdgpu/display/mst: Fix mst_state->pbn_div and slot count assignments
  drm/amdgpu/display/mst: limit payload to be updated one by one
  drm/amdgpu/display/mst: update mst_mgr relevant variable when long HPD
  drm/display/dp_mst: Correct the kref of port.

There will be follow ups for any remaining corner cases.

> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.

Re: drm/amdgpu: add force_sg_display module parameter

2023-01-26 Thread Limonciello, Mario


On 1/24/2023 09:13, Alex Deucher wrote:

Add a module parameter to force sg (scatter/gather) display
on APUs.  Normally we allow displays in both VRAM and GTT,
but this option forces displays into GTT so we can explicitly
test more scenarios with GTT.

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 12 
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  4 
  3 files changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 872450a3a164..73d0a0807138 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -244,6 +244,8 @@ extern int amdgpu_num_kcq;
  #define AMDGPU_VCNFW_LOG_SIZE (32 * 1024)
  extern int amdgpu_vcnfw_log;
  
+extern int amdgpu_force_sg_display;
+
  #define AMDGPU_VM_MAX_NUM_CTX 4096
  #define AMDGPU_SG_THRESHOLD   (256*1024*1024)
  #define AMDGPU_DEFAULT_GTT_SIZE_MB3072ULL /* 3GB by default */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index a75dba2caeca..bc0eaf2330f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -942,6 +942,18 @@ MODULE_PARM_DESC(smu_pptable_id,
"specify pptable id to be used (-1 = auto(default) value, 0 = use pptable from 
vbios, > 0 = soft pptable id)");
  module_param_named(smu_pptable_id, amdgpu_smu_pptable_id, int, 0444);
  
+/**
+ * DOC: force_sg_display (int)
+ * Force display buffers into GTT (scatter/gather) memory for APUs.
+ * This is used to force GTT only for displays rather than displaying from
+ * either VRAM (carve out) or GTT.
+ *
+ * Defaults to 0, or disabled.
+ */
+int amdgpu_force_sg_display;
+MODULE_PARM_DESC(force_sg_display, "Force S/G display (0 = off (default), 1 = force 
display to use GTT) ");
+module_param_named(force_sg_display, amdgpu_force_sg_display, int, 0444);


To discourage the use of this from non-developers, perhaps it should be:
`module_param_named_unsafe`

That will taint the kernel when it's used.


+
  /* These devices are not supported by amdgpu.
   * They are supported by the mach64, r128, radeon drivers
   */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 2d237f3d3a2e..78dc5d63a6dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1515,6 +1515,10 @@ uint32_t amdgpu_bo_get_preferred_domain(struct 
amdgpu_device *adev,
if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD)
domain = AMDGPU_GEM_DOMAIN_GTT;
}
+   if (amdgpu_force_sg_display &&
+   (adev->asic_type >= CHIP_CARRIZO) &&
+   (adev->flags & AMD_IS_APU))
+   domain = AMDGPU_GEM_DOMAIN_GTT;
return domain;
  }

RE: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-20 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Guenter Roeck  On Behalf Of Guenter Roeck
> Sent: Friday, January 20, 2023 12:18
> To: Limonciello, Mario 
> Cc: Lin, Wayne ; dri-de...@lists.freedesktop.org;
> amd-gfx@lists.freedesktop.org; sta...@vger.kernel.org;
> stanislav.lisovs...@intel.com; Zuo, Jerry ;
> bske...@redhat.com
> Subject: Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into
> the atomic state"
> 
> Hi Mario,
> 
> On Fri, Jan 20, 2023 at 11:51:04AM -0600, Limonciello, Mario wrote:
> > On 1/20/2023 11:46, Guenter Roeck wrote:
> > > On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:
> > > > This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.
> > > >
> > > > [Why]
> > > > Changes cause regression on amdgpu mst.
> > > > E.g.
> > > > In fill_dc_mst_payload_table_from_drm(), amdgpu expects to
> add/remove payload
> > > > one by one and call fill_dc_mst_payload_table_from_drm() to update
> the HW
> > > > maintained payload table. But previous change tries to go through all
> the
> > > > payloads in mst_state and update amdpug hw maintained table in once
> everytime
> > > > driver only tries to add/remove a specific payload stream only. The
> newly
> > > > design idea conflicts with the implementation in amdgpu nowadays.
> > > >
> > > > [How]
> > > > Revert this patch first. After addressing all regression problems caused
> by
> > > > this previous patch, will add it back and adjust it.
> > >
> > > Has there been any progress on this revert, or on fixing the underlying
> > > problem ?
> > >
> > > Thanks,
> > > Guenter
> >
> > Hi Guenter,
> >
> > Wayne is OOO for CNY, but let me update you.
> >
> > Harry has sent out this series which is a collection of proper fixes.
> > https://patchwork.freedesktop.org/series/113125/
> >
> > Once that's reviewed and accepted, 4 of them are applicable for 6.1.
> 
> Thanks a lot for the update. There is talk about abandoning v6.1.y as
> LTS candidate, in large part due to this problem, so it would be great
> to get the problem fixed before that happens.

Any idea how soon that decision is happening?  It seems that we have line
of sight to a solution including back to 6.1.y pending that review.  So perhaps
we can put off the decision until those are landed.

Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-20 Thread Limonciello, Mario


On 1/20/2023 11:46, Guenter Roeck wrote:

On Thu, Jan 12, 2023 at 04:50:44PM +0800, Wayne Lin wrote:

This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.

[Why]
Changes cause regression on amdgpu mst.
E.g.
In fill_dc_mst_payload_table_from_drm(), amdgpu expects to add/remove payload
one by one and call fill_dc_mst_payload_table_from_drm() to update the HW
maintained payload table. But previous change tries to go through all the
payloads in mst_state and update amdpug hw maintained table in once everytime
driver only tries to add/remove a specific payload stream only. The newly
design idea conflicts with the implementation in amdgpu nowadays.

[How]
Revert this patch first. After addressing all regression problems caused by
this previous patch, will add it back and adjust it.


Has there been any progress on this revert, or on fixing the underlying
problem ?

Thanks,
Guenter


Hi Guenter,

Wayne is OOO for CNY, but let me update you.

Harry has sent out this series which is a collection of proper fixes.
https://patchwork.freedesktop.org/series/113125/

Once that's reviewed and accepted, 4 of them are applicable for 6.1.

Thanks,

RE: [PATCH 1/2] drm/amdgpu: skip psp suspend for IMU enabled ASICs mode2 reset

2023-01-20 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Huang, Tim 
> Sent: Friday, January 20, 2023 10:29
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Du, Xiaojian ; Ma, Li
> ; Limonciello, Mario ;
> Huang, Tim 
> Subject: [PATCH 1/2] drm/amdgpu: skip psp suspend for IMU enabled ASICs
> mode2 reset
> 
> The psp suspend & resume should be skipped to avoid destroy
> the TMR and reload FWs again for IMU enabled APU ASICs.
> 
> Signed-off-by: Tim Huang 

Reviewed-by: Mario Limonciello 

Please also add this tag for this one:
Cc: sta...@vger.kernel.org # 6.1 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index efd4f8226120..0f9a5b12c3a5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3036,6 +3036,18 @@ static int
> amdgpu_device_ip_suspend_phase2(struct amdgpu_device *adev)
>   (adev->ip_blocks[i].version->type ==
> AMD_IP_BLOCK_TYPE_SDMA))
>   continue;
> 
> + /* Once swPSP provides the IMU, RLC FW binaries to TOS
> during cold-boot.
> +  * These are in TMR, hence are expected to be reused by
> PSP-TOS to reload
> +  * from this location and RLC Autoload automatically also gets
> loaded
> +  * from here based on PMFW -> PSP message during re-init
> sequence.
> +  * Therefore, the psp suspend & resume should be skipped
> to avoid destroy
> +  * the TMR and reload FWs again for IMU enabled APU ASICs.
> +  */
> + if (amdgpu_in_reset(adev) &&
> + (adev->flags & AMD_IS_APU) && adev->gfx.imu.funcs &&
> + adev->ip_blocks[i].version->type ==
> AMD_IP_BLOCK_TYPE_PSP)
> + continue;
> +
>   /* XXX handle errors */
>   r = adev->ip_blocks[i].version->funcs->suspend(adev);
>   /* XXX handle errors */
> --
> 2.25.1

RE: [PATCH 2/2] drm/amd/pm: drop unneeded dpm features disablement for SMU 13.0.4/11

2023-01-20 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Huang, Tim 
> Sent: Friday, January 20, 2023 10:29
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhang, Yifan
> ; Du, Xiaojian ; Ma, Li
> ; Limonciello, Mario ;
> Huang, Tim 
> Subject: [PATCH 2/2] drm/amd/pm: drop unneeded dpm features
> disablement for SMU 13.0.4/11
> 
> PMFW will handle that properly. Driver involvement may cause some
> unexpected issues.
> 
> Signed-off-by: Tim Huang 
> ---
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index ec52830dde24..800eb5ad1dcb 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -1448,6 +1448,8 @@ static int smu_disable_dpms(struct smu_context
> *smu)
>   case IP_VERSION(13, 0, 0):
>   case IP_VERSION(13, 0, 7):
>   case IP_VERSION(13, 0, 10):
> + case IP_VERSION(13, 0, 4):

To keep a consistent ordering scheme, I think IP_VERSION(13, 0, 4) should come 
after IP_VERSION(13, 0, 0).

w/ that fixed:
Reviewed-by: Mario Limonciello 

6.1 is used for IP_VERSION(13, 0, 4), so please also include
Cc: sta...@vger.kernel.org #6.1
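
As an illustration of the ordering I mean, here is a small standalone
sketch.  It assumes IP_VERSION() packs major/minor/revision into a single
integer as in the macro below (my understanding of the amdgpu definition);
the helper function is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the amdgpu macro: major, minor, revision in one integer. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical helper: return 1 if the versions are in strictly ascending order. */
static int versions_ascending(const uint32_t *v, unsigned int n)
{
	unsigned int i;

	assert(v);
	for (i = 1; i < n; i++) {
		if (v[i - 1] >= v[i])
			return 0;
	}
	return 1;
}
```

With the case labels as posted (13.0.0, 13.0.7, 13.0.10, 13.0.4, 13.0.11)
the check fails; moving 13.0.4 directly after 13.0.0 makes the list
ascending.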

> + case IP_VERSION(13, 0, 11):
>   return 0;
>   default:
>   break;
> --
> 2.25.1

Re: Documentation/gpu: update dGPU asic info table

2023-01-19 Thread Limonciello, Mario


On 1/18/2023 15:14, Alex Deucher wrote:

Update to the latest launched dGPUs.

Link: https://www.amd.com/en/graphics/radeon-rx-graphics
Link: https://www.amd.com/en/graphics/amd-radeon-rx-laptops
Signed-off-by: Alex Deucher 


Reviewed-by: Mario Limonciello 


---
  Documentation/gpu/amdgpu/dgpu-asic-info-table.csv | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv 
b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv
index 84617aa35dab..882d2518f8ed 100644
--- a/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv
+++ b/Documentation/gpu/amdgpu/dgpu-asic-info-table.csv
@@ -22,3 +22,5 @@ AMD Radeon RX 6800(XT) /6900(XT) /W6800, SIENNA_CICHLID, DCN 
3.0.0, 10.3.0, VCN
  AMD Radeon RX 6700 XT / 6800M / 6700M, NAVY_FLOUNDER, DCN 3.0.0, 10.3.2, VCN 
3.0.0, 5.2.2
  AMD Radeon RX 6600(XT) /6600M /W6600 /W6600M, DIMGREY_CAVEFISH, DCN 3.0.2, 
10.3.4, VCN 3.0.16, 5.2.4
  AMD Radeon RX 6500M /6300M /W6500M /W6300M, BEIGE_GOBY, DCN 3.0.3, 10.3.5, 
VCN 3.0.33, 5.2.5
+AMD Radeon RX 7900 XT /XTX, , DCN 3.2.0, 11.0.0, VCN 4.0.0, 6.0.0
+AMD Radeon RX 7600M (XT) /7700S /7600S, , DCN 3.2.1, 11.0.2, VCN 4.0.4, 6.0.2

RE: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check

2023-01-17 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Guilherme G. Piccoli 
> Sent: Tuesday, January 17, 2023 12:14
> To: Limonciello, Mario ; amd-
> g...@lists.freedesktop.org; Deucher, Alexander
> 
> Cc: dri-de...@lists.freedesktop.org; Koenig, Christian
> ; Pan, Xinhui ;
> ker...@gpiccoli.net; kernel-...@igalia.com; Zhu, James
> ; Lazar, Lijo ; Liu, Leo
> ; Jiang, Sonny 
> Subject: Re: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect
> SRAM HW model check
> 
> On 17/01/2023 15:08, Limonciello, Mario wrote:
> > [...]
> >
> > Should have added this tag too:
> > Suggested-by: Alexander Deucher 
> >
> > Looks good to me, thanks!
> > Reviewed-by: Mario Limonciello 
> >
> 
> You're totally right, thanks for the reminder and apologies for missing
> that! Just sending V3 heheh
> 
> Ah, thanks for the reviews and prompt responses.
> Cheers,
> 
> 
> Guilherme

No need to resend.  Patchwork will embed the tags when we pick this up.

RE: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM HW model check

2023-01-17 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Guilherme G. Piccoli 
> Sent: Tuesday, January 17, 2023 11:59
> To: amd-gfx@lists.freedesktop.org
> Cc: dri-de...@lists.freedesktop.org; Deucher, Alexander
> ; Koenig, Christian
> ; Pan, Xinhui ;
> ker...@gpiccoli.net; kernel-...@igalia.com; Guilherme G. Piccoli
> ; Zhu, James ; Lazar, Lijo
> ; Liu, Leo ; Limonciello, Mario
> ; Jiang, Sonny 
> Subject: [PATCH v2 2/2] drm/amdgpu/vcn: Remove redundant indirect SRAM
> HW model check
> 
> The HW model validation that guards the indirect SRAM checking in the
> VCN code path is redundant - there's no model that's not included in the
> switch, making it useless in practice [0].
> 
> So, let's remove this switch statement for good.
> 
> [0] lore.kernel.org/amd-
> gfx/mn0pr12mb61013d20b8a2263b22ae1bcfe2...@mn0pr12mb6101.na
> mprd12.prod.outlook.com
> 
> Cc: James Zhu 
> Cc: Lazar Lijo 
> Cc: Leo Liu 
> Cc: Mario Limonciello 
> Cc: Sonny Jiang 
> Signed-off-by: Guilherme G. Piccoli 

Should have added this tag too:
Suggested-by: Alexander Deucher 

Looks good to me, thanks!
Reviewed-by: Mario Limonciello 

> ---
> 
> 
> V2:
> * Changed the approach after ML discussion- instead of cleaning up
> the switch statement, removed it entirely - special thanks to Alex
> and Mario for the feedback!
> 
> Notice that patch 3 was dropped from this series after reviews.
> 
> 
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 81 +
>  1 file changed, 3 insertions(+), 78 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> index 1b1a3c9e1863..02d428ddf2f8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> @@ -110,84 +110,9 @@ int amdgpu_vcn_sw_init(struct amdgpu_device
> *adev)
>   for (i = 0; i < adev->vcn.num_vcn_inst; i++)
>   atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
> 
> - switch (adev->ip_versions[UVD_HWIP][0]) {
> - case IP_VERSION(1, 0, 0):
> - case IP_VERSION(1, 0, 1):
> - case IP_VERSION(2, 5, 0):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(2, 2, 0):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(2, 6, 0):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(2, 0, 0):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(2, 0, 2):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(3, 0, 0):
> - case IP_VERSION(3, 0, 64):
> - case IP_VERSION(3, 0, 192):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(3, 0, 2):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(3, 0, 16):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(3, 0, 33):
> - if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> - (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> - adev->vcn.indirect_sram = true;
> - break;
> - case IP_VERSION(3, 1, 1):
> - if ((

RE: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-17 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, January 17, 2023 09:11
> To: Guilherme G. Piccoli 
> Cc: Limonciello, Mario ; Liu, Leo
> ; amd-gfx@lists.freedesktop.org; Jiang, Sonny
> ; ker...@gpiccoli.net; Pan, Xinhui
> ; dri-de...@lists.freedesktop.org; Lazar, Lijo
> ; kernel-...@igalia.com; Deucher, Alexander
> ; Zhu, James ;
> Koenig, Christian ; Pierre-Loup Griffais
> 
> Subject: Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force
> (en/dis)abling indirect SRAM mode
> 
> On Tue, Jan 17, 2023 at 9:33 AM Guilherme G. Piccoli
>  wrote:
> >
> > On 16/01/2023 23:33, Limonciello, Mario wrote:
> > > [...]
> > >
> > > For debugging these type of problems, I think an effective debugging
> > > tactic would have been to mask the IP block (amdgpu.ip_block_mask).
> >
> > Thank you, it worked indeed - nice suggestion!
> >
> > Though I see two problems with that: first, I'm not sure what's the
> > impact in the GPU functioning when I disable some IP block.
> >

It depends on the individual block what the impact is.  For example
if you don't have VCN, then you can't do any accelerated video playback.

> > Second, the parameter is a bit hard to figure - we need to clear a bit
> > for the IP block we want to disable, and the doc suggest to read on
> > dmesg to get this information (it seems it changes depending on the HW
> > model), but I couldn't parse the proper bit from dmesg. Needed to
> > instrument the kernel to find the proper bit heh
> >

Isn't it this stuff (taken from a CZN system):

[7.797779] [drm] add ip block number 0 
[7.797781] [drm] add ip block number 1 
[7.797782] [drm] add ip block number 2 
[7.797783] [drm] add ip block number 3 
[7.797783] [drm] add ip block number 4 
[7.797784] [drm] add ip block number 5 
[7.797785] [drm] add ip block number 6 
[7.797786] [drm] add ip block number 7 
[7.797787] [drm] add ip block number 8 
[7.797788] [drm] add ip block number 9 

So for that system it would be bit 8 to disable vcn.

In terms of how debugging would work:
I would expect when you get your failure it will have been the previous
block # that failed, and so you can reboot with that block masked and
see if you get further.
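
For the mask value itself, a minimal sketch of the arithmetic follows.  It
assumes the default amdgpu.ip_block_mask is 0xffffffff (all blocks enabled)
and that bit N lines up with "ip block number N" in dmesg; the helper names
are only for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed default of amdgpu.ip_block_mask: every IP block enabled. */
#define IP_BLOCK_MASK_ALL 0xffffffffu

/* Hypothetical helper: clear the bit for one IP block number (0..31). */
static uint32_t mask_without_block(uint32_t mask, unsigned int block)
{
	assert(block < 32);
	return mask & ~(UINT32_C(1) << block);
}

/* Print the kernel command line option that masks a single block. */
static void print_mask_option(unsigned int block)
{
	printf("amdgpu.ip_block_mask=0x%08x\n",
	       (unsigned int)mask_without_block(IP_BLOCK_MASK_ALL, block));
}
```

For the CZN log above, print_mask_option(8) gives
amdgpu.ip_block_mask=0xfffffeff.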

> > The second part is easy to improve (we can just show this bit in
> > dmesg!), I might do that instead of proposing this parameter, which
> > seems didn't raise much excitement after all heheh
> >
> > Finally, I'm still curious on why Deck was working fine with the
> > indirect SRAM mode disabled (by mistake) in many kernels - was it in
> > practice the same as disabling the VCN IP block?
> 
> IIRC, it depends on the fuse recipe for the particular ASIC.
> 
> Alex
> 
> 
> >
> > Thanks,
> >
> >
> > Guilherme
> >

Re: [PATCH 3/3] drm/amdgpu/vcn: Add parameter to force (en/dis)abling indirect SRAM mode

2023-01-16 Thread Limonciello, Mario


On 1/16/2023 18:47, Guilherme G. Piccoli wrote:

On 16/01/2023 20:00, Alex Deucher wrote:

[...]

It's not clear to me when this would be used.  We only disable it
briefly during new asic bring up, after that we never touch it again.
No end user on production hardware should ever have to change it and
doing so would break VCN on their system.

Alex


[+ Pierre-Loup]

Steam Deck is facing a pretty weird scenario then heheh

Commit 82132ecc543 ("drm/amdgpu: enable Vangogh VCN indirect sram mode")
corrected a long-term issue in which the indirect SRAM mode wasn't
enabled for Vangogh - and Deck GPU architecture is Vangogh, so it was
working perfectly with that disabled.

Happens that a bad FW candidate seems to have broken something - it was
a bit convoluted to debug, but we proved that disabling indirect SRAM is
a good workaround so far, it "restored the behavior" pre-82132ecc543.

Hence my proposal - this parameter would've made life so much easier,
and we're starting to use it "downstream" now. While I understand that of
course the FW should be fixed, meanwhile this is a cheap solution to
allow further debug and real use of the system.



For debugging these types of problems, I think an effective debugging
tactic would have been to mask the IP block (amdgpu.ip_block_mask).

RE: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code

2023-01-16 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Monday, January 16, 2023 15:54
> To: Limonciello, Mario 
> Cc: Guilherme G. Piccoli ; amd-
> g...@lists.freedesktop.org; Jiang, Sonny ;
> ker...@gpiccoli.net; Pan, Xinhui ; dri-
> de...@lists.freedesktop.org; Lazar, Lijo ; kernel-
> d...@igalia.com; Deucher, Alexander ; Zhu,
> James ; Liu, Leo ; Koenig,
> Christian 
> Subject: Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM
> checking code
> 
> On Mon, Jan 16, 2023 at 4:49 PM Limonciello, Mario
>  wrote:
> >
> > [Public]
> >
> >
> >
> > > -Original Message-
> > > From: Alex Deucher 
> > > Sent: Monday, January 16, 2023 15:42
> > > To: Guilherme G. Piccoli 
> > > Cc: amd-gfx@lists.freedesktop.org; Jiang, Sonny
> ;
> > > ker...@gpiccoli.net; Pan, Xinhui ; dri-
> > > de...@lists.freedesktop.org; Lazar, Lijo ;
> Limonciello,
> > > Mario ; kernel-...@igalia.com; Deucher,
> > > Alexander ; Zhu, James
> > > ; Liu, Leo ; Koenig, Christian
> > > 
> > > Subject: Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM
> > > checking code
> > >
> > > On Mon, Jan 16, 2023 at 4:20 PM Guilherme G. Piccoli
> > >  wrote:
> > > >
> > > > Currently both conditionals checked to enable indirect SRAM are
> > > > repeated for all HW/IP models. Let's just simplify it a bit so
> > > > code is more compact and readable.
> > > >
> > > > While at it, add the legacy names as a comment per case block, to
> > > > help whoever is reading the code and doesn't have much experience
> > > > with the name/number mapping.
> > > >
> > > > Cc: James Zhu 
> > > > Cc: Lazar Lijo 
> > > > Cc: Leo Liu 
> > > > Cc: Mario Limonciello 
> > > > Cc: Sonny Jiang 
> > > > Signed-off-by: Guilherme G. Piccoli 
> > > > ---
> > > >
> > > >
> > > > Hi folks, first of all thanks in advance for reviews/comments!
> > > >
> > > > This work is based on agd5f/amd-staging-drm-next branch - there is this
> > > > patch from Mario that refactored the amdgpu_vcn.c, and since it
> dropped
> > > > the legacy names from the switch cases, I've decided to also include
> them
> > > > here as comments.
> > > >
> > > > I'm not sure if that's a good idea, feels good for somebody not so
> > > > used to the code read the codenames instead of purely numbers, but
> > > > if you wanna move away from the legacy names for good, lemme know
> > > > and I can rework without these comments.
> > > >
> > > > Cheers,
> > > >
> > > >
> > > > Guilherme
> > > >
> > > >
> > > >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 84 +---
> -
> > > >  1 file changed, 16 insertions(+), 68 deletions(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > > > index 1b1a3c9e1863..1f880e162d9d 100644
> > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > > > @@ -111,78 +111,26 @@ int amdgpu_vcn_sw_init(struct
> amdgpu_device
> > > *adev)
> > > > > atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
> > > >
> > > > switch (adev->ip_versions[UVD_HWIP][0]) {
> > > > -   case IP_VERSION(1, 0, 0):
> > > > -   case IP_VERSION(1, 0, 1):
> > > > -   case IP_VERSION(2, 5, 0):
> > > > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> > > > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > > > -   adev->vcn.indirect_sram = true;
> > > > -   break;
> > > > -   case IP_VERSION(2, 2, 0):
> > > > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> > > > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > > > -   adev->vcn.indirect_sram = true;
> > > > -   break;
> > > > -   case IP_VERSION(2, 6, 0):
> > > > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP)
> &&
> > > >

RE: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM checking code

2023-01-16 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Monday, January 16, 2023 15:42
> To: Guilherme G. Piccoli 
> Cc: amd-gfx@lists.freedesktop.org; Jiang, Sonny ;
> ker...@gpiccoli.net; Pan, Xinhui ; dri-
> de...@lists.freedesktop.org; Lazar, Lijo ; Limonciello,
> Mario ; kernel-...@igalia.com; Deucher,
> Alexander ; Zhu, James
> ; Liu, Leo ; Koenig, Christian
> 
> Subject: Re: [PATCH 2/3] drm/amdgpu/vcn: Deduplicate indirect SRAM
> checking code
> 
> On Mon, Jan 16, 2023 at 4:20 PM Guilherme G. Piccoli
>  wrote:
> >
> > Currently both conditionals checked to enable indirect SRAM are
> > repeated for all HW/IP models. Let's just simplify it a bit so
> > code is more compact and readable.
> >
> > While at it, add the legacy names as a comment per case block, to
> > help whoever is reading the code and doesn't have much experience
> > with the name/number mapping.
> >
> > Cc: James Zhu 
> > Cc: Lazar Lijo 
> > Cc: Leo Liu 
> > Cc: Mario Limonciello 
> > Cc: Sonny Jiang 
> > Signed-off-by: Guilherme G. Piccoli 
> > ---
> >
> >
> > Hi folks, first of all thanks in advance for reviews/comments!
> >
> > This work is based on agd5f/amd-staging-drm-next branch - there is this
> > patch from Mario that refactored the amdgpu_vcn.c, and since it dropped
> > the legacy names from the switch cases, I've decided to also include them
> > here as comments.
> >
> > I'm not sure if that's a good idea, feels good for somebody not so
> > used to the code read the codenames instead of purely numbers, but
> > if you wanna move away from the legacy names for good, lemme know
> > and I can rework without these comments.
> >
> > Cheers,
> >
> >
> > Guilherme
> >
> >
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 84 +
> >  1 file changed, 16 insertions(+), 68 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > index 1b1a3c9e1863..1f880e162d9d 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > @@ -111,78 +111,26 @@ int amdgpu_vcn_sw_init(struct amdgpu_device
> *adev)
> > > atomic_set(&adev->vcn.inst[i].dpg_enc_submission_cnt, 0);
> >
> > switch (adev->ip_versions[UVD_HWIP][0]) {
> > -   case IP_VERSION(1, 0, 0):
> > -   case IP_VERSION(1, 0, 1):
> > -   case IP_VERSION(2, 5, 0):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram = true;
> > -   break;
> > -   case IP_VERSION(2, 2, 0):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram = true;
> > -   break;
> > -   case IP_VERSION(2, 6, 0):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram = true;
> > -   break;
> > -   case IP_VERSION(2, 0, 0):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram = true;
> > -   break;
> > -   case IP_VERSION(2, 0, 2):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram = true;
> > -   break;
> > -   case IP_VERSION(3, 0, 0):
> > -   case IP_VERSION(3, 0, 64):
> > -   case IP_VERSION(3, 0, 192):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram = true;
> > -   break;
> > -   case IP_VERSION(3, 0, 2):
> > -   if ((adev->firmware.load_type == AMDGPU_FW_LOAD_PSP) &&
> > -   (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG))
> > -   adev->vcn.indirect_sram
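
The idea behind the deduplication can be sketched outside the driver: every removed case block tests the same two conditions, so the test can live in one place. This is only an illustration under stated assumptions, not the patch itself. The enum, the flag value and the function name below are stand-ins invented for the example and are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in definitions for illustration only; not the kernel's values. */
enum example_fw_load_type { EXAMPLE_FW_LOAD_DIRECT, EXAMPLE_FW_LOAD_PSP };
#define EXAMPLE_PG_SUPPORT_VCN_DPG (UINT32_C(1) << 0)

/*
 * The two conditions that each removed case block repeated:
 * firmware is loaded through PSP and the DPG power-gating flag is set.
 */
static bool example_wants_indirect_sram(enum example_fw_load_type load_type,
					uint32_t pg_flags)
{
	return load_type == EXAMPLE_FW_LOAD_PSP &&
	       (pg_flags & EXAMPLE_PG_SUPPORT_VCN_DPG) != 0;
}
```

With a helper like this, a switch over the IP versions only needs to decide which versions are eligible, and a single call decides whether indirect SRAM is enabled.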

RE: [PATCH 1/3] drm/amd: Adjust legacy IP discovery for Picasso/Raven/Raven2

2023-01-16 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Monday, January 16, 2023 07:51
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] drm/amd: Adjust legacy IP discovery for
> Picasso/Raven/Raven2
> 
> On Sun, Jan 15, 2023 at 2:22 PM Mario Limonciello
>  wrote:
> >
> > The switch/case statement currently combines 10.0.0 and 10.0.1, but
> > 10.0.1 is only used for Raven 2.  So split the two cases up to
> > make this clearer.
> 
> Keep the logic as is.  We don't know the revision id which is used to
> differentiate the raven variants until after IP discovery so we can't
> assign the proper IP versions for each raven variant and raven asics
> don't have an IP discovery table (it's hardcoded in
> amdgpu_discovery.c).
> 

Got it thanks, will drop this patch.

> Alex
> 
> >
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c | 12 
> >  1 file changed, 4 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > index c03824d0311bd..0d950ae14b27c 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ucode.c
> > @@ -1074,15 +1074,11 @@ static const char
> *amdgpu_ucode_legacy_naming(struct amdgpu_device *adev, int bl
> > }
> > break;
> > case IP_VERSION(10, 0, 0):
> > +   if (adev->apu_flags & AMD_APU_IS_PICASSO)
> > +   return "picasso";
> > +   return "raven";
> > case IP_VERSION(10, 0, 1):
> > -   if (adev->asic_type == CHIP_RAVEN) {
> > -   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
> > -   return "raven2";
> > -   else if (adev->apu_flags & 
> > AMD_APU_IS_PICASSO)
> > -   return "picasso";
> > -   return "raven";
> > -   }
> > -   break;
> > +   return "raven2";
> > case IP_VERSION(11, 0, 0):
> > return "navi10";
> > case IP_VERSION(11, 0, 2):
> > --
> > 2.25.1
> >

Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-13 Thread Limonciello, Mario


On 1/13/2023 13:28, Lyude Paul wrote:

On Fri, 2023-01-13 at 11:25 +0100, Daniel Vetter wrote:

On Fri, Jan 13, 2023 at 12:16:57PM +0200, Jani Nikula wrote:


Cc: intel-gfx, drm maintainers

Please have the courtesy of Cc'ing us for changes impacting us, and
maybe try to involve us earlier instead of surprising us like
this. Looks like this has been debugged for at least three months, and
the huge revert at this point is going to set us back with what's been
developed on top of it for i915 DP MST and DSC.


tbf I assumed this won't land when I've seen it fly by. It feels a bit much
like living under a rock for half a year and then creating a mess for
everyone else who's been building on top of this, which is not great.

Like yes it's a regression, but apparently not a blatantly obvious one,
and I think if we just ram this in there's considerable community goodwill
down the drain. Someone needs to get that goodwill up the drain again.


It's a regression, I get that, but this is also going to be really nasty
to deal with. It's a 2500-line commit, plus the dependencies, which I
don't think are accounted for here. (What's the baseline for the revert
anyway?) I don't expect all the dependent commits to be easy to revert
or backport to v6.1 or v6.2.

*sad trombone*


Yeah that's the other thing. 2500 patch revert is not cc stable material.
So this isn't even really helping users all that much.

Unless it also comes with full amounts of backports of the reverts on all
affected drivers for all current stable trees, fully validated.


The silver lining here is that, in terms of how many stable trees are
broken, it's only 6.1.y.


Wayne's revert is against drm-tip.

When attempting to backport his revert, I ran into
conflicts in 6.2-rc3 because of:

831a277ef001 ("Revert "drm/i915: Extract drm_dp_atomic_find_vcpi_slots 
cycle to separate function"")


I worked through them and have the result here:
https://gitlab.freedesktop.org/superm1/linux/-/commit/8e926eb77c41e7f32f3d8943cdf7d140ed319b79

and conflicts in 6.1.y from the lack of:

8c7d980da9ba ("drm/nouveau/disp: move DP MST payload config method")

I worked through those as well and the result is here:

https://gitlab.freedesktop.org/superm1/linux/-/commit/2145b4de3fea9908cda6bef0693a797cc7f4ddfc

Affected people in Gitlab #2171 have tested that amdgpu works with MST again
on 6.1.5 with that applied.


To your point, we do need someone with the appropriate hardware to make 
sure we didn't make i915 or nouveau worse by this revert though.




This is bad. I do think we need to have some understanding first of what
"fix this in amdgpu" would look like as plan B. Because plan A does not
look like a happy one at all.


Yeah this whole thing has been a mess, I'm partially to blame here - we should
have reverted earlier, but a lot of this has been me finding out that the
problem here is a lot bigger then I previously imagined - and has not at all
been easy to untangle. I've also dumped so much time into trying to figure it
out that was more or less the only reason I acked this in the first place, I'm
literally just quite tired and exhausted at this point from spinning my wheels
on trying to fix this ;_;.

I am sure there is a real proper fix for this, if anyone wants to help me try
and figure this out I'm happy to setup remote access to the machines I've got
here. I'll also try to push myself to dig further into this next week again.


-Daniel


BR,
Jani.

RE: Coverity: dm_dmub_sw_init(): Incorrect expression

2023-01-12 Thread Limonciello, Mario

[AMD Official Use Only - General]

This particular one was fixed already in 
https://patchwork.freedesktop.org/patch/518050/ which got applied today.

> -Original Message-
> From: coverity-bot 
> Sent: Thursday, January 12, 2023 16:25
> To: Limonciello, Mario 
> Cc: linux-ker...@vger.kernel.org; amd-gfx@lists.freedesktop.org; Siqueira,
> Rodrigo ; Li, Sun peng (Leo)
> ; Li, Roman ; Zuo, Jerry
> ; Wu, Hersen ; dri-
> de...@lists.freedesktop.org; Koenig, Christian ;
> Lazar, Lijo ; Pillai, Aurabindo
> ; Wentland, Harry ;
> Deucher, Alexander ; Daniel Vetter
> ; David Airlie ; Pan, Xinhui
> ; Wheeler, Daniel ;
> Gustavo A. R. Silva ; linux-n...@vger.kernel.org;
> linux-harden...@vger.kernel.org
> Subject: Coverity: dm_dmub_sw_init(): Incorrect expression
> 
> Hello!
> 
> This is an experimental semi-automated report about issues detected by
> Coverity from a scan of next-20230111 as part of the linux-next scan project:
> https://scan.coverity.com/projects/linux-next-weekly-scan
> 
> You're getting this email because you were associated with the identified
> lines of code (noted below) that were touched by commits:
> 
>   Tue Jan 10 14:32:57 2023 -0500
> a7ab345149b8 ("drm/amd/display: Load DMUB microcode during early_init")
> 
> Coverity reported the following:
> 
> *** CID 1530544:  Incorrect expression  (IDENTICAL_BRANCHES)
> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c:1951 in
> dm_dmub_sw_init()
> 1945
> 1946  switch (adev->ip_versions[DCE_HWIP][0]) {
> 1947  case IP_VERSION(2, 1, 0):
> 1948  dmub_asic = DMUB_ASIC_DCN21;
> 1949  break;
> 1950  case IP_VERSION(3, 0, 0):
> vvv CID 1530544:  Incorrect expression  (IDENTICAL_BRANCHES)
> vvv The same code is executed regardless of whether "adev-
> >ip_versions[GC_HWIP][0] == 656128U" is true, because the 'then' and 'else'
> branches are identical. Should one of the branches be modified, or the entire 
> 'if'
> statement replaced?
> 1951  if (adev->ip_versions[GC_HWIP][0] == IP_VERSION(10, 3, 
> 0))
> 1952  dmub_asic = DMUB_ASIC_DCN30;
> 1953  else
> 1954  dmub_asic = DMUB_ASIC_DCN30;
> 1955  break;
> 1956  case IP_VERSION(3, 0, 1):
> 
> If this is a false positive, please let us know so we can mark it as
> such, or teach the Coverity rules to be smarter. If not, please make
> sure fixes get into linux-next. :) For patches fixing this, please
> include these lines (but double-check the "Fixes" first):
> 
> Reported-by: coverity-bot 
> Addresses-Coverity-ID: 1530544 ("Incorrect expression")
> Fixes: a7ab345149b8 ("drm/amd/display: Load DMUB microcode during
> early_init")
> 
> Thanks for your attention!
> 
> --
> Coverity-bot

Re: [PATCH] Revert "drm/display/dp_mst: Move all payload info into the atomic state"

2023-01-12 Thread Limonciello, Mario


On 1/12/2023 02:50, Wayne Lin wrote:

This reverts commit 4d07b0bc403403438d9cf88450506240c5faf92f.

[Why]
Changes cause regression on amdgpu mst.
E.g.
In fill_dc_mst_payload_table_from_drm(), amdgpu expects to add/remove payload
one by one and call fill_dc_mst_payload_table_from_drm() to update the HW
maintained payload table. But previous change tries to go through all the
payloads in mst_state and update amdpug hw maintained table in once everytime
driver only tries to add/remove a specific payload stream only. The newly
design idea conflicts with the implementation in amdgpu nowadays.

[How]
Revert this patch first. After addressing all regression problems caused by
this previous patch, will add it back and adjust it.

Signed-off-by: Wayne Lin 
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2171
Cc: sta...@vger.kernel.org # 6.1
Cc: Lyude Paul 
Cc: Harry Wentland 
Cc: Mario Limonciello 
Cc: Ville Syrjälä 
Cc: Ben Skeggs 
Cc: Stanislav Lisovskiy 
Cc: Fangzhi Zuo 
---
  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |  53 +-
  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 106 ++-
  .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  87 ++-
  .../amd/display/include/link_service_types.h  |   3 -
  drivers/gpu/drm/display/drm_dp_mst_topology.c | 724 --
  drivers/gpu/drm/i915/display/intel_dp_mst.c   |  67 +-
  drivers/gpu/drm/i915/display/intel_hdcp.c |  24 +-
  drivers/gpu/drm/nouveau/dispnv50/disp.c   | 167 ++--
  include/drm/display/drm_dp_mst_helper.h   | 177 +++--
  9 files changed, 878 insertions(+), 530 deletions(-)


Hi Wayne,

What branch is this intended to apply against?  I shared that it existed 
to reporters in #2171 and they said they couldn't apply it against 
drm-next (03a0a1040), v6.2-rc3 or v6.1.5.


I guess it's unclear to me where this is supposed to start.
Should we be reverting in drm-fixes, drm-next, or directly to 6.2-rc?

Thanks,



diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 77277d90b6e2..674f5dc1102b 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -6548,7 +6548,6 @@ static int dm_encoder_helper_atomic_check(struct 
drm_encoder *encoder,
const struct drm_display_mode *adjusted_mode = &crtc_state->adjusted_mode;
struct drm_dp_mst_topology_mgr *mst_mgr;
struct drm_dp_mst_port *mst_port;
-   struct drm_dp_mst_topology_state *mst_state;
enum dc_color_depth color_depth;
int clock, bpp = 0;
bool is_y420 = false;
@@ -6562,13 +6561,6 @@ static int dm_encoder_helper_atomic_check(struct 
drm_encoder *encoder,
if (!crtc_state->connectors_changed && !crtc_state->mode_changed)
return 0;
  
-	mst_state = drm_atomic_get_mst_topology_state(state, mst_mgr);

-   if (IS_ERR(mst_state))
-   return PTR_ERR(mst_state);
-
-   if (!mst_state->pbn_div)
-   mst_state->pbn_div = 
dm_mst_get_pbn_divider(aconnector->mst_port->dc_link);
-
if (!state->duplicated) {
int max_bpc = conn_state->max_requested_bpc;
is_y420 = drm_mode_is_420_also(&connector->display_info, adjusted_mode) &&
@@ -6580,10 +6572,11 @@ static int dm_encoder_helper_atomic_check(struct 
drm_encoder *encoder,
clock = adjusted_mode->clock;
dm_new_connector_state->pbn = drm_dp_calc_pbn_mode(clock, bpp, 
false);
}
-
-   dm_new_connector_state->vcpi_slots =
-   drm_dp_atomic_find_time_slots(state, mst_mgr, mst_port,
- dm_new_connector_state->pbn);
+   dm_new_connector_state->vcpi_slots = 
drm_dp_atomic_find_time_slots(state,
+  
mst_mgr,
+  
mst_port,
+  
dm_new_connector_state->pbn,
+  
dm_mst_get_pbn_divider(aconnector->dc_link));
if (dm_new_connector_state->vcpi_slots < 0) {
DRM_DEBUG_ATOMIC("failed finding vcpi slots: %d\n", 
(int)dm_new_connector_state->vcpi_slots);
return dm_new_connector_state->vcpi_slots;
@@ -6654,14 +6647,17 @@ static int dm_update_mst_vcpi_slots_for_dsc(struct 
drm_atomic_state *state,
dm_conn_state->vcpi_slots = slot_num;
  
  			ret = drm_dp_mst_atomic_enable_dsc(state, aconnector->port,

-  dm_conn_state->pbn, 
false);
+  dm_conn_state->pbn, 
0, false);
if (ret < 0)
return ret;
  
  			continue;

}
  
-		vcpi = drm_dp_mst_atomic_enable_dsc(state, aconnector->port,

RE: [PATCH] drm/amd: Only load TA microcode for psp v12_0 once

2023-01-10 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Alex Deucher 
> Sent: Tuesday, January 10, 2023 13:29
> To: Limonciello, Mario 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd: Only load TA microcode for psp v12_0 once
> 
> On Tue, Jan 10, 2023 at 2:16 PM Mario Limonciello
>  wrote:
> >
> > During rebase from patch series accidentally ended up with two calls
> > to load TA microcode for psp v12_0. Only one is needed, so remove the
> > second.
> >
> > Fixes: f1efed401badb ("drm/amd: Parse both v1 and v2 TA microcode
> headers using same function")
> > Signed-off-by: Mario Limonciello 
> 
> Reviewed-by: Alex Deucher 
> 
> I'll squash this into f1efed401badb.

You mean when you send it up for drm-next?  At least for amd-staging-drm-next it
should probably still be its own separate commit though, right?

> 
> Alex
> 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/psp_v12_0.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c
> b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c
> > index e82a0c2bf1faa..fcd708eae75cc 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/psp_v12_0.c
> > @@ -55,9 +55,6 @@ static int psp_v12_0_init_microcode(struct
> psp_context *psp)
> > amdgpu_ucode_ip_version_decode(adev, MP0_HWIP, ucode_prefix,
> sizeof(ucode_prefix));
> >
> > err = psp_init_asd_microcode(psp, ucode_prefix);
> > -   if (err)
> > -   return err;
> > -   err = psp_init_ta_microcode(psp, ucode_prefix);
> > if (err)
> > return err;
> >
> > --
> > 2.25.1
> >

RE: [PATCH v7 20/45] drm/amd: Parse both v1 and v2 TA microcode headers using same function

2023-01-09 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, January 5, 2023 21:27
> To: Limonciello, Mario ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org
> Cc: Javier Martinez Canillas ; Carlos Soriano Sanchez
> ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; David Airlie ; Daniel Vetter
> ; Koenig, Christian ; Pan,
> Xinhui ; David Airlie 
> Subject: Re: [PATCH v7 20/45] drm/amd: Parse both v1 and v2 TA microcode
> headers using same function
> 
> 
> 
> On 1/5/2023 10:31 PM, Mario Limonciello wrote:
> > Several IP versions duplicate code and can't use the common helpers.
> > Move this code into a single function so that the helpers can be used.
> >
> > Signed-off-by: Mario Limonciello 
> > ---
> > v6->v7:
> >   * Drop tags
> >   * Only set adev->psp.securedisplay_context.context on PSPv12 Renoir
> and
> > PSP v10 which matches previous behavior.  If it should match for Cezanne
> > and PSPv11 too we can undo this part of the check.
> > v5->v6:
> >   * Rebase on earlier patches
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 123
> ++--
> >   drivers/gpu/drm/amd/amdgpu/psp_v10_0.c  |  64 +---
> >   drivers/gpu/drm/amd/amdgpu/psp_v11_0.c  |  80 ++-
> >   drivers/gpu/drm/amd/amdgpu/psp_v12_0.c  |  66 ++---
> >   4 files changed, 115 insertions(+), 218 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 7a2fc920739b..bdc2bf87a286 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -3272,41 +3272,76 @@ static int parse_ta_bin_descriptor(struct
> psp_context *psp,
> > return 0;
> >   }
> >
> > -int psp_init_ta_microcode(struct psp_context *psp,
> > - const char *chip_name)
> > +static int parse_ta_v1_microcode(struct psp_context *psp)
> >   {
> > +   const struct ta_firmware_header_v1_0 *ta_hdr;
> > struct amdgpu_device *adev = psp->adev;
> > -   char fw_name[PSP_FW_NAME_LEN];
> > -   const struct ta_firmware_header_v2_0 *ta_hdr;
> > -   int err = 0;
> > -   int ta_index = 0;
> >
> > -   if (!chip_name) {
> > -   dev_err(adev->dev, "invalid chip name for ta microcode\n");
> > +   ta_hdr = (const struct ta_firmware_header_v1_0 *) adev-
> >psp.ta_fw->data;
> > +
> > +   if (le16_to_cpu(ta_hdr->header.header_version_major) != 1)
> > return -EINVAL;
> > -   }
> >
> > -   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_ta.bin",
> chip_name);
> > -   err = request_firmware(&adev->psp.ta_fw, fw_name, adev->dev);
> > -   if (err)
> > -   goto out;
> > +   adev->psp.xgmi_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->xgmi.fw_version);
> > +   adev->psp.xgmi_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->xgmi.size_bytes);
> > +   adev->psp.xgmi_context.context.bin_desc.start_addr =
> > +   (uint8_t *)ta_hdr +
> > +   le32_to_cpu(ta_hdr->header.ucode_array_offset_bytes);
> > +
> > +   adev->psp.ras_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->ras.fw_version);
> > +   adev->psp.ras_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->ras.size_bytes);
> > +   adev->psp.ras_context.context.bin_desc.start_addr =
> > +   (uint8_t *)adev-
> >psp.xgmi_context.context.bin_desc.start_addr +
> > +   le32_to_cpu(ta_hdr->ras.offset_bytes);
> > +
> > +   adev->psp.hdcp_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->hdcp.fw_version);
> > +   adev->psp.hdcp_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->hdcp.size_bytes);
> > +   adev->psp.hdcp_context.context.bin_desc.start_addr =
> > +   (uint8_t *)ta_hdr +
> > +   le32_to_cpu(ta_hdr->header.ucode_array_offset_bytes);
> > +
> > +   adev->psp.dtm_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->dtm.fw_version);
> > +   adev->psp.dtm_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->dtm.size_bytes);
> > +   adev->psp.dtm_context.context.bin_desc.start_addr =
> > +   (uint8_t *)adev-
> >psp.hdcp_context.context.bin_desc.start_addr +
> >

RE: [PATCH v2] drm/amd/amdgpu: Fix an uninitialized variable

2023-01-09 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Monday, January 9, 2023 10:04
> To: SHANMUGAM, SRINIVASAN 
> Cc: Wentland, Harry ; Deucher, Alexander
> ; Koenig, Christian
> ; amd-gfx@lists.freedesktop.org; Limonciello,
> Mario 
> Subject: Re: [PATCH v2] drm/amd/amdgpu: Fix an uninitialized variable
> 
> On Mon, Jan 9, 2023 at 10:58 AM Srinivasan Shanmugam
>  wrote:
> >
> >   CC  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.o
> > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:217:71: error: variable 'i' is
> uninitialized when used here [-Werror,-Wuninitialized]
> > snprintf(fw_name, sizeof(fw_name), "amdgpu/%s%d.bin",
> ucode_prefix, i);
> > 
> > ^
> > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:207:16: note: initialize the
> variable 'i' to silence this warning
> > int err = 0, i;
> >   ^
> >= 0
> >
> > As suggested by Christian, the buggy call
> > "snprintf(fw_name, sizeof(fw_name), "amdgpu/%s%d.bin", ucode_prefix, i);"
> > shouldn't have used "i" in the first place, but rather "instance":
> > for instances greater than 0 we want different SDMA firmware per
> > instance, so we append the instance number to the name.
> >
> > Remove setting err to 0 as well. This is considered very bad coding style.
> >
> > Cc: Christian König 
> > Cc: Mario Limonciello 
> > Cc: Alex Deucher 
> > Signed-off-by: Srinivasan Shanmugam 
> 
> Reviewed-by: Alex Deucher 

Thanks for the fix!

Reviewed-by: Mario Limonciello 

> 
> > Change-Id: I2f1180af4f37bf1efd4d47e7bf64425b0b3809fb
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> > index 0e1e2521fe25a..e9b78739b9ff7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
> > @@ -204,7 +204,7 @@ int amdgpu_sdma_init_microcode(struct
> amdgpu_device *adev,
> >  {
> > struct amdgpu_firmware_info *info = NULL;
> > const struct common_firmware_header *header = NULL;
> > -   int err = 0, i;
> > +   int err, i;
> > const struct sdma_firmware_header_v2_0 *sdma_hdr;
> > uint16_t version_major;
> > char ucode_prefix[30];
> > @@ -214,7 +214,7 @@ int amdgpu_sdma_init_microcode(struct
> amdgpu_device *adev,
> > if (instance == 0)
> > snprintf(fw_name, sizeof(fw_name), "amdgpu/%s.bin",
> ucode_prefix);
> > else
> > -   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s%d.bin",
> ucode_prefix, i);
> > +   snprintf(fw_name, sizeof(fw_name), "amdgpu/%s%d.bin",
> ucode_prefix, instance);
> > err = amdgpu_ucode_request(adev, &adev->sdma.instance[instance].fw, fw_name);
> > if (err)
> > goto out;
> > --
> > 2.25.1
> >
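
The naming rule the fix restores can be shown in isolation. This is a minimal sketch, assuming only what the patch description states: instance 0 uses the bare prefix and later instances get their number appended. The function sdma_fw_name() and the prefix "example_sdma" are made up for illustration; the real code lives in amdgpu_sdma_init_microcode() and takes its prefix from the IP version.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-alone version of the name construction.
 * Returns what snprintf() returns: the length the full name needs.
 */
static int sdma_fw_name(char *buf, size_t len, const char *ucode_prefix,
			unsigned int instance)
{
	if (instance == 0)
		return snprintf(buf, len, "amdgpu/%s.bin", ucode_prefix);

	/* Use the instance argument, not an unrelated loop counter. */
	return snprintf(buf, len, "amdgpu/%s%u.bin", ucode_prefix, instance);
}
```

Called with the made-up prefix "example_sdma", instance 0 yields "amdgpu/example_sdma.bin" and instance 1 yields "amdgpu/example_sdma1.bin".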

RE: [PATCH v6 20/45] drm/amd: Parse both v1 and v2 TA microcode headers using same function

2023-01-05 Thread Limonciello, Mario

[AMD Official Use Only - General]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Thursday, January 5, 2023 07:22
> To: Limonciello, Mario ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org
> Cc: Javier Martinez Canillas ; Carlos Soriano Sanchez
> ; amd-gfx@lists.freedesktop.org; dri-
> de...@lists.freedesktop.org; David Airlie ; Daniel Vetter
> ; Koenig, Christian ; Pan,
> Xinhui 
> Subject: Re: [PATCH v6 20/45] drm/amd: Parse both v1 and v2 TA microcode
> headers using same function
> 
> 
> 
> On 1/5/2023 9:12 AM, Mario Limonciello wrote:
> > Several IP versions duplicate code and can't use the common helpers.
> > Move this code into a single function so that the helpers can be used.
> >
> > Reviewed-by: Alex Deucher 
> > Signed-off-by: Mario Limonciello 
> > ---
> > v5->v6:
> >   * Rebase on earlier patches
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 120
> ++--
> >   drivers/gpu/drm/amd/amdgpu/psp_v10_0.c  |  64 +
> >   drivers/gpu/drm/amd/amdgpu/psp_v11_0.c  |  77 ++-
> >   drivers/gpu/drm/amd/amdgpu/psp_v12_0.c  |  62 +---
> >   4 files changed, 109 insertions(+), 214 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 7a2fc920739b..d971e3785eaf 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -3272,41 +3272,75 @@ static int parse_ta_bin_descriptor(struct
> psp_context *psp,
> > return 0;
> >   }
> >
> > -int psp_init_ta_microcode(struct psp_context *psp,
> > - const char *chip_name)
> > +static int parse_ta_v1_microcode(struct psp_context *psp)
> >   {
> > +   const struct ta_firmware_header_v1_0 *ta_hdr;
> > struct amdgpu_device *adev = psp->adev;
> > -   char fw_name[PSP_FW_NAME_LEN];
> > -   const struct ta_firmware_header_v2_0 *ta_hdr;
> > -   int err = 0;
> > -   int ta_index = 0;
> >
> > -   if (!chip_name) {
> > -   dev_err(adev->dev, "invalid chip name for ta microcode\n");
> > +   ta_hdr = (const struct ta_firmware_header_v1_0 *)
> > +adev->psp.ta_fw->data;
> > +
> > +   if (le16_to_cpu(ta_hdr->header.header_version_major) != 1)
> > return -EINVAL;
> > +
> > +   adev->psp.xgmi_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->xgmi.fw_version);
> > +   adev->psp.xgmi_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->xgmi.size_bytes);
> > +   adev->psp.xgmi_context.context.bin_desc.start_addr =
> > +   (uint8_t *)ta_hdr +
> > +   le32_to_cpu(ta_hdr->header.ucode_array_offset_bytes);
> > +   adev->psp.ta_fw_version = le32_to_cpu(ta_hdr->header.ucode_version);
> > +   adev->psp.ras_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->ras.fw_version);
> > +   adev->psp.ras_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->ras.size_bytes);
> > +   adev->psp.ras_context.context.bin_desc.start_addr =
> > +   (uint8_t *)adev->psp.xgmi_context.context.bin_desc.start_addr +
> > +   le32_to_cpu(ta_hdr->ras.offset_bytes);
> > +   adev->psp.hdcp_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->hdcp.fw_version);
> > +   adev->psp.hdcp_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->hdcp.size_bytes);
> > +   adev->psp.hdcp_context.context.bin_desc.start_addr =
> > +   (uint8_t *)ta_hdr +
> > +   le32_to_cpu(ta_hdr->header.ucode_array_offset_bytes);
> > +   adev->psp.ta_fw_version = le32_to_cpu(ta_hdr->header.ucode_version);
> > +   adev->psp.dtm_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->dtm.fw_version);
> > +   adev->psp.dtm_context.context.bin_desc.size_bytes =
> > +   le32_to_cpu(ta_hdr->dtm.size_bytes);
> > +   adev->psp.dtm_context.context.bin_desc.start_addr =
> > +   (uint8_t *)adev->psp.hdcp_context.context.bin_desc.start_addr +
> > +   le32_to_cpu(ta_hdr->dtm.offset_bytes);
> > +   if (adev->apu_flags & AMD_APU_IS_RENOIR) {
> > +   adev-
> >psp.securedisplay_context.context.bin_desc.fw_version =
> > +   le32_to_cpu(ta_hdr->securedisplay.fw_version);
> > +   adev-
> >psp.secu

RE: [PATCH v5 10/45] drm/amd: Load VCN microcode during early_init

2023-01-04 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Alex Deucher 
> Sent: Wednesday, January 4, 2023 11:16
> To: Limonciello, Mario 
> Cc: Deucher, Alexander ; linux-
> ker...@vger.kernel.org; Pan, Xinhui ; Lazar, Lijo
> ; Javier Martinez Canillas ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Carlos Soriano
> Sanchez ; Koenig, Christian
> 
> Subject: Re: [PATCH v5 10/45] drm/amd: Load VCN microcode during
> early_init
> 
> On Wed, Jan 4, 2023 at 11:42 AM Mario Limonciello
>  wrote:
> >
> > Simplifies the code so that all VCN versions will get the firmware
> > name from `amdgpu_ucode_ip_version_decode` and then use this filename
> > to load microcode as part of the early_init process.
> >
> > Signed-off-by: Mario Limonciello 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 91 +-
> ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h |  1 +
> >  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c   |  5 +-
> >  drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c   |  5 +-
> >  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c   |  5 +-
> >  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c   |  5 +-
> >  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c   |  5 +-
> >  7 files changed, 50 insertions(+), 67 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > index b5692f825589..55bbe4c8ff5b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c
> > @@ -36,25 +36,25 @@
> >  #include "soc15d.h"
> >
> >  /* Firmware Names */
> > -#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
> > -#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
> > -#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
> > -#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
> > -#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
> > -#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
> > -#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
> > -#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
> > -#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
> > -#define FIRMWARE_SIENNA_CICHLID
> "amdgpu/sienna_cichlid_vcn.bin"
> > -#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
> > -#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
> > -#define FIRMWARE_DIMGREY_CAVEFISH
> "amdgpu/dimgrey_cavefish_vcn.bin"
> > -#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
> > -#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
> > -#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
> > -#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
> > -#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
> > -#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
> > +#define FIRMWARE_RAVEN "amdgpu/raven_vcn.bin"
> > +#define FIRMWARE_PICASSO   "amdgpu/picasso_vcn.bin"
> > +#define FIRMWARE_RAVEN2"amdgpu/raven2_vcn.bin"
> > +#define FIRMWARE_ARCTURUS  "amdgpu/arcturus_vcn.bin"
> > +#define FIRMWARE_RENOIR"amdgpu/renoir_vcn.bin"
> > +#define FIRMWARE_GREEN_SARDINE "amdgpu/green_sardine_vcn.bin"
> > +#define FIRMWARE_NAVI10"amdgpu/navi10_vcn.bin"
> > +#define FIRMWARE_NAVI14"amdgpu/navi14_vcn.bin"
> > +#define FIRMWARE_NAVI12"amdgpu/navi12_vcn.bin"
> > +#define FIRMWARE_SIENNA_CICHLID
> "amdgpu/sienna_cichlid_vcn.bin"
> > +#define FIRMWARE_NAVY_FLOUNDER "amdgpu/navy_flounder_vcn.bin"
> > +#define FIRMWARE_VANGOGH   "amdgpu/vangogh_vcn.bin"
> > +#define FIRMWARE_DIMGREY_CAVEFISH
> "amdgpu/dimgrey_cavefish_vcn.bin"
> > +#define FIRMWARE_ALDEBARAN "amdgpu/aldebaran_vcn.bin"
> > +#define FIRMWARE_BEIGE_GOBY"amdgpu/beige_goby_vcn.bin"
> > +#define FIRMWARE_YELLOW_CARP   "amdgpu/yellow_carp_vcn.bin"
> > +#define FIRMWARE_VCN_3_1_2 "amdgpu/vcn_3_1_2.bin"
> > +#define FIRMWARE_VCN4_0_0  "amdgpu/vcn_4_0_0.bin"
> > +#define FIRMWARE_VCN4_0_2  "amdgpu/vcn_4_0_2.bin"
> >  #define FIRMWARE_VCN4_0_4  "amdgpu/vcn_4_0_4.bin"
> 
> Is this just a whitespace change?

Ah yeah; it was from various rebases mo
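
For readers following the thread: the commit message quoted above describes deriving the firmware filename from a decoded IP-version prefix instead of a per-ASIC define. A minimal standalone sketch of that idea follows. It assumes a hypothetical `build_fw_name` helper and an "amdgpu/<prefix>.bin" layout inferred from the defines in the diff; it is not the code from the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, for illustration only: formats a firmware path
 * from a prefix such as "vcn_4_0_0". Returns 0 on success, or -1 if
 * the result was truncated or formatting failed.
 */
static int build_fw_name(char *buf, size_t len, const char *ucode_prefix)
{
	int n = snprintf(buf, len, "amdgpu/%s.bin", ucode_prefix);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Checking the return value matters here because a silently truncated name would request the wrong file.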

RE: [PATCH v4 27/27] drm/amd: Optimize SRIOV switch/case for PSP microcode load

2023-01-04 Thread Limonciello, Mario

[Public]



> -Original Message-
> From: Christian König 
> Sent: Wednesday, January 4, 2023 07:18
> To: Limonciello, Mario ; Deucher, Alexander
> ; linux-ker...@vger.kernel.org
> Cc: Pan, Xinhui ; Lazar, Lijo ;
> Javier Martinez Canillas ; dri-
> de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org; Daniel Vetter
> ; Carlos Soriano Sanchez ; David
> Airlie ; Koenig, Christian 
> Subject: Re: [PATCH v4 27/27] drm/amd: Optimize SRIOV switch/case for PSP
> microcode load
> 
> Am 03.01.23 um 23:18 schrieb Mario Limonciello:
> > Now that IP version decoding is used, a number of case statements
> > can be combined.
> >
> > Signed-off-by: Mario Limonciello 
> 
> This patch can probably be pushed as small cleanup independent of the
> previous patches.
> 
> In general I usually suggest pushing those separately so that the patch
> set concentrates on the real changes at hand.
> 
> Anyway this patch here is Reviewed-by: Christian König
> 
> 

Thanks!
This optimization is only possible because of earlier changes in the series.
Will add your tag for v5.

> Regards,
> Christian.
> 
> > ---
> > v3->v4:
> >   * New patch
> >
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 +---
> >   1 file changed, 1 insertion(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index f45362dd8228..83e253b5d928 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -132,14 +132,8 @@ static int psp_init_sriov_microcode(struct
> psp_context *psp)
> >
> > switch (adev->ip_versions[MP0_HWIP][0]) {
> > case IP_VERSION(9, 0, 0):
> > -   adev->virt.autoload_ucode_id =
> AMDGPU_UCODE_ID_CP_MEC2;
> > -   ret = psp_init_cap_microcode(psp, ucode_prefix);
> > -   break;
> > -   case IP_VERSION(11, 0, 9):
> > -   adev->virt.autoload_ucode_id =
> AMDGPU_UCODE_ID_CP_MEC2;
> > -   ret = psp_init_cap_microcode(psp, ucode_prefix);
> > -   break;
> > case IP_VERSION(11, 0, 7):
> > +   case IP_VERSION(11, 0, 9):
> > adev->virt.autoload_ucode_id =
> AMDGPU_UCODE_ID_CP_MEC2;
> > ret = psp_init_cap_microcode(psp, ucode_prefix);
> > break;

Re: drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0

2022-12-19 Thread Limonciello, Mario


On 12/19/2022 06:12, Tim Huang wrote:

MES is part of gfxoff for S0i3 and does not require self-test after S0i3.
Besides, the self-test frees the BO, which triggers a warning while in
the suspend state.

[   81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[   81.679435] Call Trace:
[   81.679726]  <TASK>
[   81.679981]  amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[   81.680857]  amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[   81.681665]  mes_v11_0_late_init+0x37/0x50 [amdgpu]
[   81.682423]  amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[   81.683257]  amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[   81.684043]  amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[   81.684818]  pci_pm_resume+0x5c/0xa0
[   81.685247]  ? pci_pm_thaw+0x90/0x90
[   81.685658]  dpm_run_callback+0x4e/0x160
[   81.686110]  device_resume+0xad/0x210
[   81.686529]  async_resume+0x1e/0x40
[   81.686931]  async_run_entry_fn+0x33/0x120
[   81.687405]  process_one_work+0x21d/0x3f0
[   81.687869]  worker_thread+0x4a/0x3c0
[   81.688293]  ? process_one_work+0x3f0/0x3f0
[   81.688777]  kthread+0xff/0x130
[   81.689157]  ? kthread_complete_and_exit+0x20/0x20
[   81.689707]  ret_from_fork+0x22/0x30
[   81.690118]  </TASK>
[   81.690380] ---[ end trace  ]---


Is this still needed with https://patchwork.freedesktop.org/patch/515278/ ?



Signed-off-by: Tim Huang 
---
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 5459366f49ff..80e8cf826e71 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1342,7 +1342,7 @@ static int mes_v11_0_late_init(void *handle)
  {
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
  
-	if (!amdgpu_in_reset(adev) &&

+   if (!amdgpu_in_reset(adev) && !adev->in_suspend &&


I think in this case you should be using adev->in_s0ix instead.


(adev->ip_versions[GC_HWIP][0] != IP_VERSION(11, 0, 3)))
amdgpu_mes_self_test(adev);
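
For illustration only, the review comment above would change the guard roughly as sketched below. This is written under assumptions: `struct mock_adev` and `should_run_mes_self_test` are hypothetical stand-ins, the `IP_VERSION` macro uses an assumed bit layout, and only the idea of testing `in_s0ix` rather than `in_suspend` comes from the thread. It is not the patch that was eventually applied.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in macro with an assumed bit layout. */
#define IP_VERSION(mj, mn, rv) (((mj) << 16) | ((mn) << 8) | (rv))

/* Hypothetical, reduced stand-in for struct amdgpu_device. */
struct mock_adev {
	bool in_reset;       /* models the result of amdgpu_in_reset(adev) */
	bool in_s0ix;        /* models adev->in_s0ix during an S0i3 cycle */
	uint32_t gc_version; /* models adev->ip_versions[GC_HWIP][0] */
};

/* Hypothetical predicate mirroring the condition in the hunk above. */
static bool should_run_mes_self_test(const struct mock_adev *adev)
{
	return !adev->in_reset && !adev->in_s0ix &&
	       adev->gc_version != IP_VERSION(11, 0, 3);
}
```

Keying the check on the S0i3 flag would skip the self-test only for that resume path, leaving other suspend types unchanged.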
