Hi Mario,

Nick Hastings reported in Debian (https://bugs.debian.org/1036530)
that his system locks up after updating from a 6.0-based kernel to
6.1.y.

#regzbot ^introduced 24867516f06d

He bisected the issue and tracked it down to:

On Sun, May 28, 2023 at 10:14:51AM +0900, Nick Hastings wrote:
> Control: tags -1 - moreinfo
> 
> Hi,
> 
> I repeated the git bisect, and the bad commit seems to be:
> 
> (git)-[v6.1-rc1~206^2~4^5~3|bisect] % git bisect bad
> 24867516f06dabedef3be7eea0ef0846b91538bc is the first bad commit
> commit 24867516f06dabedef3be7eea0ef0846b91538bc
> Author: Mario Limonciello <mario.limoncie...@amd.com>
> Date:   Tue Aug 23 13:51:31 2022 -0500
> 
>     ACPI: OSI: Remove Linux-Dell-Video _OSI string
>     
>     This string was introduced because drivers for NVIDIA hardware
>     had bugs supporting RTD3 in the past.
>     
>     Before proprietary NVIDIA driver started to support RTD3, Ubuntu had
>     had a mechanism for switching PRIME on and off, though it had required
>     to logout/login to make the library switch happen.
>     
>     When the PRIME had been off, the mechanism had unloaded the NVIDIA
>     driver and put the device into D3cold, but the GPU had never come back
>     to D0 again which is why ODMs used the _OSI to expose an old _DSM
>     method to switch the power on/off.
>     
>     That has been fixed by commit 5775b843a619 ("PCI: Restore config space
>     on runtime resume despite being unbound"). so vendors shouldn't be
>     using this string to modify ASL any more.
>     
>     Reviewed-by: Lyude Paul <ly...@redhat.com>
>     Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
>     Signed-off-by: Rafael J. Wysocki <rafael.j.wyso...@intel.com>
> 
>  drivers/acpi/osi.c | 9 ---------
>  1 file changed, 9 deletions(-)
> 
> This machine is a Dell with an nvidia chip so it looks like this really
> could be the commit that is causing the problems. The description
> of the commit also seems (to my untrained eye) to be consistent with the
> error reported on the console when the lockup occurs:
> 
> [   58.729863] ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
> [   58.729904] ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20220331/psparse-529)
> [   60.083261] vfio-pci 0000:01:00.0 Unable to change power state from D3cold to D0, device inaccessible
> 
> Hopefully this is enough information for experts to resolve this.

Does this ring a bell for you? Do you need any further information
from Nick?

Regards,
Salvatore
