Fw: Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-15 Thread Roger Ramjet
I've attached a pic of Update Manager showing Kernels installed.



--- Forwarded Message ---
From: Roger Ramjet 
Date: On Monday, July 15th, 2024 at 8:42 AM
Subject: Fw: Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference 
when IOMMU enabled, leading to black screen
To: Bug 2068738 <2068...@bugs.launchpad.net>


> When I view the kernels in the Update Manager, it shows 5.15.0-116 as "installed"
> and "supported until April 2027".
> 
> The kernel is loaded and installed, but it is reported as "not found".
> Then another window opens and I must choose "Boot from next volume".
> Then I'm given the choice of booting from 5.15.0-107-generic (on /dev/sda5).
> 
> If I can give you more info, let me know.
> 
> Thanks, Ralph Goe

Fw: Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-15 Thread Roger Ramjet
When I view the kernels in the Update Manager, it shows 5.15.0-116 as
"installed" and "supported until April 2027".

The kernel is loaded and installed, but it is reported as "not found".
Then another window opens and I must choose "Boot from next volume".
Then I'm given the choice of booting from 5.15.0-107-generic (on /dev/sda5).

If I can give you more info, let me know.

Thanks, Ralph Goe




--- Forwarded Message ---
From: Roger Ramjet 
Date: On Monday, July 15th, 2024 at 8:33 AM
Subject: Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when 
IOMMU enabled, leading to black screen
To: Bug 2068738 <2068...@bugs.launchpad.net>


> Unfortunately, I still have the same problem. After updating, I power down
> and restart, and I get:
> error: file `/boot/' not found.
> 
> 
> Not sure what to do now.
> 
> Thanks, Ralph Goe

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-15 Thread Roger Ramjet
Unfortunately, I still have the same problem. After updating, I power down and
restart, and I get:
error: file `/boot/' not found.


Not sure what to do now.

Thanks, Ralph Goe




On Monday, July 15th, 2024 at 4:22 AM, Launchpad Bug Tracker
<2068...@bugs.launchpad.net> wrote:

> This bug was fixed in the package linux - 5.15.0-116.126
> 
> ---
> linux (5.15.0-116.126) jammy; urgency=medium
> 
> * jammy/linux: 5.15.0-116.126 -proposed tracker (LP: #2071603)
> 
> * idxd: NULL pointer dereference reading wq op_config attribute (LP: #2069081)
> - SAUCE: dmaengine: idxd: set is_visible member of idxd_wq_attribute_group
> 
> * AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to
> black screen (LP: #2068738)
> - SAUCE: Revert "drm/amdgpu: init iommu after amdkfd device init"
> 
> linux (5.15.0-115.125) jammy; urgency=medium
> 
> * jammy/linux: 5.15.0-115.125 -proposed tracker (LP: #2068396)
> 
> * Packaging resync (LP: #1786013)
> - [Packaging] debian.master/dkms-versions -- update from kernel-versions
> (main/2024.06.10)
> 
> * Jammy update: v5.15.158 upstream stable release (LP: #2067974)
> - smb: client: fix rename(2) regression against samba
> - cifs: reinstate original behavior again for forceuid/forcegid
> - HID: intel-ish-hid: ipc: Fix dev_err usage with uninitialized dev->devc
> - HID: logitech-dj: allow mice to use all types of reports
> - arm64: dts: rockchip: enable internal pull-up on Q7_USB_ID for RK3399 Puma
> - arm64: dts: rockchip: fix alphabetical ordering RK3399 puma
> - arm64: dts: rockchip: enable internal pull-up on PCIE_WAKE# for RK3399 Puma
> - arm64: dts: rockchip: Remove unsupported node from the Pinebook Pro dts
> - arm64: dts: mediatek: mt8183: Add power-domains properity to mfgcfg
> - arm64: dts: mediatek: mt7622: add support for coherent DMA
> - arm64: dts: mediatek: mt7622: introduce nodes for Wireless Ethernet Dispatch
> - arm64: dts: mediatek: mt7622: fix clock controllers
> - arm64: dts: mediatek: mt7622: fix IR nodename
> - arm64: dts: mediatek: mt7622: fix ethernet controller "compatible"
> - arm64: dts: mediatek: mt7622: drop "reset-names" from thermal block
> - arm64: dts: mediatek: mt2712: fix validation errors
> - ARC: [plat-hsdk]: Remove misplaced interrupt-cells property
> - wifi: iwlwifi: mvm: remove old PASN station when adding a new one
> - wifi: iwlwifi: mvm: return uid from iwl_mvm_build_scan_cmd
> - vxlan: drop packets from invalid src-address
> - mlxsw: core: Unregister EMAD trap using FORWARD action
> - icmp: prevent possible NULL dereferences from icmp_build_probe()
> - bridge/br_netlink.c: no need to return void function
> - NFC: trf7970a: disable all regulators on removal
> - ipv4: check for NULL idev in ip_route_use_hint()
> - net: usb: ax88179_178a: stop lying about skb->truesize
> - net: gtp: Fix Use-After-Free in gtp_dellink
> - ipvs: Fix checksumming on GSO of SCTP packets
> - net: openvswitch: Fix Use-After-Free in ovs_ct_exit
> - mlxsw: spectrum_acl_tcam: Fix race during rehash delayed work
> - mlxsw: spectrum_acl_tcam: Fix possible use-after-free during activity update
> - mlxsw: spectrum_acl_tcam: Fix possible use-after-free during rehash
> - mlxsw: spectrum_acl_tcam: Rate limit error message
> - mlxsw: spectrum_acl_tcam: Fix memory leak during rehash
> - mlxsw: spectrum_acl_tcam: Fix warning during rehash
> - mlxsw: spectrum_acl_tcam: Fix incorrect list API usage
> - mlxsw: spectrum_acl_tcam: Fix memory leak when canceling rehash work
> - netfilter: nf_tables: honor table dormant flag from netdev release event
> path
> - i40e: Do not use WQ_MEM_RECLAIM flag for workqueue
> - i40e: Report MFS in decimal base instead of hex
> - iavf: Fix TC config comparison with existing adapter TC config
> - net: ethernet: ti: am65-cpts: Fix PTPv1 message type on TX packets
> - af_unix: Suppress false-positive lockdep splat for spin_lock() in
> __unix_gc().
> - serial: core: Provide port lock wrappers
> - serial: mxs-auart: add spinlock around changing cts state
> - drm-print: add drm_dbg_driver to improve namespace symmetry
> - drm/vmwgfx: Fix crtc's atomic check conditional
> - Revert "crypto: api - Disallow identical driver names"
> - net/mlx5e: Fix a race in command alloc flow
> - tracing: Show size of requested perf buffer
> - tracing: Increase PERF_MAX_TRACE_SIZE to handle Sentinel1 and docker
> together
> - x86/cpu: Fix check for RDPKRU in __show_regs()
> - Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()
> - Bluetooth: btusb: Add Realtek RTL8852BE support ID 0x0bda:0x4853
> - Bluetooth: qca: fix NULL-deref on non-serdev suspend
> - mmc: sdhci-msm: pervent access to suspended controller
> - btrfs: fix information leak in btrfs_ioctl_logical_to_ino()
> - cpu: Re-enable CPU mitigations by default for !X86 architectures
> - [Configs] Update CPU mitigation configs
> - arm64: dts: rockchip: enable internal pull-up for Q7_THRM# on RK3399 Puma
> - drm/amdgpu/sdma5.2: use legacy 

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-12 Thread Pete Orlando
Still waiting for update, July 12 2024.


On Wed, Jul 10, 2024 at 4:55 AM Michael Leetz <2068...@bugs.launchpad.net>
wrote:

> Thank you for your effort.
>
> When will the update be available officially?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2068738
>
> Title:
>   AMD GPUs fail with null pointer dereference when IOMMU enabled,
>   leading to black screen
>
> Status in linux package in Ubuntu:
>   Fix Released
> Status in linux source package in Jammy:
>   Fix Committed
>
> Bug description:
>   BugLink: https://bugs.launchpad.net/bugs/2068738
>
>   [Impact]
>
>   On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
>   enabled, the system fails to boot correctly, and all users see is a
>   black screen.
>
>   This is caused by a null pointer dereference when enabling the IOMMU
>   after the device has been initialised. It should happen the other way
>   around.
>
>   AMD-Vi: AMD IOMMUv2 loaded and initialized
>   ...
>   amdgpu: Topology: Add APU node [0x15d8:0x1002]
>   kfd kfd: amdgpu: added device 1002:15d8
>   kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
>   ...
>   amdgpu :06:00.0: amdgpu: amdgpu_device_ip_init failed
>   amdgpu :06:00.0: amdgpu: Fatal error during GPU init
>   amdgpu :06:00.0: amdgpu: amdgpu: finishing device.
>   ...
>   BUG: kernel NULL pointer dereference, address: 013c
>   ...
>   CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
>   ...
>   RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
>   ...
>   Call Trace:
>
>    ? srso_return_thunk+0x5/0x10
>    ? show_trace_log_lvl+0x28e/0x2ea
>    ? show_trace_log_lvl+0x28e/0x2ea
>    ? dm_hw_fini+0x23/0x30 [amdgpu]
>    ? show_regs.part.0+0x23/0x29
>    ? __die_body.cold+0x8/0xd
>    ? __die+0x2b/0x37
>    ? page_fault_oops+0x13b/0x170
>    ? srso_return_thunk+0x5/0x10
>    ? do_user_addr_fault+0x321/0x670
>    ? srso_return_thunk+0x5/0x10
>    ? __free_pages_ok+0x34a/0x4f0
>    ? exc_page_fault+0x77/0x170
>    ? asm_exc_page_fault+0x27/0x30
>    ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
>    dm_hw_fini+0x23/0x30 [amdgpu]
>    amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
>    amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
>    amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
>    amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
>    amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
>    local_pci_probe+0x4b/0x90
>    ? srso_return_thunk+0x5/0x10
>    pci_device_probe+0x119/0x200
>    really_probe+0x222/0x420
>    __driver_probe_device+0xe8/0x140
>    driver_probe_device+0x23/0xc0
>    __driver_attach+0xf7/0x1f0
>    ? __device_attach_driver+0x140/0x140
>    bus_for_each_dev+0x7f/0xd0
>    driver_attach+0x1e/0x30
>    bus_add_driver+0x148/0x220
>    ? srso_return_thunk+0x5/0x10
>    driver_register+0x95/0x100
>    __pci_register_driver+0x68/0x70
>    amdgpu_init+0x7c/0x1000 [amdgpu]
>    ? 0xc0e0b000
>    do_one_initcall+0x49/0x1e0
>    ? srso_return_thunk+0x5/0x10
>    ? kmem_cache_alloc_trace+0x19e/0x2e0
>    do_init_module+0x52/0x260
>    load_module+0xb45/0xbe0
>    __do_sys_finit_module+0xbf/0x120
>    __x64_sys_finit_module+0x18/0x20
>    x64_sys_call+0x1ac3/0x1fa0
>    do_syscall_64+0x56/0xb0
>   ...
>    entry_SYSCALL_64_after_hwframe+0x67/0xd1
>
>   A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
>   to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub, and reboot.
>
>   [Fix]
>
>   The regression was caused by the following commit that landed in
>   5.15.0-112-generic, and 5.15.150 upstream:
>
>   commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
>   Author: Yifan Zhang 
>   Date: Tue Sep 28 15:42:35 2021 +0800
>   Subject: drm/amdgpu: init iommu after amdkfd device init
>   Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
>
>   The fix is to revert this patch, as it was not supposed to be
>   backported to 5.15 stable.
>
>   The mailing list discussion with AMD developers is:
>
>   https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
>
>   The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so it
>   is being sent as an Ubuntu SAUCE patch. If the upstream status changes,
>   we can NAK and resend.
>
>   [Testcase]
>
>   You need a system with an AMD Picasso/Raven 2 device. It will likely
>   be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
>   2 device is affected.
>
>   Install the kernel and boot. Make sure full modesetting is enabled.
>
>   There is a test kernel available in the ppa below:
>
>   https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
>
>   If you install the test kernel, your system should boot successfully.
>
>   [Where problems could occur]
>
>   We are reverting a problematic patch and going back to how it was
>   before 5.15.0-112-generic. This should not cause any issues for users.
>
>  
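
The workaround quoted in the bug description above (adding "amd_iommu=off" or "nomodeset" to GRUB_CMDLINE_LINUX_DEFAULT) could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name add_kernel_param is made up for illustration and is not part of the bug report, it assumes GNU sed and the usual /etc/default/grub layout with a double-quoted value, and it only handles simple parameters that contain no "/" or regular-expression characters.

```shell
#!/bin/sh
# add_kernel_param PARAM [FILE]
# Hypothetical helper (not from the bug report): appends PARAM to the
# GRUB_CMDLINE_LINUX_DEFAULT="..." line of FILE (default: /etc/default/grub).
add_kernel_param() {
    param=$1
    file=${2:-/etc/default/grub}

    # Skip the edit if the parameter is already on that line.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
        return 0
    fi

    # Insert " PARAM" just before the closing double quote (GNU sed -i).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}

# Typical use, as root, followed by the steps named in the bug report:
#   add_kernel_param amd_iommu=off
#   update-grub
#   reboot
```

Taking the file as an optional second argument makes it possible to try the edit on a copy of /etc/default/grub first and inspect the result before touching the real file.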

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-04 Thread Pete Orlando
The problem still exists after following your instructions. I am still only
able to boot by booting into the 107 kernel.

On Thu, Jul 4, 2024 at 3:20 PM Matthew Ruffell <2068...@bugs.launchpad.net>
wrote:

> Thanks for testing Matt, H.A.
>
> I marked the bug as verified. We should be all good for a release to
> -updates early next week. I'll write a new message as soon as the kernel
> has been released for everyone.
>
> Thanks,
> Matthew
>
> ** Tags removed: verification-needed-jammy-linux
> ** Tags added: verification-done-jammy-linux

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-04 Thread Blake Johal
-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2068738

Title:
  AMD GPUs fail with null pointer dereference when IOMMU enabled,
  leading to black screen

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738/+subscriptions


-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-04 Thread H A Killenbeck
I followed your instructions, and the kernel update seems to work fine on
my system.

On 7/3/24 11:50 PM, Matthew Ruffell wrote:
> Hi everyone,
>
> The Kernel Team have respun the latest 5.15 kernel with the fix, and have
> placed it into -proposed for verification.
>
> Could someone more technically minded help test it and let me know if it fixes
> the problem?
>
> Instructions to Install (On a jammy system):
> 1) cat << EOF | sudo tee /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
> # Enable Ubuntu proposed archive
> deb http://archive.ubuntu.com/ubuntu/ $(lsb_release -cs)-proposed main universe
> EOF
> 2) sudo apt update
> 3) sudo apt install linux-image-5.15.0-116-generic linux-modules-5.15.0-116-generic linux-modules-extra-5.15.0-116-generic linux-headers-5.15.0-116-generic
> 4) sudo rm /etc/apt/sources.list.d/ubuntu-$(lsb_release -cs)-proposed.list
> 5) sudo apt update
> 6) sudo reboot
> 7) uname -rv
> 5.15.0-116-generic #126-Ubuntu SMP Mon Jul 1 10:14:24 UTC 2024
>
> Can you let me know if you can boot to your desktop and have a working
> screen?
>
> We are still on track for a release to -updates next week sometime.
>
> Thanks,
> Matthew
>


Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-03 Thread Blake Johal

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-07-01 Thread Pete Orlando
Still no fix as of July 1, 2024. Have to boot manually with the 107 to get
my Acer laptop up and running. Hope someone can fix this problem soon.
Thanks for all the hard work on this issue.

On Mon, Jul 1, 2024 at 8:10 AM Daniel <2068...@bugs.launchpad.net>
wrote:

> Did I understand correctly that this bug will be fixed after kernel
> 5.15.0-113?

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-06-26 Thread Roger Ramjet
I just installed Linux Mint kernel 5.15.0-113 and it is not recognized at
all. I'm told to load a kernel first and press any key, and then I have to
go back to the list of Linux kernels and choose 5.15.0-107 to boot.

Thanks, Ralph

On Tuesday, June 18th, 2024 at 7:12 AM, Roger Ramjet 
wrote:

> Thank you Matthew.
> 
> On Tuesday, June 18th, 2024 at 2:41 AM, Matthew Ruffell 
> 2068...@bugs.launchpad.net wrote:
> 
> > Hi everyone,
> > 
> > > I spoke with Stefan Bader of the Kernel Team again. They are planning to
> > > include the patch in a respin of the current 2024.06.10 SRU cycle.
> > > https://kernel.ubuntu.com/
> > > 
> > > I will let you know once the kernel has been built and placed into
> > > -proposed for verification.
> > > 
> > > But at this stage, as long as the respin occurs, this should be released the
> > > week of 8th July, give or take a few days, if anything else pops up.
> > 
> > Thanks,
> > Matthew
> > 
> > --
> > You received this bug notification because you are subscribed to a
> > duplicate bug report (2069485).
> > https://bugs.launchpad.net/bugs/2068738
> > 
> > Title:
> > AMD GPUs fail with null pointer dereference when IOMMU enabled,
> > leading to black screen
> > 
> > Status in linux package in Ubuntu:
> > Fix Released
> > Status in linux source package in Jammy:
> > Fix Committed
> > 
> > Bug description:
> > BugLink: https://bugs.launchpad.net/bugs/2068738
> > 
> > [Impact]
> > 
> > On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> > enabled, the system fails to boot correctly, and all users see is a
> > black screen.
> > 
> > This is caused by a null pointer dereference when enabling the IOMMU
> > after the device has been initialised. It should happen the other way
> > around.
> > 
> > AMD-Vi: AMD IOMMUv2 loaded and initialized
> > ...
> > amdgpu: Topology: Add APU node [0x15d8:0x1002]
> > kfd kfd: amdgpu: added device 1002:15d8
> > kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> > ...
> > amdgpu :06:00.0: amdgpu: amdgpu_device_ip_init failed
> > amdgpu :06:00.0: amdgpu: Fatal error during GPU init
> > amdgpu :06:00.0: amdgpu: amdgpu: finishing device.
> > ...
> > BUG: kernel NULL pointer dereference, address: 013c
> > ...
> > > CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
> > ...
> > RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > ...
> > Call Trace:
> > 
> > 
> > ? srso_return_thunk+0x5/0x10
> > ? show_trace_log_lvl+0x28e/0x2ea
> > ? show_trace_log_lvl+0x28e/0x2ea
> > ? dm_hw_fini+0x23/0x30 [amdgpu]
> > ? show_regs.part.0+0x23/0x29
> > ? __die_body.cold+0x8/0xd
> > ? __die+0x2b/0x37
> > ? page_fault_oops+0x13b/0x170
> > ? srso_return_thunk+0x5/0x10
> > ? do_user_addr_fault+0x321/0x670
> > ? srso_return_thunk+0x5/0x10
> > ? __free_pages_ok+0x34a/0x4f0
> > ? exc_page_fault+0x77/0x170
> > ? asm_exc_page_fault+0x27/0x30
> > ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> > dm_hw_fini+0x23/0x30 [amdgpu]
> > amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> > amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> > amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> > amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> > amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> > local_pci_probe+0x4b/0x90
> > ? srso_return_thunk+0x5/0x10
> > pci_device_probe+0x119/0x200
> > really_probe+0x222/0x420
> > __driver_probe_device+0xe8/0x140
> > driver_probe_device+0x23/0xc0
> > __driver_attach+0xf7/0x1f0
> > ? __device_attach_driver+0x140/0x140
> > bus_for_each_dev+0x7f/0xd0
> > driver_attach+0x1e/0x30
> > bus_add_driver+0x148/0x220
> > ? srso_return_thunk+0x5/0x10
> > driver_register+0x95/0x100
> > __pci_register_driver+0x68/0x70
> > amdgpu_init+0x7c/0x1000 [amdgpu]
> > ? 0xc0e0b000
> > do_one_initcall+0x49/0x1e0
> > ? srso_return_thunk+0x5/0x10
> > ? kmem_cache_alloc_trace+0x19e/0x2e0
> > do_init_module+0x52/0x260
> > load_module+0xb45/0xbe0
> > __do_sys_finit_module+0xbf/0x120
> > __x64_sys_finit_module+0x18/0x20
> > x64_sys_call+0x1ac3/0x1fa0
> > do_syscall_64+0x56/0xb0
> > ...
> > entry_SYSCALL_64_after_hwframe+0x67/0xd1
> > 
> > > A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
> > > to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub and reboot.
> > 
> > [Fix]
> > 
> > The regression was caused by the following commit that landed in
> > 5.15.0-112-generic, and 5.15.150 upstream:
> > 
> > commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> > Author: Yifan Zhang yifan1.zh...@amd.com
> > 
> > Date: Tue Sep 28 15:42:35 2021 +0800
> > Subject: drm/amdgpu: init iommu after amdkfd device init
> > > Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> > > 
> > > The fix is to revert this patch, as it was not supposed to be
> > > backported to 5.15 stable.
> > 
> > The mailing list discussion with AMD developers is:
> > 
> > 

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-06-24 Thread Pete Orlando
How do we fix this?

On Tue, Jun 18, 2024 at 9:40 AM Acru <2068...@bugs.launchpad.net> wrote:

> Hey,
> After I disabled the kernel update via the update manager, it got
> installed again anyway, again resulting in a non-working system.
> Isn't it possible to just completely remove this one version from the repo?
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2068738
>
> Title:
>   AMD GPUs fail with null pointer dereference when IOMMU enabled,
>   leading to black screen
>
> Status in linux package in Ubuntu:
>   Fix Released
> Status in linux source package in Jammy:
>   Fix Committed
>
> Bug description:
>   BugLink: https://bugs.launchpad.net/bugs/2068738
>
>   [Impact]
>
>   On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
>   enabled, the system fails to boot correctly, and all users see is a
>   black screen.
>
>   This is caused by a null pointer dereference when enabling the IOMMU
>   after the device has been initialised. It should happen the other way
>   around.
>
>   AMD-Vi: AMD IOMMUv2 loaded and initialized
>   ...
>   amdgpu: Topology: Add APU node [0x15d8:0x1002]
>   kfd kfd: amdgpu: added device 1002:15d8
>   kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
>   ...
>   amdgpu :06:00.0: amdgpu: amdgpu_device_ip_init failed
>   amdgpu :06:00.0: amdgpu: Fatal error during GPU init
>   amdgpu :06:00.0: amdgpu: amdgpu: finishing device.
>   ...
>   BUG: kernel NULL pointer dereference, address: 013c
>   ...
>   CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
>   ...
>   RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
>   ...
>   Call Trace:
>
>   ? srso_return_thunk+0x5/0x10
>   ? show_trace_log_lvl+0x28e/0x2ea
>   ? show_trace_log_lvl+0x28e/0x2ea
>   ? dm_hw_fini+0x23/0x30 [amdgpu]
>   ? show_regs.part.0+0x23/0x29
>   ? __die_body.cold+0x8/0xd
>   ? __die+0x2b/0x37
>   ? page_fault_oops+0x13b/0x170
>   ? srso_return_thunk+0x5/0x10
>   ? do_user_addr_fault+0x321/0x670
>   ? srso_return_thunk+0x5/0x10
>   ? __free_pages_ok+0x34a/0x4f0
>   ? exc_page_fault+0x77/0x170
>   ? asm_exc_page_fault+0x27/0x30
>   ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
>   dm_hw_fini+0x23/0x30 [amdgpu]
>   amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
>   amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
>   amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
>   amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
>   amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
>   local_pci_probe+0x4b/0x90
>   ? srso_return_thunk+0x5/0x10
>   pci_device_probe+0x119/0x200
>   really_probe+0x222/0x420
>   __driver_probe_device+0xe8/0x140
>   driver_probe_device+0x23/0xc0
>   __driver_attach+0xf7/0x1f0
>   ? __device_attach_driver+0x140/0x140
>   bus_for_each_dev+0x7f/0xd0
>   driver_attach+0x1e/0x30
>   bus_add_driver+0x148/0x220
>   ? srso_return_thunk+0x5/0x10
>   driver_register+0x95/0x100
>   __pci_register_driver+0x68/0x70
>   amdgpu_init+0x7c/0x1000 [amdgpu]
>   ? 0xc0e0b000
>   do_one_initcall+0x49/0x1e0
>   ? srso_return_thunk+0x5/0x10
>   ? kmem_cache_alloc_trace+0x19e/0x2e0
>   do_init_module+0x52/0x260
>   load_module+0xb45/0xbe0
>   __do_sys_finit_module+0xbf/0x120
>   __x64_sys_finit_module+0x18/0x20
>   x64_sys_call+0x1ac3/0x1fa0
>   do_syscall_64+0x56/0xb0
>   ...
>   entry_SYSCALL_64_after_hwframe+0x67/0xd1
>
>   A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
>   to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub and reboot.
>
>   [Fix]
>
>   The regression was caused by the following commit that landed in
>   5.15.0-112-generic, and 5.15.150 upstream:
>
>   commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
>   Author: Yifan Zhang 
>   Date: Tue Sep 28 15:42:35 2021 +0800
>   Subject: drm/amdgpu: init iommu after amdkfd device init
>   Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
>
>   The fix is to revert this patch, as it was not supposed to be
>   backported to 5.15 stable.
>
>   The mailing list discussion with AMD developers is:
>
>   https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
>
>   The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
>   sending as an Ubuntu SAUCE patch. If the upstream status changes, we
>   can NAK and resend.
>
>   [Testcase]
>
>   You need a system with an AMD Picasso/Raven 2 device. It will likely
>   be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
>   2 device is affected.
>
>   Install the kernel and boot. Make sure full modesetting is enabled.
>
>   There is a test kernel available in the ppa below:
>
>   https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
>
>   If you install the test kernel, your system should boot successfully.
>
>   [Where problems could occur]
>
>   We are reverting a problematic patch 

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-06-18 Thread Roger Ramjet
Thank you Matthew.

On Tuesday, June 18th, 2024 at 2:41 AM, Matthew Ruffell
<2068...@bugs.launchpad.net> wrote:

> Hi everyone,
> 
> I spoke with Stefan Bader of the Kernel Team again. They are planning to
> include the patch in a respin of the current 2024.06.10 SRU cycle.
> https://kernel.ubuntu.com/
> 
> I will let you know once the kernel has been built and placed into
> -proposed for verification.
> 
> But at this stage, as long as the respin occurs, this should be released the
> week of 8th July, give or take a few days, if anything else pops up.
> 
> Thanks,
> Matthew
> 
> --
> You received this bug notification because you are subscribed to a
> duplicate bug report (2069485).
> https://bugs.launchpad.net/bugs/2068738
> 
> Title:
> AMD GPUs fail with null pointer dereference when IOMMU enabled,
> leading to black screen
> 
> Status in linux package in Ubuntu:
> Fix Released
> Status in linux source package in Jammy:
> Fix Committed
> 
> Bug description:
> BugLink: https://bugs.launchpad.net/bugs/2068738
> 
> [Impact]
> 
> On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
> enabled, the system fails to boot correctly, and all users see is a
> black screen.
> 
> This is caused by a null pointer dereference when enabling the IOMMU
> after the device has been initialised. It should happen the other way
> around.
> 
> AMD-Vi: AMD IOMMUv2 loaded and initialized
> ...
> amdgpu: Topology: Add APU node [0x15d8:0x1002]
> kfd kfd: amdgpu: added device 1002:15d8
> kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
> ...
> amdgpu :06:00.0: amdgpu: amdgpu_device_ip_init failed
> amdgpu :06:00.0: amdgpu: Fatal error during GPU init
> amdgpu :06:00.0: amdgpu: amdgpu: finishing device.
> ...
> BUG: kernel NULL pointer dereference, address: 013c
> ...
> CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
> ...
> RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> ...
> Call Trace:
> 
> 
> ? srso_return_thunk+0x5/0x10
> ? show_trace_log_lvl+0x28e/0x2ea
> ? show_trace_log_lvl+0x28e/0x2ea
> ? dm_hw_fini+0x23/0x30 [amdgpu]
> ? show_regs.part.0+0x23/0x29
> ? __die_body.cold+0x8/0xd
> ? __die+0x2b/0x37
> ? page_fault_oops+0x13b/0x170
> ? srso_return_thunk+0x5/0x10
> ? do_user_addr_fault+0x321/0x670
> ? srso_return_thunk+0x5/0x10
> ? __free_pages_ok+0x34a/0x4f0
> ? exc_page_fault+0x77/0x170
> ? asm_exc_page_fault+0x27/0x30
> ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
> dm_hw_fini+0x23/0x30 [amdgpu]
> amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
> amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
> amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
> amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
> amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
> local_pci_probe+0x4b/0x90
> ? srso_return_thunk+0x5/0x10
> pci_device_probe+0x119/0x200
> really_probe+0x222/0x420
> __driver_probe_device+0xe8/0x140
> driver_probe_device+0x23/0xc0
> __driver_attach+0xf7/0x1f0
> ? __device_attach_driver+0x140/0x140
> bus_for_each_dev+0x7f/0xd0
> driver_attach+0x1e/0x30
> bus_add_driver+0x148/0x220
> ? srso_return_thunk+0x5/0x10
> driver_register+0x95/0x100
> __pci_register_driver+0x68/0x70
> amdgpu_init+0x7c/0x1000 [amdgpu]
> ? 0xc0e0b000
> do_one_initcall+0x49/0x1e0
> ? srso_return_thunk+0x5/0x10
> ? kmem_cache_alloc_trace+0x19e/0x2e0
> do_init_module+0x52/0x260
> load_module+0xb45/0xbe0
> __do_sys_finit_module+0xbf/0x120
> __x64_sys_finit_module+0x18/0x20
> x64_sys_call+0x1ac3/0x1fa0
> do_syscall_64+0x56/0xb0
> ...
> entry_SYSCALL_64_after_hwframe+0x67/0xd1
> 
> A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
> to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub and reboot.
> 
> [Fix]
> 
> The regression was caused by the following commit that landed in
> 5.15.0-112-generic, and 5.15.150 upstream:
> 
> commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
> Author: Yifan Zhang yifan1.zh...@amd.com
> 
> Date: Tue Sep 28 15:42:35 2021 +0800
> Subject: drm/amdgpu: init iommu after amdkfd device init
> Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
> 
> The fix is to revert this patch, as it was not supposed to be
> backported to 5.15 stable.
> 
> The mailing list discussion with AMD developers is:
> 
> https://lore.kernel.org/amd-gfx/20240523173031.4212-1-w_ar...@gmx.de/
> 
> The fix hasn't been acknowledged by Greg KH or Sasha Levin yet, so
> sending as an Ubuntu SAUCE patch. If the upstream status changes, we
> can NAK and resend.
> 
> [Testcase]
> 
> You need a system with an AMD Picasso/Raven 2 device. It will likely
> be an APU, and not a discrete graphics card, but any AMD Picasso/Raven
> 2 device is affected.
> 
> Install the kernel and boot. Make sure full modesetting is enabled.
> 
> There is a test kernel available in the ppa below:
> 
> https://launchpad.net/~mruffell/+archive/ubuntu/lp2068738-test
> 
> If you install the 

Re: [Bug 2068738] Re: AMD GPUs fail with null pointer dereference when IOMMU enabled, leading to black screen

2024-06-15 Thread Pete Orlando
Hi Matthew, I have an Acer Aspire 5 running Linux Mint 21.2 Victoria
(base: Ubuntu 22.04 jammy) with an AMD Ryzen 7 3700U with Radeon Vega.
I have been getting the black screen on boot since installing updates about
a week ago. Will there be an update that comes through the normal update
manager? I don't have programming skills like many Linux users, so that
would be much appreciated. Thank you!

On Thu, Jun 13, 2024 at 8:25 PM Matthew Ruffell <2068...@bugs.launchpad.net>
wrote:

> Hi everyone,
>
> An update:
>
> Greg KH has picked up the patch and added it to upstream stable now:
>
> https://lore.kernel.org/amd-gfx/2024061223-suitable-handler-b6f2@gregkh/
> https://lore.kernel.org/amd-gfx/2024061239-rehydrate-flyable-343e@gregkh/
>
> I suppose we can drop the UBUNTU: SAUCE tags.
>
> I talked to Stefan Bader on the Kernel Team. His current feeling is that
> they might respin the -generic kernels before the release of the current
> cycle (2024.06.10 as per https://kernel.ubuntu.com/) but they are still
> unsure. They might see what else comes up this cycle before they decide.
>
> I'll follow up with the Kernel Team in a couple days.
>
> Thanks,
> Matthew
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2068738
>
> Title:
>   AMD GPUs fail with null pointer dereference when IOMMU enabled,
>   leading to black screen
>
> Status in linux package in Ubuntu:
>   Fix Released
> Status in linux source package in Jammy:
>   In Progress
>
> Bug description:
>   BugLink: https://bugs.launchpad.net/bugs/2068738
>
>   [Impact]
>
>   On systems with AMD Picasso/Raven 2 GPU devices, when the IOMMU is
>   enabled, the system fails to boot correctly, and all users see is a
>   black screen.
>
>   This is caused by a null pointer dereference when enabling the IOMMU
>   after the device has been initialised. It should happen the other way
>   around.
>
>   AMD-Vi: AMD IOMMUv2 loaded and initialized
>   ...
>   amdgpu: Topology: Add APU node [0x15d8:0x1002]
>   kfd kfd: amdgpu: added device 1002:15d8
>   kfd kfd: amdgpu: Failed to resume IOMMU for device 1002:15d8
>   ...
>   amdgpu :06:00.0: amdgpu: amdgpu_device_ip_init failed
>   amdgpu :06:00.0: amdgpu: Fatal error during GPU init
>   amdgpu :06:00.0: amdgpu: amdgpu: finishing device.
>   ...
>   BUG: kernel NULL pointer dereference, address: 013c
>   ...
>   CPU: 1 PID: 223 Comm: systemd-udevd Not tainted 5.15.0-112-generic #122-Ubuntu
>   ...
>   RIP: 0010:amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
>   ...
>   Call Trace:
>
>   ? srso_return_thunk+0x5/0x10
>   ? show_trace_log_lvl+0x28e/0x2ea
>   ? show_trace_log_lvl+0x28e/0x2ea
>   ? dm_hw_fini+0x23/0x30 [amdgpu]
>   ? show_regs.part.0+0x23/0x29
>   ? __die_body.cold+0x8/0xd
>   ? __die+0x2b/0x37
>   ? page_fault_oops+0x13b/0x170
>   ? srso_return_thunk+0x5/0x10
>   ? do_user_addr_fault+0x321/0x670
>   ? srso_return_thunk+0x5/0x10
>   ? __free_pages_ok+0x34a/0x4f0
>   ? exc_page_fault+0x77/0x170
>   ? asm_exc_page_fault+0x27/0x30
>   ? amdgpu_dm_fini+0x149/0x1f0 [amdgpu]
>   dm_hw_fini+0x23/0x30 [amdgpu]
>   amdgpu_device_ip_fini_early.isra.0+0x278/0x312 [amdgpu]
>   amdgpu_device_fini_hw+0x156/0x208 [amdgpu]
>   amdgpu_driver_unload_kms+0x69/0x90 [amdgpu]
>   amdgpu_driver_load_kms.cold+0x81/0x107 [amdgpu]
>   amdgpu_pci_probe+0x1d1/0x290 [amdgpu]
>   local_pci_probe+0x4b/0x90
>   ? srso_return_thunk+0x5/0x10
>   pci_device_probe+0x119/0x200
>   really_probe+0x222/0x420
>   __driver_probe_device+0xe8/0x140
>   driver_probe_device+0x23/0xc0
>   __driver_attach+0xf7/0x1f0
>   ? __device_attach_driver+0x140/0x140
>   bus_for_each_dev+0x7f/0xd0
>   driver_attach+0x1e/0x30
>   bus_add_driver+0x148/0x220
>   ? srso_return_thunk+0x5/0x10
>   driver_register+0x95/0x100
>   __pci_register_driver+0x68/0x70
>   amdgpu_init+0x7c/0x1000 [amdgpu]
>   ? 0xc0e0b000
>   do_one_initcall+0x49/0x1e0
>   ? srso_return_thunk+0x5/0x10
>   ? kmem_cache_alloc_trace+0x19e/0x2e0
>   do_init_module+0x52/0x260
>   load_module+0xb45/0xbe0
>   __do_sys_finit_module+0xbf/0x120
>   __x64_sys_finit_module+0x18/0x20
>   x64_sys_call+0x1ac3/0x1fa0
>   do_syscall_64+0x56/0xb0
>   ...
>   entry_SYSCALL_64_after_hwframe+0x67/0xd1
>
>   A workaround does exist. Users can add "nomodeset" or "amd_iommu=off"
>   to GRUB_CMDLINE_LINUX_DEFAULT, run update-grub and reboot.
>
>   [Fix]
>
>   The regression was caused by the following commit that landed in
>   5.15.0-112-generic, and 5.15.150 upstream:
>
>   commit 3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd ubuntu-jammy
>   Author: Yifan Zhang 
>   Date: Tue Sep 28 15:42:35 2021 +0800
>   Subject: drm/amdgpu: init iommu after amdkfd device init
>   Link: https://git.launchpad.net/~ubuntu-kernel/ubuntu/+source/linux/+git/jammy/commit/?id=3c7e53c0d4b43ffe6e7715414b5f2b3177881ecd
>
>   The fix is to revert this patch, as it was not supposed to be
>   backported to 5.15