[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
** Description changed:

- impact
- being noticed a lot, only affects 5.4, fix in subsequent failures
+ SRU Justification:

- The offending patch was removed in 20.10 and later kernels (it was
- reverted upstream not long after being merged into mainline but we never
- reverted it)
+ [IMPACT]
+ This is being reported by a hardware partner as it is being noticed a
+ lot both in their internal testing teams and also being reported with
+ some frequency by customers who are seeing these messages in their logs
+ and thus it is generating an unusually high volume of support calls from
+ the field.

- following error messages are observed
+ In 5.4, commit d60cd06331a3566d3305b3c7b566e79edf4e2095 was introduced
+ upstream and pulled into Ubuntu between 5.4.0-58.64 and 5.4.0-59.65.
+ Upstream, these errors were discovered and that patch was reverted (see
+ Fix Below). We carry the revert commit in all subsequent Focal HWE
+ kernels starting at 5.12, but the fix was never pulled back into Focal
+ 5.4.
+
+ according to the hardware partner:
+
+ the following error messages are observed when rebooting a machine that
+ uses the BCM5720 chipset, which is a widely used 1GbE controller found
+ on LOMs and OCP NICs as well as many PCIe NIC models.

  [ 146.429212] shutdown[1]: Rebooting.
  [ 146.435151] kvm: exiting hardware virtualization
  [ 146.575319] megaraid_sas :67:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x4009
  [ 148.088133] [qede_unload:2236(eno12409)]Link is down
  [ 148.183618] qede :31:00.1: Ending qede_remove successfully
  [ 148.518541] [qede_unload:2236(eno12399)]Link is down
  [ 148.625066] qede :31:00.0: Ending qede_remove successfully
  [ 148.762067] ACPI: Preparing to enter system sleep state S5
  [ 148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
  [ 148.803731] {1}[Hardware Error]: event severity: recoverable
  [ 148.810191] {1}[Hardware Error]: Error 0, type: fatal
  [ 148.816088] {1}[Hardware Error]: section_type: PCIe error
  [ 148.822391] {1}[Hardware Error]: port_type: 0, PCIe end point
  [ 148.829026] {1}[Hardware Error]: version: 3.0
  [ 148.834266] {1}[Hardware Error]: command: 0x0006, status: 0x0010
  [ 148.841140] {1}[Hardware Error]: device_id: :04:00.0
  [ 148.847309] {1}[Hardware Error]: slot: 0
  [ 148.852077] {1}[Hardware Error]: secondary_bus: 0x00
  [ 148.857876] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
  [ 148.865145] {1}[Hardware Error]: class_code: 02
  [ 148.870845] {1}[Hardware Error]: aer_uncor_status: 0x0010, aer_uncor_mask: 0x0001
  [ 148.879842] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
  [ 148.886575] {1}[Hardware Error]: TLP Header: 4001 030f 90028090
  [ 148.894823] tg3 :04:00.0: AER: aer_status: 0x0010, aer_mask: 0x0001
  [ 148.902795] tg3 :04:00.0: AER:[20] UnsupReq (First)
  [ 148.910234] tg3 :04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
  [ 148.918806] tg3 :04:00.0: AER: aer_uncor_severity: 0x000ef030
  [ 148.925558] tg3 :04:00.0: AER: TLP Header: 4001 030f 90028090
  [ 148.933984] reboot: Restarting system
  [ 148.938319] reboot: machine restart

- I have observed the following. when I test older kernel
+ The hardware partner did some bisection and observed the following:

  Kernel version    Fatal Error
  5.4.0-42.46       No
  5.4.0-45.49       No
  5.4.0-47.51       No
  5.4.0-48.52       No
  5.4.0-51.56       No
  5.4.0-52.57       No
  5.4.0-53.59       No
  5.4.0-54.60       No
  5.4.0-58.64       No
  5.4.0-59.65       yes
  5.4.0-60.67       yes

- later I have bisect kernel between 5.4.0-58.64 and 5.4.0-59.65.
+ [FIX]
+ The fix is to apply this patch from upstream:

- looks like due to the following patch we are observing this issue. The
- driver is not handling D3 state properly
+ commit 9d3fcb28f9b9750b474811a2964ce022df56336e
+ Author: Josef Bacik
+ Date: Tue Mar 16 22:17:48 2021 -0400

- PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
+ Revert "PM: ACPI: reboot: Use S5 for reboot"
+
+ This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
+
+ This patch causes a panic when rebooting my Dell Poweredge r440. I do
+ not have the full panic log as it's lost at that stage of the reboot and
+ I do not have a serial console. Reverting this patch makes my system
+ able to reboot again.

- https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?id=b9319dd02269593911403dd5d684368bcef3261d
+ Example:
+ https://code.launchpad.net/~bladernr/ubuntu/+source/linux/+git/focal/+ref/1917471
+
+ [TEST CASE]
+ Install the patched kernel on a machine that uses a BCM5720 LOM and reboot the machine and see that the errors no longer appear.

** Summary changed:

- [Regression] Bus Fatal Error observed when reboot on BCM5720
+ [SRU][Re
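The [TEST CASE] above comes down to checking a captured console log for the error lines quoted in this report. A minimal sketch of such a check follows, assuming the reboot output was saved to text (for example from a serial console). The function name and the two patterns are illustrative and were derived only from the log excerpt in this bug, so they may need adjusting on other systems.

```python
import re

# Patterns derived from the log excerpt in this report (illustrative only).
ERROR_PATTERNS = (
    re.compile(r"\{\d+\}\[Hardware Error\]"),  # APEI/GHES hardware error lines
    re.compile(r"\bAER:"),                      # AER lines printed for the tg3 device
)


def find_reboot_errors(log_text):
    """Return the log lines that match any of the error patterns."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


# A few lines copied from the report, used as sample input.
sample = """\
[ 148.762067] ACPI: Preparing to enter system sleep state S5
[ 148.810191] {1}[Hardware Error]: Error 0, type: fatal
[ 148.902795] tg3 :04:00.0: AER:[20] UnsupReq (First)
[ 148.933984] reboot: Restarting system
"""

for line in find_reboot_errors(sample):
    print(line)
```

An empty result for a reboot on the patched kernel would be consistent with the fix working, although it does not prove that the log capture was complete.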
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
** Description changed:

+ impact
+ being noticed a lot, only affects 5.4, fix in subsequent failures
+
+ The offending patch was removed in 20.10 and later kernels (it was
+ reverted upstream not long after being merged into mainline but we never
+ reverted it)
+
  following error messages are observed

  [ 146.429212] shutdown[1]: Rebooting.
  [ 146.435151] kvm: exiting hardware virtualization
  [ 146.575319] megaraid_sas :67:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x4009
  [ 148.088133] [qede_unload:2236(eno12409)]Link is down
  [ 148.183618] qede :31:00.1: Ending qede_remove successfully
  [ 148.518541] [qede_unload:2236(eno12399)]Link is down
  [ 148.625066] qede :31:00.0: Ending qede_remove successfully
  [ 148.762067] ACPI: Preparing to enter system sleep state S5
  [ 148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
  [ 148.803731] {1}[Hardware Error]: event severity: recoverable
  [ 148.810191] {1}[Hardware Error]: Error 0, type: fatal
  [ 148.816088] {1}[Hardware Error]: section_type: PCIe error
  [ 148.822391] {1}[Hardware Error]: port_type: 0, PCIe end point
  [ 148.829026] {1}[Hardware Error]: version: 3.0
  [ 148.834266] {1}[Hardware Error]: command: 0x0006, status: 0x0010
  [ 148.841140] {1}[Hardware Error]: device_id: :04:00.0
  [ 148.847309] {1}[Hardware Error]: slot: 0
  [ 148.852077] {1}[Hardware Error]: secondary_bus: 0x00
  [ 148.857876] {1}[Hardware Error]: vendor_id: 0x14e4, device_id: 0x165f
  [ 148.865145] {1}[Hardware Error]: class_code: 02
  [ 148.870845] {1}[Hardware Error]: aer_uncor_status: 0x0010, aer_uncor_mask: 0x0001
  [ 148.879842] {1}[Hardware Error]: aer_uncor_severity: 0x000ef030
  [ 148.886575] {1}[Hardware Error]: TLP Header: 4001 030f 90028090
  [ 148.894823] tg3 :04:00.0: AER: aer_status: 0x0010, aer_mask: 0x0001
  [ 148.902795] tg3 :04:00.0: AER:[20] UnsupReq (First)
  [ 148.910234] tg3 :04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
  [ 148.918806] tg3 :04:00.0: AER: aer_uncor_severity: 0x000ef030
  [ 148.925558] tg3 :04:00.0: AER: TLP Header: 4001 030f 90028090
  [ 148.933984] reboot: Restarting system
  [ 148.938319] reboot: machine restart

-
- I have observed the following. when I test older kernel
-
+ I have observed the following. when I test older kernel

  Kernel version    Fatal Error
  5.4.0-42.46       No
  5.4.0-45.49       No
  5.4.0-47.51       No
  5.4.0-48.52       No
  5.4.0-51.56       No
  5.4.0-52.57       No
  5.4.0-53.59       No
  5.4.0-54.60       No
  5.4.0-58.64       No
  5.4.0-59.65       yes
  5.4.0-60.67       yes

-
  later I have bisect kernel between 5.4.0-58.64 and 5.4.0-59.65.

  looks like due to the following patch we are observing this issue. The
  driver is not handling D3 state properly

  PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI

  https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?id=b9319dd02269593911403dd5d684368bcef3261d

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1917471

Title:
  [Regression] Bus Fatal Error observed when reboot on BCM5720

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  Fix Released
Status in linux source package in Jammy:
  Fix Released
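The tg3 lines in the log name the failing condition as "AER:[20] UnsupReq", i.e. bit 20 of the PCIe AER Uncorrectable Error Status register. A small sketch of decoding such a status word is shown below. The bit names follow the PCIe AER register layout; the function name is illustrative. Note that the log as archived prints the status as 0x0010, which does not have bit 20 set, so the example passes `1 << 20` directly. That input is an assumption about the full register value, not something stated in the thread.

```python
# Bit positions in the PCIe AER Uncorrectable Error Status register.
AER_UNCOR_BITS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_aer_uncor(status):
    """Return (bit, name) pairs for every known bit set in the status word."""
    return [
        (bit, name)
        for bit, name in sorted(AER_UNCOR_BITS.items())
        if status & (1 << bit)
    ]


# Assumed input: only bit 20 set, matching the "[20] UnsupReq" line in the log.
for bit, name in decode_aer_uncor(1 << 20):
    print(f"[{bit}] {name}")
```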
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
Can you please try the "reboot=" kernel parameter? The value can be
"bios, acpi, kbd, triple, efi, or pci".

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
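For anyone trying the "reboot=" suggestion across several modes, the sketch below builds a kernel command line with one of the values listed in that comment. It only manipulates a string; the helper name is hypothetical. On Ubuntu the result would typically be placed in GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and followed by update-grub, which is an assumption about the setup and not something stated in the thread.

```python
# Modes listed in the comment above.
REBOOT_MODES = ("bios", "acpi", "kbd", "triple", "efi", "pci")


def with_reboot_mode(cmdline, mode):
    """Return cmdline with exactly one reboot=<mode> token appended."""
    if mode not in REBOOT_MODES:
        raise ValueError(f"unsupported reboot mode: {mode!r}")
    # Drop any existing reboot= token so the new one is the only one.
    kept = [token for token in cmdline.split() if not token.startswith("reboot=")]
    kept.append(f"reboot={mode}")
    return " ".join(kept)


print(with_reboot_mode("quiet splash", "pci"))
```

Replacing an existing token rather than appending a second one keeps the test matrix unambiguous when stepping through the modes one by one.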
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
@Kai-Heng Feng In reply to comment #14

Issue is still observed even after blacklisting tg3 driver by giving
"blacklist=tg3 modprobe.blacklist=tg3" in kernel parameter.
Please find the serial logs of the efforts.

** Attachment added: "fatal_blacklist"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+attachment/5571631/+files/fatal_blacklist
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
Sujith, does the issue happen when "tg3" is blacklisted?
Something like "blacklist=tg3 modprobe.blacklist=tg3" in kernel parameter.
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
Hi Jeff,

I think we should now go ahead with the SRU request for the fix. rmmod
tg3 does not help work around the issue.

Fix details:
Revert "PM: ACPI: reboot: Use S5 for reboot"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel?h=v5.17-rc7&id=9d3fcb28f9b9750b474811a2964ce022df56336e
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
** Attachment added: "Fatal_issue_rmmod_tg3.txt"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+attachment/5570368/+files/Fatal_issue_rmmod_tg3.txt
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
@sujith In reply to comment #8.

Issue is still observed even after removing tg3 driver by using "rmmod tg3".
Please find the serial logs of the efforts.
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
The patch you're requesting reverts this patch:

commit d60cd06331a3566d3305b3c7b566e79edf4e2095
Author: Kai-Heng Feng
Date: Fri Oct 30 15:06:57 2020 +0800

    PM: ACPI: reboot: Use S5 for reboot

    After reboot, it's not possible to use hotkeys to enter BIOS setup and
    boot menu on some HP laptops.

    BIOS folks identified the root cause is the missing _PTS call, and BIOS
    is expecting _PTS to do proper reset.

    Using S5 for reboot is default behavior under Windows, "A full shutdown
    (S5) occurs when a system restart is requested" [1], so let's do the
    same here.

    [1] https://docs.microsoft.com/en-us/windows/win32/power/system-power-states

    Signed-off-by: Kai-Heng Feng
    [ rjw: Subject edit ]
    Signed-off-by: Rafael J. Wysocki

It looks like this was applied to 5.4 and the patch that reverts this was pulled into 5.13 and later, so that is why 5.4 is the only affected version.

I was confused as to why this wasn't also appearing in Impish and Jammy. Now we know. So I can close those tasks.

** Changed in: linux (Ubuntu Impish)
       Status: In Progress => Fix Released

** Changed in: linux (Ubuntu Jammy)
       Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1917471

Title:
  [Regression] Bus Fatal Error observed when reboot on BCM5720

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Focal: In Progress
Status in linux source package in Impish: Fix Released
Status in linux source package in Jammy: Fix Released
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu)
       Status: New => Confirmed

Status in linux package in Ubuntu: In Progress
Status in linux source package in Focal: In Progress
Status in linux source package in Impish: In Progress
Status in linux source package in Jammy: In Progress
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
** Project changed: linux => linux (Ubuntu)

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux (Ubuntu Impish)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Focal)
       Status: New => In Progress

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Impish)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
     Assignee: (unassigned) => Jeff Lane (bladernr)

** Changed in: linux (Ubuntu Impish)
     Assignee: (unassigned) => Jeff Lane (bladernr)

** Changed in: linux (Ubuntu Jammy)
     Assignee: (unassigned) => Jeff Lane (bladernr)

** Changed in: linux (Ubuntu Impish)
       Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
       Status: New => In Progress

Status in linux package in Ubuntu: In Progress
Status in linux source package in Focal: In Progress
Status in linux source package in Impish: In Progress
Status in linux source package in Jammy: In Progress
[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720
** Also affects: linux (Ubuntu)
   Importance: Undecided
       Status: New

** No longer affects: linux (Ubuntu)

Status in linux package in Ubuntu: In Progress
Status in linux source package in Focal: In Progress
Status in linux source package in Impish: In Progress
Status in linux source package in Jammy: In Progress