[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-24 Thread Jeff Lane
** Description changed:

- impact
- being noticed a lot, only affects 5.4, fix in subsequent failures
+ SRU Justification:
  
- The offending patch was removed in 20.10 and later kernels (it was
- reverted upstream not long after being merged into mainline but we never
- reverted it)
+ [IMPACT]
  
+ This is being reported by a hardware partner because it is noticed
+ frequently by their internal test teams and is also reported by
+ customers who see these messages in their logs. As a result, it is
+ generating an unusually high volume of support calls from the field.
  
- following error messages are observed
+ In 5.4, commit d60cd06331a3566d3305b3c7b566e79edf4e2095 was introduced
+ upstream and pulled into Ubuntu between 5.4.0-58.64 and 5.4.0-59.65.
+ Upstream, these errors were discovered and that patch was reverted (see
+ [FIX] below). We carry the revert commit in all subsequent Focal HWE
+ kernels starting at 5.12, but the fix was never pulled back into Focal
+ 5.4.
+ 
+ According to the hardware partner:
+ 
+ The following error messages are observed when rebooting a machine that
+ uses the BCM5720 chipset, a widely used 1GbE controller found on LOMs
+ and OCP NICs as well as on many PCIe NIC models:
  
  [  146.429212] shutdown[1]: Rebooting.
  [  146.435151] kvm: exiting hardware virtualization
  [  146.575319] megaraid_sas :67:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x4009
  [  148.088133] [qede_unload:2236(eno12409)]Link is down
  [  148.183618] qede :31:00.1: Ending qede_remove successfully
  [  148.518541] [qede_unload:2236(eno12399)]Link is down
  [  148.625066] qede :31:00.0: Ending qede_remove successfully
  [  148.762067] ACPI: Preparing to enter system sleep state S5
  [  148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
  [  148.803731] {1}[Hardware Error]: event severity: recoverable
  [  148.810191] {1}[Hardware Error]:  Error 0, type: fatal
  [  148.816088] {1}[Hardware Error]:   section_type: PCIe error
  [  148.822391] {1}[Hardware Error]:   port_type: 0, PCIe end point
  [  148.829026] {1}[Hardware Error]:   version: 3.0
  [  148.834266] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
  [  148.841140] {1}[Hardware Error]:   device_id: :04:00.0
  [  148.847309] {1}[Hardware Error]:   slot: 0
  [  148.852077] {1}[Hardware Error]:   secondary_bus: 0x00
  [  148.857876] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
  [  148.865145] {1}[Hardware Error]:   class_code: 02
  [  148.870845] {1}[Hardware Error]:   aer_uncor_status: 0x0010, aer_uncor_mask: 0x0001
  [  148.879842] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
  [  148.886575] {1}[Hardware Error]:   TLP Header: 4001 030f 90028090
  [  148.894823] tg3 :04:00.0: AER: aer_status: 0x0010, aer_mask: 0x0001
  [  148.902795] tg3 :04:00.0: AER:[20] UnsupReq   (First)
  [  148.910234] tg3 :04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
  [  148.918806] tg3 :04:00.0: AER: aer_uncor_severity: 0x000ef030
  [  148.925558] tg3 :04:00.0: AER:   TLP Header: 4001 030f 90028090
  [  148.933984] reboot: Restarting system
  [  148.938319] reboot: machine restart
  
- I  have observed the following. when I test older kernel
+ The hardware partner did some bisection and observed the following:
  
  Kernel version   Fatal Error
  5.4.0-42.46      No
  5.4.0-45.49      No
  5.4.0-47.51      No
  5.4.0-48.52      No
  5.4.0-51.56      No
  5.4.0-52.57      No
  5.4.0-53.59      No
  5.4.0-54.60      No
  5.4.0-58.64      No
  5.4.0-59.65      Yes
  5.4.0-60.67      Yes
  
- later I have bisect kernel between 5.4.0-58.64 and 5.4.0-59.65.
+ [FIX]
+ The fix is to apply this patch from upstream:
  
- looks like due to the following patch we are observing this issue. The
- driver is not handling D3 state properly
+ commit 9d3fcb28f9b9750b474811a2964ce022df56336e
+ Author: Josef Bacik 
+ Date:   Tue Mar 16 22:17:48 2021 -0400
  
- PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
+ Revert "PM: ACPI: reboot: Use S5 for reboot"
+ 
+ This reverts commit d60cd06331a3566d3305b3c7b566e79edf4e2095.
+ 
+ This patch causes a panic when rebooting my Dell Poweredge r440.  I do
+ not have the full panic log as it's lost at that stage of the reboot and
+ I do not have a serial console.  Reverting this patch makes my system
+ able to reboot again.
  
- https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?id=b9319dd02269593911403dd5d684368bcef3261d
+ Example:
+ 
+ https://code.launchpad.net/~bladernr/ubuntu/+source/linux/+git/focal/+ref/1917471
+ 
+ [TEST CASE]
+ Install the patched kernel on a machine that uses a BCM5720 LOM, reboot
+ the machine, and verify that the errors no longer appear.
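
A minimal sketch for the [TEST CASE], assuming the reboot is captured over a serial console into a text file (as with the serial logs attached to this bug). It flags any line matching the `[Hardware Error]` or `AER:` messages from the failing log above. The function name, patterns, and file name are illustrative only and are not part of any existing Ubuntu test tooling.

```python
import re
import sys
from typing import List

# Patterns taken from the failing reboot log quoted in this bug (illustrative).
ERROR_PATTERNS = [
    re.compile(r"\[Hardware Error\]"),  # APEI/GHES records
    re.compile(r"\bAER:"),              # tg3 AER lines
]


def find_reboot_errors(log_text: str) -> List[str]:
    """Return every line of a console log that matches one of ERROR_PATTERNS."""
    return [
        line.strip()
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in ERROR_PATTERNS)
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python3 check_reboot_log.py serial.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        hits = find_reboot_errors(f.read())
    print("\n".join(hits) if hits else "no AER/Hardware Error lines found")
    sys.exit(1 if hits else 0)
```

A non-zero exit status when matches are found makes it easy to loop over repeated reboots; an empty result on the patched kernel is what the test case expects.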

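The tg3 lines in the log above report `AER:[20] UnsupReq`, and both the firmware record and tg3 print `aer_uncor_severity: 0x000ef030`. As a reading aid, here is a small Python sketch that names the bits of an AER Uncorrectable Error Status or Severity value. The bit names follow the PCIe base specification's Uncorrectable Error Status register (Linux prints abbreviations such as `UnsupReq`); only bits 4, 5 and 12 to 21 are mapped, and the helper names are hypothetical.

```python
from typing import List

# Uncorrectable Error Status/Severity bit positions (PCIe AER capability).
# Only the bits listed here are decoded; others are ignored.
UNCOR_ERRORS = {
    4: "Data Link Protocol Error",
    5: "Surprise Down Error",
    12: "Poisoned TLP",
    13: "Flow Control Protocol Error",
    14: "Completion Timeout",
    15: "Completer Abort",
    16: "Unexpected Completion",
    17: "Receiver Overflow",
    18: "Malformed TLP",
    19: "ECRC Error",
    20: "Unsupported Request",
    21: "ACS Violation",
}


def decode_uncor(value: int) -> List[str]:
    """Name every mapped bit that is set in an uncorrectable status/severity value."""
    return [name for bit, name in sorted(UNCOR_ERRORS.items()) if value & (1 << bit)]


def is_fatal(bit: int, severity: int) -> bool:
    """In the severity register, a set bit means fatal and a clear bit means non-fatal."""
    return bool(severity & (1 << bit))


if __name__ == "__main__":
    severity = 0x000EF030  # aer_uncor_severity from the log
    print(decode_uncor(1 << 20))     # the bit tg3 reports as "[20] UnsupReq"
    print(decode_uncor(severity))
    print(is_fatal(20, severity))
```

With the severity value from this log, bit 20 is clear, so the device's severity register marks Unsupported Request as non-fatal. This is only a reading of the logged register, not an explanation of the regression.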
** Summary changed:

- [Regression] Bus Fatal Error observed when reboot on BCM5720
+ [SRU][Re

[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-23 Thread Jeff Lane
** Description changed:

+ Impact:
+ Being noticed a lot; only affects 5.4, fixed in subsequent releases.
+ 
+ The offending patch was removed in 20.10 and later kernels (it was
+ reverted upstream not long after being merged into mainline, but we
+ never reverted it in 5.4).
+ 
+ 
  The following error messages are observed:
  
  [  146.429212] shutdown[1]: Rebooting.
  [  146.435151] kvm: exiting hardware virtualization
  [  146.575319] megaraid_sas :67:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x4009
  [  148.088133] [qede_unload:2236(eno12409)]Link is down
  [  148.183618] qede :31:00.1: Ending qede_remove successfully
  [  148.518541] [qede_unload:2236(eno12399)]Link is down
  [  148.625066] qede :31:00.0: Ending qede_remove successfully
  [  148.762067] ACPI: Preparing to enter system sleep state S5
  [  148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
  [  148.803731] {1}[Hardware Error]: event severity: recoverable
  [  148.810191] {1}[Hardware Error]:  Error 0, type: fatal
  [  148.816088] {1}[Hardware Error]:   section_type: PCIe error
  [  148.822391] {1}[Hardware Error]:   port_type: 0, PCIe end point
  [  148.829026] {1}[Hardware Error]:   version: 3.0
  [  148.834266] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
  [  148.841140] {1}[Hardware Error]:   device_id: :04:00.0
  [  148.847309] {1}[Hardware Error]:   slot: 0
  [  148.852077] {1}[Hardware Error]:   secondary_bus: 0x00
  [  148.857876] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
  [  148.865145] {1}[Hardware Error]:   class_code: 02
  [  148.870845] {1}[Hardware Error]:   aer_uncor_status: 0x0010, aer_uncor_mask: 0x0001
  [  148.879842] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
  [  148.886575] {1}[Hardware Error]:   TLP Header: 4001 030f 90028090
  [  148.894823] tg3 :04:00.0: AER: aer_status: 0x0010, aer_mask: 0x0001
  [  148.902795] tg3 :04:00.0: AER:[20] UnsupReq   (First)
  [  148.910234] tg3 :04:00.0: AER: aer_layer=Transaction Layer, aer_agent=Requester ID
  [  148.918806] tg3 :04:00.0: AER: aer_uncor_severity: 0x000ef030
  [  148.925558] tg3 :04:00.0: AER:   TLP Header: 4001 030f 90028090
  [  148.933984] reboot: Restarting system
  [  148.938319] reboot: machine restart
  
- 
- I  have observed the following. when I test older kernel 
- 
+ I have observed the following when testing older kernels:
  
  Kernel version   Fatal Error
  5.4.0-42.46      No
  5.4.0-45.49      No
  5.4.0-47.51      No
  5.4.0-48.52      No
  5.4.0-51.56      No
  5.4.0-52.57      No
  5.4.0-53.59      No
  5.4.0-54.60      No
  5.4.0-58.64      No
  5.4.0-59.65      Yes
  5.4.0-60.67      Yes
  
- 
  Later, I bisected the kernel between 5.4.0-58.64 and 5.4.0-59.65.
  
  It looks like we are observing this issue because of the following
  patch. The driver is not handling the D3 state properly:
  
  PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI
  
  https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?id=b9319dd02269593911403dd5d684368bcef3261d
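
The table brackets the regression between 5.4.0-58.64 and 5.4.0-59.65, and the bisection mentioned above narrows it further by halving the range at each step, the same idea `git bisect` applies to the commits between two points. A minimal Python sketch of that search, assuming a single good-to-bad transition; `first_bad` is a hypothetical helper, and here the `is_bad` callback only looks up the recorded results instead of rebooting real hardware.

```python
from typing import Callable, Optional, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> Optional[str]:
    """Binary-search ordered builds for the first failing one.

    Assumes a single regression point: every build before it is good and
    every build from it onwards is bad.
    """
    lo, hi = 0, len(builds)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # first bad build is at mid or earlier
        else:
            lo = mid + 1  # first bad build is after mid
    return builds[lo] if lo < len(builds) else None


# Recorded results from the table above: True means "Fatal Error: Yes".
RESULTS = {
    "5.4.0-42.46": False,
    "5.4.0-45.49": False,
    "5.4.0-47.51": False,
    "5.4.0-48.52": False,
    "5.4.0-51.56": False,
    "5.4.0-52.57": False,
    "5.4.0-53.59": False,
    "5.4.0-54.60": False,
    "5.4.0-58.64": False,
    "5.4.0-59.65": True,
    "5.4.0-60.67": True,
}

if __name__ == "__main__":
    builds = list(RESULTS)  # dicts keep insertion order: oldest to newest
    print(first_bad(builds, RESULTS.__getitem__))  # prints 5.4.0-59.65
```

Each probe costs a full reboot test, so halving the range matters: 11 builds need at most 4 probes instead of up to 11.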

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1917471

Title:
  [Regression] Bus Fatal Error observed when reboot on BCM5720

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  impact
  being noticed a lot, only affects 5.4, fix in subsequent failures

  The offending patch was removed in 20.10 and later kernels (it was
  reverted upstream not long after being merged into mainline but we
  never reverted it)


  following error messages are observed

  [  146.429212] shutdown[1]: Rebooting.
  [  146.435151] kvm: exiting hardware virtualization
  [  146.575319] megaraid_sas :67:00.0: megasas_disable_intr_fusion is 
called outbound_intr_mask:0x4009
  [  148.088133] [qede_unload:2236(eno12409)]Link is down
  [  148.183618] qede :31:00.1: Ending qede_remove successfully
  [  148.518541] [qede_unload:2236(eno12399)]Link is down
  [  148.625066] qede :31:00.0: Ending qede_remove successfully
  [  148.762067] ACPI: Preparing to enter system sleep state S5
  [  148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
Error Source: 5
  [  148.803731] {1}[Hardware Error]: event severity: recoverable
  [  148.810191] {1}[Hardware Error]:  Error 0, type: fatal
  [  148.816088] {1}[Hardware Error]:   section_type: PCIe error
  [  148.822391] {1}[Hardware Error]:   port_type: 0, PCIe end point
  [  148.829026] {1}[Hardware Error]:   version: 3.0
  [  148.834266] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
  [  148.841140] {1}[Hardware Error]:   device_id: :04:00.0
  [  148.847309] {1}[Hardware Err

[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-23 Thread Kai-Heng Feng
Can you please try the "reboot=" kernel parameter? The value can be
"bios", "acpi", "kbd", "triple", "efi", or "pci".
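
One way to make sure a `reboot=` value is actually in effect on the test machine (on Ubuntu it is typically added to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `sudo update-grub` and a reboot) is to read it back from `/proc/cmdline`. A minimal Python sketch; the helper names are hypothetical, and it only recognizes the spellings listed in the comment above, not every form the kernel may accept.

```python
from typing import Optional

# The values suggested in the comment above.
REBOOT_METHODS = {"bios", "acpi", "kbd", "triple", "efi", "pci"}


def reboot_param(cmdline: str) -> Optional[str]:
    """Return the value of reboot= on a kernel command line, or None if absent.

    If the parameter is repeated, this sketch keeps the last occurrence.
    """
    value = None
    for token in cmdline.split():
        if token.startswith("reboot="):
            value = token[len("reboot="):]
    return value


def describe(cmdline: str) -> str:
    """Summarize the reboot= setting found on a command line."""
    value = reboot_param(cmdline)
    if value is None:
        return "no reboot= parameter set"
    if value in REBOOT_METHODS:
        return f"reboot={value}"
    return f"unrecognized reboot= value: {value!r}"


if __name__ == "__main__":
    # On the machine under test, pass the contents of /proc/cmdline instead.
    print(describe("BOOT_IMAGE=/vmlinuz ro quiet reboot=pci"))  # prints reboot=pci
```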

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-22 Thread Vinay HM
@Kai-Heng Feng
In reply to comment #14:
The issue is still observed even after blacklisting the tg3 driver by
passing "blacklist=tg3 modprobe.blacklist=tg3" on the kernel command line.
Please find the serial logs of these attempts attached.


** Attachment added: "fatal_blacklist"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+attachment/5571631/+files/fatal_blacklist



[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-21 Thread Kai-Heng Feng
Sujith, does the issue happen when "tg3" is blacklisted? Something like
"blacklist=tg3 modprobe.blacklist=tg3" on the kernel command line.

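Before concluding that tg3 is not involved, it helps to confirm that such a blacklist took effect. A minimal Python sketch, assuming it is given the contents of `/proc/cmdline` and `/proc/modules` (where the first field of each line is the module name) and that tg3 is built as a loadable module, since a built-in driver would not appear in `/proc/modules`. The helper names are hypothetical, and only the two parameter spellings suggested above are parsed.

```python
import os
from typing import Set


def blacklisted(cmdline: str) -> Set[str]:
    """Collect module names from blacklist= and modprobe.blacklist= parameters."""
    names: Set[str] = set()
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in ("blacklist", "modprobe.blacklist"):
            names.update(name for name in value.split(",") if name)
    return names


def loaded_modules(proc_modules: str) -> Set[str]:
    """Return module names from text in /proc/modules format."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def tg3_excluded(cmdline: str, proc_modules: str) -> bool:
    """True if tg3 was requested to be blacklisted and is not currently loaded."""
    return "tg3" in blacklisted(cmdline) and "tg3" not in loaded_modules(proc_modules)


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/cmdline") as c, open("/proc/modules") as m:
        print("tg3 excluded:", tg3_excluded(c.read(), m.read()))
```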


[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-20 Thread Sujith Pandel
Hi Jeff,
I think we should now go ahead with the SRU request for the fix.
Running "rmmod tg3" does not work around the issue.

Fix details:
Revert "PM: ACPI: reboot: Use S5 for reboot"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/kernel?h=v5.17-rc7&id=9d3fcb28f9b9750b474811a2964ce022df56336e



[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-18 Thread Vinay HM
** Attachment added: "Fatal_issue_rmmod_tg3.txt"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+attachment/5570368/+files/Fatal_issue_rmmod_tg3.txt



[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-18 Thread Vinay HM
@sujith 
In reply to comment #8:
The issue is still observed even after removing the tg3 driver with "rmmod tg3".
Please find the serial logs of these attempts attached.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1917471

Title:
  [Regression] Bus Fatal Error observed when reboot on BCM5720

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  Fix Released
Status in linux source package in Jammy:
  Fix Released

Bug description:
  following error messages are observed

  [  146.429212] shutdown[1]: Rebooting.
  [  146.435151] kvm: exiting hardware virtualization
  [  146.575319] megaraid_sas :67:00.0: megasas_disable_intr_fusion is 
called outbound_intr_mask:0x4009
  [  148.088133] [qede_unload:2236(eno12409)]Link is down
  [  148.183618] qede :31:00.1: Ending qede_remove successfully
  [  148.518541] [qede_unload:2236(eno12399)]Link is down
  [  148.625066] qede :31:00.0: Ending qede_remove successfully
  [  148.762067] ACPI: Preparing to enter system sleep state S5
  [  148.794638] {1}[Hardware Error]: Hardware error from APEI Generic Hardware 
Error Source: 5
  [  148.803731] {1}[Hardware Error]: event severity: recoverable
  [  148.810191] {1}[Hardware Error]:  Error 0, type: fatal
  [  148.816088] {1}[Hardware Error]:   section_type: PCIe error
  [  148.822391] {1}[Hardware Error]:   port_type: 0, PCIe end point
  [  148.829026] {1}[Hardware Error]:   version: 3.0
  [  148.834266] {1}[Hardware Error]:   command: 0x0006, status: 0x0010
  [  148.841140] {1}[Hardware Error]:   device_id: :04:00.0
  [  148.847309] {1}[Hardware Error]:   slot: 0
  [  148.852077] {1}[Hardware Error]:   secondary_bus: 0x00
  [  148.857876] {1}[Hardware Error]:   vendor_id: 0x14e4, device_id: 0x165f
  [  148.865145] {1}[Hardware Error]:   class_code: 02
  [  148.870845] {1}[Hardware Error]:   aer_uncor_status: 0x0010, 
aer_uncor_mask: 0x0001
  [  148.879842] {1}[Hardware Error]:   aer_uncor_severity: 0x000ef030
  [  148.886575] {1}[Hardware Error]:   TLP Header: 4001 030f 90028090 

  [  148.894823] tg3 :04:00.0: AER: aer_status: 0x0010, aer_mask: 
0x0001
  [  148.902795] tg3 :04:00.0: AER:[20] UnsupReq   (First)
  [  148.910234] tg3 :04:00.0: AER: aer_layer=Transaction Layer, 
aer_agent=Requester ID
  [  148.918806] tg3 :04:00.0: AER: aer_uncor_severity: 0x000ef030
  [  148.925558] tg3 :04:00.0: AER:   TLP Header: 4001 030f 
90028090 
  [  148.933984] reboot: Restarting system
  [  148.938319] reboot: machine restart

  
  I  have observed the following. when I test older kernel 

  
  Kernel version   Fatal error
  5.4.0-42.46      No
  5.4.0-45.49      No
  5.4.0-47.51      No
  5.4.0-48.52      No
  5.4.0-51.56      No
  5.4.0-52.57      No
  5.4.0-53.59      No
  5.4.0-54.60      No
  5.4.0-58.64      No
  5.4.0-59.65      Yes
  5.4.0-60.67      Yes

  
  I then bisected the kernel between 5.4.0-58.64 and 5.4.0-59.65.

  It looks like the following patch is causing this issue; the driver is
  not handling the D3 state properly:

  PCI/ACPI: Whitelist hotplug ports for D3 if power managed by ACPI

  https://kernel.ubuntu.com/git/ubuntu/ubuntu-focal.git/commit/?id=b9319dd02269593911403dd5d684368bcef3261d
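The version table above is effectively a coarse bisection over Ubuntu kernel uploads. Below is a minimal Python sketch of that search. It assumes that versions have the `upstream-ABI.upload` shape used above (e.g. `5.4.0-59.65`), and that once sorted, every version after the first bad one is also bad. `parse_ubuntu_kernel` and `first_bad` are hypothetical helpers. In practice, `is_bad` stands for installing a kernel and rebooting the machine to check for the error.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence


def parse_ubuntu_kernel(version: str) -> tuple[int, ...]:
    """Turn an Ubuntu kernel version such as '5.4.0-59.65' into a sortable tuple."""
    upstream, _, abi_upload = version.partition("-")
    abi, _, upload = abi_upload.partition(".")
    return tuple(int(part) for part in upstream.split(".")) + (int(abi), int(upload))


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str | None:
    """Binary-search versions (all good, then all bad once sorted) for the first bad one."""
    ordered = sorted(versions, key=parse_ubuntu_kernel)
    lo, hi = 0, len(ordered)  # first bad index (len(ordered) if none) lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(ordered[mid]):
            hi = mid          # mid is bad: the first bad version is at mid or earlier
        else:
            lo = mid + 1      # mid is good: the first bad version is after mid
    return ordered[lo] if lo < len(ordered) else None


# Results from the table in the report (True = fatal error seen on reboot).
RESULTS = {
    "5.4.0-42.46": False, "5.4.0-45.49": False, "5.4.0-47.51": False,
    "5.4.0-48.52": False, "5.4.0-51.56": False, "5.4.0-52.57": False,
    "5.4.0-53.59": False, "5.4.0-54.60": False, "5.4.0-58.64": False,
    "5.4.0-59.65": True, "5.4.0-60.67": True,
}

if __name__ == "__main__":
    print(first_bad(list(RESULTS), RESULTS.__getitem__))
```

To narrow further to a single commit between two uploads, use `git bisect` on the kernel tree.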

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1917471/+subscriptions


-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-17 Thread Jeff Lane
The patch you're requesting reverts this patch:
commit d60cd06331a3566d3305b3c7b566e79edf4e2095
Author: Kai-Heng Feng 
Date:   Fri Oct 30 15:06:57 2020 +0800

PM: ACPI: reboot: Use S5 for reboot

After reboot, it's not possible to use hotkeys to enter BIOS setup
and boot menu on some HP laptops.

BIOS folks identified the root cause is the missing _PTS call, and
BIOS is expecting _PTS to do proper reset.

Using S5 for reboot is default behavior under Windows, "A full
shutdown (S5) occurs when a system restart is requested" [1], so
let's do the same here.

[1] https://docs.microsoft.com/en-us/windows/win32/power/system-power-states

Signed-off-by: Kai-Heng Feng 
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki 

It looks like this was applied to 5.4, and the patch that reverts it was
pulled into 5.13 and later, which is why 5.4 is the only affected
version. I was confused as to why this wasn't also appearing in Impish
and Jammy; now we know, so I can close those tasks.

** Changed in: linux (Ubuntu Impish)
   Status: In Progress => Fix Released

** Changed in: linux (Ubuntu Jammy)
   Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1917471

Title:
  [Regression] Bus Fatal Error observed when reboot on BCM5720

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  Fix Released
Status in linux source package in Jammy:
  Fix Released



[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-17 Thread Launchpad Bug Tracker
Status changed to 'Confirmed' because the bug affects multiple users.

** Changed in: linux (Ubuntu)
   Status: New => Confirmed

-- 
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  In Progress
Status in linux source package in Jammy:
  In Progress



[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-17 Thread Jeff Lane
** Project changed: linux => linux (Ubuntu)

** Also affects: linux (Ubuntu Jammy)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Focal)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Impish)
   Importance: Undecided
   Status: New

** Changed in: linux (Ubuntu Focal)
   Status: New => In Progress

** Changed in: linux (Ubuntu Focal)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Impish)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Jammy)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Focal)
 Assignee: (unassigned) => Jeff Lane (bladernr)

** Changed in: linux (Ubuntu Impish)
 Assignee: (unassigned) => Jeff Lane (bladernr)

** Changed in: linux (Ubuntu Jammy)
 Assignee: (unassigned) => Jeff Lane (bladernr)

** Changed in: linux (Ubuntu Impish)
   Status: New => In Progress

** Changed in: linux (Ubuntu Jammy)
   Status: New => In Progress

-- 
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  In Progress
Status in linux source package in Jammy:
  In Progress



[Kernel-packages] [Bug 1917471] Re: [Regression] Bus Fatal Error observed when reboot on BCM5720

2022-03-17 Thread Jeff Lane
** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** No longer affects: linux (Ubuntu)

-- 
Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Focal:
  In Progress
Status in linux source package in Impish:
  In Progress
Status in linux source package in Jammy:
  In Progress
