> -----Original Message-----
> From: Intel-wired-lan <[email protected]> On Behalf Of Ben Hutchings
> Sent: Saturday, July 12, 2025 5:13 PM
> To: [email protected]; linux-pci <linux-[email protected]>; Pavan Chebbi <[email protected]>; Michael Chan <[email protected]>
> Cc: Laurent Bonnaud <[email protected]>; [email protected]; [email protected]
> Subject: Re: [Intel-wired-lan] Bug#1104670: linux-image-6.12.25-amd64: system does not shut down - GHES: Fatal hardware error
>
> Hi all,
>
> On Sun, 2025-05-04 at 13:45 +0200, Laurent Bonnaud wrote:
> [...]
> > - Previously the kernel would output an error in /var/lib/systemd/pstore/ but would shutdown anyway.
> >
> > - Now, with kernel 6.1.135-1, the shutdown is blocked as with 6.12.x kernels (see below).
> >
> > --
> > Laurent.
> >
> > <30>[ 961.098671] systemd-shutdown[1]: Rebooting.
> > <6>[ 961.098743] kvm: exiting hardware virtualization
> > <6>[ 961.361878] megaraid_sas 0000:17:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
> > <6>[ 961.414526] ACPI: PM: Preparing to enter system sleep state S5
> > <0>[ 963.828210] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
> > <0>[ 963.828213] {1}[Hardware Error]: event severity: fatal
> > <0>[ 963.828214] {1}[Hardware Error]: Error 0, type: fatal
> > <0>[ 963.828216] {1}[Hardware Error]: section_type: PCIe error
> > <0>[ 963.828216] {1}[Hardware Error]: port_type: 0, PCIe end point
> > <0>[ 963.828217] {1}[Hardware Error]: version: 3.0
> > <0>[ 963.828218] {1}[Hardware Error]: command: 0x0002, status: 0x0010
> > <0>[ 963.828220] {1}[Hardware Error]: device_id: 0000:01:00.1
> > <0>[ 963.828221] {1}[Hardware Error]: slot: 6
> > <0>[ 963.828222] {1}[Hardware Error]: secondary_bus: 0x00
> > <0>[ 963.828223] {1}[Hardware Error]: vendor_id: 0x8086, device_id: 0x1563
> > <0>[ 963.828224] {1}[Hardware Error]: class_code: 020000
> > <0>[ 963.828225] {1}[Hardware Error]: aer_uncor_status: 0x00100000, aer_uncor_mask: 0x00018000
> > <0>[ 963.828226] {1}[Hardware Error]: aer_uncor_severity: 0x000ef010
> > <0>[ 963.828227] {1}[Hardware Error]: TLP Header: 40000001 0000000f 90028090 00000000
> [...]
>
> It seems that this is a known bug in the BIOS of several Dell
> PowerEdge models including (in this case) the R540.
>
> A workaround was added to the tg3 driver
> <https://git.kernel.org/linus/e0efe83ed325277bb70f9435d4d9fc70bebdcca8>
> and a similar change was proposed (but not accepted) in the i40e
> driver <https://lore.kernel.org/all/20241227035459.90602-1-[email protected]/>.
> On this system the error log points to a device handled by the ixgbe
> driver, and no workaround has been implemented for that.
>
> Since this issue seems to affect multiple different NIC vendors and
> drivers, would it make more sense to implement this workaround as a
> PCI quirk?
>

I support the idea of a PCI workaround, but who will implement it?

Alex

> Ben.
>
> --
> Ben Hutchings
> Experience is directly proportional to the value of equipment destroyed
> - Carolyn Scheppner

