Verification:

ubuntu@awrep3:~$ cat /proc/version
Linux version 4.15.0-23-generic (buildd@bos02-arm64-002) (gcc version 7.3.0 (Ubuntu/Linaro 7.3.0-16ubuntu3)) #25-Ubuntu SMP Wed May 23 17:59:52 UTC 2018
ubuntu@awrep3:~$ sudo ras-mc-ctl --errors
No Memory errors.

PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error

No Extlog errors.

No MCE errors.
ubuntu@awrep3:~$ sudo ./einj-aer.sh
Injecting PCI Express Correctable Error
[ 782.454317] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 5
[ 782.454321] {1}[Hardware Error]: It has been corrected by h/w and requires no further action
[ 782.454324] {1}[Hardware Error]: event severity: corrected
[ 782.454329] {1}[Hardware Error]: precise tstamp: 2018-05-25 15:02:13
[ 782.454332] {1}[Hardware Error]: Error 0, type: corrected
[ 782.454335] {1}[Hardware Error]: section_type: PCIe error
[ 782.454337] {1}[Hardware Error]: port_type: 4, root port
[ 782.454340] {1}[Hardware Error]: version: 3.0
[ 782.454342] {1}[Hardware Error]: command: 0x0407, status: 0x0010
[ 782.454345] {1}[Hardware Error]: device_id: 0000:00:00.0
[ 782.454347] {1}[Hardware Error]: slot: 0
[ 782.454349] {1}[Hardware Error]: secondary_bus: 0x01
[ 782.454351] {1}[Hardware Error]: vendor_id: 0x17cb, device_id: 0x0401
[ 782.454354] {1}[Hardware Error]: class_code: 000406
[ 782.454356] {1}[Hardware Error]: bridge: secondary_status: 0x0000, control: 0x0000
[ 782.454398] pcieport 0000:00:00.0: aer_status: 0x00000001, aer_mask: 0x0000e000
[ 782.460780] Receiver Error
[ 782.460784] pcieport 0000:00:00.0: aer_layer=Physical Layer, aer_agent=Receiver ID
ubuntu@awrep3:~$ sudo ras-mc-ctl --errors
No Memory errors.

PCIe AER events:
1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
2 2018-05-25 15:02:37 +0000 Fatal error: Receiver Error

No Extlog errors.

No MCE errors.

** Tags removed: verification-needed-bionic
** Tags added: verification-done-bionic
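In the verification log, the pcieport driver prints "aer_status: 0x00000001, aer_mask: 0x0000e000". These look like raw values of the AER Correctable Error Status and Mask registers. The bash sketch below decodes such a value into the bit names from the PCIe Base Specification's Correctable Error Status register. The function names aer_cor_bit_name and decode_aer_correctable are hypothetical helpers, not part of the kernel, rasdaemon, or the fix for this bug.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the kernel, rasdaemon, or this bug's fix):
# decode a PCIe AER Correctable Error Status or Mask register value, such as
# the aer_status/aer_mask values printed by the pcieport driver.

# Map a bit position to its name in the PCIe Correctable Error Status register.
aer_cor_bit_name() {
    case "$1" in
        0)  echo "Receiver Error" ;;
        6)  echo "Bad TLP" ;;
        7)  echo "Bad DLLP" ;;
        8)  echo "REPLAY_NUM Rollover" ;;
        12) echo "Replay Timer Timeout" ;;
        13) echo "Advisory Non-Fatal Error" ;;
        14) echo "Corrected Internal Error" ;;
        15) echo "Header Log Overflow" ;;
    esac
}

# Print one line per defined bit that is set in the value (hex or decimal input).
decode_aer_correctable() {
    local value=$(( $1 ))
    local bit
    for bit in 0 6 7 8 12 13 14 15; do
        if (( value & (1 << bit) )); then
            aer_cor_bit_name "$bit"
        fi
    done
}
```

For example, decode_aer_correctable 0x00000001 prints "Receiver Error", which matches the error reported after the injection. decode_aer_correctable 0x0000e000 prints "Advisory Non-Fatal Error", "Corrected Internal Error" and "Header Log Overflow", one per line. Those are the masked bit names, so the injected Receiver Error bit was not masked.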
https://bugs.launchpad.net/bugs/1769730

Title:
  Some PCIe errors not surfaced through rasdaemon

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  Fix Committed

Bug description:
  [Impact]
  APEI (ACPI Platform Error Interface) is supposed to report PCIe errors to the AER (Advanced Error Reporting) driver, which surfaces them to userspace. However, we currently report only "recoverable" errors and not errors of other types (e.g. correctable), which hides signs of faulty hardware from the user.

  [Test Case]
  $ sudo apt install rasdaemon
  # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the attached script to inject a correctable PCIe error.
  $ sudo ras-mc-ctl --errors
  # There should be an entry for the injected error, as shown below:
  No Memory errors.

  PCIe AER events:
  1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error

  No Extlog errors.

  No MCE errors.

  [Fix]
  A two-patch upstream fix addresses this issue and cherry-picks cleanly into Ubuntu. The fix stops artificially limiting the PCIe errors passed down to the AER driver to only those that are recoverable.

  [Regression Risk]
  The above test was run on x86 and ARM platforms to mitigate regression risk.
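The [Test Case] relies on an attached einj-aer.sh that is not reproduced in this report. For readers without the attachment, here is a hypothetical bash sketch of one way to inject a PCIe correctable error. It is not the attached script. It assumes the following:

- the kernel's EINJ debugfs files under /sys/kernel/debug/apei/einj (available_error_type, error_type, flags, param4, error_inject) are present;
- the ACPI EINJ error type 0x00000040 (PCI Express Correctable) is available;
- param4 uses the ACPI 5.0 PCIe SBDF layout.

Whether the firmware honors the target SBDF is platform-dependent. The function and script names are illustrative only.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, NOT the einj-aer.sh attached to this bug: inject a PCIe
# correctable error through the Linux APEI EINJ debugfs interface.
# Assumes root, debugfs mounted at /sys/kernel/debug, the einj module loaded,
# and firmware that advertises error type 0x00000040 (PCI Express Correctable).
set -euo pipefail

EINJ_DIR=${EINJ_DIR:-/sys/kernel/debug/apei/einj}
PCIE_CORRECTABLE=0x00000040

# Encode segment/bus/device/function as the EINJ "param4" PCIe SBDF value:
# bits 31:24 segment, 23:16 bus, 15:11 device, 10:8 function, 7:0 reserved.
einj_pcie_sbdf() {
    local seg=$1 bus=$2 dev=$3 fn=$4
    printf '0x%08x\n' $(( (seg << 24) | (bus << 16) | (dev << 11) | (fn << 8) ))
}

# Usage: inject_pcie_correctable SEG BUS DEV FN  (e.g. 0 0 0 0 for 0000:00:00.0)
inject_pcie_correctable() {
    local sbdf
    sbdf=$(einj_pcie_sbdf "$1" "$2" "$3" "$4")
    if ! grep -q "$PCIE_CORRECTABLE" "$EINJ_DIR/available_error_type"; then
        echo "EINJ does not advertise PCI Express Correctable errors" >&2
        return 1
    fi
    echo "$PCIE_CORRECTABLE" > "$EINJ_DIR/error_type"
    echo 0x4 > "$EINJ_DIR/flags"      # flags bit 2: param4 holds a valid PCIe SBDF
    echo "$sbdf" > "$EINJ_DIR/param4"
    echo 1 > "$EINJ_DIR/error_inject" # trigger the injection
    echo "Injected PCI Express Correctable Error at SBDF $sbdf"
}

# Only inject when called with exactly four arguments, e.g.:
#   sudo ./einj-pcie-sketch.sh 0 0 0 0
if [[ $# -eq 4 ]]; then
    inject_pcie_correctable "$@"
fi
```

Run with the arguments 0 0 0 0, the sketch would target the root port 0000:00:00.0 seen in the verification log. Run with no arguments, it only defines the functions, so einj_pcie_sbdf can be checked without touching the hardware.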