** Description changed:

  [Impact]
  APEI (ACPI Platform Error Interface) is supposed to report PCIe errors to the
  AER (Advanced Error Reporting) driver, which surfaces them to userspace.
  However, we currently report only "recoverable" errors and not errors of other
  types (e.g. correctable), which hides signs of faulty hardware from the user.
  
  [Test Case]
  $ sudo apt install rasdaemon
  # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the
  # attached script to inject a correctable PCIe error.
  $ sudo ras-mc-ctl --errors
  # There should be an entry for the injected error, as shown below:
  No Memory errors.
  
  PCIe AER events:
  1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error
  
  No Extlog errors.
  
  No MCE errors.
  
  [Regression Risk]
+ The above test was run on x86 and ARM platforms to mitigate regression risk.

** Attachment added: "einj-aer.sh"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769730/+attachment/5135673/+files/einj-aer.sh

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1769730

Title:
  Some PCIe errors not surfaced through rasdaemon

Status in linux package in Ubuntu:
  In Progress
Status in linux source package in Bionic:
  In Progress

Bug description:
  [Impact]
  APEI (ACPI Platform Error Interface) is supposed to report PCIe errors to the
  AER (Advanced Error Reporting) driver, which surfaces them to userspace.
  However, we currently report only "recoverable" errors and not errors of other
  types (e.g. correctable), which hides signs of faulty hardware from the user.

  [Test Case]
  $ sudo apt install rasdaemon
  # On a system that supports ACPI EINJ (dmesg | grep "ACPI: EINJ"), use the
  # attached script to inject a correctable PCIe error.
  $ sudo ras-mc-ctl --errors
  # There should be an entry for the injected error, as shown below:
  No Memory errors.

  PCIe AER events:
  1 2018-05-07 17:55:46 +0000 Fatal error: Receiver Error

  No Extlog errors.

  No MCE errors.
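
  The manual check above could be scripted roughly as follows. This is only a
  sketch: count_aer_events is a hypothetical helper, not part of rasdaemon, and
  it assumes that "ras-mc-ctl --errors" keeps the layout shown above (a
  "PCIe AER events:" header, one line per event, then a blank line).

```shell
# Hypothetical helper: count the entries listed under "PCIe AER events:" in
# `ras-mc-ctl --errors` output read from stdin. Prints 0 if the section is absent.
count_aer_events() {
    awk '
        /PCIe AER events:/         { in_aer = 1; next }  # header opens the section
        in_aer && /^[[:space:]]*$/ { in_aer = 0 }        # a blank line closes it
        in_aer                     { n++ }               # each remaining line is one event
        END                        { print n + 0 }
    '
}
```

  Usage, after injecting the error: sudo ras-mc-ctl --errors | count_aer_events
  A result of 0 would suggest the injected error was not surfaced.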

  [Regression Risk]
  The above test was run on x86 and ARM platforms to mitigate regression risk.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1769730/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
