[PATCH v2 5/5] Documentation/ABI: Add details of PCI AER statistics

2018-05-23 Thread Rajat Jain
Add the PCI AER statistics details to
Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
and provide a pointer to it in
Documentation/PCI/pcieaer-howto.txt

Signed-off-by: Rajat Jain 
---
v2: Move the documentation to Documentation/ABI/

 .../testing/sysfs-bus-pci-devices-aer_stats   | 103 ++
 Documentation/PCI/pcieaer-howto.txt   |   5 +
 2 files changed, 108 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats

diff --git a/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
new file mode 100644
index ..f55c389290ac
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats
@@ -0,0 +1,103 @@
+==========================
+PCIe Device AER statistics
+==========================
+These attributes show up under all the devices that are AER capable. These
+statistical counters indicate the errors "as seen/reported by the device".
+Note that this may mean that if an end point is causing problems, the AER
+counters may increment at its link partner (e.g. root port) because the
+errors will be "seen" / reported by the link partner and not the
+problematic end point itself (which may report all counters as 0 as it never
+saw any problems).
+
+Where: /sys/bus/pci/devices//aer_stats/dev_total_cor_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of correctable errors seen and reported by this
+   PCI device using ERR_COR.
+
+Where: /sys/bus/pci/devices//aer_stats/dev_total_fatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of uncorrectable fatal errors seen and reported
+   by this PCI device using ERR_FATAL.
+
+Where: /sys/bus/pci/devices//aer_stats/dev_total_nonfatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of uncorrectable non-fatal errors seen and reported
+   by this PCI device using ERR_NONFATAL.
+
+Where: /sys/bus/pci/devices//aer_stats/dev_breakdown_correctable
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Breakdown of correctable errors seen and reported by this
+   PCI device using ERR_COR. A sample result looks like this:
+-
+Receiver Error = 0x174
+Bad TLP = 0x19
+Bad DLLP = 0x3
+RELAY_NUM Rollover = 0x0
+Replay Timer Timeout = 0x1
+Advisory Non-Fatal = 0x0
+Corrected Internal Error = 0x0
+Header Log Overflow = 0x0
+-
+
+Where: /sys/bus/pci/devices//aer_stats/dev_breakdown_uncorrectable
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Breakdown of uncorrectable errors seen and reported by this
+   PCI device using ERR_FATAL or ERR_NONFATAL. A sample result
+   looks like this:
+-
+Undefined = 0x0
+Data Link Protocol = 0x0
+Surprise Down Error = 0x0
+Poisoned TLP = 0x0
+Flow Control Protocol = 0x0
+Completion Timeout = 0x0
+Completer Abort = 0x0
+Unexpected Completion = 0x0
+Receiver Overflow = 0x0
+Malformed TLP = 0x0
+ECRC = 0x0
+Unsupported Request = 0x0
+ACS Violation = 0x0
+Uncorrectable Internal Error = 0x0
+MC Blocked TLP = 0x0
+AtomicOp Egress Blocked = 0x0
+TLP Prefix Blocked Error = 0x0
+-
+
+============================
+PCIe Rootport AER statistics
+============================
+These attributes show up only under the rootports that are AER capable. These
+indicate the number of error messages as "reported to" the rootport. Please
+note that the rootports also transmit (internally) the ERR_* messages for
+errors seen by the internal rootport PCI device, so these counters include
+them and are thus cumulative of all the error messages on the PCI hierarchy
+originating at that root port.
+
+Where: /sys/bus/pci/devices//aer_stats/rootport_total_cor_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of ERR_COR messages reported to rootport.
+
+Where: /sys/bus/pci/devices//aer_stats/rootport_total_fatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of ERR_FATAL messages reported to rootport.
+
+Where: /sys/bus/pci/devices//aer_stats/rootport_total_nonfatal_errs
+Date:  May 2018
+Kernel Version: 4.17.0
+Contact:   linux-...@vger.kernel.org, raja...@google.com
+Description:   Total number of ERR_NONFATAL messages reported to rootport.
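
For readers who want to consume the dev_breakdown_* attributes from user
space, here is a minimal Python sketch. It is not part of the patch. It
assumes each line has the "Name = 0xN" layout of the sample output quoted
above, which may not match the format that is finally merged. The helper
names parse_breakdown and nonzero are hypothetical, and the sample text is
copied from the dev_breakdown_correctable example rather than read from a
real device.

```python
import re

# Assumption: every data line looks like "<error name> = 0x<hex count>",
# as in the sample output quoted in this patch.
LINE_RE = re.compile(r"^\s*(?P<name>.+?)\s*=\s*(?P<count>0x[0-9a-fA-F]+)\s*$")


def parse_breakdown(text):
    """Return {error name: count} for the contents of a dev_breakdown_* file."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:  # silently skip blank or separator lines
            counts[match.group("name")] = int(match.group("count"), 16)
    return counts


def nonzero(counts):
    """Keep only the error types whose counter is not zero."""
    return {name: n for name, n in counts.items() if n}


# Sample text copied from the dev_breakdown_correctable example above.
SAMPLE = """\
Receiver Error = 0x174
Bad TLP = 0x19
Bad DLLP = 0x3
RELAY_NUM Rollover = 0x0
Replay Timer Timeout = 0x1
Advisory Non-Fatal = 0x0
Corrected Internal Error = 0x0
Header Log Overflow = 0x0
"""

if __name__ == "__main__":
    for name, n in nonzero(parse_breakdown(SAMPLE)).items():
        print(f"{name}: {n}")
```

On a real system the text would come from reading the attribute file under
the device's aer_stats directory instead of SAMPLE. Parsing by pattern rather
than by fixed line position keeps the sketch tolerant of error types being
added or reordered.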