On Thu, Mar 19, 2026 at 07:13:09PM +0800, Kai-Heng Feng wrote:
> Add support for decoding NVIDIA-specific CPER sections delivered via
> the APEI GHES vendor record notifier chain. NVIDIA hardware generates
> vendor-specific CPER sections containing error signatures and diagnostic
> register dumps. This implementation registers a notifier_block with the
> GHES vendor record notifier and decodes these sections, printing error
> details via dev_info().
> 
> The driver binds to ACPI device NVDA2012, present on NVIDIA server
> platforms. The NVIDIA CPER section contains a fixed header with error
> metadata (signature, error type, severity, socket) followed by
> variable-length register address-value pairs for hardware diagnostics.
> 
> This work is based on libcper [0].
> 
> Example output:
> nvidia-ghes NVDA2012:00: NVIDIA CPER section, error_data_length: 544
> nvidia-ghes NVDA2012:00: signature: CMET-INFO
> nvidia-ghes NVDA2012:00: error_type: 0
> nvidia-ghes NVDA2012:00: error_instance: 0
> nvidia-ghes NVDA2012:00: severity: 3
> nvidia-ghes NVDA2012:00: socket: 0
> nvidia-ghes NVDA2012:00: number_regs: 32
> nvidia-ghes NVDA2012:00: instance_base: 0x0000000000000000
> nvidia-ghes NVDA2012:00: register[0]: address=0x8000000100000000 value=0x0000000100000000

Is there a convenient way to connect NVDA2012:00 with the actual device?  I
assume this is typically a PCIe device?  How would we relate this with PCIe
errors?
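
For reference, the section layout described in the commit log (a fixed
header followed by register address-value pairs) could be sketched as
below.  This is only an illustration: the struct names, the field widths,
and the "reserved" byte are assumptions based on the field names in the
example output and on my reading of libcper, not taken from the patch.
Under those assumptions the header is 32 bytes and each register pair is
16 bytes, which is consistent with the example (32 + 32 * 16 = 544).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout of the NVIDIA CPER section header.  Field names
 * follow the example output; the widths and the reserved byte are
 * assumptions, not copied from the patch.
 */
struct nv_cper_hdr {
	char     signature[16];
	uint16_t error_type;
	uint16_t error_instance;
	uint8_t  severity;
	uint8_t  socket;
	uint8_t  number_regs;
	uint8_t  reserved;
	uint64_t instance_base;
};

/* One register address-value pair following the header (assumed). */
struct nv_cper_reg {
	uint64_t address;
	uint64_t value;
};

/* Bytes needed for a header plus number_regs register pairs. */
static size_t nv_cper_expected_len(unsigned int number_regs)
{
	return sizeof(struct nv_cper_hdr) +
	       (size_t)number_regs * sizeof(struct nv_cper_reg);
}

/* Nonzero if error_data_length can hold the advertised register pairs. */
static int nv_cper_len_ok(size_t error_data_length, unsigned int number_regs)
{
	return error_data_length >= nv_cper_expected_len(number_regs);
}
```

A decoder would want a check along the lines of nv_cper_len_ok() before
walking the register array, so a bogus number_regs from firmware cannot
cause reads past the end of the section.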

Consider adding a cover letter.  Some of these comments apply to the whole
series.

Wrap commit logs to fit in 75 columns.  When indented by "git log", all of
these overflow 80 columns by just a few characters.

Possibly reorder so the acpi/apei patches are together.  I don't think the
NVIDIA record handler depends on the PCI patch.

Typical subject line style in drivers/acpi/apei appears to be:

  ACPI: APEI: GHES: Add ...

> +config ACPI_APEI_NVIDIA_GHES
> +     tristate "NVIDIA GHES vendor record handler"
> +     depends on ACPI_APEI_GHES

Maybe s/ACPI_APEI_NVIDIA_GHES/ACPI_APEI_GHES_NVIDIA/ since there will
likely be more vendor handlers, and they'll sort nicely if the vendor is
at the end.

> +     help
> +       Support for decoding NVIDIA-specific CPER sections delivered via
> +       the APEI GHES vendor record notifier chain. Registers a handler
> +       for the NVIDIA section GUID and logs error signatures, severity,
> +       socket, and diagnostic register address-value pairs.
> +
> +       Enable on NVIDIA server platforms (e.g. DGX, HGX) that expose
> +       ACPI device NVDA2012 in their firmware tables.

Wrap to fit in 80 columns like the rest of this file.

> +++ b/drivers/acpi/apei/nvidia-ghes.c

Maybe rename to "ghes-nvidia.c" so future decoders for other vendors are
grouped?
