On Wed,  5 Nov 2025 21:44:47 +1000
Gavin Shan <[email protected]> wrote:

> The current GHES raw data maximal length isn't enough for 16 consecutive
> CPER errors, which will be sent to a guest with 4KiB page size on a
> erroneous 64KiB host page. Note those 16 CPER errors will be contained
> in one single error block, meaning all CPER errors should be identical
> in terms of type and severity and all of them should be delivered in
> one shot.
> 
> Increase GHES raw data maximal length from 1KiB to 4KiB so that the
> error block has enough storage space for 16 consecutive CPER errors.
> 
> Signed-off-by: Gavin Shan <[email protected]>
> ---
>  docs/specs/acpi_hest_ghes.rst | 2 +-
>  hw/acpi/ghes.c                | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/specs/acpi_hest_ghes.rst b/docs/specs/acpi_hest_ghes.rst
> index aaf7b1ad11..acf31d6eeb 100644
> --- a/docs/specs/acpi_hest_ghes.rst
> +++ b/docs/specs/acpi_hest_ghes.rst
> @@ -68,7 +68,7 @@ Design Details
>      and N Read Ack Register entries. The size for each entry is 8-byte.
>      The Error Status Data Block table contains N Error Status Data Block
>      entries. The size for each entry is defined at the source code as
> -    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 1024 bytes). The total size
> +    ACPI_GHES_MAX_RAW_DATA_LENGTH (currently 4096 bytes). The total size

is it safe to bump without compat glue?

consider VM migrated from old QEMU to new one,
it will have  etc/hardware_errors allocated with 1K GESB,
and more importantly error_block_addressN will have 1K offsets as well

however with ACPI_GHES_MAX_RAW_DATA_LENGTH all length checks will
let >1K blocks to be written into into 1K 'formated' etc/hardware_errors.

Thanks to previous refactoring we get all addresses right (1K version),
but if you write large GESB there it will either overlap with the next GESB
or a smaller GESB might overwrite tail of preceding large one.
And in works case it's OOB when writing large GESB in the last block.

Given we have to write GESB successfully or abort, there is no point
in adding compat knobs. But we still need to check if GEBS will fit into
whatever block size etc/hardware_errors inside guest RAM is laid out originally.

>      for the "etc/hardware_errors" fw_cfg blob is
>      (N * 8 * 2 + N * ACPI_GHES_MAX_RAW_DATA_LENGTH) bytes.
>      N is the number of the kinds of hardware error sources.
> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> index 06555905ce..a9c08e73c0 100644
> --- a/hw/acpi/ghes.c
> +++ b/hw/acpi/ghes.c
> @@ -33,7 +33,7 @@
>  #define ACPI_HEST_ADDR_FW_CFG_FILE          "etc/acpi_table_hest_addr"
>  
>  /* The max size in bytes for one error block */
> -#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (1 * KiB)
> +#define ACPI_GHES_MAX_RAW_DATA_LENGTH   (4 * KiB)
>  
>  /* Generic Hardware Error Source version 2 */
>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10


Reply via email to