Gavin Shan <[email protected]> writes:

> Hi Markus,
>
> On 11/11/25 3:25 PM, Markus Armbruster wrote:
>> Gavin Shan <[email protected]> writes:
>> 
>>> Use error_abort in acpi_ghes_memory_errors() so that the caller needn't
>>> explicitly call abort() on errors. With this change, its return value
>>> isn't needed any more.
>>>
>>> Suggested-by: Igor Mammedov <[email protected]>
>>> Signed-off-by: Gavin Shan <[email protected]>
>>> ---
>>>   hw/acpi/ghes-stub.c    |  6 +++---
>>>   hw/acpi/ghes.c         | 15 ++++-----------
>>>   include/hw/acpi/ghes.h |  5 +++--
>>>   target/arm/kvm.c       | 10 +++-------
>>>   4 files changed, 13 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/hw/acpi/ghes-stub.c b/hw/acpi/ghes-stub.c
>>> index 4faf573aeb..4ef914ffc5 100644
>>> --- a/hw/acpi/ghes-stub.c
>>> +++ b/hw/acpi/ghes-stub.c
>>> @@ -11,10 +11,10 @@
>>>  #include "qemu/osdep.h"
>>>  #include "hw/acpi/ghes.h"
>>>  
>>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> -                            uint64_t *addresses, uint32_t num_of_addresses)
>>> +void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> +                             uint64_t *addresses, uint32_t 
>>> num_of_addresses,
>>> +                             Error **errp)
>>>  {
>>> -    return -1;
>>>  }
>> 
>> Before the patch, this function always fails: it returns -1.
>> 
>> Afterwards, it always succeeds: it doesn't set @errp.
>> 
>> Which one is correct?
>> 
>
> Both are correct. This variant is only used on !CONFIG_ACPI_APEI. In this 
> case,
> there is no chance to call this variant in the only caller 
> kvm_arch_on_sigbus_vcpu().
> acpi_ghes_get_state() returns NULL on !CONFIG_ACPI_APEI there.

If this is not supposed to be called, have it abort() to make it
obvious.

>>>  
>>>  AcpiGhesState *acpi_ghes_get_state(void)
>>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
>>> index 055e5d719a..aa469c03f2 100644
>>> --- a/hw/acpi/ghes.c
>>> +++ b/hw/acpi/ghes.c
>>> @@ -543,8 +543,9 @@ void ghes_record_cper_errors(AcpiGhesState *ags, const 
>>> void *cper, size_t len,
>>>      notifier_list_notify(&acpi_generic_error_notifiers, &source_id);
>>>  }
>>>  
>>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> -                            uint64_t *addresses, uint32_t num_of_addresses)
>>> +void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> +                             uint64_t *addresses, uint32_t 
>>> num_of_addresses,
>>> +                             Error **errp)
>> 
>> qapi/error.h:
>> 
>>   * - Whenever practical, also return a value that indicates success /
>>   *   failure.  This can make the error checking more concise, and can
>>   *   avoid useless error object creation and destruction.  Note that
>>   *   we still have many functions returning void.  We recommend
>>   *   • bool-valued functions return true on success / false on failure,
>>   *   • pointer-valued functions return non-null / null pointer, and
>>   *   • integer-valued functions return non-negative / negative.
>> 
>
> Question: If it's deterministic that caller passes @error_abort or 
> @error_fatal
> to acpi_ghes_memory_errors(), QEMU crashes with a core dump or exit before its
> caller to check the return value. In this case, it's still preferred for
> acpi_ghes_memory_errors() returns a value to indicate the success or failure?

Yes, to separate concerns: acpi_ghes_memory_errors() remains oblivious
of how it is called.

>>>  {
>>>      /* Memory Error Section Type */
>>>      const uint8_t guid[] =
>>> @@ -555,7 +556,6 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, 
>>> uint16_t source_id,
>>>       * Table 17-13 Generic Error Data Entry
>>>       */
>>>      QemuUUID fru_id = {};
>>> -    Error *errp = NULL;
>>>      int data_length;
>>>      GArray *block;
>>>      uint32_t block_status, i;
>>> @@ -592,16 +592,9 @@ int acpi_ghes_memory_errors(AcpiGhesState *ags, 
>>> uint16_t source_id,
>>>      }
>>>  
>>>      /* Report the error */
>>> -    ghes_record_cper_errors(ags, block->data, block->len, source_id, 
>>> &errp);
>>> +    ghes_record_cper_errors(ags, block->data, block->len, source_id, errp);
>>>  
>>>      g_array_free(block, true);
>>> -
>>> -    if (errp) {
>>> -        error_report_err(errp);
>>> -        return -1;
>>> -    }
>>> -
>>> -    return 0;
>>>  }
>> 
>> The error reporting moves into the caller.
>> 
>
> Similar question as above. If it's deterministic for the caller passes 
> @error_abort
> or @error_fatal to acpi_ghes_memory_errors() and then to 
> ghes_record_cper_errors(),
> QEMU crashes with a core dump or exit before error_report_err(errp) can be 
> executed.
> In this case, it's still preferred to have error_report_err(&error_abort) or
> error_report_err(&error_fatal) in its caller?

Yes.

>>>  
>>>  AcpiGhesState *acpi_ghes_get_state(void)
>>> diff --git a/include/hw/acpi/ghes.h b/include/hw/acpi/ghes.h
>>> index f73908985d..35c7bbbb01 100644
>>> --- a/include/hw/acpi/ghes.h
>>> +++ b/include/hw/acpi/ghes.h
>>> @@ -98,8 +98,9 @@ void acpi_build_hest(AcpiGhesState *ags, GArray 
>>> *table_data,
>>>                       const char *oem_id, const char *oem_table_id);
>>>  void acpi_ghes_add_fw_cfg(AcpiGhesState *vms, FWCfgState *s,
>>>                            GArray *hardware_errors);
>>> -int acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> -                            uint64_t *addresses, uint32_t 
>>> num_of_addresses);
>>> +void acpi_ghes_memory_errors(AcpiGhesState *ags, uint16_t source_id,
>>> +                             uint64_t *addresses, uint32_t 
>>> num_of_addresses,
>>> +                             Error **errp);
>>>  void ghes_record_cper_errors(AcpiGhesState *ags, const void *cper, size_t 
>>> len,
>>>                               uint16_t source_id, Error **errp);
>>>  
>>> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
>>> index 459ca4a9b0..a889315606 100644
>>> --- a/target/arm/kvm.c
>>> +++ b/target/arm/kvm.c
>>> @@ -2458,13 +2458,9 @@ void kvm_arch_on_sigbus_vcpu(CPUState *c, int code, 
>>> void *addr)
>>>              addresses[0] = paddr;
>>>              if (code == BUS_MCEERR_AR) {
>>>                  kvm_cpu_synchronize_state(c);
>>> -                if (!acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>>> -                                             addresses, 1)) {
>>> -                    kvm_inject_arm_sea(c);
>>> -                } else {
>>> -                    error_report("failed to record the error");
>>> -                    abort();
>>> -                }
>>> +                acpi_ghes_memory_errors(ags, ACPI_HEST_SRC_ID_SYNC,
>>> +                                        addresses, 1, &error_abort);
>>> +                kvm_inject_arm_sea(c);
>> 
>> Before the patch, we get two error reports, like this:
>> 
>>      qemu-system-FOO: OSPM does not acknowledge previous error, so can not 
>> record CPER for current error anymore
>>      qemu-system-FOO: failed to record the error
>>      Aborted (core dumped)
>> 
>> Such error cascades should be avoided.
>> 
>> Afterwards, we get one:
>> 
>>      Unexpected error at SOURCE-FILE:LINE-NUMBER:
>>      qemu-system-FOO: OSPM does not acknowledge previous error, so can not 
>> record CPER for current error anymore
>>      Aborted (core dumped)
>> 
>> Are all errors reported by acpi_ghes_memory_errors() programming errors,
>> i.e. when an error is reported, there's a bug for us to fix?
>> 
>> If not, abort() is wrong before the patch, and &error_abort is wrong
>> afterwards.
>> 
>> You can use &error_fatal for fatal errors that are not programming
>> errors.
>
> No, there is no programming errors and the core dump is actually no sense.

Confirms my suspicion :)

> It makes more sense for the caller to pass @error_fatal down to 
> acpi_ghes_memory_errors().

Makes sense.

>>>              }
>>>              return;
>>>          }
>> 
>
> Thanks,
> Gavin


Reply via email to