Em Tue, 6 Aug 2024 16:31:13 +0200 Igor Mammedov <imamm...@redhat.com> escreveu:
> > + /* Could also be read back from the error_block_address register */ > > + *error_block_addr = base + > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + > > + ACPI_GHES_ERROR_SOURCE_COUNT * sizeof(uint64_t) + > > + error_source_to_index[notify] * ACPI_GHES_MAX_RAW_DATA_LENGTH; > > + > > + return true; > > +} > > I don't like all this pointer math, which is basically a reverse engineered > QEMU actions on startup + guest provided etc/hardware_errors address. > > For once, it assumes error_source_to_index[] matches order in which HEST > error sources were described, which is fragile. > > 2nd: migration-wive it's disaster, since old/new HEST/hardware_errors tables > in RAM migrated from older version might not match above assumptions > of target QEMU. > > I see 2 ways to rectify it: > 1st: preferred/cleanest would be to tell QEMU (via fw_cfg) address of HEST > table > in guest RAM, like we do with etc/hardware_errors, see > build_ghes_error_table() > ... > tell firmware to write hardware_errors GPA into > and then fetch from HEST table in RAM, the guest patched error/ack > addresses > for given source_id > > code-wise: relatively simple once one wraps their own head over > how this whole APEI thing works in QEMU > workflow is described in docs/specs/acpi_hest_ghes.rst > look to me as sufficient to grasp it. > (but my view is very biased given my prior knowledge, > aka: docs/comments/examples wrt acpi patching are good > enough) > (if it's not clear how to do it, ask me for pointers) That sounds a better approach, however... > 2nd: sort of hack based on build_ghes_v2() Error Status Address/Read Ack > Register > patching instructions > bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE, > > address_offset + GAS_ADDR_OFFSET, sizeof(uint64_t), > > ACPI_GHES_ERRORS_FW_CFG_FILE, source_id * > sizeof(uint64_t)); > ^^^^^^^^^^^^^^^^^^^^^^^^^ > during build_ghes_v2() also store on a side mapping > source_id -> error address offset : read ack address > > so when you are injecting error, you'd at least use offsets > used at start time, to get rid of risk where injection code > diverge from HEST:etc/hardware_errors layout at start time. > > However to make migration safe, one would need to add a fat > comment not to change order ghest error sources in HEST _and_ > a dedicated unit test to make sure we catch it when that happens. > bios_tables_test should be able to catch the change, but it won't > say what's wrong, hence a test case that explicitly checks order > and loudly & clear complains when we will break order assumptions. > > downside: > * we are are limiting ways HEST could be composed/reshuffled in > future > * consumption of extra CI resources > * and well, it relies on above duct tape holding all pieces > together I ended opting to do approach (2) on this changeset, as the current code is already using bios_linker_loader_add_pointer() for ghes, being deeply relying on the block address/ack and cper calculus. To avoid troubles on this duct tape, I opted to move all offset math to a single function at ghes.c: /* * ID numbers used to fill HEST source ID field */ enum AcpiHestSourceId { ACPI_HEST_SRC_ID_SEA, ACPI_HEST_SRC_ID_GED, /* Shall be the last one */ ACPI_HEST_SRC_ID_COUNT } AcpiHestSourceId; ... static bool acpi_hest_address_offset(enum AcpiGhesNotifyType notify, uint64_t *error_block_offset, uint64_t *ack_offset, uint64_t *cper_offset, enum AcpiHestSourceId *source_id) { enum AcpiHestSourceId source; uint64_t offset; switch (notify) { case ACPI_GHES_NOTIFY_SEA: /* Only on ARMv8 */ source = ACPI_HEST_SRC_ID_SEA; break; case ACPI_GHES_NOTIFY_GPIO: source = ACPI_HEST_SRC_ID_GED; break; default: return true; } if (source_id) { *source_id = source; } /* * Please see docs/specs/acpi_hest_ghes.rst for the memory layout. * In summary, memory starts with error addresses, then acks and * finally CPER blocks. */ offset = source * sizeof(uint64_t); if (error_block_offset) { *error_block_offset = offset; } if (ack_offset) { *ack_offset = offset + ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t); } if (cper_offset) { *cper_offset = 2 * ACPI_HEST_SRC_ID_COUNT * sizeof(uint64_t) + source * ACPI_GHES_MAX_RAW_DATA_LENGTH; } return false; } I also removed the anonymous enum with SEA/GPIO source IDs, using only the ACPI notify type as arguments at the function calls. As there's now a single point where the offsets from docs/specs/acpi_hest_ghes.rst are enforced, this should be error prone. The code could later be changed to use approach (2), on a separate cleanup. Thanks, Mauro