On Tue, Nov 18, 2025 at 10:47:55AM +0000, Jonathan Cameron wrote:
> On Thu, 13 Nov 2025 03:25:27 +1000
> Gavin Shan <[email protected]> wrote:
>
> > In the combination of 64KiB host and 4KiB guest, a problematic host
> > page affects 16x guest pages. Those 16x guest pages are most likely
> > owned by separate threads and accessed by the threads in parallel.
> > It means 16x memory errors can be raised at once. However, we're
> > unable to handle this situation because the only error source has
> > one read acknowledgement register in current design. QEMU has to
> > crash in the following path due to the previously delivered error
> > isn't acknowledged by the guest on attempt to deliver another error.
> >
> >   kvm_vcpu_thread_fn
> >   kvm_cpu_exec
> >   kvm_arch_on_sigbus_vcpu
> >   kvm_cpu_synchronize_state
> >   acpi_ghes_memory_errors
> >   abort
> >
> > This series fixes the issue by sending 16x consective CPER errors
> > which are contained in a single GHES error block.
> >
> > PATCH[1-4] Increases GHES raw data maximal length from 1KiB to 4KiB
> > PATCH[5]   Supports multiple error records in a single error block
> > PATCH[6-7] Improves the error handling in the error delivery path
> > PATCH[8]   Sends 16x consective CPERs in a single block if needed
> >
>
> Hi Gavin,
>
> Just a quick head's up to say we've had some internal discussions around the
> kernel handling of broader address masks in CPER and think it is probably
> broken. Rectifying that may at least simplify what is needed on the QEMU side
> of things and maybe even handle much larger blocks (2M and larger).

Btw, I just added logic to rasdaemon to catch SIGBUS errors:

	https://github.com/mchehab/rasdaemon/pull/199

But so far, I haven't found a proper way to test that code.

Jonathan/Gavin,

Do you know a good way for us to check how the mm SEA notification is
handled with QEMU?

Regards,
Mauro
