[ Adding Fabiano, who reworked all exception models for 7.0, and Nick, who rewrote the Linux side some time ago ]
On 8/15/22 18:20, Matheus Ferst wrote:
Currently, PowerPC interrupts are handled as follows:

1) The CPU_INTERRUPT_HARD bit of cs->interrupt_request gates all interrupts;
2) The bits of env->pending_interrupts identify which particular interrupt is raised;
3) ppc_set_irq can be used to set/clear env->pending_interrupts bits and CPU_INTERRUPT_HARD, but some places access env->pending_interrupts directly;
4) ppc_cpu_exec_interrupt is called by cpu_handle_interrupt when cs->interrupt_request indicates that there is some interrupt pending. This method checks CPU_INTERRUPT_HARD and calls ppc_hw_interrupt. If env->pending_interrupts is zero after this call, CPU_INTERRUPT_HARD will be cleared.
5) ppc_hw_interrupt checks if there is any unmasked interrupt and calls powerpc_excp with the appropriate POWERPC_EXCP_* value. The method will also reset the corresponding bit in env->pending_interrupts for interrupts that clear on delivery.

If all pending interrupts are masked, CPU_INTERRUPT_HARD will be set, but ppc_hw_interrupt will not deliver or clear the interrupt, so CPU_INTERRUPT_HARD will not be reset by ppc_cpu_exec_interrupt. With that, cs->has_work keeps returning true, creating a loop that acquires and releases qemu_mutex_lock_iothread, causing the poor performance reported in [1].

This patch series attempts to rework the PowerPC interrupt code to set CPU_INTERRUPT_HARD only when there are unmasked interrupts. Then cs->has_work can be simplified to a check of CPU_INTERRUPT_HARD, so it also only returns true when at least one interrupt can be delivered.

To achieve that, we are basically following Alex Bennée's suggestion[2] in the original thread: the interrupt masking logic will be factored out of ppc_hw_interrupt in a new method, ppc_pending_interrupts. This method is then used to decide if CPU_INTERRUPT_HARD should be set or cleared after changes to MSR, LPCR, env->pending_interrupts, and power-management instructions.

We used [3] to check for regressions at each patch in this series.
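For readers following along, the idea described above can be pictured with a small stand-alone model. This is only a sketch under simplifying assumptions, not the code from the series: every name starting with sketch_ or SKETCH_ is hypothetical, the bit values are made up, and masking is reduced to a single MSR[EE]-like flag, whereas the real masking logic also depends on LPCR, the exception model and power-management state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; these are not QEMU's real types or values. */
#define SKETCH_INTERRUPT_HARD (1u << 1)

enum { SKETCH_IRQ_EXT = 0, SKETCH_IRQ_DECR = 1 };

typedef struct SketchCPU {
    uint32_t interrupt_request;  /* plays the role of cs->interrupt_request */
    uint32_t pending_interrupts; /* plays the role of env->pending_interrupts */
    bool msr_ee;                 /* toy version of the MSR[EE] enable bit */
} SketchCPU;

/* Return the pending interrupts that could be delivered right now. */
static uint32_t sketch_deliverable(const SketchCPU *cpu)
{
    /* Assumption: in this toy model every source is gated by msr_ee. */
    return cpu->msr_ee ? cpu->pending_interrupts : 0;
}

/* Set the HARD bit only when at least one interrupt is unmasked. */
static void sketch_maybe_interrupt(SketchCPU *cpu)
{
    if (sketch_deliverable(cpu) != 0) {
        cpu->interrupt_request |= SKETCH_INTERRUPT_HARD;
    } else {
        cpu->interrupt_request &= ~SKETCH_INTERRUPT_HARD;
    }
}

/* Single entry point for raising or lowering an interrupt line. */
static void sketch_set_irq(SketchCPU *cpu, unsigned irq, bool level)
{
    assert(irq < 32);
    if (level) {
        cpu->pending_interrupts |= 1u << irq;
    } else {
        cpu->pending_interrupts &= ~(1u << irq);
    }
    sketch_maybe_interrupt(cpu);
}

/* Any change to the mask state re-evaluates the HARD bit as well. */
static void sketch_set_msr_ee(SketchCPU *cpu, bool ee)
{
    cpu->msr_ee = ee;
    sketch_maybe_interrupt(cpu);
}

/* has_work collapses to a check of the HARD bit. */
static bool sketch_has_work(const SketchCPU *cpu)
{
    return (cpu->interrupt_request & SKETCH_INTERRUPT_HARD) != 0;
}
```

In this model, raising SKETCH_IRQ_DECR while msr_ee is false leaves sketch_has_work() false, so a vCPU loop would not keep waking up for an interrupt it cannot take; once sketch_set_msr_ee(cpu, true) runs, the HARD bit is set and the interrupt becomes visible.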
After patch 12, booting a powernv machine with a newer skiboot with "-smp 4" goes from 1m09s to 20.79s.
Wow! PowerNV is really a heavyweight platform, so that's a great improvement.

Did you try KVM guests under PowerNV (L1 under an emulated L0) and KVM under pseries (L2 under an emulated L1)? Try some intensive I/O on an SMP machine, like a large scp transfer.

We should also try the MacOS images.

Thanks,

C.
[1] https://lists.gnu.org/archive/html/qemu-ppc/2022-06/msg00336.html
[2] https://lists.gnu.org/archive/html/qemu-ppc/2022-06/msg00372.html
[3] https://github.com/legoater/qemu-ppc-boot

Matheus Ferst (13):
  target/ppc: define PPC_INTERRUPT_* values directly
  target/ppc: always use ppc_set_irq to set env->pending_interrupts
  target/ppc: move interrupt masking out of ppc_hw_interrupt
  target/ppc: prepare to split ppc_interrupt_pending by excp_model
  target/ppc: create a interrupt masking method for POWER9/POWER10
  target/ppc: remove embedded interrupts from ppc_pending_interrupt_p9
  target/ppc: create a interrupt masking method for POWER8
  target/ppc: remove unused interrupts from ppc_pending_interrupt_p8
  target/ppc: create a interrupt masking method for POWER7
  target/ppc: remove unused interrupts from ppc_pending_interrupt_p7
  target/ppc: remove ppc_store_lpcr from CONFIG_USER_ONLY builds
  target/ppc: introduce ppc_maybe_interrupt
  target/ppc: unify cpu->has_work based on cs->interrupt_request

 hw/ppc/ppc.c             |  17 +-
 hw/ppc/trace-events      |   2 +-
 target/ppc/cpu.c         |   2 +
 target/ppc/cpu.h         |  43 +--
 target/ppc/cpu_init.c    | 212 +------------
 target/ppc/excp_helper.c | 651 ++++++++++++++++++++++++++++++++-------
 target/ppc/helper_regs.c |   2 +
 target/ppc/misc_helper.c |  11 +-
 target/ppc/translate.c   |   8 +-
 9 files changed, 580 insertions(+), 368 deletions(-)