On Fri, Oct 06, 2017 at 11:15:31PM +0200, Cédric Le Goater wrote:
> On 10/06/2017 11:07 AM, David Gibson wrote:
> > On Thu, Oct 05, 2017 at 06:49:58PM +0200, Cédric Le Goater wrote:
> >> When a CPU is stopped with the 'stop-self' RTAS call, its state
> >> 'halted' is switched to 1 and, in this case, the MSR is not taken into
> >> account anymore in the cpu_has_work() routine. Only the pending
> >> hardware interrupts are checked with their LPCR:PECE* enablement bit.
> >>
> >> If the DECR timer fires after 'stop-self' is called and before the CPU
> >> 'stop' state is reached, the nearly-dead CPU will have some work to do
> >> and the guest will crash. This case happens very frequently with the
> >> not yet upstream P9 XIVE exploitation mode. In XICS mode, the DECR is
> >> occasionally fired but after 'stop' state, so no work is to be done
> >> and the guest survives.
> >>
> >> I suspect there is a race between the QEMU mainloop triggering the
> >> timers and the TCG CPU thread but I could not quite identify the root
> >> cause. To be safe, let's disable the decrementer interrupt in the LPCR
> >> when the CPU is halted and reenable it when the CPU is restarted.
> >>
> >> Signed-off-by: Cédric Le Goater <c...@kaod.org>
> >> ---
> >>  hw/ppc/spapr_rtas.c | 16 ++++++++++++++++
> >>  1 file changed, 16 insertions(+)
> >>
> >> diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
> >> index cdf0b607a0a0..2389220c9738 100644
> >> --- a/hw/ppc/spapr_rtas.c
> >> +++ b/hw/ppc/spapr_rtas.c
> >> @@ -174,6 +174,15 @@ static void rtas_start_cpu(PowerPCCPU *cpu_,
> >> sPAPRMachineState *spapr,
> >>      kvm_cpu_synchronize_state(cs);
> >>
> >>      env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
> >> +
> >> +    /* Enable DECR interrupt */
> >> +    if (env->mmu_model == POWERPC_MMU_3_00) {
> >
> > Hm.  Checking mmu_model doesn't seem right to me.  I mean, it'll get
> > the right answer in practice, but the LPCR programming has nothing
> > whatsoever to do with the MMU.
> >
> > I think explicitly checking if cpu_ is a POWER9 instance with
> > object_dynamic_cast would be a better option.
>
> OK. So I guess we should change the switch statement in cpu_ppc_set_papr()
> also.

Yeah, I guess so.  No rush.

--
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson
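
For anyone reading the thread later, here is a minimal standalone sketch of the behaviour the commit message describes: a halted CPU ignores the MSR and only counts pending interrupts whose LPCR enable bit is set, so clearing the decrementer enable on 'stop-self' keeps a late DECR from waking the CPU. This is not QEMU code. Every name and bit value below (struct sketch_cpu, the SKETCH_* macros, the sketch_* functions) is hypothetical and chosen for illustration only, and the non-halted branch is deliberately simplified.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration; NOT the ones QEMU uses. */
#define SKETCH_IRQ_DECR  (1u << 0)    /* decrementer interrupt pending */
#define SKETCH_IRQ_EXT   (1u << 1)    /* external interrupt pending */
#define SKETCH_LPCR_DEE  (1ull << 0)  /* decrementer may wake a halted CPU */
#define SKETCH_LPCR_EEE  (1ull << 1)  /* external irq may wake a halted CPU */

/* Hypothetical, heavily reduced CPU state. */
struct sketch_cpu {
    bool halted;       /* set by 'stop-self' */
    bool msr_ee;       /* MSR external-interrupt enable */
    uint32_t pending;  /* mask of pending hardware interrupts */
    uint64_t lpcr;     /* wake-up enable bits */
};

/* Model of the check described in the commit message. */
static bool sketch_cpu_has_work(const struct sketch_cpu *cpu)
{
    if (cpu->halted) {
        /* The MSR is ignored here; only enabled pending interrupts count. */
        if ((cpu->pending & SKETCH_IRQ_DECR) && (cpu->lpcr & SKETCH_LPCR_DEE)) {
            return true;
        }
        if ((cpu->pending & SKETCH_IRQ_EXT) && (cpu->lpcr & SKETCH_LPCR_EEE)) {
            return true;
        }
        return false;
    }
    /* Simplified running case: any pending interrupt with MSR[EE] set. */
    return cpu->msr_ee && cpu->pending != 0;
}

/* Workaround from the patch: mask the decrementer before halting. */
static void sketch_stop_self(struct sketch_cpu *cpu)
{
    cpu->lpcr &= ~SKETCH_LPCR_DEE;
    cpu->halted = true;
}

/* Counterpart on restart: re-enable the decrementer. */
static void sketch_start_cpu(struct sketch_cpu *cpu)
{
    cpu->lpcr |= SKETCH_LPCR_DEE;
    cpu->halted = false;
}
```

With this model, a decrementer that fires after sketch_stop_self() no longer gives the CPU work, while an external interrupt still would, and sketch_start_cpu() restores the original behaviour. The sketch does not address the separate review point about choosing between mmu_model and object_dynamic_cast for the POWER9 check.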