Quoting David Gibson (2018-07-26 00:07:46) > On Thu, May 03, 2018 at 11:20:44PM -0500, Michael Roth wrote: > > In some cases (e.g. spapr) we record guest timebase after qmp_stop() > > via a runstate hook so we can restore it on qmp_cont(). If a migration > > occurs in between those events we end up saving it again, this time > > based on the current timebase the guest would be seeing had it been > > running. This has the effect of advancing the guest timebase while > > it is stopped, which is not what the code intends. > > > > Other than simple jumps in time, this has been seen to trigger what > > appear to be RCU-related crashes in recent kernels when the advance > > exceeds rcu_cpu_stall_timeout, and it can be triggered by fairly > > common operations such as `virsh migrate ... --timeout 60`. > > > > Cc: Alexey Kardashevskiy <a...@ozlabs.ru> > > Cc: David Gibson <da...@gibson.dropbear.id.au> > > Cc: Laurent Vivier <lviv...@redhat.com> > > Cc: qemu-...@nongnu.org > > Cc: qemu-sta...@nongnu.org > > Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com> > > Sorry, this fell off my radar for ages, but I've finally had a chance > to look at it properly.
No problem, I had assumed it just wasn't deemed needed based on the discussion. > > I'm not totally convinced this handle all the possible edge cases > correctly, but I am convinced it gives behaviour that's more correct > than we have now. It doesn't introduce changes to the interface or > migration stream that would break things in future, so I've applied it > to ppc-for-3.1. There's a flaw with the patch that Greg pointed out to me previously in that it doesn't just avoid the presave hook when vm_stop is issued manually via monitor, but also when vm_stop is issued by the migration code itself. The latter wasn't intentional. I'm not sure if we currently have a good way to distinguish between the 2 cases to rectify that. As the patch currently stands it is effectively a revert of the Laurent's original patch. FWIW, the RCU-related crashes mentioned in the original commit turned out to be a separate issue that was exacerbated by RCU stall warnings messages (which *would* be less likely with this patch). So this would be more about user experience and spurious log messages than an actual known bug fix. I am still of the opinion that we shouldn't resave the clock if the vm_stop was manually-issued; it's just that the current patch oversteps that goal. > > > --- > > hw/ppc/ppc.c | 12 ++++++++++++ > > target/ppc/cpu-qom.h | 1 + > > 2 files changed, 13 insertions(+) > > > > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c > > index ec4be25f49..ff0a107864 100644 > > --- a/hw/ppc/ppc.c > > +++ b/hw/ppc/ppc.c > > @@ -865,6 +865,15 @@ static void timebase_save(PPCTimebase *tb) > > uint64_t ticks = cpu_get_host_ticks(); > > PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu); > > > > + /* since we generally save timebase just after the guest > > + * has stopped, avoid trying to save it again since we will > > + * end up advancing it by the amount of ticks that have > > + * elapsed in the host since the initial save > > + */ > > + if (tb->saved) { > > + return; > > + } > > + > > if (!first_ppc_cpu->env.tb_env) { > > error_report("No timebase object"); > > return; > > @@ -877,6 +886,7 @@ static void timebase_save(PPCTimebase *tb) > > * there is no need to update it from KVM here > > */ > > tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset; > > + tb->saved = true; > > } > > > > static void timebase_load(PPCTimebase *tb) > > @@ -908,6 +918,8 @@ static void timebase_load(PPCTimebase *tb) > > &pcpu->env.tb_env->tb_offset); > > #endif > > } > > + > > + tb->saved = false; > > } > > > > void cpu_ppc_clock_vm_state_change(void *opaque, int running, > > diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h > > index deaa46a14b..ec2dbcdcae 100644 > > --- a/target/ppc/cpu-qom.h > > +++ b/target/ppc/cpu-qom.h > > @@ -210,6 +210,7 @@ typedef struct PowerPCCPUClass { > > typedef struct PPCTimebase { > > uint64_t guest_timebase; > > int64_t time_of_the_day_ns; > > + bool saved; > > } PPCTimebase; > > > > extern const struct VMStateDescription vmstate_ppc_timebase; > > -- > David Gibson | I'll have my music baroque, and my code > david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ > | _way_ _around_! > http://www.ozlabs.org/~dgibson