Quoting David Gibson (2018-07-26 00:07:46)
> On Thu, May 03, 2018 at 11:20:44PM -0500, Michael Roth wrote:
> > In some cases (e.g. spapr) we record guest timebase after qmp_stop()
> > via a runstate hook so we can restore it on qmp_cont(). If a migration
> > occurs in between those events we end up saving it again, this time
> > based on the current timebase the guest would be seeing had it been
> > running. This has the effect of advancing the guest timebase while
> > it is stopped, which is not what the code intends.
> > 
> > Other than simple jumps in time, this has been seen to trigger what
> > appear to be RCU-related crashes in recent kernels when the advance
> > exceeds rcu_cpu_stall_timeout, and it can be triggered by fairly
> > common operations such as `virsh migrate ... --timeout 60`.
> > 
> > Cc: Alexey Kardashevskiy <a...@ozlabs.ru>
> > Cc: David Gibson <da...@gibson.dropbear.id.au>
> > Cc: Laurent Vivier <lviv...@redhat.com>
> > Cc: qemu-...@nongnu.org
> > Cc: qemu-sta...@nongnu.org
> > Signed-off-by: Michael Roth <mdr...@linux.vnet.ibm.com>
> 
> Sorry, this fell off my radar for ages, but I've finally had a chance
> to look at it properly.

No problem, I had assumed it just wasn't deemed needed based on the
discussion.

> 
> I'm not totally convinced this handle all the possible edge cases
> correctly, but I am convinced it gives behaviour that's more correct
> than we have now.  It doesn't introduce changes to the interface or
> migration stream that would break things in future, so I've applied it
> to ppc-for-3.1.

There's a flaw with the patch that Greg pointed out to me previously
in that it doesn't just avoid the presave hook when vm_stop is issued
manually via monitor, but also when vm_stop is issued by the migration
code itself. The latter wasn't intentional. I'm not sure if we currently
have a good way to distinguish between the 2 cases to rectify that.

As the patch currently stands it is effectively a revert of the
Laurent's original patch.

FWIW, the RCU-related crashes mentioned in the original commit turned
out to be a separate issue that was exacerbated by RCU stall warnings
messages (which *would* be less likely with this patch). So this would
be more about user experience and spurious log messages than an actual
known bug fix. I am still of the opinion that we shouldn't resave the
clock if the vm_stop was manually-issued; it's just that the current
patch oversteps that goal.

> 
> > ---
> >  hw/ppc/ppc.c         | 12 ++++++++++++
> >  target/ppc/cpu-qom.h |  1 +
> >  2 files changed, 13 insertions(+)
> > 
> > diff --git a/hw/ppc/ppc.c b/hw/ppc/ppc.c
> > index ec4be25f49..ff0a107864 100644
> > --- a/hw/ppc/ppc.c
> > +++ b/hw/ppc/ppc.c
> > @@ -865,6 +865,15 @@ static void timebase_save(PPCTimebase *tb)
> >      uint64_t ticks = cpu_get_host_ticks();
> >      PowerPCCPU *first_ppc_cpu = POWERPC_CPU(first_cpu);
> >  
> > +    /* since we generally save timebase just after the guest
> > +     * has stopped, avoid trying to save it again since we will
> > +     * end up advancing it by the amount of ticks that have
> > +     * elapsed in the host since the initial save
> > +     */
> > +    if (tb->saved) {
> > +        return;
> > +    }
> > +
> >      if (!first_ppc_cpu->env.tb_env) {
> >          error_report("No timebase object");
> >          return;
> > @@ -877,6 +886,7 @@ static void timebase_save(PPCTimebase *tb)
> >       * there is no need to update it from KVM here
> >       */
> >      tb->guest_timebase = ticks + first_ppc_cpu->env.tb_env->tb_offset;
> > +    tb->saved = true;
> >  }
> >  
> >  static void timebase_load(PPCTimebase *tb)
> > @@ -908,6 +918,8 @@ static void timebase_load(PPCTimebase *tb)
> >                          &pcpu->env.tb_env->tb_offset);
> >  #endif
> >      }
> > +
> > +    tb->saved = false;
> >  }
> >  
> >  void cpu_ppc_clock_vm_state_change(void *opaque, int running,
> > diff --git a/target/ppc/cpu-qom.h b/target/ppc/cpu-qom.h
> > index deaa46a14b..ec2dbcdcae 100644
> > --- a/target/ppc/cpu-qom.h
> > +++ b/target/ppc/cpu-qom.h
> > @@ -210,6 +210,7 @@ typedef struct PowerPCCPUClass {
> >  typedef struct PPCTimebase {
> >      uint64_t guest_timebase;
> >      int64_t time_of_the_day_ns;
> > +    bool saved;
> >  } PPCTimebase;
> >  
> >  extern const struct VMStateDescription vmstate_ppc_timebase;
> 
> -- 
> David Gibson                    | I'll have my music baroque, and my code
> david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
>                                 | _way_ _around_!
> http://www.ozlabs.org/~dgibson

Reply via email to