On 13/12/2017 20:59, Alexander Graf wrote:
>
>
> On 13.12.17 20:29, Laurent Vivier wrote:
>> On 13/12/2017 20:19, Alexander Graf wrote:
>>>
>>>
>>> On 02.02.17 06:14, David Gibson wrote:
>>>> From: Laurent Vivier <lviv...@redhat.com>
>>>>
>>>> This is a port to ppc of the i386 commit:
>>>> 00f4d64 kvmclock: clock should count only if vm is running
>>>>
>>>> We remove timebase_post_load function, and use the VM state
>>>> change handler to save and restore the guest_timebase (on stop
>>>> and continue).
>>>>
>>>> We keep timebase_pre_save to reduce the clock difference on
>>>> migration like in:
>>>> 6053a86 kvmclock: reduce kvmclock difference on migration
>>>>
>>>> Time base offset has originally been introduced by commit
>>>> 98a8b52 spapr: Add support for time base offset migration
>>>>
>>>> So while VM is paused, the time is stopped. This allows to have
>>>> the same result with date (based on Time Base Register) and
>>>> hwclock (based on "get-time-of-day" RTAS call).
>>>>
>>>> Moreover in TCG mode, the Time Base is always paused, so this
>>>> patch also adjust the behavior between TCG and KVM.
>>>>
>>>> VM state field "time_of_the_day_ns" is now useless but we keep
>>>> it to be able to migrate to older version of the machine.
>>>>
>>>> As vmstate_ppc_timebase structure (with timebase_pre_save() and
>>>> timebase_post_load() functions) was only used by vmstate_spapr,
>>>> we register the VM state change handler only in ppc_spapr_init().
>>>>
>>>> Signed-off-by: Laurent Vivier <lviv...@redhat.com>
>>>> Signed-off-by: David Gibson <da...@gibson.dropbear.id.au>
>>>
>>> Just a small heads-up: I've been debugging an OpenQA regression lately
>>> where our automated testing regressed with QEMU 2.9. With stock 2.9.1, I
>>> get a failure rate of "weird" effects (probably TB divergence between
>>> vcpus) of ~30%. With this patch reverted it's back to 0%.
>>>
>>> I *think* something here causes the TB offset of multiple threads (I'm
>>> running -smp 2,threads=2) to diverge.
>>>
>>> I'll keep debugging things tomorrow, but I'll be happy to see anyone
>>> else beat me to analyze what is going wrong ;).
>>
>> Don't know if it can be related, but for migration we need:
>
>
> As expected, this did not fix it. I'll keep digging.
>
> My hunch is that we now set VTB on different cores at different times,
> introducing tiny VTB offsets which can lead to negative TB differences
> inside the guest.
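
To make that hunch concrete, here is a toy model in Python. It is only a sketch and is not QEMU code: the function names, the tick values, and the assumption that each vCPU derives its own offset from a separate host TB read are all made up for illustration. It shows how restoring the same saved guest time base at slightly different host times leaves a per-vCPU skew, and how a guest thread hopping between vCPUs could then see a negative TB difference.

```python
# Toy model only: names and numbers are hypothetical, not taken from QEMU.

def resume_offsets(saved_guest_tb, host_tb_at_restore):
    """Return one offset per vCPU so that guest_tb = host_tb + offset.

    host_tb_at_restore holds the host TB value read when each vCPU
    was restored. If the reads happen at different times, the
    offsets differ.
    """
    return [saved_guest_tb - host_tb for host_tb in host_tb_at_restore]


def guest_tb(host_tb_now, offset):
    """Guest-visible time base on a vCPU with the given offset."""
    return host_tb_now + offset


def max_skew(offsets):
    """Largest offset difference between any two vCPUs."""
    return max(offsets) - min(offsets)


# Assumed scenario: vCPU 1 is restored 40 host ticks after vCPU 0.
offsets = resume_offsets(1_000_000, [5_000_000, 5_000_040])

# A guest thread reads TB on vCPU 0, then 10 host ticks later on vCPU 1.
first = guest_tb(6_000_000, offsets[0])
second = guest_tb(6_000_010, offsets[1])
delta = second - first  # negative when the skew exceeds the elapsed ticks
```

In this model the skew is exactly the delay between the two restores, so the later read returns a smaller value whenever fewer host ticks than that have elapsed between the reads. Computing one offset from a single host TB read and applying it to every vCPU gives a skew of zero.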

Did you find where the problem is?
Can I help?

Thanks,
Laurent