On Mon, Dec 16, 2019 at 06:06:30PM +0000, Peter Maydell wrote:
> On Mon, 16 Dec 2019 at 16:44, Andrew Jones <drjo...@redhat.com> wrote:
> >
> > On Mon, Dec 16, 2019 at 03:40:16PM +0000, Peter Maydell wrote:
> > > On Mon, 16 Dec 2019 at 15:14, Peter Maydell
> > > <peter.mayd...@linaro.org> wrote:
> > > > How does this interact with the usual register sync to/from
> > > > KVM (ie kvm_arch_get_registers(), which I think will do a
> > > > GET_ONE_REG read of the TIMER_CNT register the way it does
> > > > any other sysreg, inside write_kvmstate_to_list(), plus
> > > > kvm_arch_set_registers() which does the write back to the
> > > > kernel in write_list_to_kvmstate()) ? Presumably we want this
> > > > version to take precedence by the set_virtual_time call
> > > > happening after the kvm_arch_set_registers, but is this
> > > > guaranteed ?
> > >
> > > ...you might also want to look at the effects of simply
> > > removing the KVM_REG_ARM_TIMER_CNT entry from the
> > > 'non_runtime_cpregs[]' array -- in commit 4b7a6bf402bd064
> > > we explicitly stopped reading/writing this register's value
> > > to/from the kernel except for inbound migration, and it
> > > feels like this patchset is now rolling back that approach,
> > > so maybe we should also be (configurably) rolling back some
> > > of its implementation rather than just leaving it in place.
> >
> > I feel like I already considered that, maybe even tried it, a few
> > months ago when I first looked at this. I must have decided against
> > it for some reason at the time, but I don't recall what. Now I can
> > say the reason is because we only do this save/restore when we
> > transition to/from paused state, though.
>
> I found the thread which discussed the bug which originally
> caused us to add commit 4b7a6bf402bd064:
> https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015665.html
> -- there are some codepaths which cause us to do a sync from/to
> KVM for one VCPU while others are still running. If we do a
> read-CNT-and-write-back then we effectively cause time to jump
> backwards for the other still-running CPUs.
>
> So we do still want to have TIMER_CNT listed as being
> KVM_PUT_FULL_STATE regardless, or we re-introduce that bug.
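Agreed. For reference, the filtering that commit put in place looks
roughly like this in target/arm/kvm.c (a sketch from memory, so the
details may not match the current tree exactly):

    /* Registers synced to KVM only on full-state writes (sketch) */
    static const struct {
        uint64_t id;
        int level;
    } non_runtime_cpregs[] = {
        { KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
    };

    static int kvm_arm_cpreg_level(uint64_t regidx)
    {
        int i;

        for (i = 0; i < ARRAY_SIZE(non_runtime_cpregs); i++) {
            if (non_runtime_cpregs[i].id == regidx) {
                return non_runtime_cpregs[i].level;
            }
        }
        return KVM_PUT_RUNTIME_STATE;
    }

    /* ...and write_list_to_kvmstate(cpu, level) then does: */
    if (kvm_arm_cpreg_level(regidx) > level) {
        continue; /* skip TIMER_CNT unless this is a full-state put */
    }

Anything not listed defaults to KVM_PUT_RUNTIME_STATE, so keeping the
TIMER_CNT entry means it's only written back to the kernel on a
full-state put.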
Thanks for digging that up. I now recall also having read that history
back when I first discovered KVM_REG_ARM_TIMER_CNT was special.

> Your approach in this patchset reads and writes on vm-paused,
> so it won't have the pre-2015 problems.
>
> It still feels odd that we're storing this bit of guest state
> in two places now though -- in kvm_vtime, and also in its usual
> place in the cpreg_array data structures (we write back the
> value from kvm_vtime when the VM starts running, and we write
> back the value from the cpreg_array for a PUT_FULL_STATE, which
> the comments claim is only on startup or when we just loaded
> migration state (and also undocumentedly but reasonably on
> cpu-hotplug, which arm doesn't have yet)).
>
> I've just spent a little while digging through code, and
> haven't been able to satisfy myself on the ordering of which
> writeback wins: for a loadvm I think we first do a
> cpu_synchronize_all_post_init() (writing back the counter
> value from the migration data) and then after that we will
> unpause the VM -- why doesn't this overwrite the correct
> value with the wrong value from kvm_vtime ?

Hmm... I think I may have gotten lost when I went through this before.
I just went through it again, and still won't claim that I'm not a bit
lost, but it does appear I got it backwards. When I get a chance I'll
try to test this properly.

We could use the same location as normal, in the cpreg_array. I'd just
need to add a search of cpreg_indexes[] in order to get the index
needed for cpreg_values[] (rough sketch in the P.S. below).

> I just noticed also that the logic used in this patch
> doesn't match what other architectures do in their vm_state_change
> function -- eg cpu_ppc_clock_vm_state_change() has an
> "if (running) { load } else { save }", and kvmclock_vm_state_change()
> for i386 also has "if (running) { ... } else { ... }", though
> it has an extra wrinkle where it captures "are we PAUSED?"
> to use in the pre_save function; the comment above
> kvmclock_pre_save() suggests maybe that would be useful for other
> than x86, too. kvm_s390_tod_vm_state_change() has
> logic that's a slightly more complicated variation on just
> testing the 'running' flag, but it doesn't look at the
> specific new state.

Yes, originally I had just "if (running) {} else {}", but after looking
at https://lists.gnu.org/archive/html/qemu-devel/2019-03/msg03695.html
and seeing that the other architectures were careful to track the "are
we paused" state, I got the feeling that we should be more specific and
changed it to "if (running) {} else if (paused) {}". That's probably
wrong, though, if we want to track all vm-stopped time.

> > > I note also that the commit message there had a remark
> > > about inconsistencies between VCPUs -- is the right thing
> > > to handle this per-VM rather than per-VCPU somehow?
> >
> > per-VM would make sense, because the counters should be synchronized
> > among the VCPUs. KVM does that for us, though, so whichever VCPU
> > last restores its counter is the one that will determine the final
> > value.
> >
> > Maybe we should have a VM ioctl instead, but ATM we only have VCPU
> > ioctls.
>
> I meant more "only do the save/load once per VM in QEMU but
> do it by working with just one VCPU". But I guess since migration
> works on all the VCPUs we're ok to do pause-resume the same way
> (and I've now tracked down what the 'inconsistencies between VCPUs'
> were: they're when we were syncing the CNT value for one vCPU when
> others were still running.)
>
> thanks
> -- PMM

Thanks,
drew
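P.S. The cpreg_indexes[] search I mention above would be something
along these lines (an untested sketch; the helper name is made up):

    /* Untested sketch: stash the pause-time counter value in its
     * normal cpreg_array slot instead of a separate kvm_vtime field.
     * cpreg_indexes[] holds the 64-bit KVM register IDs, so we can
     * match KVM_REG_ARM_TIMER_CNT against them directly. */
    static void kvm_arm_save_cnt_to_cpreg_list(ARMCPU *cpu,
                                               uint64_t cnt)
    {
        int i;

        for (i = 0; i < cpu->cpreg_array_len; i++) {
            if (cpu->cpreg_indexes[i] == KVM_REG_ARM_TIMER_CNT) {
                cpu->cpreg_values[i] = cnt;
                return;
            }
        }
    }

That way vm-pause would update the same storage that migration already
saves and restores, instead of keeping a second copy of the state.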