On Mon, 16 Dec 2019 at 16:44, Andrew Jones <drjo...@redhat.com> wrote:
>
> On Mon, Dec 16, 2019 at 03:40:16PM +0000, Peter Maydell wrote:
> > On Mon, 16 Dec 2019 at 15:14, Peter Maydell <peter.mayd...@linaro.org> 
> > wrote:
> > > How does this interact with the usual register sync to/from
> > > KVM (ie kvm_arch_get_registers(), which I think will do a
> > > GET_ONE_REG read of the TIMER_CNT register the way it does
> > > any other sysreg, inside write_kvmstate_to_list(), plus
> > > kvm_arch_set_registers() which does the write back to the
> > > kernel in write_list_to_kvmstate()) ? Presumably we want this
> > > version to take precedence by the set_virtual_time call
> > > happening after the kvm_arch_set_registers, but is this
> > > guaranteed ?
> >
> > ...you might also want to look at the effects of simply
> > removing the KVM_REG_ARM_TIMER_CNT entry from the
> > 'non_runtime_cpregs[]' array -- in commit 4b7a6bf402bd064
> > we explicitly stopped reading/writing this register's value
> > to/from the kernel except for inbound migration, and it
> > feels like this patchset is now rolling back that approach,
> > so maybe we should also be (configurably) rolling back some
> > of its implementation rather than just leaving it in place.
>
> I feel like I already considered that, maybe even tried it, a few months
> ago when I first looked at this. I must have decided against it for some
> reason at the time, but I don't recall what. Now I can say the reason is
> because we only do this save/restore when we transition to/from paused
> state, though.

I found the thread discussing the bug that originally
caused us to add commit 4b7a6bf402bd064:
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015665.html
 -- there are some codepaths which cause us to do a sync from/to
KVM for one VCPU while the others are still running. If we do a
read-CNT-and-write-back then we effectively cause time to jump
backwards for the other, still-running VCPUs.

So we do still want TIMER_CNT listed as KVM_PUT_FULL_STATE
regardless, or we re-introduce that bug.
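
By "listed as KVM_PUT_FULL_STATE" I mean the entry in the
level-lookup table in target/arm/kvm.c, which from memory looks
roughly like this (so check the tree for the exact form):

/* Registers not listed here default to KVM_PUT_RUNTIME_STATE and
 * are synced on every get/put; TIMER_CNT is only written to the
 * kernel for a full-state sync such as inbound migration.
 */
static const CPRegStateLevel non_runtime_cpregs[] = {
    { KVM_REG_ARM_TIMER_CNT, KVM_PUT_FULL_STATE },
};

int kvm_arm_cpreg_level(uint64_t regidx)
{
    int i;

    for (i = 0; i < ARRAY_SIZE(non_runtime_cpregs); i++) {
        const CPRegStateLevel *l = &non_runtime_cpregs[i];

        if (l->regidx == regidx) {
            return l->level;
        }
    }
    return KVM_PUT_RUNTIME_STATE;
}

That lookup is what stops write_list_to_kvmstate() from pushing a
stale CNT value back to the kernel on every runtime sync, so it's
the part we need to keep.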

Your approach in this patchset only reads and writes the counter
around vm pause/resume, so it won't have the pre-2015 problems.

It still feels odd that we're now storing this bit of guest state
in two places, though: in kvm_vtime, and also in its usual place
in the cpreg_array data structures. We write back the value from
kvm_vtime when the VM starts running, and we write back the value
from the cpreg_array for a PUT_FULL_STATE, which the comments claim
happens only on startup or when we've just loaded migration state
(and also, undocumented but reasonable, on cpu-hotplug, which arm
doesn't have yet).

I've just spent a little while digging through the code, and
haven't been able to satisfy myself about which writeback wins:
for a loadvm I think we first do a
cpu_synchronize_all_post_init() (writing back the counter
value from the migration data) and only after that do we
unpause the VM -- so why doesn't this overwrite the correct
value with the wrong value from kvm_vtime?

I also just noticed that the logic used in this patch
doesn't match what other architectures do in their vm_state_change
functions -- e.g. cpu_ppc_clock_vm_state_change() has an
"if (running) { load } else { save }", and kvmclock_vm_state_change()
for i386 also has "if (running) { ... } else { ... }", though
it has an extra wrinkle where it captures "are we PAUSED?"
for use in the pre_save function; the comment above
kvmclock_pre_save() suggests that might be useful for other
architectures than x86, too. kvm_s390_tod_vm_state_change() has
logic that's a slightly more complicated variation on just
testing the 'running' flag, but it doesn't look at the
specific new state.
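
Concretely, the pattern I'd expect us to follow is something like
the sketch below. The kvm_arm_vtime_get()/kvm_arm_vtime_put() names
are just placeholders for whatever the patchset actually calls its
save/restore helpers, so treat this as the shape rather than the
real code:

static void kvm_arm_vtime_vm_state_change(void *opaque, int running,
                                          RunState state)
{
    ARMCPU *cpu = opaque;

    if (running) {
        /* VM is (re)starting: write the captured counter back to KVM */
        kvm_arm_vtime_put(cpu);
    } else {
        /* VM is stopping, for whatever reason: capture the counter now */
        kvm_arm_vtime_get(cpu);
    }
}

registered per-vcpu in the init path with something like

    qemu_add_vm_change_state_handler(kvm_arm_vtime_vm_state_change, cpu);

That way we key purely off 'running' rather than off the specific
RunState we're entering.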

> > I note also that the commit message there had a remark
> > about inconsistencies between VCPUs -- is the right thing
> > to handle this per-VM rather than per-VCPU somehow?
>
> per-VM would make sense, because the counters should be synchronized
> among the VCPUs. KVM does that for us, though, so whichever VCPU last
> restores its counter is the one that will determine the final value.
>
> Maybe we should have a VM ioctl instead, but ATM we only have VCPU ioctls.

I meant more "only do the save/load once per VM in QEMU, but
do it by working with just one VCPU". But I guess since migration
works on all the VCPUs we're ok to do pause-resume the same way.
(I've now tracked down what the 'inconsistencies between VCPUs'
were: they happened when we were syncing the CNT value for one VCPU
while the others were still running.)

thanks
-- PMM
