On Wed, Jun 14, 2017 at 01:12:12PM +0200, Paolo Bonzini wrote:
> On 06/06/2017 20:19, Roman Kagan wrote:
> > There is a design flaw in the Hyper-V SynIC implementation in KVM: when
> > message page or event flags page is enabled by setting the corresponding
> > msr, KVM zeroes it out.  This violates the spec in general (per spec,
> > the pages have to be overlay ones and only zeroed at cpu reset), but
> > it's non-fatal in normal operation because the user exit happens after
> > the page is zeroed, so it's the underlying guest page which is zeroed
> > out, and sane guests don't depend on its contents to be preserved while
> > it's overlaid.
> >
> > However, in the case of vmstate load the overlay pages are set up before
> > msrs are set so the contents of those pages get lost.
> >
> > To work it around, avoid setting up overlay pages in .post_load.
> > Instead, postpone it until after the msrs are pushed to KVM.  As a
> > result, KVM just zeroes out the underlying guest pages similar to how it
> > happens during guest-initiated msr writes, which is tolerable.
>
> Why not disable the zeroing for host-initiated MSR writes?  This is
> pretty clearly a KVM bug, we can push it to stable kernels too.

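To make sure we mean the same thing by "disable the zeroing for
host-initiated MSR writes", here is a minimal standalone model of that
behaviour.  It is not KVM code: the struct, the function, the enable bit
and the page size are all hypothetical stand-ins chosen for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants for this model only, not taken from KVM. */
#define MODEL_PAGE_SIZE   4096
#define MODEL_PAGE_ENABLE 1ULL

/* Hypothetical stand-in for one SynIC page and the MSR that enables it. */
struct model_synic {
    uint64_t msg_page_msr;
    uint8_t  page[MODEL_PAGE_SIZE];
};

/*
 * Model of an MSR write that enables the message page.
 * The page is zeroed only when the write comes from the guest;
 * a host-initiated write (e.g. vmstate load) leaves the contents alone.
 */
static void model_set_msg_page(struct model_synic *s, uint64_t data,
                               bool host_initiated)
{
    s->msg_page_msr = data;

    if ((data & MODEL_PAGE_ENABLE) && !host_initiated)
        memset(s->page, 0, sizeof(s->page));
}
```

With something along these lines, the page contents restored before the
msrs are set would survive the load, while guest-initiated writes would
behave as they do today.
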
The only problem with this is that QEMU will have no reliable way to
know whether the KVM it runs with has this bug fixed or not.

Machines without vmbus work and even migrate fine with the current KVM
despite this bug (the only user of those pages currently is synic
timers, which re-arm themselves and post messages regardless of
zeroing).  However, updating QEMU to a vmbus-enabled version without
updating the kernel will make migrations cause guest hangs.

If that is tolerable, I can happily drop this patch, as it complicates
the code a little.  Distros probably won't be affected, as they can make
sure their kernels have this bug fixed before they roll out a
vmbus-capable QEMU.

What do you think?

Thanks,
Roman.