* Ard Biesheuvel (ard.biesheu...@linaro.org) wrote:
> On 31 January 2018 at 09:53, Christoffer Dall
> <christoffer.d...@linaro.org> wrote:
> > On Mon, Jan 29, 2018 at 10:32:12AM +0000, Marc Zyngier wrote:
> >> On 29/01/18 10:04, Peter Maydell wrote:
> >> > On 29 January 2018 at 09:53, Dr. David Alan Gilbert
> >> > <dgilb...@redhat.com> wrote:
> >> >> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> >> >>> On 26 January 2018 at 19:46, Dr. David Alan Gilbert
> >> >>> <dgilb...@redhat.com> wrote:
> >> >>>> * Peter Maydell (peter.mayd...@linaro.org) wrote:
> >> >>>>> I think the correct fix here is that your test code should turn
> >> >>>>> its MMU on. Trying to treat guest RAM as uncacheable doesn't work
> >> >>>>> for Arm KVM guests (for the same reason that VGA device video memory
> >> >>>>> doesn't work). If it's RAM your guest has to arrange to map it as
> >> >>>>> Normal Cacheable, and then everything should work fine.
> >> >>>>
> >> >>>> Does this cause problems with migrating at just the wrong point during
> >> >>>> a VM boot?
> >> >>>
> >> >>> It wouldn't surprise me if it did, but I don't think I've ever
> >> >>> tried to provoke that problem...
> >> >>
> >> >> If you think it'll get the RAM contents wrong, it might be best to fail
> >> >> the migration if you can detect the cache is disabled in the guest.
> >> >
> >> > I guess QEMU could look at the value of the "MMU disabled/enabled" bit
> >> > in the guest's system registers, and refuse migration if it's off...
> >> >
> >> > (cc'd Marc, Christoffer to check that I don't have the wrong end
> >> > of the stick about how thin the ice is in the period before the
> >> > guest turns on its MMU...)
> >>
> >> Once MMU and caches are on, we should be in a reasonable place for QEMU
> >> to have a consistent view of the memory. The trick is to prevent the
> >> vcpus from changing that. A guest could perfectly turn off its MMU at
> >> any given time if it needs to (and it is actually required on some HW if
> >> you want to mitigate headlining CVEs), and KVM won't know about that.
> >>
> >
> > (Clarification: KVM can detect this if it bothers to check the VCPU's
> > system registers, but we don't trap to KVM when the VCPU turns off its
> > caches, right?)
> >
> >> You may have to pause the vcpus before starting the migration, or
> >> introduce a new KVM feature that would automatically pause a vcpu that
> >> is trying to disable its MMU while the migration is on. This would
> >> involve trapping all the virtual memory related system registers, with
> >> an obvious cost. But that cost would be limited to the time it takes to
> >> migrate the memory, so maybe that's acceptable.
> >>
> > Is that even sufficient?
> >
> > What if the following happened. (1) guest turns off MMU, (2) guest
> > writes some data directly to ram (3) qemu stops the vcpu (4) qemu reads
> > guest ram. QEMU's view of guest ram is now incorrect (stale,
> > incoherent, ...).
> >
> > I'm also not really sure if pausing one VCPU because it turned off its
> > MMU will go very well when trying to migrate a large VM (wouldn't this
> > ask for all the other VCPUs beginning to complain that the stopped VCPU
> > appears to be dead?). As a short-term 'fix' it's probably better to
> > refuse migration if you detect that a VCPU had begun turning off its
> > MMU.
> >
> > On the larger scale of things, this appears to me to be another case of
> > us really needing some way to coherently access memory between QEMU and
> > the VM, but in the case of the VCPU turning off the MMU prior to
> > migration, we don't even know where it may have written data, and I'm
> > therefore not really sure what the 'proper' solution would be.
> >
> > (cc'ing Ard who has thought about this problem before in the context
> > of UEFI and VGA.)
> >
>
> Actually, the VGA case is much simpler because the host is not
> expected to write to the framebuffer, only read from it, and the guest
> is not expected to create a cacheable mapping for it, so any
> incoherency can be trivially solved by cache invalidation on the host
> side. (Note that this has nothing to do with DMA coherency, but only
> with PCI MMIO BARs that are backed by DRAM in the host.)
>
> In the migration case, it is much more complicated, and I think
> capturing the state of the VM in a way that takes incoherency between
> caches and main memory into account is simply infeasible (i.e., the
> act of recording the state of guest RAM via a cached mapping may evict
> clean cachelines that are out of sync, and so it is impossible to
> record both the cached state *and* the delta with the uncached state).
>
> I wonder how difficult it would be to
> a) enable trapping of the MMU system register when a guest CPU is
> found to have its MMU off at migration time
> b) allow the guest CPU to complete whatever it thinks it needs to be
> doing with the MMU off
> c) once it re-enables the MMU, proceed with capturing the memory state
>
> Full disclosure: I know very little about KVM migration ...
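(For concreteness, here is a rough userspace sketch of what Peter's
"look at the MMU enabled bit and refuse migration" check could look
like. It is not QEMU code, just an illustration; it assumes an arm64
host where <linux/kvm.h> pulls in the arm64 ARM64_SYS_REG() helpers,
and relies on SCTLR_EL1.M being bit 0.)

  #include <stdbool.h>
  #include <stdint.h>
  #include <sys/ioctl.h>
  #include <linux/kvm.h>

  /* SCTLR_EL1 is encoded as op0=3, op1=0, CRn=1, CRm=0, op2=0. */
  #define SCTLR_EL1_ID   ARM64_SYS_REG(3, 0, 1, 0, 0)
  #define SCTLR_EL1_M    (1ULL << 0)    /* stage-1 MMU enable bit */

  /* Returns true if this vcpu currently has its stage-1 MMU enabled;
   * a migration request could be refused (or deferred) while any vcpu
   * reports false here. */
  static bool vcpu_mmu_enabled(int vcpu_fd)
  {
      uint64_t sctlr = 0;
      struct kvm_one_reg reg = {
          .id   = SCTLR_EL1_ID,
          .addr = (uintptr_t)&sctlr,
      };

      if (ioctl(vcpu_fd, KVM_GET_ONE_REG, &reg) < 0) {
          return false;   /* be conservative if we can't read it */
      }
      return sctlr & SCTLR_EL1_M;
  }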
The difficulty is that migration is 'live' - i.e. the guest is running
while we're copying the data across; that means that a guest might do
any of these MMU things multiple times - so if we wait for it to be
right, will it go back to being wrong? How long do you wait? (It's not
a bad hack if that's the best we can do though.)

Now of course 'live' itself sounds scary for consistency, but the only
thing we really require is that a page is marked dirty some time after
it's been written to, so that we cause it to be sent again, and that we
eventually send a correct version; it's OK for us to be sending
inconsistent versions as long as we eventually send the right version.

Dave
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
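(For readers less familiar with pre-copy migration, a toy sketch of the
invariant described above. The helpers fetch_and_clear_dirty_bitmap(),
send_page() and guest_ram are hypothetical stand-ins for KVM's
dirty-log interface, the guest RAM mapping and the outgoing migration
stream; they are not real QEMU or kernel APIs.)

  #include <stdbool.h>
  #include <stddef.h>

  #define NPAGES     65536
  #define PAGE_SIZE  4096

  /* Hypothetical stand-ins for the dirty-log ioctl, the guest RAM
   * mapping and the migration stream. */
  extern void fetch_and_clear_dirty_bitmap(unsigned char *bitmap);
  extern void send_page(size_t page_index, const void *data);
  extern unsigned char *guest_ram;

  static bool page_is_dirty(const unsigned char *bm, size_t i)
  {
      return bm[i / 8] & (1u << (i % 8));
  }

  /* One pre-copy pass: resend every page dirtied since the last pass.
   * A page rewritten while (or after) we copy it is simply marked
   * dirty again and resent by a later pass, or by the final pass run
   * with the vcpus stopped - which is why sending a transiently
   * inconsistent copy here is harmless. */
  static void precopy_pass(unsigned char *dirty_bitmap)
  {
      fetch_and_clear_dirty_bitmap(dirty_bitmap);

      for (size_t i = 0; i < NPAGES; i++) {
          if (page_is_dirty(dirty_bitmap, i)) {
              send_page(i, guest_ram + i * PAGE_SIZE);
          }
      }
  }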