On Fri, Aug 17, 2012 at 05:06:18PM +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-08-15 at 14:59 -0300, Marcelo Tosatti wrote:
> > 
> > The guest should not expect memory accesses to an address
> > to behave sanely while changing a BAR anyway.
> > 
> > Yes, compatibility for change of GPA base can be done in the
> > kernel. I can look into it next week if nobody has done so at
> > that point. 
> 
> One thing to be extra careful about here is if we start
> doing that for normal memory (in case we start breaking it up
> into slots, such as for NUMA setups etc...).
> 
> The problem is that we must not allow normal memory accesses to be
> handled via the "emulation" code (ie MMIO emulation or load/store
> emulation, whatever we call it).
> 
> Part of the issue is that on architectures that don't use IPIs for
> TLB invalidations but instead use some form of HW broadcast such as
> PowerPC or ARM, there is an inherent race in that the emulation code can
> keep a guest physical address (and perform the relevant access to the
> corresponding memory region) way beyond the point where the guest
> virtual->physical translation leading to that address has been
> invalidated.
> 
> This doesn't happen on x86 because essentially the completion of the
> invalidation IPI has to wait for all VCPUs to "respond" and thus to
> finish whatever emulation they are doing. This is not the case on archs
> with a HW invalidate broadcast.
> 
> This is a nasty race, and while we more or less decided that it was
> survivable as long as we only go through emulation for devices (as we
> don't play swapping games with them in the guest kernel), the minute we
> allow normal guest memory access to "slip through", we have broken the
> guest virtual memory model.

This emulation is in hardware, yes? It is the lack of a TLB entry (or
the lack of a valid pagetable to fill the TLB) that triggers it?

> So if we are manipulating memory slots used for guest RAM we must -not-
> break atomicity, since during the time the slot is gone, it will
> fall back to emulation, introducing the above race (at least on PowerPC
> and ARM).

You could, say, get the vcpus to a known state (which has the side effect
of making sure that emulation is stopped), no? (Just as a mental exercise.)

> Cheers,
> Ben.

Yes. Well, Avi mentioned earlier that there are users for change of GPA
base. But, if my understanding is correct, the code that emulates
change of BAR in QEMU is:

        /* now do the real mapping */
        if (r->addr != PCI_BAR_UNMAPPED) {
            memory_region_del_subregion(r->address_space, r->memory);
        }
        r->addr = new_addr;
        if (r->addr != PCI_BAR_UNMAPPED) {
            memory_region_add_subregion_overlap(r->address_space,
                                                r->addr, r->memory, 1);
        }

These translate to two KVM_SET_USER_MEMORY_REGION ioctls.
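
To make the window between the two calls concrete, here is a toy model.
It is not KVM or QEMU code: every toy_* name is made up for illustration,
and the only assumption carried over from the real interface is that
setting a region with size 0 deletes the slot. Between the delete and the
add, a lookup for an address in either the old or the new range finds no
backing slot, which is the case where an access would go through emulation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_SLOTS 8

/* Hypothetical stand-in for a memory slot: guest-physical base and size. */
struct toy_slot {
    uint64_t gpa_base;
    uint64_t size;          /* 0 means the slot is unused */
};

struct toy_vm {
    struct toy_slot slots[TOY_MAX_SLOTS];
};

/* Models one "set memory region" call; size == 0 deletes the slot. */
static void toy_set_region(struct toy_vm *vm, unsigned id,
                           uint64_t gpa_base, uint64_t size)
{
    if (id >= TOY_MAX_SLOTS)
        return;
    vm->slots[id].gpa_base = gpa_base;
    vm->slots[id].size = size;
}

/* True if some slot backs gpa; false means the access would be emulated. */
static bool toy_gpa_is_backed(const struct toy_vm *vm, uint64_t gpa)
{
    for (size_t i = 0; i < TOY_MAX_SLOTS; i++) {
        const struct toy_slot *s = &vm->slots[i];

        if (s->size != 0 && gpa >= s->gpa_base &&
            gpa - s->gpa_base < s->size)
            return true;
    }
    return false;
}
```

Moving a slot is then toy_set_region(vm, id, 0, 0) followed by
toy_set_region(vm, id, new_base, size); calling toy_gpa_is_backed()
between the two returns false for both the old and the new range.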

"> Without taking into consideration backwards compatibility, userspace 
 > can first delete the slot and later create a new one.

 Current qemu will in fact do that.  Not sure about older ones.
"

Avi, where does it do that?


