Guillaume Thouvenin wrote:
> Hello,
>
> I have a question about how the guest page table and shadow page table
> work together, and more precisely about how the host is involved when
> the guest accesses a page that is already in the page table.
>
> The guest maintains its page table to translate guest virtual address
> to guest physical address. It uses the MMU and cr3 register for the
> translation. When there is a page fault, the host gets involved and it
> catches the guest physical address (by walking through the guest page
> table?) and fills its shadow page table.
When a page fault occurs, a vmexit is generated. The vmexit is
delivered before the fault would be delivered to the guest's page fault
handler, but after CR2 has been populated with the faulting address. We
can then obtain the faulting virtual address from CR2 in the host and
walk the guest's page table to determine what guest physical address
should be mapped at that guest virtual address (if any at all).
We then use that information to add an entry to the shadow page table.
The guest runs with the shadow page table installed; the hardware never
actually sees the guest page table. This is because the hardware has no
knowledge of MMU virtualization (this all assumes pre-EPT/NPT hardware).
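To make the walk concrete, here's a toy sketch of a two-level, non-PAE
32-bit guest page table walk done in software. Everything here (the
guest_phys array standing in for guest RAM, the helper names, the tiny
memory size) is illustrative, not actual KVM code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTE_PRESENT 0x1u
#define PAGE_MASK   0xFFFFF000u

static uint8_t guest_phys[1 << 16];   /* 64 KiB of fake guest RAM */

/* Read/write a 32-bit value at a guest physical address. */
static uint32_t gpa_read32(uint32_t gpa)
{
    uint32_t v;
    memcpy(&v, &guest_phys[gpa], sizeof(v));
    return v;
}

static void gpa_write32(uint32_t gpa, uint32_t v)
{
    memcpy(&guest_phys[gpa], &v, sizeof(v));
}

/* Walk the guest page table rooted at guest_cr3: look up the page
 * directory entry, then the page table entry, and return the guest
 * physical address for gva, or -1 if a level is not present (i.e.
 * the fault must be reflected back to the guest as a real #PF). */
static int64_t walk_guest_pt(uint32_t guest_cr3, uint32_t gva)
{
    uint32_t pde = gpa_read32((guest_cr3 & PAGE_MASK)
                              + ((gva >> 22) & 0x3FF) * 4);
    if (!(pde & PTE_PRESENT))
        return -1;
    uint32_t pte = gpa_read32((pde & PAGE_MASK)
                              + ((gva >> 12) & 0x3FF) * 4);
    if (!(pte & PTE_PRESENT))
        return -1;
    return (pte & PAGE_MASK) | (gva & 0xFFF);
}
```

On a shadow-paging fault, the host does essentially this walk and then
installs the resulting translation in the shadow page table.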
> Thus the host can make the
> translation from guest physical address to physical address.
And we use a combination of things to make that determination. First,
there is a set of slots that cover ranges of guest physical addresses.
Slots are necessary on x86 as there may be very large holes in physical
memory. They also make certain optimizations easier. Each slot contains
a base virtual address: a QEMU userspace address where we malloc()'d the
memory for that particular slot. Normally, this corresponds to
phys_ram_base within QEMU.
We subtract the slot's starting address from the guest physical address
and add the result to the slot's base virtual address to obtain a QEMU
userspace address for the guest physical address. Once we have that, we
can call get_user_pages() to pin the userspace memory into physical
memory. We can then obtain a host physical address. That's what we use
to populate the shadow page table.
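The slot arithmetic can be sketched like this; the struct and field
names are made up for illustration and don't match the real KVM memory
slot structures exactly:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy model of a memory slot: a range of guest physical pages backed
 * by a contiguous QEMU userspace allocation. */
struct mem_slot {
    uint64_t base_gfn;       /* first guest page frame in the slot */
    uint64_t npages;         /* number of pages covered */
    uint64_t userspace_addr; /* QEMU virtual address backing the slot */
};

/* Translate a guest physical address to a QEMU userspace address by
 * finding the covering slot; returns 0 if the gpa falls in a hole. */
static uint64_t gpa_to_hva(const struct mem_slot *slots, size_t n,
                           uint64_t gpa)
{
    uint64_t gfn = gpa >> 12;
    for (size_t i = 0; i < n; i++) {
        if (gfn >= slots[i].base_gfn &&
            gfn < slots[i].base_gfn + slots[i].npages)
            return slots[i].userspace_addr
                 + ((gfn - slots[i].base_gfn) << 12)
                 + (gpa & 0xFFF);
    }
    return 0; /* hole in guest physical memory, no slot covers it */
}
```

The resulting userspace address is what would then be handed to
get_user_pages() to pin the backing page and learn its host physical
address.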
The flip side of this is that as long as we have a shadow page table
mapping referencing a host physical address, we must keep the page
pinned into memory. Once we drop the shadow page table entry (because
the guest process has gone away or we decide to evict it from the cache)
we can reduce the reference count via put_page().
This is where mmu notifiers come into play. If Linux really wants to
reclaim memory on the host, there's no way ATM for it to do so with the
memory we have pinned. mmu notifiers provide a mechanism for Linux to
ask KVM to explicitly unpin a particular guest page.
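A toy model of that pin/unpin lifecycle, with a plain counter standing
in for the struct page refcount that get_user_pages()/put_page()
adjust. All names here are illustrative, not real kernel APIs:

```c
#include <assert.h>
#include <stdbool.h>

/* One page's state: its refcount and whether a shadow PTE maps it. */
static int page_refcount = 1;       /* the page starts with one reference */
static bool shadow_present = false;

/* Installing a shadow mapping pins the page, as get_user_pages()
 * would: the host must not reclaim it while the mapping exists. */
static void install_shadow_pte(void)
{
    page_refcount++;
    shadow_present = true;
}

/* Dropping the shadow mapping releases the pin, as put_page() would:
 * the host may reclaim the page again. */
static void drop_shadow_pte(void)
{
    if (shadow_present) {
        shadow_present = false;
        page_refcount--;
    }
}

/* An mmu-notifier-style invalidate callback: the host asks KVM to let
 * go of the page so reclaim can proceed. */
static void mmu_notifier_invalidate(void)
{
    drop_shadow_pte();
}
```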
> Now, if
> the guest reads the same page, the PTE will point to the guest physical
> address, but there will be no page fault, so the host will not be
> involved in the translation (not so sure). I don't see how the guest
> virtual address will be translated to a physical address. I'm missing
> something but I don't see what.
>
I think what you're missing is that while the guest is running, its CR3
does not point to its own page table but rather to the shadow page
table. We intercept CR3 reads to pretend to the guest that its page
table is, in fact, installed, but it's really the shadow page table
that's in the hardware register.
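A minimal sketch of that CR3 split, with made-up structure and function
names (the real interception happens in the vmexit handlers):

```c
#include <assert.h>
#include <stdint.h>

/* The guest's idea of CR3 and the value hardware really uses are
 * tracked separately. */
struct vcpu {
    uint64_t guest_cr3;  /* what the guest believes CR3 holds */
    uint64_t shadow_cr3; /* shadow page table root actually in hardware */
};

/* Intercepted MOV from CR3: report the guest's own page table root,
 * hiding the shadow page table entirely. */
static uint64_t emulate_cr3_read(const struct vcpu *v)
{
    return v->guest_cr3;
}

/* Intercepted MOV to CR3: remember the guest's value and switch the
 * hardware to the corresponding shadow page table root. */
static void emulate_cr3_write(struct vcpu *v, uint64_t guest_root,
                              uint64_t shadow_root)
{
    v->guest_cr3 = guest_root;
    v->shadow_cr3 = shadow_root;
}
```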
Regards,
Anthony Liguori
> Thanks for your help,
> Guillaume
>
> ___
> kvm-devel mailing list
> kvm-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/kvm-devel
>