Marcelo Tosatti wrote:
> Right, patch at end of the message restarts the process if the pte
> changes under the walker. The goto is pretty ugly, but I fail to see any
> elegant way of doing that. Ideas?
>
>
goto is fine for that. But there's a subtle livelock here: suppose vcpu
0 is in guest mode, continuously updating a memory location, while vcpu
1 is faulting with that memory location acting as a pte. While we're in
kernel mode we aren't responding to signals like we should, so we need
to abort the walk and let the guest retry; that way we go through the
signal_pending() check.
However, this is an intrusive change, so let's start with the goto and
drop it later in favor of an abort.
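
To illustrate the abort idea, a minimal sketch; walk_one() and the
retry limit are made up here, with walk_one() assumed to return -EAGAIN
when the pte changes under it. After a few failed attempts we bail out
so the vcpu loop runs through its signal_pending() check before the
guest retries the fault:

#define MAX_WALK_RETRIES 4	/* arbitrary; just needs to be finite */

static int walk_with_retry(struct guest_walker *walker,
			   struct kvm_vcpu *vcpu, gva_t addr)
{
	int i;

	for (i = 0; i < MAX_WALK_RETRIES; ++i) {
		int r = walk_one(walker, vcpu, addr);	/* hypothetical */

		if (r != -EAGAIN)	/* done, or a real error */
			return r;
	}
	/* pte keeps changing under us; abort and let the guest retry */
	return -EINTR;
}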
>>> @@ -1510,6 +1510,9 @@ static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>>> {
>>> int ret;
>>>
>>> + /* No need for kvm_cmpxchg_guest_pte here; it's the guest's
>>> + * responsibility to synchronize pte updates and page faults.
>>> + */
>>> ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
>>> if (ret < 0)
>>> return 0;
>>>
>> Hmm. What if an i386 pae guest carefully uses cmpxchg8b to atomically
>> set a pte? kvm_write_guest() doesn't guarantee atomicity, so an
>> intended atomic write can be seen split by a guest walker doing a
>> concurrent walk.
>>
>
> True, an atomic write is needed... a separate patch for that seems more
> appropriate.
>
>
>
Yes.
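
For reference, a rough sketch of what an atomic variant could look
like, assuming the write is naturally aligned and doesn't cross a page
boundary; kvm_write_guest_atomic is a made-up name. On i386, set_64bit()
is implemented with cmpxchg8b, so a pae pte is updated in one shot and
a racing walker sees either the old or the new value, never half of
each:

static int kvm_write_guest_atomic(struct kvm *kvm, gpa_t gpa,
				  const void *val, int bytes)
{
	struct page *page;
	char *kaddr;

	if (bytes != 8 || (gpa & 7))
		return -EINVAL;		/* only aligned 64-bit writes */

	page = gfn_to_page(kvm, gpa >> PAGE_SHIFT);
	kaddr = kmap_atomic(page, KM_USER0);
	set_64bit((u64 *)(kaddr + offset_in_page(gpa)),
		  *(const u64 *)val);
	kunmap_atomic(kaddr, KM_USER0);
	kvm_release_page_dirty(page);

	return 0;
}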
> +static inline bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
> + gfn_t table_gfn, unsigned index,
> + pt_element_t orig_pte, pt_element_t new_pte)
> +{
> + pt_element_t ret;
> + pt_element_t *table;
> + struct page *page;
> +
> + page = gfn_to_page(kvm, table_gfn);
> + table = kmap_atomic(page, KM_USER0);
> +
> + ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +
> + kunmap_atomic(table, KM_USER0);
> +
>
Missing kvm_release_page_dirty() here. May also move mark_page_dirty()
here.
No need to force inlining.
> + return (ret != orig_pte);
> +}
> +
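
Folding the comments above in (drop the forced inline, release the
page on the way out), the helper would look something like this sketch:

static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
				gfn_t table_gfn, unsigned index,
				pt_element_t orig_pte, pt_element_t new_pte)
{
	pt_element_t ret;
	pt_element_t *table;
	struct page *page;

	page = gfn_to_page(kvm, table_gfn);
	table = kmap_atomic(page, KM_USER0);

	ret = CMPXCHG(&table[index], orig_pte, new_pte);

	kunmap_atomic(table, KM_USER0);
	/* drop the reference taken by gfn_to_page(); the page was written */
	kvm_release_page_dirty(page);

	return ret != orig_pte;
}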
> /*
> * Fetch a guest pte for a guest virtual address
> */
> @@ -91,6 +112,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
> gpa_t pte_gpa;
>
> pgprintk("%s: addr %lx\n", __FUNCTION__, addr);
> +walk:
> walker->level = vcpu->mmu.root_level;
> pte = vcpu->cr3;
> #if PTTYPE == 64
> @@ -135,8 +157,9 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
>
> if (!(pte & PT_ACCESSED_MASK)) {
> mark_page_dirty(vcpu->kvm, table_gfn);
> - pte |= PT_ACCESSED_MASK;
> - kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
> + if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
> + index, pte, pte|PT_ACCESSED_MASK))
> + goto walk;
>
We lose the accessed bit in the local variable pte here. Not sure if it
matters but let's play it safe.
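Something like this, folding the bit back into the local copy on
success:

	if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
				index, pte, pte | PT_ACCESSED_MASK))
		goto walk;
	pte |= PT_ACCESSED_MASK;	/* keep the local copy in sync */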
> }
>
> if (walker->level == PT_PAGE_TABLE_LEVEL) {
> @@ -159,9 +182,13 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
> }
>
> if (write_fault && !is_dirty_pte(pte)) {
> + bool ret;
> mark_page_dirty(vcpu->kvm, table_gfn);
> - pte |= PT_DIRTY_MASK;
> - kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
> + ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
> + pte|PT_DIRTY_MASK);
> + if (ret)
> + goto walk;
> +
Again we lose a bit in the local pte, and this one ends up in
walker->pte, where it's quite important.
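Same treatment here:

	if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index,
				pte, pte | PT_DIRTY_MASK))
		goto walk;
	pte |= PT_DIRTY_MASK;	/* walker->pte must see the dirty bit */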
--
Any sufficiently difficult bug is indistinguishable from a feature.