Marcelo Tosatti wrote:
> Right, patch at end of the message restarts the process if the pte
> changes under the walker.  The goto is pretty ugly, but I fail to see
> any elegant way of doing that.  Ideas?
>
goto is fine for that.  But there's a subtle livelock here: suppose
vcpu 0 is in guest mode, continuously updating a memory location, while
vcpu 1 is faulting with that memory location acting as a pte.  While
we're in kernel mode we aren't responding to signals like we should, so
we need to abort the walk and let the guest retry; that way we go
through the signal_pending() check.

However, this is an intrusive change, so let's start with the goto and
drop it later in favor of an abort.

>>> @@ -1510,6 +1510,9 @@ static int emulator_write_phys(struct kvm_vcpu *vcpu, gpa_t gpa,
>>>  {
>>>  	int ret;
>>>
>>> +	/* No need for kvm_cmpxchg_guest_pte here, it's the guest's
>>> +	 * responsibility to synchronize pte updates and page faults.
>>> +	 */
>>>  	ret = kvm_write_guest(vcpu->kvm, gpa, val, bytes);
>>>  	if (ret < 0)
>>>  		return 0;
>>>
>> Hmm.  What if an i386 pae guest carefully uses cmpxchg8b to atomically
>> set a pte?  kvm_write_guest() doesn't guarantee atomicity, so an
>> intended atomic write can be seen split by the guest walker doing a
>> concurrent walk.
>>
> True, an atomic write is needed... a separate patch for that seems more
> appropriate.
>

Yes.

> +static inline bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
> +				       gfn_t table_gfn, unsigned index,
> +				       pt_element_t orig_pte,
> +				       pt_element_t new_pte)
> +{
> +	pt_element_t ret;
> +	pt_element_t *table;
> +	struct page *page;
> +
> +	page = gfn_to_page(kvm, table_gfn);
> +	table = kmap_atomic(page, KM_USER0);
> +
> +	ret = CMPXCHG(&table[index], orig_pte, new_pte);
> +
> +	kunmap_atomic(page, KM_USER0);
> +

Missing kvm_release_page_dirty() here.  May also move mark_page_dirty()
here.  No need to force inlining.

> +	return (ret != orig_pte);
> +}
> +
>  /*
>   * Fetch a guest pte for a guest virtual address
>   */
> @@ -91,6 +112,7 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
>  	gpa_t pte_gpa;
>
>  	pgprintk("%s: addr %lx\n", __FUNCTION__, addr);
> +walk:
>  	walker->level = vcpu->mmu.root_level;
>  	pte = vcpu->cr3;
>  #if PTTYPE == 64
> @@ -135,8 +157,9 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
>
>  		if (!(pte & PT_ACCESSED_MASK)) {
>  			mark_page_dirty(vcpu->kvm, table_gfn);
> -			pte |= PT_ACCESSED_MASK;
> -			kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
> +			if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn,
> +			    index, pte, pte|PT_ACCESSED_MASK))
> +				goto walk;
>

We lose the accessed bit in the local variable pte here.  Not sure if
it matters, but let's play it safe.

>  		}
>
>  		if (walker->level == PT_PAGE_TABLE_LEVEL) {
> @@ -159,9 +182,13 @@ static int FNAME(walk_addr)(struct guest_walker *walker,
>  	}
>
>  	if (write_fault && !is_dirty_pte(pte)) {
> +		bool ret;
>  		mark_page_dirty(vcpu->kvm, table_gfn);
> -		pte |= PT_DIRTY_MASK;
> -		kvm_write_guest(vcpu->kvm, pte_gpa, &pte, sizeof(pte));
> +		ret = FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index, pte,
> +					  pte|PT_DIRTY_MASK);
> +		if (ret)
> +			goto walk;
> +

Again we lose a bit in pte.  That ends up in walker->pte and is quite
important.

-- 
Any sufficiently difficult bug is indistinguishable from a feature.
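
For illustration, the helper with the comments above folded in might
look roughly like this; a sketch, not a final patch.  mark_page_dirty()
could move in here too as suggested, though this version leaves it at
the call sites.  Note also that kunmap_atomic() takes the mapping, not
the struct page, which the quoted hunk gets wrong:

	static bool FNAME(cmpxchg_gpte)(struct kvm *kvm,
					gfn_t table_gfn, unsigned index,
					pt_element_t orig_pte,
					pt_element_t new_pte)
	{
		pt_element_t ret;
		pt_element_t *table;
		struct page *page;

		page = gfn_to_page(kvm, table_gfn);
		table = kmap_atomic(page, KM_USER0);

		ret = CMPXCHG(&table[index], orig_pte, new_pte);

		/* kunmap_atomic() wants the mapped address, not the page */
		kunmap_atomic(table, KM_USER0);

		/* drop the reference gfn_to_page() took, marking the
		 * page dirty since we may have written it */
		kvm_release_page_dirty(page);

		return (ret != orig_pte);
	}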
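
And the call sites would mirror a successful update into the local
copy, so the bit isn't lost; a sketch for the accessed bit (the
dirty-bit path follows the same pattern):

	if (!(pte & PT_ACCESSED_MASK)) {
		mark_page_dirty(vcpu->kvm, table_gfn);
		if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index,
					pte, pte | PT_ACCESSED_MASK))
			goto walk;	/* pte changed under us, restart */
		/* keep the local copy in sync; this ends up in
		 * walker->pte */
		pte |= PT_ACCESSED_MASK;
	}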
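
As for the eventual abort, roughly this shape is what I mean.  The
walk_aborted status and its handling in the fault path are hypothetical
here; the point is only that we unwind to the vcpu loop, which runs its
signal_pending() check before the guest can fault again:

		if (FNAME(cmpxchg_gpte)(vcpu->kvm, table_gfn, index,
					pte, pte | PT_ACCESSED_MASK))
			/* don't spin in the kernel: fail the walk, let
			 * the guest re-execute the faulting instruction,
			 * and pick up pending signals on the way out */
			return walk_aborted;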