Re: [PATCH v2 0/9] cleanup around kvm_sync_page, and a few micro-optimizations
On 2016/03/08 17:30, Paolo Bonzini wrote:
> On 08/03/2016 09:00, Takuya Yoshikawa wrote:
>>>   KVM: MMU: introduce kvm_mmu_flush_or_zap
>>>   KVM: MMU: move TLB flush out of __kvm_sync_page
>>>   KVM: MMU: use kvm_sync_page in kvm_sync_pages
>>>   KVM: MMU: cleanup __kvm_sync_page and its callers
>>>   KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page*
>>>   KVM: MMU: move zap/flush to kvm_mmu_get_page
>>>   KVM: MMU: coalesce more page zapping in mmu_sync_children
>>
>> 1-7 look good to me.
>
> Great, these are the ones where I wanted more review.
>
> Can I add your Reviewed-by for these?

Yes, please.

Reviewed-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>

        Takuya
Re: [PATCH v2 0/9] cleanup around kvm_sync_page, and a few micro-optimizations
On 2016/03/07 23:15, Paolo Bonzini wrote:
> Having committed the ubsan fixes, these are the cleanups that are left.
>
> Compared to v1, I have fixed the patch to coalesce page zapping after
> mmu_sync_children (as requested by Takuya and Guangrong), and I have
> rewritten is_last_gpte again in an even simpler way.
>
>         Paolo
>
> Paolo Bonzini (9):
>   KVM: MMU: introduce kvm_mmu_flush_or_zap
>   KVM: MMU: move TLB flush out of __kvm_sync_page
>   KVM: MMU: use kvm_sync_page in kvm_sync_pages
>   KVM: MMU: cleanup __kvm_sync_page and its callers
>   KVM: MMU: invert return value of mmu.sync_page and *kvm_sync_page*
>   KVM: MMU: move zap/flush to kvm_mmu_get_page
>   KVM: MMU: coalesce more page zapping in mmu_sync_children

1-7 look good to me.

>   KVM: MMU: simplify is_last_gpte
>   KVM: MMU: micro-optimize gpte_access

8 and 9 look reasonable to me, though I read them only briefly.

        Takuya
Re: [PATCH 09/12] KVM: MMU: coalesce zapping page after mmu_sync_children
On 2016/02/24 22:17, Paolo Bonzini wrote:
> Move the call to kvm_mmu_flush_or_zap outside the loop.
>
> Signed-off-by: Paolo Bonzini
> ---
>  arch/x86/kvm/mmu.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 725316df32ec..6d47b5c43246 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2029,24 +2029,27 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
>          struct mmu_page_path parents;
>          struct kvm_mmu_pages pages;
>          LIST_HEAD(invalid_list);
> +        bool flush = false;
>
>          while (mmu_unsync_walk(parent, &pages)) {
>                  bool protected = false;
> -                bool flush = false;
>
>                  for_each_sp(pages, sp, parents, i)
>                          protected |= rmap_write_protect(vcpu, sp->gfn);
>
> -                if (protected)
> +                if (protected) {
>                          kvm_flush_remote_tlbs(vcpu->kvm);
> +                        flush = false;
> +                }
>
>                  for_each_sp(pages, sp, parents, i) {
>                          flush |= kvm_sync_page(vcpu, sp, &invalid_list);
>                          mmu_pages_clear_parents(&parents);
>                  }
> -                kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>                  cond_resched_lock(&vcpu->kvm->mmu_lock);

This may release the mmu_lock before committing the zapping.
Is it safe?  If so, we may want to see the reason in the changelog.

        Takuya

>          }
> +
> +        kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
>  }
>
>  static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
>
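[Editorial aside: to see the shape of this change in isolation, here is a
minimal userspace sketch of the coalescing pattern, with all names as
hypothetical stand-ins for the kernel's: work discovered inside the loop
only accumulates on a list and in a flush flag, and both are committed
once after the loop; a remote flush inside the loop clears the pending
flag because it already covers whatever was pending.]

#include <stdbool.h>
#include <stdio.h>

static int pending_zaps;        /* stands in for invalid_list */

static void flush_or_zap(bool flush)
{
        /* Single commit point, modeled on kvm_mmu_flush_or_zap(). */
        if (pending_zaps || flush)
                printf("commit: %d zaps, flush=%d\n", pending_zaps, flush);
        pending_zaps = 0;
}

int main(void)
{
        bool flush = false;     /* hoisted out of the loop by the patch */

        for (int batch = 0; batch < 3; batch++) {
                bool protected = (batch == 1);

                if (protected) {
                        printf("remote flush\n"); /* kvm_flush_remote_tlbs() */
                        flush = false;  /* remote flush covers pending work */
                }
                pending_zaps++;         /* queued, not yet committed */
                flush |= (batch & 1);
        }
        flush_or_zap(flush);            /* one commit, after the loop */
        return 0;
}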
[PATCH 0/2] KVM: x86: MMU: Clean up handle_mmio_page_fault() handling in kvm_mmu_page_fault()
The end result is very similar to handle_ept_misconfig()'s corresponding
code.  It may also be possible to change handle_ept_misconfig() not to
call handle_mmio_page_fault() separately from kvm_mmu_page_fault(): the
only difference seems to be whether it checks for PFERR_RSVD_MASK.

Takuya Yoshikawa (2):
  KVM: MMU: Consolidate quickly_check_mmio_pf() and is_mmio_page_fault()
  KVM: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()

 arch/x86/kvm/mmu.c         | 54 ++++++++++++++++++-----------------------
 arch/x86/kvm/paging_tmpl.h | 19 ++++++-------------
 2 files changed, 26 insertions(+), 47 deletions(-)

--
2.1.0
[PATCH 2/2] KVM: x86: MMU: Move handle_mmio_page_fault() call to kvm_mmu_page_fault()
Rather than placing a handle_mmio_page_fault() call in each
vcpu->arch.mmu.page_fault() handler, moving it up to
kvm_mmu_page_fault() makes the code better:

 - avoids code duplication
 - for kvm_arch_async_page_ready(), which is the other caller of
   vcpu->arch.mmu.page_fault(), removes an extra error_code check
 - avoids returning both RET_MMIO_PF_* values and raw integer values
   from vcpu->arch.mmu.page_fault()

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 39 ++++++++++++++++-----------------------
 arch/x86/kvm/paging_tmpl.h | 19 ++++++-------------
 2 files changed, 22 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a28b734..2ce3892 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3370,13 +3370,6 @@ static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva,
         pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code);

-        if (unlikely(error_code & PFERR_RSVD_MASK)) {
-                r = handle_mmio_page_fault(vcpu, gva, true);
-
-                if (likely(r != RET_MMIO_PF_INVALID))
-                        return r;
-        }
-
         r = mmu_topup_memory_caches(vcpu);
         if (r)
                 return r;
@@ -3460,13 +3453,6 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
         MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa));

-        if (unlikely(error_code & PFERR_RSVD_MASK)) {
-                r = handle_mmio_page_fault(vcpu, gpa, true);
-
-                if (likely(r != RET_MMIO_PF_INVALID))
-                        return r;
-        }
-
         r = mmu_topup_memory_caches(vcpu);
         if (r)
                 return r;
@@ -4361,18 +4347,27 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
         enum emulation_result er;
         bool direct = vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu);

+        if (unlikely(error_code & PFERR_RSVD_MASK)) {
+                r = handle_mmio_page_fault(vcpu, cr2, direct);
+                if (r == RET_MMIO_PF_EMULATE) {
+                        emulation_type = 0;
+                        goto emulate;
+                }
+                if (r == RET_MMIO_PF_RETRY)
+                        return 1;
+                if (r < 0)
+                        return r;
+        }
+
         r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
         if (r < 0)
-                goto out;
-
-        if (!r) {
-                r = 1;
-                goto out;
-        }
+                return r;
+        if (!r)
+                return 1;

         if (mmio_info_in_cache(vcpu, cr2, direct))
                 emulation_type = 0;
-
+emulate:
         er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);

         switch (er) {
@@ -4386,8 +4381,6 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
         default:
                 BUG();
         }
-out:
-        return r;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 6c9fed9..05827ff 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -702,24 +702,17 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
         pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);

-        if (unlikely(error_code & PFERR_RSVD_MASK)) {
-                r = handle_mmio_page_fault(vcpu, addr, mmu_is_nested(vcpu));
-                if (likely(r != RET_MMIO_PF_INVALID))
-                        return r;
-
-                /*
-                 * page fault with PFEC.RSVD = 1 is caused by shadow
-                 * page fault, should not be used to walk guest page
-                 * table.
-                 */
-                error_code &= ~PFERR_RSVD_MASK;
-        };
-
         r = mmu_topup_memory_caches(vcpu);
         if (r)
                 return r;

         /*
+         * If PFEC.RSVD is set, this is a shadow page fault.
+         * The bit needs to be cleared before walking guest page tables.
+         */
+        error_code &= ~PFERR_RSVD_MASK;
+
+        /*
          * Look up the guest pte for the faulting address.
          */
         r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
--
2.1.0
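[Editorial aside: a compressed userspace model of the centralized dispatch
this patch creates.  The RET_MMIO_PF_* names come from the patch; the
numeric values, helper bodies, and return conventions below are stand-ins
chosen for the sketch, not the kernel's definitions.]

#include <stdio.h>

enum {
        RET_MMIO_PF_RETRY = 0,          /* let the guest retry the access */
        RET_MMIO_PF_EMULATE = 1,        /* emulate the faulting instruction */
        RET_MMIO_PF_INVALID = 2,        /* not MMIO: take the normal path */
};

/* Stand-in for handle_mmio_page_fault(); just echoes a chosen outcome. */
static int handle_mmio_page_fault(int outcome) { return outcome; }

/* Stand-in for kvm_mmu_page_fault() after the patch: the MMIO outcomes
 * are translated to caller semantics in exactly one place. */
static int page_fault(int rsvd_bit_set, int mmio_outcome)
{
        if (rsvd_bit_set) {
                int r = handle_mmio_page_fault(mmio_outcome);
                if (r == RET_MMIO_PF_EMULATE)
                        return 2;       /* the "emulate:" label's path */
                if (r == RET_MMIO_PF_RETRY)
                        return 1;       /* resume the guest */
                if (r < 0)
                        return r;       /* internal error: propagate */
                /* RET_MMIO_PF_INVALID falls through to the regular handler */
        }
        return 1;                       /* regular handler resolved it */
}

int main(void)
{
        printf("%d %d %d\n",
               page_fault(1, RET_MMIO_PF_EMULATE),
               page_fault(1, RET_MMIO_PF_RETRY),
               page_fault(0, 0));
        return 0;
}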
[PATCH 1/2] KVM: x86: MMU: Consolidate quickly_check_mmio_pf() and is_mmio_page_fault()
These two have only slight differences:
 - whether 'addr' is of type u64 or of type gva_t
 - whether they have 'direct' parameter or not

Concerning the former, quickly_check_mmio_pf()'s u64 is better because
'addr' needs to be able to have both a guest physical address and a
guest virtual address.

The latter is just a stylistic issue as we can always calculate the mode
from the 'vcpu' as is_mmio_page_fault() does.  This patch keeps the
parameter to make the following patch cleaner.

In addition, the patch renames the function to mmio_info_in_cache() to
make it clear what it actually checks for.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 95a955d..a28b734 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3273,7 +3273,7 @@ static bool is_shadow_zero_bits_set(struct kvm_mmu *mmu, u64 spte, int level)
         return __is_rsvd_bits_set(&mmu->shadow_zero_check, spte, level);
 }

-static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct)
+static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct)
 {
         if (direct)
                 return vcpu_match_mmio_gpa(vcpu, addr);
@@ -3332,7 +3332,7 @@ int handle_mmio_page_fault(struct kvm_vcpu *vcpu, u64 addr, bool direct)
         u64 spte;
         bool reserved;

-        if (quickly_check_mmio_pf(vcpu, addr, direct))
+        if (mmio_info_in_cache(vcpu, addr, direct))
                 return RET_MMIO_PF_EMULATE;

         reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, &spte);
@@ -4354,19 +4354,12 @@ static void make_mmu_pages_available(struct kvm_vcpu *vcpu)
         kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 }

-static bool is_mmio_page_fault(struct kvm_vcpu *vcpu, gva_t addr)
-{
-        if (vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu))
-                return vcpu_match_mmio_gpa(vcpu, addr);
-
-        return vcpu_match_mmio_gva(vcpu, addr);
-}
-
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
                        void *insn, int insn_len)
 {
         int r, emulation_type = EMULTYPE_RETRY;
         enum emulation_result er;
+        bool direct = vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu);

         r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
         if (r < 0)
@@ -4377,7 +4370,7 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
                 goto out;
         }

-        if (is_mmio_page_fault(vcpu, cr2))
+        if (mmio_info_in_cache(vcpu, cr2, direct))
                 emulation_type = 0;

         er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
--
2.1.0
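[Editorial aside: a tiny sketch of the consolidated predicate's shape.
The match_cached_* helpers and their addresses are hypothetical stand-ins
for vcpu_match_mmio_gpa()/vcpu_match_mmio_gva(), which consult the vCPU's
MMIO cache in the real code.]

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t u64;

/* Hypothetical stand-ins for the two MMIO-cache match helpers. */
static bool match_cached_gpa(u64 addr) { return addr == 0xfee00000ull; }
static bool match_cached_gva(u64 addr) { return addr == 0x7f0000001000ull; }

/*
 * The consolidated shape: one u64 'addr' (it may hold either a guest
 * physical or a guest virtual address, hence u64 rather than gva_t)
 * and one explicit 'direct' parameter selecting which cache to match.
 */
static bool mmio_info_in_cache(u64 addr, bool direct)
{
        return direct ? match_cached_gpa(addr) : match_cached_gva(addr);
}

int main(void)
{
        printf("%d %d\n", mmio_info_in_cache(0xfee00000ull, true),
               mmio_info_in_cache(0xfee00000ull, false));
        return 0;
}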
[PATCH] KVM: x86: MMU: Use clear_page() instead of init_shadow_page_table()
Not just in order to clean up the code, but to make it faster by using
enhanced instructions: the initialization became 20-30% faster on our
testing machine.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a1a3d19..7f5a82b 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2041,14 +2041,6 @@ static void mmu_sync_children(struct kvm_vcpu *vcpu,
         }
 }

-static void init_shadow_page_table(struct kvm_mmu_page *sp)
-{
-        int i;
-
-        for (i = 0; i < PT64_ENT_PER_PAGE; ++i)
-                sp->spt[i] = 0ull;
-}
-
 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp)
 {
         sp->write_flooding_count = 0;
@@ -2128,7 +2120,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                 account_shadowed(vcpu->kvm, sp);
         }
         sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen;
-        init_shadow_page_table(sp);
+        clear_page(sp->spt);
         trace_kvm_mmu_get_page(sp, true);
         return sp;
 }
--
2.1.0
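[Editorial aside: a userspace analogue of the change.  clear_page() is a
kernel primitive, so memset() stands in for it here; the point is that a
single page-sized clear lets the implementation pick wide, optimized
stores instead of a 512-iteration scalar loop.]

#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096
#define PT64_ENT_PER_PAGE (PAGE_SIZE / sizeof(uint64_t))

/* The pattern being removed: clear one 64-bit entry at a time. */
static void init_table_slow(uint64_t *spt)
{
        for (size_t i = 0; i < PT64_ENT_PER_PAGE; ++i)
                spt[i] = 0ull;
}

/* The pattern being introduced: clear the whole page in one call. */
static void init_table_fast(uint64_t *spt)
{
        memset(spt, 0, PAGE_SIZE);      /* clear_page(spt) in the kernel */
}

int main(void)
{
        static uint64_t page[PT64_ENT_PER_PAGE];

        init_table_slow(page);
        init_table_fast(page);
        return 0;
}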
[PATCH V4 0/3] KVM: x86: MMU: Clean up x86's mmu code for future work - part2
Guests worked normally in shadow paging mode (ept=0) on my test machine.

Please check if the first two patches reflect what you meant correctly.

Takuya Yoshikawa (3):
  [1] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page()
      to link_shadow_page()
  [2] KVM: x86: MMU: Use for_each_rmap_spte macro instead of
      pte_list_walk()
  [3] KVM: x86: MMU: Remove unused parameter parent_pte from
      kvm_mmu_get_page()

 arch/x86/kvm/mmu.c         | 70 +++++++++++++++---------------------------
 arch/x86/kvm/paging_tmpl.h | 10 ++++------
 2 files changed, 26 insertions(+), 54 deletions(-)

--
2.1.0
[PATCH 1/3] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code because
parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
mmu_page_add_parent_pte().

In addition, the patch avoids calling mark_unsync() for other parents in
the sp->parent_ptes chain than the newly added parent_pte, because they
have been there since before the current page fault handling started.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
Cc: Xiao Guangrong <guangrong.x...@linux.intel.com>
---
 arch/x86/kvm/mmu.c         | 23 +++++++++++--------------
 arch/x86/kvm/paging_tmpl.h |  6 ++----
 2 files changed, 11 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..ec61b22 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2119,12 +2119,8 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                 if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
                         break;

-                mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-                if (sp->unsync_children) {
+                if (sp->unsync_children)
                         kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
-                        kvm_mmu_mark_parents_unsync(sp);
-                } else if (sp->unsync)
-                        kvm_mmu_mark_parents_unsync(sp);

                 __clear_sp_write_flooding_count(sp);
                 trace_kvm_mmu_get_page(sp, false);
@@ -2135,8 +2131,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,

         sp = kvm_mmu_alloc_page(vcpu, direct);

-        mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
         sp->gfn = gfn;
         sp->role = role;
         hlist_add_head(&sp->hash_link,
@@ -2204,7 +2198,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
         return __shadow_walk_next(iterator, *iterator->sptep);
 }

-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+                             struct kvm_mmu_page *sp)
 {
         u64 spte;

@@ -2215,6 +2210,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
                shadow_user_mask | shadow_x_mask | shadow_accessed_mask;

         mmu_spte_set(sptep, spte);
+
+        mmu_page_add_parent_pte(vcpu, sp, sptep);
+
+        if (sp->unsync_children || sp->unsync)
+                mark_unsync(sptep);
 }

 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2273,11 +2273,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
                 mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }

-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-        mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
         u64 *sptep;
@@ -2743,7 +2738,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
                                               iterator.level - 1,
                                               1, ACC_ALL, iterator.sptep);

-                        link_shadow_page(iterator.sptep, sp);
+                        link_shadow_page(vcpu, iterator.sptep, sp);
                 }
         }
         return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                         goto out_gpte_changed;

                 if (sp)
-                        link_shadow_page(it.sptep, sp);
+                        link_shadow_page(vcpu, it.sptep, sp);
         }

         for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                 sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
                                       true, direct_access, it.sptep);

-                link_shadow_page(it.sptep, sp);
+                link_shadow_page(vcpu, it.sptep, sp);
         }

         clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
         return emulate;

 out_gpte_changed:
-        if (sp)
-                kvm_mmu_put_page(sp, it.sptep);
         kvm_release_pfn_clean(pfn);
         return 0;
 }
--
2.1.0
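[Editorial aside: a compact model of the invariant the move establishes.
All types and fields below are hypothetical simplifications: the parent
entry is made present first, and only then is the back-pointer recorded,
so every pointer on the parent_ptes chain refers to a present entry.]

#include <assert.h>
#include <stdint.h>

#define PRESENT 1ull

struct shadow_page {
        uint64_t *parent_ptes[16];      /* simplified parent-pointer chain */
        int nr_parents;
};

/* Models link_shadow_page() after the patch: set the entry, then record
 * the parent; only the new parent would need mark_unsync(), since the
 * older ones were already marked when they were linked. */
static void link_shadow_page(uint64_t *sptep, struct shadow_page *sp)
{
        *sptep = PRESENT;                          /* mmu_spte_set() */
        sp->parent_ptes[sp->nr_parents++] = sptep; /* mmu_page_add_parent_pte() */
}

int main(void)
{
        struct shadow_page sp = { 0 };
        uint64_t parent_entry = 0;

        link_shadow_page(&parent_entry, &sp);
        /* Every recorded parent pointer now refers to a present entry. */
        for (int i = 0; i < sp.nr_parents; i++)
                assert(*sp.parent_ptes[i] & PRESENT);
        return 0;
}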
[PATCH 3/3] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 20 +++++++-------------
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 204c7d4..a1a3d19 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,8 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                                              gva_t gaddr,
                                              unsigned level,
                                              int direct,
-                                             unsigned access,
-                                             u64 *parent_pte)
+                                             unsigned access)
 {
         union kvm_mmu_page_role role;
         unsigned quadrant;
@@ -2720,8 +2719,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
                         base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
                         pseudo_gfn = base_addr >> PAGE_SHIFT;
                         sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-                                              iterator.level - 1,
-                                              1, ACC_ALL, iterator.sptep);
+                                              iterator.level - 1, 1, ACC_ALL);

                         link_shadow_page(vcpu, iterator.sptep, sp);
                 }
@@ -3078,8 +3076,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
         if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
                 spin_lock(&vcpu->kvm->mmu_lock);
                 make_mmu_pages_available(vcpu);
-                sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
-                                      1, ACC_ALL, NULL);
+                sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
                 ++sp->root_count;
                 spin_unlock(&vcpu->kvm->mmu_lock);
                 vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3091,9 +3088,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
                         spin_lock(&vcpu->kvm->mmu_lock);
                         make_mmu_pages_available(vcpu);
                         sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
-                                              i << 30,
-                                              PT32_ROOT_LEVEL, 1, ACC_ALL,
-                                              NULL);
+                                              i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
                         root = __pa(sp->spt);
                         ++sp->root_count;
                         spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3130,7 +3125,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                 spin_lock(&vcpu->kvm->mmu_lock);
                 make_mmu_pages_available(vcpu);
                 sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
-                                      0, ACC_ALL, NULL);
+                                      0, ACC_ALL);
                 root = __pa(sp->spt);
                 ++sp->root_count;
                 spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3163,9 +3158,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                 }
                 spin_lock(&vcpu->kvm->mmu_lock);
                 make_mmu_pages_available(vcpu);
-                sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
-                                      PT32_ROOT_LEVEL, 0,
-                                      ACC_ALL, NULL);
+                sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+                                      0, ACC_ALL);
                 root = __pa(sp->spt);
                 ++sp->root_count;
                 spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0dcf9c8..91e939b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                 if (!is_shadow_present_pte(*it.sptep)) {
                         table_gfn = gw->table_gfn[it.level - 2];
                         sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-                                              false, access, it.sptep);
+                                              false, access);
                 }

                 /*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                 direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);

                 sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-                                      true, direct_access, it.sptep);
+                                      true, direct_access);

                 link_shadow_page(vcpu, it.sptep, sp);
         }
--
2.1.0
[PATCH 2/3] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
As kvm_mmu_get_page() was changed so that every parent pointer would not
get into the sp->parent_ptes chain before the entry pointed to by it was
set properly, we can use the for_each_rmap_spte macro instead of
pte_list_walk().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
Cc: Xiao Guangrong <guangrong.x...@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 27 ++++++---------------------
 1 file changed, 6 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index ec61b22..204c7d4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
         }
 }

-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-        struct pte_list_desc *desc;
-        int i;
-
-        if (!rmap_head->val)
-                return;
-
-        if (!(rmap_head->val & 1))
-                return fn((u64 *)rmap_head->val);
-
-        desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-        while (desc) {
-                for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-                        fn(desc->sptes[i]);
-                desc = desc->more;
-        }
-}
-
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
                                            struct kvm_memory_slot *slot)
 {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-        pte_list_walk(&sp->parent_ptes, mark_unsync);
+        u64 *sptep;
+        struct rmap_iterator iter;
+
+        for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+                mark_unsync(sptep);
+        }
 }

 static void mark_unsync(u64 *spte)
--
2.1.0
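[Editorial aside: a minimal userspace contrast of the two traversal
styles, over a hypothetical linked-list type rather than the kernel's
rmap structures: the callback-based walker that is removed versus an
iterator macro in the style of for_each_rmap_spte, where the loop body is
written inline and shared checks can live inside the macro.]

#include <stdio.h>

struct node { unsigned long val; struct node *next; };

/* Callback style, as in the removed pte_list_walk(). */
typedef void (*walk_fn)(unsigned long val);
static void list_walk(struct node *head, walk_fn fn)
{
        for (struct node *n = head; n; n = n->next)
                fn(n->val);
}

/* Iterator-macro style, as in for_each_rmap_spte(). */
#define for_each_val(_head_, _n_) \
        for ((_n_) = (_head_); (_n_); (_n_) = (_n_)->next)

static void print_val(unsigned long val) { printf("%lu\n", val); }

int main(void)
{
        struct node c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
        struct node *n;

        list_walk(&a, print_val);       /* old style */
        for_each_val(&a, n)             /* new style */
                printf("%lu\n", n->val);
        return 0;
}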
Re: [PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
On 2015/11/26 1:32, Paolo Bonzini wrote:
> On 20/11/2015 09:57, Xiao Guangrong wrote:
>> You can move this patch to the front of
>> [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of
>> pte_list_walk()
>>
>> By moving kvm_mmu_mark_parents_unsync() to after mmu_spte_set() (the
>> parent spte is present by then), you can directly clean up
>> for_each_rmap_spte().
>
> So basically squash together the two patches (8/10 and 9/10) except the
> change to kvm_mmu_mark_parents_unsync; then in the second patch switch
> from pte_list_walk to for_each_rmap_spte.
>
> That makes sense indeed.

Sorry for being late to respond to Xiao's suggestions.  I could not use
my development machine for a while this week.

In short, this kvm_mmu_mark_parents_unsync() call in kvm_mmu_get_page()
should have been mark_unsync() for the new parent_pte only, because we
are constructing the mappings from/to it, and the other parents in the
sp->parent_ptes chain are not related to this fault?

As the code has been this way for some time, it is a bit scary to change
it, but I'll do some tests without that extra
kvm_mmu_mark_parents_unsync() with a guest (with ept=0) this afternoon.

        Takuya
Re: [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
On 2015/11/20 17:46, Xiao Guangrong wrote:
> You just ignored my comment on the previous version...

I'm sorry, but please read the explanation in patch 00.  I've read your
comments and I'm not ignoring you.

Since this patch set has become larger than expected, I'm sending this
version so that patches 01-07 can be applied first.

For patches 08-10, I think we need to check more because there seems to
be some confusion between us.  You can also read the other discussions
between Marcelo, Paolo and me.

Anyway, since these three patches have been placed at the end of the
series now, I hope we can concentrate on them more easily than before.

Thanks,
        Takuya
[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c         | 20 +++++++-------------
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b020323..9baf884 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,8 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                                              gva_t gaddr,
                                              unsigned level,
                                              int direct,
-                                             unsigned access,
-                                             u64 *parent_pte)
+                                             unsigned access)
 {
         union kvm_mmu_page_role role;
         unsigned quadrant;
@@ -2724,8 +2723,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
                         base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
                         pseudo_gfn = base_addr >> PAGE_SHIFT;
                         sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-                                              iterator.level - 1,
-                                              1, ACC_ALL, iterator.sptep);
+                                              iterator.level - 1, 1, ACC_ALL);

                         link_shadow_page(vcpu, iterator.sptep, sp);
                 }
@@ -3082,8 +3080,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
         if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
                 spin_lock(&vcpu->kvm->mmu_lock);
                 make_mmu_pages_available(vcpu);
-                sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
-                                      1, ACC_ALL, NULL);
+                sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
                 ++sp->root_count;
                 spin_unlock(&vcpu->kvm->mmu_lock);
                 vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3095,9 +3092,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
                         spin_lock(&vcpu->kvm->mmu_lock);
                         make_mmu_pages_available(vcpu);
                         sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
-                                              i << 30,
-                                              PT32_ROOT_LEVEL, 1, ACC_ALL,
-                                              NULL);
+                                              i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
                         root = __pa(sp->spt);
                         ++sp->root_count;
                         spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3134,7 +3129,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                 spin_lock(&vcpu->kvm->mmu_lock);
                 make_mmu_pages_available(vcpu);
                 sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
-                                      0, ACC_ALL, NULL);
+                                      0, ACC_ALL);
                 root = __pa(sp->spt);
                 ++sp->root_count;
                 spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3167,9 +3162,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
                 }
                 spin_lock(&vcpu->kvm->mmu_lock);
                 make_mmu_pages_available(vcpu);
-                sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
-                                      PT32_ROOT_LEVEL, 0,
-                                      ACC_ALL, NULL);
+                sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+                                      0, ACC_ALL);
                 root = __pa(sp->spt);
                 ++sp->root_count;
                 spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0dcf9c8..91e939b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                 if (!is_shadow_present_pte(*it.sptep)) {
                         table_gfn = gw->table_gfn[it.level - 2];
                         sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-                                              false, access, it.sptep);
+                                              false, access);
                 }

                 /*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                 direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);

                 sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-                                      true, direct_access, it.sptep);
+                                      true, direct_access);

                 link_shadow_page(vcpu, it.sptep, sp);
         }
--
2.1.0
[PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte
argument, link_shadow_page() follows that to set the parent entry so
that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code because
parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and
mmu_page_add_parent_pte().

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c         | 22 ++++++++--------------
 arch/x86/kvm/paging_tmpl.h |  6 ++----
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e29d9a..b020323 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2107,14 +2107,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                 if (sp->unsync_children) {
                         kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
                         kvm_mmu_mark_parents_unsync(sp);
-                        if (parent_pte)
-                                mark_unsync(parent_pte);
                 } else if (sp->unsync) {
                         kvm_mmu_mark_parents_unsync(sp);
-                        if (parent_pte)
-                                mark_unsync(parent_pte);
                 }
-                mmu_page_add_parent_pte(vcpu, sp, parent_pte);

                 __clear_sp_write_flooding_count(sp);
                 trace_kvm_mmu_get_page(sp, false);
@@ -2125,8 +2120,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,

         sp = kvm_mmu_alloc_page(vcpu, direct);

-        mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
         sp->gfn = gfn;
         sp->role = role;
         hlist_add_head(&sp->hash_link,
@@ -2194,7 +2187,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
         return __shadow_walk_next(iterator, *iterator->sptep);
 }

-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+                             struct kvm_mmu_page *sp)
 {
         u64 spte;

@@ -2205,6 +2199,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
                shadow_user_mask | shadow_x_mask | shadow_accessed_mask;

         mmu_spte_set(sptep, spte);
+
+        if (sp->unsync_children || sp->unsync)
+                mark_unsync(sptep);
+
+        mmu_page_add_parent_pte(vcpu, sp, sptep);
 }

 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2263,11 +2262,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
                 mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }

-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-        mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
         u64 *sptep;
@@ -2733,7 +2727,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
                                               iterator.level - 1,
                                               1, ACC_ALL, iterator.sptep);

-                        link_shadow_page(iterator.sptep, sp);
+                        link_shadow_page(vcpu, iterator.sptep, sp);
                 }
         }
         return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                         goto out_gpte_changed;

                 if (sp)
-                        link_shadow_page(it.sptep, sp);
+                        link_shadow_page(vcpu, it.sptep, sp);
         }

         for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
                 sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
                                       true, direct_access, it.sptep);

-                link_shadow_page(it.sptep, sp);
+                link_shadow_page(vcpu, it.sptep, sp);
         }

         clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
         return emulate;

 out_gpte_changed:
-        if (sp)
-                kvm_mmu_put_page(sp, it.sptep);
         kvm_release_pfn_clean(pfn);
         return 0;
 }
--
2.1.0
[PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does
nearly the same as the for_each_rmap_spte macro.  The only difference is
that the is_shadow_present_pte() checks cannot be placed there because
kvm_mmu_mark_parents_unsync() can be called with a new parent pointer
whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent
pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro
works with no problem.

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c | 36 +++++++++++++++---------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..4e29d9a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
         }
 }

-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-        struct pte_list_desc *desc;
-        int i;
-
-        if (!rmap_head->val)
-                return;
-
-        if (!(rmap_head->val & 1))
-                return fn((u64 *)rmap_head->val);
-
-        desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-        while (desc) {
-                for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-                        fn(desc->sptes[i]);
-                desc = desc->more;
-        }
-}
-
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
                                            struct kvm_memory_slot *slot)
 {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-        pte_list_walk(&sp->parent_ptes, mark_unsync);
+        u64 *sptep;
+        struct rmap_iterator iter;
+
+        for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+                mark_unsync(sptep);
+        }
 }

 static void mark_unsync(u64 *spte)
@@ -2119,12 +2104,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                 if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
                         break;

-                mmu_page_add_parent_pte(vcpu, sp, parent_pte);
                 if (sp->unsync_children) {
                         kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
                         kvm_mmu_mark_parents_unsync(sp);
+                        if (parent_pte)
+                                mark_unsync(parent_pte);
-                } else if (sp->unsync)
+                } else if (sp->unsync) {
                         kvm_mmu_mark_parents_unsync(sp);
+                        if (parent_pte)
+                                mark_unsync(parent_pte);
+                }
+                mmu_page_add_parent_pte(vcpu, sp, parent_pte);

                 __clear_sp_write_flooding_count(sp);
                 trace_kvm_mmu_get_page(sp, false);
--
2.1.0
[PATCH 07/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
Make kvm_mmu_alloc_page() do just what its name says, and remove the
extra allocation error check and zero-initialization of parent_ptes:
shadow page headers allocated by kmem_cache_zalloc() are always in the
per-VCPU pools.

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b249d4..7f46e3e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1726,8 +1726,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
         mmu_spte_clear_no_track(parent_pte);
 }

-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-                                               u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
         struct kvm_mmu_page *sp;

@@ -1743,8 +1742,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
          * this feature. See the comments in kvm_zap_obsolete_pages().
          */
         list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-        sp->parent_ptes.val = 0;
-        mmu_page_add_parent_pte(vcpu, sp, parent_pte);
         kvm_mod_used_mmu_pages(vcpu->kvm, +1);
         return sp;
 }
@@ -2133,10 +2130,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
                 trace_kvm_mmu_get_page(sp, false);
                 return sp;
         }
+
         ++vcpu->kvm->stat.mmu_cache_miss;
-        sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-        if (!sp)
-                return sp;
+
+        sp = kvm_mmu_alloc_page(vcpu, direct);
+
+        mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
         sp->gfn = gfn;
         sp->role = role;
         hlist_add_head(&sp->hash_link,
--
2.1.0
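[Editorial aside on the zero-initialization point: a userspace analogue,
since kmem_cache_zalloc() is kernel-only and calloc() plays its role
here.  Memory that arrives zeroed makes explicit field resets such as
sp->parent_ptes.val = 0 redundant; the struct below is hypothetical.]

#include <assert.h>
#include <stdlib.h>

struct page_header {
        unsigned long parent_ptes_val;  /* stands in for sp->parent_ptes.val */
        int write_flooding_count;
};

int main(void)
{
        /* calloc(), like kmem_cache_zalloc(), returns zeroed memory... */
        struct page_header *sp = calloc(1, sizeof(*sp));
        if (!sp)
                return 1;

        /* ...so no explicit 'sp->parent_ptes_val = 0;' is needed. */
        assert(sp->parent_ptes_val == 0);
        free(sp);
        return 0;
}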
[PATCH 06/10] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is placed right after the call to detect unrelated sptes which must not be found in the reverse-mapping list. Move this check in rmap_get_first/next() so that all call sites, not just the users of the for_each_rmap_spte() macro, will be checked the same way. One thing to keep in mind is that kvm_mmu_unlink_parents() also uses rmap_get_first() to handle parent sptes. The change will not break it because parent sptes are present, at least until drop_parent_pte() actually unlinks them, and not mmio-sptes. Signed-off-by: Takuya Yoshikawa --- Documentation/virtual/kvm/mmu.txt | 4 ++-- arch/x86/kvm/mmu.c| 26 +- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index 3a4d681..daf9c0f 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -203,10 +203,10 @@ Shadow pages contain the following information: page cannot be destroyed. See role.invalid. parent_ptes: The reverse mapping for the pte/ptes pointing at this page's spt. If -parent_ptes bit 0 is zero, only one spte points at this pages and +parent_ptes bit 0 is zero, only one spte points at this page and parent_ptes points at this single spte, otherwise, there exists multiple sptes pointing at this page and (parent_ptes & ~0x1) points at a data -structure with a list of parent_ptes. +structure with a list of parent sptes. unsync: If true, then the translations in this page may not match the guest's translation. This is equivalent to the state of the tlb when a pte is diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 3104748..5b249d4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1098,17 +1098,23 @@ struct rmap_iterator { static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, struct rmap_iterator *iter) { + u64 *sptep; + if (!rmap_head->val) return NULL; if (!(rmap_head->val & 1)) { iter->desc = NULL; - return (u64 *)rmap_head->val; + sptep = (u64 *)rmap_head->val; + goto out; } iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul); iter->pos = 0; - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; +out: + BUG_ON(!is_shadow_present_pte(*sptep)); + return sptep; } /* @@ -1118,14 +1124,14 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head, */ static u64 *rmap_get_next(struct rmap_iterator *iter) { + u64 *sptep; + if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { - u64 *sptep; - ++iter->pos; sptep = iter->desc->sptes[iter->pos]; if (sptep) - return sptep; + goto out; } iter->desc = iter->desc->more; @@ -1133,17 +1139,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) if (iter->desc) { iter->pos = 0; /* desc->sptes[0] cannot be NULL */ - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; + goto out; } } return NULL; +out: + BUG_ON(!is_shadow_present_pte(*sptep)); + return sptep; } #define for_each_rmap_spte(_rmap_head_, _iter_, _spte_) \ for (_spte_ = rmap_get_first(_rmap_head_, _iter_); \ -_spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \ -_spte_ = rmap_get_next(_iter_)) +_spte_; _spte_ = rmap_get_next(_iter_)) static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1358,7 +1367,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head) bool flush = false; while ((sptep = rmap_get_first(rmap_head, ))) { - BUG_ON(!(*sptep & PT_PRESENT_MASK)); rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep); drop_spte(kvm, 
sptep); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 05/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the simple reverse mapping was implemented by commit cd4a4e5374110444 ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point, its role was clear and only rmap_add() and rmap_remove() were using it to select sptes that need to be reverse-mapped. Independently of that, is_shadow_present_pte() was first introduced by commit c7addb902054195b ("KVM: Allow not-present guest page faults to bypass kvm") to do bypass_guest_pf optimization, which does not exist any more. These two seem to have changed their roles somewhat, and is_rmap_spte() just calls is_shadow_present_pte() now. Since using both of them without clear distinction just makes the code confusing, remove is_rmap_spte(). Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 13 - arch/x86/kvm/mmu_audit.c | 2 +- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 74c120c..3104748 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -311,11 +311,6 @@ static int is_large_pte(u64 pte) return pte & PT_PAGE_SIZE_MASK; } -static int is_rmap_spte(u64 pte) -{ - return is_shadow_present_pte(pte); -} - static int is_last_spte(u64 pte, int level) { if (level == PT_PAGE_TABLE_LEVEL) @@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) u64 old_spte = *sptep; bool ret = false; - WARN_ON(!is_rmap_spte(new_spte)); + WARN_ON(!is_shadow_present_pte(new_spte)); if (!is_shadow_present_pte(old_spte)) { mmu_spte_set(sptep, new_spte); @@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) else old_spte = __update_clear_spte_slow(sptep, 0ull); - if (!is_rmap_spte(old_spte)) + if (!is_shadow_present_pte(old_spte)) return 0; pfn = spte_to_pfn(old_spte); @@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); - if (is_rmap_spte(*sptep)) { + if (is_shadow_present_pte(*sptep)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. @@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, * If the mapping has been changed, let the vcpu fault on the * same address again. */ - if (!is_rmap_spte(spte)) { + if (!is_shadow_present_pte(spte)) { ret = true; goto exit; } diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c index f7b0488..1cee3ec 100644 --- a/arch/x86/kvm/mmu_audit.c +++ b/arch/x86/kvm/mmu_audit.c @@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp) return; for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { - if (!is_rmap_spte(sp->spt[i])) + if (!is_shadow_present_pte(sp->spt[i])) continue; inspect_spte_has_rmap(kvm, sp->spt + i); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 04/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed. Stop passing emulate pointer and make mmu_set_spte() return the emulate value instead to clean up this complex interface. Prefetch functions can just throw away the return value. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 27 ++- arch/x86/kvm/paging_tmpl.h | 10 +- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9832bc9..74c120c 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2564,13 +2564,13 @@ done: return ret; } -static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, -unsigned pte_access, int write_fault, int *emulate, -int level, gfn_t gfn, pfn_t pfn, bool speculative, -bool host_writable) +static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, +int write_fault, int level, gfn_t gfn, pfn_t pfn, +bool speculative, bool host_writable) { int was_rmapped = 0; int rmap_count; + bool emulate = false; pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); @@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) - *emulate = 1; + emulate = true; kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); } - if (unlikely(is_mmio_spte(*sptep) && emulate)) - *emulate = 1; + if (unlikely(is_mmio_spte(*sptep))) + emulate = true; pgprintk("%s: setting spte %llx\n", __func__, *sptep); pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n", @@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } kvm_release_pfn_clean(pfn); + + return emulate; } static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, return -1; for (i = 0; i < ret; i++, gfn++, start++) - mmu_set_spte(vcpu, start, access, 0, NULL, -sp->role.level, gfn, page_to_pfn(pages[i]), -true, true); + mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn, +page_to_pfn(pages[i]), true, true); return 0; } @@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { if (iterator.level == level) { - mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, -write, , level, gfn, pfn, -prefault, map_writable); + emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, + write, level, gfn, pfn, prefault, + map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index d8fdc5c..11650ea 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, * we call mmu_set_spte() with host_writable = true because * pte_prefetch_gfn_to_pfn always gets a writable pfn. 
*/ - mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL, -gfn, pfn, true, true); + mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn, +true, true); return true; } @@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct kvm_mmu_page *sp = NULL; struct kvm_shadow_walk_iterator it; unsigned direct_access, access = gw->pt_access; - int top_level, emulate = 0; + int top_level, emulate; direct_access = gw->pte_access; @@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, } clear_sp_write_flooding_count(it.sptep); - mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, , -it.level, gw->gfn, pfn, prefault, map_writable); +
[PATCH 03/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line code which clears a bit in the unsync child bitmap; the former places it inside a loop block and uses a few goto statements to jump to it. A new helper function, clear_unsync_child_bit(), makes the code cleaner. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 8a1593f..9832bc9 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1809,6 +1809,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, return (pvec->nr == KVM_PAGE_ARRAY_NR); } +static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx) +{ + --sp->unsync_children; + WARN_ON((int)sp->unsync_children < 0); + __clear_bit(idx, sp->unsync_child_bitmap); +} + static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_pages *pvec) { @@ -1818,8 +1825,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_page *child; u64 ent = sp->spt[i]; - if (!is_shadow_present_pte(ent) || is_large_pte(ent)) - goto clear_child_bitmap; + if (!is_shadow_present_pte(ent) || is_large_pte(ent)) { + clear_unsync_child_bit(sp, i); + continue; + } child = page_header(ent & PT64_BASE_ADDR_MASK); @@ -1828,28 +1837,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, return -ENOSPC; ret = __mmu_unsync_walk(child, pvec); - if (!ret) - goto clear_child_bitmap; - else if (ret > 0) + if (!ret) { + clear_unsync_child_bit(sp, i); + continue; + } else if (ret > 0) { nr_unsync_leaf += ret; - else + } else return ret; } else if (child->unsync) { nr_unsync_leaf++; if (mmu_pages_add(pvec, child, i)) return -ENOSPC; } else -goto clear_child_bitmap; - - continue; - -clear_child_bitmap: - __clear_bit(i, sp->unsync_child_bitmap); - sp->unsync_children--; - WARN_ON((int)sp->unsync_children < 0); + clear_unsync_child_bit(sp, i); } - return nr_unsync_leaf; } @@ -2012,9 +2014,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents) if (!sp) return; - --sp->unsync_children; - WARN_ON((int)sp->unsync_children < 0); - __clear_bit(idx, sp->unsync_child_bitmap); + clear_unsync_child_bit(sp, idx); level++; } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 02/10] KVM: x86: MMU: Remove unused parameter of __direct_map()
Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index d9a6801..8a1593f 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) __direct_pte_prefetch(vcpu, sp, sptep); } -static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, - int map_writable, int level, gfn_t gfn, pfn_t pfn, - bool prefault) +static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, + int level, gfn_t gfn, pfn_t pfn, bool prefault) { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; @@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, make_mmu_pages_available(vcpu); if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, , , ); - r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn, -prefault); + r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault); spin_unlock(>kvm->mmu_lock); - return r; out_unlock: @@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, make_mmu_pages_available(vcpu); if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, , , ); - r = __direct_map(vcpu, gpa, write, map_writable, -level, gfn, pfn, prefault); + r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault); spin_unlock(>kvm->mmu_lock); return r; -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 01/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: Takuya Yoshikawa --- arch/x86/include/asm/kvm_host.h | 8 +- arch/x86/kvm/mmu.c | 196 arch/x86/kvm/mmu_audit.c| 13 +-- 3 files changed, 113 insertions(+), 104 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index f608e17..8140077 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -214,6 +214,10 @@ union kvm_mmu_page_role { }; }; +struct kvm_rmap_head { + unsigned long val; +}; + struct kvm_mmu_page { struct list_head link; struct hlist_node hash_link; @@ -231,7 +235,7 @@ struct kvm_mmu_page { bool unsync; int root_count; /* Currently serving as active root */ unsigned int unsync_children; - unsigned long parent_ptes; /* Reverse mapping for parent_pte */ + struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ /* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */ unsigned long mmu_valid_gen; @@ -606,7 +610,7 @@ struct kvm_lpage_info { }; struct kvm_arch_memory_slot { - unsigned long *rmap[KVM_NR_PAGE_SIZES]; + struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES]; struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1]; }; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 276d2f2..d9a6801 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -909,36 +909,35 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, } /* - * Pte mapping structures: + * About rmap_head encoding: * - * If pte_list bit zero is zero, then pte_list point to the spte. - * - * If pte_list bit zero is one, (then pte_list & ~1) points to a struct + * If the bit zero of rmap_head->val is clear, then it points to the only spte + * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct * pte_list_desc containing more mappings. - * - * Returns the number of pte entries before the spte was added or zero if - * the spte was not added. - * + */ + +/* + * Returns the number of pointers in the rmap chain, not counting the new one. 
*/ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, - unsigned long *pte_list) + struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; int i, count = 0; - if (!*pte_list) { + if (!rmap_head->val) { rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte); - *pte_list = (unsigned long)spte; - } else if (!(*pte_list & 1)) { + rmap_head->val = (unsigned long)spte; + } else if (!(rmap_head->val & 1)) { rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte); desc = mmu_alloc_pte_list_desc(vcpu); - desc->sptes[0] = (u64 *)*pte_list; + desc->sptes[0] = (u64 *)rmap_head->val; desc->sptes[1] = spte; - *pte_list = (unsigned long)desc | 1; + rmap_head->val = (unsigned long)desc | 1; ++count; } else { rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte); - desc = (struct pte_list_desc *)(*pte_list & ~1ul); + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul); while (desc->sptes[PTE_LIST_EXT-1] && desc->more) { desc = desc->more; count += PTE_LIST_EXT; @@ -955,8 +954,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, } static void -pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc, - int i, struct pte_list_desc *prev_desc) +pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head, + struct pte_list_desc *desc, int i, + struct pte_list_desc *prev_desc) { int j; @@ -967,43 +967,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc, if (j != 0) return; if (!prev_desc && !desc->more) - *pte_list = (unsigned long)desc->sptes[0]; + rmap_head->val = (unsigned long)desc->sptes[0]; else if (prev_desc) prev_desc->more = desc->more; else - *pte_list = (unsigned long)desc->more | 1; + rmap_head->val = (unsigned long)desc->more | 1; mmu_free_pte_list_desc(desc); } -static void pte_list_remove(u64 *spte, unsigned long *pte_list) +static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head) { struct pte_lis
[PATCH V3 00/10] KVM: x86: MMU: Clean up x86's mmu code for future work
It seems like you all are busy now, so I've made this patch set so that mechanical and trivial changes come before. V2->V3: Patch 01: Rebased and moved here. Updated stale comments. We may also want to use a union, inside the struct, to eliminate casting to (u64 *) type when spte is in the head in the future. Patch 02-05: No change. About patch 03: There was a comment on the usage of braces for a single line else-if statement from Xiao. As I answered, checkpatch did not complain about this, and when the corresponding if block has multiple lines, some developers prefer/recommend this style. Feel free to modify it if you don't like it. Patch 06: Changed WARN_ON to BUG_ON as Marcelo suggested. Patch 07: Removed unnecessary zero-initialization of sp->parent_ptes as Xiao suggested. I think these seven patches are ready for inclusion. Patch 08-10: No change now, though there were a few comments. This patch set is not intended to optimize anything, so these patches try to keep the way mark_unsync() gets called as much as possible: the only changes are when this gets called for the new parent_pte and when mmu_page_add_parent_pte() gets called. For these three, I'm not sure what we should do now, still RFC? We can also consider other approaches, e.g. moving link_shadow_page() in the kvm_get_mmu_page() as Paolo suggested before. Takuya Takuya Yoshikawa (10): [01] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct [02] KVM: x86: MMU: Remove unused parameter of __direct_map() [03] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap [04] KVM: x86: MMU: Make mmu_set_spte() return emulate value [05] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() [06] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes [07] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page() [08] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk() [09] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page() [10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page() Documentation/virtual/kvm/mmu.txt | 4 +- arch/x86/include/asm/kvm_host.h | 8 +- arch/x86/kvm/mmu.c| 370 ++ arch/x86/kvm/mmu_audit.c | 15 +- arch/x86/kvm/paging_tmpl.h| 20 +-- 5 files changed, 201 insertions(+), 216 deletions(-) -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
On 2015/11/20 17:46, Xiao Guangrong wrote:
> You just ignored my comment on the previous version...

I'm sorry, but please read the explanation in patch 00. I've read your comments and I'm not ignoring you.

Since this patch set has become larger than expected, I'm sending this version so that patches 01-07 can be applied first. For patches 08-10, I think we need to check more because there seems to be some confusion between us. You can also read the other discussions between Marcelo, Paolo and me.

Anyway, since these three patches have been placed at the end of the series now, I hope we can concentrate on them more easily than before.

Thanks,
	Takuya
[PATCH V3 00/10] KVM: x86: MMU: Clean up x86's mmu code for future work
It seems like you all are busy now, so I've made this patch set so that mechanical and trivial changes come before.

V2->V3:

Patch 01: Rebased and moved here. Updated stale comments.
  We may also want to use a union, inside the struct, to eliminate casting to (u64 *) type when the spte is in the head in the future.

Patch 02-05: No change.
  About patch 03: There was a comment on the usage of braces for a single line else-if statement from Xiao. As I answered, checkpatch did not complain about this, and when the corresponding if block has multiple lines, some developers prefer/recommend this style. Feel free to modify it if you don't like it.

Patch 06: Changed WARN_ON to BUG_ON as Marcelo suggested.

Patch 07: Removed unnecessary zero-initialization of sp->parent_ptes as Xiao suggested.

I think these seven patches are ready for inclusion.

Patch 08-10: No change now, though there were a few comments.
  This patch set is not intended to optimize anything, so these patches try to keep the way mark_unsync() gets called as much as possible: the only changes are when it gets called for the new parent_pte and when mmu_page_add_parent_pte() gets called.
  For these three, I'm not sure what we should do now, still RFC? We can also consider other approaches, e.g. moving link_shadow_page() into kvm_mmu_get_page() as Paolo suggested before.

	Takuya

Takuya Yoshikawa (10):
  [01] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  [02] KVM: x86: MMU: Remove unused parameter of __direct_map()
  [03] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  [04] KVM: x86: MMU: Make mmu_set_spte() return emulate value
  [05] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  [06] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes
  [07] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
  [08] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  [09] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  [10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c                | 370 ++
 arch/x86/kvm/mmu_audit.c          |  15 +-
 arch/x86/kvm/paging_tmpl.h        |  20 +--
 5 files changed, 201 insertions(+), 216 deletions(-)

--
2.1.0
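For reference, the union floated for patch 01 could look roughly like the sketch below. This is an illustration only, not code from this series, and the accessor name is hypothetical; the point is that the single-spte case could then be read without an explicit cast:

	struct kvm_rmap_head {
		union {
			unsigned long val;	/* tagged: bit 0 set means desc list */
			u64 *sptep;		/* valid only while bit 0 is clear */
		};
	};

	/* hypothetical helper: read the single-spte case without casting */
	static inline u64 *rmap_head_sptep(struct kvm_rmap_head *rmap_head)
	{
		return (rmap_head->val & 1) ? NULL : rmap_head->sptep;
	}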
[PATCH 01/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
New struct kvm_rmap_head makes the code type-safe to some extent.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/include/asm/kvm_host.h |   8 +-
 arch/x86/kvm/mmu.c              | 196
 arch/x86/kvm/mmu_audit.c        |  13 +--
 3 files changed, 113 insertions(+), 104 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f608e17..8140077 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -214,6 +214,10 @@ union kvm_mmu_page_role {
 	};
 };
 
+struct kvm_rmap_head {
+	unsigned long val;
+};
+
 struct kvm_mmu_page {
 	struct list_head link;
 	struct hlist_node hash_link;
@@ -231,7 +235,7 @@ struct kvm_mmu_page {
 	bool unsync;
 	int root_count;          /* Currently serving as active root */
 	unsigned int unsync_children;
-	unsigned long parent_ptes;	/* Reverse mapping for parent_pte */
+	struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */
 
 	/* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */
 	unsigned long mmu_valid_gen;
@@ -606,7 +610,7 @@ struct kvm_lpage_info {
 };
 
 struct kvm_arch_memory_slot {
-	unsigned long *rmap[KVM_NR_PAGE_SIZES];
+	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
 };
 
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 276d2f2..d9a6801 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -909,36 +909,35 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 }
 
 /*
- * Pte mapping structures:
+ * About rmap_head encoding:
  *
- * If pte_list bit zero is zero, then pte_list point to the spte.
- *
- * If pte_list bit zero is one, (then pte_list & ~1) points to a struct
+ * If the bit zero of rmap_head->val is clear, then it points to the only spte
+ * in this rmap chain. Otherwise, (rmap_head->val & ~1) points to a struct
  * pte_list_desc containing more mappings.
- *
- * Returns the number of pte entries before the spte was added or zero if
- * the spte was not added.
- *
+ */
+
+/*
+ * Returns the number of pointers in the rmap chain, not counting the new one.
  */
 static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
-			unsigned long *pte_list)
+			struct kvm_rmap_head *rmap_head)
 {
 	struct pte_list_desc *desc;
 	int i, count = 0;
 
-	if (!*pte_list) {
+	if (!rmap_head->val) {
 		rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte);
-		*pte_list = (unsigned long)spte;
-	} else if (!(*pte_list & 1)) {
+		rmap_head->val = (unsigned long)spte;
+	} else if (!(rmap_head->val & 1)) {
 		rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte);
 		desc = mmu_alloc_pte_list_desc(vcpu);
-		desc->sptes[0] = (u64 *)*pte_list;
+		desc->sptes[0] = (u64 *)rmap_head->val;
 		desc->sptes[1] = spte;
-		*pte_list = (unsigned long)desc | 1;
+		rmap_head->val = (unsigned long)desc | 1;
 		++count;
 	} else {
 		rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte);
-		desc = (struct pte_list_desc *)(*pte_list & ~1ul);
+		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 		while (desc->sptes[PTE_LIST_EXT-1] && desc->more) {
 			desc = desc->more;
 			count += PTE_LIST_EXT;
@@ -955,8 +954,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte,
 }
 
 static void
-pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
-			   int i, struct pte_list_desc *prev_desc)
+pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head,
+			   struct pte_list_desc *desc, int i,
+			   struct pte_list_desc *prev_desc)
 {
 	int j;
 
@@ -967,43 +967,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc,
 	if (j != 0)
 		return;
 	if (!prev_desc && !desc->more)
-		*pte_list = (unsigned long)desc->sptes[0];
+		rmap_head->val = (unsigned long)desc->sptes[0];
 	else
 		if (prev_desc)
 			prev_desc->more = desc->more;
 		else
-			*pte_list = (unsigned long)desc->more | 1;
+			rmap_head->val = (unsigned long)desc->more | 1;
 	mmu_free_pte_list_desc(desc);
 }
 
-static void pte_list_remove(u64 *spte, unsigned long *pte_list)
+static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 {
 	struct pte_lis
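To make the encoding above concrete, here is a stand-alone user-space model (an illustration only, not kernel code; the kernel's PTE_LIST_EXT and struct layout are mirrored) that walks a chain the same way the kernel does, using bit 0 of val to distinguish a single spte pointer from a tagged pointer to a descriptor list:

	#include <stdio.h>
	#include <stdint.h>

	#define PTE_LIST_EXT 3			/* descriptor capacity */

	struct pte_list_desc {
		uint64_t *sptes[PTE_LIST_EXT];
		struct pte_list_desc *more;
	};

	struct kvm_rmap_head {
		unsigned long val;
	};

	static void walk_rmap(struct kvm_rmap_head *rmap_head)
	{
		struct pte_list_desc *desc;
		int i;

		if (!rmap_head->val)
			return;				/* empty chain */

		if (!(rmap_head->val & 1)) {		/* bit 0 clear: one spte */
			printf("spte %p\n", (void *)rmap_head->val);
			return;
		}

		/* bit 0 set: mask it off to reach the descriptor list */
		desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
		for (; desc; desc = desc->more)
			for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
				printf("spte %p\n", (void *)desc->sptes[i]);
	}

	int main(void)
	{
		uint64_t spte;
		struct kvm_rmap_head head = { .val = (unsigned long)&spte };

		walk_rmap(&head);	/* prints the address of the single spte */
		return 0;
	}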
[PATCH 02/10] KVM: x86: MMU: Remove unused parameter of __direct_map()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index d9a6801..8a1593f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int map_writable, int level, gfn_t gfn, pfn_t pfn,
-			bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+			int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-			 prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
-
 	return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable,
-			 level, gfn, pfn, prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
--
2.1.0
[PATCH 03/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three lines of code which clear a bit in the unsync child bitmap; the former places them inside a loop block and uses a few goto statements to jump to them.

A new helper function, clear_unsync_child_bit(), makes the code cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++++++++++++++++++--------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 8a1593f..9832bc9 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1809,6 +1809,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 	return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+	--sp->unsync_children;
+	WARN_ON((int)sp->unsync_children < 0);
+	__clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
@@ -1818,8 +1825,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
-		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-			goto clear_child_bitmap;
+		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+			clear_unsync_child_bit(sp, i);
+			continue;
+		}
 
 		child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1828,28 +1837,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				return -ENOSPC;
 
 			ret = __mmu_unsync_walk(child, pvec);
-			if (!ret)
-				goto clear_child_bitmap;
-			else if (ret > 0)
+			if (!ret) {
+				clear_unsync_child_bit(sp, i);
+				continue;
+			} else if (ret > 0) {
 				nr_unsync_leaf += ret;
-			else
+			} else
 				return ret;
 		} else if (child->unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 		} else
-			goto clear_child_bitmap;
-
-		continue;
-
-clear_child_bitmap:
-		__clear_bit(i, sp->unsync_child_bitmap);
-		sp->unsync_children--;
-		WARN_ON((int)sp->unsync_children < 0);
+			clear_unsync_child_bit(sp, i);
 	}
-
 	return nr_unsync_leaf;
 }
 
@@ -2012,9 +2014,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		if (!sp)
 			return;
 
-		--sp->unsync_children;
-		WARN_ON((int)sp->unsync_children < 0);
-		__clear_bit(idx, sp->unsync_child_bitmap);
+		clear_unsync_child_bit(sp, idx);
 		level++;
 	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
--
2.1.0
[PATCH 05/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the simple reverse mapping was implemented by commit cd4a4e5374110444 ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point, its role was clear and only rmap_add() and rmap_remove() were using it to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by commit c7addb902054195b ("KVM: Allow not-present guest page faults to bypass kvm") to do bypass_guest_pf optimization, which does not exist any more.

These two seem to have changed their roles somewhat, and is_rmap_spte() just calls is_shadow_present_pte() now.

Since using both of them without clear distinction just makes the code confusing, remove is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c       | 13 -------------
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 74c120c..3104748 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
 	return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-	return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
 	if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	u64 old_spte = *sptep;
 	bool ret = false;
 
-	WARN_ON(!is_rmap_spte(new_spte));
+	WARN_ON(!is_shadow_present_pte(new_spte));
 
 	if (!is_shadow_present_pte(old_spte)) {
 		mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 	else
 		old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-	if (!is_rmap_spte(old_spte))
+	if (!is_shadow_present_pte(old_spte))
 		return 0;
 
 	pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
-	if (is_rmap_spte(*sptep)) {
+	if (is_shadow_present_pte(*sptep)) {
 		/*
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 		 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	 * If the mapping has been changed, let the vcpu fault on the
 	 * same address again.
 	 */
-	if (!is_rmap_spte(spte)) {
+	if (!is_shadow_present_pte(spte)) {
 		ret = true;
 		goto exit;
 	}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index f7b0488..1cee3ec 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-		if (!is_rmap_spte(sp->spt[i]))
+		if (!is_shadow_present_pte(sp->spt[i]))
 			continue;
 
 		inspect_spte_has_rmap(kvm, sp->spt + i);
--
2.1.0
[PATCH 04/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed.

Stop passing the emulate pointer and make mmu_set_spte() return the emulate value instead, to clean up this complex interface. Prefetch functions can just throw away the return value.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 27 ++++++++++++++-------------
 arch/x86/kvm/paging_tmpl.h | 10 +++++-----
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9832bc9..74c120c 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
 	return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			 unsigned pte_access, int write_fault, int *emulate,
-			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
-			 bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
+			 int write_fault, int level, gfn_t gfn, pfn_t pfn,
+			 bool speculative, bool host_writable)
 {
 	int was_rmapped = 0;
 	int rmap_count;
+	bool emulate = false;
 
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
 	      true, host_writable)) {
 		if (write_fault)
-			*emulate = 1;
+			emulate = true;
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	}
 
-	if (unlikely(is_mmio_spte(*sptep) && emulate))
-		*emulate = 1;
+	if (unlikely(is_mmio_spte(*sptep)))
+		emulate = true;
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 
 	kvm_release_pfn_clean(pfn);
+
+	return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;
 
 	for (i = 0; i < ret; i++, gfn++, start++)
-		mmu_set_spte(vcpu, start, access, 0, NULL,
-			     sp->role.level, gfn, page_to_pfn(pages[i]),
-			     true, true);
+		mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+			     page_to_pfn(pages[i]), true, true);
 
 	return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
-			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-				     write, &emulate, level, gfn, pfn,
-				     prefault, map_writable);
+			emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+					       write, level, gfn, pfn, prefault,
+					       map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
 			break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index d8fdc5c..11650ea 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-		     gfn, pfn, true, true);
+	mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+		     true, true);
 
 	return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
-	int top_level, emulate = 0;
+	int top_level, emulate;
 
 	direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-		     it.level, gw->gfn, pfn, prefault, map_writable);
+
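The interface change in patch 04, distilled into a before/after pair (fragments mirroring the diff above, not compilable on their own):

	/* before: result written through an int pointer, NULL at some sites */
	int emulate = 0;
	mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, write, &emulate,
		     level, gfn, pfn, prefault, map_writable);

	/* after: result flows back as the return value, no NULL-check needed */
	emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, write,
			       level, gfn, pfn, prefault, map_writable);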
[PATCH 06/10] KVM: x86: MMU: Consolidate BUG_ON checks for reverse-mapped sptes
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is placed right after the call to detect unrelated sptes which must not be found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not just the users of the for_each_rmap_spte() macro, will be checked the same way.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses rmap_get_first() to handle parent sptes. The change will not break it because parent sptes are present, at least until drop_parent_pte() actually unlinks them, and not mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c                | 26 +++++++++++++++++---------
 2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
     page cannot be destroyed.  See role.invalid.
   parent_ptes:
     The reverse mapping for the pte/ptes pointing at this page's spt. If
-    parent_ptes bit 0 is zero, only one spte points at this pages and
+    parent_ptes bit 0 is zero, only one spte points at this page and
     parent_ptes points at this single spte, otherwise, there exists multiple
     sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-    structure with a list of parent_ptes.
+    structure with a list of parent sptes.
   unsync:
     If true, then the translations in this page may not match the guest's
     translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 3104748..5b249d4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1098,17 +1098,23 @@ struct rmap_iterator {
 static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
 			   struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (!rmap_head->val)
 		return NULL;
 
 	if (!(rmap_head->val & 1)) {
 		iter->desc = NULL;
-		return (u64 *)rmap_head->val;
+		sptep = (u64 *)rmap_head->val;
+		goto out;
 	}
 
 	iter->desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
 	iter->pos = 0;
-	return iter->desc->sptes[iter->pos];
+	sptep = iter->desc->sptes[iter->pos];
+out:
+	BUG_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 /*
@@ -1118,14 +1124,14 @@ static u64 *rmap_get_first(struct kvm_rmap_head *rmap_head,
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (iter->desc) {
 		if (iter->pos < PTE_LIST_EXT - 1) {
-			u64 *sptep;
-
 			++iter->pos;
 			sptep = iter->desc->sptes[iter->pos];
 			if (sptep)
-				return sptep;
+				goto out;
 		}
 
 		iter->desc = iter->desc->more;
@@ -1133,17 +1139,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 		if (iter->desc) {
 			iter->pos = 0;
 			/* desc->sptes[0] cannot be NULL */
-			return iter->desc->sptes[iter->pos];
+			sptep = iter->desc->sptes[iter->pos];
+			goto out;
 		}
 	}
 
 	return NULL;
+out:
+	BUG_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_head_, _iter_, _spte_)			\
 	for (_spte_ = rmap_get_first(_rmap_head_, _iter_);		\
-	     _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});	\
-	     _spte_ = rmap_get_next(_iter_))
+	     _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1358,7 +1367,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
 	bool flush = false;
 
 	while ((sptep = rmap_get_first(rmap_head, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
 		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
--
2.1.0
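With the BUG_ON moved into the iterator, the macro body shrinks and a typical walker reduces to just the loop. A sketch modeled on the existing write-protection callers in mmu.c:

	u64 *sptep;
	struct rmap_iterator iter;
	bool flush = false;

	for_each_rmap_spte(rmap_head, &iter, sptep)
		flush |= spte_write_protect(kvm, sptep, pt_protect);

Every spte handed to the loop body has already passed the presence check inside rmap_get_first()/rmap_get_next(), so callers that use the functions directly get the same sanity checking as macro users.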
[PATCH 07/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
Make kvm_mmu_alloc_page() do just what its name tells it to do, and remove the extra allocation error check and the zero-initialization of parent_ptes: shadow page headers allocated by kmem_cache_zalloc() are always in the per-VCPU pools, hence already zeroed.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 5b249d4..7f46e3e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1726,8 +1726,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-					       u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1743,8 +1742,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	 * this feature. See the comments in kvm_zap_obsolete_pages().
 	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	sp->parent_ptes.val = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
 }
@@ -2133,10 +2130,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		trace_kvm_mmu_get_page(sp, false);
 		return sp;
 	}
+
 	++vcpu->kvm->stat.mmu_cache_miss;
-	sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-	if (!sp)
-		return sp;
+
+	sp = kvm_mmu_alloc_page(vcpu, direct);
+
+	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link,
--
2.1.0
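Why the zero-initialization is redundant, in one line (a sketch of the existing allocation path for context, not new code being proposed):

	/* objects in this cache come from kmem_cache_zalloc(), so the
	 * returned header, including sp->parent_ptes.val, is already 0 */
	sp = mmu_memory_cache_alloc(&vcpu->arch.mmu_page_header_cache);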
[PATCH 08/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does nearly the same as the for_each_rmap_spte macro. The only difference is that is_shadow_present_pte() checks cannot be placed there, because kvm_mmu_mark_parents_unsync() can be called with a new parent pointer whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro works with no problem.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 +++++++++++-------------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7f46e3e..4e29d9a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head)
 	}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(struct kvm_rmap_head *rmap_head, pte_list_walk_fn fn)
-{
-	struct pte_list_desc *desc;
-	int i;
-
-	if (!rmap_head->val)
-		return;
-
-	if (!(rmap_head->val & 1))
-		return fn((u64 *)rmap_head->val);
-
-	desc = (struct pte_list_desc *)(rmap_head->val & ~1ul);
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-			fn(desc->sptes[i]);
-		desc = desc->more;
-	}
-}
-
 static struct kvm_rmap_head *__gfn_to_rmap(gfn_t gfn, int level,
 					   struct kvm_memory_slot *slot)
 {
@@ -1749,7 +1729,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+		mark_unsync(sptep);
+	}
 }
 
 static void mark_unsync(u64 *spte)
@@ -2119,12 +2104,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
 			break;
 
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-		} else if (sp->unsync)
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		}
+		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
--
2.1.0
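The ordering constraint described in the changelog, condensed (a sketch mirroring the diff above):

	/*
	 * The new parent entry is not present yet, so the rmap walk in
	 * kvm_mmu_mark_parents_unsync() cannot reach it; handle it by
	 * hand, and only then add it to the parent_ptes chain.
	 */
	kvm_mmu_mark_parents_unsync(sp);	/* walks present entries only */
	if (parent_pte)
		mark_unsync(parent_pte);	/* the not-yet-linked parent */
	mmu_page_add_parent_pte(vcpu, sp, parent_pte);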
[PATCH 09/10] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte argument, link_shadow_page() follows that to set the parent entry so that the new mapping will point to the returned page table.

Moving parent_pte handling there allows us to clean up the code, because parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and mmu_page_add_parent_pte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 22 ++++++++--------------
 arch/x86/kvm/paging_tmpl.h |  6 ++----
 2 files changed, 10 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 4e29d9a..b020323 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2107,14 +2107,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-			if (parent_pte)
-				mark_unsync(parent_pte);
 		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
-			if (parent_pte)
-				mark_unsync(parent_pte);
 		}
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
@@ -2125,8 +2120,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 
 	sp = kvm_mmu_alloc_page(vcpu, direct);
 
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
-
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link,
@@ -2194,7 +2187,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator)
 	return __shadow_walk_next(iterator, *iterator->sptep);
 }
 
-static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
+static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep,
+			     struct kvm_mmu_page *sp)
 {
 	u64 spte;
 
@@ -2205,6 +2199,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp)
 	       shadow_user_mask | shadow_x_mask | shadow_accessed_mask;
 
 	mmu_spte_set(sptep, spte);
+
+	if (sp->unsync_children || sp->unsync)
+		mark_unsync(sptep);
+
+	mmu_page_add_parent_pte(vcpu, sp, sptep);
 }
 
 static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep,
@@ -2263,11 +2262,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm,
 		mmu_page_zap_pte(kvm, sp, sp->spt + i);
 }
 
-static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte)
-{
-	mmu_page_remove_parent_pte(sp, parent_pte);
-}
-
 static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp)
 {
 	u64 *sptep;
@@ -2733,7 +2727,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 					      iterator.level - 1,
 					      1, ACC_ALL, iterator.sptep);
 
-			link_shadow_page(iterator.sptep, sp);
+			link_shadow_page(vcpu, iterator.sptep, sp);
 		}
 	}
 	return emulate;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 11650ea..0dcf9c8 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 			goto out_gpte_changed;
 
 		if (sp)
-			link_shadow_page(it.sptep, sp);
+			link_shadow_page(vcpu, it.sptep, sp);
 	}
 
 	for (;
@@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
 				      true, direct_access, it.sptep);
-		link_shadow_page(it.sptep, sp);
+		link_shadow_page(vcpu, it.sptep, sp);
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
@@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	return emulate;
 
 out_gpte_changed:
-	if (sp)
-		kvm_mmu_put_page(sp, it.sptep);
 	kvm_release_pfn_clean(pfn);
 	return 0;
 }
--
2.1.0
[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 20 +++-
 arch/x86/kvm/paging_tmpl.h |  4 ++--
 2 files changed, 9 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b020323..9baf884 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2071,8 +2071,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 					     gva_t gaddr,
 					     unsigned level,
 					     int direct,
-					     unsigned access,
-					     u64 *parent_pte)
+					     unsigned access)
 {
 	union kvm_mmu_page_role role;
 	unsigned quadrant;
@@ -2724,8 +2723,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 			base_addr &= PT64_LVL_ADDR_MASK(iterator.level);
 			pseudo_gfn = base_addr >> PAGE_SHIFT;
 			sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
-					      iterator.level - 1,
-					      1, ACC_ALL, iterator.sptep);
+					      iterator.level - 1, 1, ACC_ALL);
 
 			link_shadow_page(vcpu, iterator.sptep, sp);
 		}
@@ -3082,8 +3080,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) {
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL,
-				      1, ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
 		vcpu->arch.mmu.root_hpa = __pa(sp->spt);
@@ -3095,9 +3092,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu)
 			spin_lock(&vcpu->kvm->mmu_lock);
 			make_mmu_pages_available(vcpu);
 			sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT),
-					      i << 30,
-					      PT32_ROOT_LEVEL, 1, ACC_ALL,
-					      NULL);
+					      i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL);
 			root = __pa(sp->spt);
 			++sp->root_count;
 			spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3134,7 +3129,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
 		sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL,
-				      0, ACC_ALL, NULL);
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
@@ -3167,9 +3162,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 		}
 		spin_lock(&vcpu->kvm->mmu_lock);
 		make_mmu_pages_available(vcpu);
-		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30,
-				      PT32_ROOT_LEVEL, 0,
-				      ACC_ALL, NULL);
+		sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL,
+				      0, ACC_ALL);
 		root = __pa(sp->spt);
 		++sp->root_count;
 		spin_unlock(&vcpu->kvm->mmu_lock);
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 0dcf9c8..91e939b 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1,
-					      false, access, it.sptep);
+					      false, access);
 		}
 
 		/*
@@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 		direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
 
 		sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1,
-				      true, direct_access, it.sptep);
+				      true, direct_access);
 		link_shadow_page(vcpu, it.sptep, sp)
Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
On 2015/11/19 11:46, Xiao Guangrong wrote:
>> Actually, some people prefer to put braces when one of the
>> if/else-if/else cases has multiple lines. You can see some examples
>> in kernel/sched/core.c: see hrtick_start(), sched_fork(),
>> free_sched_domain().
>>
>> In our case, I thought putting braces would align the else-if and
>> else and make the code look a bit nicer, but I know this may be just
>> a matter of personal feeling.
>>
>> In short, unless the maintainer, Paolo for this file, has any
>> preference, both ways will be accepted.
>
> The reason why i pointed this out is that it is the style documented
> in Documentation/CodingStyle:
>
> | Do not unnecessarily use braces where a single statement will do.
> |
> |	if (condition)
> |		action();
> |

Ah, this is a different thing. For this case, there is a consensus and checkpatch will complain if we don't obey the rule.

What I explained was:

	if (condition) {
		line1;
		line2;	// multiple lines
	} else if {
		single-line-statement;	-- (*1)
	} else
		single-line-statement;	-- (*2)

For (*1) and (*2), especially for (*1), some people put braces.

> Actually, Ingo Molnar hated this braces-style too much and blamed many
> developers who used this style (include me, that is why i was nervous
> to see this style :( ). I think he likes the coding style of
> kernel/sched/core.c very much, as you know.

Actually that is one reason why I took it as an example. Let's just choose the way which Paolo prefers for this time, I don't know which is better.

Thank you,
Takuya
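For readers skimming the thread, the two shapes being debated look like this in compilable form; do_one() and do_two() are placeholder functions invented for the example, not kernel code.

    static void do_one(void) { }
    static void do_two(void) { }

    static void style_a(int a, int b)	/* braces on every branch once one needs them */
    {
    	if (a) {
    		do_one();
    		do_two();
    	} else if (b) {
    		do_one();
    	} else {
    		do_two();
    	}
    }

    static void style_b(int a, int b)	/* braces only where a branch has multiple statements */
    {
    	if (a) {
    		do_one();
    		do_two();
    	} else if (b)
    		do_one();
    	else
    		do_two();
    }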
Re: [PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
On 2015/11/18 18:09, Paolo Bonzini wrote:
> On 18/11/2015 04:21, Xiao Guangrong wrote:
>> On 11/12/2015 07:55 PM, Takuya Yoshikawa wrote:
>>> @@ -1720,7 +1724,7 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
>>>  	 * this feature. See the comments in kvm_zap_obsolete_pages().
>>>  	 */
>>>  	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
>>> -	sp->parent_ptes = 0;
>>> +	sp->parent_ptes.val = 0;
>>
>> The sp is allocated from kmem_cache_zalloc() so explicitly initialize
>> it to zero is not needed.
>
> Right, but it should be a separate patch.
>
> Takuya, since you are going to send another version of this series,
> can you also:

Yes, I'm preparing to do so.

> 1) move this patch either to the beginning or to the end
>
> 2) include "KVM: x86: always set accessed bit in shadow PTEs", also
> near the beginning of the series?

Commit 1c9a5e19b1af8a2c ("KVM: x86: MMU: always set accessed bit in shadow PTEs") will be the first. Then, the ordering will become something like this:

02: Encapsulate the type of rmap-chain head in a new struct
03: Remove unused parameter of __direct_map()
04: Add helper function to clear a bit in unsync child bitmap
05: Make mmu_set_spte() return emulate value
06: Remove is_rmap_spte() and use is_shadow_present_pte()

These five seem to be easy ones for you to apply: since patch 02 touches many places, it should go first to become the base of the following work.

07: Consolidate BUG_ON checks for reverse-mapped sptes

I will change the WARN_ON to BUG_ON.  // Marcelo's comment

08: Move initialization of parent_ptes out from kvm_mmu_alloc_page()

In this patch, I will delete the "sp->parent_ptes.val = 0;" line since this is the problem of kvm_mmu_alloc_page(), though not a new one.  // Xiao's comment

09: Use for_each_rmap_spte macro instead of pte_list_walk()

There is some confusion between us: Paolo and I agreed that the patch keeps the original way and calls mark_unsync() the same way as before, but there are still comments from Marcelo and Xiao, and those comments seem to explain the code differently. I will check again, but I may not change this one and the following two patches in the next version.

If we can eliminate some of the mark_unsync() calls, that will be kind of an optimization which this series does not intend to achieve. Anyway, by moving the non-trivial two patches (09 and 10) here, reviewing will become easier and you can apply the other patches separately.

10: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
11: Remove unused parameter parent_pte from kvm_mmu_get_page()

Thanks,
Takuya
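Xiao's zalloc point can be illustrated stand-alone with calloc() standing in for kmem_cache_zalloc(); the names below are toy stand-ins, not the kernel API.

    #include <assert.h>
    #include <stdlib.h>

    /* An object obtained from a zeroing allocator starts out all-zero,
     * so an explicit "field = 0" right after allocation is dead code. */
    struct toy_sp {
        unsigned long parent_ptes_val;
    };

    int main(void)
    {
        struct toy_sp *sp = calloc(1, sizeof(*sp));

        assert(sp && sp->parent_ptes_val == 0);  /* already zero, no store needed */
        free(sp);
        return 0;
    }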
Re: [PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
On 2015/11/18 11:44, Xiao Guangrong wrote: On 11/12/2015 07:50 PM, Takuya Yoshikawa wrote: +if (!ret) { +clear_unsync_child_bit(sp, i); +continue; +} else if (ret > 0) { nr_unsync_leaf += ret; Just a single line here, braces are unnecessary. -else +} else return ret; I know we can eliminate the braces, but that does not mean we should do so: there seems to be no consensus about this style issue and checkpatch accepts both ways. Actually, some people prefer to put braces when one of the if/else-if/else cases has multiple lines. You can see some examples in kernel/sched/core.c: see hrtick_start(), sched_fork(), free_sched_domain(). In our case, I thought putting braces would align the else-if and else and make the code look a bit nicer, but I know this may be just a matter of personal feeling. In short, unless the maintainer, Paolo for this file, has any preference, both ways will be accepted. Takuya -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
On 2015/11/14 7:08, Marcelo Tosatti wrote:
> On Thu, Nov 12, 2015 at 08:53:43PM +0900, Takuya Yoshikawa wrote:
>> At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
>> placed right after the call to detect unrelated sptes which must not
>> be found in the reverse-mapping list. Move this check in
>> rmap_get_first/next() so that all call sites, not just the users of
>> the for_each_rmap_spte() macro, will be checked the same way.
>>
>> In addition, change the BUG_ON to WARN_ON since killing the whole
>> host is the last thing that KVM should try.
>
> It should be a BUG_ON, if KVM continues it will corrupt (more) memory.

In the sense that we cannot predict what kind of corruption it will cause, I agree with you. But if it can only corrupt that guest's memory, it is a bit sad to kill unrelated guests, and the host, too.

Anyway, since we cannot say for sure what a possible bug can cause, I agree with you now.

Thanks,
Takuya
Re: [PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
On 2015/11/14 18:20, Marcelo Tosatti wrote:
> The actual issue is this: a higher level page that had, under its
> children, no out of sync pages, now has, due to your addition, a child
> that is unsync:
>
> initial state:
> 	level1
>
> final state:
>
> 	level1 -x-> level2 -x-> level3
>
> Where -x-> are the links created by this pagefault fixing round.
>
> If _any_ page under you is unsync (not necessarily the ones this
> pagefault is accessing), you have to mark parents unsync.

I understand this, but I don't think my patch will break this.

What kvm_mmu_mark_parents_unsync() does is:

	for each p_i in sp->parent_ptes rmap chain
		mark_unsync(p_i);

Then, mark_unsync() finds the parent sp including that p_i to set ->unsync_child_bitmap and increment ->unsync_children if necessary. It may also call kvm_mmu_mark_parents_unsync() recursively. I understand we need to tell the parents "you have an unsync child/descendant" until this information reaches the top level by those recursive calls.

But since these recursive calls cannot come back to the starting sp (the child->parent graph has no loop), each mark_unsync(p_i) will not be affected by the other parents in that sp->parent_ptes rmap chain from which we started the recursive calls.

As the following code shows, my patch does mark_unsync(parent_pte) separately, and then mmu_page_add_parent_pte(vcpu, sp, parent_pte):

	-		} else if (sp->unsync)
	+			if (parent_pte)
	+				mark_unsync(parent_pte);
	+		} else if (sp->unsync) {
	 			kvm_mmu_mark_parents_unsync(sp);
	+			if (parent_pte)
	+				mark_unsync(parent_pte);
	+		}
	+		mmu_page_add_parent_pte(vcpu, sp, parent_pte);

So, as you worried, while each mark_unsync(p_i) is processed, this parent_pte does not exist in that sp->parent_ptes rmap chain. But as I explained above, this does not change anything about what each mark_unsync(p_i) call does, so it keeps the original behaviour.

By the way, I think "kvm_mmu_mark_parents_unsync" and "mark_unsync" do not tell well what they actually do. When I first saw the names, I thought they would just set the parents' sp->unsync. To reflect the following meaning better, it should be propagate_unsync(_to_parents) or something:

	Tell the parents "you have an unsync child/descendant" until this
	unsync information reaches the top level

Thanks,
Takuya
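The termination argument above can be modeled in a few lines of stand-alone C. This is only a sketch with invented names: propagation follows child->parent links, which form an acyclic graph, and stops as soon as it meets a page that is already marked, which is roughly what the kernel's bitmap test-and-set achieves.

    #include <stdio.h>

    #define MAX_PARENTS 2

    struct toy_page {
        struct toy_page *parents[MAX_PARENTS];
        int unsync_children;
    };

    static void toy_mark_unsync(struct toy_page *sp)
    {
        int i;

        if (sp->unsync_children++)      /* already marked: parents already know */
            return;
        for (i = 0; i < MAX_PARENTS; i++)
            if (sp->parents[i])
                toy_mark_unsync(sp->parents[i]);
    }

    int main(void)
    {
        struct toy_page level1 = {{0}}, level2 = {{0}}, level3 = {{0}};

        level3.parents[0] = &level2;    /* level1 -> level2 -> level3 */
        level2.parents[0] = &level1;
        toy_mark_unsync(&level3);
        printf("level1.unsync_children = %d\n", level1.unsync_children); /* 1 */
        return 0;
    }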
Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
On 2015/11/12 23:27, Paolo Bonzini wrote: On 12/11/2015 12:56, Takuya Yoshikawa wrote: diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 9d21b44..f414ca6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, goto out_gpte_changed; if (sp) - link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK); + link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } Here I think you can remove completely the if (sp) kvm_mmu_put_page(sp, it.sptep); later in FNAME(fetch). Apart from this nit, it's okay. Yes, that's what this patch does below: @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, return emulate; out_gpte_changed: - if (sp) - kvm_mmu_put_page(sp, it.sptep); kvm_release_pfn_clean(pfn); return 0; } Since this is the only user of kvm_mmu_put_page(), it also removes the definition: @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm, mmu_page_zap_pte(kvm, sp, sp->spt + i); } -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) -{ - mmu_page_remove_parent_pte(sp, parent_pte); -} - static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) { u64 *sptep; Actually, I don't understand why this is named kvm_mmu_put_page() for just removing parent_pte pointer from the sp->parent_ptes pointer chain. On to kvm_mmu_get_page... if (!direct) { if (rmap_write_protect(vcpu, gfn)) kvm_flush_remote_tlbs(vcpu->kvm); if (level > PT_PAGE_TABLE_LEVEL && need_sync) kvm_sync_pages(vcpu, gfn); This seems fishy. need_sync is set if sp->unsync, but then the parents have not been unsynced yet. Reaching here means that kvm_mmu_get_page() could not return sp from inside the for_each_gfn_sp() loop above, so even without this patch, mark_unsync() has not been called. Here, sp holds the new page allocated by kvm_mmu_alloc_page(). One confusing thing is that hlist_add_head() right before this "if (!direct)" line has already added the new sp to the hash list, so it will be found by for_each_gfn_indirect_valid_sp() in kvm_sync_pages(). Because this sp is new and sp->unsync is not set, kvm_sync_pages() will just skip it and look for other sp's whose ->unsync were found to be set in the for_each_gfn_sp() loop. I'm not 100% sure if the existence of the parent_pte pointer in the newly created sp->parent_ptes chain alone makes any difference: @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, sp = kvm_mmu_alloc_page(vcpu, direct); sp->parent_ptes.val = 0; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); sp->gfn = gfn; sp->role = role; On the other hand, all calls to kvm_mmu_get_page except for the roots are followed by link_shadow_page... Perhaps if parent_pte != NULL you can call link_shadow_page directly from kvm_mmu_get_page. The call would go before the "if (!direct)" and it would subsume all the existing calls. We could probably also warn if (parent_pte == NULL) != (level == vcpu->arch.mmu.root_level) in kvm_mmu_get_page. 
I think we should set the spte after init_shadow_page_table(), and to make this subsume all the existing calls, we need to change the "return sp;" in the for_each_gfn_sp() loop to a goto statement so that the end of this function will become something like this:

	init_shadow_page(sp);
out:
	if (parent_pte) {
		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
		link_shadow_page(parent_pte, sp, accessed);
	}
	trace_kvm_mmu_get_page(sp, created);
	return sp;

So, "bool accessed" needs to be passed to kvm_mmu_get_page(). But anyway, we need to understand if mmu_page_add_parent_pte() really needs to be placed before the "if (!direct)" block.

Takuya
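The control flow being proposed, as a compilable stand-alone skeleton: all toy_* names are stand-ins for the kernel functions mentioned above, and the details are assumptions for illustration, not the final patch.

    #include <stdio.h>

    struct toy_page { unsigned long gfn; unsigned long *parent_pte; };

    static struct toy_page pool[8];
    static int nr_pages;

    static struct toy_page *toy_lookup(unsigned long gfn)
    {
        for (int i = 0; i < nr_pages; i++)
            if (pool[i].gfn == gfn)
                return &pool[i];
        return NULL;
    }

    static struct toy_page *toy_get_page(unsigned long gfn, unsigned long *parent_pte)
    {
        int created = 0;
        struct toy_page *sp = toy_lookup(gfn);   /* for_each_gfn_sp() analogue */

        if (sp)
            goto out;

        sp = &pool[nr_pages++];                  /* kvm_mmu_alloc_page() analogue */
        sp->gfn = gfn;
        created = 1;
    out:
        if (parent_pte) {                        /* one tail serves both paths */
            *parent_pte = (unsigned long)sp;     /* link_shadow_page() analogue */
            sp->parent_pte = parent_pte;
        }
        printf("gfn=%lu created=%d\n", gfn, created);  /* trace analogue */
        return sp;
    }

    int main(void)
    {
        unsigned long pte1 = 0, pte2 = 0;

        toy_get_page(5, &pte1);   /* cache miss, created=1 */
        toy_get_page(5, &pte2);   /* cache hit, created=0 */
        return 0;
    }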
[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 20 +++- arch/x86/kvm/paging_tmpl.h | 4 ++-- 2 files changed, 9 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 33fe720..101e77d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2072,8 +2072,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gva_t gaddr, unsigned level, int direct, -unsigned access, -u64 *parent_pte) +unsigned access) { union kvm_mmu_page_role role; unsigned quadrant; @@ -2730,8 +2729,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, base_addr &= PT64_LVL_ADDR_MASK(iterator.level); pseudo_gfn = base_addr >> PAGE_SHIFT; sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr, - iterator.level - 1, - 1, ACC_ALL, iterator.sptep); + iterator.level - 1, 1, ACC_ALL); link_shadow_page(vcpu, iterator.sptep, sp, true); } @@ -3088,8 +3086,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { spin_lock(>kvm->mmu_lock); make_mmu_pages_available(vcpu); - sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, - 1, ACC_ALL, NULL); + sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL); ++sp->root_count; spin_unlock(>kvm->mmu_lock); vcpu->arch.mmu.root_hpa = __pa(sp->spt); @@ -3101,9 +3098,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) spin_lock(>kvm->mmu_lock); make_mmu_pages_available(vcpu); sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT), - i << 30, - PT32_ROOT_LEVEL, 1, ACC_ALL, - NULL); + i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; spin_unlock(>kvm->mmu_lock); @@ -3140,7 +3135,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) spin_lock(>kvm->mmu_lock); make_mmu_pages_available(vcpu); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL, - 0, ACC_ALL, NULL); + 0, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; spin_unlock(>kvm->mmu_lock); @@ -3173,9 +3168,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) } spin_lock(>kvm->mmu_lock); make_mmu_pages_available(vcpu); - sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, - PT32_ROOT_LEVEL, 0, - ACC_ALL, NULL); + sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL, + 0, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; spin_unlock(>kvm->mmu_lock); diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index f414ca6..ee9d211 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, if (!is_shadow_present_pte(*it.sptep)) { table_gfn = gw->table_gfn[it.level - 2]; sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1, - false, access, it.sptep); + false, access); } /* @@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1); sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1, - true, direct_access, it.sptep); + true, direct_access); link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } -- 2.1.0
[PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte argument, link_shadow_page() follows that to set the parent entry so that the new mapping will point to the returned page table. Moving parent_pte handling there allows to clean up the code because parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and mmu_page_add_parent_pte(). Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 21 - arch/x86/kvm/paging_tmpl.h | 6 ++ 2 files changed, 10 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9273cd4..33fe720 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, if (sp->unsync_children) { kvm_make_request(KVM_REQ_MMU_SYNC, vcpu); kvm_mmu_mark_parents_unsync(sp); - if (parent_pte) - mark_unsync(parent_pte); } else if (sp->unsync) { kvm_mmu_mark_parents_unsync(sp); - if (parent_pte) - mark_unsync(parent_pte); } - mmu_page_add_parent_pte(vcpu, sp, parent_pte); __clear_sp_write_flooding_count(sp); trace_kvm_mmu_get_page(sp, false); @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, sp = kvm_mmu_alloc_page(vcpu, direct); sp->parent_ptes.val = 0; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); sp->gfn = gfn; sp->role = role; @@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) return __shadow_walk_next(iterator, *iterator->sptep); } -static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed) +static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, +struct kvm_mmu_page *sp, bool accessed) { u64 spte; @@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed) spte |= shadow_accessed_mask; mmu_spte_set(sptep, spte); + + if (sp->unsync_children || sp->unsync) + mark_unsync(sptep); + + mmu_page_add_parent_pte(vcpu, sp, sptep); } static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm, mmu_page_zap_pte(kvm, sp, sp->spt + i); } -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) -{ - mmu_page_remove_parent_pte(sp, parent_pte); -} - static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) { u64 *sptep; @@ -2738,7 +2733,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, iterator.level - 1, 1, ACC_ALL, iterator.sptep); - link_shadow_page(iterator.sptep, sp, true); + link_shadow_page(vcpu, iterator.sptep, sp, true); } } return emulate; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 9d21b44..f414ca6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, goto out_gpte_changed; if (sp) - link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK); + link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } for (; @@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1, true, direct_access, it.sptep); - link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK); + link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } clear_sp_write_flooding_count(it.sptep); @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, return emulate; out_gpte_changed: - if (sp) - kvm_mmu_put_page(sp, it.sptep); 
 	kvm_release_pfn_clean(pfn);
 	return 0;
 }
-- 
2.1.0
[PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
Make kvm_mmu_alloc_page() do just what its name tells us to do, and remove the extra error check at its call site since the allocation cannot fail.

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 85f4bbd..9273cd4 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1707,8 +1707,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp,
 	mmu_spte_clear_no_track(parent_pte);
 }
 
-static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
-					       u64 *parent_pte, int direct)
+static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct)
 {
 	struct kvm_mmu_page *sp;
 
@@ -1724,8 +1723,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 	 * this feature. See the comments in kvm_zap_obsolete_pages().
 	 */
 	list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages);
-	sp->parent_ptes.val = 0;
-	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 	kvm_mod_used_mmu_pages(vcpu->kvm, +1);
 	return sp;
 }
@@ -2124,10 +2121,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		trace_kvm_mmu_get_page(sp, false);
 		return sp;
 	}
+
 	++vcpu->kvm->stat.mmu_cache_miss;
-	sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct);
-	if (!sp)
-		return sp;
+
+	sp = kvm_mmu_alloc_page(vcpu, direct);
+
+	sp->parent_ptes.val = 0;
+	mmu_page_add_parent_pte(vcpu, sp, parent_pte);
+
 	sp->gfn = gfn;
 	sp->role = role;
 	hlist_add_head(&sp->hash_link,
-- 
2.1.0
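Why can the allocation not fail? If memory serves, kvm_mmu_alloc_page() draws from per-vcpu object caches that mmu_topup_memory_caches() fills before the fault path takes the mmu_lock; treat that as a hedged reading rather than a statement about the patch. The stand-alone sketch below models the "top up first, then the hot path cannot fail" pattern with invented names.

    #include <assert.h>
    #include <stdlib.h>

    #define CACHE_MIN 4

    struct toy_cache { void *objs[16]; int nobjs; };

    /* Callers fill the cache while they can still back out... */
    static int toy_topup(struct toy_cache *mc)
    {
        while (mc->nobjs < CACHE_MIN) {
            void *obj = malloc(64);
            if (!obj)
                return -1;              /* failure is handled here, early */
            mc->objs[mc->nobjs++] = obj;
        }
        return 0;
    }

    /* ...so the hot path takes objects without a NULL check. */
    static void *toy_alloc(struct toy_cache *mc)
    {
        assert(mc->nobjs);              /* cannot fail after a successful topup */
        return mc->objs[--mc->nobjs];
    }

    int main(void)
    {
        struct toy_cache mc = { .nobjs = 0 };

        if (toy_topup(&mc) == 0)
            free(toy_alloc(&mc));
        return 0;
    }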
[PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: Takuya Yoshikawa --- arch/x86/include/asm/kvm_host.h | 8 +- arch/x86/kvm/mmu.c | 169 +--- arch/x86/kvm/mmu_audit.c| 13 ++-- 3 files changed, 100 insertions(+), 90 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 0535359..c5a0c4a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -214,6 +214,10 @@ union kvm_mmu_page_role { }; }; +struct kvm_rmap_head { + unsigned long val; +}; + struct kvm_mmu_page { struct list_head link; struct hlist_node hash_link; @@ -231,7 +235,7 @@ struct kvm_mmu_page { bool unsync; int root_count; /* Currently serving as active root */ unsigned int unsync_children; - unsigned long parent_ptes; /* Reverse mapping for parent_pte */ + struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ /* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */ unsigned long mmu_valid_gen; @@ -604,7 +608,7 @@ struct kvm_lpage_info { }; struct kvm_arch_memory_slot { - unsigned long *rmap[KVM_NR_PAGE_SIZES]; + struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES]; struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1]; }; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ee7b101..85f4bbd 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -916,24 +916,24 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, * */ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, - unsigned long *pte_list) + struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; int i, count = 0; - if (!*pte_list) { + if (!rmap_head->val) { rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte); - *pte_list = (unsigned long)spte; - } else if (!(*pte_list & 1)) { + rmap_head->val = (unsigned long)spte; + } else if (!(rmap_head->val & 1)) { rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte); desc = mmu_alloc_pte_list_desc(vcpu); - desc->sptes[0] = (u64 *)*pte_list; + desc->sptes[0] = (u64 *)rmap_head->val; desc->sptes[1] = spte; - *pte_list = (unsigned long)desc | 1; + rmap_head->val = (unsigned long)desc | 1; ++count; } else { rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte); - desc = (struct pte_list_desc *)(*pte_list & ~1ul); + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul); while (desc->sptes[PTE_LIST_EXT-1] && desc->more) { desc = desc->more; count += PTE_LIST_EXT; @@ -950,8 +950,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, } static void -pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc, - int i, struct pte_list_desc *prev_desc) +pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head, + struct pte_list_desc *desc, int i, + struct pte_list_desc *prev_desc) { int j; @@ -962,43 +963,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc, if (j != 0) return; if (!prev_desc && !desc->more) - *pte_list = (unsigned long)desc->sptes[0]; + rmap_head->val = (unsigned long)desc->sptes[0]; else if (prev_desc) prev_desc->more = desc->more; else - *pte_list = (unsigned long)desc->more | 1; + rmap_head->val = (unsigned long)desc->more | 1; mmu_free_pte_list_desc(desc); } -static void pte_list_remove(u64 *spte, unsigned long *pte_list) +static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; struct pte_list_desc *prev_desc; int i; - if (!*pte_list) { + if (!rmap_head->val) { printk(KERN_ERR "pte_list_remove: %p 0->BUG\n", spte); BUG(); - } else if 
(!(*pte_list & 1)) { + } else if (!(rmap_head->val & 1)) { rmap_printk("pte_list_remove: %p 1->0\n", spte); - if ((u64 *)*pte_list != spte) { + if ((u64 *)rmap_head->val != spte) { printk(KERN_ERR "pte_list_remove: %p 1->BUG\n", spte);
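The encoding that struct kvm_rmap_head wraps can be demonstrated stand-alone: bit 0 of val distinguishes "one spte pointer" from "pointer to a descriptor chain", which works because pointers are at least word-aligned, leaving bit 0 free. Toy names throughout; the many->many case is omitted for brevity.

    #include <assert.h>
    #include <stdio.h>

    struct toy_desc { unsigned long *sptes[3]; struct toy_desc *more; };
    struct toy_rmap_head { unsigned long val; };

    static void toy_add(struct toy_rmap_head *head, unsigned long *spte,
                        struct toy_desc *spare)
    {
        if (!head->val) {
            head->val = (unsigned long)spte;               /* 0 -> 1 */
        } else if (!(head->val & 1)) {
            spare->sptes[0] = (unsigned long *)head->val;  /* 1 -> many */
            spare->sptes[1] = spte;
            head->val = (unsigned long)spare | 1;
        }
        /* many -> many omitted */
    }

    int main(void)
    {
        unsigned long a = 0, b = 0;
        struct toy_desc d = {{0}};
        struct toy_rmap_head head = {0};

        toy_add(&head, &a, &d);
        assert(!(head.val & 1));     /* tag clear: a single spte pointer */
        toy_add(&head, &b, &d);
        assert(head.val & 1);        /* tag set: a descriptor chain */
        printf("ok\n");
        return 0;
    }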
[PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is placed right after the call to detect unrelated sptes which must not be found in the reverse-mapping list. Move this check in rmap_get_first/next() so that all call sites, not just the users of the for_each_rmap_spte() macro, will be checked the same way. In addition, change the BUG_ON to WARN_ON since killing the whole host is the last thing that KVM should try. One thing to keep in mind is that kvm_mmu_unlink_parents() also uses rmap_get_first() to handle parent sptes. The change will not break it because parent sptes are present, at least until drop_parent_pte() actually unlinks them, and not mmio-sptes. Signed-off-by: Takuya Yoshikawa --- Documentation/virtual/kvm/mmu.txt | 4 ++-- arch/x86/kvm/mmu.c| 26 +- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index 3a4d681..daf9c0f 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -203,10 +203,10 @@ Shadow pages contain the following information: page cannot be destroyed. See role.invalid. parent_ptes: The reverse mapping for the pte/ptes pointing at this page's spt. If -parent_ptes bit 0 is zero, only one spte points at this pages and +parent_ptes bit 0 is zero, only one spte points at this page and parent_ptes points at this single spte, otherwise, there exists multiple sptes pointing at this page and (parent_ptes & ~0x1) points at a data -structure with a list of parent_ptes. +structure with a list of parent sptes. unsync: If true, then the translations in this page may not match the guest's translation. This is equivalent to the state of the tlb when a pte is diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 1691171..ee7b101 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1079,17 +1079,23 @@ struct rmap_iterator { */ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) { + u64 *sptep; + if (!rmap) return NULL; if (!(rmap & 1)) { iter->desc = NULL; - return (u64 *)rmap; + sptep = (u64 *)rmap; + goto out; } iter->desc = (struct pte_list_desc *)(rmap & ~1ul); iter->pos = 0; - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; +out: + WARN_ON(!is_shadow_present_pte(*sptep)); + return sptep; } /* @@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) */ static u64 *rmap_get_next(struct rmap_iterator *iter) { + u64 *sptep; + if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { - u64 *sptep; - ++iter->pos; sptep = iter->desc->sptes[iter->pos]; if (sptep) - return sptep; + goto out; } iter->desc = iter->desc->more; @@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) if (iter->desc) { iter->pos = 0; /* desc->sptes[0] cannot be NULL */ - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; + goto out; } } return NULL; +out: + WARN_ON(!is_shadow_present_pte(*sptep)); + return sptep; } #define for_each_rmap_spte(_rmap_, _iter_, _spte_) \ for (_spte_ = rmap_get_first(*_rmap_, _iter_); \ - _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \ - _spte_ = rmap_get_next(_iter_)) + _spte_; _spte_ = rmap_get_next(_iter_)) static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp) bool flush = false; while ((sptep = rmap_get_first(*rmapp, ))) { - BUG_ON(!(*sptep & PT_PRESENT_MASK)); rmap_printk("%s: spte %p %llx.\n", 
 			    __func__, sptep, *sptep);
 		drop_spte(kvm, sptep);
-- 
2.1.0
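The pattern of this patch, reduced to a stand-alone toy: the sanity check lives inside the get_first/get_next helpers, so the iteration macro and any direct caller inherit it for free. All names are invented, and WARN_ON is modeled with fprintf.

    #include <stdio.h>

    #define WARN_ON(x) do { if (x) fprintf(stderr, "warn: %s\n", #x); } while (0)

    /* Toy list: 0 terminates; negative values model entries that "must
     * not be found" and that the old code BUG_ON-checked at every site. */
    static int items[] = { 1, 2, 3, 0 };

    static int *toy_get_first(void)
    {
        int *p = items[0] ? &items[0] : NULL;

        if (p)
            WARN_ON(*p < 0);   /* one check covers every caller */
        return p;
    }

    static int *toy_get_next(int *p)
    {
        ++p;
        if (!*p)
            return NULL;
        WARN_ON(*p < 0);
        return p;
    }

    #define toy_for_each(p) for ((p) = toy_get_first(); (p); (p) = toy_get_next(p))

    int main(void)
    {
        int *p, sum = 0;

        toy_for_each(p)
            sum += *p;
        printf("%d\n", sum);   /* 6 */
        return 0;
    }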
[PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does nearly the same as the for_each_rmap_spte macro. The only difference is that is_shadow_present_pte() checks cannot be placed there because kvm_mmu_mark_parents_unsync() can be called with a new parent pointer whose entry is not set yet.

By calling mark_unsync() separately for the parent and adding the parent pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro works with no problem.

Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c | 36 ++++++++++++-------------------------
 1 file changed, 13 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e8cfdc4..1691171 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list)
 	}
 }
 
-typedef void (*pte_list_walk_fn) (u64 *spte);
-static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn)
-{
-	struct pte_list_desc *desc;
-	int i;
-
-	if (!*pte_list)
-		return;
-
-	if (!(*pte_list & 1))
-		return fn((u64 *)*pte_list);
-
-	desc = (struct pte_list_desc *)(*pte_list & ~1ul);
-	while (desc) {
-		for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i)
-			fn(desc->sptes[i]);
-		desc = desc->more;
-	}
-}
-
 static unsigned long *__gfn_to_rmap(gfn_t gfn, int level,
 				    struct kvm_memory_slot *slot)
 {
@@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu,
 static void mark_unsync(u64 *spte);
 static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp)
 {
-	pte_list_walk(&sp->parent_ptes, mark_unsync);
+	u64 *sptep;
+	struct rmap_iterator iter;
+
+	for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) {
+		mark_unsync(sptep);
+	}
 }
 
 static void mark_unsync(u64 *spte)
@@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu,
 		if (sp->unsync && kvm_sync_page_transient(vcpu, sp))
 			break;
 
-		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 		if (sp->unsync_children) {
 			kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
 			kvm_mmu_mark_parents_unsync(sp);
-		} else if (sp->unsync)
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		} else if (sp->unsync) {
 			kvm_mmu_mark_parents_unsync(sp);
+			if (parent_pte)
+				mark_unsync(parent_pte);
+		}
+		mmu_page_add_parent_pte(vcpu, sp, parent_pte);
 
 		__clear_sp_write_flooding_count(sp);
 		trace_kvm_mmu_get_page(sp, false);
-- 
2.1.0
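For contrast, here are the two walker shapes side by side as a stand-alone toy with invented names: the callback style that pte_list_walk() used, and the caller-driven iterator style that for_each_rmap_spte follows.

    #include <stdio.h>

    typedef void (*walk_fn)(int *item);

    static int list[] = { 10, 20, 30, 0 };   /* 0 terminates */

    /* Old shape: control is inverted, the walker calls back into us. */
    static void toy_walk(walk_fn fn)
    {
        for (int *p = list; *p; p++)
            fn(p);
    }

    /* New shape: the caller drives the loop body directly. */
    #define toy_for_each(p) for ((p) = list; *(p); (p)++)

    static void print_item(int *p) { printf("cb %d\n", *p); }

    int main(void)
    {
        int *p;

        toy_walk(print_item);
        toy_for_each(p)
            printf("it %d\n", *p);
        return 0;
    }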
[PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the simple reverse mapping was implemented by commit cd4a4e5374110444 ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point, its role was clear and only rmap_add() and rmap_remove() were using it to select sptes that need to be reverse-mapped. Independently of that, is_shadow_present_pte() was first introduced by commit c7addb902054195b ("KVM: Allow not-present guest page faults to bypass kvm") to do bypass_guest_pf optimization, which does not exist any more. These two seem to have changed their roles somewhat, and is_rmap_spte() just calls is_shadow_present_pte() now. Since using both of them with no clear distinction just makes the code confusing, remove is_rmap_spte(). Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 13 - arch/x86/kvm/mmu_audit.c | 2 +- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index c229356..e8cfdc4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -311,11 +311,6 @@ static int is_large_pte(u64 pte) return pte & PT_PAGE_SIZE_MASK; } -static int is_rmap_spte(u64 pte) -{ - return is_shadow_present_pte(pte); -} - static int is_last_spte(u64 pte, int level) { if (level == PT_PAGE_TABLE_LEVEL) @@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) u64 old_spte = *sptep; bool ret = false; - WARN_ON(!is_rmap_spte(new_spte)); + WARN_ON(!is_shadow_present_pte(new_spte)); if (!is_shadow_present_pte(old_spte)) { mmu_spte_set(sptep, new_spte); @@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) else old_spte = __update_clear_spte_slow(sptep, 0ull); - if (!is_rmap_spte(old_spte)) + if (!is_shadow_present_pte(old_spte)) return 0; pfn = spte_to_pfn(old_spte); @@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); - if (is_rmap_spte(*sptep)) { + if (is_shadow_present_pte(*sptep)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. @@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, * If the mapping has been changed, let the vcpu fault on the * same address again. */ - if (!is_rmap_spte(spte)) { + if (!is_shadow_present_pte(spte)) { ret = true; goto exit; } diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c index 03d518e..90ee420 100644 --- a/arch/x86/kvm/mmu_audit.c +++ b/arch/x86/kvm/mmu_audit.c @@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp) return; for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { - if (!is_rmap_spte(sp->spt[i])) + if (!is_shadow_present_pte(sp->spt[i])) continue; inspect_spte_has_rmap(kvm, sp->spt + i); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have three line code which clears a bit in the unsync child bitmap; the former places it inside a loop block and uses a few goto statements to jump to it. A new helper function, clear_unsync_child_bit(), makes the code cleaner. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index c3bbc82..f3120aa 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, return (pvec->nr == KVM_PAGE_ARRAY_NR); } +static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx) +{ + --sp->unsync_children; + WARN_ON((int)sp->unsync_children < 0); + __clear_bit(idx, sp->unsync_child_bitmap); +} + static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_pages *pvec) { @@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_page *child; u64 ent = sp->spt[i]; - if (!is_shadow_present_pte(ent) || is_large_pte(ent)) - goto clear_child_bitmap; + if (!is_shadow_present_pte(ent) || is_large_pte(ent)) { + clear_unsync_child_bit(sp, i); + continue; + } child = page_header(ent & PT64_BASE_ADDR_MASK); @@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, return -ENOSPC; ret = __mmu_unsync_walk(child, pvec); - if (!ret) - goto clear_child_bitmap; - else if (ret > 0) + if (!ret) { + clear_unsync_child_bit(sp, i); + continue; + } else if (ret > 0) { nr_unsync_leaf += ret; - else + } else return ret; } else if (child->unsync) { nr_unsync_leaf++; if (mmu_pages_add(pvec, child, i)) return -ENOSPC; } else -goto clear_child_bitmap; - - continue; - -clear_child_bitmap: - __clear_bit(i, sp->unsync_child_bitmap); - sp->unsync_children--; - WARN_ON((int)sp->unsync_children < 0); + clear_unsync_child_bit(sp, i); } - return nr_unsync_leaf; } @@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents) if (!sp) return; - --sp->unsync_children; - WARN_ON((int)sp->unsync_children < 0); - __clear_bit(idx, sp->unsync_child_bitmap); + clear_unsync_child_bit(sp, idx); level++; } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
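The helper itself is small enough to demonstrate stand-alone; this toy version with invented names shows the invariant it centralizes: the bitmap and its population counter always move together.

    #include <stdio.h>

    #define WARN_ON(x) do { if (x) fprintf(stderr, "warn\n"); } while (0)

    struct toy_sp {
        unsigned long unsync_child_bitmap;
        int unsync_children;
    };

    /* One place keeps the bitmap and the counter in step, instead of
     * the same three lines repeated at several call sites. */
    static void clear_unsync_child_bit(struct toy_sp *sp, int idx)
    {
        --sp->unsync_children;
        WARN_ON(sp->unsync_children < 0);
        sp->unsync_child_bitmap &= ~(1UL << idx);
    }

    int main(void)
    {
        struct toy_sp sp = { .unsync_child_bitmap = 0x5, .unsync_children = 2 };

        clear_unsync_child_bit(&sp, 0);
        clear_unsync_child_bit(&sp, 2);
        printf("%lx %d\n", sp.unsync_child_bitmap, sp.unsync_children); /* 0 0 */
        return 0;
    }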
[PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed. Stop passing emulate pointer and make mmu_set_spte() return the emulate value instead to clean up this complex interface. Prefetch functions can just throw away the return value. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 27 ++- arch/x86/kvm/paging_tmpl.h | 10 +- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f3120aa..c229356 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2564,13 +2564,13 @@ done: return ret; } -static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, -unsigned pte_access, int write_fault, int *emulate, -int level, gfn_t gfn, pfn_t pfn, bool speculative, -bool host_writable) +static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, +int write_fault, int level, gfn_t gfn, pfn_t pfn, +bool speculative, bool host_writable) { int was_rmapped = 0; int rmap_count; + bool emulate = false; pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); @@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) - *emulate = 1; + emulate = true; kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); } - if (unlikely(is_mmio_spte(*sptep) && emulate)) - *emulate = 1; + if (unlikely(is_mmio_spte(*sptep))) + emulate = true; pgprintk("%s: setting spte %llx\n", __func__, *sptep); pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n", @@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } kvm_release_pfn_clean(pfn); + + return emulate; } static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, return -1; for (i = 0; i < ret; i++, gfn++, start++) - mmu_set_spte(vcpu, start, access, 0, NULL, -sp->role.level, gfn, page_to_pfn(pages[i]), -true, true); + mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn, +page_to_pfn(pages[i]), true, true); return 0; } @@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { if (iterator.level == level) { - mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, -write, , level, gfn, pfn, -prefault, map_writable); + emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, + write, level, gfn, pfn, prefault, + map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 3058a22..9d21b44 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, * we call mmu_set_spte() with host_writable = true because * pte_prefetch_gfn_to_pfn always gets a writable pfn. 
*/ - mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL, -gfn, pfn, true, true); + mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn, +true, true); return true; } @@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct kvm_mmu_page *sp = NULL; struct kvm_shadow_walk_iterator it; unsigned direct_access, access = gw->pt_access; - int top_level, emulate = 0; + int top_level, emulate; direct_access = gw->pte_access; @@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, } clear_sp_write_flooding_count(it.sptep); - mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, , -it.level, gw->gfn, pfn, prefault, map_writable); +
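The interface change can be reduced to a stand-alone before/after sketch with invented names: the out-parameter version forces callers to pass a valid pointer or NULL and the callee to NULL-check; returning the value removes both hazards.

    #include <stdbool.h>
    #include <stdio.h>

    /* Old shape: emulate may be NULL, so the callee must check it. */
    static void set_flag_old(int input, int *emulate)
    {
        if (emulate && input > 0)
            *emulate = 1;
    }

    /* New shape: just return the value; callers that don't care
     * (the prefetch paths in this patch) throw it away. */
    static bool set_flag_new(int input)
    {
        return input > 0;
    }

    int main(void)
    {
        int e = 0;

        set_flag_old(1, &e);
        printf("%d %d\n", e, set_flag_new(1));
        return 0;
    }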
[PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map()
Signed-off-by: Takuya Yoshikawa
---
 arch/x86/kvm/mmu.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index e7c2c14..c3bbc82 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int map_writable, int level, gfn_t gfn, pfn_t pfn,
-			bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+			int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-			 prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
-
 	return r;
 
 out_unlock:
@@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable,
-			 level, gfn, pfn, prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
-- 
2.1.0
[PATCH 00/10 V2] KVM: x86: MMU: Clean up x86's mmu code for future work
v1->v2: Patches 5 and 7 are added based on Paolo's suggestions. Patches 8-10 are new.

Patch 1/2/3/4: no change.
Patch 5: Needed a bit more work than I had expected.
Patch 6: Removed the extra comment of v1 (patch 5 made it inappropriate).
Patch 7: As expected, many places needed to be converted.
Patch 8: This is new, but only a small change.
Patch 9: Kind of an RFC (though I have checked it to some extent).
  The following two places need to be carefully checked:
  - in kvm_mmu_get_page: the "if (!direct)" block after kvm_mmu_alloc_page()
  - in FNAME(fetch): the "if (FNAME(gpte_changed)(vcpu, gw, it.level - 1))" case
Patch 10: Trivial cleanup, assuming that patch 9 is correct.

In summary: patches 1-7 are the result of updating v1 based on the suggestions. Although patch 5 does not look as nice as expected, this is the most conservative approach, and patches 8-10 try to alleviate the sadness.

Takuya

Takuya Yoshikawa (10):
  01: KVM: x86: MMU: Remove unused parameter of __direct_map()
  02: KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
  03: KVM: x86: MMU: Make mmu_set_spte() return emulate value
  04: KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
  05: KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
  06: KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
  07: KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
  08: KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
  09: KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
  10: KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()

 Documentation/virtual/kvm/mmu.txt |   4 +-
 arch/x86/include/asm/kvm_host.h   |   8 +-
 arch/x86/kvm/mmu.c                | 357 ++++++++++++++++++--------------------
 arch/x86/kvm/mmu_audit.c          |  15 +-
 arch/x86/kvm/paging_tmpl.h        |  20 +--
 5 files changed, 196 insertions(+), 208 deletions(-)

-- 
2.1.0
[PATCH 03/10] KVM: x86: MMU: Make mmu_set_spte() return emulate value
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed. Stop passing the emulate pointer and make mmu_set_spte() return the emulate value instead to clean up this complex interface. Prefetch functions can just throw away the return value. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 27 ++- arch/x86/kvm/paging_tmpl.h | 10 +- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index f3120aa..c229356 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2564,13 +2564,13 @@ done: return ret; } -static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, -unsigned pte_access, int write_fault, int *emulate, -int level, gfn_t gfn, pfn_t pfn, bool speculative, -bool host_writable) +static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, +int write_fault, int level, gfn_t gfn, pfn_t pfn, +bool speculative, bool host_writable) { int was_rmapped = 0; int rmap_count; + bool emulate = false; pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); @@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) - *emulate = 1; + emulate = true; kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); } - if (unlikely(is_mmio_spte(*sptep) && emulate)) - *emulate = 1; + if (unlikely(is_mmio_spte(*sptep))) + emulate = true; pgprintk("%s: setting spte %llx\n", __func__, *sptep); pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n", @@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } kvm_release_pfn_clean(pfn); + + return emulate; } static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, return -1; for (i = 0; i < ret; i++, gfn++, start++) - mmu_set_spte(vcpu, start, access, 0, NULL, -sp->role.level, gfn, page_to_pfn(pages[i]), -true, true); + mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn, +page_to_pfn(pages[i]), true, true); return 0; } @@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { if (iterator.level == level) { - mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, -write, &emulate, level, gfn, pfn, -prefault, map_writable); + emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, + write, level, gfn, pfn, prefault, + map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 3058a22..9d21b44 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, * we call mmu_set_spte() with host_writable = true because * pte_prefetch_gfn_to_pfn always gets a writable pfn. */ - mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL, -gfn, pfn, true, true); + mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn, +true, true); return true; } @@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct kvm_mmu_page *sp = NULL; struct kvm_shadow_walk_iterator it; unsigned direct_access, access = gw->pt_access; - int top_level, emulate = 0; + int top_level, emulate; direct_access = gw->pte_access; @@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, } clear_sp_write_flooding_count(it.sptep); - mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate, -it.level, gw->gfn, pfn, prefault, map_writable);
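Aside: the interface problem this patch removes is easy to demonstrate outside of KVM. The standalone C sketch below uses made-up names (set_entry_old/set_entry_new), not kernel code; it contrasts an out-parameter that may be NULL with simply returning the flag, which is what mmu_set_spte() does after this patch:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Old style: out-parameter that may be NULL, so every write needs a check. */
static void set_entry_old(int write_fault, int *emulate)
{
        if (write_fault && emulate)     /* NULL check required at each write */
                *emulate = 1;
}

/* New style: return the flag; callers that do not care just discard it. */
static bool set_entry_new(int write_fault)
{
        return write_fault != 0;
}

int main(void)
{
        int emulate = 0;

        set_entry_old(1, &emulate);     /* caller must pass storage... */
        set_entry_old(1, NULL);         /* ...or remember that NULL is legal */
        printf("old: emulate=%d\n", emulate);

        printf("new: emulate=%d\n", set_entry_new(1));  /* prefetchers ignore it */
        return 0;
}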
[PATCH 02/10] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have a three-line code block which clears a bit in the unsync child bitmap; the former places it inside a loop block and uses a few goto statements to jump to it. A new helper function, clear_unsync_child_bit(), makes the code cleaner. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index c3bbc82..f3120aa 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, return (pvec->nr == KVM_PAGE_ARRAY_NR); } +static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx) +{ + --sp->unsync_children; + WARN_ON((int)sp->unsync_children < 0); + __clear_bit(idx, sp->unsync_child_bitmap); +} + static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_pages *pvec) { @@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_page *child; u64 ent = sp->spt[i]; - if (!is_shadow_present_pte(ent) || is_large_pte(ent)) - goto clear_child_bitmap; + if (!is_shadow_present_pte(ent) || is_large_pte(ent)) { + clear_unsync_child_bit(sp, i); + continue; + } child = page_header(ent & PT64_BASE_ADDR_MASK); @@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, return -ENOSPC; ret = __mmu_unsync_walk(child, pvec); - if (!ret) - goto clear_child_bitmap; - else if (ret > 0) + if (!ret) { + clear_unsync_child_bit(sp, i); + continue; + } else if (ret > 0) { nr_unsync_leaf += ret; - else + } else return ret; } else if (child->unsync) { nr_unsync_leaf++; if (mmu_pages_add(pvec, child, i)) return -ENOSPC; } else -goto clear_child_bitmap; - - continue; - -clear_child_bitmap: - __clear_bit(i, sp->unsync_child_bitmap); - sp->unsync_children--; - WARN_ON((int)sp->unsync_children < 0); + clear_unsync_child_bit(sp, i); } - return nr_unsync_leaf; } @@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents) if (!sp) return; - --sp->unsync_children; - WARN_ON((int)sp->unsync_children < 0); - __clear_bit(idx, sp->unsync_child_bitmap); + clear_unsync_child_bit(sp, idx); level++; } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); } -- 2.1.0
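Aside: the helper introduced above is a pure consolidation, but the pattern is worth seeing in isolation. Below is a minimal standalone model with hypothetical names (page_model, NCHILD), not the kernel code; it keeps a bitmap and its population counter in step, with the underflow check in exactly one place:

#include <stdio.h>

#define NCHILD 64

struct page_model {
        unsigned int unsync_children;           /* how many bits are set */
        unsigned long unsync_child_bitmap;      /* one bit per child */
};

/* One helper instead of three copied lines at every call site. */
static void clear_unsync_child_bit(struct page_model *sp, int idx)
{
        --sp->unsync_children;
        if ((int)sp->unsync_children < 0)       /* models the WARN_ON */
                fprintf(stderr, "underflow while clearing idx %d\n", idx);
        sp->unsync_child_bitmap &= ~(1UL << idx);
}

int main(void)
{
        struct page_model sp = { 2, (1UL << 3) | (1UL << 7) };

        clear_unsync_child_bit(&sp, 3);
        clear_unsync_child_bit(&sp, 7);
        printf("children=%u bitmap=%#lx\n", sp.unsync_children,
               sp.unsync_child_bitmap);
        return 0;
}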
[PATCH 07/10] KVM: x86: MMU: Encapsulate the type of rmap-chain head in a new struct
New struct kvm_rmap_head makes the code type-safe to some extent. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/include/asm/kvm_host.h | 8 +- arch/x86/kvm/mmu.c | 169 +--- arch/x86/kvm/mmu_audit.c| 13 ++-- 3 files changed, 100 insertions(+), 90 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 0535359..c5a0c4a 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -214,6 +214,10 @@ union kvm_mmu_page_role { }; }; +struct kvm_rmap_head { + unsigned long val; +}; + struct kvm_mmu_page { struct list_head link; struct hlist_node hash_link; @@ -231,7 +235,7 @@ struct kvm_mmu_page { bool unsync; int root_count; /* Currently serving as active root */ unsigned int unsync_children; - unsigned long parent_ptes; /* Reverse mapping for parent_pte */ + struct kvm_rmap_head parent_ptes; /* rmap pointers to parent sptes */ /* The page is obsolete if mmu_valid_gen != kvm->arch.mmu_valid_gen. */ unsigned long mmu_valid_gen; @@ -604,7 +608,7 @@ struct kvm_lpage_info { }; struct kvm_arch_memory_slot { - unsigned long *rmap[KVM_NR_PAGE_SIZES]; + struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES]; struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1]; }; diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index ee7b101..85f4bbd 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -916,24 +916,24 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn, * */ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, - unsigned long *pte_list) + struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; int i, count = 0; - if (!*pte_list) { + if (!rmap_head->val) { rmap_printk("pte_list_add: %p %llx 0->1\n", spte, *spte); - *pte_list = (unsigned long)spte; - } else if (!(*pte_list & 1)) { + rmap_head->val = (unsigned long)spte; + } else if (!(rmap_head->val & 1)) { rmap_printk("pte_list_add: %p %llx 1->many\n", spte, *spte); desc = mmu_alloc_pte_list_desc(vcpu); - desc->sptes[0] = (u64 *)*pte_list; + desc->sptes[0] = (u64 *)rmap_head->val; desc->sptes[1] = spte; - *pte_list = (unsigned long)desc | 1; + rmap_head->val = (unsigned long)desc | 1; ++count; } else { rmap_printk("pte_list_add: %p %llx many->many\n", spte, *spte); - desc = (struct pte_list_desc *)(*pte_list & ~1ul); + desc = (struct pte_list_desc *)(rmap_head->val & ~1ul); while (desc->sptes[PTE_LIST_EXT-1] && desc->more) { desc = desc->more; count += PTE_LIST_EXT; @@ -950,8 +950,9 @@ static int pte_list_add(struct kvm_vcpu *vcpu, u64 *spte, } static void -pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc, - int i, struct pte_list_desc *prev_desc) +pte_list_desc_remove_entry(struct kvm_rmap_head *rmap_head, + struct pte_list_desc *desc, int i, + struct pte_list_desc *prev_desc) { int j; @@ -962,43 +963,43 @@ pte_list_desc_remove_entry(unsigned long *pte_list, struct pte_list_desc *desc, if (j != 0) return; if (!prev_desc && !desc->more) - *pte_list = (unsigned long)desc->sptes[0]; + rmap_head->val = (unsigned long)desc->sptes[0]; else if (prev_desc) prev_desc->more = desc->more; else - *pte_list = (unsigned long)desc->more | 1; + rmap_head->val = (unsigned long)desc->more | 1; mmu_free_pte_list_desc(desc); } -static void pte_list_remove(u64 *spte, unsigned long *pte_list) +static void pte_list_remove(u64 *spte, struct kvm_rmap_head *rmap_head) { struct pte_list_desc *desc; struct pte_list_desc *prev_desc; int i; - if (!*pte_list) { + if (!rmap_head->val) { printk(KERN_ERR "pte_list_remove: %p 
0->BUG\n", spte); BUG(); - } else if (!(*pte_list & 1)) { + } else if (!(rmap_head->val & 1)) { rmap_printk("pte_list_remove: %p 1->0\n", spte); - if ((u64 *)*pte_list != spte) { + if ((u64 *)rmap_head->val != spte) { printk(KERN_ERR "pte_list_remove: %p 1-&g
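Aside: patch 07 only wraps an existing encoding, so here is a standalone sketch of that encoding itself, with made-up names (rmap_head, desc, rmap_add) and error handling omitted; it is not the kernel code. The head word is tagged: bit 0 clear means it holds a single spte pointer directly, bit 0 set means it points at a descriptor chain. The sketch assumes heap pointers are at least 2-byte aligned, which leaves bit 0 free for the tag:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Wrapping the tagged word in a one-field struct keeps it from being
 * mixed up with other unsigned longs; this mirrors the idea only. */
struct rmap_head {
        unsigned long val;      /* bit 0 clear: single entry; set: desc list */
};

struct desc {
        uint64_t *sptes[3];
        struct desc *more;
};

static void rmap_add(struct rmap_head *head, uint64_t *spte)
{
        struct desc *d;

        if (!head->val) {                       /* empty -> single entry */
                head->val = (unsigned long)spte;
        } else if (!(head->val & 1)) {          /* single -> descriptor list */
                d = calloc(1, sizeof(*d));      /* (no error handling here) */
                d->sptes[0] = (uint64_t *)head->val;
                d->sptes[1] = spte;
                head->val = (unsigned long)d | 1;
        }
        /* (growing an already-existing list is elided for brevity) */
}

int main(void)
{
        struct rmap_head head = { 0 };
        uint64_t a = 1, b = 2;

        rmap_add(&head, &a);
        printf("single entry? %s\n", (head.val & 1) ? "no" : "yes");
        rmap_add(&head, &b);
        printf("single entry? %s\n", (head.val & 1) ? "no" : "yes");
        return 0;
}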
[PATCH 04/10] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the simple reverse mapping was implemented by commit cd4a4e5374110444 ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point, its role was clear and only rmap_add() and rmap_remove() were using it to select sptes that need to be reverse-mapped. Independently of that, is_shadow_present_pte() was first introduced by commit c7addb902054195b ("KVM: Allow not-present guest page faults to bypass kvm") to do bypass_guest_pf optimization, which does not exist any more. These two seem to have changed their roles somewhat, and is_rmap_spte() just calls is_shadow_present_pte() now. Since using both of them with no clear distinction just makes the code confusing, remove is_rmap_spte(). Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 13 - arch/x86/kvm/mmu_audit.c | 2 +- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index c229356..e8cfdc4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -311,11 +311,6 @@ static int is_large_pte(u64 pte) return pte & PT_PAGE_SIZE_MASK; } -static int is_rmap_spte(u64 pte) -{ - return is_shadow_present_pte(pte); -} - static int is_last_spte(u64 pte, int level) { if (level == PT_PAGE_TABLE_LEVEL) @@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) u64 old_spte = *sptep; bool ret = false; - WARN_ON(!is_rmap_spte(new_spte)); + WARN_ON(!is_shadow_present_pte(new_spte)); if (!is_shadow_present_pte(old_spte)) { mmu_spte_set(sptep, new_spte); @@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) else old_spte = __update_clear_spte_slow(sptep, 0ull); - if (!is_rmap_spte(old_spte)) + if (!is_shadow_present_pte(old_spte)) return 0; pfn = spte_to_pfn(old_spte); @@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); - if (is_rmap_spte(*sptep)) { + if (is_shadow_present_pte(*sptep)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. @@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, * If the mapping has been changed, let the vcpu fault on the * same address again. */ - if (!is_rmap_spte(spte)) { + if (!is_shadow_present_pte(spte)) { ret = true; goto exit; } diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c index 03d518e..90ee420 100644 --- a/arch/x86/kvm/mmu_audit.c +++ b/arch/x86/kvm/mmu_audit.c @@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp) return; for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { - if (!is_rmap_spte(sp->spt[i])) + if (!is_shadow_present_pte(sp->spt[i])) continue; inspect_spte_has_rmap(kvm, sp->spt + i); -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 08/10] KVM: x86: MMU: Move initialization of parent_ptes out from kvm_mmu_alloc_page()
Make kvm_mmu_alloc_page() do just what its name tells us to do, and remove the extra error check at its call site since the allocation cannot fail. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 15 --- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 85f4bbd..9273cd4 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1707,8 +1707,7 @@ static void drop_parent_pte(struct kvm_mmu_page *sp, mmu_spte_clear_no_track(parent_pte); } -static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, - u64 *parent_pte, int direct) +static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, int direct) { struct kvm_mmu_page *sp; @@ -1724,8 +1723,6 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, * this feature. See the comments in kvm_zap_obsolete_pages(). */ list_add(&sp->link, &vcpu->kvm->arch.active_mmu_pages); - sp->parent_ptes.val = 0; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); kvm_mod_used_mmu_pages(vcpu->kvm, +1); return sp; } @@ -2124,10 +2121,14 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, trace_kvm_mmu_get_page(sp, false); return sp; } + ++vcpu->kvm->stat.mmu_cache_miss; - sp = kvm_mmu_alloc_page(vcpu, parent_pte, direct); - if (!sp) - return sp; + + sp = kvm_mmu_alloc_page(vcpu, direct); + + sp->parent_ptes.val = 0; + mmu_page_add_parent_pte(vcpu, sp, parent_pte); + sp->gfn = gfn; sp->role = role; hlist_add_head(&sp->hash_link, -- 2.1.0
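Aside: the changelog's claim that "the allocation cannot fail" rests on KVM topping up per-vcpu object caches (mmu_topup_memory_caches()) before entering the fault path; by the time kvm_mmu_alloc_page() runs, it only draws from a cache that is known to be non-empty. A standalone model of that pattern, with hypothetical names and sizes:

#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define CACHE_MIN 4

struct obj_cache {
        int nobjs;
        void *objects[CACHE_MIN];
};

/* Done up front, where failure can still be reported to the caller. */
static int cache_topup(struct obj_cache *c)
{
        while (c->nobjs < CACHE_MIN) {
                void *p = malloc(64);
                if (!p)
                        return -1;
                c->objects[c->nobjs++] = p;
        }
        return 0;
}

/* Later, on the hot path: cannot fail, so callers need no NULL check. */
static void *cache_alloc(struct obj_cache *c)
{
        assert(c->nobjs > 0);
        return c->objects[--c->nobjs];
}

int main(void)
{
        struct obj_cache c = { 0 };

        if (cache_topup(&c))
                return 1;
        printf("allocated %p without a failure path\n", cache_alloc(&c));
        return 0;
}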
[PATCH 01/10] KVM: x86: MMU: Remove unused parameter of __direct_map()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 12 1 file changed, 4 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index e7c2c14..c3bbc82 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) __direct_pte_prefetch(vcpu, sp, sptep); } -static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, - int map_writable, int level, gfn_t gfn, pfn_t pfn, - bool prefault) +static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, + int level, gfn_t gfn, pfn_t pfn, bool prefault) { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; @@ -3018,11 +3017,9 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, make_mmu_pages_available(vcpu); if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level); - r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn, -prefault); + r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault); spin_unlock(&vcpu->kvm->mmu_lock); - return r; out_unlock: @@ -3531,8 +3528,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, make_mmu_pages_available(vcpu); if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, -level, gfn, pfn, prefault); + r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault); spin_unlock(&vcpu->kvm->mmu_lock); return r; -- 2.1.0
[PATCH 10/10] KVM: x86: MMU: Remove unused parameter parent_pte from kvm_mmu_get_page()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 20 +++- arch/x86/kvm/paging_tmpl.h | 4 ++-- 2 files changed, 9 insertions(+), 15 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 33fe720..101e77d 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2072,8 +2072,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, gva_t gaddr, unsigned level, int direct, -unsigned access, -u64 *parent_pte) +unsigned access) { union kvm_mmu_page_role role; unsigned quadrant; @@ -2730,8 +2729,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, base_addr &= PT64_LVL_ADDR_MASK(iterator.level); pseudo_gfn = base_addr >> PAGE_SHIFT; sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr, - iterator.level - 1, - 1, ACC_ALL, iterator.sptep); + iterator.level - 1, 1, ACC_ALL); link_shadow_page(vcpu, iterator.sptep, sp, true); } @@ -3088,8 +3086,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) if (vcpu->arch.mmu.shadow_root_level == PT64_ROOT_LEVEL) { spin_lock(&vcpu->kvm->mmu_lock); make_mmu_pages_available(vcpu); - sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, - 1, ACC_ALL, NULL); + sp = kvm_mmu_get_page(vcpu, 0, 0, PT64_ROOT_LEVEL, 1, ACC_ALL); ++sp->root_count; spin_unlock(&vcpu->kvm->mmu_lock); vcpu->arch.mmu.root_hpa = __pa(sp->spt); @@ -3101,9 +3098,7 @@ static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) spin_lock(&vcpu->kvm->mmu_lock); make_mmu_pages_available(vcpu); sp = kvm_mmu_get_page(vcpu, i << (30 - PAGE_SHIFT), - i << 30, - PT32_ROOT_LEVEL, 1, ACC_ALL, - NULL); + i << 30, PT32_ROOT_LEVEL, 1, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; spin_unlock(&vcpu->kvm->mmu_lock); @@ -3140,7 +3135,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) spin_lock(&vcpu->kvm->mmu_lock); make_mmu_pages_available(vcpu); sp = kvm_mmu_get_page(vcpu, root_gfn, 0, PT64_ROOT_LEVEL, - 0, ACC_ALL, NULL); + 0, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; spin_unlock(&vcpu->kvm->mmu_lock); @@ -3173,9 +3168,8 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) } spin_lock(&vcpu->kvm->mmu_lock); make_mmu_pages_available(vcpu); - sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, - PT32_ROOT_LEVEL, 0, - ACC_ALL, NULL); + sp = kvm_mmu_get_page(vcpu, root_gfn, i << 30, PT32_ROOT_LEVEL, + 0, ACC_ALL); root = __pa(sp->spt); ++sp->root_count; spin_unlock(&vcpu->kvm->mmu_lock); diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index f414ca6..ee9d211 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -587,7 +587,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, if (!is_shadow_present_pte(*it.sptep)) { table_gfn = gw->table_gfn[it.level - 2]; sp = kvm_mmu_get_page(vcpu, table_gfn, addr, it.level-1, - false, access, it.sptep); + false, access); } /* @@ -617,7 +617,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, direct_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1); sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1, - true, direct_access, it.sptep); + true, direct_access); link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK);
[PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
Every time kvm_mmu_get_page() is called with a non-NULL parent_pte argument, link_shadow_page() follows that to set the parent entry so that the new mapping will point to the returned page table. Moving parent_pte handling there allows to clean up the code because parent_pte is passed to kvm_mmu_get_page() just for mark_unsync() and mmu_page_add_parent_pte(). Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 21 - arch/x86/kvm/paging_tmpl.h | 6 ++ 2 files changed, 10 insertions(+), 17 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9273cd4..33fe720 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2108,14 +2108,9 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, if (sp->unsync_children) { kvm_make_request(KVM_REQ_MMU_SYNC, vcpu); kvm_mmu_mark_parents_unsync(sp); - if (parent_pte) - mark_unsync(parent_pte); } else if (sp->unsync) { kvm_mmu_mark_parents_unsync(sp); - if (parent_pte) - mark_unsync(parent_pte); } - mmu_page_add_parent_pte(vcpu, sp, parent_pte); __clear_sp_write_flooding_count(sp); trace_kvm_mmu_get_page(sp, false); @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, sp = kvm_mmu_alloc_page(vcpu, direct); sp->parent_ptes.val = 0; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); sp->gfn = gfn; sp->role = role; @@ -2196,7 +2190,8 @@ static void shadow_walk_next(struct kvm_shadow_walk_iterator *iterator) return __shadow_walk_next(iterator, *iterator->sptep); } -static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed) +static void link_shadow_page(struct kvm_vcpu *vcpu, u64 *sptep, +struct kvm_mmu_page *sp, bool accessed) { u64 spte; @@ -2210,6 +2205,11 @@ static void link_shadow_page(u64 *sptep, struct kvm_mmu_page *sp, bool accessed) spte |= shadow_accessed_mask; mmu_spte_set(sptep, spte); + + if (sp->unsync_children || sp->unsync) + mark_unsync(sptep); + + mmu_page_add_parent_pte(vcpu, sp, sptep); } static void validate_direct_spte(struct kvm_vcpu *vcpu, u64 *sptep, @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm, mmu_page_zap_pte(kvm, sp, sp->spt + i); } -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) -{ - mmu_page_remove_parent_pte(sp, parent_pte); -} - static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) { u64 *sptep; @@ -2738,7 +2733,7 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, iterator.level - 1, 1, ACC_ALL, iterator.sptep); - link_shadow_page(iterator.sptep, sp, true); + link_shadow_page(vcpu, iterator.sptep, sp, true); } } return emulate; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 9d21b44..f414ca6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, goto out_gpte_changed; if (sp) - link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK); + link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } for (; @@ -618,7 +618,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, sp = kvm_mmu_get_page(vcpu, direct_gfn, addr, it.level-1, true, direct_access, it.sptep); - link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK); + link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } clear_sp_write_flooding_count(it.sptep); @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, return emulate; out_gpte_changed: - if (sp) - 
kvm_mmu_put_page(sp, it.sptep); kvm_release_pfn_clean(pfn); return 0; } -- 2.1.0 -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 06/10] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is placed right after the call to detect unrelated sptes which must not be found in the reverse-mapping list. Move this check into rmap_get_first/next() so that all call sites, not just the users of the for_each_rmap_spte() macro, will be checked the same way. In addition, change the BUG_ON to WARN_ON since killing the whole host is the last thing that KVM should try. One thing to keep in mind is that kvm_mmu_unlink_parents() also uses rmap_get_first() to handle parent sptes. The change will not break it because parent sptes are present, at least until drop_parent_pte() actually unlinks them, and not mmio-sptes. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- Documentation/virtual/kvm/mmu.txt | 4 ++-- arch/x86/kvm/mmu.c | 26 +- 2 files changed, 19 insertions(+), 11 deletions(-) diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index 3a4d681..daf9c0f 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -203,10 +203,10 @@ Shadow pages contain the following information: page cannot be destroyed. See role.invalid. parent_ptes: The reverse mapping for the pte/ptes pointing at this page's spt. If -parent_ptes bit 0 is zero, only one spte points at this pages and +parent_ptes bit 0 is zero, only one spte points at this page and parent_ptes points at this single spte, otherwise, there exists multiple sptes pointing at this page and (parent_ptes & ~0x1) points at a data -structure with a list of parent_ptes. +structure with a list of parent sptes. unsync: If true, then the translations in this page may not match the guest's translation. This is equivalent to the state of the tlb when a pte is diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 1691171..ee7b101 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1079,17 +1079,23 @@ struct rmap_iterator { */ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) { + u64 *sptep; + if (!rmap) return NULL; if (!(rmap & 1)) { iter->desc = NULL; - return (u64 *)rmap; + sptep = (u64 *)rmap; + goto out; } iter->desc = (struct pte_list_desc *)(rmap & ~1ul); iter->pos = 0; - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; +out: + WARN_ON(!is_shadow_present_pte(*sptep)); + return sptep; } /* @@ -1099,14 +1105,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) */ static u64 *rmap_get_next(struct rmap_iterator *iter) { + u64 *sptep; + if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { - u64 *sptep; - ++iter->pos; sptep = iter->desc->sptes[iter->pos]; if (sptep) - return sptep; + goto out; } iter->desc = iter->desc->more; @@ -1114,17 +1120,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) if (iter->desc) { iter->pos = 0; /* desc->sptes[0] cannot be NULL */ - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; + goto out; } } return NULL; +out: + WARN_ON(!is_shadow_present_pte(*sptep)); + return sptep; } #define for_each_rmap_spte(_rmap_, _iter_, _spte_) \ for (_spte_ = rmap_get_first(*_rmap_, _iter_); \ - _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \ - _spte_ = rmap_get_next(_iter_)) + _spte_; _spte_ = rmap_get_next(_iter_)) static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1338,7 +1347,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp) bool flush = false; while ((sptep = rmap_get_first(*rmapp, &iter))) { - BUG_ON(!(*sptep & PT_PRESENT_MASK)); rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep); drop_spte(kvm, sptep); -- 2.1.0
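Aside: the essence of this patch is where the sanity check lives. A standalone model with made-up names (iter_next, values), not kernel code: the iterator itself warns about a bad entry but keeps the program alive and still hands the entry back, so every caller is covered without any per-call-site BUG_ON:

#include <stdio.h>

static int values[4] = { 3, 5, 0, 7 };  /* 0 stands in for a bad entry */

/* The sanity check lives inside the iterator, so every caller -- not
 * only users of a for-each macro -- is covered; warn, don't crash. */
static int *iter_next(int *pos)
{
        int *p;

        if (*pos >= 4)
                return NULL;
        p = &values[(*pos)++];
        if (*p == 0)            /* models WARN_ON(!is_shadow_present_pte()) */
                fprintf(stderr, "warn: unexpected entry at %d\n", *pos - 1);
        return p;
}

int main(void)
{
        int pos = 0, *p;

        while ((p = iter_next(&pos)))
                printf("%d\n", *p);
        return 0;
}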
[PATCH 05/10] KVM: x86: MMU: Use for_each_rmap_spte macro instead of pte_list_walk()
kvm_mmu_mark_parents_unsync() alone uses pte_list_walk(), which does nearly the same as the for_each_rmap_spte macro. The only difference is that the is_shadow_present_pte() check cannot be placed there because kvm_mmu_mark_parents_unsync() can be called with a new parent pointer whose entry is not set yet. By calling mark_unsync() separately for the parent and adding the parent pointer to the parent_ptes chain later in kvm_mmu_get_page(), the macro works with no problem. Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 36 +--- 1 file changed, 13 insertions(+), 23 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index e8cfdc4..1691171 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1007,26 +1007,6 @@ static void pte_list_remove(u64 *spte, unsigned long *pte_list) } } -typedef void (*pte_list_walk_fn) (u64 *spte); -static void pte_list_walk(unsigned long *pte_list, pte_list_walk_fn fn) -{ - struct pte_list_desc *desc; - int i; - - if (!*pte_list) - return; - - if (!(*pte_list & 1)) - return fn((u64 *)*pte_list); - - desc = (struct pte_list_desc *)(*pte_list & ~1ul); - while (desc) { - for (i = 0; i < PTE_LIST_EXT && desc->sptes[i]; ++i) - fn(desc->sptes[i]); - desc = desc->more; - } -} - static unsigned long *__gfn_to_rmap(gfn_t gfn, int level, struct kvm_memory_slot *slot) { @@ -1741,7 +1721,12 @@ static struct kvm_mmu_page *kvm_mmu_alloc_page(struct kvm_vcpu *vcpu, static void mark_unsync(u64 *spte); static void kvm_mmu_mark_parents_unsync(struct kvm_mmu_page *sp) { - pte_list_walk(&sp->parent_ptes, mark_unsync); + u64 *sptep; + struct rmap_iterator iter; + + for_each_rmap_spte(&sp->parent_ptes, &iter, sptep) { + mark_unsync(sptep); + } } static void mark_unsync(u64 *spte) @@ -2111,12 +2096,17 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, if (sp->unsync && kvm_sync_page_transient(vcpu, sp)) break; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); if (sp->unsync_children) { kvm_make_request(KVM_REQ_MMU_SYNC, vcpu); kvm_mmu_mark_parents_unsync(sp); - } else if (sp->unsync) + if (parent_pte) + mark_unsync(parent_pte); + } else if (sp->unsync) { kvm_mmu_mark_parents_unsync(sp); + if (parent_pte) + mark_unsync(parent_pte); + } + mmu_page_add_parent_pte(vcpu, sp, parent_pte); __clear_sp_write_flooding_count(sp); trace_kvm_mmu_get_page(sp, false); -- 2.1.0
Re: [PATCH 09/10 RFC] KVM: x86: MMU: Move parent_pte handling from kvm_mmu_get_page() to link_shadow_page()
On 2015/11/12 23:27, Paolo Bonzini wrote: On 12/11/2015 12:56, Takuya Yoshikawa wrote: diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index 9d21b44..f414ca6 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -598,7 +598,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, goto out_gpte_changed; if (sp) - link_shadow_page(it.sptep, sp, PT_GUEST_ACCESSED_MASK); + link_shadow_page(vcpu, it.sptep, sp, PT_GUEST_ACCESSED_MASK); } Here I think you can remove completely the if (sp) kvm_mmu_put_page(sp, it.sptep); later in FNAME(fetch). Apart from this nit, it's okay. Yes, that's what this patch does below: @@ -629,8 +629,6 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, return emulate; out_gpte_changed: - if (sp) - kvm_mmu_put_page(sp, it.sptep); kvm_release_pfn_clean(pfn); return 0; } Since this is the only user of kvm_mmu_put_page(), it also removes the definition: @@ -2268,11 +2268,6 @@ static void kvm_mmu_page_unlink_children(struct kvm *kvm, mmu_page_zap_pte(kvm, sp, sp->spt + i); } -static void kvm_mmu_put_page(struct kvm_mmu_page *sp, u64 *parent_pte) -{ - mmu_page_remove_parent_pte(sp, parent_pte); -} - static void kvm_mmu_unlink_parents(struct kvm *kvm, struct kvm_mmu_page *sp) { u64 *sptep; Actually, I don't understand why this is named kvm_mmu_put_page() for just removing parent_pte pointer from the sp->parent_ptes pointer chain. On to kvm_mmu_get_page... if (!direct) { if (rmap_write_protect(vcpu, gfn)) kvm_flush_remote_tlbs(vcpu->kvm); if (level > PT_PAGE_TABLE_LEVEL && need_sync) kvm_sync_pages(vcpu, gfn); This seems fishy. need_sync is set if sp->unsync, but then the parents have not been unsynced yet. Reaching here means that kvm_mmu_get_page() could not return sp from inside the for_each_gfn_sp() loop above, so even without this patch, mark_unsync() has not been called. Here, sp holds the new page allocated by kvm_mmu_alloc_page(). One confusing thing is that hlist_add_head() right before this "if (!direct)" line has already added the new sp to the hash list, so it will be found by for_each_gfn_indirect_valid_sp() in kvm_sync_pages(). Because this sp is new and sp->unsync is not set, kvm_sync_pages() will just skip it and look for other sp's whose ->unsync were found to be set in the for_each_gfn_sp() loop. I'm not 100% sure if the existence of the parent_pte pointer in the newly created sp->parent_ptes chain alone makes any difference: @@ -2127,7 +2122,6 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, sp = kvm_mmu_alloc_page(vcpu, direct); sp->parent_ptes.val = 0; - mmu_page_add_parent_pte(vcpu, sp, parent_pte); sp->gfn = gfn; sp->role = role; On the other hand, all calls to kvm_mmu_get_page except for the roots are followed by link_shadow_page... Perhaps if parent_pte != NULL you can call link_shadow_page directly from kvm_mmu_get_page. The call would go before the "if (!direct)" and it would subsume all the existing calls. We could probably also warn if (parent_pte == NULL) != (level == vcpu->arch.mmu.root_level) in kvm_mmu_get_page. 
I think we should set the spte after init_shadow_page_table(), and to make this subsume all the existing calls, we need to change the "return sp;" in the for_each_gfn_sp() loop to a goto statement so that the end of this function will become something like this: init_shadow_page(sp); out: if (parent_pte) { mmu_page_add_parent_pte(vcpu, sp, parent_pte); link_shadow_page(parent_pte, sp, accessed); } trace_kvm_mmu_get_page(sp, created); return sp; So, "bool accessed" needs to be passed to kvm_mmu_get_page(). But anyway, we need to understand if mmu_page_add_parent_pte() really needs to be placed before the "if (!direct)" block. Takuya
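Aside: the single-exit shape sketched in the mail above can be modeled in standalone C. Everything here is hypothetical and heavily simplified (a toy table instead of the hash list, a bare parent pointer instead of the parent_ptes chain); it only illustrates that found and freshly created pages would share one exit that links the parent:

#include <stdio.h>
#include <stdlib.h>

struct page { int gfn; struct page *parent; };

static struct page *table[16];          /* toy "hash" of existing pages */

/* Whether the page was found or freshly created, the parent link is
 * set in exactly one place; roots pass parent == NULL and skip it. */
static struct page *get_page(int gfn, struct page *parent)
{
        struct page *sp = table[gfn % 16];

        if (sp && sp->gfn == gfn)
                goto out;               /* replaces the old "return sp;" */

        sp = calloc(1, sizeof(*sp));    /* (no error handling here) */
        sp->gfn = gfn;
        table[gfn % 16] = sp;
out:
        if (parent)
                sp->parent = parent;
        return sp;
}

int main(void)
{
        struct page root = { -1, NULL };
        struct page *a = get_page(5, &root);
        struct page *b = get_page(5, &root);    /* hits the cached entry */

        printf("same page: %s\n", a == b ? "yes" : "no");
        return 0;
}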
Re: [PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
On 2015/11/09 19:14, Paolo Bonzini wrote: Can you also change kvm_mmu_mark_parents_unsync to use for_each_rmap_spte instead of pte_list_walk? It is the last use of pte_list_walk, and it's nice if we have two uses of for_each_rmap_spte with parent_ptes as the argument. No problem, I will do. Since parent_ptes is also explained as the "reverse mapping" list of parent sptes (in mmu.txt and kvm_host.h), using rmap helpers will not look so strange. BTW, on my todo list is to change the rmap items to a struct (with a single u64 inside) for type safety. Since you are touching this code, perhaps you can give it a shot? Yes, almost done here (assuming that you mean 'unsigned long'). But I have some candidates for its name in mind: 1. struct kvm_rmap { unsigned long val; }; 2. struct kvm_rmap_head { unsigned long val; }; 3. struct kvm_rmap_list_head { unsigned long val; }; 4. struct kvm_spte_list_head { unsigned long val; }; Since this is the head of the reverse mapping list of sptes, I thought name 3 might be the best and first made a patch with it, but it was a bit longer than I had hoped it to be. I have changed it to name 2, and it looks a bit nicer now, but even shorter, e.g. name 1, may be good as well. Do you have any preference? Takuya -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
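Aside: what "type safety" buys in this discussion is a compile-time distinction between the rmap-head word and any other unsigned long. A standalone sketch of the wrapper idea (using candidate name 2 from the mail; illustration only, not the final kernel code):

/* A bare unsigned long silently accepts any integer or cast pointer;
 * the one-field struct makes such mix-ups a compile error. */
struct kvm_rmap_head { unsigned long val; };

static void take_rmap_head(struct kvm_rmap_head *head) { (void)head; }
static void take_ulong(unsigned long *p) { (void)p; }

int main(void)
{
        struct kvm_rmap_head head = { 0 };
        unsigned long other = 0;

        take_rmap_head(&head);  /* ok */
        take_ulong(&other);     /* ok */
        /* take_rmap_head(&other);   <- would now fail to compile */
        /* take_ulong(&head.val);    still possible, but must be explicit */
        return 0;
}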
[PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is placed right after the call to detect unrelated sptes which should not be found in the reverse-mapping list. Move this check in rmap_get_first/next() so that all call sites, not just the users of the for_each_rmap_spte() macro, will be checked the same way. In addition, change the BUG_ON to WARN_ON since killing the whole host is the last thing that KVM should try. One thing to keep in mind is that kvm_mmu_unlink_parents() also uses rmap_get_first() to handle parent sptes. The change will not break it because parent sptes are present, at least until drop_parent_pte() actually unlinks them, and not mmio-sptes. Signed-off-by: Takuya Yoshikawa --- Documentation/virtual/kvm/mmu.txt | 4 ++-- arch/x86/kvm/mmu.c| 31 ++- 2 files changed, 24 insertions(+), 11 deletions(-) diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt index 3a4d681..daf9c0f 100644 --- a/Documentation/virtual/kvm/mmu.txt +++ b/Documentation/virtual/kvm/mmu.txt @@ -203,10 +203,10 @@ Shadow pages contain the following information: page cannot be destroyed. See role.invalid. parent_ptes: The reverse mapping for the pte/ptes pointing at this page's spt. If -parent_ptes bit 0 is zero, only one spte points at this pages and +parent_ptes bit 0 is zero, only one spte points at this page and parent_ptes points at this single spte, otherwise, there exists multiple sptes pointing at this page and (parent_ptes & ~0x1) points at a data -structure with a list of parent_ptes. +structure with a list of parent sptes. unsync: If true, then the translations in this page may not match the guest's translation. This is equivalent to the state of the tlb when a pte is diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index c5e2363..353d752 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1099,17 +1099,28 @@ struct rmap_iterator { */ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) { + u64 *sptep; + if (!rmap) return NULL; if (!(rmap & 1)) { iter->desc = NULL; - return (u64 *)rmap; + sptep = (u64 *)rmap; + goto out; } iter->desc = (struct pte_list_desc *)(rmap & ~1ul); iter->pos = 0; - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; +out: + /* +* Parent sptes found in sp->parent_ptes lists are also checked here +* since kvm_mmu_unlink_parents() uses this function. If the condition +* needs to be changed for them, make another wrapper function. 
+*/ + WARN_ON(!is_shadow_present_pte(*sptep)); + return sptep; } /* @@ -1119,14 +1130,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter) */ static u64 *rmap_get_next(struct rmap_iterator *iter) { + u64 *sptep; + if (iter->desc) { if (iter->pos < PTE_LIST_EXT - 1) { - u64 *sptep; - ++iter->pos; sptep = iter->desc->sptes[iter->pos]; if (sptep) - return sptep; + goto out; } iter->desc = iter->desc->more; @@ -1134,17 +1145,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter) if (iter->desc) { iter->pos = 0; /* desc->sptes[0] cannot be NULL */ - return iter->desc->sptes[iter->pos]; + sptep = iter->desc->sptes[iter->pos]; + goto out; } } return NULL; +out: + WARN_ON(!is_shadow_present_pte(*sptep)); + return sptep; } #define for_each_rmap_spte(_rmap_, _iter_, _spte_) \ for (_spte_ = rmap_get_first(*_rmap_, _iter_); \ - _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;}); \ - _spte_ = rmap_get_next(_iter_)) + _spte_; _spte_ = rmap_get_next(_iter_)) static void drop_spte(struct kvm *kvm, u64 *sptep) { @@ -1358,7 +1372,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp) bool flush = false; while ((sptep = rmap_get_first(*rmapp, &iter))) { - BUG_ON(!(*sptep & PT_PRESENT_MASK)); rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep); drop_spte(kvm, sptep); -- 2.1.0
[PATCH 4/5] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the simple reverse mapping was implemented by commit cd4a4e5374110444 ("[PATCH] KVM: MMU: Implement simple reverse mapping"). At that point, its role was clear and only rmap_add() and rmap_remove() were using it to select sptes that need to be reverse-mapped. Independently of that, is_shadow_present_pte() was first introduced by commit c7addb902054195b ("KVM: Allow not-present guest page faults to bypass kvm") to do bypass_guest_pf optimization, which does not exist any more. These two seem to have changed their roles somewhat, and is_rmap_spte() just calls is_shadow_present_pte() now. Since using both of them with no clear distinction just makes the code confusing, remove is_rmap_spte(). Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 13 - arch/x86/kvm/mmu_audit.c | 2 +- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 69e7d20..c5e2363 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -311,11 +311,6 @@ static int is_large_pte(u64 pte) return pte & PT_PAGE_SIZE_MASK; } -static int is_rmap_spte(u64 pte) -{ - return is_shadow_present_pte(pte); -} - static int is_last_spte(u64 pte, int level) { if (level == PT_PAGE_TABLE_LEVEL) @@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte) u64 old_spte = *sptep; bool ret = false; - WARN_ON(!is_rmap_spte(new_spte)); + WARN_ON(!is_shadow_present_pte(new_spte)); if (!is_shadow_present_pte(old_spte)) { mmu_spte_set(sptep, new_spte); @@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep) else old_spte = __update_clear_spte_slow(sptep, 0ull); - if (!is_rmap_spte(old_spte)) + if (!is_shadow_present_pte(old_spte)) return 0; pfn = spte_to_pfn(old_spte); @@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); - if (is_rmap_spte(*sptep)) { + if (is_shadow_present_pte(*sptep)) { /* * If we overwrite a PTE page pointer with a 2MB PMD, unlink * the parent of the now unreachable PTE. @@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level, * If the mapping has been changed, let the vcpu fault on the * same address again. */ - if (!is_rmap_spte(spte)) { + if (!is_shadow_present_pte(spte)) { ret = true; goto exit; } diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c index 03d518e..90ee420 100644 --- a/arch/x86/kvm/mmu_audit.c +++ b/arch/x86/kvm/mmu_audit.c @@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp) return; for (i = 0; i < PT64_ENT_PER_PAGE; ++i) { - if (!is_rmap_spte(sp->spt[i])) + if (!is_shadow_present_pte(sp->spt[i])) continue; inspect_spte_has_rmap(kvm, sp->spt + i); -- 2.1.0
[PATCH 3/5] KVM: x86: MMU: Make mmu_set_spte() return emulate value
mmu_set_spte()'s code is based on the assumption that the emulate parameter has a valid pointer value if set_spte() returns true and write_fault is not zero. In other cases, emulate may be NULL, so a NULL-check is needed. Stop passing emulate pointer and make mmu_set_spte() return the emulate value instead to clean up this complex interface. Prefetch functions can just throw away the return value. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 27 ++- arch/x86/kvm/paging_tmpl.h | 10 +- 2 files changed, 19 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a9622a2..69e7d20 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2564,13 +2564,13 @@ done: return ret; } -static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, -unsigned pte_access, int write_fault, int *emulate, -int level, gfn_t gfn, pfn_t pfn, bool speculative, -bool host_writable) +static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access, +int write_fault, int level, gfn_t gfn, pfn_t pfn, +bool speculative, bool host_writable) { int was_rmapped = 0; int rmap_count; + bool emulate = false; pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__, *sptep, write_fault, gfn); @@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative, true, host_writable)) { if (write_fault) - *emulate = 1; + emulate = true; kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); } - if (unlikely(is_mmio_spte(*sptep) && emulate)) - *emulate = 1; + if (unlikely(is_mmio_spte(*sptep))) + emulate = true; pgprintk("%s: setting spte %llx\n", __func__, *sptep); pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n", @@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, } kvm_release_pfn_clean(pfn); + + return emulate; } static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn, @@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, return -1; for (i = 0; i < ret; i++, gfn++, start++) - mmu_set_spte(vcpu, start, access, 0, NULL, -sp->role.level, gfn, page_to_pfn(pages[i]), -true, true); + mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn, +page_to_pfn(pages[i]), true, true); return 0; } @@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) { if (iterator.level == level) { - mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, -write, &emulate, level, gfn, pfn, -prefault, map_writable); + emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL, + write, level, gfn, pfn, prefault, + map_writable); direct_pte_prefetch(vcpu, iterator.sptep); ++vcpu->stat.pf_fixed; break; diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index b41faa9..de24499 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, * we call mmu_set_spte() with host_writable = true because * pte_prefetch_gfn_to_pfn always gets a writable pfn. */ - mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL, -gfn, pfn, true, true); + mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn, +true, true); return true; } @@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, struct kvm_mmu_page *sp = NULL; struct kvm_shadow_walk_iterator it; unsigned direct_access, access = gw->pt_access; - int top_level, emulate = 0; + int top_level, emulate; direct_access = gw->pte_access; @@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr, } clear_sp_write_flooding_count(it.sptep); - mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate, -it.level, gw->gfn, pfn, prefault, map_writable); +
[PATCH 2/5] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
Both __mmu_unsync_walk() and mmu_pages_clear_parents() have a three-line code block which clears a bit in the unsync child bitmap; the former places it inside a loop block and uses a few goto statements to jump to it. A new helper function, clear_unsync_child_bit(), makes the code cleaner. Signed-off-by: Takuya Yoshikawa --- arch/x86/kvm/mmu.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index a76bc04..a9622a2 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp, return (pvec->nr == KVM_PAGE_ARRAY_NR); } +static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx) +{ + --sp->unsync_children; + WARN_ON((int)sp->unsync_children < 0); + __clear_bit(idx, sp->unsync_child_bitmap); +} + static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_pages *pvec) { @@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, struct kvm_mmu_page *child; u64 ent = sp->spt[i]; - if (!is_shadow_present_pte(ent) || is_large_pte(ent)) - goto clear_child_bitmap; + if (!is_shadow_present_pte(ent) || is_large_pte(ent)) { + clear_unsync_child_bit(sp, i); + continue; + } child = page_header(ent & PT64_BASE_ADDR_MASK); @@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp, return -ENOSPC; ret = __mmu_unsync_walk(child, pvec); - if (!ret) - goto clear_child_bitmap; - else if (ret > 0) + if (!ret) { + clear_unsync_child_bit(sp, i); + continue; + } else if (ret > 0) { nr_unsync_leaf += ret; - else + } else return ret; } else if (child->unsync) { nr_unsync_leaf++; if (mmu_pages_add(pvec, child, i)) return -ENOSPC; } else -goto clear_child_bitmap; - - continue; - -clear_child_bitmap: - __clear_bit(i, sp->unsync_child_bitmap); - sp->unsync_children--; - WARN_ON((int)sp->unsync_children < 0); + clear_unsync_child_bit(sp, i); } - return nr_unsync_leaf; } @@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents) if (!sp) return; - --sp->unsync_children; - WARN_ON((int)sp->unsync_children < 0); - __clear_bit(idx, sp->unsync_child_bitmap); + clear_unsync_child_bit(sp, idx); level++; } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); } -- 2.1.0
[PATCH 1/5] KVM: x86: MMU: Remove unused parameter of __direct_map()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp> --- arch/x86/kvm/mmu.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 7d85bca..a76bc04 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep) __direct_pte_prefetch(vcpu, sp, sptep); } -static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write, - int map_writable, int level, gfn_t gfn, pfn_t pfn, - bool prefault) +static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable, + int level, gfn_t gfn, pfn_t pfn, bool prefault) { struct kvm_shadow_walk_iterator iterator; struct kvm_mmu_page *sp; @@ -3018,8 +3017,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code, make_mmu_pages_available(vcpu); if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level); - r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn, -prefault); + r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault); spin_unlock(&vcpu->kvm->mmu_lock); @@ -3541,8 +3539,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code, make_mmu_pages_available(vcpu); if (likely(!force_pt_level)) transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level); - r = __direct_map(vcpu, gpa, write, map_writable, -level, gfn, pfn, prefault); + r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault); spin_unlock(&vcpu->kvm->mmu_lock); return r; -- 2.1.0
[PATCH 0/5] KVM: x86: MMU: Clean up x86's mmu code for future work
Patches 1/2/3 are easy ones. The following two, patches 4/5, may not be ideal solutions, but they at least explain, or try to explain, the problems. Takuya Yoshikawa (5): KVM: x86: MMU: Remove unused parameter of __direct_map() KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap KVM: x86: MMU: Make mmu_set_spte() return emulate value KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte() KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes Documentation/virtual/kvm/mmu.txt | 4 +- arch/x86/kvm/mmu.c | 118 -- arch/x86/kvm/mmu_audit.c | 2 +- arch/x86/kvm/paging_tmpl.h | 10 ++-- 4 files changed, 70 insertions(+), 64 deletions(-) -- 2.1.0
[PATCH 1/5] KVM: x86: MMU: Remove unused parameter of __direct_map()
Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 7d85bca..a76bc04 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2708,9 +2708,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
 	__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t v, int write,
-			int map_writable, int level, gfn_t gfn, pfn_t pfn,
-			bool prefault)
+static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
+			int level, gfn_t gfn, pfn_t pfn, bool prefault)
 {
 	struct kvm_shadow_walk_iterator iterator;
 	struct kvm_mmu_page *sp;
@@ -3018,8 +3017,7 @@ static int nonpaging_map(struct kvm_vcpu *vcpu, gva_t v, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, v, write, map_writable, level, gfn, pfn,
-			 prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
@@ -3541,8 +3539,7 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t gpa, u32 error_code,
 	make_mmu_pages_available(vcpu);
 	if (likely(!force_pt_level))
 		transparent_hugepage_adjust(vcpu, &gfn, &pfn, &level);
-	r = __direct_map(vcpu, gpa, write, map_writable,
-			 level, gfn, pfn, prefault);
+	r = __direct_map(vcpu, write, map_writable, level, gfn, pfn, prefault);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 
 	return r;
-- 
2.1.0
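As a standalone illustration (hypothetical map_page(), not the kernel build):
parameters like the dropped 'v' are what -Wunused-parameter, enabled by
-Wextra, reports, which is one way such dead arguments can be noticed.

/* sketch.c -- compile with: gcc -Wextra -c sketch.c
 * gcc reports: unused parameter 'gpa' [-Wunused-parameter]
 */
static int map_page(unsigned long gpa, int write, int level)
{
	return write + level;	/* 'gpa' is never read, like __direct_map()'s 'v' */
}

int main(void)
{
	return map_page(0x1000, 1, 1);
}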
[PATCH 4/5] KVM: x86: MMU: Remove is_rmap_spte() and use is_shadow_present_pte()
is_rmap_spte(), originally named is_rmap_pte(), was introduced when the
simple reverse mapping was implemented by commit cd4a4e5374110444
("[PATCH] KVM: MMU: Implement simple reverse mapping").  At that point,
its role was clear and only rmap_add() and rmap_remove() were using it
to select sptes that need to be reverse-mapped.

Independently of that, is_shadow_present_pte() was first introduced by
commit c7addb902054195b ("KVM: Allow not-present guest page faults to
bypass kvm") to do bypass_guest_pf optimization, which does not exist
any more.

These two seem to have changed their roles somewhat, and is_rmap_spte()
just calls is_shadow_present_pte() now.  Since using both of them
without any clear distinction just makes the code confusing, remove
is_rmap_spte().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c       | 13 ++++---------
 arch/x86/kvm/mmu_audit.c |  2 +-
 2 files changed, 5 insertions(+), 10 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 69e7d20..c5e2363 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -311,11 +311,6 @@ static int is_large_pte(u64 pte)
 	return pte & PT_PAGE_SIZE_MASK;
 }
 
-static int is_rmap_spte(u64 pte)
-{
-	return is_shadow_present_pte(pte);
-}
-
 static int is_last_spte(u64 pte, int level)
 {
 	if (level == PT_PAGE_TABLE_LEVEL)
@@ -540,7 +535,7 @@ static bool mmu_spte_update(u64 *sptep, u64 new_spte)
 	u64 old_spte = *sptep;
 	bool ret = false;
 
-	WARN_ON(!is_rmap_spte(new_spte));
+	WARN_ON(!is_shadow_present_pte(new_spte));
 
 	if (!is_shadow_present_pte(old_spte)) {
 		mmu_spte_set(sptep, new_spte);
@@ -595,7 +590,7 @@ static int mmu_spte_clear_track_bits(u64 *sptep)
 	else
 		old_spte = __update_clear_spte_slow(sptep, 0ull);
 
-	if (!is_rmap_spte(old_spte))
+	if (!is_shadow_present_pte(old_spte))
 		return 0;
 
 	pfn = spte_to_pfn(old_spte);
@@ -2575,7 +2570,7 @@ static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
 
-	if (is_rmap_spte(*sptep)) {
+	if (is_shadow_present_pte(*sptep)) {
 		/*
 		 * If we overwrite a PTE page pointer with a 2MB PMD, unlink
 		 * the parent of the now unreachable PTE.
@@ -2919,7 +2914,7 @@ static bool fast_page_fault(struct kvm_vcpu *vcpu, gva_t gva, int level,
 	 * If the mapping has been changed, let the vcpu fault on the
 	 * same address again.
 	 */
-	if (!is_rmap_spte(spte)) {
+	if (!is_shadow_present_pte(spte)) {
 		ret = true;
 		goto exit;
 	}
diff --git a/arch/x86/kvm/mmu_audit.c b/arch/x86/kvm/mmu_audit.c
index 03d518e..90ee420 100644
--- a/arch/x86/kvm/mmu_audit.c
+++ b/arch/x86/kvm/mmu_audit.c
@@ -183,7 +183,7 @@ static void check_mappings_rmap(struct kvm *kvm, struct kvm_mmu_page *sp)
 		return;
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; ++i) {
-		if (!is_rmap_spte(sp->spt[i]))
+		if (!is_shadow_present_pte(sp->spt[i]))
 			continue;
 
 		inspect_spte_has_rmap(kvm, sp->spt + i);
-- 
2.1.0
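The situation the patch removes, reduced to a standalone sketch (simplified
stand-in mask, hypothetical pte value; the real bit tests live in
arch/x86/kvm/mmu.c): after years of drift the two predicates were literally
the same test under two names.

#include <stdint.h>
#include <stdio.h>

#define PRESENT_MASK 0x1ull	/* simplified stand-in for the real spte bits */

static int is_shadow_present_pte(uint64_t pte)
{
	return (pte & PRESENT_MASK) != 0;
}

/* The alias being removed: same test, second name. */
static int is_rmap_spte(uint64_t pte)
{
	return is_shadow_present_pte(pte);
}

int main(void)
{
	uint64_t pte = 0x1001;	/* hypothetical present spte */
	printf("%d %d\n", is_shadow_present_pte(pte), is_rmap_spte(pte));
	return 0;
}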
[PATCH 2/5] KVM: x86: MMU: Add helper function to clear a bit in unsync child bitmap
Both __mmu_unsync_walk() and mmu_pages_clear_parents() contain three
lines of code that clear a bit in the unsync child bitmap; the former
places them inside a loop block and uses a few goto statements to jump
to them.  A new helper function, clear_unsync_child_bit(), makes the
code cleaner.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 36 ++++++++++++++++++------------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a76bc04..a9622a2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1806,6 +1806,13 @@ static int mmu_pages_add(struct kvm_mmu_pages *pvec, struct kvm_mmu_page *sp,
 	return (pvec->nr == KVM_PAGE_ARRAY_NR);
 }
 
+static inline void clear_unsync_child_bit(struct kvm_mmu_page *sp, int idx)
+{
+	--sp->unsync_children;
+	WARN_ON((int)sp->unsync_children < 0);
+	__clear_bit(idx, sp->unsync_child_bitmap);
+}
+
 static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 			   struct kvm_mmu_pages *pvec)
 {
@@ -1815,8 +1822,10 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 		struct kvm_mmu_page *child;
 		u64 ent = sp->spt[i];
 
-		if (!is_shadow_present_pte(ent) || is_large_pte(ent))
-			goto clear_child_bitmap;
+		if (!is_shadow_present_pte(ent) || is_large_pte(ent)) {
+			clear_unsync_child_bit(sp, i);
+			continue;
+		}
 
 		child = page_header(ent & PT64_BASE_ADDR_MASK);
 
@@ -1825,28 +1834,21 @@ static int __mmu_unsync_walk(struct kvm_mmu_page *sp,
 				return -ENOSPC;
 
 			ret = __mmu_unsync_walk(child, pvec);
-			if (!ret)
-				goto clear_child_bitmap;
-			else if (ret > 0)
+			if (!ret) {
+				clear_unsync_child_bit(sp, i);
+				continue;
+			} else if (ret > 0) {
 				nr_unsync_leaf += ret;
-			else
+			} else
 				return ret;
 		} else if (child->unsync) {
 			nr_unsync_leaf++;
 			if (mmu_pages_add(pvec, child, i))
 				return -ENOSPC;
 		} else
-			goto clear_child_bitmap;
-
-		continue;
-
-clear_child_bitmap:
-		__clear_bit(i, sp->unsync_child_bitmap);
-		sp->unsync_children--;
-		WARN_ON((int)sp->unsync_children < 0);
+			clear_unsync_child_bit(sp, i);
 	}
 
-
 	return nr_unsync_leaf;
 }
 
@@ -2009,9 +2011,7 @@ static void mmu_pages_clear_parents(struct mmu_page_path *parents)
 		if (!sp)
 			return;
 
-		--sp->unsync_children;
-		WARN_ON((int)sp->unsync_children < 0);
-		__clear_bit(idx, sp->unsync_child_bitmap);
+		clear_unsync_child_bit(sp, idx);
 		level++;
 	} while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children);
 }
-- 
2.1.0
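The shape of the cleanup, as a standalone sketch (hypothetical clear_child()
and counters, not KVM code): a block duplicated behind a goto label becomes a
helper, so each exit path reads as a plain call plus continue.

#include <stdio.h>

static int count = 3;
static unsigned long bitmap = 0x7;	/* three "unsync" children: bits 0-2 */

/* The extracted helper: one place for the decrement/clear pair. */
static void clear_child(int idx)
{
	--count;
	bitmap &= ~(1ul << idx);
}

int main(void)
{
	for (int i = 0; i < 3; i++) {
		if (bitmap & (1ul << i)) {	/* was: goto clear_child_bitmap; */
			clear_child(i);
			continue;
		}
	}
	printf("%d children left, bitmap %#lx\n", count, bitmap);
	return 0;
}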
[PATCH 3/5] KVM: x86: MMU: Make mmu_set_spte() return emulate value
mmu_set_spte()'s code is based on the assumption that the emulate
parameter has a valid pointer value if set_spte() returns true and
write_fault is not zero.  In other cases, emulate may be NULL, so a
NULL-check is needed.

Stop passing the emulate pointer and make mmu_set_spte() return the
emulate value instead, to clean up this complex interface.  Prefetch
functions can just throw away the return value.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 27 ++++++++++++++-------------
 arch/x86/kvm/paging_tmpl.h | 10 +++++-----
 2 files changed, 19 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index a9622a2..69e7d20 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2564,13 +2564,13 @@ done:
 	return ret;
 }
 
-static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			 unsigned pte_access, int write_fault, int *emulate,
-			 int level, gfn_t gfn, pfn_t pfn, bool speculative,
-			 bool host_writable)
+static bool mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep, unsigned pte_access,
+			 int write_fault, int level, gfn_t gfn, pfn_t pfn,
+			 bool speculative, bool host_writable)
 {
 	int was_rmapped = 0;
 	int rmap_count;
+	bool emulate = false;
 
 	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
 		 *sptep, write_fault, gfn);
@@ -2600,12 +2600,12 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	if (set_spte(vcpu, sptep, pte_access, level, gfn, pfn, speculative,
 	      true, host_writable)) {
 		if (write_fault)
-			*emulate = 1;
+			emulate = true;
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	}
 
-	if (unlikely(is_mmio_spte(*sptep) && emulate))
-		*emulate = 1;
+	if (unlikely(is_mmio_spte(*sptep)))
+		emulate = true;
 
 	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
 	pgprintk("instantiating %s PTE (%s) at %llx (%llx) addr %p\n",
@@ -2624,6 +2624,8 @@ static void mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
 	}
 
 	kvm_release_pfn_clean(pfn);
+
+	return emulate;
 }
 
 static pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
@@ -2658,9 +2660,8 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
 		return -1;
 
 	for (i = 0; i < ret; i++, gfn++, start++)
-		mmu_set_spte(vcpu, start, access, 0, NULL,
-			     sp->role.level, gfn, page_to_pfn(pages[i]),
-			     true, true);
+		mmu_set_spte(vcpu, start, access, 0, sp->role.level, gfn,
+			     page_to_pfn(pages[i]), true, true);
 
 	return 0;
 }
@@ -2721,9 +2722,9 @@ static int __direct_map(struct kvm_vcpu *vcpu, int write, int map_writable,
 
 	for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
 		if (iterator.level == level) {
-			mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
-				     write, &emulate, level, gfn, pfn,
-				     prefault, map_writable);
+			emulate = mmu_set_spte(vcpu, iterator.sptep, ACC_ALL,
+					       write, level, gfn, pfn, prefault,
+					       map_writable);
 			direct_pte_prefetch(vcpu, iterator.sptep);
 			++vcpu->stat.pf_fixed;
 			break;
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index b41faa9..de24499 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -475,8 +475,8 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	 * we call mmu_set_spte() with host_writable = true because
 	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
 	 */
-	mmu_set_spte(vcpu, spte, pte_access, 0, NULL, PT_PAGE_TABLE_LEVEL,
-		     gfn, pfn, true, true);
+	mmu_set_spte(vcpu, spte, pte_access, 0, PT_PAGE_TABLE_LEVEL, gfn, pfn,
+		     true, true);
 
 	return true;
 }
@@ -556,7 +556,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned direct_access, access = gw->pt_access;
-	int top_level, emulate = 0;
+	int top_level, emulate;
 
 	direct_access = gw->pte_access;
 
@@ -622,8 +622,8 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
 	}
 
 	clear_sp_write_flooding_count(it.sptep);
-	mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault, &emulate,
-		     it.level, gw->gfn, pfn, prefault, map_writable);
+	emulate = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
+			       it.level, gw->gfn, pfn, prefault, map_writable);
 	FNAME(pte_prefetch)(vcpu, gw, it.sptep);
 	++vcpu->stat.pf_fixed;
 	return emulate;
-- 
2.1.0
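The interface change in isolation (hypothetical set_spte_old()/set_spte_new(),
not KVM code): an optional out-parameter forces every write through a NULL
check, while returning the value lets callers that don't care simply drop it.

#include <stdbool.h>
#include <stdio.h>

/* Before: callers that don't care pass NULL, so every store needs a guard. */
static void set_spte_old(int write_fault, int *emulate)
{
	if (write_fault && emulate)	/* NULL check required */
		*emulate = 1;
}

/* After: the value is returned; prefetch-style callers just ignore it. */
static bool set_spte_new(int write_fault)
{
	return write_fault != 0;
}

int main(void)
{
	int emulate = 0;
	set_spte_old(1, &emulate);		/* out-parameter style */
	bool emulate2 = set_spte_new(1);	/* return-value style */
	set_spte_new(0);			/* return value thrown away, no NULL dance */
	printf("%d %d\n", emulate, emulate2);
	return 0;
}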
[PATCH 5/5] KVM: x86: MMU: Consolidate WARN_ON/BUG_ON checks for reverse-mapped sptes
At some call sites of rmap_get_first() and rmap_get_next(), BUG_ON is
placed right after the call to detect unrelated sptes which should not
be found in the reverse-mapping list.

Move this check into rmap_get_first/next() so that all call sites, not
just the users of the for_each_rmap_spte() macro, will be checked the
same way.  In addition, change the BUG_ON to WARN_ON since killing the
whole host is the last thing that KVM should try.

One thing to keep in mind is that kvm_mmu_unlink_parents() also uses
rmap_get_first() to handle parent sptes.  The change will not break it
because parent sptes are present, at least until drop_parent_pte()
actually unlinks them, and not mmio-sptes.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 Documentation/virtual/kvm/mmu.txt |  4 ++--
 arch/x86/kvm/mmu.c                | 31 ++++++++++++++++++++---------
 2 files changed, 24 insertions(+), 11 deletions(-)

diff --git a/Documentation/virtual/kvm/mmu.txt b/Documentation/virtual/kvm/mmu.txt
index 3a4d681..daf9c0f 100644
--- a/Documentation/virtual/kvm/mmu.txt
+++ b/Documentation/virtual/kvm/mmu.txt
@@ -203,10 +203,10 @@ Shadow pages contain the following information:
     page cannot be destroyed.  See role.invalid.
   parent_ptes:
     The reverse mapping for the pte/ptes pointing at this page's spt. If
-    parent_ptes bit 0 is zero, only one spte points at this pages and
+    parent_ptes bit 0 is zero, only one spte points at this page and
     parent_ptes points at this single spte, otherwise, there exists multiple
     sptes pointing at this page and (parent_ptes & ~0x1) points at a data
-    structure with a list of parent_ptes.
+    structure with a list of parent sptes.
   unsync:
     If true, then the translations in this page may not match the guest's
     translation.  This is equivalent to the state of the tlb when a pte is
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c5e2363..353d752 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1099,17 +1099,28 @@ struct rmap_iterator {
  */
 static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (!rmap)
 		return NULL;
 
 	if (!(rmap & 1)) {
 		iter->desc = NULL;
-		return (u64 *)rmap;
+		sptep = (u64 *)rmap;
+		goto out;
 	}
 
 	iter->desc = (struct pte_list_desc *)(rmap & ~1ul);
 	iter->pos = 0;
-	return iter->desc->sptes[iter->pos];
+	sptep = iter->desc->sptes[iter->pos];
+out:
+	/*
+	 * Parent sptes found in sp->parent_ptes lists are also checked here
+	 * since kvm_mmu_unlink_parents() uses this function.  If the condition
+	 * needs to be changed for them, make another wrapper function.
+	 */
+	WARN_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 /*
@@ -1119,14 +1130,14 @@ static u64 *rmap_get_first(unsigned long rmap, struct rmap_iterator *iter)
  */
 static u64 *rmap_get_next(struct rmap_iterator *iter)
 {
+	u64 *sptep;
+
 	if (iter->desc) {
 		if (iter->pos < PTE_LIST_EXT - 1) {
-			u64 *sptep;
-
 			++iter->pos;
 			sptep = iter->desc->sptes[iter->pos];
 			if (sptep)
-				return sptep;
+				goto out;
 		}
 
 		iter->desc = iter->desc->more;
@@ -1134,17 +1145,20 @@ static u64 *rmap_get_next(struct rmap_iterator *iter)
 		if (iter->desc) {
 			iter->pos = 0;
 			/* desc->sptes[0] cannot be NULL */
-			return iter->desc->sptes[iter->pos];
+			sptep = iter->desc->sptes[iter->pos];
+			goto out;
 		}
 	}
 
 	return NULL;
+out:
+	WARN_ON(!is_shadow_present_pte(*sptep));
+	return sptep;
 }
 
 #define for_each_rmap_spte(_rmap_, _iter_, _spte_)			\
 	for (_spte_ = rmap_get_first(*_rmap_, _iter_);			\
-	     _spte_ && ({BUG_ON(!is_shadow_present_pte(*_spte_)); 1;});	\
-	     _spte_ = rmap_get_next(_iter_))
+	     _spte_; _spte_ = rmap_get_next(_iter_))
 
 static void drop_spte(struct kvm *kvm, u64 *sptep)
 {
@@ -1358,7 +1372,6 @@ static bool kvm_zap_rmapp(struct kvm *kvm, unsigned long *rmapp)
 	bool flush = false;
 
 	while ((sptep = rmap_get_first(*rmapp, &iter))) {
-		BUG_ON(!(*sptep & PT_PRESENT_MASK));
 		rmap_printk("%s: spte %p %llx.\n", __func__, sptep, *sptep);
 
 		drop_spte(kvm, sptep);
-- 
2.1.0
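The underlying idea, as a standalone sketch (hypothetical next_item(), not
the KVM iterator): validating inside the iterator means every caller is
covered, including ones that never use the convenience macro, and WARN-style
reporting keeps the process alive where an assert would kill it.

#include <stdio.h>

static int items[] = { 1, 2, 3, 0 };	/* 0 terminates the list */

/* All validation lives here; no call site can forget the check. */
static int *next_item(int *it)
{
	it++;
	if (*it == 0)
		return NULL;
	if (*it < 0)			/* WARN-style: report, don't abort */
		fprintf(stderr, "unexpected negative item\n");
	return it;
}

int main(void)
{
	for (int *it = items; it; it = next_item(it))
		printf("%d\n", *it);
	return 0;
}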
[PATCH] KVM: x86: MMU: Initialize force_pt_level before calling mapping_level()
Commit fd1369021878 ("KVM: x86: MMU: Move mapping_level_dirty_bitmap()
call in mapping_level()") forgot to initialize force_pt_level to false
in FNAME(page_fault)() before calling mapping_level() like
nonpaging_map() does.  This can sometimes result in forcing page table
level mapping unnecessarily.

Fix this and move the first *force_pt_level check in mapping_level()
before the kvm_vcpu_gfn_to_memslot() call, to make it a bit clearer that
the variable must be initialized before mapping_level() gets called.

This change can also avoid calling kvm_vcpu_gfn_to_memslot() when the
!check_hugepage_cache_consistency() check in tdp_page_fault() forces
page table level mapping.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c         | 7 ++++---
 arch/x86/kvm/paging_tmpl.h | 2 +-
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index dd2a7c6..7d85bca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -886,10 +886,11 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 	int host_level, level, max_level;
 	struct kvm_memory_slot *slot;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
+	if (unlikely(*force_pt_level))
+		return PT_PAGE_TABLE_LEVEL;
 
-	if (likely(!*force_pt_level))
-		*force_pt_level = !memslot_valid_for_gpte(slot, true);
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
+	*force_pt_level = !memslot_valid_for_gpte(slot, true);
 	if (unlikely(*force_pt_level))
 		return PT_PAGE_TABLE_LEVEL;
 
diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index bf39d0f..b41faa9 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -698,7 +698,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code,
 	int r;
 	pfn_t pfn;
 	int level = PT_PAGE_TABLE_LEVEL;
-	bool force_pt_level;
+	bool force_pt_level = false;
 	unsigned long mmu_seq;
 	bool map_writable, is_self_change_mapping;
 
-- 
2.1.0
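The bug class in miniature (hypothetical mapping_level_sketch(), not the real
signature): the callee reads *force_pt_level before ever writing it, so a
caller that forgets the `= false` hands it an indeterminate value and may
force 4KB mappings at random.

#include <stdbool.h>
#include <stdio.h>

/* Reads *force_pt_level first, exactly like the patched mapping_level(). */
static int mapping_level_sketch(bool *force_pt_level)
{
	if (*force_pt_level)	/* undefined behavior if the caller didn't initialize it */
		return 1;	/* stand-in for PT_PAGE_TABLE_LEVEL */
	return 2;		/* stand-in for a large-page level */
}

int main(void)
{
	bool force_pt_level = false;	/* the one-line fix from this patch */
	printf("level %d\n", mapping_level_sketch(&force_pt_level));
	return 0;
}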
[PATCH 5/5] KVM: x86: MMU: Eliminate an extra memory slot search in mapping_level()
Calling kvm_vcpu_gfn_to_memslot() twice in mapping_level() should be
avoided since getting a slot by binary search may not be negligible,
especially for virtual machines with many memory slots.

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 17 +++++++++++------
 1 file changed, 11 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 09833b0..dd2a7c6 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -818,14 +818,11 @@ static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp)
 	kvm->arch.indirect_shadow_pages--;
 }
 
-static int has_wrprotected_page(struct kvm_vcpu *vcpu,
-				gfn_t gfn,
-				int level)
+static int __has_wrprotected_page(gfn_t gfn, int level,
+				  struct kvm_memory_slot *slot)
 {
-	struct kvm_memory_slot *slot;
 	struct kvm_lpage_info *linfo;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	if (slot) {
 		linfo = lpage_info_slot(gfn, slot, level);
 		return linfo->write_count;
@@ -834,6 +831,14 @@ static int has_wrprotected_page(struct kvm_vcpu *vcpu,
 	return 1;
 }
 
+static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level)
+{
+	struct kvm_memory_slot *slot;
+
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	return __has_wrprotected_page(gfn, level, slot);
+}
+
 static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
 {
 	unsigned long page_size;
@@ -896,7 +901,7 @@ static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 	max_level = min(kvm_x86_ops->get_lpage_level(), host_level);
 
 	for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level)
-		if (has_wrprotected_page(vcpu, large_gfn, level))
+		if (__has_wrprotected_page(large_gfn, level, slot))
			break;
 
 	return level - 1;
-- 
2.1.0
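The pattern, reduced to a sketch (hypothetical find_slot() standing in for
kvm_vcpu_gfn_to_memslot()'s binary search, placeholder predicate): resolve
the slot once and pass it down, rather than re-searching inside the loop.

struct slot { unsigned long base, npages; };

static struct slot slots[] = { { 0, 16 }, { 16, 16 }, { 32, 16 } };

/* Stand-in for the lookup we want to run only once. */
static struct slot *find_slot(unsigned long gfn)
{
	for (unsigned int i = 0; i < 3; i++)
		if (gfn >= slots[i].base && gfn < slots[i].base + slots[i].npages)
			return &slots[i];
	return (struct slot *)0;
}

/* Placeholder predicate, analogous to __has_wrprotected_page(). */
static int check_level(unsigned long gfn, int level, struct slot *slot)
{
	return slot ? (int)(gfn + level) % 2 : 1;
}

int main(void)
{
	unsigned long gfn = 20;
	struct slot *slot = find_slot(gfn);	/* one lookup, hoisted */

	for (int level = 2; level <= 4; ++level)
		if (check_level(gfn, level, slot))	/* slot reused, no re-search */
			break;
	return 0;
}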
[PATCH 4/5] KVM: x86: MMU: Remove mapping_level_dirty_bitmap()
Now that it has only one caller, and its name is not so helpful for
readers, remove mapping_level_dirty_bitmap().  Instead, the new
memslot_valid_for_gpte() function makes it possible to share the
common code between gfn_to_memslot_dirty_bitmap() and mapping_level().

Signed-off-by: Takuya Yoshikawa <yoshikawa_takuya...@lab.ntt.co.jp>
---
 arch/x86/kvm/mmu.c | 24 ++++++++++++++++--------
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 890cd69..09833b0 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -851,6 +851,17 @@ static int host_mapping_level(struct kvm *kvm, gfn_t gfn)
 	return ret;
 }
 
+static inline bool memslot_valid_for_gpte(struct kvm_memory_slot *slot,
+					  bool no_dirty_log)
+{
+	if (!slot || slot->flags & KVM_MEMSLOT_INVALID)
+		return false;
+	if (no_dirty_log && slot->dirty_bitmap)
+		return false;
+
+	return true;
+}
+
 static struct kvm_memory_slot *
 gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 			    bool no_dirty_log)
@@ -858,25 +869,22 @@ gfn_to_memslot_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t gfn,
 	struct kvm_memory_slot *slot;
 
 	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	if (!slot || slot->flags & KVM_MEMSLOT_INVALID ||
-	      (no_dirty_log && slot->dirty_bitmap))
+	if (!memslot_valid_for_gpte(slot, no_dirty_log))
 		slot = NULL;
 
 	return slot;
 }
 
-static bool mapping_level_dirty_bitmap(struct kvm_vcpu *vcpu, gfn_t large_gfn)
-{
-	return !gfn_to_memslot_dirty_bitmap(vcpu, large_gfn, true);
-}
-
 static int mapping_level(struct kvm_vcpu *vcpu, gfn_t large_gfn,
 			 bool *force_pt_level)
 {
 	int host_level, level, max_level;
+	struct kvm_memory_slot *slot;
+
+	slot = kvm_vcpu_gfn_to_memslot(vcpu, large_gfn);
 
 	if (likely(!*force_pt_level))
-		*force_pt_level = mapping_level_dirty_bitmap(vcpu, large_gfn);
+		*force_pt_level = !memslot_valid_for_gpte(slot, true);
 	if (unlikely(*force_pt_level))
 		return PT_PAGE_TABLE_LEVEL;
 
-- 
2.1.0
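A reduced sketch of the extraction (hypothetical names, simplified flags):
the compound condition becomes one named predicate, so both the
NULL-returning lookup and the level decision call the same function instead
of each re-spelling the rule.

#include <stdbool.h>
#include <stddef.h>

struct memslot { unsigned int flags; void *dirty_bitmap; };
#define SLOT_INVALID 0x1u

/* The shared rule, spelled out once. */
static bool slot_valid(struct memslot *slot, bool no_dirty_log)
{
	if (!slot || (slot->flags & SLOT_INVALID))
		return false;
	if (no_dirty_log && slot->dirty_bitmap)
		return false;

	return true;
}

/* Caller 1: turn an invalid slot into NULL. */
static struct memslot *lookup(struct memslot *slot, bool no_dirty_log)
{
	return slot_valid(slot, no_dirty_log) ? slot : NULL;
}

/* Caller 2: decide whether large pages must be forced off. */
static bool force_small_pages(struct memslot *slot)
{
	return !slot_valid(slot, true);
}

int main(void)
{
	struct memslot s = { 0, NULL };
	return (lookup(&s, true) && !force_small_pages(&s)) ? 0 : 1;
}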