Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Il 20/08/2014 03:03, David Matlack ha scritto: > On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong > wrote: >> On 08/19/2014 05:03 PM, Paolo Bonzini wrote: >>> Il 19/08/2014 10:50, Xiao Guangrong ha scritto: Okay, what confused me it that it seems that the single line patch is ok to you. :) >>> >>> No, it was late and I was confused. :) >>> Now, do we really need to care the case 2? like David said: "Sorry I didn't explain myself very well: Since we can get a single wrong mmio exit no matter what, it has to be handled in userspace. So my point was, it doesn't really help to fix that one very specific way that it can happen, because it can just happen in other ways. (E.g. update memslots occurs after is_noslot_pfn() and before mmio exit)." What's your idea? > I think if you always treat the low bit as zero in mmio sptes, you can > do that without losing a bit of the generation. What's you did is avoiding cache a invalid generation number into spte, but actually if we can figure it out when we check mmio access, it's ok. Like the updated patch i posted should fix it, that way avoids doubly increase the number. >>> >>> Yes. >>> Okay, if you're interested increasing the number doubly, there is the simpler one: >>> >>> This wastes a bit in the mmio spte though. My idea is to increase the >>> memslots generation twice, but drop the low bit in the mmio spte. >> >> Yeah, really smart idea. :) >> >> Paolo/David, would you mind making a patch for this (+ the comments in >> David's >> patch)? > > Paolo, since it was your idea would you like to write it? I don't mind either > way. Sure, I'll post the patch for review. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong wrote: > On 08/19/2014 05:03 PM, Paolo Bonzini wrote: >> Il 19/08/2014 10:50, Xiao Guangrong ha scritto: >>> Okay, what confused me it that it seems that the single line patch >>> is ok to you. :) >> >> No, it was late and I was confused. :) >> >>> Now, do we really need to care the case 2? like David said: >>> "Sorry I didn't explain myself very well: Since we can get a single wrong >>> mmio exit no matter what, it has to be handled in userspace. So my point >>> was, it doesn't really help to fix that one very specific way that it can >>> happen, because it can just happen in other ways. (E.g. update memslots >>> occurs after is_noslot_pfn() and before mmio exit)." >>> >>> What's your idea? >>> I think if you always treat the low bit as zero in mmio sptes, you can do that without losing a bit of the generation. >>> >>> What's you did is avoiding cache a invalid generation number into spte, but >>> actually if we can figure it out when we check mmio access, it's ok. Like >>> the >>> updated patch i posted should fix it, that way avoids doubly increase the >>> number. >> >> Yes. >> >>> Okay, if you're interested increasing the number doubly, there is the >>> simpler >>> one: >> >> This wastes a bit in the mmio spte though. My idea is to increase the >> memslots generation twice, but drop the low bit in the mmio spte. > > Yeah, really smart idea. :) > > Paolo/David, would you mind making a patch for this (+ the comments in David's > patch)? Paolo, since it was your idea would you like to write it? I don't mind either way. > > Please feel free to add my: > Reviewed-by: Xiao Guangrong -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/19/2014 05:03 PM, Paolo Bonzini wrote: > Il 19/08/2014 10:50, Xiao Guangrong ha scritto: >> Okay, what confused me it that it seems that the single line patch >> is ok to you. :) > > No, it was late and I was confused. :) > >> Now, do we really need to care the case 2? like David said: >> "Sorry I didn't explain myself very well: Since we can get a single wrong >> mmio exit no matter what, it has to be handled in userspace. So my point >> was, it doesn't really help to fix that one very specific way that it can >> happen, because it can just happen in other ways. (E.g. update memslots >> occurs after is_noslot_pfn() and before mmio exit)." >> >> What's your idea? >> >>> I think if you always treat the low bit as zero in mmio sptes, you can >>> do that without losing a bit of the generation. >> >> What's you did is avoiding cache a invalid generation number into spte, but >> actually if we can figure it out when we check mmio access, it's ok. Like the >> updated patch i posted should fix it, that way avoids doubly increase the >> number. > > Yes. > >> Okay, if you're interested increasing the number doubly, there is the simpler >> one: > > This wastes a bit in the mmio spte though. My idea is to increase the > memslots generation twice, but drop the low bit in the mmio spte. Yeah, really smart idea. :) Paolo/David, would you mind making a patch for this (+ the comments in David's patch)? Please feel free to add my: Reviewed-by: Xiao Guangrong -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Il 19/08/2014 10:50, Xiao Guangrong ha scritto: > Okay, what confused me it that it seems that the single line patch > is ok to you. :) No, it was late and I was confused. :) > Now, do we really need to care the case 2? like David said: > "Sorry I didn't explain myself very well: Since we can get a single wrong > mmio exit no matter what, it has to be handled in userspace. So my point > was, it doesn't really help to fix that one very specific way that it can > happen, because it can just happen in other ways. (E.g. update memslots > occurs after is_noslot_pfn() and before mmio exit)." > > What's your idea? > > > I think if you always treat the low bit as zero in mmio sptes, you can > > do that without losing a bit of the generation. > > What's you did is avoiding cache a invalid generation number into spte, but > actually if we can figure it out when we check mmio access, it's ok. Like the > updated patch i posted should fix it, that way avoids doubly increase the > number. Yes. > Okay, if you're interested increasing the number doubly, there is the simpler > one: This wastes a bit in the mmio spte though. My idea is to increase the memslots generation twice, but drop the low bit in the mmio spte. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/19/2014 04:28 PM, Paolo Bonzini wrote: > Il 19/08/2014 05:50, Xiao Guangrong ha scritto: >> >> Note in the step *, my approach detects the invalid generation-number which >> will invalidate the mmio spte properly . > > You are right, in fact my mail included another part: "Another > alternative could be to use the low bit to mark an in-progress change, > *and skip the caching if the low bit is set*." Okay, what confused me it that it seems that the single line patch is ok to you. :) Now, do we really need to care the case 2? like David said: "Sorry I didn't explain myself very well: Since we can get a single wrong mmio exit no matter what, it has to be handled in userspace. So my point was, it doesn't really help to fix that one very specific way that it can happen, because it can just happen in other ways. (E.g. update memslots occurs after is_noslot_pfn() and before mmio exit)." What's your idea? > > I think if you always treat the low bit as zero in mmio sptes, you can > do that without losing a bit of the generation. What's you did is avoiding cache a invalid generation number into spte, but actually if we can figure it out when we check mmio access, it's ok. Like the updated patch i posted should fix it, that way avoids doubly increase the number. Okay, if you're interested increasing the number doubly, there is the simpler one: diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9314678..bf49170 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -236,6 +236,9 @@ static unsigned int get_mmio_spte_generation(u64 spte) static unsigned int kvm_current_mmio_generation(struct kvm *kvm) { + /* The initialized generation number should be even. */ + BUILD_BUG_ON((MMIO_MAX_GEN - 150) & 0x1); + /* * Init kvm generation close to MMIO_MAX_GEN to easily test the * code of handling generation number wrap-around. @@ -292,6 +295,14 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte) kvm_gen = kvm_current_mmio_generation(kvm); spte_gen = get_mmio_spte_generation(spte); + /* +* Aha, we cached a being-updated generation number or +* generation number is currnetly being updated, let do the +* real check for mmio access. +*/ + if ((kvm_gen | spte_gen) & 0x1) + return false; + trace_check_mmio_spte(spte, kvm_gen, spte_gen); return likely(kvm_gen == spte_gen); } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb..5c3f7b7 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -725,7 +725,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, update_memslots(slots, new, kvm->memslots->generation); rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); - + kvm->memslots->generation++; kvm_arch_memslots_updated(kvm); return old_memslots; -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Il 19/08/2014 05:50, Xiao Guangrong ha scritto: > > Note in the step *, my approach detects the invalid generation-number which > will invalidate the mmio spte properly . You are right, in fact my mail included another part: "Another alternative could be to use the low bit to mark an in-progress change, *and skip the caching if the low bit is set*." I think if you always treat the low bit as zero in mmio sptes, you can do that without losing a bit of the generation. Something like this (untested/uncompiled): diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 931467881da7..3a56d377c6d7 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -199,16 +199,20 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask) EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask); /* - * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number, - * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation - * number. + * the low bit of the generation number is always presumed to be zero. + * This disables mmio caching during memslot updates. The concept is + * similar to a seqcount but instead of retrying the access we just punt + * and ignore the cache. + * + * spte bits 3-11 are used as bits 1-9 of the generation number, + * the bits 52-61 are used as bits 10-19 of the generation number. */ -#define MMIO_SPTE_GEN_LOW_SHIFT3 +#define MMIO_SPTE_GEN_LOW_SHIFT2 #define MMIO_SPTE_GEN_HIGH_SHIFT 52 -#define MMIO_GEN_SHIFT 19 -#define MMIO_GEN_LOW_SHIFT 9 -#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 1) +#define MMIO_GEN_SHIFT 20 +#define MMIO_GEN_LOW_SHIFT 10 +#define MMIO_GEN_LOW_MASK ((1 << MMIO_GEN_LOW_SHIFT) - 2) #define MMIO_GEN_MASK ((1 << MMIO_GEN_SHIFT) - 1) #define MMIO_MAX_GEN ((1 << MMIO_GEN_SHIFT) - 1) @@ -236,12 +240,7 @@ static unsigned int get_mmio_spte_generation(u64 spte) static unsigned int kvm_current_mmio_generation(struct kvm *kvm) { - /* -* Init kvm generation close to MMIO_MAX_GEN to easily test the -* code of handling generation number wrap-around. -*/ - return (kvm_memslots(kvm)->generation + - MMIO_MAX_GEN - 150) & MMIO_GEN_MASK; + return kvm_memslots(kvm)->generation & MMIO_GEN_MASK; } static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn, diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a69a623938b8..c7e2800313b8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -474,6 +476,13 @@ static struct kvm *kvm_create_vm(unsigned long type) kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); if (!kvm->memslots) goto out_err_no_srcu; + + /* +* Init kvm generation close to MMIO_MAX_GEN to easily test the +* code of handling generation number wrap-around. +*/ + kvm->memslots->generation = -150; + kvm_init_memslots_id(kvm); if (init_srcu_struct(&kvm->srcu)) goto out_err_no_srcu; @@ -725,6 +732,8 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, synchronize_srcu_expedited(&kvm->srcu); kvm_arch_memslots_updated(kvm); + slots->generation++; + WARN_ON(slots->generation & 1); return old_memslots; } (modulo the changes to always set the generation in install_new_memslots, which I'm eliding for clarity). Moving the initialization to kvm_create_vm ensures that the low bit is untouched between install_new_memslots and kvm_current_mmio_generation. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/19/2014 01:40 PM, David Matlack wrote: > On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong > wrote: >> On 08/19/2014 01:00 PM, David Matlack wrote: >>> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong >>> wrote: On 08/19/2014 12:31 PM, David Matlack wrote: > The single line patch I suggested was only intended to fix the "forever > incorrectly exit mmio". My patch also fixes this case and that does not doubly increase the number. I think this is the better one. >>> >>> I prefer doubly increasing the generation for this reason: the updated >>> boolean >>> requires extra code on the "client-side" to check if there's an update in >>> progress. And that makes it easy to get wrong. In fact, your patch >>> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the >>> generation requires no "client-side" code to work. >> >> No, the updated patch is used to fix case 2 which i draw the scenario in >> the last mail. I mean the original patch in this patchset which just >> increase the number after srcu-sync. >> >> Then could you tell me that your approach can do but my original patch can >> not? > > It avoids publishing new memslots with an old generation number attached to > them (even if it only lasts for a short period of time). I can not see the problem if that happen, could you please draw the scenario? > Do you have a reason > why you don't want to doubly increase the generation? That more easily causes the number wrap-around. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong wrote: > On 08/19/2014 01:00 PM, David Matlack wrote: >> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong >> wrote: >>> On 08/19/2014 12:31 PM, David Matlack wrote: The single line patch I suggested was only intended to fix the "forever incorrectly exit mmio". >>> >>> My patch also fixes this case and that does not doubly increase the >>> number. I think this is the better one. >> >> I prefer doubly increasing the generation for this reason: the updated >> boolean >> requires extra code on the "client-side" to check if there's an update in >> progress. And that makes it easy to get wrong. In fact, your patch >> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the >> generation requires no "client-side" code to work. > > No, the updated patch is used to fix case 2 which i draw the scenario in > the last mail. I mean the original patch in this patchset which just > increase the number after srcu-sync. > > Then could you tell me that your approach can do but my original patch can > not? It avoids publishing new memslots with an old generation number attached to them (even if it only lasts for a short period of time). Do you have a reason why you don't want to doubly increase the generation? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/19/2014 01:00 PM, David Matlack wrote: > On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong > wrote: >> On 08/19/2014 12:31 PM, David Matlack wrote: >>> But it looks like you basically said the same thing earlier, so I think >>> we're on the same page. >>> >> >> Yes, that is what i try to explain in previous mails. :( > > I'm glad we understand each other now! Sorry again for my confusion. Yup, me too. :) > >>> The single line patch I suggested was only intended to fix the "forever >>> incorrectly exit mmio". >> >> My patch also fixes this case and that does not doubly increase the >> number. I think this is the better one. > > I prefer doubly increasing the generation for this reason: the updated boolean > requires extra code on the "client-side" to check if there's an update in > progress. And that makes it easy to get wrong. In fact, your patch > forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the > generation requires no "client-side" code to work. No, the updated patch is used to fix case 2 which i draw the scenario in the last mail. I mean the original patch in this patchset which just increase the number after srcu-sync. Then could you tell me that your approach can do but my original patch can not? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong wrote: > On 08/19/2014 12:31 PM, David Matlack wrote: >> But it looks like you basically said the same thing earlier, so I think >> we're on the same page. >> > > Yes, that is what i try to explain in previous mails. :( I'm glad we understand each other now! Sorry again for my confusion. >> The single line patch I suggested was only intended to fix the "forever >> incorrectly exit mmio". > > My patch also fixes this case and that does not doubly increase the > number. I think this is the better one. I prefer doubly increasing the generation for this reason: the updated boolean requires extra code on the "client-side" to check if there's an update in progress. And that makes it easy to get wrong. In fact, your patch forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the generation requires no "client-side" code to work. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/19/2014 12:31 PM, David Matlack wrote: > On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong > wrote: >> On 08/19/2014 05:15 AM, David Matlack wrote: >>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong >>> wrote: @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn, static bool check_mmio_spte(struct kvm *kvm, u64 spte) { + struct kvm_memslots *slots = kvm_memslots(kvm); unsigned int kvm_gen, spte_gen; - kvm_gen = kvm_current_mmio_generation(kvm); + if (slots->updated) + return false; + + smp_rmb(); + + kvm_gen = __kvm_current_mmio_generation(slots); spte_gen = get_mmio_spte_generation(spte); >>> >>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless >>> we >>> block during memslot updates, which I don't think we should :). >> >> This exactly fixes case 2, slots->updated just acts as the "low bit" >> but avoid generation number wrap-around and trick handling of the number. >> More details please see below. >> >>> trace_check_mmio_spte(spte, kvm_gen, spte_gen); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4b6c01b..1d4e78f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -96,7 +96,7 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, u64 last_generation); + struct kvm_memory_slot *new); static void kvm_release_pfn_dirty(pfn_t pfn); static void mark_page_dirty_in_slot(struct kvm *kvm, @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots) } static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, - u64 last_generation) + struct kvm_memory_slot *new) { if (new) { int id = new->id; @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots, if (new->npages != npages) sort_memslots(slots); } - - slots->generation = last_generation + 1; } static int check_memory_region_flags(struct kvm_userspace_memory_region *mem) @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, { struct kvm_memslots *old_memslots = kvm->memslots; - update_memslots(slots, new, kvm->memslots->generation); + /* ensure generation number is always increased. */ + slots->updated = true; + slots->generation = old_memslots->generation; + update_memslots(slots, new); rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); + slots->generation++; + smp_wmb(); + slots->updated = false; + kvm_arch_memslots_updated(kvm); return old_memslots; >>> >>> This is effectively the same as the first approach. >>> >>> I just realized how simple Paolo's idea is. I think it can be a one line >>> patch (without comments): >>> >>> [...] >>> update_memslots(slots, new, kvm->memslots->generation); >>> rcu_assign_pointer(kvm->memslots, slots); >>> synchronize_srcu_expedited(&kvm->srcu); >>> + slots->generation++; >>> >>> kvm_arch_memslots_updated(kvm); >>> [...] >> >> Really? Unfortunately no. :) >> >> See this scenario: >> >> CPU 0 CPU 1 >> ioctl registering a new memslot which >> contains GPA: >>page-fault handler: >> see it'a mmio access on GPA; >> >> assign the new memslots with generation number increased >> cache the generation-number into spte; >> fix the access and comeback to guest; >> SRCU-sync >> page-fault again and check the spte is a valid >> mmio-spte(*) >> generation-number++; >> return to userspace; >> do mmio-emulation and inject mmio-exit; >> >> !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly >> said in the last mail. >> >> >> Note in the step *, my approach detects the invalid generation-number which >> will invalidate the mmio spte properly . > > Sorry I didn't explain myself very well: Since we can get a single wrong > mmio exit no matter what, it has to be handled in userspace. So my point > was, it doesn't really help to fix that one very specific way that it can > happen, because it can just happen in oth
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong wrote: > On 08/19/2014 05:15 AM, David Matlack wrote: >> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong >> wrote: >>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, >>> gfn_t gfn, >>> >>> static bool check_mmio_spte(struct kvm *kvm, u64 spte) >>> { >>> + struct kvm_memslots *slots = kvm_memslots(kvm); >>> unsigned int kvm_gen, spte_gen; >>> >>> - kvm_gen = kvm_current_mmio_generation(kvm); >>> + if (slots->updated) >>> + return false; >>> + >>> + smp_rmb(); >>> + >>> + kvm_gen = __kvm_current_mmio_generation(slots); >>> spte_gen = get_mmio_spte_generation(spte); >>> >> >> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we >> block during memslot updates, which I don't think we should :). > > This exactly fixes case 2, slots->updated just acts as the "low bit" > but avoid generation number wrap-around and trick handling of the number. > More details please see below. > >> >>> trace_check_mmio_spte(spte, kvm_gen, spte_gen); >>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >>> index 4b6c01b..1d4e78f 100644 >>> --- a/virt/kvm/kvm_main.c >>> +++ b/virt/kvm/kvm_main.c >>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void); >>> >>> static void kvm_io_bus_destroy(struct kvm_io_bus *bus); >>> static void update_memslots(struct kvm_memslots *slots, >>> - struct kvm_memory_slot *new, u64 >>> last_generation); >>> + struct kvm_memory_slot *new); >>> >>> static void kvm_release_pfn_dirty(pfn_t pfn); >>> static void mark_page_dirty_in_slot(struct kvm *kvm, >>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots) >>> } >>> >>> static void update_memslots(struct kvm_memslots *slots, >>> - struct kvm_memory_slot *new, >>> - u64 last_generation) >>> + struct kvm_memory_slot *new) >>> { >>> if (new) { >>> int id = new->id; >>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots, >>> if (new->npages != npages) >>> sort_memslots(slots); >>> } >>> - >>> - slots->generation = last_generation + 1; >>> } >>> >>> static int check_memory_region_flags(struct kvm_userspace_memory_region >>> *mem) >>> @@ -720,10 +717,17 @@ static struct kvm_memslots >>> *install_new_memslots(struct kvm *kvm, >>> { >>> struct kvm_memslots *old_memslots = kvm->memslots; >>> >>> - update_memslots(slots, new, kvm->memslots->generation); >>> + /* ensure generation number is always increased. */ >>> + slots->updated = true; >>> + slots->generation = old_memslots->generation; >>> + update_memslots(slots, new); >>> rcu_assign_pointer(kvm->memslots, slots); >>> synchronize_srcu_expedited(&kvm->srcu); >>> >>> + slots->generation++; >>> + smp_wmb(); >>> + slots->updated = false; >>> + >>> kvm_arch_memslots_updated(kvm); >>> >>> return old_memslots; >>> >> >> This is effectively the same as the first approach. >> >> I just realized how simple Paolo's idea is. I think it can be a one line >> patch (without comments): >> >> [...] >> update_memslots(slots, new, kvm->memslots->generation); >> rcu_assign_pointer(kvm->memslots, slots); >> synchronize_srcu_expedited(&kvm->srcu); >> + slots->generation++; >> >> kvm_arch_memslots_updated(kvm); >> [...] > > Really? Unfortunately no. :) > > See this scenario: > > CPU 0 CPU 1 > ioctl registering a new memslot which > contains GPA: >page-fault handler: > see it'a mmio access on GPA; > > assign the new memslots with generation number increased > cache the generation-number into spte; > fix the access and comeback to guest; > SRCU-sync > page-fault again and check the spte is a valid > mmio-spte(*) > generation-number++; > return to userspace; > do mmio-emulation and inject mmio-exit; > > !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly > said in the last mail. > > > Note in the step *, my approach detects the invalid generation-number which > will invalidate the mmio spte properly . Sorry I didn't explain myself very well: Since we can get a single wrong mmio exit no matter what, it has to be handled in userspace. So my point was, it doesn't really help to fix that one very specific way that it can happen, because it can just happen in other ways. (E.g. update memslots occurs after is_noslot_pfn() and before mmio exit). But it looks like you basically said the same thing earlier, so I think we're on the same page. The singl
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/19/2014 05:15 AM, David Matlack wrote: > On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong > wrote: >> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, >> gfn_t gfn, >> >> static bool check_mmio_spte(struct kvm *kvm, u64 spte) >> { >> + struct kvm_memslots *slots = kvm_memslots(kvm); >> unsigned int kvm_gen, spte_gen; >> >> - kvm_gen = kvm_current_mmio_generation(kvm); >> + if (slots->updated) >> + return false; >> + >> + smp_rmb(); >> + >> + kvm_gen = __kvm_current_mmio_generation(slots); >> spte_gen = get_mmio_spte_generation(spte); >> > > What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we > block during memslot updates, which I don't think we should :). This exactly fixes case 2, slots->updated just acts as the "low bit" but avoid generation number wrap-around and trick handling of the number. More details please see below. > >> trace_check_mmio_spte(spte, kvm_gen, spte_gen); >> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c >> index 4b6c01b..1d4e78f 100644 >> --- a/virt/kvm/kvm_main.c >> +++ b/virt/kvm/kvm_main.c >> @@ -96,7 +96,7 @@ static void hardware_disable_all(void); >> >> static void kvm_io_bus_destroy(struct kvm_io_bus *bus); >> static void update_memslots(struct kvm_memslots *slots, >> - struct kvm_memory_slot *new, u64 >> last_generation); >> + struct kvm_memory_slot *new); >> >> static void kvm_release_pfn_dirty(pfn_t pfn); >> static void mark_page_dirty_in_slot(struct kvm *kvm, >> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots) >> } >> >> static void update_memslots(struct kvm_memslots *slots, >> - struct kvm_memory_slot *new, >> - u64 last_generation) >> + struct kvm_memory_slot *new) >> { >> if (new) { >> int id = new->id; >> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots, >> if (new->npages != npages) >> sort_memslots(slots); >> } >> - >> - slots->generation = last_generation + 1; >> } >> >> static int check_memory_region_flags(struct kvm_userspace_memory_region >> *mem) >> @@ -720,10 +717,17 @@ static struct kvm_memslots >> *install_new_memslots(struct kvm *kvm, >> { >> struct kvm_memslots *old_memslots = kvm->memslots; >> >> - update_memslots(slots, new, kvm->memslots->generation); >> + /* ensure generation number is always increased. */ >> + slots->updated = true; >> + slots->generation = old_memslots->generation; >> + update_memslots(slots, new); >> rcu_assign_pointer(kvm->memslots, slots); >> synchronize_srcu_expedited(&kvm->srcu); >> >> + slots->generation++; >> + smp_wmb(); >> + slots->updated = false; >> + >> kvm_arch_memslots_updated(kvm); >> >> return old_memslots; >> > > This is effectively the same as the first approach. > > I just realized how simple Paolo's idea is. I think it can be a one line > patch (without comments): > > [...] > update_memslots(slots, new, kvm->memslots->generation); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > + slots->generation++; > > kvm_arch_memslots_updated(kvm); > [...] Really? Unfortunately no. :) See this scenario: CPU 0 CPU 1 ioctl registering a new memslot which contains GPA: page-fault handler: see it'a mmio access on GPA; assign the new memslots with generation number increased cache the generation-number into spte; fix the access and comeback to guest; SRCU-sync page-fault again and check the spte is a valid mmio-spte(*) generation-number++; return to userspace; do mmio-emulation and inject mmio-exit; !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly said in the last mail. Note in the step *, my approach detects the invalid generation-number which will invalidate the mmio spte properly . -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 18, 2014 at 2:24 PM, Paolo Bonzini wrote: > Il 18/08/2014 23:15, David Matlack ha scritto: >> I just realized how simple Paolo's idea is. I think it can be a one line >> patch (without comments): >> >> [...] >> update_memslots(slots, new, kvm->memslots->generation); >> rcu_assign_pointer(kvm->memslots, slots); >> synchronize_srcu_expedited(&kvm->srcu); >> + slots->generation++; >> >> kvm_arch_memslots_updated(kvm); >> [...] > > Yeah, you're right. I think at this point it makes sense to put all > generation handling in install_new_memslots, but with proper comments > the above can do as well. > > Would you like to send it? Patch 2 still applies on top. Sure, I like doing everything in install_new_memslots() as well so I'll fix that. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Il 18/08/2014 23:15, David Matlack ha scritto: > I just realized how simple Paolo's idea is. I think it can be a one line > patch (without comments): > > [...] > update_memslots(slots, new, kvm->memslots->generation); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > + slots->generation++; > > kvm_arch_memslots_updated(kvm); > [...] Yeah, you're right. I think at this point it makes sense to put all generation handling in install_new_memslots, but with proper comments the above can do as well. Would you like to send it? Patch 2 still applies on top. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong wrote: > @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, > gfn_t gfn, > > static bool check_mmio_spte(struct kvm *kvm, u64 spte) > { > + struct kvm_memslots *slots = kvm_memslots(kvm); > unsigned int kvm_gen, spte_gen; > > - kvm_gen = kvm_current_mmio_generation(kvm); > + if (slots->updated) > + return false; > + > + smp_rmb(); > + > + kvm_gen = __kvm_current_mmio_generation(slots); > spte_gen = get_mmio_spte_generation(spte); > What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we block during memslot updates, which I don't think we should :). > trace_check_mmio_spte(spte, kvm_gen, spte_gen); > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 4b6c01b..1d4e78f 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -96,7 +96,7 @@ static void hardware_disable_all(void); > > static void kvm_io_bus_destroy(struct kvm_io_bus *bus); > static void update_memslots(struct kvm_memslots *slots, > - struct kvm_memory_slot *new, u64 last_generation); > + struct kvm_memory_slot *new); > > static void kvm_release_pfn_dirty(pfn_t pfn); > static void mark_page_dirty_in_slot(struct kvm *kvm, > @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots) > } > > static void update_memslots(struct kvm_memslots *slots, > - struct kvm_memory_slot *new, > - u64 last_generation) > + struct kvm_memory_slot *new) > { > if (new) { > int id = new->id; > @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots, > if (new->npages != npages) > sort_memslots(slots); > } > - > - slots->generation = last_generation + 1; > } > > static int check_memory_region_flags(struct kvm_userspace_memory_region *mem) > @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct > kvm *kvm, > { > struct kvm_memslots *old_memslots = kvm->memslots; > > - update_memslots(slots, new, kvm->memslots->generation); > + /* ensure generation number is always increased. */ > + slots->updated = true; > + slots->generation = old_memslots->generation; > + update_memslots(slots, new); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > > + slots->generation++; > + smp_wmb(); > + slots->updated = false; > + > kvm_arch_memslots_updated(kvm); > > return old_memslots; > This is effectively the same as the first approach. I just realized how simple Paolo's idea is. I think it can be a one line patch (without comments): [...] update_memslots(slots, new, kvm->memslots->generation); rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); + slots->generation++; kvm_arch_memslots_updated(kvm); [...] -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Aug 19, 2014, at 2:47 AM, Paolo Bonzini wrote: > >> I think this patch is auditable, page-fault is always called by holding >> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited. >> Only these cases can happen: >> >> 1) page fault occurs before synchronize_srcu_expedited. >>In this case, vcpu will generate mmio-exit for the memslot being >> registered >>by the ioctl. That’s ok since the ioctl have not finished. >> >> 2) page fault occurs after synchronize_srcu_expedited and during >> increasing generation-number. >> In this case, userspace may get wrong mmio-exit (that happen if handing >> page-fault is slower that the ioctl), that’s ok too since userspace need do >> the check anyway like i said above. >> >> 3) page fault occurs after generation-number update >> that’s definitely correct. :) >> >>> Another alternative could be to use the low bit to mark an in-progress >>> change, and skip the caching if the low bit is set. Similar to a >>> seqcount (except if read_seqcount_retry fails, we just punt and not >>> retry anything), you could use it even though the memory barriers >>> provided by write_seqcount_begin/end are not too useful in this case. >> >> I do not know how the bit works, page fault will cache the memslot before >> the bit set and cache the generation-number after the bit set. >> >> Maybe i missed your idea, could you please detail it? > > Something like this: > > - update_memslots(slots, new, kvm->memslots->generation); > + /* ensure generation number is always increased. */ > + slots->generation = old_memslots->generation + 1; > + update_memslots(slots, new); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > + slots->generation++; > > Then case 1 and 2 will just have a cache miss. So, case 2 is what you concerned? :) I still think it is ok but i do not have strong opinion on that. How about simplify it like this: diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 9314678..9fabf6a 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -234,16 +234,22 @@ static unsigned int get_mmio_spte_generation(u64 spte) return gen; } -static unsigned int kvm_current_mmio_generation(struct kvm *kvm) +static unsigned int __kvm_current_mmio_generation(struct kvm_memslots *slots) { + /* * Init kvm generation close to MMIO_MAX_GEN to easily test the * code of handling generation number wrap-around. */ - return (kvm_memslots(kvm)->generation + + return (slots->generation + MMIO_MAX_GEN - 150) & MMIO_GEN_MASK; } +static unsigned int kvm_current_mmio_generation(struct kvm *kvm) +{ + return __kvm_current_mmio_generation(kvm_memslots(kvm)); +} + static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn, unsigned access) { @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, gfn_t gfn, static bool check_mmio_spte(struct kvm *kvm, u64 spte) { + struct kvm_memslots *slots = kvm_memslots(kvm); unsigned int kvm_gen, spte_gen; - kvm_gen = kvm_current_mmio_generation(kvm); + if (slots->updated) + return false; + + smp_rmb(); + + kvm_gen = __kvm_current_mmio_generation(slots); spte_gen = get_mmio_spte_generation(spte); trace_check_mmio_spte(spte, kvm_gen, spte_gen); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 4b6c01b..1d4e78f 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -96,7 +96,7 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, u64 last_generation); + struct kvm_memory_slot *new); static void kvm_release_pfn_dirty(pfn_t pfn); static void mark_page_dirty_in_slot(struct kvm *kvm, @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots) } static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, - u64 last_generation) + struct kvm_memory_slot *new) { if (new) { int id = new->id; @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots, if (new->npages != npages) sort_memslots(slots); } - - slots->generation = last_generation + 1; } static int check_memory_region_flags(struct kvm_userspace_memory_region *mem) @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, { struct kvm_memslots *old_memslots = kvm->memslots; - update_memslots(slots, new, kvm->memslots->generation); + /* ensure generation number is always increased. */ + slots->updated
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Il 18/08/2014 18:35, Xiao Guangrong ha scritto: > > Hi Paolo, > > Thank you to review the patch! > > On Aug 18, 2014, at 9:57 PM, Paolo Bonzini wrote: > >> Il 14/08/2014 09:01, Xiao Guangrong ha scritto: >>> - update_memslots(slots, new, kvm->memslots->generation); >>> + /* ensure generation number is always increased. */ >>> + slots->generation = old_memslots->generation; >>> + update_memslots(slots, new); >>> rcu_assign_pointer(kvm->memslots, slots); >>> synchronize_srcu_expedited(&kvm->srcu); >>> + slots->generation++; >> >> I don't trust my brain enough to review this patch. > > Sorry to make you confused. I should expain it more clearly. Don't worry, it's not your fault. :) >> kvm_current_mmio_generation seems like a very bad (race-prone) API. One >> patch I trust myself reviewing would change a bunch of functions in >> kvm_main.c to take a memslots struct. This would make it easy to >> respect the hard and fast rule of not dereferencing the same pointer >> twice. But it would be a tedious change. > > kvm_set_memory_region is the only place updating memslot and > kvm_current_mmio_generation accesses memslot by rcu-dereference, > i do not know why other places need to take into account. The race occurs because gfn_to_pfn_many_atomic or some other function has already used kvm_memslots(). Calling kvm_memslots() twice is the root cause the bug. > I think this patch is auditable, page-fault is always called by holding > srcu-lock so that a page fault can’t go across synchronize_srcu_expedited. > Only these cases can happen: > > 1) page fault occurs before synchronize_srcu_expedited. > In this case, vcpu will generate mmio-exit for the memslot being > registered > by the ioctl. That’s ok since the ioctl have not finished. > > 2) page fault occurs after synchronize_srcu_expedited and during >increasing generation-number. >In this case, userspace may get wrong mmio-exit (that happen if handing >page-fault is slower that the ioctl), that’s ok too since userspace need do > the check anyway like i said above. > > 3) page fault occurs after generation-number update >that’s definitely correct. :) > >> Another alternative could be to use the low bit to mark an in-progress >> change, and skip the caching if the low bit is set. Similar to a >> seqcount (except if read_seqcount_retry fails, we just punt and not >> retry anything), you could use it even though the memory barriers >> provided by write_seqcount_begin/end are not too useful in this case. > > I do not know how the bit works, page fault will cache the memslot before > the bit set and cache the generation-number after the bit set. > > Maybe i missed your idea, could you please detail it? Something like this: - update_memslots(slots, new, kvm->memslots->generation); + /* ensure generation number is always increased. */ + slots->generation = old_memslots->generation + 1; + update_memslots(slots, new); rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); + slots->generation++; Then case 1 and 2 will just have a cache miss. The "low bit" is really just because each slot update does 2 generation increases. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 18, 2014 at 9:35 AM, Xiao Guangrong wrote: > > Hi Paolo, > > Thank you to review the patch! > > On Aug 18, 2014, at 9:57 PM, Paolo Bonzini wrote: > >> Il 14/08/2014 09:01, Xiao Guangrong ha scritto: >>> -update_memslots(slots, new, kvm->memslots->generation); >>> +/* ensure generation number is always increased. */ >>> +slots->generation = old_memslots->generation; >>> +update_memslots(slots, new); >>> rcu_assign_pointer(kvm->memslots, slots); >>> synchronize_srcu_expedited(&kvm->srcu); >>> +slots->generation++; >> >> I don't trust my brain enough to review this patch. Xiao, I thought about your approach a lot and I can't think of a bad race that isn't already possible due to the fact that kvm allows memslot mutation to race with vm exits. That being said, it's hard to reason about all the other "clients" of memslots and what weirdness (or badness) will be caused by updating generation after srcu_synch. I like Paolo's two approaches because they fix the bug without any side-effects. > Sorry to make you confused. I should expain it more clearly. > > What this patch tried to fix is: kvm will generate wrong mmio-exit forever > if no luck event cleans mmio spte. (eg. if no memory pressure or no > context-sync on that spte.) > > Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region > and mmio-exit - that means userspace is able to get mmio-exit even if > kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies > a mmio access before issuing the ioctl and injects mmio-exit to userspace > after > ioctl return. So checking if mmio-exit is a real mmio access in userspace is > needed anyway. > >> kvm_current_mmio_generation seems like a very bad (race-prone) API. One >> patch I trust myself reviewing would change a bunch of functions in >> kvm_main.c to take a memslots struct. This would make it easy to >> respect the hard and fast rule of not dereferencing the same pointer >> twice. But it would be a tedious change. > > kvm_set_memory_region is the only place updating memslot and > kvm_current_mmio_generation accesses memslot by rcu-dereference, > i do not know why other places need to take into account. If you rcu_dereference() more than once, you can't trust previous decisions based on rcu_dereference()'s. If the mmio cache code only did one rcu_dereference() per vm exit, this bug would be gone. > I think this patch is auditable, page-fault is always called by holding > srcu-lock so that a page fault can’t go across synchronize_srcu_expedited. > Only these cases can happen: > > 1) page fault occurs before synchronize_srcu_expedited. > In this case, vcpu will generate mmio-exit for the memslot being > registered > by the ioctl. That’s ok since the ioctl have not finished. > > 2) page fault occurs after synchronize_srcu_expedited and during >increasing generation-number. >In this case, userspace may get wrong mmio-exit (that happen if handing >page-fault is slower that the ioctl), that’s ok too since userspace need do > the check anyway like i said above. > > 3) page fault occurs after generation-number update >that’s definitely correct. :) > >> Another alternative could be to use the low bit to mark an in-progress >> change, and skip the caching if the low bit is set. Similar to a >> seqcount (except if read_seqcount_retry fails, we just punt and not >> retry anything), you could use it even though the memory barriers >> provided by write_seqcount_begin/end are not too useful in this case. I like this approach best. It would have the least code changes and provide the same guarantees. > I do not know how the bit works, page fault will cache the memslot before > the bit set and cache the generation-number after the bit set. > > Maybe i missed your idea, could you please detail it? In vcpu_cache_mmio_info() if generation is odd, just don't do the caching because memslots were changed while we were running and we just assume the worst case. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Hi Paolo, Thank you to review the patch! On Aug 18, 2014, at 9:57 PM, Paolo Bonzini wrote: > Il 14/08/2014 09:01, Xiao Guangrong ha scritto: >> -update_memslots(slots, new, kvm->memslots->generation); >> +/* ensure generation number is always increased. */ >> +slots->generation = old_memslots->generation; >> +update_memslots(slots, new); >> rcu_assign_pointer(kvm->memslots, slots); >> synchronize_srcu_expedited(&kvm->srcu); >> +slots->generation++; > > I don't trust my brain enough to review this patch. Sorry to make you confused. I should expain it more clearly. What this patch tried to fix is: kvm will generate wrong mmio-exit forever if no luck event cleans mmio spte. (eg. if no memory pressure or no context-sync on that spte.) Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region and mmio-exit - that means userspace is able to get mmio-exit even if kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies a mmio access before issuing the ioctl and injects mmio-exit to userspace after ioctl return. So checking if mmio-exit is a real mmio access in userspace is needed anyway. > kvm_current_mmio_generation seems like a very bad (race-prone) API. One > patch I trust myself reviewing would change a bunch of functions in > kvm_main.c to take a memslots struct. This would make it easy to > respect the hard and fast rule of not dereferencing the same pointer > twice. But it would be a tedious change. kvm_set_memory_region is the only place updating memslot and kvm_current_mmio_generation accesses memslot by rcu-dereference, i do not know why other places need to take into account. I think this patch is auditable, page-fault is always called by holding srcu-lock so that a page fault can’t go across synchronize_srcu_expedited. Only these cases can happen: 1) page fault occurs before synchronize_srcu_expedited. In this case, vcpu will generate mmio-exit for the memslot being registered by the ioctl. That’s ok since the ioctl have not finished. 2) page fault occurs after synchronize_srcu_expedited and during increasing generation-number. In this case, userspace may get wrong mmio-exit (that happen if handing page-fault is slower that the ioctl), that’s ok too since userspace need do the check anyway like i said above. 3) page fault occurs after generation-number update that’s definitely correct. :) > Another alternative could be to use the low bit to mark an in-progress > change, and skip the caching if the low bit is set. Similar to a > seqcount (except if read_seqcount_retry fails, we just punt and not > retry anything), you could use it even though the memory barriers > provided by write_seqcount_begin/end are not too useful in this case. I do not know how the bit works, page fault will cache the memslot before the bit set and cache the generation-number after the bit set. Maybe i missed your idea, could you please detail it? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Il 14/08/2014 09:01, Xiao Guangrong ha scritto: > - update_memslots(slots, new, kvm->memslots->generation); > + /* ensure generation number is always increased. */ > + slots->generation = old_memslots->generation; > + update_memslots(slots, new); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > + slots->generation++; I don't trust my brain enough to review this patch. kvm_current_mmio_generation seems like a very bad (race-prone) API. One patch I trust myself reviewing would change a bunch of functions in kvm_main.c to take a memslots struct. This would make it easy to respect the hard and fast rule of not dereferencing the same pointer twice. But it would be a tedious change. Another alternative could be to use the low bit to mark an in-progress change, and skip the caching if the low bit is set. Similar to a seqcount (except if read_seqcount_retry fails, we just punt and not retry anything), you could use it even though the memory barriers provided by write_seqcount_begin/end are not too useful in this case. Paolo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
Sorry, the title is not clear enough. This is the v2 which fixes the issue pointed out by David: " the generation number actually decreases." Please review. On 08/14/2014 03:01 PM, Xiao Guangrong wrote: > We may cache the current mmio generation number and stale memslot info > into spte, like this scenario: > >CPU 0 CPU 1 > page fault:add a new memslot > read memslot and detecting its a mmio access >update memslots >update generation number > read generation number > cache the gpa and current gen number into spte > > So, if guest accesses the gpa later, it will generate a incorrect > mmio exit > > This patch fixes it by updating the generation number after > synchronize_srcu_expedited() that makes sure the generation > number updated only if memslots update is finished > > Cc: sta...@vger.kernel.org > Cc: David Matlack > Signed-off-by: Xiao Guangrong > --- > virt/kvm/kvm_main.c | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c > index 33712fb..bb40df3 100644 > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -96,7 +96,7 @@ static void hardware_disable_all(void); > > static void kvm_io_bus_destroy(struct kvm_io_bus *bus); > static void update_memslots(struct kvm_memslots *slots, > - struct kvm_memory_slot *new, u64 last_generation); > + struct kvm_memory_slot *new); > > static void kvm_release_pfn_dirty(pfn_t pfn); > static void mark_page_dirty_in_slot(struct kvm *kvm, > @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots) > } > > static void update_memslots(struct kvm_memslots *slots, > - struct kvm_memory_slot *new, > - u64 last_generation) > + struct kvm_memory_slot *new) > { > if (new) { > int id = new->id; > @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots, > if (new->npages != npages) > sort_memslots(slots); > } > - > - slots->generation = last_generation + 1; > } > > static int check_memory_region_flags(struct kvm_userspace_memory_region *mem) > @@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct > kvm *kvm, > { > struct kvm_memslots *old_memslots = kvm->memslots; > > - update_memslots(slots, new, kvm->memslots->generation); > + /* ensure generation number is always increased. */ > + slots->generation = old_memslots->generation; > + update_memslots(slots, new); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > + slots->generation++; > > kvm_arch_memslots_updated(kvm); > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
We may cache the current mmio generation number and stale memslot info into spte, like this scenario: CPU 0 CPU 1 page fault:add a new memslot read memslot and detecting its a mmio access update memslots update generation number read generation number cache the gpa and current gen number into spte So, if guest accesses the gpa later, it will generate a incorrect mmio exit This patch fixes it by updating the generation number after synchronize_srcu_expedited() that makes sure the generation number updated only if memslots update is finished Cc: sta...@vger.kernel.org Cc: David Matlack Signed-off-by: Xiao Guangrong --- virt/kvm/kvm_main.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb..bb40df3 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -96,7 +96,7 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, u64 last_generation); + struct kvm_memory_slot *new); static void kvm_release_pfn_dirty(pfn_t pfn); static void mark_page_dirty_in_slot(struct kvm *kvm, @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots) } static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, - u64 last_generation) + struct kvm_memory_slot *new) { if (new) { int id = new->id; @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots, if (new->npages != npages) sort_memslots(slots); } - - slots->generation = last_generation + 1; } static int check_memory_region_flags(struct kvm_userspace_memory_region *mem) @@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, { struct kvm_memslots *old_memslots = kvm->memslots; - update_memslots(slots, new, kvm->memslots->generation); + /* ensure generation number is always increased. */ + slots->generation = old_memslots->generation; + update_memslots(slots, new); rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); + slots->generation++; kvm_arch_memslots_updated(kvm); -- 1.8.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On 08/13/2014 05:18 AM, David Matlack wrote: > On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong > wrote: >> @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct >> kvm *kvm, >> { >> struct kvm_memslots *old_memslots = kvm->memslots; >> > > I think you want > > slots->generation = old_memslots->generation; > > here. > > On the KVM_MR_DELETE path, install_new_memslots is called twice so this > patch introduces a short window of time where the generation number > actually decreases. Yes, indeed. Thank you for pointing it out, will update. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong wrote: > @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct > kvm *kvm, > { > struct kvm_memslots *old_memslots = kvm->memslots; > I think you want slots->generation = old_memslots->generation; here. On the KVM_MR_DELETE path, install_new_memslots is called twice so this patch introduces a short window of time where the generation number actually decreases. > - update_memslots(slots, new, kvm->memslots->generation); > + update_memslots(slots, new); > rcu_assign_pointer(kvm->memslots, slots); > synchronize_srcu_expedited(&kvm->srcu); > + slots->generation++; > > kvm_arch_memslots_updated(kvm); > > -- > 1.8.3.1 > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number
We may cache the current mmio generation number and stale memslot info into spte, like this scenario: CPU 0 CPU 1 page fault:add a new memslot read memslot and detecting its a mmio access update memslots update generation number read generation number cache the gpa and current gen number into spte So, if guest accesses the gpa later, it will generate a incorrect mmio exit This patch fixes it by updating the generation number after synchronize_srcu_expedited() that makes sure the generation number updated only if memslots update is finished Cc: sta...@vger.kernel.org Cc: David Matlack Signed-off-by: Xiao Guangrong --- virt/kvm/kvm_main.c | 10 -- 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 33712fb..ca3cdac 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -96,7 +96,7 @@ static void hardware_disable_all(void); static void kvm_io_bus_destroy(struct kvm_io_bus *bus); static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, u64 last_generation); + struct kvm_memory_slot *new); static void kvm_release_pfn_dirty(pfn_t pfn); static void mark_page_dirty_in_slot(struct kvm *kvm, @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots) } static void update_memslots(struct kvm_memslots *slots, - struct kvm_memory_slot *new, - u64 last_generation) + struct kvm_memory_slot *new) { if (new) { int id = new->id; @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots, if (new->npages != npages) sort_memslots(slots); } - - slots->generation = last_generation + 1; } static int check_memory_region_flags(struct kvm_userspace_memory_region *mem) @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct kvm *kvm, { struct kvm_memslots *old_memslots = kvm->memslots; - update_memslots(slots, new, kvm->memslots->generation); + update_memslots(slots, new); rcu_assign_pointer(kvm->memslots, slots); synchronize_srcu_expedited(&kvm->srcu); + slots->generation++; kvm_arch_memslots_updated(kvm); -- 1.8.3.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html