Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-20 Thread Paolo Bonzini
Il 20/08/2014 03:03, David Matlack ha scritto:
> On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
>  wrote:
>> On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
>>> Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
 Okay, what confused me it that it seems that the single line patch
 is ok to you. :)
>>>
>>> No, it was late and I was confused. :)
>>>
 Now, do we really need to care the case 2? like David said:
 "Sorry I didn't explain myself very well: Since we can get a single wrong
 mmio exit no matter what, it has to be handled in userspace. So my point
 was, it doesn't really help to fix that one very specific way that it can
 happen, because it can just happen in other ways. (E.g. update memslots
 occurs after is_noslot_pfn() and before mmio exit)."

 What's your idea?

> I think if you always treat the low bit as zero in mmio sptes, you can
> do that without losing a bit of the generation.

 What's you did is avoiding cache a invalid generation number into spte, but
 actually if we can figure it out when we check mmio access, it's ok. Like 
 the
 updated patch i posted should fix it, that way avoids doubly increase the 
 number.
>>>
>>> Yes.
>>>
 Okay, if you're interested increasing the number doubly, there is the 
 simpler
 one:
>>>
>>> This wastes a bit in the mmio spte though.  My idea is to increase the
>>> memslots generation twice, but drop the low bit in the mmio spte.
>>
>> Yeah, really smart idea. :)
>>
>> Paolo/David, would you mind making a patch for this (+ the comments in 
>> David's
>> patch)?
> 
> Paolo, since it was your idea would you like to write it? I don't mind either
> way.

Sure, I'll post the patch for review.

Paolo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread David Matlack
On Tue, Aug 19, 2014 at 5:29 PM, Xiao Guangrong
 wrote:
> On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
>> Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
>>> Okay, what confused me it that it seems that the single line patch
>>> is ok to you. :)
>>
>> No, it was late and I was confused. :)
>>
>>> Now, do we really need to care the case 2? like David said:
>>> "Sorry I didn't explain myself very well: Since we can get a single wrong
>>> mmio exit no matter what, it has to be handled in userspace. So my point
>>> was, it doesn't really help to fix that one very specific way that it can
>>> happen, because it can just happen in other ways. (E.g. update memslots
>>> occurs after is_noslot_pfn() and before mmio exit)."
>>>
>>> What's your idea?
>>>
 I think if you always treat the low bit as zero in mmio sptes, you can
 do that without losing a bit of the generation.
>>>
>>> What's you did is avoiding cache a invalid generation number into spte, but
>>> actually if we can figure it out when we check mmio access, it's ok. Like 
>>> the
>>> updated patch i posted should fix it, that way avoids doubly increase the 
>>> number.
>>
>> Yes.
>>
>>> Okay, if you're interested increasing the number doubly, there is the 
>>> simpler
>>> one:
>>
>> This wastes a bit in the mmio spte though.  My idea is to increase the
>> memslots generation twice, but drop the low bit in the mmio spte.
>
> Yeah, really smart idea. :)
>
> Paolo/David, would you mind making a patch for this (+ the comments in David's
> patch)?

Paolo, since it was your idea would you like to write it? I don't mind either
way.

>
> Please feel free to add my:
> Reviewed-by: Xiao Guangrong 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Xiao Guangrong
On 08/19/2014 05:03 PM, Paolo Bonzini wrote:
> Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
>> Okay, what confused me it that it seems that the single line patch
>> is ok to you. :)
> 
> No, it was late and I was confused. :)
> 
>> Now, do we really need to care the case 2? like David said:
>> "Sorry I didn't explain myself very well: Since we can get a single wrong
>> mmio exit no matter what, it has to be handled in userspace. So my point
>> was, it doesn't really help to fix that one very specific way that it can
>> happen, because it can just happen in other ways. (E.g. update memslots
>> occurs after is_noslot_pfn() and before mmio exit)."
>>
>> What's your idea?
>>
>>> I think if you always treat the low bit as zero in mmio sptes, you can 
>>> do that without losing a bit of the generation.
>>
>> What's you did is avoiding cache a invalid generation number into spte, but
>> actually if we can figure it out when we check mmio access, it's ok. Like the
>> updated patch i posted should fix it, that way avoids doubly increase the 
>> number.
> 
> Yes.
> 
>> Okay, if you're interested increasing the number doubly, there is the simpler
>> one:
> 
> This wastes a bit in the mmio spte though.  My idea is to increase the
> memslots generation twice, but drop the low bit in the mmio spte.

Yeah, really smart idea. :)

Paolo/David, would you mind making a patch for this (+ the comments in David's
patch)?

Please feel free to add my:
Reviewed-by: Xiao Guangrong 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Paolo Bonzini
Il 19/08/2014 10:50, Xiao Guangrong ha scritto:
> Okay, what confused me it that it seems that the single line patch
> is ok to you. :)

No, it was late and I was confused. :)

> Now, do we really need to care the case 2? like David said:
> "Sorry I didn't explain myself very well: Since we can get a single wrong
> mmio exit no matter what, it has to be handled in userspace. So my point
> was, it doesn't really help to fix that one very specific way that it can
> happen, because it can just happen in other ways. (E.g. update memslots
> occurs after is_noslot_pfn() and before mmio exit)."
> 
> What's your idea?
> 
> > I think if you always treat the low bit as zero in mmio sptes, you can 
> > do that without losing a bit of the generation.
> 
> What's you did is avoiding cache a invalid generation number into spte, but
> actually if we can figure it out when we check mmio access, it's ok. Like the
> updated patch i posted should fix it, that way avoids doubly increase the 
> number.

Yes.

> Okay, if you're interested increasing the number doubly, there is the simpler
> one:

This wastes a bit in the mmio spte though.  My idea is to increase the
memslots generation twice, but drop the low bit in the mmio spte.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Xiao Guangrong
On 08/19/2014 04:28 PM, Paolo Bonzini wrote:
> Il 19/08/2014 05:50, Xiao Guangrong ha scritto:
>>
>> Note in the step *, my approach detects the invalid generation-number which
>> will invalidate the mmio spte properly .
> 
> You are right, in fact my mail included another part: "Another 
> alternative could be to use the low bit to mark an in-progress change, 
> *and skip the caching if the low bit is set*."

Okay, what confused me it that it seems that the single line patch
is ok to you. :)

Now, do we really need to care the case 2? like David said:
"Sorry I didn't explain myself very well: Since we can get a single wrong
mmio exit no matter what, it has to be handled in userspace. So my point
was, it doesn't really help to fix that one very specific way that it can
happen, because it can just happen in other ways. (E.g. update memslots
occurs after is_noslot_pfn() and before mmio exit)."

What's your idea?

> 
> I think if you always treat the low bit as zero in mmio sptes, you can 
> do that without losing a bit of the generation.

What's you did is avoiding cache a invalid generation number into spte, but
actually if we can figure it out when we check mmio access, it's ok. Like the
updated patch i posted should fix it, that way avoids doubly increase the 
number.

Okay, if you're interested increasing the number doubly, there is the simpler
one:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..bf49170 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -236,6 +236,9 @@ static unsigned int get_mmio_spte_generation(u64 spte)

 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
+   /* The initialized generation number should be even. */
+   BUILD_BUG_ON((MMIO_MAX_GEN - 150) & 0x1);
+
/*
 * Init kvm generation close to MMIO_MAX_GEN to easily test the
 * code of handling generation number wrap-around.
@@ -292,6 +295,14 @@ static bool check_mmio_spte(struct kvm *kvm, u64 spte)
kvm_gen = kvm_current_mmio_generation(kvm);
spte_gen = get_mmio_spte_generation(spte);

+   /*
+* Aha, we cached a being-updated generation number or
+* generation number is currnetly being updated, let do the
+* real check for mmio access.
+*/
+   if ((kvm_gen | spte_gen) & 0x1)
+   return false;
+
trace_check_mmio_spte(spte, kvm_gen, spte_gen);
return likely(kvm_gen == spte_gen);
 }
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..5c3f7b7 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -725,7 +725,7 @@ static struct kvm_memslots *install_new_memslots(struct kvm 
*kvm,
update_memslots(slots, new, kvm->memslots->generation);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
-
+   kvm->memslots->generation++;
kvm_arch_memslots_updated(kvm);

return old_memslots;

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-19 Thread Paolo Bonzini
Il 19/08/2014 05:50, Xiao Guangrong ha scritto:
> 
> Note in the step *, my approach detects the invalid generation-number which
> will invalidate the mmio spte properly .

You are right, in fact my mail included another part: "Another 
alternative could be to use the low bit to mark an in-progress change, 
*and skip the caching if the low bit is set*."

I think if you always treat the low bit as zero in mmio sptes, you can 
do that without losing a bit of the generation.

Something like this (untested/uncompiled):

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 931467881da7..3a56d377c6d7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -199,16 +199,20 @@ void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask)
 EXPORT_SYMBOL_GPL(kvm_mmu_set_mmio_spte_mask);
 
 /*
- * spte bits of bit 3 ~ bit 11 are used as low 9 bits of generation number,
- * the bits of bits 52 ~ bit 61 are used as high 10 bits of generation
- * number.
+ * the low bit of the generation number is always presumed to be zero.
+ * This disables mmio caching during memslot updates.  The concept is
+ * similar to a seqcount but instead of retrying the access we just punt
+ * and ignore the cache.
+ *
+ * spte bits 3-11 are used as bits 1-9 of the generation number,
+ * the bits 52-61 are used as bits 10-19 of the generation number.
  */
-#define MMIO_SPTE_GEN_LOW_SHIFT3
+#define MMIO_SPTE_GEN_LOW_SHIFT2
 #define MMIO_SPTE_GEN_HIGH_SHIFT   52
 
-#define MMIO_GEN_SHIFT 19
-#define MMIO_GEN_LOW_SHIFT 9
-#define MMIO_GEN_LOW_MASK  ((1 << MMIO_GEN_LOW_SHIFT) - 1)
+#define MMIO_GEN_SHIFT 20
+#define MMIO_GEN_LOW_SHIFT 10
+#define MMIO_GEN_LOW_MASK  ((1 << MMIO_GEN_LOW_SHIFT) - 2)
 #define MMIO_GEN_MASK  ((1 << MMIO_GEN_SHIFT) - 1)
 #define MMIO_MAX_GEN   ((1 << MMIO_GEN_SHIFT) - 1)
 
@@ -236,12 +240,7 @@ static unsigned int get_mmio_spte_generation(u64 spte)
 
 static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
 {
-   /*
-* Init kvm generation close to MMIO_MAX_GEN to easily test the
-* code of handling generation number wrap-around.
-*/
-   return (kvm_memslots(kvm)->generation +
- MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
+   return kvm_memslots(kvm)->generation & MMIO_GEN_MASK;
 }
 
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a69a623938b8..c7e2800313b8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -474,6 +476,13 @@ static struct kvm *kvm_create_vm(unsigned long type)
kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL);
if (!kvm->memslots)
goto out_err_no_srcu;
+
+   /*
+* Init kvm generation close to MMIO_MAX_GEN to easily test the
+* code of handling generation number wrap-around.
+*/
+   kvm->memslots->generation = -150;
+
kvm_init_memslots_id(kvm);
if (init_srcu_struct(&kvm->srcu))
goto out_err_no_srcu;
@@ -725,6 +732,8 @@ static struct kvm_memslots *install_new_memslots(struct kvm 
*kvm,
synchronize_srcu_expedited(&kvm->srcu);
 
kvm_arch_memslots_updated(kvm);
+   slots->generation++;
+   WARN_ON(slots->generation & 1);
 
return old_memslots;
 }

(modulo the changes to always set the generation in install_new_memslots, which
I'm eliding for clarity).

Moving the initialization to kvm_create_vm ensures that the low bit is untouched
between install_new_memslots and kvm_current_mmio_generation.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 01:40 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong
>  wrote:
>> On 08/19/2014 01:00 PM, David Matlack wrote:
>>> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
>>>  wrote:
 On 08/19/2014 12:31 PM, David Matlack wrote:
> The single line patch I suggested was only intended to fix the "forever
> incorrectly exit mmio".

 My patch also fixes this case and that does not doubly increase the
 number. I think this is the better one.
>>>
>>> I prefer doubly increasing the generation for this reason: the updated 
>>> boolean
>>> requires extra code on the "client-side" to check if there's an update in
>>> progress. And that makes it easy to get wrong. In fact, your patch
>>> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
>>> generation requires no "client-side" code to work.
>>
>> No, the updated patch is used to fix case 2 which i draw the scenario in
>> the last mail. I mean the original patch in this patchset which just
>> increase the number after srcu-sync.
>>
>> Then could you tell me that your approach can do but my original patch can 
>> not?
> 
> It avoids publishing new memslots with an old generation number attached to
> them (even if it only lasts for a short period of time). 

I can not see the problem if that happen, could you please draw the scenario?

> Do you have a reason
> why you don't want to doubly increase the generation?

That more easily causes the number wrap-around.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 10:19 PM, Xiao Guangrong
 wrote:
> On 08/19/2014 01:00 PM, David Matlack wrote:
>> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
>>  wrote:
>>> On 08/19/2014 12:31 PM, David Matlack wrote:
 The single line patch I suggested was only intended to fix the "forever
 incorrectly exit mmio".
>>>
>>> My patch also fixes this case and that does not doubly increase the
>>> number. I think this is the better one.
>>
>> I prefer doubly increasing the generation for this reason: the updated 
>> boolean
>> requires extra code on the "client-side" to check if there's an update in
>> progress. And that makes it easy to get wrong. In fact, your patch
>> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
>> generation requires no "client-side" code to work.
>
> No, the updated patch is used to fix case 2 which i draw the scenario in
> the last mail. I mean the original patch in this patchset which just
> increase the number after srcu-sync.
>
> Then could you tell me that your approach can do but my original patch can 
> not?

It avoids publishing new memslots with an old generation number attached to
them (even if it only lasts for a short period of time). Do you have a reason
why you don't want to doubly increase the generation?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 01:00 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
>  wrote:
>> On 08/19/2014 12:31 PM, David Matlack wrote:
>>> But it looks like you basically said the same thing earlier, so I think
>>> we're on the same page.
>>>
>>
>> Yes, that is what i try to explain in previous mails. :(
> 
> I'm glad we understand each other now! Sorry again for my confusion.

Yup, me too. :)

> 
>>> The single line patch I suggested was only intended to fix the "forever
>>> incorrectly exit mmio".
>>
>> My patch also fixes this case and that does not doubly increase the
>> number. I think this is the better one.
> 
> I prefer doubly increasing the generation for this reason: the updated boolean
> requires extra code on the "client-side" to check if there's an update in
> progress. And that makes it easy to get wrong. In fact, your patch
> forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
> generation requires no "client-side" code to work.

No, the updated patch is used to fix case 2 which i draw the scenario in
the last mail. I mean the original patch in this patchset which just
increase the number after srcu-sync.

Then could you tell me that your approach can do but my original patch can not?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 9:41 PM, Xiao Guangrong
 wrote:
> On 08/19/2014 12:31 PM, David Matlack wrote:
>> But it looks like you basically said the same thing earlier, so I think
>> we're on the same page.
>>
>
> Yes, that is what i try to explain in previous mails. :(

I'm glad we understand each other now! Sorry again for my confusion.

>> The single line patch I suggested was only intended to fix the "forever
>> incorrectly exit mmio".
>
> My patch also fixes this case and that does not doubly increase the
> number. I think this is the better one.

I prefer doubly increasing the generation for this reason: the updated boolean
requires extra code on the "client-side" to check if there's an update in
progress. And that makes it easy to get wrong. In fact, your patch
forgot to check the updated bit in mark_mmio_spte(). Doubly increasing the
generation requires no "client-side" code to work.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 12:31 PM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
>  wrote:
>> On 08/19/2014 05:15 AM, David Matlack wrote:
>>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>>>  wrote:
 @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 
 *sptep, gfn_t gfn,

  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
  {
 +   struct kvm_memslots *slots = kvm_memslots(kvm);
 unsigned int kvm_gen, spte_gen;

 -   kvm_gen = kvm_current_mmio_generation(kvm);
 +   if (slots->updated)
 +   return false;
 +
 +   smp_rmb();
 +
 +   kvm_gen = __kvm_current_mmio_generation(slots);
 spte_gen = get_mmio_spte_generation(spte);

>>>
>>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless 
>>> we
>>> block during memslot updates, which I don't think we should :).
>>
>> This exactly fixes case 2, slots->updated just acts as the "low bit"
>> but avoid generation number wrap-around and trick handling of the number.
>> More details please see below.
>>
>>>
 trace_check_mmio_spte(spte, kvm_gen, spte_gen);
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 4b6c01b..1d4e78f 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -96,7 +96,7 @@ static void hardware_disable_all(void);

  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new, u64 
 last_generation);
 +   struct kvm_memory_slot *new);

  static void kvm_release_pfn_dirty(pfn_t pfn);
  static void mark_page_dirty_in_slot(struct kvm *kvm,
 @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
  }

  static void update_memslots(struct kvm_memslots *slots,
 -   struct kvm_memory_slot *new,
 -   u64 last_generation)
 +   struct kvm_memory_slot *new)
  {
 if (new) {
 int id = new->id;
 @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
 if (new->npages != npages)
 sort_memslots(slots);
 }
 -
 -   slots->generation = last_generation + 1;
  }

  static int check_memory_region_flags(struct kvm_userspace_memory_region 
 *mem)
 @@ -720,10 +717,17 @@ static struct kvm_memslots 
 *install_new_memslots(struct kvm *kvm,
  {
 struct kvm_memslots *old_memslots = kvm->memslots;

 -   update_memslots(slots, new, kvm->memslots->generation);
 +   /* ensure generation number is always increased. */
 +   slots->updated = true;
 +   slots->generation = old_memslots->generation;
 +   update_memslots(slots, new);
 rcu_assign_pointer(kvm->memslots, slots);
 synchronize_srcu_expedited(&kvm->srcu);

 +   slots->generation++;
 +   smp_wmb();
 +   slots->updated = false;
 +
 kvm_arch_memslots_updated(kvm);

 return old_memslots;

>>>
>>> This is effectively the same as the first approach.
>>>
>>> I just realized how simple Paolo's idea is. I think it can be a one line
>>> patch (without comments):
>>>
>>> [...]
>>> update_memslots(slots, new, kvm->memslots->generation);
>>> rcu_assign_pointer(kvm->memslots, slots);
>>> synchronize_srcu_expedited(&kvm->srcu);
>>> +   slots->generation++;
>>>
>>> kvm_arch_memslots_updated(kvm);
>>> [...]
>>
>> Really? Unfortunately no. :)
>>
>> See this scenario:
>>
>> CPU 0  CPU 1
>> ioctl registering a new memslot which
>> contains GPA:
>>page-fault handler:
>>  see it'a mmio access on GPA;
>>
>>  assign the new memslots with generation number increased
>>  cache the generation-number into spte;
>>  fix the access and comeback to guest;
>> SRCU-sync
>>  page-fault again and check the spte is a valid 
>> mmio-spte(*)
>> generation-number++;
>> return to userspace;
>>  do mmio-emulation and inject mmio-exit;
>>
>> !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
>> said in the last mail.
>>
>>
>> Note in the step *, my approach detects the invalid generation-number which
>> will invalidate the mmio spte properly .
> 
> Sorry I didn't explain myself very well: Since we can get a single wrong
> mmio exit no matter what, it has to be handled in userspace. So my point
> was, it doesn't really help to fix that one very specific way that it can
> happen, because it can just happen in oth

Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 8:50 PM, Xiao Guangrong
 wrote:
> On 08/19/2014 05:15 AM, David Matlack wrote:
>> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>>  wrote:
>>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
>>> gfn_t gfn,
>>>
>>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>>  {
>>> +   struct kvm_memslots *slots = kvm_memslots(kvm);
>>> unsigned int kvm_gen, spte_gen;
>>>
>>> -   kvm_gen = kvm_current_mmio_generation(kvm);
>>> +   if (slots->updated)
>>> +   return false;
>>> +
>>> +   smp_rmb();
>>> +
>>> +   kvm_gen = __kvm_current_mmio_generation(slots);
>>> spte_gen = get_mmio_spte_generation(spte);
>>>
>>
>> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
>> block during memslot updates, which I don't think we should :).
>
> This exactly fixes case 2, slots->updated just acts as the "low bit"
> but avoid generation number wrap-around and trick handling of the number.
> More details please see below.
>
>>
>>> trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>>> index 4b6c01b..1d4e78f 100644
>>> --- a/virt/kvm/kvm_main.c
>>> +++ b/virt/kvm/kvm_main.c
>>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>>
>>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>>  static void update_memslots(struct kvm_memslots *slots,
>>> -   struct kvm_memory_slot *new, u64 
>>> last_generation);
>>> +   struct kvm_memory_slot *new);
>>>
>>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>>  }
>>>
>>>  static void update_memslots(struct kvm_memslots *slots,
>>> -   struct kvm_memory_slot *new,
>>> -   u64 last_generation)
>>> +   struct kvm_memory_slot *new)
>>>  {
>>> if (new) {
>>> int id = new->id;
>>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>>> if (new->npages != npages)
>>> sort_memslots(slots);
>>> }
>>> -
>>> -   slots->generation = last_generation + 1;
>>>  }
>>>
>>>  static int check_memory_region_flags(struct kvm_userspace_memory_region 
>>> *mem)
>>> @@ -720,10 +717,17 @@ static struct kvm_memslots 
>>> *install_new_memslots(struct kvm *kvm,
>>>  {
>>> struct kvm_memslots *old_memslots = kvm->memslots;
>>>
>>> -   update_memslots(slots, new, kvm->memslots->generation);
>>> +   /* ensure generation number is always increased. */
>>> +   slots->updated = true;
>>> +   slots->generation = old_memslots->generation;
>>> +   update_memslots(slots, new);
>>> rcu_assign_pointer(kvm->memslots, slots);
>>> synchronize_srcu_expedited(&kvm->srcu);
>>>
>>> +   slots->generation++;
>>> +   smp_wmb();
>>> +   slots->updated = false;
>>> +
>>> kvm_arch_memslots_updated(kvm);
>>>
>>> return old_memslots;
>>>
>>
>> This is effectively the same as the first approach.
>>
>> I just realized how simple Paolo's idea is. I think it can be a one line
>> patch (without comments):
>>
>> [...]
>> update_memslots(slots, new, kvm->memslots->generation);
>> rcu_assign_pointer(kvm->memslots, slots);
>> synchronize_srcu_expedited(&kvm->srcu);
>> +   slots->generation++;
>>
>> kvm_arch_memslots_updated(kvm);
>> [...]
>
> Really? Unfortunately no. :)
>
> See this scenario:
>
> CPU 0  CPU 1
> ioctl registering a new memslot which
> contains GPA:
>page-fault handler:
>  see it'a mmio access on GPA;
>
>  assign the new memslots with generation number increased
>  cache the generation-number into spte;
>  fix the access and comeback to guest;
> SRCU-sync
>  page-fault again and check the spte is a valid 
> mmio-spte(*)
> generation-number++;
> return to userspace;
>  do mmio-emulation and inject mmio-exit;
>
> !!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
> said in the last mail.
>
>
> Note in the step *, my approach detects the invalid generation-number which
> will invalidate the mmio spte properly .

Sorry I didn't explain myself very well: Since we can get a single wrong
mmio exit no matter what, it has to be handled in userspace. So my point
was, it doesn't really help to fix that one very specific way that it can
happen, because it can just happen in other ways. (E.g. update memslots
occurs after is_noslot_pfn() and before mmio exit).

But it looks like you basically said the same thing earlier, so I think
we're on the same page.

The singl

Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong
On 08/19/2014 05:15 AM, David Matlack wrote:
> On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
>  wrote:
>> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
>> gfn_t gfn,
>>
>>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>>  {
>> +   struct kvm_memslots *slots = kvm_memslots(kvm);
>> unsigned int kvm_gen, spte_gen;
>>
>> -   kvm_gen = kvm_current_mmio_generation(kvm);
>> +   if (slots->updated)
>> +   return false;
>> +
>> +   smp_rmb();
>> +
>> +   kvm_gen = __kvm_current_mmio_generation(slots);
>> spte_gen = get_mmio_spte_generation(spte);
>>
> 
> What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
> block during memslot updates, which I don't think we should :).

This exactly fixes case 2, slots->updated just acts as the "low bit"
but avoid generation number wrap-around and trick handling of the number.
More details please see below.

> 
>> trace_check_mmio_spte(spte, kvm_gen, spte_gen);
>> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
>> index 4b6c01b..1d4e78f 100644
>> --- a/virt/kvm/kvm_main.c
>> +++ b/virt/kvm/kvm_main.c
>> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>>
>>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>>  static void update_memslots(struct kvm_memslots *slots,
>> -   struct kvm_memory_slot *new, u64 
>> last_generation);
>> +   struct kvm_memory_slot *new);
>>
>>  static void kvm_release_pfn_dirty(pfn_t pfn);
>>  static void mark_page_dirty_in_slot(struct kvm *kvm,
>> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>>  }
>>
>>  static void update_memslots(struct kvm_memslots *slots,
>> -   struct kvm_memory_slot *new,
>> -   u64 last_generation)
>> +   struct kvm_memory_slot *new)
>>  {
>> if (new) {
>> int id = new->id;
>> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
>> if (new->npages != npages)
>> sort_memslots(slots);
>> }
>> -
>> -   slots->generation = last_generation + 1;
>>  }
>>
>>  static int check_memory_region_flags(struct kvm_userspace_memory_region 
>> *mem)
>> @@ -720,10 +717,17 @@ static struct kvm_memslots 
>> *install_new_memslots(struct kvm *kvm,
>>  {
>> struct kvm_memslots *old_memslots = kvm->memslots;
>>
>> -   update_memslots(slots, new, kvm->memslots->generation);
>> +   /* ensure generation number is always increased. */
>> +   slots->updated = true;
>> +   slots->generation = old_memslots->generation;
>> +   update_memslots(slots, new);
>> rcu_assign_pointer(kvm->memslots, slots);
>> synchronize_srcu_expedited(&kvm->srcu);
>>
>> +   slots->generation++;
>> +   smp_wmb();
>> +   slots->updated = false;
>> +
>> kvm_arch_memslots_updated(kvm);
>>
>> return old_memslots;
>>
> 
> This is effectively the same as the first approach.
> 
> I just realized how simple Paolo's idea is. I think it can be a one line
> patch (without comments):
> 
> [...]
> update_memslots(slots, new, kvm->memslots->generation);
> rcu_assign_pointer(kvm->memslots, slots);
> synchronize_srcu_expedited(&kvm->srcu);
> +   slots->generation++;
> 
> kvm_arch_memslots_updated(kvm);
> [...]

Really? Unfortunately no. :)

See this scenario:

CPU 0  CPU 1
ioctl registering a new memslot which
contains GPA:
   page-fault handler:
 see it'a mmio access on GPA;

 assign the new memslots with generation number increased
 cache the generation-number into spte;
 fix the access and comeback to guest;
SRCU-sync
 page-fault again and check the spte is a valid 
mmio-spte(*)
generation-number++;
return to userspace;
 do mmio-emulation and inject mmio-exit;

!!! userspace receives a unexpected mmio-exit, that is case 2 i exactly
said in the last mail.


Note in the step *, my approach detects the invalid generation-number which
will invalidate the mmio spte properly .

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 2:24 PM, Paolo Bonzini  wrote:
> Il 18/08/2014 23:15, David Matlack ha scritto:
>> I just realized how simple Paolo's idea is. I think it can be a one line
>> patch (without comments):
>>
>> [...]
>> update_memslots(slots, new, kvm->memslots->generation);
>> rcu_assign_pointer(kvm->memslots, slots);
>> synchronize_srcu_expedited(&kvm->srcu);
>> +   slots->generation++;
>>
>> kvm_arch_memslots_updated(kvm);
>> [...]
>
> Yeah, you're right.  I think at this point it makes sense to put all
> generation handling in install_new_memslots, but with proper comments
> the above can do as well.
>
> Would you like to send it?  Patch 2 still applies on top.

Sure, I like doing everything in install_new_memslots() as well so I'll fix
that.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Paolo Bonzini
Il 18/08/2014 23:15, David Matlack ha scritto:
> I just realized how simple Paolo's idea is. I think it can be a one line
> patch (without comments):
> 
> [...]
> update_memslots(slots, new, kvm->memslots->generation);
> rcu_assign_pointer(kvm->memslots, slots);
> synchronize_srcu_expedited(&kvm->srcu);
> +   slots->generation++;
> 
> kvm_arch_memslots_updated(kvm);
> [...]

Yeah, you're right.  I think at this point it makes sense to put all
generation handling in install_new_memslots, but with proper comments
the above can do as well.

Would you like to send it?  Patch 2 still applies on top.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 12:56 PM, Xiao Guangrong
 wrote:
> @@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
> gfn_t gfn,
>
>  static bool check_mmio_spte(struct kvm *kvm, u64 spte)
>  {
> +   struct kvm_memslots *slots = kvm_memslots(kvm);
> unsigned int kvm_gen, spte_gen;
>
> -   kvm_gen = kvm_current_mmio_generation(kvm);
> +   if (slots->updated)
> +   return false;
> +
> +   smp_rmb();
> +
> +   kvm_gen = __kvm_current_mmio_generation(slots);
> spte_gen = get_mmio_spte_generation(spte);
>

What does this fix? Case 2 can still happen. (Case 2 is unavoidable unless we
block during memslot updates, which I don't think we should :).

> trace_check_mmio_spte(spte, kvm_gen, spte_gen);
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4b6c01b..1d4e78f 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
>
>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>  static void update_memslots(struct kvm_memslots *slots,
> -   struct kvm_memory_slot *new, u64 last_generation);
> +   struct kvm_memory_slot *new);
>
>  static void kvm_release_pfn_dirty(pfn_t pfn);
>  static void mark_page_dirty_in_slot(struct kvm *kvm,
> @@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>  }
>
>  static void update_memslots(struct kvm_memslots *slots,
> -   struct kvm_memory_slot *new,
> -   u64 last_generation)
> +   struct kvm_memory_slot *new)
>  {
> if (new) {
> int id = new->id;
> @@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
> if (new->npages != npages)
> sort_memslots(slots);
> }
> -
> -   slots->generation = last_generation + 1;
>  }
>
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct 
> kvm *kvm,
>  {
> struct kvm_memslots *old_memslots = kvm->memslots;
>
> -   update_memslots(slots, new, kvm->memslots->generation);
> +   /* ensure generation number is always increased. */
> +   slots->updated = true;
> +   slots->generation = old_memslots->generation;
> +   update_memslots(slots, new);
> rcu_assign_pointer(kvm->memslots, slots);
> synchronize_srcu_expedited(&kvm->srcu);
>
> +   slots->generation++;
> +   smp_wmb();
> +   slots->updated = false;
> +
> kvm_arch_memslots_updated(kvm);
>
> return old_memslots;
>

This is effectively the same as the first approach.

I just realized how simple Paolo's idea is. I think it can be a one line
patch (without comments):

[...]
update_memslots(slots, new, kvm->memslots->generation);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
+   slots->generation++;

kvm_arch_memslots_updated(kvm);
[...]
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong

On Aug 19, 2014, at 2:47 AM, Paolo Bonzini  wrote:

> 
>> I think this patch is auditable, page-fault is always called by holding
>> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
>> Only these cases can happen:
>> 
>> 1)  page fault occurs before synchronize_srcu_expedited.
>>In this case, vcpu will generate mmio-exit for the memslot being 
>> registered
>>by the ioctl. That’s ok since the ioctl have not finished.
>> 
>> 2) page fault occurs after synchronize_srcu_expedited and during
>>   increasing generation-number.
>>   In this case, userspace may get wrong mmio-exit (that happen if handing
>>   page-fault is slower that the ioctl), that’s ok too since userspace need do
>>  the check anyway like i said above.
>> 
>> 3) page fault occurs after generation-number update
>>   that’s definitely correct. :)
>> 
>>> Another alternative could be to use the low bit to mark an in-progress
>>> change, and skip the caching if the low bit is set.  Similar to a
>>> seqcount (except if read_seqcount_retry fails, we just punt and not
>>> retry anything), you could use it even though the memory barriers
>>> provided by write_seqcount_begin/end are not too useful in this case.
>> 
>> I do not know how the bit works, page fault will cache the memslot before
>> the bit set and cache the generation-number after the bit set.
>> 
>> Maybe i missed your idea, could you please detail it?
> 
> Something like this:
> 
> - update_memslots(slots, new, kvm->memslots->generation);
> + /* ensure generation number is always increased. */
> + slots->generation = old_memslots->generation + 1;
> + update_memslots(slots, new);
>   rcu_assign_pointer(kvm->memslots, slots);
>   synchronize_srcu_expedited(&kvm->srcu);
> + slots->generation++;
> 
> Then case 1 and 2 will just have a cache miss.

So, case 2 is what you concerned? :) I still think it is ok but i do not have
strong opinion on that. How about simplify it like this:

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 9314678..9fabf6a 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -234,16 +234,22 @@ static unsigned int get_mmio_spte_generation(u64 spte)
return gen;
 }
 
-static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+static unsigned int __kvm_current_mmio_generation(struct kvm_memslots *slots)
 {
+
/*
 * Init kvm generation close to MMIO_MAX_GEN to easily test the
 * code of handling generation number wrap-around.
 */
-   return (kvm_memslots(kvm)->generation +
+   return (slots->generation +
  MMIO_MAX_GEN - 150) & MMIO_GEN_MASK;
 }
 
+static unsigned int kvm_current_mmio_generation(struct kvm *kvm)
+{
+   return __kvm_current_mmio_generation(kvm_memslots(kvm));
+}
+
 static void mark_mmio_spte(struct kvm *kvm, u64 *sptep, u64 gfn,
   unsigned access)
 {
@@ -287,9 +293,15 @@ static bool set_mmio_spte(struct kvm *kvm, u64 *sptep, 
gfn_t gfn,
 
 static bool check_mmio_spte(struct kvm *kvm, u64 spte)
 {
+   struct kvm_memslots *slots = kvm_memslots(kvm);
unsigned int kvm_gen, spte_gen;
 
-   kvm_gen = kvm_current_mmio_generation(kvm);
+   if (slots->updated)
+   return false;
+
+   smp_rmb();
+   
+   kvm_gen = __kvm_current_mmio_generation(slots);
spte_gen = get_mmio_spte_generation(spte);
 
trace_check_mmio_spte(spte, kvm_gen, spte_gen);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4b6c01b..1d4e78f 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new, u64 last_generation);
+   struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -685,8 +685,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new,
-   u64 last_generation)
+   struct kvm_memory_slot *new)
 {
if (new) {
int id = new->id;
@@ -697,8 +696,6 @@ static void update_memslots(struct kvm_memslots *slots,
if (new->npages != npages)
sort_memslots(slots);
}
-
-   slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -720,10 +717,17 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 {
struct kvm_memslots *old_memslots = kvm->memslots;
 
-   update_memslots(slots, new, kvm->memslots->generation);
+   /* ensure generation number is always increased. */
+   slots->updated 

Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Paolo Bonzini
Il 18/08/2014 18:35, Xiao Guangrong ha scritto:
> 
> Hi Paolo,
> 
> Thank you to review the patch!
> 
> On Aug 18, 2014, at 9:57 PM, Paolo Bonzini  wrote:
> 
>> Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
>>> -   update_memslots(slots, new, kvm->memslots->generation);
>>> +   /* ensure generation number is always increased. */
>>> +   slots->generation = old_memslots->generation;
>>> +   update_memslots(slots, new);
>>> rcu_assign_pointer(kvm->memslots, slots);
>>> synchronize_srcu_expedited(&kvm->srcu);
>>> +   slots->generation++;
>>
>> I don't trust my brain enough to review this patch.
> 
> Sorry to make you confused. I should expain it more clearly.

Don't worry, it's not your fault. :)

>> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
>> patch I trust myself reviewing would change a bunch of functions in
>> kvm_main.c to take a memslots struct.  This would make it easy to
>> respect the hard and fast rule of not dereferencing the same pointer
>> twice.  But it would be a tedious change.
> 
> kvm_set_memory_region is the only place updating memslot and
> kvm_current_mmio_generation accesses memslot by rcu-dereference,
> i do not know why other places need to take into account.

The race occurs because gfn_to_pfn_many_atomic or some other function
has already used kvm_memslots().  Calling kvm_memslots() twice is the
root cause the bug.

> I think this patch is auditable, page-fault is always called by holding
> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
> Only these cases can happen:
> 
> 1)  page fault occurs before synchronize_srcu_expedited.
> In this case, vcpu will generate mmio-exit for the memslot being 
> registered
> by the ioctl. That’s ok since the ioctl have not finished.
> 
> 2) page fault occurs after synchronize_srcu_expedited and during
>increasing generation-number.
>In this case, userspace may get wrong mmio-exit (that happen if handing
>page-fault is slower that the ioctl), that’s ok too since userspace need do
>   the check anyway like i said above.
> 
> 3) page fault occurs after generation-number update
>that’s definitely correct. :)
> 
>> Another alternative could be to use the low bit to mark an in-progress
>> change, and skip the caching if the low bit is set.  Similar to a
>> seqcount (except if read_seqcount_retry fails, we just punt and not
>> retry anything), you could use it even though the memory barriers
>> provided by write_seqcount_begin/end are not too useful in this case.
> 
> I do not know how the bit works, page fault will cache the memslot before
> the bit set and cache the generation-number after the bit set.
> 
> Maybe i missed your idea, could you please detail it?

Something like this:

-   update_memslots(slots, new, kvm->memslots->generation);
+   /* ensure generation number is always increased. */
+   slots->generation = old_memslots->generation + 1;
+   update_memslots(slots, new);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
+   slots->generation++;

Then case 1 and 2 will just have a cache miss.

The "low bit" is really just because each slot update does 2 generation
increases.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread David Matlack
On Mon, Aug 18, 2014 at 9:35 AM, Xiao Guangrong
 wrote:
>
> Hi Paolo,
>
> Thank you to review the patch!
>
> On Aug 18, 2014, at 9:57 PM, Paolo Bonzini  wrote:
>
>> Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
>>> -update_memslots(slots, new, kvm->memslots->generation);
>>> +/* ensure generation number is always increased. */
>>> +slots->generation = old_memslots->generation;
>>> +update_memslots(slots, new);
>>>  rcu_assign_pointer(kvm->memslots, slots);
>>>  synchronize_srcu_expedited(&kvm->srcu);
>>> +slots->generation++;
>>
>> I don't trust my brain enough to review this patch.

Xiao, I thought about your approach a lot and I can't think of a bad race
that isn't already possible due to the fact that kvm allows memslot
mutation to race with vm exits. That being said, it's hard to reason about
all the other "clients" of memslots and what weirdness (or badness) will
be caused by updating generation after srcu_synch. I like Paolo's two
approaches because they fix the bug without any side-effects.

> Sorry to make you confused. I should expain it more clearly.
>
> What this patch tried to fix is:  kvm will generate wrong mmio-exit forever
> if no luck event cleans mmio spte. (eg. if no memory pressure or  no
> context-sync on that spte.)
>
> Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region
> and mmio-exit - that means userspace is able to get mmio-exit even if
> kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies
> a mmio access before issuing the ioctl and injects mmio-exit to userspace 
> after
> ioctl return. So checking if mmio-exit is a real mmio access in userspace is
> needed anyway.
>
>> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
>> patch I trust myself reviewing would change a bunch of functions in
>> kvm_main.c to take a memslots struct.  This would make it easy to
>> respect the hard and fast rule of not dereferencing the same pointer
>> twice.  But it would be a tedious change.
>
> kvm_set_memory_region is the only place updating memslot and
> kvm_current_mmio_generation accesses memslot by rcu-dereference,
> i do not know why other places need to take into account.

If you rcu_dereference() more than once, you can't trust previous
decisions based on rcu_dereference()'s. If the mmio cache code only
did one rcu_dereference() per vm exit, this bug would be gone.

> I think this patch is auditable, page-fault is always called by holding
> srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
> Only these cases can happen:
>
> 1)  page fault occurs before synchronize_srcu_expedited.
> In this case, vcpu will generate mmio-exit for the memslot being 
> registered
> by the ioctl. That’s ok since the ioctl have not finished.
>
> 2) page fault occurs after synchronize_srcu_expedited and during
>increasing generation-number.
>In this case, userspace may get wrong mmio-exit (that happen if handing
>page-fault is slower that the ioctl), that’s ok too since userspace need do
>   the check anyway like i said above.
>
> 3) page fault occurs after generation-number update
>that’s definitely correct. :)
>
>> Another alternative could be to use the low bit to mark an in-progress
>> change, and skip the caching if the low bit is set.  Similar to a
>> seqcount (except if read_seqcount_retry fails, we just punt and not
>> retry anything), you could use it even though the memory barriers
>> provided by write_seqcount_begin/end are not too useful in this case.

I like this approach best. It would have the least code changes and provide
the same guarantees.

> I do not know how the bit works, page fault will cache the memslot before
> the bit set and cache the generation-number after the bit set.
>
> Maybe i missed your idea, could you please detail it?

In vcpu_cache_mmio_info() if generation is odd, just don't do the caching
because memslots were changed while we were running and we just assume the
worst case.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Xiao Guangrong

Hi Paolo,

Thank you to review the patch!

On Aug 18, 2014, at 9:57 PM, Paolo Bonzini  wrote:

> Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
>> -update_memslots(slots, new, kvm->memslots->generation);
>> +/* ensure generation number is always increased. */
>> +slots->generation = old_memslots->generation;
>> +update_memslots(slots, new);
>>  rcu_assign_pointer(kvm->memslots, slots);
>>  synchronize_srcu_expedited(&kvm->srcu);
>> +slots->generation++;
> 
> I don't trust my brain enough to review this patch.

Sorry to make you confused. I should expain it more clearly.

What this patch tried to fix is:  kvm will generate wrong mmio-exit forever
if no luck event cleans mmio spte. (eg. if no memory pressure or  no
context-sync on that spte.)

Note, it is hard to do precise sync between kvm_vm_ioctl_set_memory_region
and mmio-exit - that means userspace is able to get mmio-exit even if
kvm_vm_ioctl_set_memory_region have finished, for example, kvm identifies
a mmio access before issuing the ioctl and injects mmio-exit to userspace after
ioctl return. So checking if mmio-exit is a real mmio access in userspace is
needed anyway.

> kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
> patch I trust myself reviewing would change a bunch of functions in
> kvm_main.c to take a memslots struct.  This would make it easy to
> respect the hard and fast rule of not dereferencing the same pointer
> twice.  But it would be a tedious change.

kvm_set_memory_region is the only place updating memslot and
kvm_current_mmio_generation accesses memslot by rcu-dereference,
i do not know why other places need to take into account.

I think this patch is auditable, page-fault is always called by holding
srcu-lock so that a page fault can’t go across synchronize_srcu_expedited.
Only these cases can happen:

1)  page fault occurs before synchronize_srcu_expedited.
In this case, vcpu will generate mmio-exit for the memslot being registered
by the ioctl. That’s ok since the ioctl have not finished.

2) page fault occurs after synchronize_srcu_expedited and during
   increasing generation-number.
   In this case, userspace may get wrong mmio-exit (that happen if handing
   page-fault is slower that the ioctl), that’s ok too since userspace need do
  the check anyway like i said above.

3) page fault occurs after generation-number update
   that’s definitely correct. :)

> Another alternative could be to use the low bit to mark an in-progress
> change, and skip the caching if the low bit is set.  Similar to a
> seqcount (except if read_seqcount_retry fails, we just punt and not
> retry anything), you could use it even though the memory barriers
> provided by write_seqcount_begin/end are not too useful in this case.

I do not know how the bit works, page fault will cache the memslot before
the bit set and cache the generation-number after the bit set.

Maybe i missed your idea, could you please detail it?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-18 Thread Paolo Bonzini
Il 14/08/2014 09:01, Xiao Guangrong ha scritto:
> - update_memslots(slots, new, kvm->memslots->generation);
> + /* ensure generation number is always increased. */
> + slots->generation = old_memslots->generation;
> + update_memslots(slots, new);
>   rcu_assign_pointer(kvm->memslots, slots);
>   synchronize_srcu_expedited(&kvm->srcu);
> + slots->generation++;

I don't trust my brain enough to review this patch.

kvm_current_mmio_generation seems like a very bad (race-prone) API.  One
patch I trust myself reviewing would change a bunch of functions in
kvm_main.c to take a memslots struct.  This would make it easy to
respect the hard and fast rule of not dereferencing the same pointer
twice.  But it would be a tedious change.

Another alternative could be to use the low bit to mark an in-progress
change, and skip the caching if the low bit is set.  Similar to a
seqcount (except if read_seqcount_retry fails, we just punt and not
retry anything), you could use it even though the memory barriers
provided by write_seqcount_begin/end are not too useful in this case.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-14 Thread Xiao Guangrong

Sorry, the title is not clear enough.

This is the v2 which fixes the issue pointed out by David:
" the generation number actually decreases."

Please review.

On 08/14/2014 03:01 PM, Xiao Guangrong wrote:
> We may cache the current mmio generation number and stale memslot info
> into spte, like this scenario:
> 
>CPU 0  CPU 1
> page fault:add a new memslot
> read memslot and detecting its a mmio access
>update memslots
>update generation number
> read generation number
> cache the gpa and current gen number into spte
> 
> So, if guest accesses the gpa later, it will generate a incorrect
> mmio exit
> 
> This patch fixes it by updating the generation number after
> synchronize_srcu_expedited() that makes sure the generation
> number updated only if memslots update is finished
> 
> Cc: sta...@vger.kernel.org
> Cc: David Matlack 
> Signed-off-by: Xiao Guangrong 
> ---
>  virt/kvm/kvm_main.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 33712fb..bb40df3 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -96,7 +96,7 @@ static void hardware_disable_all(void);
> 
>  static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
>  static void update_memslots(struct kvm_memslots *slots,
> - struct kvm_memory_slot *new, u64 last_generation);
> + struct kvm_memory_slot *new);
> 
>  static void kvm_release_pfn_dirty(pfn_t pfn);
>  static void mark_page_dirty_in_slot(struct kvm *kvm,
> @@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
>  }
> 
>  static void update_memslots(struct kvm_memslots *slots,
> - struct kvm_memory_slot *new,
> - u64 last_generation)
> + struct kvm_memory_slot *new)
>  {
>   if (new) {
>   int id = new->id;
> @@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
>   if (new->npages != npages)
>   sort_memslots(slots);
>   }
> -
> - slots->generation = last_generation + 1;
>  }
> 
>  static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
> @@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct 
> kvm *kvm,
>  {
>   struct kvm_memslots *old_memslots = kvm->memslots;
> 
> - update_memslots(slots, new, kvm->memslots->generation);
> + /* ensure generation number is always increased. */
> + slots->generation = old_memslots->generation;
> + update_memslots(slots, new);
>   rcu_assign_pointer(kvm->memslots, slots);
>   synchronize_srcu_expedited(&kvm->srcu);
> + slots->generation++;
> 
>   kvm_arch_memslots_updated(kvm);
> 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-14 Thread Xiao Guangrong
We may cache the current mmio generation number and stale memslot info
into spte, like this scenario:

   CPU 0  CPU 1
page fault:add a new memslot
read memslot and detecting its a mmio access
   update memslots
   update generation number
read generation number
cache the gpa and current gen number into spte

So, if guest accesses the gpa later, it will generate a incorrect
mmio exit

This patch fixes it by updating the generation number after
synchronize_srcu_expedited() that makes sure the generation
number updated only if memslots update is finished

Cc: sta...@vger.kernel.org
Cc: David Matlack 
Signed-off-by: Xiao Guangrong 
---
 virt/kvm/kvm_main.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..bb40df3 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new, u64 last_generation);
+   struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new,
-   u64 last_generation)
+   struct kvm_memory_slot *new)
 {
if (new) {
int id = new->id;
@@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
if (new->npages != npages)
sort_memslots(slots);
}
-
-   slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -722,9 +719,12 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 {
struct kvm_memslots *old_memslots = kvm->memslots;
 
-   update_memslots(slots, new, kvm->memslots->generation);
+   /* ensure generation number is always increased. */
+   slots->generation = old_memslots->generation;
+   update_memslots(slots, new);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
+   slots->generation++;
 
kvm_arch_memslots_updated(kvm);
 
-- 
1.8.1.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-13 Thread Xiao Guangrong
On 08/13/2014 05:18 AM, David Matlack wrote:
> On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong
>  wrote:
>> @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct 
>> kvm *kvm,
>>  {
>> struct kvm_memslots *old_memslots = kvm->memslots;
>>
> 
> I think you want
> 
>   slots->generation = old_memslots->generation;
> 
> here.
> 
> On the KVM_MR_DELETE path, install_new_memslots is called twice so this
> patch introduces a short window of time where the generation number
> actually decreases.

Yes, indeed. Thank you for pointing it out, will update.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-12 Thread David Matlack
On Mon, Aug 11, 2014 at 10:02 PM, Xiao Guangrong
 wrote:
> @@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct 
> kvm *kvm,
>  {
> struct kvm_memslots *old_memslots = kvm->memslots;
>

I think you want

  slots->generation = old_memslots->generation;

here.

On the KVM_MR_DELETE path, install_new_memslots is called twice so this
patch introduces a short window of time where the generation number
actually decreases.

> -   update_memslots(slots, new, kvm->memslots->generation);
> +   update_memslots(slots, new);
> rcu_assign_pointer(kvm->memslots, slots);
> synchronize_srcu_expedited(&kvm->srcu);
> +   slots->generation++;
>
> kvm_arch_memslots_updated(kvm);
>
> --
> 1.8.3.1
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: fix cache stale memslot info with correct mmio generation number

2014-08-11 Thread Xiao Guangrong
We may cache the current mmio generation number and stale memslot info
into spte, like this scenario:

   CPU 0  CPU 1
page fault:add a new memslot
read memslot and detecting its a mmio access
   update memslots
   update generation number
read generation number
cache the gpa and current gen number into spte

So, if guest accesses the gpa later, it will generate a incorrect
mmio exit

This patch fixes it by updating the generation number after
synchronize_srcu_expedited() that makes sure the generation
number updated only if memslots update is finished

Cc: sta...@vger.kernel.org
Cc: David Matlack 
Signed-off-by: Xiao Guangrong 
---
 virt/kvm/kvm_main.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 33712fb..ca3cdac 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -96,7 +96,7 @@ static void hardware_disable_all(void);
 
 static void kvm_io_bus_destroy(struct kvm_io_bus *bus);
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new, u64 last_generation);
+   struct kvm_memory_slot *new);
 
 static void kvm_release_pfn_dirty(pfn_t pfn);
 static void mark_page_dirty_in_slot(struct kvm *kvm,
@@ -687,8 +687,7 @@ static void sort_memslots(struct kvm_memslots *slots)
 }
 
 static void update_memslots(struct kvm_memslots *slots,
-   struct kvm_memory_slot *new,
-   u64 last_generation)
+   struct kvm_memory_slot *new)
 {
if (new) {
int id = new->id;
@@ -699,8 +698,6 @@ static void update_memslots(struct kvm_memslots *slots,
if (new->npages != npages)
sort_memslots(slots);
}
-
-   slots->generation = last_generation + 1;
 }
 
 static int check_memory_region_flags(struct kvm_userspace_memory_region *mem)
@@ -722,9 +719,10 @@ static struct kvm_memslots *install_new_memslots(struct 
kvm *kvm,
 {
struct kvm_memslots *old_memslots = kvm->memslots;
 
-   update_memslots(slots, new, kvm->memslots->generation);
+   update_memslots(slots, new);
rcu_assign_pointer(kvm->memslots, slots);
synchronize_srcu_expedited(&kvm->srcu);
+   slots->generation++;
 
kvm_arch_memslots_updated(kvm);
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html