Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-24 Thread Andres Lagar-Cavilla
On Wed, Sep 24, 2014 at 1:27 AM, Paolo Bonzini wrote:
> On 24/09/2014 09:20, Wanpeng Li wrote:
>>
>> This trace point still dumps duplicated messages for the same gfn in the
>> for loop.
>
> Yes, but the gfn argument lets you take it out again.
>
> Note that having the duplicated trace would make sense if you included
> the accessed bit for each spte, instead of just the "young" variable.

FWIW the new arrangement in kvm.git/queue LGTM

Thanks
Andres

>
> Paolo



-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com


Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-24 Thread Paolo Bonzini
On 24/09/2014 09:20, Wanpeng Li wrote:
> 
> This trace point still dumps duplicated messages for the same gfn in the
> for loop.

Yes, but the gfn argument lets you take it out again.

Note that having the duplicated trace would make sense if you included
the accessed bit for each spte, instead of just the "young" variable.

Paolo
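
To make the distinction concrete, here is a small stand-alone user-space model (not kernel code: ACCESSED_BIT and the per-spte line are purely illustrative) of why the aggregate "young" value repeats the same message for every spte of a gfn, whereas logging each spte's own accessed bit would make the repeated entries informative:

/* Model of one pass over an rmap chain whose sptes all map the same gfn. */
#include <stdio.h>
#include <stdint.h>

#define ACCESSED_BIT (1ULL << 8)	/* stand-in for shadow_accessed_mask */

int main(void)
{
	/* Three hypothetical sptes in one rmap chain, all mapping gfn 0x1234. */
	uint64_t sptes[] = { 0x100 | ACCESSED_BIT, 0x200, 0x300 | ACCESSED_BIT };
	uint64_t gfn = 0x1234;
	int young = 0;

	for (int i = 0; i < 3; i++) {
		int accessed = !!(sptes[i] & ACCESSED_BIT);

		if (accessed) {
			young = 1;
			sptes[i] &= ~ACCESSED_BIT;	/* age this spte */
		}
		/* Aggregate trace: the same (gfn, young) pair per spte. */
		printf("kvm_age_page: gfn=%#llx young=%d\n",
		       (unsigned long long)gfn, young);
		/* Per-spte variant Paolo alludes to (hypothetical event). */
		printf("  per-spte:   gfn=%#llx index=%d accessed=%d\n",
		       (unsigned long long)gfn, i, accessed);
	}
	return 0;
}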


Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-24 Thread Wanpeng Li

Hi Paolo,
On 9/24/14 3:04 PM, Paolo Bonzini wrote:

On 24/09/2014 04:27, Wanpeng Li wrote:

Hi Andres,
On Mon, Sep 22, 2014 at 02:54:42PM -0700, Andres Lagar-Cavilla wrote:

1. We were calling clear_flush_young_notify in unmap_one, but we are
within an mmu notifier invalidate range scope. The spte exists no more
(due to range_start) and the accessed bit info has already been
propagated (due to kvm_pfn_set_accessed). Simply call
clear_flush_young.

2. We clear_flush_young on a primary MMU PMD, but this may be mapped
as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
This required expanding the interface of the clear_flush_young mmu
notifier, so a lot of code has been trivially touched.

3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
the access bit by blowing the spte. This requires proper synchronizing
with MMU notifier consumers, like every other removal of spte's does.


[...]

---
+	BUG_ON(!shadow_accessed_mask);
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
 	     sptep = rmap_get_next(&iter)) {
+		struct kvm_mmu_page *sp;
+		gfn_t gfn;
 		BUG_ON(!is_shadow_present_pte(*sptep));
+		/* From spte to gfn. */
+		sp = page_header(__pa(sptep));
+		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
 
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			clear_bit((ffs(shadow_accessed_mask) - 1),
 				  (unsigned long *)sptep);
 		}
+		trace_kvm_age_page(gfn, slot, young);

IIUC, all the rmapps in this for loop are against the same gfn, which
results in the above trace point dumping duplicated messages.

You're right; Andres's patch "[PATCH] kvm/x86/mmu: Pass gfn and level to
rmapp callback" helps avoid that.


From Andres's patch "[PATCH] kvm/x86/mmu: Pass gfn and level to rmapp callback":


@@ -1410,25 +1421,20 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
 
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
 	     sptep = rmap_get_next(&iter)) {
-		struct kvm_mmu_page *sp;
-		gfn_t gfn;
 		BUG_ON(!is_shadow_present_pte(*sptep));
-		/* From spte to gfn. */
-		sp = page_header(__pa(sptep));
-		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-
 		if (*sptep & shadow_accessed_mask) {
 			young = 1;
 			clear_bit((ffs(shadow_accessed_mask) - 1),
 				  (unsigned long *)sptep);
 		}
-		trace_kvm_age_page(gfn, slot, young);
+		trace_kvm_age_page(gfn, level, slot, young);
 	}
 	return young;
 }


This trace point still dumps duplicated messages for the same gfn in the
for loop.


Regards,
Wanpeng Li
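
The repetition comes from the rmap layout itself: each rmap head corresponds to one (gfn, level) in a memslot and chains every spte that maps that gfn, so a per-iteration trace in kvm_age_rmapp() necessarily emits the same gfn once per mapping. A minimal user-space model (illustrative types only, not the kernel's rmap implementation):

#include <stdio.h>
#include <stdint.h>

struct spte_node {
	uint64_t spte;
	struct spte_node *next;
};

struct rmap_head {
	uint64_t gfn;			/* the gfn every chained spte maps */
	struct spte_node *first;
};

static void age_rmap(const struct rmap_head *head, int level)
{
	for (const struct spte_node *n = head->first; n; n = n->next)
		printf("kvm_age_page: gfn=%#llx level=%d\n",
		       (unsigned long long)head->gfn, level);
}

int main(void)
{
	struct spte_node c = { 0x300, NULL };
	struct spte_node b = { 0x200, &c };
	struct spte_node a = { 0x100, &b };
	struct rmap_head head = { 0x1234, &a };

	age_rmap(&head, 1);		/* prints the same gfn three times */
	return 0;
}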





Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-24 Thread Paolo Bonzini
On 24/09/2014 04:27, Wanpeng Li wrote:
> Hi Andres,
> On Mon, Sep 22, 2014 at 02:54:42PM -0700, Andres Lagar-Cavilla wrote:
>> 1. We were calling clear_flush_young_notify in unmap_one, but we are
>> within an mmu notifier invalidate range scope. The spte exists no more
>> (due to range_start) and the accessed bit info has already been
>> propagated (due to kvm_pfn_set_accessed). Simply call
>> clear_flush_young.
>>
>> 2. We clear_flush_young on a primary MMU PMD, but this may be mapped
>> as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
>> This required expanding the interface of the clear_flush_young mmu
>> notifier, so a lot of code has been trivially touched.
>>
>> 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
>> the access bit by blowing the spte. This requires proper synchronizing
>> with MMU notifier consumers, like every other removal of spte's does.
>>
> [...]
>> ---
>> +	BUG_ON(!shadow_accessed_mask);
>>  
>>  	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
>>  	     sptep = rmap_get_next(&iter)) {
>> +		struct kvm_mmu_page *sp;
>> +		gfn_t gfn;
>>  		BUG_ON(!is_shadow_present_pte(*sptep));
>> +		/* From spte to gfn. */
>> +		sp = page_header(__pa(sptep));
>> +		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>>  
>>  		if (*sptep & shadow_accessed_mask) {
>>  			young = 1;
>>  			clear_bit((ffs(shadow_accessed_mask) - 1),
>>  				  (unsigned long *)sptep);
>>  		}
>> +		trace_kvm_age_page(gfn, slot, young);
> 
> IIUC, all the rmapps in this for loop are against the same gfn, which
> results in the above trace point dumping duplicated messages.

You're right; Andres's patch "[PATCH] kvm/x86/mmu: Pass gfn and level to
rmapp callback" helps avoid that.

Paolo


Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Wanpeng Li
Hi Andres,
On Mon, Sep 22, 2014 at 02:54:42PM -0700, Andres Lagar-Cavilla wrote:
>1. We were calling clear_flush_young_notify in unmap_one, but we are
>within an mmu notifier invalidate range scope. The spte exists no more
>(due to range_start) and the accessed bit info has already been
>propagated (due to kvm_pfn_set_accessed). Simply call
>clear_flush_young.
>
>2. We clear_flush_young on a primary MMU PMD, but this may be mapped
>as a collection of PTEs by the secondary MMU (e.g. during log-dirty).
>This required expanding the interface of the clear_flush_young mmu
>notifier, so a lot of code has been trivially touched.
>
>3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate
>the access bit by blowing the spte. This requires proper synchronizing
>with MMU notifier consumers, like every other removal of spte's does.
>
[...]
>---
>+	BUG_ON(!shadow_accessed_mask);
> 
> 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
> 	     sptep = rmap_get_next(&iter)) {
>+		struct kvm_mmu_page *sp;
>+		gfn_t gfn;
> 		BUG_ON(!is_shadow_present_pte(*sptep));
>+		/* From spte to gfn. */
>+		sp = page_header(__pa(sptep));
>+		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
> 
> 		if (*sptep & shadow_accessed_mask) {
> 			young = 1;
> 			clear_bit((ffs(shadow_accessed_mask) - 1),
> 				  (unsigned long *)sptep);
> 		}
>+		trace_kvm_age_page(gfn, slot, young);

IIUC, all the rmapps in this for loop are against the same gfn, which
results in the above trace point dumping duplicated messages.

Regards,
Wanpeng Li 
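
For context, the "From spte to gfn." step quoted above recovers the gfn by locating the shadow page that holds the spte and scaling the spte's index by how many gfns one entry covers at that page's level. A simplified stand-alone model of the arithmetic (field names are illustrative and only the direct-mapped case is modeled; the kernel's kvm_mmu_page_get_gfn() also has to handle non-direct shadow pages):

#include <stdio.h>
#include <stdint.h>

#define PTES_PER_PAGE_BITS 9	/* 512 entries per page-table page (x86-64) */

struct shadow_page {
	uint64_t base_gfn;	/* gfn covered by entry 0 */
	int level;		/* 1 = 4K PTEs, 2 = 2M, 3 = 1G */
};

static uint64_t spte_index_to_gfn(const struct shadow_page *sp, int index)
{
	return sp->base_gfn +
	       ((uint64_t)index << ((sp->level - 1) * PTES_PER_PAGE_BITS));
}

int main(void)
{
	struct shadow_page pte_page = { 0x1000, 1 };
	struct shadow_page pmd_page = { 0x1000, 2 };

	/* Entry 3 of a last-level page covers gfn 0x1003... */
	printf("level 1, index 3 -> gfn %#llx\n",
	       (unsigned long long)spte_index_to_gfn(&pte_page, 3));
	/* ...while entry 3 of a level-2 page starts 3 * 512 gfns in. */
	printf("level 2, index 3 -> gfn %#llx\n",
	       (unsigned long long)spte_index_to_gfn(&pmd_page, 3));
	return 0;
}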



Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Paolo Bonzini
On 23/09/2014 19:04, Andres Lagar-Cavilla wrote:
> I'm not sure. The addition is not always by PAGE_SIZE, since it
> depends on the current level we are iterating at in the outer
> kvm_handle_hva_range(). IOW, it could be PMD_SIZE or even PUD_SIZE, and
> is is_large_pte() enough to tell?
> 
> This is probably worth a general fix, I can see all the callbacks
> benefiting from knowing the gfn (passed down by
> kvm_handle_hva_range()) without any additional computation, and adding
> that to a tracing call if they don't already.
> 
> Even passing the level down to the callback would help by cutting down
> to one arithmetic op (subtract rmapp from slot rmap base pointer for
> that level)

You're right.  Let's apply this patch and work on that as a follow-up.

Paolo


Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Andres Lagar-Cavilla
On Tue, Sep 23, 2014 at 12:49 AM, Paolo Bonzini wrote:
> On 22/09/2014 23:54, Andres Lagar-Cavilla wrote:
>> @@ -1406,32 +1406,24 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>>  	struct rmap_iterator uninitialized_var(iter);
>>  	int young = 0;
>>  
>> -	/*
>> -	 * In case of absence of EPT Access and Dirty Bits supports,
>> -	 * emulate the accessed bit for EPT, by checking if this page has
>> -	 * an EPT mapping, and clearing it if it does. On the next access,
>> -	 * a new EPT mapping will be established.
>> -	 * This has some overhead, but not as much as the cost of swapping
>> -	 * out actively used pages or breaking up actively used hugepages.
>> -	 */
>> -	if (!shadow_accessed_mask) {
>> -		young = kvm_unmap_rmapp(kvm, rmapp, slot, data);
>> -		goto out;
>> -	}
>> +	BUG_ON(!shadow_accessed_mask);
>>  
>>  	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
>>  	     sptep = rmap_get_next(&iter)) {
>> +		struct kvm_mmu_page *sp;
>> +		gfn_t gfn;
>>  		BUG_ON(!is_shadow_present_pte(*sptep));
>> +		/* From spte to gfn. */
>> +		sp = page_header(__pa(sptep));
>> +		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>>  
>>  		if (*sptep & shadow_accessed_mask) {
>>  			young = 1;
>>  			clear_bit((ffs(shadow_accessed_mask) - 1),
>>  				  (unsigned long *)sptep);
>>  		}
>> +		trace_kvm_age_page(gfn, slot, young);
>
> Yesterday I couldn't think of a way to avoid the
> page_header/kvm_mmu_page_get_gfn on every iteration, but it's actually
> not hard.  Instead of passing hva as datum, you can pass (unsigned long)
> start.  Then you can add PAGE_SIZE to it at the end of every call to
> kvm_age_rmapp, and keep the old tracing logic.

I'm not sure. The addition is not always by PAGE_SIZE, since it
depends on the current level we are iterating at in the outer
kvm_handle_hva_range(). IOW, it could be PMD_SIZE or even PUD_SIZE, and
is is_large_pte() enough to tell?

This is probably worth a general fix, I can see all the callbacks
benefiting from knowing the gfn (passed down by
kvm_handle_hva_range()) without any additional computation, and adding
that to a tracing call if they don't already.

Even passing the level down to the callback would help by cutting down
to one arithmetic op (subtract rmapp from slot rmap base pointer for
that level)

Andres
>
>
> Paolo



-- 
Andres Lagar-Cavilla | Google Kernel Team | andre...@google.com
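
Andres's objection above is about the stride: one rmap entry corresponds to PAGE_SIZE of hva only at the 4K level, and to PMD_SIZE or PUD_SIZE at the large-page levels, so a fixed PAGE_SIZE bump would drift for huge mappings. A small stand-alone sketch of the arithmetic (usual x86-64 sizes; this models only the stride, not kvm_handle_hva_range() itself):

#include <stdio.h>

#define PAGE_SIZE (4UL << 10)
#define PMD_SIZE  (2UL << 20)
#define PUD_SIZE  (1UL << 30)

int main(void)
{
	unsigned long stride[] = { PAGE_SIZE, PMD_SIZE, PUD_SIZE };
	unsigned long hva = 0x7f0000000000UL;

	for (int level = 1; level <= 3; level++) {
		/* One rmap entry at this level advances the hva this far. */
		printf("level %d: next hva %#lx (step %#lx)\n",
		       level, hva + stride[level - 1], stride[level - 1]);
	}
	return 0;
}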


Re: [PATCH v4] kvm: Fix page ageing bugs

2014-09-23 Thread Paolo Bonzini
On 22/09/2014 23:54, Andres Lagar-Cavilla wrote:
> @@ -1406,32 +1406,24 @@ static int kvm_age_rmapp(struct kvm *kvm, unsigned long *rmapp,
>  	struct rmap_iterator uninitialized_var(iter);
>  	int young = 0;
>  
> -	/*
> -	 * In case of absence of EPT Access and Dirty Bits supports,
> -	 * emulate the accessed bit for EPT, by checking if this page has
> -	 * an EPT mapping, and clearing it if it does. On the next access,
> -	 * a new EPT mapping will be established.
> -	 * This has some overhead, but not as much as the cost of swapping
> -	 * out actively used pages or breaking up actively used hugepages.
> -	 */
> -	if (!shadow_accessed_mask) {
> -		young = kvm_unmap_rmapp(kvm, rmapp, slot, data);
> -		goto out;
> -	}
> +	BUG_ON(!shadow_accessed_mask);
>  
>  	for (sptep = rmap_get_first(*rmapp, &iter); sptep;
>  	     sptep = rmap_get_next(&iter)) {
> +		struct kvm_mmu_page *sp;
> +		gfn_t gfn;
>  		BUG_ON(!is_shadow_present_pte(*sptep));
> +		/* From spte to gfn. */
> +		sp = page_header(__pa(sptep));
> +		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
>  
>  		if (*sptep & shadow_accessed_mask) {
>  			young = 1;
>  			clear_bit((ffs(shadow_accessed_mask) - 1),
>  				  (unsigned long *)sptep);
>  		}
> +		trace_kvm_age_page(gfn, slot, young);

Yesterday I couldn't think of a way to avoid the
page_header/kvm_mmu_page_get_gfn on every iteration, but it's actually
not hard.  Instead of passing hva as datum, you can pass (unsigned long)
start.  Then you can add PAGE_SIZE to it at the end of every call to
kvm_age_rmapp, and keep the old tracing logic.


Paolo
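
A stand-alone sketch of the bookkeeping Paolo describes: thread a cursor hva through the walk as the datum, derive the gfn from the memslot offset for tracing, and bump the cursor by PAGE_SIZE after each call. All names here are illustrative, and, as Andres points out in his reply, the fixed PAGE_SIZE step is only correct for 4K mappings:

#include <stdio.h>

#define PAGE_SIZE  4096UL
#define PAGE_SHIFT 12

/* A memslot maps gfns starting at base_gfn to host virtual addresses
 * starting at userspace_addr, so a gfn follows from an hva by offset
 * arithmetic instead of walking back from the spte. */
static unsigned long hva_to_gfn(unsigned long hva,
				unsigned long userspace_addr,
				unsigned long base_gfn)
{
	return base_gfn + ((hva - userspace_addr) >> PAGE_SHIFT);
}

static int age_one(unsigned long *cursor_hva,
		   unsigned long userspace_addr, unsigned long base_gfn)
{
	int young = 0;

	/* ...walk the rmap chain for this entry and update young... */
	printf("kvm_age_page: gfn=%#lx young=%d\n",
	       hva_to_gfn(*cursor_hva, userspace_addr, base_gfn), young);
	*cursor_hva += PAGE_SIZE;	/* advance for the next call */
	return young;
}

int main(void)
{
	unsigned long userspace_addr = 0x700000000000UL;
	unsigned long base_gfn = 0x1000;
	unsigned long hva = userspace_addr;

	for (int i = 0; i < 3; i++)
		age_one(&hva, userspace_addr, base_gfn);
	return 0;
}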