Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-05 Thread Avi Kivity

Marcelo Tosatti wrote:

Here's one way to make this work:

 - add a hash of global pagetables, indexed by virtual address instead  
of the pagetable's gfn

 - invlpg checks this hash in addition to the recursive walk

We'd need to make the virtual address part of sp->role to avoid needing  
to link the same page multiple times in the virtual address hash.



Humpf, yes. It seems its too expensive/complex to handle this, for such
small gain (~= 2% on AIM7 with RHEL3 guest).

Are you okay with just disabling the global pages optimization?
  


Definitely to plug the hole; and probably for later as well, unless 
people cry out due to regressions.


Please send it in two patches:  one a trivial one to disable global page 
detection which can be sent to -stable as well, and a follow on which 
rips out the global page machinery until (and if) we decide to 
reimplement it correctly.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-05 Thread Marcelo Tosatti
On Sun, Apr 05, 2009 at 11:41:39AM +0300, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>> The problem is when the page is unreachable due to a higher level path
>> being unlinked. Say:
>>
>> level 4 -> level 3 . level 2 -> level 1 (global unsync)
>>
>> The dot there means level 3 is not linked to level 2, so invlpg can't
>> reach the global unsync at level 1.
>>
>> kvm_mmu_get_page does sync pages when it finds them, so the code is
>> already safe for the "linking a page which contains global ptes" case
>> you mention above.
>>   
>
> The recursive resync ignores global pages (or it would hit them on cr3  
> switch too).
>
> But we have a bigger problem, invlpg can miss even if nothing is unlinked:
>
>   access address x through global pte
>   -> pte instantiated into spte
>   switch to cr3 where x is not mapped, or mapped differently
>   write to pte
>   -> no fault since pte is unsync
>   invlpg x
>   -> no way this can hit the spte
>   switch cr3 back
>   access address x
>   -> use old spte
>
> Here's one way to make this work:
>
>  - add a hash of global pagetables, indexed by virtual address instead  
> of the pagetable's gfn
>  - invlpg checks this hash in addition to the recursive walk
>
> We'd need to make the virtual address part of sp->role to avoid needing  
> to link the same page multiple times in the virtual address hash.

Humpf, yes. It seems its too expensive/complex to handle this, for such
small gain (~= 2% on AIM7 with RHEL3 guest).

Are you okay with just disabling the global pages optimization?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-05 Thread Avi Kivity

Marcelo Tosatti wrote:

The problem is when the page is unreachable due to a higher level path
being unlinked. Say:

level 4 -> level 3 . level 2 -> level 1 (global unsync)

The dot there means level 3 is not linked to level 2, so invlpg can't
reach the global unsync at level 1.

kvm_mmu_get_page does sync pages when it finds them, so the code is
already safe for the "linking a page which contains global ptes" case
you mention above.
  


The recursive resync ignores global pages (or it would hit them on cr3 
switch too).


But we have a bigger problem, invlpg can miss even if nothing is unlinked:

  access address x through global pte
  -> pte instantiated into spte
  switch to cr3 where x is not mapped, or mapped differently
  write to pte
  -> no fault since pte is unsync
  invlpg x
  -> no way this can hit the spte
  switch cr3 back
  access address x
  -> use old spte

Here's one way to make this work:

 - add a hash of global pagetables, indexed by virtual address instead 
of the pagetable's gfn

 - invlpg checks this hash in addition to the recursive walk

We'd need to make the virtual address part of sp->role to avoid needing 
to link the same page multiple times in the virtual address hash.


--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-04 Thread Aurelien Jarno
On Fri, Apr 03, 2009 at 06:45:48PM -0300, Marcelo Tosatti wrote:
> On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
> >> index 2ea8262..48169d7 100644
> >> --- a/arch/x86/kvm/x86.c
> >> +++ b/arch/x86/kvm/x86.c
> >> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, 
> >> struct kvm_run *kvm_run)
> >>kvm_write_guest_time(vcpu);
> >>if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
> >>kvm_mmu_sync_roots(vcpu);
> >> +  if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
> >> &vcpu->requests))
> >> +  kvm_mmu_sync_global(vcpu);
> >>if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
> >>kvm_x86_ops->tlb_flush(vcpu);
> >>if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
> >
> > Windows will (I think) write a PDE on every context switch, so this  
> > effectively disables global unsync for that guest.
> >
> > What about recursively syncing the newly linked page in FNAME(fetch)()?  
> > If the page isn't global, this becomes a no-op, so no new overhead.  The  
> > only question is the expense when linking a populated top-level page,  
> > especially in long mode.
> 
> How about this?
> 
> KVM: MMU: sync global pages on fetch()
> 
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
> 
> So sync global pages in fetch().
> 
> Signed-off-by: Marcelo Tosatti 

I have tried this patch, and unfortunately it does not solve the
original problem, while the previous one did.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-04 Thread Marcelo Tosatti
On Sat, Apr 04, 2009 at 01:37:39PM +0300, Avi Kivity wrote:
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 09782a9..728be72 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t 
>> addr,
>>  break;
>>  }
>>  -   if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
>> +if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
>> +if (level-1 == PT_PAGE_TABLE_LEVEL) {
>> +shadow_page = page_header(__pa(sptep));
>> +if (shadow_page->unsync && shadow_page->global)
>> +kvm_sync_page(vcpu, shadow_page);
>> +}
>>  continue;
>> +}
>>  if (is_large_pte(*sptep)) {
>>  rmap_remove(vcpu->kvm, sptep);
>>   
>
> But here the shadow page is already linked?  Isn't the root cause that  
> an invlpg was called when the page wasn't linked, so it wasn't seen by  
> invlpg?
>
> So I thought the best place would be in fetch(), after  
> kvm_mmu_get_page().  If we're linking a page which contains global ptes,  
> they might be unsynced due to invlpgs that we've missed.
>
> Or am I missing something about the root cause?

The problem is when the page is unreachable due to a higher level path
being unlinked. Say:

level 4 -> level 3 . level 2 -> level 1 (global unsync)

The dot there means level 3 is not linked to level 2, so invlpg can't
reach the global unsync at level 1.

kvm_mmu_get_page does sync pages when it finds them, so the code is
already safe for the "linking a page which contains global ptes" case
you mention above.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-04 Thread Avi Kivity

Marcelo Tosatti wrote:

On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
  

index 2ea8262..48169d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
kvm_write_guest_time(vcpu);
if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
kvm_mmu_sync_roots(vcpu);
+   if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
&vcpu->requests))
+   kvm_mmu_sync_global(vcpu);
if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
kvm_x86_ops->tlb_flush(vcpu);
if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
  
Windows will (I think) write a PDE on every context switch, so this  
effectively disables global unsync for that guest.


What about recursively syncing the newly linked page in FNAME(fetch)()?  
If the page isn't global, this becomes a no-op, so no new overhead.  The  
only question is the expense when linking a populated top-level page,  
especially in long mode.



How about this?

KVM: MMU: sync global pages on fetch()

If an unsync global page becomes unreachable via the shadow tree, which
can happen if one its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.

So sync global pages in fetch().

Signed-off-by: Marcelo Tosatti 

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 09782a9..728be72 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
break;
}
 
-		if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))

+   if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+   if (level-1 == PT_PAGE_TABLE_LEVEL) {
+   shadow_page = page_header(__pa(sptep));
+   if (shadow_page->unsync && shadow_page->global)
+   kvm_sync_page(vcpu, shadow_page);
+   }
continue;
+   }
 
 		if (is_large_pte(*sptep)) {

rmap_remove(vcpu->kvm, sptep);
  


But here the shadow page is already linked?  Isn't the root cause that 
an invlpg was called when the page wasn't linked, so it wasn't seen by 
invlpg?


So I thought the best place would be in fetch(), after 
kvm_mmu_get_page().  If we're linking a page which contains global ptes, 
they might be unsynced due to invlpgs that we've missed.


Or am I missing something about the root cause?

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-04-03 Thread Marcelo Tosatti
On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
>> index 2ea8262..48169d7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, 
>> struct kvm_run *kvm_run)
>>  kvm_write_guest_time(vcpu);
>>  if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>>  kvm_mmu_sync_roots(vcpu);
>> +if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
>> &vcpu->requests))
>> +kvm_mmu_sync_global(vcpu);
>>  if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>>  kvm_x86_ops->tlb_flush(vcpu);
>>  if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
>
> Windows will (I think) write a PDE on every context switch, so this  
> effectively disables global unsync for that guest.
>
> What about recursively syncing the newly linked page in FNAME(fetch)()?  
> If the page isn't global, this becomes a no-op, so no new overhead.  The  
> only question is the expense when linking a populated top-level page,  
> especially in long mode.

How about this?

KVM: MMU: sync global pages on fetch()

If an unsync global page becomes unreachable via the shadow tree, which
can happen if one its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.

So sync global pages in fetch().

Signed-off-by: Marcelo Tosatti 

diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
index 09782a9..728be72 100644
--- a/arch/x86/kvm/paging_tmpl.h
+++ b/arch/x86/kvm/paging_tmpl.h
@@ -308,8 +308,14 @@ static u64 *FNAME(fetch)(struct kvm_vcpu *vcpu, gva_t addr,
break;
}
 
-   if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep))
+   if (is_shadow_present_pte(*sptep) && !is_large_pte(*sptep)) {
+   if (level-1 == PT_PAGE_TABLE_LEVEL) {
+   shadow_page = page_header(__pa(sptep));
+   if (shadow_page->unsync && shadow_page->global)
+   kvm_sync_page(vcpu, shadow_page);
+   }
continue;
+   }
 
if (is_large_pte(*sptep)) {
rmap_remove(vcpu->kvm, sptep);
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-24 Thread Marcelo Tosatti
On Tue, Mar 24, 2009 at 11:47:33AM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> Maybe it's best to resync when relinking a global page?
>>> 
>>
>> How about this. It will shorten the unsync period of global pages,
>> unfortunately.
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 2a36f7f..bccdcc7 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
>> kvm_vcpu *vcpu,
>>  set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
>>  kvm_mmu_mark_parents_unsync(vcpu, sp);
>>  }
>> +if (role.level != PT_PAGE_TABLE_LEVEL &&
>> +!list_empty(&vcpu->kvm->arch.oos_global_pages))
>> +set_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
>> &vcpu->requests);
>> +
>>  pgprintk("%s: found\n", __func__);
>>  return sp;
>>  }
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index 2ea8262..48169d7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, 
>> struct kvm_run *kvm_run)
>>  kvm_write_guest_time(vcpu);
>>  if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>>  kvm_mmu_sync_roots(vcpu);
>> +if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
>> &vcpu->requests))
>> +kvm_mmu_sync_global(vcpu);
>>  if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>>  kvm_x86_ops->tlb_flush(vcpu);
>>  if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS
>
> Windows will (I think) write a PDE on every context switch, so this  
> effectively disables global unsync for that guest.
>
> What about recursively syncing the newly linked page in FNAME(fetch)()?  
> If the page isn't global, this becomes a no-op, so no new overhead.  The  
> only question is the expense when linking a populated top-level page,  
> especially in long mode.

Yes, I started doing that but it touches the nice fastpath in fetch().
I'll see if I can come up with something and with numbers.

Thanks
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-24 Thread Aurelien Jarno
On Mon, Mar 23, 2009 at 02:27:25PM -0300, Marcelo Tosatti wrote:
> On Sun, Mar 22, 2009 at 11:35:00AM +0200, Avi Kivity wrote:
> > Good catch, indeed.  But is it sufficient?  We could unlink a page  
> > through other means, for example by the guest zapping a page directory  
> > entry.  
> 
> Yep.
> 
> > Maybe it's best to resync when relinking a global page?
> 
> How about this. It will shorten the unsync period of global pages,
> unfortunately.

JFYI, it still fixes the problem seen with FreeBSD.

> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2a36f7f..bccdcc7 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
> kvm_vcpu *vcpu,
>   set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
>   kvm_mmu_mark_parents_unsync(vcpu, sp);
>   }
> + if (role.level != PT_PAGE_TABLE_LEVEL &&
> + !list_empty(&vcpu->kvm->arch.oos_global_pages))
> + set_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
> &vcpu->requests);
> +
>   pgprintk("%s: found\n", __func__);
>   return sp;
>   }
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 2ea8262..48169d7 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, 
> struct kvm_run *kvm_run)
>   kvm_write_guest_time(vcpu);
>   if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
>   kvm_mmu_sync_roots(vcpu);
> + if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
> &vcpu->requests))
> + kvm_mmu_sync_global(vcpu);
>   if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
>   kvm_x86_ops->tlb_flush(vcpu);
>   if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS,
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index 11eb702..8efd6e3 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -37,7 +37,8 @@
>  #define KVM_REQ_PENDING_TIMER  5
>  #define KVM_REQ_UNHALT 6
>  #define KVM_REQ_MMU_SYNC   7
> -#define KVM_REQ_KVMCLOCK_UPDATE8
> +#define KVM_REQ_MMU_GLOBAL_SYNC8   
> +#define KVM_REQ_KVMCLOCK_UPDATE9
>  
>  #define KVM_USERSPACE_IRQ_SOURCE_ID  0
>  
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-24 Thread Avi Kivity

Marcelo Tosatti wrote:

Maybe it's best to resync when relinking a global page?



How about this. It will shorten the unsync period of global pages,
unfortunately.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a36f7f..bccdcc7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
kvm_mmu_mark_parents_unsync(vcpu, sp);
}
+   if (role.level != PT_PAGE_TABLE_LEVEL &&
+   !list_empty(&vcpu->kvm->arch.oos_global_pages))
+   set_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
&vcpu->requests);
+
pgprintk("%s: found\n", __func__);
return sp;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ea8262..48169d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
kvm_write_guest_time(vcpu);
if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
kvm_mmu_sync_roots(vcpu);
+   if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
&vcpu->requests))
+   kvm_mmu_sync_global(vcpu);
if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
kvm_x86_ops->tlb_flush(vcpu);
if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS


Windows will (I think) write a PDE on every context switch, so this 
effectively disables global unsync for that guest.


What about recursively syncing the newly linked page in FNAME(fetch)()? 
If the page isn't global, this becomes a no-op, so no new overhead.  The 
only question is the expense when linking a populated top-level page, 
especially in long mode.


--
Do not meddle in the internals of kernels, for they are subtle and quick to 
panic.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-23 Thread Marcelo Tosatti
On Sun, Mar 22, 2009 at 11:35:00AM +0200, Avi Kivity wrote:
> Good catch, indeed.  But is it sufficient?  We could unlink a page  
> through other means, for example by the guest zapping a page directory  
> entry.  

Yep.

> Maybe it's best to resync when relinking a global page?

How about this. It will shorten the unsync period of global pages,
unfortunately.

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 2a36f7f..bccdcc7 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -1238,6 +1238,10 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct 
kvm_vcpu *vcpu,
set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
kvm_mmu_mark_parents_unsync(vcpu, sp);
}
+   if (role.level != PT_PAGE_TABLE_LEVEL &&
+   !list_empty(&vcpu->kvm->arch.oos_global_pages))
+   set_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
&vcpu->requests);
+
pgprintk("%s: found\n", __func__);
return sp;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ea8262..48169d7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3109,6 +3109,8 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu, struct 
kvm_run *kvm_run)
kvm_write_guest_time(vcpu);
if (test_and_clear_bit(KVM_REQ_MMU_SYNC, &vcpu->requests))
kvm_mmu_sync_roots(vcpu);
+   if (test_and_clear_bit(KVM_REQ_MMU_GLOBAL_SYNC, 
&vcpu->requests))
+   kvm_mmu_sync_global(vcpu);
if (test_and_clear_bit(KVM_REQ_TLB_FLUSH, &vcpu->requests))
kvm_x86_ops->tlb_flush(vcpu);
if (test_and_clear_bit(KVM_REQ_REPORT_TPR_ACCESS,
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 11eb702..8efd6e3 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -37,7 +37,8 @@
 #define KVM_REQ_PENDING_TIMER  5
 #define KVM_REQ_UNHALT 6
 #define KVM_REQ_MMU_SYNC   7
-#define KVM_REQ_KVMCLOCK_UPDATE8
+#define KVM_REQ_MMU_GLOBAL_SYNC8 
+#define KVM_REQ_KVMCLOCK_UPDATE9
 
 #define KVM_USERSPACE_IRQ_SOURCE_ID0
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-22 Thread Avi Kivity

Marcelo Tosatti wrote:

On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
  

Hi,

Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
under high load (during a compilation for example) with the following 
error message:


| Fatal trap 12: page fault while in kernel mode
| fault virtual address   = 0x4
| fault code  = supervisor read, page not present
| instruction pointer = 0x20:0xc0a4fc00
| stack pointer   = 0x28:0xe66d7a70
| frame pointer   = 0x28:0xe66d7a80
| code segment= base 0x0, limit 0xf, type 0x1b
| = DPL 0, pres 1, def32 1, gran 1
| processor eflags= interrupt enabled, resume, IOPL = 0
| current process = 24037 (bash)
| trap number = 12
| panic: page fault
| Uptime: 4m7s
| Cannot dump. No dump device defined.
| Automatic reboot in 15 seconds - press a key on the console to abort



Aurelien,

Can you try this patch please.

---

KVM: MMU: do not allow unsync global pages to become unreachable

Section 5.1 of the Intel TLB doc:

"• INVLPG. This instruction takes a single operand, which is a linear
address. The instruction invalidates any TLB entries for the page number
of that linear address including those for global pages (see Section
7). INVLPG also invalidates all entries in all paging-structure caches
regardless of the linear addresses to which they correspond."

Section 7:

"The following items detail the invalidation of global TLB entries:

• An execution of INVLPG invalidates any TLB entries for the page
number of the linear address that is its operand, even if the entries
are global."

If an unsync global page becomes unreachable via the shadow tree, which
can happen if one its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.

So invalidate all unsync global entries when zapping a page. Another
possibility would be to cover global pages in the unsync bitmaps, but
that would increase overhead for mmu_sync_roots.

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1362,6 +1362,16 @@ static void kvm_mmu_unlink_parents(struc
}
 }
 
+static void mmu_invalidate_unsync_global(struct kvm *kvm)

+{
+   struct kvm_mmu_page *sp, *n;
+
+   list_for_each_entry_safe(sp, n, &kvm->arch.oos_global_pages, oos_link) {
+   kvm_mmu_page_unlink_children(kvm, sp);
+   kvm_unlink_unsync_page(kvm, sp);
+   }
+}
+
 static int mmu_zap_unsync_children(struct kvm *kvm,
   struct kvm_mmu_page *parent)
 {
@@ -1384,6 +1394,8 @@ static int mmu_zap_unsync_children(struc
kvm_mmu_pages_init(parent, &parents, &pages);
}
 
+	mmu_invalidate_unsync_global(kvm);

+
return zapped;
 }
 
  


Good catch, indeed.  But is it sufficient?  We could unlink a page 
through other means, for example by the guest zapping a page directory 
entry.  Maybe it's best to resync when relinking a global page?



--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-21 Thread Aurelien Jarno
On Fri, Mar 20, 2009 at 08:14:05PM -0300, Marcelo Tosatti wrote:
> On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> > Hi,
> > 
> > Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> > under high load (during a compilation for example) with the following 
> > error message:
> > 
> > | Fatal trap 12: page fault while in kernel mode
> > | fault virtual address   = 0x4
> > | fault code  = supervisor read, page not present
> > | instruction pointer = 0x20:0xc0a4fc00
> > | stack pointer   = 0x28:0xe66d7a70
> > | frame pointer   = 0x28:0xe66d7a80
> > | code segment= base 0x0, limit 0xf, type 0x1b
> > | = DPL 0, pres 1, def32 1, gran 1
> > | processor eflags= interrupt enabled, resume, IOPL = 0
> > | current process = 24037 (bash)
> > | trap number = 12
> > | panic: page fault
> > | Uptime: 4m7s
> > | Cannot dump. No dump device defined.
> > | Automatic reboot in 15 seconds - press a key on the console to abort
> 
> Aurelien,
> 
> Can you try this patch please.

I have just tried it, and it works. Thanks a lot for your work!

> ---
> 
> KVM: MMU: do not allow unsync global pages to become unreachable
> 
> Section 5.1 of the Intel TLB doc:
> 
> "• INVLPG. This instruction takes a single operand, which is a linear
> address. The instruction invalidates any TLB entries for the page number
> of that linear address including those for global pages (see Section
> 7). INVLPG also invalidates all entries in all paging-structure caches
> regardless of the linear addresses to which they correspond."
> 
> Section 7:
> 
> "The following items detail the invalidation of global TLB entries:
> 
> • An execution of INVLPG invalidates any TLB entries for the page
> number of the linear address that is its operand, even if the entries
> are global."
> 
> If an unsync global page becomes unreachable via the shadow tree, which
> can happen if one its parent pages is zapped, invlpg will fail to
> invalidate translations for gvas contained in such unreachable pages.
> 
> So invalidate all unsync global entries when zapping a page. Another
> possibility would be to cover global pages in the unsync bitmaps, but
> that would increase overhead for mmu_sync_roots.
> 
> Index: kvm/arch/x86/kvm/mmu.c
> ===
> --- kvm.orig/arch/x86/kvm/mmu.c
> +++ kvm/arch/x86/kvm/mmu.c
> @@ -1362,6 +1362,16 @@ static void kvm_mmu_unlink_parents(struc
>   }
>  }
>  
> +static void mmu_invalidate_unsync_global(struct kvm *kvm)
> +{
> + struct kvm_mmu_page *sp, *n;
> +
> + list_for_each_entry_safe(sp, n, &kvm->arch.oos_global_pages, oos_link) {
> + kvm_mmu_page_unlink_children(kvm, sp);
> + kvm_unlink_unsync_page(kvm, sp);
> + }
> +}
> +
>  static int mmu_zap_unsync_children(struct kvm *kvm,
>  struct kvm_mmu_page *parent)
>  {
> @@ -1384,6 +1394,8 @@ static int mmu_zap_unsync_children(struc
>   kvm_mmu_pages_init(parent, &parents, &pages);
>   }
>  
> + mmu_invalidate_unsync_global(kvm);
> +
>   return zapped;
>  }
>  
> 
> 

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-03-20 Thread Marcelo Tosatti
On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> Hi,
> 
> Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> under high load (during a compilation for example) with the following 
> error message:
> 
> | Fatal trap 12: page fault while in kernel mode
> | fault virtual address   = 0x4
> | fault code  = supervisor read, page not present
> | instruction pointer = 0x20:0xc0a4fc00
> | stack pointer   = 0x28:0xe66d7a70
> | frame pointer   = 0x28:0xe66d7a80
> | code segment= base 0x0, limit 0xf, type 0x1b
> | = DPL 0, pres 1, def32 1, gran 1
> | processor eflags= interrupt enabled, resume, IOPL = 0
> | current process = 24037 (bash)
> | trap number = 12
> | panic: page fault
> | Uptime: 4m7s
> | Cannot dump. No dump device defined.
> | Automatic reboot in 15 seconds - press a key on the console to abort

Aurelien,

Can you try this patch please.

---

KVM: MMU: do not allow unsync global pages to become unreachable

Section 5.1 of the Intel TLB doc:

"• INVLPG. This instruction takes a single operand, which is a linear
address. The instruction invalidates any TLB entries for the page number
of that linear address including those for global pages (see Section
7). INVLPG also invalidates all entries in all paging-structure caches
regardless of the linear addresses to which they correspond."

Section 7:

"The following items detail the invalidation of global TLB entries:

• An execution of INVLPG invalidates any TLB entries for the page
number of the linear address that is its operand, even if the entries
are global."

If an unsync global page becomes unreachable via the shadow tree, which
can happen if one its parent pages is zapped, invlpg will fail to
invalidate translations for gvas contained in such unreachable pages.

So invalidate all unsync global entries when zapping a page. Another
possibility would be to cover global pages in the unsync bitmaps, but
that would increase overhead for mmu_sync_roots.

Index: kvm/arch/x86/kvm/mmu.c
===
--- kvm.orig/arch/x86/kvm/mmu.c
+++ kvm/arch/x86/kvm/mmu.c
@@ -1362,6 +1362,16 @@ static void kvm_mmu_unlink_parents(struc
}
 }
 
+static void mmu_invalidate_unsync_global(struct kvm *kvm)
+{
+   struct kvm_mmu_page *sp, *n;
+
+   list_for_each_entry_safe(sp, n, &kvm->arch.oos_global_pages, oos_link) {
+   kvm_mmu_page_unlink_children(kvm, sp);
+   kvm_unlink_unsync_page(kvm, sp);
+   }
+}
+
 static int mmu_zap_unsync_children(struct kvm *kvm,
   struct kvm_mmu_page *parent)
 {
@@ -1384,6 +1394,8 @@ static int mmu_zap_unsync_children(struc
kvm_mmu_pages_init(parent, &parents, &pages);
}
 
+   mmu_invalidate_unsync_global(kvm);
+
return zapped;
 }
 

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-23 Thread Avi Kivity

Marcelo Tosatti wrote:

On Mon, Feb 23, 2009 at 04:59:37PM +0200, Avi Kivity wrote:
  

Marcelo Tosatti wrote:


Thanks for your fast answer and for your help for debugging.



If you confirm that FreeBSD is indeed relying on cr3 to sync global
pages, it might be better to disable the optimization. Lets hope that is
not the case.
  
  
cr3 writes explicitly do not flush global pages; otherwise what would be  
the point of global pages at all?



From the Intel TLB doc:

The processor is always free to invalidate additional entries in the TLBs
and paging-structure caches. The following are some examples:

• MOV to CR3 may invalidate TLB entries for global pages.

The reasoning was if an optimization breaks an important guest which
contains a bug that happens to not trigger on real HW due to positioning
of the stars, it is reasonable to disable that optimization.
  


This means the OS may not rely on the TLB retaining its contents.  For 
example, you can't do


 1. set pte to global+present
 2. access through pte to load tlb entry
 3. clear pte
 4. switch cr3
 5. access through same pte again, relying on tlb entry to service the 
access


So the processor may choose to ignore the global bit on some or all tlb 
entries, but software cannot assume that it does.  Typically it will 
honor the global bit since otherwise it's useless.


I don't think this is what is happening with FreeBSD.  It may be that 
spte population on invlpg is confusing the guest (though that is allowed 
as a speculative read?).  For example, the sequence:


 1. invlpg
 2. set pte to A (present+accessed)
 3. set pte to B (present+accessed)

kvm behaves as if a speculative read always happens between 2 and 3, 
which would be very rare on real hardware.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-23 Thread Marcelo Tosatti
On Mon, Feb 23, 2009 at 04:59:37PM +0200, Avi Kivity wrote:
> Marcelo Tosatti wrote:
>>> Thanks for your fast answer and for your help for debugging.
>>> 
>>
>> If you confirm that FreeBSD is indeed relying on cr3 to sync global
>> pages, it might be better to disable the optimization. Lets hope that is
>> not the case.
>>   
>
> cr3 writes explicitly do not flush global pages; otherwise what would be  
> the point of global pages at all?

>From the Intel TLB doc:

The processor is always free to invalidate additional entries in the TLBs
and paging-structure caches. The following are some examples:

• MOV to CR3 may invalidate TLB entries for global pages.

The reasoning was if an optimization breaks an important guest which
contains a bug that happens to not trigger on real HW due to positioning
of the stars, it is reasonable to disable that optimization.

> In other words, the only difference between global and non-global pages  
> is visible via cr3 writes.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-23 Thread Avi Kivity

Marcelo Tosatti wrote:

Thanks for your fast answer and for your help for debugging.



If you confirm that FreeBSD is indeed relying on cr3 to sync global
pages, it might be better to disable the optimization. Lets hope that is
not the case.
  


cr3 writes explicitly do not flush global pages; otherwise what would be 
the point of global pages at all?


In other words, the only difference between global and non-global pages 
is visible via cr3 writes.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-23 Thread Marcelo Tosatti

On Mon, Feb 23, 2009 at 03:01:15PM +0100, Aurelien Jarno wrote:
> > Maybe there is a bug in the syncing code (eg: not all global pages are
> > sync'ed when the OS requests a global sync), or FreeBSD is "relying" on
> > invlpg/cr3 write to sync global pages (remember TLB entries can be
> > invalidated internally by CPU).
> 
> As far as I understand the Intel IA32 manual, using invlpg to sync
> global pages is correct. OTOH, cr3 is not. I think that if FreeBSD is
> using cr3 to sync global pages that should also cause a problem on real
> hardware sooner or later, right?

>From my understanding of the documentation, yes. Note:

5.1 Invalidation Instructions

The processor is always free to invalidate additional entries in the
TLBs and paging-structure caches. The following are some examples:

• INVLPG may invalidate TLB entries for pages other than the one
corresponding to its linear-address operand.

• MOV to CR3 may invalidate TLB entries for global pages.
 ^^^

• On a processor supporting Hyper-Threading Technology, invalidations
performed on one logical processor may invalidate entries in the TLBs
and paging-structure caches used by other logical processors.

> I'll try to look at the kernel code.
> 
> > If you want to debug it, would suggest looping over all MMU pages in
> > mmu_sync_global, after the kvm_sync_page loop, and
> > 
> >   WARN_ON(sp->unsync && sp->global);
> > 
> > If that fails, check if the unsync and global flags mean what they are
> > supposed to.
> 
> This doesn't detect any problem, which means that all pages marked as
> global are synced correctly.
> 
> I have also tried to call kvm_mmu_sync_global() in kvm_set_cr3(), and
> as expected the guest works correctly in that case.
> 
> If I understand correctly, and if the problem is not on the FreeBSD
> side, the only remaining possible problem is if normal pages are wrongly
> marked as global, and thus should be synced with cr3, while they are 
> not. Am I right here?

Yes, this is the most likely problematic scenario.

> > Sorry for the trouble and thanks for the detailed report, will take a
> > close look at it this week.
> > 
> 
> Thanks for your fast answer and for your help for debugging.

If you confirm that FreeBSD is indeed relying on cr3 to sync global
pages, it might be better to disable the optimization. Lets hope that is
not the case.

Thanks

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-23 Thread Aurelien Jarno
On Sun, Feb 22, 2009 at 10:47:13PM -0300, Marcelo Tosatti wrote:
> On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> > Hi,
> > 
> > Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> > under high load (during a compilation for example) with the following 
> > error message:
> > 
> > | Fatal trap 12: page fault while in kernel mode
> > | fault virtual address   = 0x4
> > | fault code  = supervisor read, page not present
> > | instruction pointer = 0x20:0xc0a4fc00
> > | stack pointer   = 0x28:0xe66d7a70
> > | frame pointer   = 0x28:0xe66d7a80
> > | code segment= base 0x0, limit 0xf, type 0x1b
> > | = DPL 0, pres 1, def32 1, gran 1
> > | processor eflags= interrupt enabled, resume, IOPL = 0
> > | current process = 24037 (bash)
> > | trap number = 12
> > | panic: page fault
> > | Uptime: 4m7s
> > | Cannot dump. No dump device defined.
> > | Automatic reboot in 15 seconds - press a key on the console to abort
> >  
> > I haven't tried yet with a plain FreeBSD guest, but I also expect it to
> > crash given the kernel (version 7.1) is almost the same. A closer 
> > investigation has shown that the following commit is causing the 
> > problem:
> > 
> > | commit 6364a3918cb5c28376849e7fca3e09bd66b859f3
> > | Author: Marcelo Tosatti 
> > | Date:   Mon Dec 1 22:32:04 2008 -0200
> > | 
> > | KVM: MMU: skip global pgtables on sync due to cr3 switch
> > | 
> > | Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
> > | important for Linux 32-bit guests with PAE, where the kmap page is
> > | marked as global.
> > | 
> > | Signed-off-by: Marcelo Tosatti 
> > | Signed-off-by: Avi Kivity 
> > 
> > As expected, loading the KVM module with oos_shadow=0 workaround the
> > problem. Please note that the guest is running in 32-bit mode, does not
> > use PAE, and uses global pages. My host has an Intel Q9450 CPU, and the
> > problem appears with both a 2.6.26 and a 2.6.28 64-bit kernel.
> > 
> > Does anybody see any problem in this patch? How can I further
> > debug the problem?
> 
> Aurelien,
> 
> Maybe there is a bug in the syncing code (eg: not all global pages are
> sync'ed when the OS requests a global sync), or FreeBSD is "relying" on
> invlpg/cr3 write to sync global pages (remember TLB entries can be
> invalidated internally by CPU).

As far as I understand the Intel IA32 manual, using invlpg to sync
global pages is correct. OTOH, cr3 is not. I think that if FreeBSD is
using cr3 to sync global pages that should also cause a problem on real
hardware sooner or later, right?

I'll try to look at the kernel code.

> If you want to debug it, would suggest looping over all MMU pages in
> mmu_sync_global, after the kvm_sync_page loop, and
> 
>   WARN_ON(sp->unsync && sp->global);
> 
> If that fails, check if the unsync and global flags mean what they are
> supposed to.

This doesn't detect any problem, which means that all pages marked as
global are synced correctly.

I have also tried to call kvm_mmu_sync_global() in kvm_set_cr3(), and
as expected the guest works correctly in that case.

If I understand correctly, and if the problem is not on the FreeBSD
side, the only remaining possible problem is if normal pages are wrongly
marked as global, and thus should be synced with cr3, while they are 
not. Am I right here?

> Sorry for the trouble and thanks for the detailed report, will take a
> close look at it this week.
> 

Thanks for your fast answer and for your help for debugging.

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-22 Thread Marcelo Tosatti
On Mon, Feb 23, 2009 at 01:33:05AM +0100, Aurelien Jarno wrote:
> Hi,
> 
> Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
> under high load (during a compilation for example) with the following 
> error message:
> 
> | Fatal trap 12: page fault while in kernel mode
> | fault virtual address   = 0x4
> | fault code  = supervisor read, page not present
> | instruction pointer = 0x20:0xc0a4fc00
> | stack pointer   = 0x28:0xe66d7a70
> | frame pointer   = 0x28:0xe66d7a80
> | code segment= base 0x0, limit 0xf, type 0x1b
> | = DPL 0, pres 1, def32 1, gran 1
> | processor eflags= interrupt enabled, resume, IOPL = 0
> | current process = 24037 (bash)
> | trap number = 12
> | panic: page fault
> | Uptime: 4m7s
> | Cannot dump. No dump device defined.
> | Automatic reboot in 15 seconds - press a key on the console to abort
>  
> I haven't tried yet with a plain FreeBSD guest, but I also expect it to
> crash given the kernel (version 7.1) is almost the same. A closer 
> investigation has shown that the following commit is causing the 
> problem:
> 
> | commit 6364a3918cb5c28376849e7fca3e09bd66b859f3
> | Author: Marcelo Tosatti 
> | Date:   Mon Dec 1 22:32:04 2008 -0200
> | 
> | KVM: MMU: skip global pgtables on sync due to cr3 switch
> | 
> | Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
> | important for Linux 32-bit guests with PAE, where the kmap page is
> | marked as global.
> | 
> | Signed-off-by: Marcelo Tosatti 
> | Signed-off-by: Avi Kivity 
> 
> As expected, loading the KVM module with oos_shadow=0 workaround the
> problem. Please note that the guest is running in 32-bit mode, does not
> use PAE, and uses global pages. My host has an Intel Q9450 CPU, and the
> problem appears with both a 2.6.26 and a 2.6.28 64-bit kernel.
> 
> Does anybody see any problem in this patch? How can I further
> debug the problem?

Aurelien,

Maybe there is a bug in the syncing code (eg: not all global pages are
sync'ed when the OS requests a global sync), or FreeBSD is "relying" on
invlpg/cr3 write to sync global pages (remember TLB entries can be
invalidated internally by CPU).

If you want to debug it, would suggest looping over all MMU pages in
mmu_sync_global, after the kvm_sync_page loop, and

  WARN_ON(sp->unsync && sp->global);

If that fails, check if the unsync and global flags mean what they are
supposed to.

Sorry for the trouble and thanks for the detailed report, will take a
close look at it this week.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


cr3 OOS optimisation breaks 32-bit GNU/kFreeBSD guest

2009-02-22 Thread Aurelien Jarno
Hi,

Since kvm-81, I have noticed that GNU/kFreeBSD 32-bit guest are crashing
under high load (during a compilation for example) with the following 
error message:

| Fatal trap 12: page fault while in kernel mode
| fault virtual address   = 0x4
| fault code  = supervisor read, page not present
| instruction pointer = 0x20:0xc0a4fc00
| stack pointer   = 0x28:0xe66d7a70
| frame pointer   = 0x28:0xe66d7a80
| code segment= base 0x0, limit 0xf, type 0x1b
| = DPL 0, pres 1, def32 1, gran 1
| processor eflags= interrupt enabled, resume, IOPL = 0
| current process = 24037 (bash)
| trap number = 12
| panic: page fault
| Uptime: 4m7s
| Cannot dump. No dump device defined.
| Automatic reboot in 15 seconds - press a key on the console to abort
 
I haven't tried yet with a plain FreeBSD guest, but I also expect it to
crash given the kernel (version 7.1) is almost the same. A closer 
investigation has shown that the following commit is causing the 
problem:

| commit 6364a3918cb5c28376849e7fca3e09bd66b859f3
| Author: Marcelo Tosatti 
| Date:   Mon Dec 1 22:32:04 2008 -0200
| 
| KVM: MMU: skip global pgtables on sync due to cr3 switch
| 
| Skip syncing global pages on cr3 switch (but not on cr4/cr0). This is
| important for Linux 32-bit guests with PAE, where the kmap page is
| marked as global.
| 
| Signed-off-by: Marcelo Tosatti 
| Signed-off-by: Avi Kivity 

As expected, loading the KVM module with oos_shadow=0 workaround the
problem. Please note that the guest is running in 32-bit mode, does not
use PAE, and uses global pages. My host has an Intel Q9450 CPU, and the
problem appears with both a 2.6.26 and a 2.6.28 64-bit kernel.

Does anybody see any problem in this patch? How can I further
debug the problem?

Aurelien

-- 
Aurelien Jarno  GPG: 1024D/F1BCDB73
aurel...@aurel32.net http://www.aurel32.net
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html