RE: [PATCH 0/2] kvm/e500v2: MMU optimization

2010-09-09 Thread Liu Yu-B13201
 

 -----Original Message-----
 From: kvm-ppc-ow...@vger.kernel.org 
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
 Sent: Thursday, September 09, 2010 12:07 AM
 To: Liu Yu-B13201
 Cc: k...@vger.kernel.org; kvm-ppc@vger.kernel.org; ag...@suse.de
 Subject: Re: [PATCH 0/2] kvm/e500v2: MMU optimization
 
 On 09/08/2010 02:40 AM, Liu Yu wrote:
  The patchset aims at mapping guest TLB1 to host TLB0.
  And it includes:
  [PATCH 1/2] kvm/e500v2: Remove shadow tlb
  [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0
 
  We need patch 1 because it makes things simple and flexible.
  KVM also works with only patch 1 applied.
 
 I've always thought the best long-term optimization on these cores is
 to share in the host PID allocation (i.e. __init_new_context()). This
 way, the TID in guest mappings would not overlap the TID in host
 mappings, and guest mappings could be demand-faulted rather than
 swapped wholesale. To do that, you would need to track the host PID in
 KVM data structures, I guess in the tlbe_ref structure.
 

Hi Hollis,

The guest uses AS=1 and the host uses AS=0,
so even if the guest uses the same TID as the host, they are in different address spaces.

Then why does the guest need to care about the host TID?
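
To make the argument concrete, here is a minimal sketch of the Book E TLB matching rule it relies on. The struct and function names are hypothetical (this is not the kernel's struct tlbe), only 4K pages are modelled, and a single PID is compared where e500 actually checks PID0-PID2. An entry only hits when its translation space equals the AS of the access, so equal TIDs in AS=0 and AS=1 do not collide.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified TLB entry; not the kernel's struct tlbe. */
struct sketch_tlbe {
	bool     valid;
	uint8_t  ts;   /* translation space of the mapping: 0 or 1 */
	uint8_t  tid;  /* 0 means "matches any PID" */
	uint32_t epn;  /* effective page number, 4K pages assumed */
};

/* Returns true if the entry translates eaddr for the given AS and PID. */
static bool sketch_tlbe_matches(const struct sketch_tlbe *e,
				uint8_t as, uint8_t pid, uint32_t eaddr)
{
	if (!e->valid)
		return false;
	if (e->ts != as)                   /* different address space: never a hit */
		return false;
	if (e->tid != 0 && e->tid != pid)  /* TID must be global or equal the PID */
		return false;
	return e->epn == (eaddr >> 12);
}
```

With this rule, a guest entry with TS=1, TID=5 and a host entry with TS=0, TID=5 for the same page never match the same access.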


Thanks,
Yu

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-09 Thread Liu Yu-B13201
 

 -----Original Message-----
 From: Hollis Blanchard [mailto:hollis_blanch...@mentor.com] 
 Sent: Thursday, September 09, 2010 12:07 AM
 To: Liu Yu-B13201
 Cc: k...@vger.kernel.org; kvm-ppc@vger.kernel.org; ag...@suse.de
 Subject: Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb
 
 On 09/08/2010 02:40 AM, Liu Yu wrote:
  It is unnecessary to keep the shadow TLB.
  First, the shadow TLB keeps fixed values in its entries, which makes
  things inflexible.
  Second, removing the shadow TLB saves a lot of memory.
 
  This patch removes the shadow TLB and calculates the shadow TLB entry value
  just before we write it to hardware.
 
  We also use a new struct tlbe_ref to track the relation
  between a guest TLB entry and its page.
 
 Did you look at the performance impact?
 
 Back in the day, we did essentially the same thing on 440. However,
 rather than discard the whole TLB when context switching away from the
 host (to be demand-faulted when the guest is resumed), we found a
 noticeable performance improvement by preserving a shadow TLB across
 context switches. We only use it in the vcpu_put/vcpu_load path.
 
 Of course, our TLB was much smaller (64 entries), so the use model may
 not be the same at all (e.g. it takes longer to restore a full guest TLB
 working set, but maybe it's not really possible to use all 1024 TLB0
 entries in one timeslice anyways).
 

Yes, it's hard to restore TLB0. The previous code only restored TLB1.
But TLB1 is even smaller (13 free entries) than the 440's TLB,
so restored entries are still unlikely to be hit,
and the restoration is useless.

We also plan to use shadow PIDs to minimize TLB invalidation.
Then we won't need to invalidate the TLB when the guest is scheduled out,
and so we won't need to restore TLB entries.

But this method requires a dynamic TID in shadow TLB entries,
so we would not want to keep the shadow TLB anyway.

Thanks,
Yu



Re: [PATCH 0/2] kvm/e500v2: MMU optimization

2010-09-09 Thread Hollis Blanchard

On 09/09/2010 04:03 AM, Liu Yu-B13201 wrote:

-----Original Message-----
From: kvm-ppc-ow...@vger.kernel.org
[mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Hollis Blanchard
Sent: Thursday, September 09, 2010 12:07 AM
To: Liu Yu-B13201
Cc: k...@vger.kernel.org; kvm-ppc@vger.kernel.org; ag...@suse.de
Subject: Re: [PATCH 0/2] kvm/e500v2: MMU optimization

On 09/08/2010 02:40 AM, Liu Yu wrote:

The patchset aims at mapping guest TLB1 to host TLB0.
And it includes:
[PATCH 1/2] kvm/e500v2: Remove shadow tlb
[PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0

We need patch 1 because it makes things simple and flexible.
KVM also works with only patch 1 applied.

I've always thought the best long-term optimization on these cores is
to share in the host PID allocation (i.e. __init_new_context()). This
way, the TID in guest mappings would not overlap the TID in host
mappings, and guest mappings could be demand-faulted rather than
swapped wholesale. To do that, you would need to track the host PID in
KVM data structures, I guess in the tlbe_ref structure.

Hi Hollis,

The guest uses AS=1 and the host uses AS=0,
so even if the guest uses the same TID as the host, they are in different address spaces.

Then why does the guest need to care about the host TID?

You're absolutely right, but this makes a couple key assumptions:
1. The guest doesn't try to use AS=1. This is already false in Linux, 
because the udbg code uses an AS=1 mapping for the UART, but this can be 
configured out (with a small loss in functionality). In non-Linux guests 
the AS=0 restriction could be onerous.
2. A Book E MMU. If we participate in the host MMU context allocation, 
the guest -> host address space code could be generalized to include 
e.g. an e600 guest on an e500 host, or vice versa.


So you're right that optimization is not the right word; this would be 
more of a functionality and design improvement. (In fact, I suppose it 
could reduce performance on Book E, since AS=1 space actually *is* 
unused by the host. I think it would be worth finding out.)


Hollis Blanchard
Mentor Graphics, Embedded Systems Division





Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-09 Thread Hollis Blanchard

On 09/09/2010 04:16 AM, Liu Yu-B13201 wrote:

Yes, it's hard to restore TLB0. The previous code only restored TLB1.
But TLB1 is even smaller (13 free entries) than the 440's TLB,
so restored entries are still unlikely to be hit,
and the restoration is useless.
   
The only reason hits are unlikely in TLB1 is because you still don't 
have large page support in the host. Once you have that, you can use 
TLB1 for large guest mappings, and it will become extremely likely that 
you get hits in TLB1. This is true even if the guest wants 256MB but the 
host supports only e.g. 16MB large pages, and must split the guest 
mapping into multiple large host pages.
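
As a rough illustration of the splitting arithmetic above (a sketch only; the function name is hypothetical and sizes are assumed to be powers of two in bytes), the number of host TLB entries needed to back one guest mapping is just the ratio of the two page sizes:

```c
#include <stdint.h>

/*
 * Number of host pages (and so host TLB entries) needed to back one
 * guest mapping. Hypothetical helper; both sizes are powers of two.
 */
static uint64_t sketch_host_pages_needed(uint64_t guest_size,
					 uint64_t host_page_size)
{
	if (guest_size <= host_page_size)
		return 1;  /* one host page covers the whole guest mapping */
	return guest_size / host_page_size;
}
```

A 256MB guest mapping backed by 16MB host pages needs 16 entries; the same mapping backed by 4K host pages needs 65536.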


When will you have hugetlbfs for e500? That's going to make such a 
dramatic difference, I'm not sure it's worth investing time in 
optimizing the MMU code until then.


Hollis Blanchard
Mentor Graphics, Embedded Systems Division




Re: [PATCH 2/2] kvm/e500v2: mapping guest TLB1 to host TLB0

2010-09-09 Thread Alexander Graf

On 08.09.2010, at 11:40, Liu Yu wrote:

 Currently guest TLB1 is mapped to host TLB1.
 As the host kernel only provides 4K non-contiguous pages,
 we have to break guest large mappings into 4K shadow mappings.
 These 4K shadow mappings are then mapped into host TLB1 on the fly.
 As host TLB1 only has 13 free entries, this causes serious TLB misses.
 
 Since e500v2 has a large number of TLB0 entries,
 it should help to map those 4K shadow mappings to host TLB0.
 To achieve this, we need to unlink the guest TLB from the host TLB,
 so that guest TLB1 mappings can be routed to any host TLB0 entry freely.
 
 Pages/mappings are considered to be of the same kind as host TLB entries.
 This patch removes the link between pages and guest TLB entries to do the unlink,
 and keeps host_tlb0_ref in each vcpu to track pages.
 Then it's easy to map guest TLB1 to host TLB0.
 
 In a guest ramdisk boot test (guest mainly uses TLB1),
 with this patch, the number of TLB misses drops by 90%.
 
 Signed-off-by: Liu Yu yu@freescale.com
 ---
 arch/powerpc/include/asm/kvm_e500.h |    7 +-
 arch/powerpc/kvm/e500.c             |    4 +
 arch/powerpc/kvm/e500_tlb.c         |  280 ---
 arch/powerpc/kvm/e500_tlb.h         |    1 +
 4 files changed, 104 insertions(+), 188 deletions(-)
 
 diff --git a/arch/powerpc/include/asm/kvm_e500.h b/arch/powerpc/include/asm/kvm_e500.h
 index cb785f9..16c0ed0 100644
 --- a/arch/powerpc/include/asm/kvm_e500.h
 +++ b/arch/powerpc/include/asm/kvm_e500.h
 @@ -37,13 +37,10 @@ struct tlbe_ref {
 struct kvmppc_vcpu_e500 {
   /* Unmodified copy of the guest's TLB. */
   struct tlbe *guest_tlb[E500_TLB_NUM];
 - /* TLB that's actually used when the guest is running. */
 - struct tlbe *shadow_tlb[E500_TLB_NUM];
 - /* Pages which are referenced in the shadow TLB. */
 - struct tlbe_ref *shadow_refs[E500_TLB_NUM];
 + /* Pages which are referenced in host TLB. */
 + struct tlbe_ref *host_tlb0_ref;
 
   unsigned int guest_tlb_size[E500_TLB_NUM];
 - unsigned int shadow_tlb_size[E500_TLB_NUM];
   unsigned int guest_tlb_nv[E500_TLB_NUM];
 
   u32 host_pid[E500_PID_NUM];
 diff --git a/arch/powerpc/kvm/e500.c b/arch/powerpc/kvm/e500.c
 index e8a00b0..14af6d7 100644
 --- a/arch/powerpc/kvm/e500.c
 +++ b/arch/powerpc/kvm/e500.c
 @@ -146,6 +146,10 @@ static int __init kvmppc_e500_init(void)
   if (r)
   return r;
 
 + r = kvmppc_e500_mmu_init();
 + if (r)
 + return r;
 +
   /* copy extra E500 exception handlers */
   ivor[0] = mfspr(SPRN_IVOR32);
   ivor[1] = mfspr(SPRN_IVOR33);
 diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
 index 0b657af..a6c2320 100644
 --- a/arch/powerpc/kvm/e500_tlb.c
 +++ b/arch/powerpc/kvm/e500_tlb.c
 @@ -25,9 +25,15 @@
 #include "e500_tlb.h"
 #include "trace.h"
 
 -#define to_htlb1_esel(esel) (tlb1_entry_num - (esel) - 1)
 +static unsigned int host_tlb0_entry_num;
 +static unsigned int host_tlb0_assoc;
 +static unsigned int host_tlb0_assoc_bit;

bits.

 
 -static unsigned int tlb1_entry_num;
 +static inline unsigned int get_tlb0_entry_offset(u32 eaddr, u32 esel)
 +{
 + return ((eaddr & 0x7F000) >> (12 - host_tlb0_assoc_bit) |
 + (esel & (host_tlb0_assoc - 1)));
 +}
 
 void kvmppc_dump_tlbs(struct kvm_vcpu *vcpu)
 {
 @@ -62,11 +68,6 @@ static inline unsigned int tlb0_get_next_victim(
   return victim;
 }
 
 -static inline unsigned int tlb1_max_shadow_size(void)
 -{
 - return tlb1_entry_num - tlbcam_index;
 -}
 -
 static inline int tlbe_is_writable(struct tlbe *tlbe)
 {
   return tlbe->mas3 & (MAS3_SW|MAS3_UW);
 @@ -100,7 +101,7 @@ static inline u32 e500_shadow_mas2_attrib(u32 mas2, int usermode)
 /*
  * writing shadow tlb entry to host TLB
  */
 -static inline void __write_host_tlbe(struct tlbe *stlbe)
 +static inline void __host_tlbe_write(struct tlbe *stlbe)
 {
   mtspr(SPRN_MAS1, stlbe->mas1);
   mtspr(SPRN_MAS2, stlbe->mas2);
 @@ -109,25 +110,22 @@ static inline void __write_host_tlbe(struct tlbe *stlbe)
   __asm__ __volatile__ ("tlbwe\n" : : );
 }
 
 -static inline void write_host_tlbe(struct kvmppc_vcpu_e500 *vcpu_e500,
 - int tlbsel, int esel, struct tlbe *stlbe)
 +static inline u32 host_tlb0_write(struct kvmppc_vcpu_e500 *vcpu_e500,
 +   u32 gvaddr, struct tlbe *stlbe)
 {
 - local_irq_disable();
 - if (tlbsel == 0) {
 - __write_host_tlbe(stlbe);
 - } else {
 - unsigned register mas0;
 + unsigned register mas0;
 
 - mas0 = mfspr(SPRN_MAS0);
 + local_irq_disable();

Do you have to disable interrupts for a tlb write? If you get preempted before 
the write, the entry you overwrite could be different. But you don't protect 
against that either way. And if you get preempted afterwards, you could lose 
the entry. But since you enable interrupts beyond that, that could happen 
either way too.

So what's the reason for the disable here?
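
For readers following the quoted get_tlb0_entry_offset() hunk, here is a standalone sketch of its index arithmetic. It assumes the expression in the patch reads `(eaddr & 0x7F000) >> (12 - host_tlb0_assoc_bit) | (esel & (host_tlb0_assoc - 1))`, and it passes the associativity in as a parameter instead of reading the patch's static variables; the function name is hypothetical. The seven address bits above the 4K page offset select one of 128 sets, and the low bits of esel select the way within that set.

```c
#include <stdint.h>

/*
 * Flat index of a host TLB0 entry: (set << assoc_bit) | way.
 * assoc_bit is log2 of the associativity (2 for a 4-way TLB0, an
 * assumption here, not something the patch states).
 */
static unsigned int sketch_tlb0_entry_offset(uint32_t eaddr, uint32_t esel,
					     unsigned int assoc_bit)
{
	unsigned int assoc = 1u << assoc_bit;
	unsigned int set = (eaddr & 0x7F000u) >> 12;  /* 7 bits: 128 sets */

	return (set << assoc_bit) | (esel & (assoc - 1));
}
```

Under the 4-way assumption this yields offsets 0..511, which matches a per-vcpu host_tlb0_ref array with one slot per host TLB0 entry.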

 
 - 

Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-09 Thread Alexander Graf

On 09.09.2010, at 20:13, Hollis Blanchard wrote:

 On 09/09/2010 04:16 AM, Liu Yu-B13201 wrote:
 Yes, it's hard to restore TLB0. The previous code only restored TLB1.
 But TLB1 is even smaller (13 free entries) than the 440's TLB,
 so restored entries are still unlikely to be hit,
 and the restoration is useless.
   
 The only reason hits are unlikely in TLB1 is because you still don't have 
 large page support in the host. Once you have that, you can use TLB1 for 
 large guest mappings, and it will become extremely likely that you get hits 
 in TLB1. This is true even if the guest wants 256MB but the host supports 
 only e.g. 16MB large pages, and must split the guest mapping into multiple 
 large host pages.
 
 When will you have hugetlbfs for e500? That's going to make such a dramatic 
 difference, I'm not sure it's worth investing time in optimizing the MMU code 
 until then.

I'm not sure I agree. Sure, huge pages give another big win, but the state as 
is should at least be fast enough for prototyping.


Alex



Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-09 Thread Hollis Blanchard

On 09/09/2010 04:26 PM, Alexander Graf wrote:

On 09.09.2010, at 20:13, Hollis Blanchard wrote:
   

On 09/09/2010 04:16 AM, Liu Yu-B13201 wrote:
 

Yes, it's hard to restore TLB0. The previous code only restored TLB1.
But TLB1 is even smaller (13 free entries) than the 440's TLB,
so restored entries are still unlikely to be hit,
and the restoration is useless.

   

The only reason hits are unlikely in TLB1 is because you still don't have large 
page support in the host. Once you have that, you can use TLB1 for large guest 
mappings, and it will become extremely likely that you get hits in TLB1. This 
is true even if the guest wants 256MB but the host supports only e.g. 16MB 
large pages, and must split the guest mapping into multiple large host pages.

When will you have hugetlbfs for e500? That's going to make such a dramatic 
difference, I'm not sure it's worth investing time in optimizing the MMU code 
until then.
 

I'm not sure I agree. Sure, huge pages give another big win, but the state as 
is should at least be fast enough for prototyping.
   
Sure, and it sounds like you can prototype with it already. My point is 
that, in your 80-20 rule of optimization, the 20% is going to change 
radically once large page support is in place.


Remember that the guest kernel is mapped with just a couple large pages. 
During guest Linux boot on 440, I think about half the boot time is 
spent TLB thrashing in the initcalls. Using TLB0 can ameliorate that for 
now, but why bother, since it doesn't help you towards the real solution?


I'm not saying this shouldn't be committed, if that's how you 
interpreted my comments, but in my opinion there are more useful things 
to do than continuing to optimize a path that is going to disappear in 
the future. Once you *do* have hugetlbfs in the host, you're not going 
to want to use TLB0 for guest TLB1 mappings any more anyways.


Hollis Blanchard
Mentor Graphics, Embedded Systems Division




Re: [PATCH 1/2] kvm/e500v2: Remove shadow tlb

2010-09-09 Thread Alexander Graf

On 10.09.2010, at 01:39, Hollis Blanchard wrote:

 On 09/09/2010 04:26 PM, Alexander Graf wrote:
 On 09.09.2010, at 20:13, Hollis Blanchard wrote:
   
 On 09/09/2010 04:16 AM, Liu Yu-B13201 wrote:
 
 Yes, it's hard to restore TLB0. The previous code only restored TLB1.
 But TLB1 is even smaller (13 free entries) than the 440's TLB,
 so restored entries are still unlikely to be hit,
 and the restoration is useless.
 
   
 The only reason hits are unlikely in TLB1 is because you still don't have 
 large page support in the host. Once you have that, you can use TLB1 for 
 large guest mappings, and it will become extremely likely that you get hits 
 in TLB1. This is true even if the guest wants 256MB but the host supports 
 only e.g. 16MB large pages, and must split the guest mapping into multiple 
 large host pages.
 
 When will you have hugetlbfs for e500? That's going to make such a dramatic 
 difference, I'm not sure it's worth investing time in optimizing the MMU 
 code until then.
 
 I'm not sure I agree. Sure, huge pages give another big win, but the state 
 as is should at least be fast enough for prototyping.
   
 Sure, and it sounds like you can prototype with it already. My point is that, 
 in your 80-20 rule of optimization, the 20% is going to change radically once 
 large page support is in place.
 
 Remember that the guest kernel is mapped with just a couple large pages. 
 During guest Linux boot on 440, I think about half the boot time is spent TLB 
 thrashing in the initcalls. Using TLB0 can ameliorate that for now, but why 
 bother, since it doesn't help you towards the real solution?
 
 I'm not saying this shouldn't be committed, if that's how you interpreted my 
 comments, but in my opinion there are more useful things to do than 
 continuing to optimize a path that is going to disappear in the future. Once 
 you *do* have hugetlbfs in the host, you're not going to want to use TLB0 for 
 guest TLB1 mappings any more anyways.

That depends on the use cases. As long as there are no transparent huge pages 
available, not using hugetlbfs gives you a lot of benefit:

  - ksm
  - swapping
  - lazy allocation

So while I agree that supporting huge pages is crucial to high performance kvm, 
I'm not convinced it's the only path to optimize for. Look at x86 - few people 
actually use hugetlbfs there.


Alex



RE: [PATCH 0/2] kvm/e500v2: MMU optimization

2010-09-09 Thread Liu Yu-B13201
 

 -----Original Message-----
 From: Hollis Blanchard [mailto:hollis_blanch...@mentor.com] 
 Sent: Friday, September 10, 2010 12:23 AM
 To: Liu Yu-B13201
 Cc: k...@vger.kernel.org; kvm-ppc@vger.kernel.org; ag...@suse.de
 Subject: Re: [PATCH 0/2] kvm/e500v2: MMU optimization
 
 
   
  Hi Hollis,
 
  The guest uses AS=1 and the host uses AS=0,
  so even if the guest uses the same TID as the host, they are in different address spaces.
 
  Then why does the guest need to care about the host TID?
 
 
 You're absolutely right, but this makes a couple key assumptions:
 1. The guest doesn't try to use AS=1. This is already false in Linux,
 because the udbg code uses an AS=1 mapping for the UART, but this can be
 configured out (with a small loss in functionality). In non-Linux guests
 the AS=0 restriction could be onerous.

We could map (guest AS, guest TID) to a shadow TID,
so that we still don't need to bother the host.
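
One way such a mapping could look, as a minimal sketch only: a per-vcpu table indexed by (guest AS, guest TID) that hands out shadow TIDs on first use. All names and sizes here are hypothetical and are not taken from the patchset; ID 0 is reserved to mean "unassigned", and running out of IDs is left to the caller (for example, flush the shadow mappings and reset the table).

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_NUM_AS   2
#define SKETCH_NUM_TID  256
#define SKETCH_MAX_SID  255  /* shadow IDs 1..255; 0 means "unassigned" */

/* Hypothetical per-vcpu table: (guest AS, guest TID) -> shadow TID. */
struct sketch_sid_table {
	uint8_t sid[SKETCH_NUM_AS][SKETCH_NUM_TID];
	unsigned int next;  /* next unused shadow ID */
};

static void sketch_sid_reset(struct sketch_sid_table *t)
{
	memset(t, 0, sizeof(*t));
	t->next = 1;
}

/*
 * Returns the shadow TID for (as, tid), allocating one on first use.
 * Returns 0 when all IDs are taken; the caller must then invalidate
 * the shadow mappings and call sketch_sid_reset().
 */
static unsigned int sketch_sid_get(struct sketch_sid_table *t,
				   unsigned int as, uint8_t tid)
{
	as &= 1;
	if (t->sid[as][tid] == 0) {
		if (t->next > SKETCH_MAX_SID)
			return 0;
		t->sid[as][tid] = (uint8_t)t->next++;
	}
	return t->sid[as][tid];
}
```

Because guest AS=0 and guest AS=1 mappings with the same guest TID get distinct shadow TIDs, they can coexist in the host TLB without involving the host's own PID allocation.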

 2. A Book E MMU. If we participate in the host MMU context allocation,
 the guest -> host address space code could be generalized to include
 e.g. an e600 guest on an e500 host, or vice versa.
 

Hmm, I'm not sure that's a real requirement.


Thanks,
Yu
