Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-06 Thread Nikunj A Dadhania
On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
  flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicate
  flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
  paravirtualization.
  
  Use the vcpu state information inside kvm_flush_tlb_others to avoid
  sending IPIs to preempted vcpus.
  
  * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
  * For online vcpus: wait for them to clear the flag
  
  The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
  
  Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
  Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
  
  --
  Pseudo Algo:
  
 Write()
 =======

     guest_exit()
         flush_on_enter[i] = 0;
         running[i] = 0;

     guest_enter()
         running[i] = 1;
         smp_mb();
         if (flush_on_enter[i]) {
             tlb_flush();
             flush_on_enter[i] = 0;
         }


 Read()
 ======

     GUEST                                KVM-HV

     f->flushcpumask = cpumask - me;

 again:
     for_each_cpu(i, f->flushmask) {

         if (!running[i]) {
                                          case 1:
                                          running[n] = 1

                                          (cpuN does not see flush_on_enter
                                          set; the guest later finds it
                                          running and sends the ipi; we are
                                          fine here, we just need to clear
                                          the flag on guest_exit)

             flush_on_enter[i] = 1;
                                          case 2:
                                          running[n] = 1

                                          (cpuN will see flush_on_enter and
                                          an IPI as well - addressed in
                                          patch-4)

             if (!running[i])
                 cpu_clear(f->flushmask); (all is well, vm_enter will do
                                          the fixup)
         }
                                          case 3:
                                          running[n] = 0

                                          (cpuN went to sleep, we saw it as
                                          awake and sent the ipi; the wait
                                          below breaks without zero_mask and
                                          the goto again takes care of it)
     }
     send_ipi(f->flushmask)

     wait_a_while_for_zero_mask();

     if (!zero_mask)
         goto again;
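
For concreteness, the Write() half above would land in the host's vcpu
entry/exit paths roughly as sketched below. This is only a sketch against
the shared struct kvm_vcpu_state from this series; the helper name and the
way vs is obtained (the patch uses kvm_read_guest_cached() plus a
write-back) are assumptions, not the literal patch:

    /* Sketch: host side of Write(), called on every vcpu entry/exit.
     * vs points at the guest-registered struct kvm_vcpu_state. */
    static void kvm_vcpu_set_running(struct kvm_vcpu *vcpu,
                                     struct kvm_vcpu_state *vs,
                                     int running)
    {
            if (running) {
                    /* guest_enter(): publish running = 1 before checking
                     * for a flush requested while we were out. */
                    vs->state = 1;
                    smp_mb();
                    if (vs->flush_on_enter) {
                            kvm_x86_ops->tlb_flush(vcpu);
                            vs->flush_on_enter = 0;
                    }
            } else {
                    /* guest_exit(): any flush requested from here on is
                     * handled on the next entry, so clear both fields. */
                    vs->flush_on_enter = 0;
                    vs->state = 0;
            }
    }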
 
 Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
 of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
 help.
 

Please find below the results (debug patch attached for enabling
registration of kvm_vcpu_state).

I took results for 1 and 4 vcpus, using the following command to start
the tests:

/usr/libexec/qemu-kvm -smp $i -device testdev,chardev=testlog -chardev
file,id=testlog,path=vmexit.out -serial stdio -kernel ./x86/vmexit.flat

Machine : IBM xSeries with Intel(R) Xeon(R) X7560 2.27GHz CPU,
          32 cores, 32 online cpus and 4*64GB RAM.

x  base - unpatched host kernel 
+  wo_vs - patched host kernel, vcpu_state not registered
*  w_vs.txt - patched host kernel and vcpu_state registered

1 vcpu results:
---
cpuid
=====
       N      Avg     Stddev
x     10   2135.1    17.8975
+     10   2188.0    18.3666
*     10   2448.9    43.9910

vmcall
======
       N      Avg     Stddev
x     10   2025.5    38.1641
+     10   2047.5    24.8205
*     10   2306.2    40.3066

mov_from_cr8
============
       N      Avg     Stddev
x     10     12.0     0.0000
+     10     12.0     0.0000
*     10     12.0     0.0000

mov_to_cr8
==========
       N      Avg     Stddev
x     10     19.4     0.5164
+     10     19.1     0.3162
*     10     19.2     0.4216

inl_from_pmtimer
================
       N      Avg     Stddev
x     10  18093.2   462.0543
+     10  16579.7  1448.8892
*     10  18577.7   266.2676

ple-round-robin
===============
       N      Avg     Stddev
x     10     16.1     0.3162
+

Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-04 Thread Marcelo Tosatti
On Tue, Jul 03, 2012 at 01:49:49PM +0530, Nikunj A Dadhania wrote:
 On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com 
 wrote:
   
  if (!zero_mask)
goto again;
  
  Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
  of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
  help.
 
 Sure, will get back with the results.
 
  + /*
  +  * Guest might have seen us offline and would have set
  +  * flush_on_enter.
  +  */
  + kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2*sizeof(__u32));
  + if (vs->flush_on_enter)
  +         kvm_x86_ops->tlb_flush(vcpu);
  
  
  So flush_tlb_page which was an invlpg now flushes the entire TLB. Did
  you take that into account?
  
 When the vcpu is sleeping/preempted out, multiple flush_tlb requests
 could have accumulated. By the time we get here, flushing the entire
 TLB cleans them all up at once.

Yes, in cases where there are sufficient exits, transforming one TLB
entry invalidation into a full TLB invalidation should go unnoticed.

 One other approach would be to queue the addresses, which raises the
 question: how many requests should we queue? It would also require
 more synchronization between guest and host for updating the shared
 area where these addresses live.

Sounds unnecessarily complicated.

  +again:
  + for_each_cpu(cpu, to_cpumask(f->flush_cpumask)) {
  + v_state = &per_cpu(vcpu_state, cpu);
  +
  + if (!v_state->state) {
  
  Should use ACCESS_ONCE to make sure the value is not register cached.
  + v_state->flush_on_enter = 1;
  + smp_mb();
  + if (!v_state->state)
  
  And here.
  
 Sure, will add ACCESS_ONCE for both in my next version (see the
 reworked loop sketched at the end of this message).
 
  + cpumask_clear_cpu(cpu,
  +         to_cpumask(f->flush_cpumask));
   + }
   + }
   +
  + if (cpumask_empty(to_cpumask(f->flush_cpumask)))
  + goto out;
  +
  + apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
  + INVALIDATE_TLB_VECTOR_START + sender);
  +
  + loop = 1000;
  + while (!cpumask_empty(to_cpumask(f->flush_cpumask)) && --loop)
  + cpu_relax();
  +
  + if (!cpumask_empty(to_cpumask(f->flush_cpumask)))
  + goto again;
  
  Is this necessary in addition to the in-guest-mode/out-guest-mode
  detection? If so, why?
  
 This is for case 3: we initially saw the vcpu running and sent it a
 flush IPI. In the meantime the vcpu may have been preempted, so we come
 out of the loop=1000 wait with a non-empty flushmask. We then re-check
 the flushmask against the currently running vcpus, un-mark any vcpu
 that was preempted (its flush_on_enter flag is set by now), and can
 leave kvm_flush_tlb_others_ipi without waiting for sleeping/preempted
 vcpus.
 
 Regards
 Nikunj
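
Putting the two review points together, the reworked flush loop might look
roughly like the sketch below. It is only a sketch: it folds the suggested
ACCESS_ONCE into both reads of v_state->state and keeps the bounded
loop=1000 wait plus the goto-again re-check that handles case 3; the
variable declarations and the out: label are as in the quoted patch context.

    again:
            for_each_cpu(cpu, to_cpumask(f->flush_cpumask)) {
                    v_state = &per_cpu(vcpu_state, cpu);

                    if (!ACCESS_ONCE(v_state->state)) {
                            /* vcpu looks preempted: request a flush on
                             * its next entry, then re-check its state. */
                            v_state->flush_on_enter = 1;
                            smp_mb();
                            if (!ACCESS_ONCE(v_state->state))
                                    cpumask_clear_cpu(cpu,
                                        to_cpumask(f->flush_cpumask));
                    }
            }

            if (cpumask_empty(to_cpumask(f->flush_cpumask)))
                    goto out;

            apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
                                INVALIDATE_TLB_VECTOR_START + sender);

            /* Bounded wait: a target preempted after the IPI never clears
             * its mask bit, so give up eventually and go around again --
             * the re-check above then hands it off to flush_on_enter. */
            loop = 1000;
            while (!cpumask_empty(to_cpumask(f->flush_cpumask)) && --loop)
                    cpu_relax();

            if (!cpumask_empty(to_cpumask(f->flush_cpumask)))
                    goto again;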


Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-04 Thread Nikunj A Dadhania
On Wed, 4 Jul 2012 23:09:10 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Tue, Jul 03, 2012 at 01:49:49PM +0530, Nikunj A Dadhania wrote:
  On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com 
  wrote:

   if (!zero_mask)
   goto again;
   
   Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
   of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
   help.
  
  Sure, will get back with the results.
  
 +   /*
 +    * Guest might have seen us offline and would have set
 +    * flush_on_enter.
 +    */
 +   kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2*sizeof(__u32));
 +   if (vs->flush_on_enter)
 +           kvm_x86_ops->tlb_flush(vcpu);
   
   
   So flush_tlb_page which was an invlpg now flushes the entire TLB. Did
   you take that into account?
   
  When the vcpu is sleeping/preempted out, multiple flush_tlb requests
  could have accumulated. By the time we get here, flushing the entire
  TLB cleans them all up at once.
 
 Yes, in cases where there are sufficient exits, transforming one TLB
 entry invalidation into a full TLB invalidation should go unnoticed.
 
  One other approach would be to queue the addresses, which raises the
  question: how many requests should we queue? It would also require
  more synchronization between guest and host for updating the shared
  area where these addresses live.
 
 Sounds unnecessarily complicated.
 
Yes, I did give this a try earlier, but did not see much improvement
for the amount of complexity it brought in.

Regards
Nikunj



Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Marcelo Tosatti
On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
 flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicate
 flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
 paravirtualization.
 
 Use the vcpu state information inside kvm_flush_tlb_others to avoid
 sending IPIs to preempted vcpus.
 
 * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
 * For online vcpus: wait for them to clear the flag
 
 The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
 
 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
 Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
 
 --
 Pseudo Algo:
 
 Write()
 =======

     guest_exit()
         flush_on_enter[i] = 0;
         running[i] = 0;

     guest_enter()
         running[i] = 1;
         smp_mb();
         if (flush_on_enter[i]) {
             tlb_flush();
             flush_on_enter[i] = 0;
         }


 Read()
 ======

     GUEST                                KVM-HV

     f->flushcpumask = cpumask - me;

 again:
     for_each_cpu(i, f->flushmask) {

         if (!running[i]) {
                                          case 1:
                                          running[n] = 1

                                          (cpuN does not see flush_on_enter
                                          set; the guest later finds it
                                          running and sends the ipi; we are
                                          fine here, we just need to clear
                                          the flag on guest_exit)

             flush_on_enter[i] = 1;
                                          case 2:
                                          running[n] = 1

                                          (cpuN will see flush_on_enter and
                                          an IPI as well - addressed in
                                          patch-4)

             if (!running[i])
                 cpu_clear(f->flushmask); (all is well, vm_enter will do
                                          the fixup)
         }
                                          case 3:
                                          running[n] = 0

                                          (cpuN went to sleep, we saw it as
                                          awake and sent the ipi; the wait
                                          below breaks without zero_mask and
                                          the goto again takes care of it)
     }
     send_ipi(f->flushmask)

     wait_a_while_for_zero_mask();

     if (!zero_mask)
         goto again;

Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
help.

  arch/x86/include/asm/kvm_para.h |    3 +-
  arch/x86/include/asm/tlbflush.h |    9 ++
  arch/x86/kernel/kvm.c           |    1 +
  arch/x86/kvm/x86.c              |   14 +++++++++++++-
  arch/x86/mm/tlb.c               |   61 +++++++++++++++++++++
  5 files changed, 86 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index f57b5cc..684a285 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -55,7 +55,8 @@ struct kvm_steal_time {
  
  struct kvm_vcpu_state {
   __u32 state;
 - __u32 pad[15];
 + __u32 flush_on_enter;
 + __u32 pad[14];
  };
  
  #define KVM_VCPU_STATE_ALIGN_BITS 5
 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
 index c0e108e..29470bd 100644
 --- a/arch/x86/include/asm/tlbflush.h
 +++ b/arch/x86/include/asm/tlbflush.h
 @@ -119,6 +119,12 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask,
  {
  }
  
 +static inline void kvm_flush_tlb_others(const struct cpumask *cpumask,
 + struct mm_struct *mm,
 + unsigned long va)
 +{
 +}
 +
  static inline void reset_lazy_tlbstate(void)
  {
  }
 @@ -145,6 +151,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
  void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long va);
  
 +void kvm_flush_tlb_others(const struct cpumask *cpumask,
 +   struct mm_struct *mm, unsigned long va);
 +
  #define TLBSTATE_OK  1
  #define TLBSTATE_LAZY 2
  
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index 

Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Marcelo Tosatti
On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
 flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicate
 flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
 paravirtualization.
 
 Use the vcpu state information inside kvm_flush_tlb_others to avoid
 sending IPIs to preempted vcpus.
 
 * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
 * For online vcpus: wait for them to clear the flag
 
 The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
 
 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
 Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
 
 --
 Pseudo Algo:
 
 Write()
 =======

     guest_exit()
         flush_on_enter[i] = 0;
         running[i] = 0;

     guest_enter()
         running[i] = 1;
         smp_mb();
         if (flush_on_enter[i]) {
             tlb_flush();
             flush_on_enter[i] = 0;
         }


 Read()
 ======

     GUEST                                KVM-HV

     f->flushcpumask = cpumask - me;

 again:
     for_each_cpu(i, f->flushmask) {

         if (!running[i]) {
                                          case 1:
                                          running[n] = 1

                                          (cpuN does not see flush_on_enter
                                          set; the guest later finds it
                                          running and sends the ipi; we are
                                          fine here, we just need to clear
                                          the flag on guest_exit)

             flush_on_enter[i] = 1;
                                          case 2:
                                          running[n] = 1

                                          (cpuN will see flush_on_enter and
                                          an IPI as well - addressed in
                                          patch-4)

             if (!running[i])
                 cpu_clear(f->flushmask); (all is well, vm_enter will do
                                          the fixup)
         }
                                          case 3:
                                          running[n] = 0

                                          (cpuN went to sleep, we saw it as
                                          awake and sent the ipi; the wait
                                          below breaks without zero_mask and
                                          the goto again takes care of it)
     }
     send_ipi(f->flushmask)

     wait_a_while_for_zero_mask();

     if (!zero_mask)
         goto again;
 ---
  arch/x86/include/asm/kvm_para.h |    3 +-
  arch/x86/include/asm/tlbflush.h |    9 ++
  arch/x86/kernel/kvm.c           |    1 +
  arch/x86/kvm/x86.c              |   14 +++++++++++++-
  arch/x86/mm/tlb.c               |   61 +++++++++++++++++++++
  5 files changed, 86 insertions(+), 2 deletions(-)
 
 diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
 index f57b5cc..684a285 100644
 --- a/arch/x86/include/asm/kvm_para.h
 +++ b/arch/x86/include/asm/kvm_para.h
 @@ -55,7 +55,8 @@ struct kvm_steal_time {
  
  struct kvm_vcpu_state {
   __u32 state;
 - __u32 pad[15];
 + __u32 flush_on_enter;
 + __u32 pad[14];
  };
  
  #define KVM_VCPU_STATE_ALIGN_BITS 5
 diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
 index c0e108e..29470bd 100644
 --- a/arch/x86/include/asm/tlbflush.h
 +++ b/arch/x86/include/asm/tlbflush.h
 @@ -119,6 +119,12 @@ static inline void native_flush_tlb_others(const struct cpumask *cpumask,
  {
  }
  
 +static inline void kvm_flush_tlb_others(const struct cpumask *cpumask,
 + struct mm_struct *mm,
 + unsigned long va)
 +{
 +}
 +
  static inline void reset_lazy_tlbstate(void)
  {
  }
 @@ -145,6 +151,9 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
  void native_flush_tlb_others(const struct cpumask *cpumask,
struct mm_struct *mm, unsigned long va);
  
 +void kvm_flush_tlb_others(const struct cpumask *cpumask,
 +   struct mm_struct *mm, unsigned long va);
 +
  #define TLBSTATE_OK  1
  #define TLBSTATE_LAZY 2
  
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index bb686a6..66db54e 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -465,6 +465,7 @@ void __init kvm_guest_init(void)
   }
  
  

Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Nikunj A Dadhania
On Tue, 3 Jul 2012 04:55:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
  
 if (!zero_mask)
 goto again;
 
 Can you please measure increased vmentry/vmexit overhead? x86/vmexit.c 
 of git://git.kernel.org/pub/scm/virt/kvm/kvm-unit-tests.git should 
 help.

Sure, will get back with the results.

  +   /*
  +    * Guest might have seen us offline and would have set
  +    * flush_on_enter.
  +    */
  +   kvm_read_guest_cached(vcpu->kvm, ghc, vs, 2*sizeof(__u32));
  +   if (vs->flush_on_enter)
  +           kvm_x86_ops->tlb_flush(vcpu);
 
 
 So flush_tlb_page which was an invlpg now flushes the entire TLB. Did
 you take that into account?
 
When the vcpu is sleeping/preempted out, multiple flush_tlb requests
could have accumulated. By the time we get here, flushing the entire
TLB cleans them all up at once.

One other approach would be to queue the addresses, which raises the
question: how many requests should we queue? It would also require
more synchronization between guest and host for updating the shared
area where these addresses live.

  +again:
  +   for_each_cpu(cpu, to_cpumask(f->flush_cpumask)) {
  +   v_state = &per_cpu(vcpu_state, cpu);
  +
  +   if (!v_state->state) {
 
 Should use ACCESS_ONCE to make sure the value is not register cached.
  +   v_state->flush_on_enter = 1;
  +   smp_mb();
  +   if (!v_state->state)
 
 And here.
 
Sure, will add this check for both in my next version.

  +   cpumask_clear_cpu(cpu,
  +           to_cpumask(f->flush_cpumask));
  +   }
  +   }
  +
  +   if (cpumask_empty(to_cpumask(f->flush_cpumask)))
  +   goto out;
  +
  +   apic->send_IPI_mask(to_cpumask(f->flush_cpumask),
  +   INVALIDATE_TLB_VECTOR_START + sender);
  +
  +   loop = 1000;
  +   while (!cpumask_empty(to_cpumask(f->flush_cpumask)) && --loop)
  +   cpu_relax();
  +
  +   if (!cpumask_empty(to_cpumask(f->flush_cpumask)))
  +   goto again;
 
 Is this necessary in addition to the in-guest-mode/out-guest-mode
 detection? If so, why?
 
This is for case 3: we initially saw the vcpu running and sent it a
flush IPI. In the meantime the vcpu may have been preempted, so we come
out of the loop=1000 wait with a non-empty flushmask. We then re-check
the flushmask against the currently running vcpus, un-mark any vcpu
that was preempted (its flush_on_enter flag is set by now), and can
leave kvm_flush_tlb_others_ipi without waiting for sleeping/preempted
vcpus.

Regards
Nikunj



Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-07-03 Thread Nikunj A Dadhania
On Tue, 3 Jul 2012 05:11:35 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:

   arch/x86/include/asm/kvm_para.h |    3 +-
   arch/x86/include/asm/tlbflush.h |    9 ++
   arch/x86/kernel/kvm.c           |    1 +
   arch/x86/kvm/x86.c              |   14 +++++++++++++-
   arch/x86/mm/tlb.c               |   61 +++++++++++++++++++++
   5 files changed, 86 insertions(+), 2 deletions(-)
  

[...]

  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index 264f172..4714a7b 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
 
 Please split guest/host (arch/x86/kernel/kvm.c etc VS arch/x86/kvm/)
 patches.
 
Ok

 Please document guest/host interface
 (Documentation/virtual/kvm/paravirt-tlb-flush.txt, add a pointer to it
 from msr.txt).
 
Sure.



Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-06-21 Thread Marcelo Tosatti
On Tue, Jun 19, 2012 at 11:41:54AM +0530, Nikunj A Dadhania wrote:
 Hi Marcelo,
 
 Thanks for the review.
 
 On Tue, 12 Jun 2012 20:02:18 -0300, Marcelo Tosatti mtosa...@redhat.com 
 wrote:
  On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
   flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicate
   flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
   paravirtualization.
   
   Use the vcpu state information inside kvm_flush_tlb_others to avoid
   sending IPIs to preempted vcpus.
   
   * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
   * For online vcpus: wait for them to clear the flag
   
   The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
   
   Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
   Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
  
  Why not reintroduce the hypercall to flush TLBs? 
 Sure, I will get a version with this.
 
  No waiting, no entry/exit trickery.
  
 Are you also suggesting to get rid of the vcpu running information?

Yes.

 We would at least need this to raise a flush-TLB hypercall on the
 sleeping vcpu.

No, you always raise a flush TLB hypercall on flush_tlb_others, like Xen does.
Instead of an MMIO exit to the APIC for an IPI, it's a hypercall exit.

  This is half-way-there paravirt with all the downsides. 
 Some more details on what the downsides are would help us reach a
 better solution.

The downside I refer to is the overhead to write, test and maintain
separate code for running as a guest. If you are adding awareness about
the hypervisor anyway, a hypercall is the simplest way:

- It is simple to request a remote TLB flush via vcpu->requests in the
  hypervisor. Information about which vcpus are in/out of guest mode is
  already available.
- Maintaining in/out guest mode information in guest memory adds more
  overhead to the entry/exit paths.
- No need to handle the interprocessor synchronization in the pseudo
  algo below (already handled by the vcpu->requests mechanism).
- The guest TLB IPI flow is (target vcpu):
  - exit-due-to-external-interrupt.
  - inject-ipi.
  - enter guest mode.
  - execute IPI handler, flush TLB.
  - ACK IPI.
  - source vcpu continues.
- The hypervisor TLB flush flow is:
  - exit-due-to-external-interrupt.
  - execute IPI handler (in host, from make_all_cpus_request)
  - ACK IPI.
  - source vcpu continues.

Unless there is an advantage in using APIC IPI exit vs hypercall
exit?
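
For comparison, here is a minimal sketch of what the hypercall variant
could look like. KVM_HC_FLUSH_TLB is an assumed hypercall number (not an
existing ABI constant), and passing the vcpu mask as a single word is a
simplification that caps this at 64 vcpus:

    /* Guest side (arch/x86/kernel/kvm.c): one exit for the whole
     * shootdown, instead of an APIC IPI per target plus a wait loop. */
    static void kvm_flush_tlb_others(const struct cpumask *cpumask,
                                     struct mm_struct *mm, unsigned long va)
    {
            kvm_hypercall1(KVM_HC_FLUSH_TLB, cpumask_bits(cpumask)[0]);
    }

    /* Host side (arch/x86/kvm/x86.c, in kvm_emulate_hypercall()): */
    case KVM_HC_FLUSH_TLB: {
            struct kvm_vcpu *v;
            unsigned long mask = a0;
            int i;

            kvm_for_each_vcpu(i, v, vcpu->kvm) {
                    if (!test_bit(v->vcpu_id, &mask))
                            continue;
                    /* Preempted vcpus will flush on their next entry;
                     * running vcpus are kicked out of guest mode to
                     * service the request, so nobody busy-waits. */
                    kvm_make_request(KVM_REQ_TLB_FLUSH, v);
                    kvm_vcpu_kick(v);
            }
            ret = 0;
            break;
    }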

  Even though the guest running information might be useful in other
  cases.
  
 Yes, that was one of the things on my mind.
 
   Pseudo Algo:
   
 Write()
 =======

     guest_exit()
         flush_on_enter[i] = 0;
         running[i] = 0;

     guest_enter()
         running[i] = 1;
         smp_mb();
         if (flush_on_enter[i]) {
             tlb_flush();
             flush_on_enter[i] = 0;
         }


 Read()
 ======

     GUEST                                KVM-HV

     f->flushcpumask = cpumask - me;

 again:
     for_each_cpu(i, f->flushmask) {

         if (!running[i]) {
                                          case 1:
                                          running[n] = 1

                                          (cpuN does not see flush_on_enter
                                          set; the guest later finds it
                                          running and sends the ipi; we are
                                          fine here, we just need to clear
                                          the flag on guest_exit)

             flush_on_enter[i] = 1;
                                          case 2:
                                          running[n] = 1

                                          (cpuN will see flush_on_enter and
                                          an IPI as well - addressed in
                                          patch-4)

             if (!running[i])
                 cpu_clear(f->flushmask); (all is well, vm_enter will do
                                          the fixup)
         }
                                          case 3:
                                          running[n] = 0

                                          (cpuN went to sleep, we saw it as
                                          awake and sent the ipi; the wait
                                          below breaks without zero_mask and
                                          the goto again takes care of it)
     }
     send_ipi(f->flushmask)

     wait_a_while_for_zero_mask();

     if (!zero_mask)
         goto again;

Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-06-19 Thread Nikunj A Dadhania
Hi Marcelo,

Thanks for the review.

On Tue, 12 Jun 2012 20:02:18 -0300, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
  flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicate
  flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
  paravirtualization.
  
  Use the vcpu state information inside kvm_flush_tlb_others to avoid
  sending IPIs to preempted vcpus.
  
  * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
  * For online vcpus: wait for them to clear the flag
  
  The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
  
  Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
  Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com
 
 Why not reintroduce the hypercall to flush TLBs? 
Sure, I will get a version with this.

 No waiting, no entry/exit trickery.
 
Are you also suggesting to get rid of the vcpu running information?
We would at least need this to raise a flush-TLB hypercall on the
sleeping vcpu.

 This is half-way-there paravirt with all the downsides. 
Some more details on what the downsides are would help us reach a
better solution.

 Even though the guest running information might be useful in other
 cases.
 
Yes, that was one of the things on my mind.

  Pseudo Algo:
  
 Write()
 =======

     guest_exit()
         flush_on_enter[i] = 0;
         running[i] = 0;

     guest_enter()
         running[i] = 1;
         smp_mb();
         if (flush_on_enter[i]) {
             tlb_flush();
             flush_on_enter[i] = 0;
         }


 Read()
 ======

     GUEST                                KVM-HV

     f->flushcpumask = cpumask - me;

 again:
     for_each_cpu(i, f->flushmask) {

         if (!running[i]) {
                                          case 1:
                                          running[n] = 1

                                          (cpuN does not see flush_on_enter
                                          set; the guest later finds it
                                          running and sends the ipi; we are
                                          fine here, we just need to clear
                                          the flag on guest_exit)

             flush_on_enter[i] = 1;
                                          case 2:
                                          running[n] = 1

                                          (cpuN will see flush_on_enter and
                                          an IPI as well - addressed in
                                          patch-4)

             if (!running[i])
                 cpu_clear(f->flushmask); (all is well, vm_enter will do
                                          the fixup)
         }
                                          case 3:
                                          running[n] = 0

                                          (cpuN went to sleep, we saw it as
                                          awake and sent the ipi; the wait
                                          below breaks without zero_mask and
                                          the goto again takes care of it)
     }
     send_ipi(f->flushmask)

     wait_a_while_for_zero_mask();

     if (!zero_mask)
         goto again;
  ---
   arch/x86/include/asm/kvm_para.h |    3 +-
   arch/x86/include/asm/tlbflush.h |    9 ++
   arch/x86/kernel/kvm.c           |    1 +
   arch/x86/kvm/x86.c              |   14 +++++++++++++-
   arch/x86/mm/tlb.c               |   61 +++++++++++++++++++++
   5 files changed, 86 insertions(+), 2 deletions(-)
  
 
 


Re: [PATCH v2 3/7] KVM: Add paravirt kvm_flush_tlb_others

2012-06-12 Thread Marcelo Tosatti
On Mon, Jun 04, 2012 at 10:37:24AM +0530, Nikunj A. Dadhania wrote:
 flush_tlb_others_ipi depends on a lot of statics in tlb.c.  Replicate
 flush_tlb_others_ipi as kvm_flush_tlb_others to further adapt it to
 paravirtualization.
 
 Use the vcpu state information inside kvm_flush_tlb_others to avoid
 sending IPIs to preempted vcpus.
 
 * Do not send IPIs to offline vcpus; set the flush_on_enter flag instead
 * For online vcpus: wait for them to clear the flag
 
 The approach was discussed here: https://lkml.org/lkml/2012/2/20/157
 
 Suggested-by: Peter Zijlstra a.p.zijls...@chello.nl
 Signed-off-by: Nikunj A. Dadhania nik...@linux.vnet.ibm.com

Why not reintroduce the hypercall to flush TLBs? No waiting, no
entry/exit trickery.

This is half-way-there paravirt with all the downsides. Even though the
guest running information might be useful in other cases.

 Pseudo Algo:
 
 Write()
 =======

     guest_exit()
         flush_on_enter[i] = 0;
         running[i] = 0;

     guest_enter()
         running[i] = 1;
         smp_mb();
         if (flush_on_enter[i]) {
             tlb_flush();
             flush_on_enter[i] = 0;
         }


 Read()
 ======

     GUEST                                KVM-HV

     f->flushcpumask = cpumask - me;

 again:
     for_each_cpu(i, f->flushmask) {

         if (!running[i]) {
                                          case 1:
                                          running[n] = 1

                                          (cpuN does not see flush_on_enter
                                          set; the guest later finds it
                                          running and sends the ipi; we are
                                          fine here, we just need to clear
                                          the flag on guest_exit)

             flush_on_enter[i] = 1;
                                          case 2:
                                          running[n] = 1

                                          (cpuN will see flush_on_enter and
                                          an IPI as well - addressed in
                                          patch-4)

             if (!running[i])
                 cpu_clear(f->flushmask); (all is well, vm_enter will do
                                          the fixup)
         }
                                          case 3:
                                          running[n] = 0

                                          (cpuN went to sleep, we saw it as
                                          awake and sent the ipi; the wait
                                          below breaks without zero_mask and
                                          the goto again takes care of it)
     }
     send_ipi(f->flushmask)

     wait_a_while_for_zero_mask();

     if (!zero_mask)
         goto again;
 ---
  arch/x86/include/asm/kvm_para.h |    3 +-
  arch/x86/include/asm/tlbflush.h |    9 ++
  arch/x86/kernel/kvm.c           |    1 +
  arch/x86/kvm/x86.c              |   14 +++++++++++++-
  arch/x86/mm/tlb.c               |   61 +++++++++++++++++++++
  5 files changed, 86 insertions(+), 2 deletions(-)
 

