Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-09 Thread Christoffer Dall
On Mon, Jun 08, 2015 at 06:50:08PM +0100, Marc Zyngier wrote:
 Hi Christoffer,
 
 On 28/05/15 19:49, Christoffer Dall wrote:
  Until now we have been calling kvm_guest_exit after re-enabling
  interrupts when we come back from the guest, but this has the
  unfortunate effect that CPU time accounting done in the context of timer
  interrupts occurring while the guest is running doesn't properly notice
  that the time since the last tick was spent in the guest.
  
  Inspired by the comment in the x86 code, move the kvm_guest_exit() call
  below the local_irq_enable() call and change __kvm_guest_exit() to
  kvm_guest_exit(), because we are now calling this function with
  interrupts enabled.  We have to now explicitly disable preemption and
  not enable preemption before we've called kvm_guest_exit(), since
  otherwise we could be preempted and everything happening before we
  eventually get scheduled again would be accounted for as guest time.
  
  At the same time, move the trace_kvm_exit() call outside of the atomic
  section, since there is no reason for us to do that with interrupts
  disabled.
  
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
  rework recently posted by Christian Borntraeger.  I hope I got the logic
  of this right, there were 2 slightly worrying facts about this:
  
  First, we now enable and disable and enable interrupts on each exit
  path, but I couldn't see any performance overhead on hackbench - yes the
  only benchmark we care about.
  
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
  
  Changes since v1:
   - Tweak comment and commit text based on Marc's feedback.
   - Explicitly disable preemption and enable it only after kvm_guest_exit().
  
   arch/arm/kvm/arm.c | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)
  
  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
  index e41cb11..fe8028d 100644
  --- a/arch/arm/kvm/arm.c
  +++ b/arch/arm/kvm/arm.c
  @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  kvm_vgic_flush_hwstate(vcpu);
  kvm_timer_flush_hwstate(vcpu);
   
  +   preempt_disable();
  local_irq_disable();
   
  /*
  @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
   
  if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
  local_irq_enable();
  +   preempt_enable();
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);
  continue;
  @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
   
  vcpu-mode = OUTSIDE_GUEST_MODE;
  -   __kvm_guest_exit();
  -   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  +   /*
  +* Back from guest
  +*/
  +
  /*
   * We may have taken a host interrupt in HYP mode (ie
   * while executing the guest). This interrupt is still
  @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  local_irq_enable();
   
  /*
  -* Back from guest
  -*/
  +* We do local_irq_enable() before calling kvm_guest_exit() so
  +* that if a timer interrupt hits while running the guest we
  +* account that tick as being spent in the guest.  We enable
  +* preemption after calling kvm_guest_exit() so that if we get
  +* preempted we make sure ticks after that is not counted as
  +* guest time.
  +*/
  +   kvm_guest_exit();
  +   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  +   preempt_enable();
  +
   
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);
  
 
 I've been thinking about this a bit more, and I wonder if we can
 simplify it a bit. At the moment, we disable the interrupts around the
 HYP entry. But now that you have introduced preempt_disable, it looks
 like we move the local_irq_disable call to be just after
 __kvm_guest_enter, and not bother with having such a long critical section.
 
 This is possible because entering HYP mode automatically masks
 interrupts, and we restore PSTATE on exception return.
 
 I think this would slightly reduce the amount of code we run on the host
 that gets accounted to the guest.
 
 Thoughts?
 
Isn't there a situation 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-09 Thread Marc Zyngier
On 09/06/15 15:43, Christoffer Dall wrote:
 On Mon, Jun 08, 2015 at 06:50:08PM +0100, Marc Zyngier wrote:
 Hi Christoffer,

 On 28/05/15 19:49, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.

 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.

 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.

 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this right, there were 2 slightly worrying facts about this:

 First, we now enable and disable and enable interrupts on each exit
 path, but I couldn't see any performance overhead on hackbench - yes the
 only benchmark we care about.

 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.

 Changes since v1:
  - Tweak comment and commit text based on Marc's feedback.
  - Explicitly disable preemption and enable it only after kvm_guest_exit().

  arch/arm/kvm/arm.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index e41cb11..fe8028d 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
 kvm_vgic_flush_hwstate(vcpu);
 kvm_timer_flush_hwstate(vcpu);
  
 +   preempt_disable();
 local_irq_disable();
  
 /*
 @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
  
 if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
 local_irq_enable();
 +   preempt_enable();
 kvm_timer_sync_hwstate(vcpu);
 kvm_vgic_sync_hwstate(vcpu);
 continue;
 @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
 ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
  
 vcpu-mode = OUTSIDE_GUEST_MODE;
 -   __kvm_guest_exit();
 -   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 +   /*
 +* Back from guest
 +*/
 +
 /*
  * We may have taken a host interrupt in HYP mode (ie
  * while executing the guest). This interrupt is still
 @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
 local_irq_enable();
  
 /*
 -* Back from guest
 -*/
 +* We do local_irq_enable() before calling kvm_guest_exit() so
 +* that if a timer interrupt hits while running the guest we
 +* account that tick as being spent in the guest.  We enable
 +* preemption after calling kvm_guest_exit() so that if we get
 +* preempted we make sure ticks after that is not counted as
 +* guest time.
 +*/
 +   kvm_guest_exit();
 +   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 +   preempt_enable();
 +
  
 kvm_timer_sync_hwstate(vcpu);
 kvm_vgic_sync_hwstate(vcpu);


 I've been thinking about this a bit more, and I wonder if we can
 simplify it a bit. At the moment, we disable the interrupts around the
 HYP entry. But now that you have introduced preempt_disable, it looks
 like we move the local_irq_disable call to be just after
 __kvm_guest_enter, and not bother with having such a long critical section.

 This is possible because entering HYP mode automatically masks
 interrupts, and we restore PSTATE on exception return.

 I think this would slightly reduce the amount of code we run on the host
 that gets accounted to the guest.

 Thoughts?

 Isn't there a situation then where the guest can get stuck because we
 don't properly check 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-09 Thread Mario Smarduch
On 06/08/2015 04:35 AM, Christoffer Dall wrote:
 On Fri, Jun 05, 2015 at 05:24:07AM -0700, Mario Smarduch wrote:
 On 06/02/2015 02:27 AM, Christoffer Dall wrote:
 On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
 On 05/30/2015 11:59 PM, Christoffer Dall wrote:
 Hi Mario,

 On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
 On 05/28/2015 11:49 AM, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.

 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.

 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.

 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
[ ... ]

 preempt_enable() will call __preempt_schedule() and cause preemption
 there, so you're talking about adding these lines of latency:t

 kvm_guest_exit();
 trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));

 On return from IRQ this should execute - and el1_preempt won't
 get called.

 #ifdef CONFIG_PREEMPT
 get_thread_info tsk
 ldr w24, [tsk, #TI_PREEMPT] // get preempt count
 cbnzw24, 1f // preempt count != 0
 ldr x0, [tsk, #TI_FLAGS]// get flags
 tbz x0, #TIF_NEED_RESCHED, 1f   // needs rescheduling?
 bl  el1_preempt
 1:
 #endif

 
 I understand that, but then you call preempt_enable right after which
 calls __preempt_schedule() which has the same affect as that asm snippet
 you pasted here.
 


 And these were called with interrupts disabled before, so I don't see
 the issue??

 However, your question is making me think whether we have a race in the
 current code on fully preemptible kernels, if we get preempted before
 calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
 could potentially schedule another vcpu on this core and loose/corrupt
 state, can we not?  We probably need to check for this in
 kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
 real issue or if I'm seeing ghosts.

 Yes appears like it could be an issue in PREEMPT mode.
 
 see separate mail, I don't believe this to be an issue anymore.
 


[ ... ]

 Would you run with NO_HZ_FULL in this case?  Because then we should just
 enable HAVE_VIRT_CPU_ACCOUNTING_GEN, and I think that would be a good
 start.
 It may have a use case to run an isolated vCPU, but in general any mode
 may be used (,NO_HZ, even low PERIODIC).

 ok, but I still think it would be more correct to have this patch than
 not to.
No daubt, it exposes an important missing feature and fixes
'Guest time' which mostly should be accurate or close to it.

But there may be room for some future work in this
area (like NATIVE accounting with guest time support),

- Mario

 
 -Christoffer
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-08 Thread Marc Zyngier
Hi Christoffer,

On 28/05/15 19:49, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.
 
 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.
 
 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this right, there were 2 slightly worrying facts about this:
 
 First, we now enable and disable and enable interrupts on each exit
 path, but I couldn't see any performance overhead on hackbench - yes the
 only benchmark we care about.
 
 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.
 
 Changes since v1:
  - Tweak comment and commit text based on Marc's feedback.
  - Explicitly disable preemption and enable it only after kvm_guest_exit().
 
  arch/arm/kvm/arm.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index e41cb11..fe8028d 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
 kvm_run *run)
   kvm_vgic_flush_hwstate(vcpu);
   kvm_timer_flush_hwstate(vcpu);
  
 + preempt_disable();
   local_irq_disable();
  
   /*
 @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
 kvm_run *run)
  
   if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
   local_irq_enable();
 + preempt_enable();
   kvm_timer_sync_hwstate(vcpu);
   kvm_vgic_sync_hwstate(vcpu);
   continue;
 @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
  
   vcpu-mode = OUTSIDE_GUEST_MODE;
 - __kvm_guest_exit();
 - trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 + /*
 +  * Back from guest
 +  */
 +
   /*
* We may have taken a host interrupt in HYP mode (ie
* while executing the guest). This interrupt is still
 @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   local_irq_enable();
  
   /*
 -  * Back from guest
 -  */
 +  * We do local_irq_enable() before calling kvm_guest_exit() so
 +  * that if a timer interrupt hits while running the guest we
 +  * account that tick as being spent in the guest.  We enable
 +  * preemption after calling kvm_guest_exit() so that if we get
 +  * preempted we make sure ticks after that is not counted as
 +  * guest time.
 +  */
 + kvm_guest_exit();
 + trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 + preempt_enable();
 +
  
   kvm_timer_sync_hwstate(vcpu);
   kvm_vgic_sync_hwstate(vcpu);
 

I've been thinking about this a bit more, and I wonder if we can
simplify it a bit. At the moment, we disable the interrupts around the
HYP entry. But now that you have introduced preempt_disable, it looks
like we move the local_irq_disable call to be just after
__kvm_guest_enter, and not bother with having such a long critical section.

This is possible because entering HYP mode automatically masks
interrupts, and we restore PSTATE on exception return.

I think this would slightly reduce the amount of code we run on the host
that gets accounted to the guest.

Thoughts?

Thanks,

M.

-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line unsubscribe 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-08 Thread Christoffer Dall
On Fri, Jun 05, 2015 at 05:24:07AM -0700, Mario Smarduch wrote:
 On 06/02/2015 02:27 AM, Christoffer Dall wrote:
  On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
  On 05/30/2015 11:59 PM, Christoffer Dall wrote:
  Hi Mario,
 
  On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
  On 05/28/2015 11:49 AM, Christoffer Dall wrote:
  Until now we have been calling kvm_guest_exit after re-enabling
  interrupts when we come back from the guest, but this has the
  unfortunate effect that CPU time accounting done in the context of timer
  interrupts occurring while the guest is running doesn't properly notice
  that the time since the last tick was spent in the guest.
 
  Inspired by the comment in the x86 code, move the kvm_guest_exit() call
  below the local_irq_enable() call and change __kvm_guest_exit() to
  kvm_guest_exit(), because we are now calling this function with
  interrupts enabled.  We have to now explicitly disable preemption and
  not enable preemption before we've called kvm_guest_exit(), since
  otherwise we could be preempted and everything happening before we
  eventually get scheduled again would be accounted for as guest time.
 
  At the same time, move the trace_kvm_exit() call outside of the atomic
  section, since there is no reason for us to do that with interrupts
  disabled.
 
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  This patch is based on kvm/queue, because it has the 
  kvm_guest_enter/exit
  rework recently posted by Christian Borntraeger.  I hope I got the logic
  of this right, there were 2 slightly worrying facts about this:
 
  First, we now enable and disable and enable interrupts on each exit
  path, but I couldn't see any performance overhead on hackbench - yes the
  only benchmark we care about.
 
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
  Changes since v1:
   - Tweak comment and commit text based on Marc's feedback.
   - Explicitly disable preemption and enable it only after 
  kvm_guest_exit().
 
   arch/arm/kvm/arm.c | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)
 
  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
  index e41cb11..fe8028d 100644
  --- a/arch/arm/kvm/arm.c
  +++ b/arch/arm/kvm/arm.c
  @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  kvm_vgic_flush_hwstate(vcpu);
  kvm_timer_flush_hwstate(vcpu);
   
  +   preempt_disable();
  local_irq_disable();
   
  /*
  @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
   
  if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
  local_irq_enable();
  +   preempt_enable();
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);
  continue;
  @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
   
  vcpu-mode = OUTSIDE_GUEST_MODE;
  -   __kvm_guest_exit();
  -   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), 
  *vcpu_pc(vcpu));
  +   /*
  +* Back from guest
  +
  */
  +
  /*
   * We may have taken a host interrupt in HYP mode (ie
   * while executing the guest). This interrupt is still
  @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  local_irq_enable();
   
  /*
  -* Back from guest
  -
  */
  +* We do local_irq_enable() before calling 
  kvm_guest_exit() so
  +* that if a timer interrupt hits while running the 
  guest we
  +* account that tick as being spent in the guest.  We 
  enable
  +* preemption after calling kvm_guest_exit() so that if 
  we get
  +* preempted we make sure ticks after that is not 
  counted as
  +* guest time.
  +*/
  +   kvm_guest_exit();
  +   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), 
  *vcpu_pc(vcpu));
  +   preempt_enable();
  +
   
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);
 
 
  Hi Christoffer,
   so currently we take a snap shot when we enter the guest
  (tsk-vtime_snap) and upon exit add the time we spent in
  the guest and update accrued time, which appears correct.
 
  not on 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-05 Thread Mario Smarduch
On 06/02/2015 02:27 AM, Christoffer Dall wrote:
 On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
 On 05/30/2015 11:59 PM, Christoffer Dall wrote:
 Hi Mario,

 On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
 On 05/28/2015 11:49 AM, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.

 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.

 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.

 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this right, there were 2 slightly worrying facts about this:

 First, we now enable and disable and enable interrupts on each exit
 path, but I couldn't see any performance overhead on hackbench - yes the
 only benchmark we care about.

 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.

 Changes since v1:
  - Tweak comment and commit text based on Marc's feedback.
  - Explicitly disable preemption and enable it only after 
 kvm_guest_exit().

  arch/arm/kvm/arm.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index e41cb11..fe8028d 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   kvm_vgic_flush_hwstate(vcpu);
   kvm_timer_flush_hwstate(vcpu);
  
 + preempt_disable();
   local_irq_disable();
  
   /*
 @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
  
   if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
   local_irq_enable();
 + preempt_enable();
   kvm_timer_sync_hwstate(vcpu);
   kvm_vgic_sync_hwstate(vcpu);
   continue;
 @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
  
   vcpu-mode = OUTSIDE_GUEST_MODE;
 - __kvm_guest_exit();
 - trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 + /*
 +  * Back from guest
 +  */
 +
   /*
* We may have taken a host interrupt in HYP mode (ie
* while executing the guest). This interrupt is still
 @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   local_irq_enable();
  
   /*
 -  * Back from guest
 -  */
 +  * We do local_irq_enable() before calling kvm_guest_exit() so
 +  * that if a timer interrupt hits while running the guest we
 +  * account that tick as being spent in the guest.  We enable
 +  * preemption after calling kvm_guest_exit() so that if we get
 +  * preempted we make sure ticks after that is not counted as
 +  * guest time.
 +  */
 + kvm_guest_exit();
 + trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 + preempt_enable();
 +
  
   kvm_timer_sync_hwstate(vcpu);
   kvm_vgic_sync_hwstate(vcpu);


 Hi Christoffer,
  so currently we take a snap shot when we enter the guest
 (tsk-vtime_snap) and upon exit add the time we spent in
 the guest and update accrued time, which appears correct.

 not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
 am I missing something obvious here?
 I see what you mean we can't use cycle based accounting to accrue
 Guest time.

 
 See other thread, we can enable this in the config but it still only
 works with NO_HZ_FULL.
 


 With this patch it appears that interrupts running
 in host mode are accrued to Guest time, and additional preemption
 latency is added.

 It is 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-02 Thread Christoffer Dall
On Mon, Jun 01, 2015 at 08:48:22AM -0700, Mario Smarduch wrote:
 On 05/30/2015 11:59 PM, Christoffer Dall wrote:
  Hi Mario,
  
  On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
  On 05/28/2015 11:49 AM, Christoffer Dall wrote:
  Until now we have been calling kvm_guest_exit after re-enabling
  interrupts when we come back from the guest, but this has the
  unfortunate effect that CPU time accounting done in the context of timer
  interrupts occurring while the guest is running doesn't properly notice
  that the time since the last tick was spent in the guest.
 
  Inspired by the comment in the x86 code, move the kvm_guest_exit() call
  below the local_irq_enable() call and change __kvm_guest_exit() to
  kvm_guest_exit(), because we are now calling this function with
  interrupts enabled.  We have to now explicitly disable preemption and
  not enable preemption before we've called kvm_guest_exit(), since
  otherwise we could be preempted and everything happening before we
  eventually get scheduled again would be accounted for as guest time.
 
  At the same time, move the trace_kvm_exit() call outside of the atomic
  section, since there is no reason for us to do that with interrupts
  disabled.
 
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
  rework recently posted by Christian Borntraeger.  I hope I got the logic
  of this right, there were 2 slightly worrying facts about this:
 
  First, we now enable and disable and enable interrupts on each exit
  path, but I couldn't see any performance overhead on hackbench - yes the
  only benchmark we care about.
 
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
  Changes since v1:
   - Tweak comment and commit text based on Marc's feedback.
   - Explicitly disable preemption and enable it only after 
  kvm_guest_exit().
 
   arch/arm/kvm/arm.c | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)
 
  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
  index e41cb11..fe8028d 100644
  --- a/arch/arm/kvm/arm.c
  +++ b/arch/arm/kvm/arm.c
  @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
kvm_vgic_flush_hwstate(vcpu);
kvm_timer_flush_hwstate(vcpu);
   
  + preempt_disable();
local_irq_disable();
   
/*
  @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
   
if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
local_irq_enable();
  + preempt_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
continue;
  @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
   
vcpu-mode = OUTSIDE_GUEST_MODE;
  - __kvm_guest_exit();
  - trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  + /*
  +  * Back from guest
  +  */
  +
/*
 * We may have taken a host interrupt in HYP mode (ie
 * while executing the guest). This interrupt is still
  @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
local_irq_enable();
   
/*
  -  * Back from guest
  -  */
  +  * We do local_irq_enable() before calling kvm_guest_exit() so
  +  * that if a timer interrupt hits while running the guest we
  +  * account that tick as being spent in the guest.  We enable
  +  * preemption after calling kvm_guest_exit() so that if we get
  +  * preempted we make sure ticks after that is not counted as
  +  * guest time.
  +  */
  + kvm_guest_exit();
  + trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  + preempt_enable();
  +
   
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
 
 
  Hi Christoffer,
   so currently we take a snap shot when we enter the guest
  (tsk-vtime_snap) and upon exit add the time we spent in
  the guest and update accrued time, which appears correct.
  
  not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
  am I missing something obvious here?
 I see what you mean we can't use cycle based accounting to accrue
 Guest time.
 

See other thread, we can enable this in the config but it still only
works with NO_HZ_FULL.

  
 
  With this patch it appears that interrupts running
  in host mode are accrued to 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-02 Thread Christoffer Dall
On Mon, Jun 01, 2015 at 03:37:32PM +0200, Christian Borntraeger wrote:
 Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
  On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
  Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
 
 
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
  Not an expert here, but I assume mips has the same logic as arm so if 
  your
  patch is right for arm its probably also for mips.
 
  powerpc looks similar to what s390 does (not using the tick, instead it 
  uses
  a hw-timer) so this should be fine.
 
  I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
  this for free which would avoid the need for this patch?
 
  Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
  HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
  - yes it might work out. Can you give it a try?
 
  Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
  no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
  like a fix since it would be a shame to force users to use this config
  option to report CPU usage correctly.
  
  I'm not entirely sure what the history and meaning behind these configs
  are, so maybe there is an entirely different rework needed here.  It
  seems logical that you could simply sample the counter at entry/exit of
  the guest, but if there is nowhere to store this data without
  NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?
 
 Given Paolos response that irq_disable/enable is faster than save/restore
 at least on x86 your v2 patch might actually be the right thing to do.
 
Thanks, I think so too, but we should enable
HAVE_VIRT_CPU_ACCOUNTING_GEN for arm64 as well, assuming there are no
mysterious side affects of doing so.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-02 Thread Christoffer Dall
[replying to myself]

On Tue, Jun 02, 2015 at 11:27:59AM +0200, Christoffer Dall wrote:

[..]

   
   If this patch is incorrect, then how does it work on x86, where
   handle_external_intr() is called (with a barrier in between) before
   kvm_guest_exit(), and where handle_external_intr() is simply
   local_irq_enable() on SVM and something more complicated on VMX ?
   
   Finally, how exactly is preemption latency added here?   Won't IRQ
   processing run with higher priority than any task on your system, so the
   order of (1) process pending IRQs (2) call schedule if needed is still
   preserved here, but we call kvm_guest_exit() between (1) and (2) instead
   of before (1).
  
  I may be missing something, but on return from interrupt with preempt
  disabled we can't take the need resched path. And need to return
  to KVM no?
 
 preempt_enable() will call __preempt_schedule() and cause preemption
 there, so you're talking about adding these lines of latency:
 
   kvm_guest_exit();
   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 
 And these were called with interrupts disabled before, so I don't see
 the issue??
 
 However, your question is making me think whether we have a race in the
 current code on fully preemptible kernels, if we get preempted before
 calling kvm_timer_sync_hwstate() and kvm_vgic_sync_hwstate(), then we
 could potentially schedule another vcpu on this core and loose/corrupt
 state, can we not?  We probably need to check for this in
 kvm_vcpu_load/kvm_vcpu_put.  I need to think more about if this is a
 real issue or if I'm seeing ghosts.
 
I've thought about it and I don't think there's a race because those
functions don't access the hardware directly, but only manipulate
per-vcpu data structures.

-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Paolo Bonzini


On 01/06/2015 13:42, Christian Borntraeger wrote:
 Am 01.06.2015 um 13:34 schrieb Paolo Bonzini:


 On 01/06/2015 09:47, Christian Borntraeger wrote:

 1: disable, guest, disable again and save, restore to disable, 
 enable
 and now it is
 2: disable, guest, enable
 and with your patch it is
 3: disable, guest, enable, disable, enable

 I assume that 3 and 1 are similar in its costs, so this is probably ok.

 At least on x86, 3 and 2 are similar, but 3 is much more expensive than
 1!  See https://lkml.org/lkml/2015/5/5/835:
 
 That does not make sense. If 3 and 2 are similar, then 2 must be much more
 expensive than 1 as well. As 2 is a strict subset of 1 it must be cheaper, no?

Yes, it must.  I meant 3 is much cheaper than 1.

Paolo

 Cost of: CLI insn  same-IF : 0 cycles
 Cost of: CLI insn  flip-IF : 0 cycles
 Cost of: STI insn  same-IF : 0 cycles
 Cost of: STI insn  flip-IF : 0 cycles
 Cost of: PUSHF   insn  : 0 cycles
 Cost of: POPFinsn  same-IF :20 cycles
 Cost of: POPFinsn  flip-IF :28 cycles
 Cost of: local_irq_save()fn:20 cycles
 Cost of: local_irq_restore() fnsame-IF :24 cycles
 Cost of: local_irq_restore() fnflip-IF :28 cycles
 Cost of: irq_save()+restore()fnsame-IF :48 cycles
 Cost of: irq_save()+restore()fnflip-IF :48 cycles
 
 Yes its similar on s390. local_irq_save/restore is noticable in guest exit
 hot loops (thats what inspired my patch), but a simple irq disable is
 just single cycle pipelined. Given the design of aggressive out-out order
 designs with all the architectural ordering this makes sense.
 
 Christian
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Christian Borntraeger
Am 01.06.2015 um 13:34 schrieb Paolo Bonzini:
 
 
 On 01/06/2015 09:47, Christian Borntraeger wrote:

 1: disable, guest, disable again and save, restore to disable, 
 enable
 and now it is
 2: disable, guest, enable
 and with your patch it is
 3: disable, guest, enable, disable, enable

 I assume that 3 and 1 are similar in its costs, so this is probably ok.
 
 At least on x86, 3 and 2 are similar, but 3 is much more expensive than
 1!  See https://lkml.org/lkml/2015/5/5/835:

That does not make sense. If 3 and 2 are similar, then 2 must be much more
expensive than 1 as well. As 2 is a strict subset of 1 it must be cheaper, no?


 
 Cost of: CLI insn  same-IF : 0 cycles
 Cost of: CLI insn  flip-IF : 0 cycles
 Cost of: STI insn  same-IF : 0 cycles
 Cost of: STI insn  flip-IF : 0 cycles
 Cost of: PUSHF   insn  : 0 cycles
 Cost of: POPFinsn  same-IF :20 cycles
 Cost of: POPFinsn  flip-IF :28 cycles
 Cost of: local_irq_save()fn:20 cycles
 Cost of: local_irq_restore() fnsame-IF :24 cycles
 Cost of: local_irq_restore() fnflip-IF :28 cycles
 Cost of: irq_save()+restore()fnsame-IF :48 cycles
 Cost of: irq_save()+restore()fnflip-IF :48 cycles

Yes its similar on s390. local_irq_save/restore is noticable in guest exit
hot loops (thats what inspired my patch), but a simple irq disable is
just single cycle pipelined. Given the design of aggressive out-out order
designs with all the architectural ordering this makes sense.

Christian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Paolo Bonzini


On 01/06/2015 09:47, Christian Borntraeger wrote:
 
 1: disable, guest, disable again and save, restore to disable, 
 enable
 and now it is
 2: disable, guest, enable
 and with your patch it is
 3: disable, guest, enable, disable, enable
 
 I assume that 3 and 1 are similar in its costs, so this is probably ok.

At least on x86, 3 and 2 are similar, but 3 is much more expensive than
1!  See https://lkml.org/lkml/2015/5/5/835:

Cost of: CLI insn  same-IF : 0 cycles
Cost of: CLI insn  flip-IF : 0 cycles
Cost of: STI insn  same-IF : 0 cycles
Cost of: STI insn  flip-IF : 0 cycles
Cost of: PUSHF   insn  : 0 cycles
Cost of: POPFinsn  same-IF :20 cycles
Cost of: POPFinsn  flip-IF :28 cycles
Cost of: local_irq_save()fn:20 cycles
Cost of: local_irq_restore() fnsame-IF :24 cycles
Cost of: local_irq_restore() fnflip-IF :28 cycles
Cost of: irq_save()+restore()fnsame-IF :48 cycles
Cost of: irq_save()+restore()fnflip-IF :48 cycles

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Christian Borntraeger
Am 28.05.2015 um 20:49 schrieb Christoffer Dall:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.

Can you verify that a CPU bound guest has almost zero guest time?
Assuming that your answer is yes your patch make sense as host
timer interrupts should be the only reasons for guest exits then.


 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.
 
 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this right, there were 2 slightly worrying facts about this:
 
 First, we now enable and disable and enable interrupts on each exit
 path, but I couldn't see any performance overhead on hackbench - yes the
 only benchmark we care about.

This should be somewhat similar to the situation before my patch.
There it was

1: disable, guest, disable again and save, restore to disable, enable
and now it is
2: disable, guest, enable
and with your patch it is
3: disable, guest, enable, disable, enable

I assume that 3 and 1 are similar in its costs, so this is probably ok.


 
 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.

Not an expert here, but I assume mips has the same logic as arm so if your
patch is right for arm its probably also for mips.

powerpc looks similar to what s390 does (not using the tick, instead it uses
a hw-timer) so this should be fine.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Christoffer Dall
On Mon, Jun 01, 2015 at 09:47:46AM +0200, Christian Borntraeger wrote:
 Am 28.05.2015 um 20:49 schrieb Christoffer Dall:
  Until now we have been calling kvm_guest_exit after re-enabling
  interrupts when we come back from the guest, but this has the
  unfortunate effect that CPU time accounting done in the context of timer
  interrupts occurring while the guest is running doesn't properly notice
  that the time since the last tick was spent in the guest.
 
 Can you verify that a CPU bound guest has almost zero guest time?
 Assuming that your answer is yes your patch make sense as host
 timer interrupts should be the only reasons for guest exits then.
 

Yes, 'cat /dev/urandom  /dev/null' in the guest shows up as 100% sys in
mpstat on the host, 0% guest.

 
  Inspired by the comment in the x86 code, move the kvm_guest_exit() call
  below the local_irq_enable() call and change __kvm_guest_exit() to
  kvm_guest_exit(), because we are now calling this function with
  interrupts enabled.  We have to now explicitly disable preemption and
  not enable preemption before we've called kvm_guest_exit(), since
  otherwise we could be preempted and everything happening before we
  eventually get scheduled again would be accounted for as guest time.
  
  At the same time, move the trace_kvm_exit() call outside of the atomic
  section, since there is no reason for us to do that with interrupts
  disabled.
  
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
  rework recently posted by Christian Borntraeger.  I hope I got the logic
  of this right, there were 2 slightly worrying facts about this:
  
  First, we now enable and disable and enable interrupts on each exit
  path, but I couldn't see any performance overhead on hackbench - yes the
  only benchmark we care about.
 
 This should be somewhat similar to the situation before my patch.
 There it was
 
 1: disable, guest, disable again and save, restore to disable, 
 enable
 and now it is
 2: disable, guest, enable
 and with your patch it is
 3: disable, guest, enable, disable, enable
 
 I assume that 3 and 1 are similar in its costs, so this is probably ok.
 
 
  
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
 Not an expert here, but I assume mips has the same logic as arm so if your
 patch is right for arm its probably also for mips.
 
 powerpc looks similar to what s390 does (not using the tick, instead it uses
 a hw-timer) so this should be fine.
 
I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
this for free which would avoid the need for this patch?

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Christian Borntraeger
Am 01.06.2015 um 11:08 schrieb Christoffer Dall:


 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.

 Not an expert here, but I assume mips has the same logic as arm so if your
 patch is right for arm its probably also for mips.

 powerpc looks similar to what s390 does (not using the tick, instead it uses
 a hw-timer) so this should be fine.

 I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
 this for free which would avoid the need for this patch?

Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
- yes it might work out. Can you give it a try?

Christian


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Mario Smarduch
On 05/30/2015 11:59 PM, Christoffer Dall wrote:
 Hi Mario,
 
 On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
 On 05/28/2015 11:49 AM, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.

 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.

 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.

 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this right, there were 2 slightly worrying facts about this:

 First, we now enable and disable and enable interrupts on each exit
 path, but I couldn't see any performance overhead on hackbench - yes the
 only benchmark we care about.

 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.

 Changes since v1:
  - Tweak comment and commit text based on Marc's feedback.
  - Explicitly disable preemption and enable it only after kvm_guest_exit().

  arch/arm/kvm/arm.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)

 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index e41cb11..fe8028d 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
 kvm_vgic_flush_hwstate(vcpu);
 kvm_timer_flush_hwstate(vcpu);
  
 +   preempt_disable();
 local_irq_disable();
  
 /*
 @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
  
 if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
 local_irq_enable();
 +   preempt_enable();
 kvm_timer_sync_hwstate(vcpu);
 kvm_vgic_sync_hwstate(vcpu);
 continue;
 @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
 ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
  
 vcpu-mode = OUTSIDE_GUEST_MODE;
 -   __kvm_guest_exit();
 -   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 +   /*
 +* Back from guest
 +*/
 +
 /*
  * We may have taken a host interrupt in HYP mode (ie
  * while executing the guest). This interrupt is still
 @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
 local_irq_enable();
  
 /*
 -* Back from guest
 -*/
 +* We do local_irq_enable() before calling kvm_guest_exit() so
 +* that if a timer interrupt hits while running the guest we
 +* account that tick as being spent in the guest.  We enable
 +* preemption after calling kvm_guest_exit() so that if we get
 +* preempted we make sure ticks after that is not counted as
 +* guest time.
 +*/
 +   kvm_guest_exit();
 +   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 +   preempt_enable();
 +
  
 kvm_timer_sync_hwstate(vcpu);
 kvm_vgic_sync_hwstate(vcpu);


 Hi Christoffer,
  so currently we take a snap shot when we enter the guest
 (tsk-vtime_snap) and upon exit add the time we spent in
 the guest and update accrued time, which appears correct.
 
 not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
 am I missing something obvious here?
I see what you mean we can't use cycle based accounting to accrue
Guest time.

 

 With this patch it appears that interrupts running
 in host mode are accrued to Guest time, and additional preemption
 latency is added.

 It is true that interrupt processing in host mode (if they hit on a CPU
 when it is running a guest) are accrued to guest time, but without this
 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Christoffer Dall
On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
 Am 01.06.2015 um 11:08 schrieb Christoffer Dall:
 
 
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
 
  Not an expert here, but I assume mips has the same logic as arm so if your
  patch is right for arm its probably also for mips.
 
  powerpc looks similar to what s390 does (not using the tick, instead it 
  uses
  a hw-timer) so this should be fine.
 
  I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
  this for free which would avoid the need for this patch?
 
 Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
 HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
 - yes it might work out. Can you give it a try?
 
Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
like a fix since it would be a shame to force users to use this config
option to report CPU usage correctly.

I'm not entirely sure what the history and meaning behind these configs
are, so maybe there is an entirely different rework needed here.  It
seems logical that you could simply sample the counter at entry/exit of
the guest, but if there is nowhere to store this data without
NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-06-01 Thread Christian Borntraeger
Am 01.06.2015 um 15:35 schrieb Christoffer Dall:
 On Mon, Jun 01, 2015 at 11:21:19AM +0200, Christian Borntraeger wrote:
 Am 01.06.2015 um 11:08 schrieb Christoffer Dall:


 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.

 Not an expert here, but I assume mips has the same logic as arm so if your
 patch is right for arm its probably also for mips.

 powerpc looks similar to what s390 does (not using the tick, instead it 
 uses
 a hw-timer) so this should be fine.

 I wonder if we can simply enable HAVE_VIRT_CPU_ACCOUNTING_GEN and get
 this for free which would avoid the need for this patch?

 Asssuming that HAVE_VIRT_CPU_ACCOUNTING_GEN behaves similar to 
 HAVE_VIRT_CPU_ACCOUNTING on s390/power in respect to not rely on ticks
 - yes it might work out. Can you give it a try?

 Adding HAVE_VIRT_CPU_ACCOUNTING_GEN to arch/arm64/Kconfig works, but has
 no effect unless you also enable CONFIG_NO_HZ_FULL, so that hardly feels
 like a fix since it would be a shame to force users to use this config
 option to report CPU usage correctly.
 
 I'm not entirely sure what the history and meaning behind these configs
 are, so maybe there is an entirely different rework needed here.  It
 seems logical that you could simply sample the counter at entry/exit of
 the guest, but if there is nowhere to store this data without
 NO_HZ_FULL+VIRT_CPU_ACCOUNTING_GEN then I guess that would be why?

Given Paolos response that irq_disable/enable is faster than save/restore
at least on x86 your v2 patch might actually be the right thing to do.

Christian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-05-31 Thread Christoffer Dall
Hi Mario,

On Fri, May 29, 2015 at 03:34:47PM -0700, Mario Smarduch wrote:
 On 05/28/2015 11:49 AM, Christoffer Dall wrote:
  Until now we have been calling kvm_guest_exit after re-enabling
  interrupts when we come back from the guest, but this has the
  unfortunate effect that CPU time accounting done in the context of timer
  interrupts occurring while the guest is running doesn't properly notice
  that the time since the last tick was spent in the guest.
  
  Inspired by the comment in the x86 code, move the kvm_guest_exit() call
  below the local_irq_enable() call and change __kvm_guest_exit() to
  kvm_guest_exit(), because we are now calling this function with
  interrupts enabled.  We have to now explicitly disable preemption and
  not enable preemption before we've called kvm_guest_exit(), since
  otherwise we could be preempted and everything happening before we
  eventually get scheduled again would be accounted for as guest time.
  
  At the same time, move the trace_kvm_exit() call outside of the atomic
  section, since there is no reason for us to do that with interrupts
  disabled.
  
  Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
  ---
  This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
  rework recently posted by Christian Borntraeger.  I hope I got the logic
  of this right, there were 2 slightly worrying facts about this:
  
  First, we now enable and disable and enable interrupts on each exit
  path, but I couldn't see any performance overhead on hackbench - yes the
  only benchmark we care about.
  
  Second, looking at the ppc and mips code, they seem to also call
  kvm_guest_exit() before enabling interrupts, so I don't understand how
  guest CPU time accounting works on those architectures.
  
  Changes since v1:
   - Tweak comment and commit text based on Marc's feedback.
   - Explicitly disable preemption and enable it only after kvm_guest_exit().
  
   arch/arm/kvm/arm.c | 21 +
   1 file changed, 17 insertions(+), 4 deletions(-)
  
  diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
  index e41cb11..fe8028d 100644
  --- a/arch/arm/kvm/arm.c
  +++ b/arch/arm/kvm/arm.c
  @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  kvm_vgic_flush_hwstate(vcpu);
  kvm_timer_flush_hwstate(vcpu);
   
  +   preempt_disable();
  local_irq_disable();
   
  /*
  @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
   
  if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
  local_irq_enable();
  +   preempt_enable();
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);
  continue;
  @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
   
  vcpu-mode = OUTSIDE_GUEST_MODE;
  -   __kvm_guest_exit();
  -   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  +   /*
  +* Back from guest
  +*/
  +
  /*
   * We may have taken a host interrupt in HYP mode (ie
   * while executing the guest). This interrupt is still
  @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
  struct kvm_run *run)
  local_irq_enable();
   
  /*
  -* Back from guest
  -*/
  +* We do local_irq_enable() before calling kvm_guest_exit() so
  +* that if a timer interrupt hits while running the guest we
  +* account that tick as being spent in the guest.  We enable
  +* preemption after calling kvm_guest_exit() so that if we get
  +* preempted we make sure ticks after that is not counted as
  +* guest time.
  +*/
  +   kvm_guest_exit();
  +   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
  +   preempt_enable();
  +
   
  kvm_timer_sync_hwstate(vcpu);
  kvm_vgic_sync_hwstate(vcpu);
  
 
 Hi Christoffer,
  so currently we take a snap shot when we enter the guest
 (tsk-vtime_snap) and upon exit add the time we spent in
 the guest and update accrued time, which appears correct.

not on arm64, because we don't select HAVE_VIRT_CPU_ACCOUNTING_GEN.  Or
am I missing something obvious here?

 
 With this patch it appears that interrupts running
 in host mode are accrued to Guest time, and additional preemption
 latency is added.
 
It is true that interrupt processing in host mode (if they hit on a CPU
when it is running a guest) are accrued to guest time, but without this
patch on current arm64 we 

Re: [PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-05-29 Thread Mario Smarduch
On 05/28/2015 11:49 AM, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts occurring while the guest is running doesn't properly notice
 that the time since the last tick was spent in the guest.
 
 Inspired by the comment in the x86 code, move the kvm_guest_exit() call
 below the local_irq_enable() call and change __kvm_guest_exit() to
 kvm_guest_exit(), because we are now calling this function with
 interrupts enabled.  We have to now explicitly disable preemption and
 not enable preemption before we've called kvm_guest_exit(), since
 otherwise we could be preempted and everything happening before we
 eventually get scheduled again would be accounted for as guest time.
 
 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this right, there were 2 slightly worrying facts about this:
 
 First, we now enable and disable and enable interrupts on each exit
 path, but I couldn't see any performance overhead on hackbench - yes the
 only benchmark we care about.
 
 Second, looking at the ppc and mips code, they seem to also call
 kvm_guest_exit() before enabling interrupts, so I don't understand how
 guest CPU time accounting works on those architectures.
 
 Changes since v1:
  - Tweak comment and commit text based on Marc's feedback.
  - Explicitly disable preemption and enable it only after kvm_guest_exit().
 
  arch/arm/kvm/arm.c | 21 +
  1 file changed, 17 insertions(+), 4 deletions(-)
 
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index e41cb11..fe8028d 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
 kvm_run *run)
   kvm_vgic_flush_hwstate(vcpu);
   kvm_timer_flush_hwstate(vcpu);
  
 + preempt_disable();
   local_irq_disable();
  
   /*
 @@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
 kvm_run *run)
  
   if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
   local_irq_enable();
 + preempt_enable();
   kvm_timer_sync_hwstate(vcpu);
   kvm_vgic_sync_hwstate(vcpu);
   continue;
 @@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
  
   vcpu-mode = OUTSIDE_GUEST_MODE;
 - __kvm_guest_exit();
 - trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 + /*
 +  * Back from guest
 +  */
 +
   /*
* We may have taken a host interrupt in HYP mode (ie
* while executing the guest). This interrupt is still
 @@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
 struct kvm_run *run)
   local_irq_enable();
  
   /*
 -  * Back from guest
 -  */
 +  * We do local_irq_enable() before calling kvm_guest_exit() so
 +  * that if a timer interrupt hits while running the guest we
 +  * account that tick as being spent in the guest.  We enable
 +  * preemption after calling kvm_guest_exit() so that if we get
 +  * preempted we make sure ticks after that is not counted as
 +  * guest time.
 +  */
 + kvm_guest_exit();
 + trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
 + preempt_enable();
 +
  
   kvm_timer_sync_hwstate(vcpu);
   kvm_vgic_sync_hwstate(vcpu);
 

Hi Christoffer,
 so currently we take a snap shot when we enter the guest
(tsk-vtime_snap) and upon exit add the time we spent in
the guest and update accrued time, which appears correct.

With this patch it appears that interrupts running
in host mode are accrued to Guest time, and additional preemption
latency is added.

Thanks,
- Mario

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-05-28 Thread Christoffer Dall
Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
rework recently posted by Christian Borntraeger.  I hope I got the logic
of this right, there were 2 slightly worrying facts about this:

First, we now enable and disable and enable interrupts on each exit
path, but I couldn't see any performance overhead on hackbench - yes the
only benchmark we care about.

Second, looking at the ppc and mips code, they seem to also call
kvm_guest_exit() before enabling interrupts, so I don't understand how
guest CPU time accounting works on those architectures.

Changes since v1:
 - Tweak comment and commit text based on Marc's feedback.
 - Explicitly disable preemption and enable it only after kvm_guest_exit().

 arch/arm/kvm/arm.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e41cb11..fe8028d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
kvm_vgic_flush_hwstate(vcpu);
kvm_timer_flush_hwstate(vcpu);
 
+   preempt_disable();
local_irq_disable();
 
/*
@@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
if (ret = 0 || need_new_vmid_gen(vcpu-kvm)) {
local_irq_enable();
+   preempt_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
continue;
@@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
vcpu-mode = OUTSIDE_GUEST_MODE;
-   __kvm_guest_exit();
-   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   /*
+* Back from guest
+*/
+
/*
 * We may have taken a host interrupt in HYP mode (ie
 * while executing the guest). This interrupt is still
@@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
local_irq_enable();
 
/*
-* Back from guest
-*/
+* We do local_irq_enable() before calling kvm_guest_exit() so
+* that if a timer interrupt hits while running the guest we
+* account that tick as being spent in the guest.  We enable
+* preemption after calling kvm_guest_exit() so that if we get
+* preempted we make sure ticks after that is not counted as
+* guest time.
+*/
+   kvm_guest_exit();
+   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   preempt_enable();
+
 
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
-- 
2.1.2.330.g565301e.dirty

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html