[patch v2 0/3] kvmclock: allow stable sched clock

2015-05-28 Thread Marcelo Tosatti
kvmclock provides the behaviour sched_clock users expect.
Mark it as stable allowing nohz_full in guests.
See individual patches for more details.

v2: address comments from Paolo:
- set COUNTS_FROM_ZERO unconditionally (host).
- rely on KVM_FEATURE_CLOCKSOURCE_STABLE_BIT (guest).


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/4] KVM: x86: Add KVM exit for IOAPIC EOIs

2015-05-28 Thread Steve Rutherford
On Wed, May 27, 2015 at 08:32:04AM +0300, Avi Kivity wrote:
 On 05/27/2015 05:06 AM, Steve Rutherford wrote:
 On Sun, May 24, 2015 at 07:46:03PM +0300, Avi Kivity wrote:
 On 05/13/2015 04:47 AM, Steve Rutherford wrote:
 Adds KVM_EXIT_IOAPIC_EOI which passes the interrupt vector up to
 userspace.
 
 Uses a per VCPU exit bitmap to decide whether or not the IOAPIC needs
 to be informed (which is identical to the EOI_EXIT_BITMAP field used
 by modern x86 processors, but can also be used to elide kvm IOAPIC EOI
 exits on older processors).
 
 [Note: A prototype using ResampleFDs found that decoupling the EOI
 from the VCPU's thread made it possible for the VCPU to not see a
 recent EOI after reentering the guest. This does not match real
 hardware.]
 
 Compile tested for Intel x86.
 
 Signed-off-by: Steve Rutherford srutherf...@google.com
 ---
   Documentation/virtual/kvm/api.txt | 10 ++++++++++
   arch/x86/include/asm/kvm_host.h   |  3 +++
   arch/x86/kvm/lapic.c              |  9 +++++++++
   arch/x86/kvm/x86.c                | 11 +++++++++++
   include/linux/kvm_host.h          |  1 +
   include/uapi/linux/kvm.h          |  5 +++++
   6 files changed, 39 insertions(+)
 
 diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index 0744b4e..dd92996 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -3285,6 +3285,16 @@ Valid values for 'type' are:
 */
__u64 kvm_valid_regs;
__u64 kvm_dirty_regs;
 +
 +  /* KVM_EXIT_IOAPIC_EOI */
 +struct {
 + __u8 vector;
 +} eoi;
 +
 +Indicates that an EOI of a level-triggered IOAPIC interrupt on the given
 +vector has occurred, which should be handled by the userspace IOAPIC.
 +This exit only triggers when the irqchip has been split between userspace
 +and the kernel.
 +
 The ioapic is a global resource, so it doesn't make sense for
 information about it to be returned in a per-vcpu structure
 EOI exits are a per-vcpu behavior, so this doesn't seem all that strange.
 
 (or to block the vcpu while it is being processed).
 Blocking doesn't feel clean, but doesn't seem all that bad, given
 that these operations are relatively rare on modern configurations.
 
 Agree, maybe the realtime people have an interest here.
 
 The way I'd model it is to emulate the APIC bus that connects local
 APICs and the IOAPIC, using a socket pair.  When the user-space
 ioapic wants to inject an interrupt, it sends a message to the local
 APICs which then inject it, and when it's ack'ed the EOI is sent
 back on the same bus.
 Although I'm not certain about this, it sounds to me like this would
 require a kernel thread to be waiting (in some way) on this socket, which
 seems rather heavy handed.
 
 It's been a while since I did kernel programming, but I think you
 can queue a callback to be called when an I/O is ready, and not
 require a thread.  IIRC we do that with irqfd to cause an interrupt
 to be injected.
 

This should be possible, but it's going to add a ton of complexity, and I don't 
really see any compelling benefits. If there is a compelling reason to switch 
to a socket based interface, I'm definitely willing to refactor.

Steve


[patch v2 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR

2015-05-28 Thread Marcelo Tosatti
Initialize kvmclock base, on kvmclock system MSR write time,
so that the guest sees kvmclock counting from zero.

This matches baremetal behaviour when kvmclock in guest
sets sched clock stable.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

---
 arch/x86/kvm/x86.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c 2015-05-28 19:18:12.621372286 -0300
+++ kvm/arch/x86/kvm/x86.c  2015-05-28 19:19:17.738268690 -0300
@@ -1700,6 +1700,8 @@
		vcpu->pvclock_set_guest_stopped_request = false;
}
 
+   /* pvclock counts from zero */
+   pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;
/* If the host uses TSC clocksource, then it is stable */
if (use_master_clock)
pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
@@ -2282,6 +2284,8 @@
			vcpu->requests);
 
		ka->boot_vcpu_runs_old_kvmclock = tmp;
+
+		ka->kvmclock_offset = -get_kernel_ns();
}
 
	vcpu->arch.time = data;




[patch v2 2/3] x86: kvmclock: set scheduler clock stable

2015-05-28 Thread Marcelo Tosatti
From: Luiz Capitulino lcapitul...@redhat.com

If you try to enable NOHZ_FULL on a guest today, you'll get
the following error when the guest tries to deactivate the
scheduler tick:

 WARNING: CPU: 3 PID: 2182 at kernel/time/tick-sched.c:192 can_stop_full_tick+0xb9/0x290()
 NO_HZ FULL will not work with unstable sched clock
 CPU: 3 PID: 2182 Comm: kworker/3:1 Not tainted 4.0.0-10545-gb9bb6fb #204
 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
 Workqueue: events flush_to_ldisc
  8162a0c7 88011f583e88 814e6ba0 0002
  88011f583ed8 88011f583ec8 8104d095 88011f583eb8
   0003 0001 0001
 Call Trace:
  <IRQ>  [<ffffffff814e6ba0>] dump_stack+0x4f/0x7b
  [<ffffffff8104d095>] warn_slowpath_common+0x85/0xc0
  [<ffffffff8104d146>] warn_slowpath_fmt+0x46/0x50
  [<ffffffff810bd2a9>] can_stop_full_tick+0xb9/0x290
  [<ffffffff810bd9ed>] tick_nohz_irq_exit+0x8d/0xb0
  [<ffffffff810511c5>] irq_exit+0xc5/0x130
  [<ffffffff814f180a>] smp_apic_timer_interrupt+0x4a/0x60
  [<ffffffff814eff5e>] apic_timer_interrupt+0x6e/0x80
  <EOI>  [<ffffffff814ee5d1>] ? _raw_spin_unlock_irqrestore+0x31/0x60
  [<ffffffff8108bbc8>] __wake_up+0x48/0x60
  [<ffffffff8134836c>] n_tty_receive_buf_common+0x49c/0xba0
  [<ffffffff8134a6bf>] ? tty_ldisc_ref+0x1f/0x70
  [<ffffffff81348a84>] n_tty_receive_buf2+0x14/0x20
  [<ffffffff8134b390>] flush_to_ldisc+0xe0/0x120
  [<ffffffff81064d05>] process_one_work+0x1d5/0x540
  [<ffffffff81064c81>] ? process_one_work+0x151/0x540
  [<ffffffff81065191>] worker_thread+0x121/0x470
  [<ffffffff81065070>] ? process_one_work+0x540/0x540
  [<ffffffff8106b4df>] kthread+0xef/0x110
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
  [<ffffffff814ef4f2>] ret_from_fork+0x42/0x70
  [<ffffffff8106b3f0>] ? __kthread_parkme+0xa0/0xa0
 ---[ end trace 06e3507544a38866 ]---

However, it turns out that kvmclock does provide a stable
sched_clock callback. So, let the scheduler know this which
in turn makes NOHZ_FULL work in the guest.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Luiz Capitulino lcapitul...@redhat.com

---
 arch/x86/kernel/kvmclock.c |   14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Index: kvm/arch/x86/kernel/kvmclock.c
===
--- kvm.orig/arch/x86/kernel/kvmclock.c 2015-05-27 18:00:53.616391551 -0300
+++ kvm/arch/x86/kernel/kvmclock.c  2015-05-28 19:19:13.878274636 -0300
@@ -24,6 +24,7 @@
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
 #include <linux/memblock.h>
+#include <linux/sched.h>
 
 #include <asm/x86_init.h>
 #include <asm/reboot.h>
@@ -217,8 +218,10 @@
 
 void __init kvmclock_init(void)
 {
+   struct pvclock_vcpu_time_info *vcpu_time;
unsigned long mem;
-   int size;
+   int size, cpu;
+   u8 flags;
 
size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS);
 
@@ -264,7 +267,14 @@
	pv_info.name = "KVM";
 
if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
-   pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+   pvclock_set_flags(~0);
+
+   cpu = get_cpu();
+   vcpu_time = hv_clock[cpu].pvti;
+   flags = pvclock_read_flags(vcpu_time);
+	if (flags & PVCLOCK_COUNTS_FROM_ZERO)
+   set_sched_clock_stable();
+   put_cpu();
 }
 
 int __init kvm_setup_vsyscall_timeinfo(void)




[PATCH v2] arm/arm64: KVM: Properly account for guest CPU time

2015-05-28 Thread Christoffer Dall
Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts occurring while the guest is running doesn't properly notice
that the time since the last tick was spent in the guest.

Inspired by the comment in the x86 code, move the kvm_guest_exit() call
below the local_irq_enable() call and change __kvm_guest_exit() to
kvm_guest_exit(), because we are now calling this function with
interrupts enabled.  We have to now explicitly disable preemption and
not enable preemption before we've called kvm_guest_exit(), since
otherwise we could be preempted and everything happening before we
eventually get scheduled again would be accounted for as guest time.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
rework recently posted by Christian Borntraeger.  I hope I got the logic
of this right, there were 2 slightly worrying facts about this:

First, we now enable and disable and enable interrupts on each exit
path, but I couldn't see any performance overhead on hackbench - yes the
only benchmark we care about.

Second, looking at the ppc and mips code, they seem to also call
kvm_guest_exit() before enabling interrupts, so I don't understand how
guest CPU time accounting works on those architectures.

Changes since v1:
 - Tweak comment and commit text based on Marc's feedback.
 - Explicitly disable preemption and enable it only after kvm_guest_exit().

 arch/arm/kvm/arm.c | 21 +
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e41cb11..fe8028d 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -532,6 +532,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
kvm_vgic_flush_hwstate(vcpu);
kvm_timer_flush_hwstate(vcpu);
 
+   preempt_disable();
local_irq_disable();
 
/*
@@ -544,6 +545,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
		if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
+   preempt_enable();
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
continue;
@@ -559,8 +561,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
		vcpu->mode = OUTSIDE_GUEST_MODE;
-   __kvm_guest_exit();
-   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   /*
+* Back from guest
+*/
+
/*
 * We may have taken a host interrupt in HYP mode (ie
 * while executing the guest). This interrupt is still
@@ -574,8 +578,17 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
local_irq_enable();
 
/*
-* Back from guest
-*/
+* We do local_irq_enable() before calling kvm_guest_exit() so
+* that if a timer interrupt hits while running the guest we
+* account that tick as being spent in the guest.  We enable
+* preemption after calling kvm_guest_exit() so that if we get
+* preempted we make sure ticks after that are not counted as
+* guest time.
+*/
+   kvm_guest_exit();
+   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   preempt_enable();
+
 
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
-- 
2.1.2.330.g565301e.dirty



[patch v2 1/3] x86: kvmclock: add flag to indicate pvclock counts from zero

2015-05-28 Thread Marcelo Tosatti
Setting sched clock stable for kvmclock causes the printk timestamps
to not start from zero, which is different from baremetal and 
can possibly break userspace. Add a flag to indicate that 
hypervisor sets clock base at zero when kvmclock is initialized.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

---
 arch/x86/include/asm/pvclock-abi.h |1 +
 1 file changed, 1 insertion(+)

Index: kvm/arch/x86/include/asm/pvclock-abi.h
===
--- kvm.orig/arch/x86/include/asm/pvclock-abi.h	2014-11-06 23:59:14.615913334 -0200
+++ kvm/arch/x86/include/asm/pvclock-abi.h	2015-05-27 17:40:53.435192771 -0300
@@ -41,5 +41,6 @@
 
 #define PVCLOCK_TSC_STABLE_BIT	(1 << 0)
 #define PVCLOCK_GUEST_STOPPED	(1 << 1)
+#define PVCLOCK_COUNTS_FROM_ZERO (1 << 2)
 #endif /* __ASSEMBLY__ */
 #endif /* _ASM_X86_PVCLOCK_ABI_H */




Re: [patch 2/3] x86: kvmclock: set scheduler clock stable

2015-05-28 Thread Paolo Bonzini


On 28/05/2015 03:46, Marcelo Tosatti wrote:
 + flags = PVCLOCK_COUNTS_FROM_ZERO;

If the KVM_FEATURE_CLOCKSOURCE_STABLE_BIT bit is not set, we cannot
trust flags at all.  So let's just do...

   if (kvm_para_has_feature(KVM_FEATURE_CLOCKSOURCE_STABLE_BIT))
 - pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
 + flags |= PVCLOCK_TSC_STABLE_BIT;

-   pvclock_set_flags(PVCLOCK_TSC_STABLE_BIT);
+   pvclock_set_flags(~0);

Otherwise looks good.  Shall I do the above change and apply?

Paolo

 + pvclock_set_flags(flags);
 +
 + cpu = get_cpu();
 + vcpu_time = hv_clock[cpu].pvti;
 + flags = pvclock_read_flags(vcpu_time);
 + if (flags & PVCLOCK_COUNTS_FROM_ZERO)
 + set_sched_clock_stable();
 + put_cpu();
  }


Re: [patch 3/3] KVM: x86: zero kvmclock_offset when vcpu0 initializes kvmclock system MSR

2015-05-28 Thread Paolo Bonzini


On 28/05/2015 03:47, Marcelo Tosatti wrote:
 Initialize kvmclock base, on kvmclock system MSR write time,
 so that the guest sees kvmclock counting from zero.
 
 This matches baremetal behaviour when kvmclock in guest
 sets sched clock stable.
 
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 ---
  arch/x86/kvm/x86.c |    5 +++++
  1 file changed, 5 insertions(+)
 
 Index: kvm/arch/x86/kvm/x86.c
 ===
 --- kvm.orig/arch/x86/kvm/x86.c   2015-05-27 17:40:46.948189811 -0300
 +++ kvm/arch/x86/kvm/x86.c2015-05-27 22:43:47.340413347 -0300
 @@ -1703,6 +1703,8 @@
   /* If the host uses TSC clocksource, then it is stable */
   if (use_master_clock)
   pvclock_flags |= PVCLOCK_TSC_STABLE_BIT;
 + if (ka->kvmclk_counts_from_zero)

Not defined anywhere.

 + pvclock_flags |= PVCLOCK_COUNTS_FROM_ZERO;

Why can't this be unconditional?

Paolo

 	vcpu->hv_clock.flags = pvclock_flags;
  
 @@ -2282,6 +2284,9 @@
 			vcpu->requests);
  
 		ka->boot_vcpu_runs_old_kvmclock = tmp;
 +
 + 	ka->kvmclock_offset = -get_kernel_ns();
 + 	ka->kvmclk_counts_from_zero = true;
   }
  
 	vcpu->arch.time = data;
 
 


Re: [PATCH v2 04/13] KVM: x86: API changes for SMM support

2015-05-28 Thread Paolo Bonzini


On 27/05/2015 19:05, Paolo Bonzini wrote:
 This patch includes changes to the external API for SMM support.
 All the changes are predicated by the availability of a new
 capability, KVM_CAP_X86_SMM, which is added at the end of the
 patch series.
 
 Signed-off-by: Paolo Bonzini pbonz...@redhat.com

Radim, I forgot to change flags to u8 in this patch.  However, I am
thinking that it's just as good to leave it as u16.  The reason is that
the other two fields in this 32-bit word were just 1-bit flags that were
expanded to 8 bits for no particular reasons.

If we make it now

u8 flags;
u8 padding2;

chances are that in due time it will become simply

u8 flags;
u8 flags2;

and then it's nicer to just have an u16.  On the other hand, your
argument that KVM_RUN has very little free space is also a good one.
What do you think?  I have already done the change in my local repo, but
I can change it back too.

Paolo

 ---
   RFC-v1: add smi.pending and smi.rsm_unmasks_nmi fields, reduce padding
in struct kvm_vcpu_events; remove memset of events struct,
instead zero smi.pad.  KVM_CAP_X86_SMM frozen at 117.
 ---
  Documentation/virtual/kvm/api.txt | 40 ++++++++++++++++++++++++++++++++++++--
  arch/x86/include/asm/kvm_host.h   |  3 +++
  arch/x86/include/uapi/asm/kvm.h   | 11 ++-
  arch/x86/kvm/kvm_cache_regs.h |  5 +
  arch/x86/kvm/x86.c| 34 +++--
  include/uapi/linux/kvm.h  |  5 -
  6 files changed, 88 insertions(+), 10 deletions(-)
 
  diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
 index 695544420ff2..51523b70b6b2 100644
 --- a/Documentation/virtual/kvm/api.txt
 +++ b/Documentation/virtual/kvm/api.txt
 @@ -820,11 +820,21 @@ struct kvm_vcpu_events {
   } nmi;
   __u32 sipi_vector;
   __u32 flags;
 + struct {
 + __u8 smm;
 + __u8 pending;
 + __u8 smm_inside_nmi;
 + __u8 pad;
 + } smi;
  };
  
 -KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
 -interrupt.shadow contains a valid state. Otherwise, this field is undefined.
 +Only two fields are defined in the flags field:
 +
 +- KVM_VCPUEVENT_VALID_SHADOW may be set in the flags field to signal that
 +  interrupt.shadow contains a valid state.
  
 +- KVM_VCPUEVENT_VALID_SMM may be set in the flags field to signal that
 +  smi contains a valid state.
  
  4.32 KVM_SET_VCPU_EVENTS
  
 @@ -841,17 +851,20 @@ vcpu.
  See KVM_GET_VCPU_EVENTS for the data structure.
  
  Fields that may be modified asynchronously by running VCPUs can be excluded
 -from the update. These fields are nmi.pending and sipi_vector. Keep the
 -corresponding bits in the flags field cleared to suppress overwriting the
 -current in-kernel state. The bits are:
 +from the update. These fields are nmi.pending, sipi_vector, smi.smm,
 +smi.pending. Keep the corresponding bits in the flags field cleared to
 +suppress overwriting the current in-kernel state. The bits are:
  
  KVM_VCPUEVENT_VALID_NMI_PENDING - transfer nmi.pending to the kernel
  KVM_VCPUEVENT_VALID_SIPI_VECTOR - transfer sipi_vector
 +KVM_VCPUEVENT_VALID_SMM - transfer the smi sub-struct.
  
  If KVM_CAP_INTR_SHADOW is available, KVM_VCPUEVENT_VALID_SHADOW can be set in
  the flags field to signal that interrupt.shadow contains a valid state and
  shall be written into the VCPU.
  
 +KVM_VCPUEVENT_VALID_SMM can only be set if KVM_CAP_X86_SMM is available.
 +
  
  4.33 KVM_GET_DEBUGREGS
  
  @@ -2979,6 +2992,16 @@ len must be a multiple of sizeof(struct kvm_s390_irq). It must be > 0
  and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
  which is the maximum number of possibly pending cpu-local interrupts.
  
 +4.90 KVM_SMI
 +
 +Capability: KVM_CAP_X86_SMM
 +Architectures: x86
 +Type: vcpu ioctl
 +Parameters: none
 +Returns: 0 on success, -1 on error
 +
 +Queues an SMI on the thread's vcpu.
 +
  5. The kvm_run structure
  
  
 @@ -3014,7 +3037,12 @@ an interrupt can be injected now with KVM_INTERRUPT.
  The value of the current interrupt flag.  Only valid if in-kernel
  local APIC is not used.
  
 - __u8 padding2[2];
 + __u16 flags;
 +
 +More architecture-specific flags detailing state of the VCPU that may
 +affect the device's behavior.  The only currently defined flag is
 +KVM_RUN_X86_SMM, which is valid on x86 machines and is set if the
 +VCPU is in system management mode.
  
   /* in (pre_kvm_run), out (post_kvm_run) */
   __u64 cr8;
 diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
 index 4e299fcd0eb6..d52d7aea375f 100644
 --- a/arch/x86/include/asm/kvm_host.h
 +++ b/arch/x86/include/asm/kvm_host.h
 @@ -471,6 +471,7 @@ struct kvm_vcpu_arch {
   atomic_t nmi_queued;  /* unprocessed asynchronous NMIs */
   unsigned nmi_pending; /* NMI 

[PATCH] KVM: arm64: fix misleading comments in save/restore

2015-05-28 Thread Alex Bennée
The elr_el2 and spsr_el2 registers in fact contain the processor state
before entry into the hypervisor code. In the case of guest state it
could be in either el0 or el1.

Signed-off-by: Alex Bennée alex.ben...@linaro.org
---
 arch/arm64/kvm/hyp.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index d755922..1940a4c 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -50,8 +50,8 @@
stp x29, lr, [x3, #80]
 
mrs x19, sp_el0
-   mrs x20, elr_el2// EL1 PC
-   mrs x21, spsr_el2   // EL1 pstate
+   mrs x20, elr_el2// PC before hyp entry
+   mrs x21, spsr_el2   // pstate before hyp entry
 
stp x19, x20, [x3, #96]
str x21, [x3, #112]
@@ -82,8 +82,8 @@
ldr x21, [x3, #16]
 
msr sp_el0, x19
-   msr elr_el2, x20// EL1 PC
-   msr spsr_el2, x21   // EL1 pstate
+   msr elr_el2, x20// PC to restore
+   msr spsr_el2, x21   // pstate to restore
 
add x3, x2, #CPU_XREG_OFFSET(19)
ldp x19, x20, [x3]
-- 
2.4.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] arm/arm64: KVM: Properly account for guest CPU time

2015-05-28 Thread Christoffer Dall
Until now we have been calling kvm_guest_exit after re-enabling
interrupts when we come back from the guest, but this has the
unfortunate effect that CPU time accounting done in the context of timer
interrupts doesn't properly notice that the time since the last tick was
spent in the guest.

Inspired by the comment in the x86 code, simply move the
kvm_guest_exit() call below the local_irq_enable() call and change
__kvm_guest_exit() to kvm_guest_exit(), because we are now calling this
function with interrupts enabled.  Note that AFAIU we don't need an
explicit barrier like x86 because the arm/arm64 implementation of
local_irq_(en/dis)able has an implicit barrier.

At the same time, move the trace_kvm_exit() call outside of the atomic
section, since there is no reason for us to do that with interrupts
disabled.

Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
---
This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
rework recently posted by Christian Borntraeger.  I hope I got the logic
of this wrong, there were 2 slightly worrying facts about this:

First, we now enable and disable and enable interrupts on each exit
path, but I couldn't see any performance overhead on hackbench - yes the
only benchmark we care about.

Second, looking at the power and mips code, they seem to also call
kvm_guest_exit() before enabling interrupts, so I don't understand how
guest CPU time accounting works on those architectures.

 arch/arm/kvm/arm.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index e41cb11..bd0e463 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -559,8 +559,10 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
ret = kvm_call_hyp(__kvm_vcpu_run, vcpu);
 
		vcpu->mode = OUTSIDE_GUEST_MODE;
-   __kvm_guest_exit();
-   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+   /*
+* Back from guest
+*/
+
/*
 * We may have taken a host interrupt in HYP mode (ie
 * while executing the guest). This interrupt is still
@@ -574,8 +576,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
local_irq_enable();
 
/*
-* Back from guest
-*/
+* We do local_irq_enable() before calling kvm_guest_exit so
+* that the cputime accounting done in the context of timer
+* interrupts properly accounts time spent in the guest as
+* guest time.
+*/
+   kvm_guest_exit();
+   trace_kvm_exit(kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu));
+
 
kvm_timer_sync_hwstate(vcpu);
kvm_vgic_sync_hwstate(vcpu);
-- 
2.1.2.330.g565301e.dirty



Re: [PATCH v2 04/13] KVM: x86: API changes for SMM support

2015-05-28 Thread Radim Krčmář
2015-05-28 11:00+0200, Paolo Bonzini:
 On 27/05/2015 19:05, Paolo Bonzini wrote:
  This patch includes changes to the external API for SMM support.
  All the changes are predicated by the availability of a new
  capability, KVM_CAP_X86_SMM, which is added at the end of the
  patch series.
  
  Signed-off-by: Paolo Bonzini pbonz...@redhat.com
 
 Radim, I forgot to change flags to u8 in this patch.  However, I am
 thinking that it's just as good to leave it as u16.  The reason is that
 the other two fields in this 32-bit word were just 1-bit flags that were
 expanded to 8 bits for no particular reasons.

(Yeah, it's sad that we have wasted all those bits.)

 If we make it now
 
   u8 flags;
   u8 padding2;
 
 chances are that in due time it will become simply
 
   u8 flags;
   u8 flags2;

True, I forgot that big endian exists, so we can't just say u16 later.
(u8 avoids endianness problems of C bit-fields, but I wouldn't have
 brought it up when u8 isn't strictly better ...)

 and then it's nicer to just have an u16.  On the other hand, your
 argument that KVM_RUN has very little free space is also a good one.

Thinking more about it ... we don't forbid multi-bit flags, so u16 can
be used fairly the same way.  (u16 would be a bit worse for an 8 bit
vector, but nicer for a 9 bit vector.)

 What do you think?  I have already done the change in my local repo, but
 I can change it back too.

I'd be glad if we were more bit-field friendly, but whatever is the
least work for you is the best :)


Re: [PATCH] arm/arm64: KVM: Properly account for guest CPU time

2015-05-28 Thread Christoffer Dall
On Thu, May 28, 2015 at 02:49:09PM +0200, Christoffer Dall wrote:
 Until now we have been calling kvm_guest_exit after re-enabling
 interrupts when we come back from the guest, but this has the
 unfortunate effect that CPU time accounting done in the context of timer
 interrupts doesn't properly notice that the time since the last tick was
 spent in the guest.
 
 Inspired by the comment in the x86 code, simply move the
 kvm_guest_exit() call below the local_irq_enable() call and change
 __kvm_guest_exit() to kvm_guest_exit(), because we are now calling this
 function with interrupts enabled.  Note that AFAIU we don't need an
 explicit barrier like x86 because the arm/arm64 implementation of
 local_irq_(en/dis)able has an implicit barrier.
 
 At the same time, move the trace_kvm_exit() call outside of the atomic
 section, since there is no reason for us to do that with interrupts
 disabled.
 
 Signed-off-by: Christoffer Dall christoffer.d...@linaro.org
 ---
 This patch is based on kvm/queue, because it has the kvm_guest_enter/exit
 rework recently posted by Christian Borntraeger.  I hope I got the logic
 of this wrong, there were 2 slightly worrying facts about this:

Of course this should have been:

I hope I got the logic of this *right*, but there...

Damn it!
-Christoffer