Re: [PATCH 2/2] kvm tools: inject MSI directly without going through a GSI

2012-08-05 Thread Sasha Levin
On 08/04/2012 01:02 PM, Pekka Enberg wrote:
> On 08/04/2012 11:14 AM, Pekka Enberg wrote:
>>> This patch makes 'make check' hang for me. Full boot log below:
> 
> On Sat, Aug 4, 2012 at 12:30 PM, Sasha Levin  wrote:
>> Is your host kernel running 3.5? The new MSI injection ioctl is a new 3.5 
>> feature.
> 
> No, it's not running 3.5. We need to support older *host* kernels, though.

Do we? Don't we need to support just the kernel that the tool was built with?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reset problem vs. MMIO emulation, hypercalls, etc...

2012-08-05 Thread Avi Kivity
On 08/02/2012 11:29 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2012-08-02 at 16:05 +0300, Avi Kivity wrote:
>> > Yeah, we stumbled over this chunk as well. So you're saying we
>> should delay the reset by invoking a self-signal if we're in such an
>> operation?
>> 
>> Yes.  Qemu of course already supports this for migration, so it should
>> be easy to add.
> 
> Adding a self signal for the CPU initiating the reset is not enough,
> other VCPUs might also be in an hcall or MMIO emulation when that
> happens.

That happens naturally if you update (or read) registers through a
run_on_cpu() call.  run_on_cpu() should never happen within an mmio
sequence.

> 
> It must be done for all VCPUs, so best is to look at the migration
> situation.
> 
> For reset, there are two code path at play:
> 
>  - The VCPU initiating the request: It calls qemu_system_reset_request()
> which calls cpu_stop_current() directly after signaling the main loop
> 
>  - The other VCPUs are then marked with the "stop" flag by the maintloop
> which will then wait for them to set "stopped" to 1, which is done by
> qemu_wait_io_event_common() when it sees "stop".
> 
> Now, it seems like suspend also uses that same technique. I don't
> totally grasp where migration fits in that picture and where it does the
> KVM_RUN with a signal pending trick to complete pending operations, any
> chance you can enlighten me ?

I'm afraid I no longer know the details so closely, the code has changed
quite a lot.  But the self-signal happens in kvm_cpu_exec(), see also
env->exit_request.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reset problem vs. MMIO emulation, hypercalls, etc...

2012-08-05 Thread Avi Kivity
On 08/04/2012 01:32 AM, Benjamin Herrenschmidt wrote:
> On Fri, 2012-08-03 at 15:05 -0300, Marcelo Tosatti wrote:
> 
>> See kvm_arch_process_async_events() call to qemu_system_reset_request()
>> in target-i386/kvm.c.
>> 
>> The whole thing is fragile, though: we rely on the order events
>> are processed inside KVM_RUN, in x86:
>> 
>> 1) If there is pending MMIO, process it.
>> 2) If not, return with -EINTR (and KVM_EXIT_INTR) in case
>> there is a signal pending.
>> 
>> That way, the vcpu will not process the stop event from the main loop
>> (ie not exit from the kvm_cpu_exec() loop), until MMIO is finished.
> 
> Right, it is fragile, thankfully we appear to adhere to the same
> ordering on powerpc so far :-)
> 
> So we'll need to test but it looks like we might be able to fix our
> problem without a kernel or API change, just by changing qemu to
> do the same exit_request trick for our reboot hypercall.
> 
> Long run however, I wonder whether we should consider an explicit ioctl
> to complete those pending operations instead...

It's pointless.  We have to support the old method forever.  There's no
material different between sigqueue() + KVM_RUN and KVM_COMPLETE, or a
KVM_RUN with a flag that tells it to exit immediately.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: inject MSI directly without going through a GSI

2012-08-05 Thread Pekka Enberg
On 08/04/2012 01:02 PM, Pekka Enberg wrote:
>> No, it's not running 3.5. We need to support older *host* kernels,
>> though.

On Sun, Aug 5, 2012 at 10:02 AM, Sasha Levin  wrote:
> Do we? Don't we need to support just the kernel that the tool was
> built with?

We only do that for *guest kernels* if we have to but we've always been
compatible with older host kernels.

Isn't there a capability flag that KVM sets if KVM_SIGNAL_MSI is
supported? Just store that in 'struct kvm" and switch between
virtio_pci__signal_msi() and kvm__irq_trigger() depending on wheter the
flag is set.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: inject MSI directly without going through a GSI

2012-08-05 Thread Sasha Levin
On 08/05/2012 11:08 AM, Pekka Enberg wrote:
> On 08/04/2012 01:02 PM, Pekka Enberg wrote:
>>> No, it's not running 3.5. We need to support older *host* kernels,
>>> though.
> 
> On Sun, Aug 5, 2012 at 10:02 AM, Sasha Levin  wrote:
>> Do we? Don't we need to support just the kernel that the tool was
>> built with?
> 
> We only do that for *guest kernels* if we have to but we've always been
> compatible with older host kernels.
> 
> Isn't there a capability flag that KVM sets if KVM_SIGNAL_MSI is
> supported? Just store that in 'struct kvm" and switch between
> virtio_pci__signal_msi() and kvm__irq_trigger() depending on wheter the
> flag is set.

There is, but we've broken backwards compatibility for guests several times 
before as well - which is why I assumed thats fine.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Nested kvm_intel broken on pre 3.3 hosts

2012-08-05 Thread Avi Kivity
On 08/03/2012 01:57 PM, Stefan Bader wrote:
>> No, you're backporting the entire feature.  All we need is to expose
>> RDPMC intercept to the guest.
> 
> Oh well, I thought that was the thing you asked for...

Sorry for being unclear.

> 
>> It should be sufficient to backport the bits in
>> nested_vmx_setup_ctls_msrs() and nested_vmx_exit_handled().
> 
> Ok, how about that? It is probably wrong again, but at least it
> allows to load the kvm-intel module from within a nested guest
> and not having the feature pretend to fail seems the closest
> thing to do...
> 
> ---
> 
> From 0aeb99348363b7aeb2b0bd92428cb212159fa468 Mon Sep 17 00:00:00 2001
> From: Stefan Bader 
> Date: Thu, 10 Nov 2011 14:57:25 +0200
> Subject: [PATCH] KVM: VMX: Fake intercept RDPMC
> 
> Based on commit fee84b079d5ddee2247b5c1f53162c330c622902 upstream.
> 
>   Intercept RDPMC and forward it to the PMU emulation code.
> 
> But drop the requirement for the feature being present and instead
> of forwarding, cause a GP as if the call had failed.
> 
> BugLink: http://bugs.launchpad.net/bugs/1031090
> Signed-off-by: Stefan Bader 
> ---
>  arch/x86/kvm/vmx.c |   10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 7315488..fc937f2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1956,6 +1956,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>  #endif
>   CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
>   CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
> + CPU_BASED_RDPMC_EXITING |
>   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
>   /*
>* We can allow some features even when not supported by the
> @@ -4613,6 +4614,14 @@ static int handle_invlpg(struct kvm_vcpu *vcpu)
>   return 1;
>  }
>  
> +static int handle_rdpmc(struct kvm_vcpu *vcpu)
> +{
> + /* Instead of implementing the feature, cause a GP */
> + kvm_complete_insn_gp(vcpu, 1);
> +
> + return 1;
> +}

In fact this should never be called, since we never request RDPMC
exiting for L1.

> +
>  static int handle_wbinvd(struct kvm_vcpu *vcpu)
>  {
>   skip_emulated_instruction(vcpu);
> @@ -5563,6 +5572,7 @@ static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu 
> *vcpu) = {
>   [EXIT_REASON_HLT] = handle_halt,
>   [EXIT_REASON_INVD]= handle_invd,
>   [EXIT_REASON_INVLPG]  = handle_invlpg,
> + [EXIT_REASON_RDPMC]   = handle_rdpmc,
>   [EXIT_REASON_VMCALL]  = handle_vmcall,
>   [EXIT_REASON_VMCLEAR] = handle_vmclear,
>   [EXIT_REASON_VMLAUNCH]= handle_vmlaunch,
> 

Provided you backport the bit in nested_vmx_exit_handled().  That takes
the L2->L1 RDPMC exit and forwards it to L1.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] kvm tools: inject MSI directly without going through a GSI

2012-08-05 Thread Pekka Enberg
On 08/05/2012 11:08 AM, Pekka Enberg wrote:
>> Isn't there a capability flag that KVM sets if KVM_SIGNAL_MSI is
>> supported? Just store that in 'struct kvm" and switch between
>> virtio_pci__signal_msi() and kvm__irq_trigger() depending on wheter the
>> flag is set.

On Sun, Aug 5, 2012 at 12:14 PM, Sasha Levin  wrote:
> There is, but we've broken backwards compatibility for guests several
> times before as well - which is why I assumed thats fine.

Let me be clear about this: I don't like breaking backwards
compatibility at all. Yes, we have done it in the past for *guest
kernels* for various technical reasons but we've never ever done it on
purpose for host kernels.

We have been more relaxed on backwards compatibility requirements than
QEMU but I think we're reaching a point where it's more painful to break
backwards compatibility than it is to write code to accommodate older
kernels. Especially for something like this which is obviously a new KVM
feature and not required for running Linux.

Pekka
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] update KVM_SAVE_MSRS_BEGIN to correct value

2012-08-05 Thread Avi Kivity
On 08/03/2012 09:08 PM, Marcelo Tosatti wrote:
> On Wed, Aug 01, 2012 at 05:01:42PM +0300, Gleb Natapov wrote:
>> When MSR_KVM_PV_EOI_EN was added to msrs_to_save array
>> KVM_SAVE_MSRS_BEGIN was not updated accordingly.
>> 
>> Signed-off-by: Gleb Natapov 
> 
> Applied, thanks.

This should go into 3.6-rc, no?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] update KVM_SAVE_MSRS_BEGIN to correct value

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 12:47:42PM +0300, Avi Kivity wrote:
> On 08/03/2012 09:08 PM, Marcelo Tosatti wrote:
> > On Wed, Aug 01, 2012 at 05:01:42PM +0300, Gleb Natapov wrote:
> >> When MSR_KVM_PV_EOI_EN was added to msrs_to_save array
> >> KVM_SAVE_MSRS_BEGIN was not updated accordingly.
> >> 
> >> Signed-off-by: Gleb Natapov 
> > 
> > Applied, thanks.
> 
> This should go into 3.6-rc, no?
If we want migration to work there then yes.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/6] arch/powerpc/kvm/e500_tlb.c: fix error return code

2012-08-05 Thread Julia Lawall
From: Julia Lawall 

Convert a 0 error return code to a negative one, as returned elsewhere in the
function.

A new label is also added to avoid freeing things that are known to not yet
be allocated.

A simplified version of the semantic match that finds the first problem is as
follows: (http://coccinelle.lip6.fr/)

// 
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = 
\(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// 

Signed-off-by: Julia Lawall 

---
 arch/powerpc/kvm/e500_tlb.c |   19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index c8f6c58..85fcdf7 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -1176,21 +1176,27 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
}
 
virt = vmap(pages, num_pages, VM_MAP, PAGE_KERNEL);
-   if (!virt)
+   if (!virt) {
+   ret = -ENOMEM;
goto err_put_page;
+   }
 
privs[0] = kzalloc(sizeof(struct tlbe_priv) * params.tlb_sizes[0],
   GFP_KERNEL);
privs[1] = kzalloc(sizeof(struct tlbe_priv) * params.tlb_sizes[1],
   GFP_KERNEL);
 
-   if (!privs[0] || !privs[1])
-   goto err_put_page;
+   if (!privs[0] || !privs[1]) {
+   ret = -ENOMEM;
+   goto err_privs;
+   }
 
g2h_bitmap = kzalloc(sizeof(u64) * params.tlb_sizes[1],
 GFP_KERNEL);
-   if (!g2h_bitmap)
-   goto err_put_page;
+   if (!g2h_bitmap) {
+   ret = -ENOMEM;
+   goto err_privs;
+   }
 
free_gtlb(vcpu_e500);
 
@@ -1230,10 +1236,11 @@ int kvm_vcpu_ioctl_config_tlb(struct kvm_vcpu *vcpu,
kvmppc_recalc_tlb1map_range(vcpu_e500);
return 0;
 
-err_put_page:
+err_privs:
kfree(privs[0]);
kfree(privs[1]);
 
+err_put_page:
for (i = 0; i < num_pages; i++)
put_page(pages[i]);
 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM segfaults with 3.5 while installing ubuntu 12.04

2012-08-05 Thread Stefan Priebe

Am 01.08.2012 11:53, schrieb Avi Kivity:

On 08/01/2012 12:42 PM, Stefan Priebe - Profihost AG wrote:

Am 01.08.2012 11:33, schrieb Avi Kivity:

So here are 3 backtraces from booting the rescue system:
http://pastebin.com/raw.php?i=xCy2pEcP

To me they all look the same.


They are.  What version of qemu are you using?


latest stable-1.1 branch (1.1.1) - which works fine with latest RHEL6
kernel.


This could be due to a kernel bug, or due to a different code path taken
in qemu because of differing features exposed to kvm.

Please try qemu-kvm.git master and report.


qemu-kvm.git master just hangs after the 2nd screen with using 100% CPU 
and doing nothing... so i can't test with qemu-kvm.master


Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM segfaults with 3.5 while installing ubuntu 12.04

2012-08-05 Thread Avi Kivity
On 08/05/2012 01:08 PM, Stefan Priebe wrote:
> Am 01.08.2012 11:53, schrieb Avi Kivity:
>> On 08/01/2012 12:42 PM, Stefan Priebe - Profihost AG wrote:
>>> Am 01.08.2012 11:33, schrieb Avi Kivity:
> So here are 3 backtraces from booting the rescue system:
> http://pastebin.com/raw.php?i=xCy2pEcP
>
> To me they all look the same.

 They are.  What version of qemu are you using?
>>>
>>> latest stable-1.1 branch (1.1.1) - which works fine with latest RHEL6
>>> kernel.
>>
>> This could be due to a kernel bug, or due to a different code path taken
>> in qemu because of differing features exposed to kvm.
>>
>> Please try qemu-kvm.git master and report.
> 
> qemu-kvm.git master just hangs after the 2nd screen with using 100% CPU
> and doing nothing... so i can't test with qemu-kvm.master

Please provide your command line and I will try to reproduce.

Which iso image are you running?

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Gleb Natapov
APIC code has a lot of checks for apic presence and apic HW/SW enable
state.  Most common configuration is when each vcpu has in kernel apic
and it is fully enabled. This path series uses jump labels to turn checks
to nops in the common case. 

Gleb Natapov (8):
  KVM: clean up kvm_(set|get)_apic_base
  KVM: use kvm_lapic_set_base() to change apic_base
  KVM: mark apic enabled on start up.
  Export jump_label_rate_limit()
  KVM: use jump label to optimize checking for HW enabled APIC in
APIC_BASE MSR.
  KVM: use jump label to optimize checking for SW enabled apic in
spurious interrupt register
  KVM: use jump label to optimize checking for in kernel local apic
presence.
  KVM: inline kvm_apic_present() and kvm_lapic_enabled()

 arch/x86/kvm/lapic.c |  211 +++---
 arch/x86/kvm/lapic.h |   46 ++-
 arch/x86/kvm/x86.c   |   18 ++---
 arch/x86/kvm/x86.h   |1 +
 kernel/jump_label.c  |1 +
 5 files changed, 170 insertions(+), 107 deletions(-)

-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/8] KVM: clean up kvm_(set|get)_apic_base

2012-08-05 Thread Gleb Natapov
kvm_get_apic_base() needlessly checks irqchip_in_kernel although it does
the same no matter what result of the check is. kvm_set_apic_base() also
checks for irqchip_in_kernel, but kvm_lapic_set_base() can handle this
case.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/x86.c |   10 ++
 1 file changed, 2 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 530d800..f339b2c 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -246,20 +246,14 @@ static void drop_user_return_notifiers(void *ignore)
 
 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu)
 {
-   if (irqchip_in_kernel(vcpu->kvm))
-   return vcpu->arch.apic_base;
-   else
-   return vcpu->arch.apic_base;
+   return vcpu->arch.apic_base;
 }
 EXPORT_SYMBOL_GPL(kvm_get_apic_base);
 
 void kvm_set_apic_base(struct kvm_vcpu *vcpu, u64 data)
 {
/* TODO: reserve bits check */
-   if (irqchip_in_kernel(vcpu->kvm))
-   kvm_lapic_set_base(vcpu, data);
-   else
-   vcpu->arch.apic_base = data;
+   kvm_lapic_set_base(vcpu, data);
 }
 EXPORT_SYMBOL_GPL(kvm_set_apic_base);
 
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/8] KVM: use kvm_lapic_set_base() to change apic_base

2012-08-05 Thread Gleb Natapov
Do not change apic_base directly. Use kvm_lapic_set_base() instead.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0cd431c..49f4ac0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1185,7 +1185,8 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
update_divide_count(apic);
atomic_set(&apic->lapic_timer.pending, 0);
if (kvm_vcpu_is_bsp(vcpu))
-   vcpu->arch.apic_base |= MSR_IA32_APICBASE_BSP;
+   kvm_lapic_set_base(vcpu,
+   vcpu->arch.apic_base | MSR_IA32_APICBASE_BSP);
vcpu->arch.pv_eoi.msr_val = 0;
apic_update_ppr(apic);
 
@@ -1310,8 +1311,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 HRTIMER_MODE_ABS);
apic->lapic_timer.timer.function = apic_timer_fn;
 
-   apic->base_address = APIC_DEFAULT_PHYS_BASE;
-   vcpu->arch.apic_base = APIC_DEFAULT_PHYS_BASE;
+   kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE);
 
kvm_lapic_reset(vcpu);
kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
@@ -1380,8 +1380,7 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
 
-   apic->base_address = vcpu->arch.apic_base &
-MSR_IA32_APICBASE_BASE;
+   kvm_lapic_set_base(vcpu, vcpu->arch.apic_base);
kvm_apic_set_version(vcpu);
 
apic_update_ppr(apic);
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/8] KVM: mark apic enabled on start up.

2012-08-05 Thread Gleb Natapov
According to SDM apic is enabled on start up.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 49f4ac0..c3f14fe 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1311,7 +1311,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 HRTIMER_MODE_ABS);
apic->lapic_timer.timer.function = apic_timer_fn;
 
-   kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE);
+   kvm_lapic_set_base(vcpu,
+   APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
 
kvm_lapic_reset(vcpu);
kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 5/8] KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE MSR.

2012-08-05 Thread Gleb Natapov
Usually all APICs are HW enabled so the check can be optimized out.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |   29 -
 arch/x86/kvm/lapic.h |1 +
 arch/x86/kvm/x86.c   |1 +
 3 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c3f14fe..1aa5528 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -34,6 +34,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "kvm_cache_regs.h"
 #include "irq.h"
 #include "trace.h"
@@ -117,9 +118,13 @@ static inline int __apic_test_and_clear_vector(int vec, 
void *bitmap)
return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
 }
 
+struct static_key_deferred apic_hw_disabled __read_mostly;
+
 static inline int apic_hw_enabled(struct kvm_lapic *apic)
 {
-   return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
+   if (static_key_false(&apic_hw_disabled.key))
+   return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
+   return MSR_IA32_APICBASE_ENABLE;
 }
 
 static inline int  apic_sw_enabled(struct kvm_lapic *apic)
@@ -1055,6 +1060,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu)
 
hrtimer_cancel(&vcpu->arch.apic->lapic_timer.timer);
 
+   if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
+   static_key_slow_dec_deferred(&apic_hw_disabled);
+
if (vcpu->arch.apic->regs)
free_page((unsigned long)vcpu->arch.apic->regs);
 
@@ -1125,6 +1133,14 @@ void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value)
return;
}
 
+   /* update jump label if enable bit changes */
+   if ((vcpu->arch.apic_base ^ value) & MSR_IA32_APICBASE_ENABLE) {
+   if (value & MSR_IA32_APICBASE_ENABLE)
+   static_key_slow_dec_deferred(&apic_hw_disabled);
+   else
+   static_key_slow_inc(&apic_hw_disabled.key);
+   }
+
if (!kvm_vcpu_is_bsp(apic->vcpu))
value &= ~MSR_IA32_APICBASE_BSP;
 
@@ -1311,6 +1327,11 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 HRTIMER_MODE_ABS);
apic->lapic_timer.timer.function = apic_timer_fn;
 
+   /*
+* APIC is created enabled. This will prevent kvm_lapic_set_base from
+* thinking that APIC satet has changed.
+*/ 
+   vcpu->arch.apic_base = MSR_IA32_APICBASE_ENABLE;
kvm_lapic_set_base(vcpu,
APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
 
@@ -1598,3 +1619,9 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 
data)
return kvm_gfn_to_hva_cache_init(vcpu->kvm, &vcpu->arch.pv_eoi.data,
 addr);
 }
+
+void kvm_lapic_init(void)
+{
+   /* do not patch jump label more than once per second */
+   jump_label_rate_limit(&apic_hw_disabled, HZ);
+}
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 166766f..73fa299 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -78,4 +78,5 @@ static inline bool kvm_hv_vapic_assist_page_enabled(struct 
kvm_vcpu *vcpu)
 }
 
 int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data);
+void kvm_lapic_init(void);
 #endif
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index f339b2c..12812db 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4908,6 +4908,7 @@ int kvm_arch_init(void *opaque)
if (cpu_has_xsave)
host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 
+   kvm_lapic_init();
return 0;
 
 out:
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/8] KVM: use jump label to optimize checking for SW enabled apic in spurious interrupt register

2012-08-05 Thread Gleb Natapov
Usually all APICs are SW enabled so the check can be optimized out.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |   39 +++
 1 file changed, 31 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 1aa5528..952774c 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -127,9 +127,24 @@ static inline int apic_hw_enabled(struct kvm_lapic *apic)
return MSR_IA32_APICBASE_ENABLE;
 }
 
-static inline int  apic_sw_enabled(struct kvm_lapic *apic)
+struct static_key_deferred apic_sw_disabled __read_mostly;
+
+static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
+{
+   if ((apic_get_reg(apic, APIC_SPIV) ^ val) & APIC_SPIV_APIC_ENABLED) {
+   if (val & APIC_SPIV_APIC_ENABLED)
+   static_key_slow_dec_deferred(&apic_sw_disabled);
+   else
+   static_key_slow_inc(&apic_sw_disabled.key);
+   }
+   apic_set_reg(apic, APIC_SPIV, val);
+}
+
+static inline int apic_sw_enabled(struct kvm_lapic *apic)
 {
-   return apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_APIC_ENABLED;
+   if (static_key_false(&apic_sw_disabled.key))
+   return apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_APIC_ENABLED;
+   return APIC_SPIV_APIC_ENABLED;
 }
 
 static inline int apic_enabled(struct kvm_lapic *apic)
@@ -918,7 +933,7 @@ static int apic_reg_write(struct kvm_lapic *apic, u32 reg, 
u32 val)
u32 mask = 0x3ff;
if (apic_get_reg(apic, APIC_LVR) & APIC_LVR_DIRECTED_EOI)
mask |= APIC_SPIV_DIRECTED_EOI;
-   apic_set_reg(apic, APIC_SPIV, val & mask);
+   apic_set_spiv(apic, val & mask);
if (!(val & APIC_SPIV_APIC_ENABLED)) {
int i;
u32 lvt_val;
@@ -1055,18 +1070,23 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 
 void kvm_free_lapic(struct kvm_vcpu *vcpu)
 {
+   struct kvm_lapic *apic = vcpu->arch.apic;
+
if (!vcpu->arch.apic)
return;
 
-   hrtimer_cancel(&vcpu->arch.apic->lapic_timer.timer);
+   hrtimer_cancel(&apic->lapic_timer.timer);
 
if (!(vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE))
static_key_slow_dec_deferred(&apic_hw_disabled);
 
-   if (vcpu->arch.apic->regs)
-   free_page((unsigned long)vcpu->arch.apic->regs);
+   if (!(apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_APIC_ENABLED))
+   static_key_slow_dec_deferred(&apic_sw_disabled);
 
-   kfree(vcpu->arch.apic);
+   if (apic->regs)
+   free_page((unsigned long)apic->regs);
+
+   kfree(apic);
 }
 
 /*
@@ -1182,7 +1202,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 SET_APIC_DELIVERY_MODE(0, APIC_MODE_EXTINT));
 
apic_set_reg(apic, APIC_DFR, 0xU);
-   apic_set_reg(apic, APIC_SPIV, 0xff);
+   apic_set_spiv(apic, 0xff);
apic_set_reg(apic, APIC_TASKPRI, 0);
apic_set_reg(apic, APIC_LDR, 0);
apic_set_reg(apic, APIC_ESR, 0);
@@ -1335,6 +1355,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
kvm_lapic_set_base(vcpu,
APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
 
+   static_key_slow_inc(&apic_sw_disabled.key); /* sw disabled at reset */
kvm_lapic_reset(vcpu);
kvm_iodevice_init(&apic->dev, &apic_mmio_ops);
 
@@ -1404,6 +1425,7 @@ void kvm_apic_post_state_restore(struct kvm_vcpu *vcpu)
 
kvm_lapic_set_base(vcpu, vcpu->arch.apic_base);
kvm_apic_set_version(vcpu);
+   apic_set_spiv(apic, apic_get_reg(apic, APIC_SPIV));
 
apic_update_ppr(apic);
hrtimer_cancel(&apic->lapic_timer.timer);
@@ -1624,4 +1646,5 @@ void kvm_lapic_init(void)
 {
/* do not patch jump label more than once per second */
jump_label_rate_limit(&apic_hw_disabled, HZ);
+   jump_label_rate_limit(&apic_sw_disabled, HZ);
 }
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 8/8] KVM: inline kvm_apic_present() and kvm_lapic_enabled()

2012-08-05 Thread Gleb Natapov
Those functions are used during interrupt injection. When inlined they
become nops on the fast path.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |  143 +++---
 arch/x86/kvm/lapic.h |   45 +++-
 2 files changed, 96 insertions(+), 92 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index c97284b..f019ca2 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -73,11 +73,6 @@
 static unsigned int min_timer_period_us = 500;
 module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR);
 
-static inline u32 apic_get_reg(struct kvm_lapic *apic, int reg_off)
-{
-   return *((u32 *) (apic->regs + reg_off));
-}
-
 static inline void apic_set_reg(struct kvm_lapic *apic, int reg_off, u32 val)
 {
*((u32 *) (apic->regs + reg_off)) = val;
@@ -119,19 +114,11 @@ static inline int __apic_test_and_clear_vector(int vec, 
void *bitmap)
 }
 
 struct static_key_deferred apic_hw_disabled __read_mostly;
-
-static inline int apic_hw_enabled(struct kvm_lapic *apic)
-{
-   if (static_key_false(&apic_hw_disabled.key))
-   return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
-   return MSR_IA32_APICBASE_ENABLE;
-}
-
 struct static_key_deferred apic_sw_disabled __read_mostly;
 
 static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val)
 {
-   if ((apic_get_reg(apic, APIC_SPIV) ^ val) & APIC_SPIV_APIC_ENABLED) {
+   if ((kvm_apic_get_reg(apic, APIC_SPIV) ^ val) & APIC_SPIV_APIC_ENABLED) 
{
if (val & APIC_SPIV_APIC_ENABLED)
static_key_slow_dec_deferred(&apic_sw_disabled);
else
@@ -140,23 +127,9 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, 
u32 val)
apic_set_reg(apic, APIC_SPIV, val);
 }
 
-static inline int apic_sw_enabled(struct kvm_lapic *apic)
-{
-   if (static_key_false(&apic_sw_disabled.key))
-   return apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_APIC_ENABLED;
-   return APIC_SPIV_APIC_ENABLED;
-}
-
 static inline int apic_enabled(struct kvm_lapic *apic)
 {
-   return apic_sw_enabled(apic) && apic_hw_enabled(apic);
-}
-
-static inline bool vcpu_has_lapic(struct kvm_vcpu *vcpu)
-{
-   if (static_key_false(&kvm_no_apic_vcpu))
-   return vcpu->arch.apic;
-   return true;
+   return kvm_apic_sw_enabled(apic) && kvm_apic_hw_enabled(apic);
 }
 
 #define LVT_MASK   \
@@ -168,34 +141,34 @@ static inline bool vcpu_has_lapic(struct kvm_vcpu *vcpu)
 
 static inline int kvm_apic_id(struct kvm_lapic *apic)
 {
-   return (apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
+   return (kvm_apic_get_reg(apic, APIC_ID) >> 24) & 0xff;
 }
 
 static inline int apic_lvt_enabled(struct kvm_lapic *apic, int lvt_type)
 {
-   return !(apic_get_reg(apic, lvt_type) & APIC_LVT_MASKED);
+   return !(kvm_apic_get_reg(apic, lvt_type) & APIC_LVT_MASKED);
 }
 
 static inline int apic_lvt_vector(struct kvm_lapic *apic, int lvt_type)
 {
-   return apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK;
+   return kvm_apic_get_reg(apic, lvt_type) & APIC_VECTOR_MASK;
 }
 
 static inline int apic_lvtt_oneshot(struct kvm_lapic *apic)
 {
-   return ((apic_get_reg(apic, APIC_LVTT) &
+   return ((kvm_apic_get_reg(apic, APIC_LVTT) &
apic->lapic_timer.timer_mode_mask) == APIC_LVT_TIMER_ONESHOT);
 }
 
 static inline int apic_lvtt_period(struct kvm_lapic *apic)
 {
-   return ((apic_get_reg(apic, APIC_LVTT) &
+   return ((kvm_apic_get_reg(apic, APIC_LVTT) &
apic->lapic_timer.timer_mode_mask) == APIC_LVT_TIMER_PERIODIC);
 }
 
 static inline int apic_lvtt_tscdeadline(struct kvm_lapic *apic)
 {
-   return ((apic_get_reg(apic, APIC_LVTT) &
+   return ((kvm_apic_get_reg(apic, APIC_LVTT) &
apic->lapic_timer.timer_mode_mask) ==
APIC_LVT_TIMER_TSCDEADLINE);
 }
@@ -211,7 +184,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *feat;
u32 v = APIC_VERSION;
 
-   if (!vcpu_has_lapic(vcpu))
+   if (!kvm_vcpu_has_lapic(vcpu))
return;
 
feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
@@ -319,7 +292,7 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 * will cause vmexit immediately and the value will be recalculated
 * on the next vmentry.
 */
-   if (!vcpu_has_lapic(vcpu))
+   if (!kvm_vcpu_has_lapic(vcpu))
return 0;
highest_irr = apic_find_highest_irr(vcpu->arch.apic);
 
@@ -404,8 +377,8 @@ static void apic_update_ppr(struct kvm_lapic *apic)
u32 tpr, isrv, ppr, old_ppr;
int isr;
 
-   old_ppr = apic_get_reg(apic, APIC_PROCPRI);
-   tpr = apic_get_reg(apic, APIC_TASKPRI);
+   old_ppr = kvm_apic_get_reg(apic, APIC_PROCPRI);
+   tpr = kvm_apic_get_reg(apic, APIC_TASKPRI);
isr = apic_find_highest_isr(apic);
isrv = (isr != -1) ? i

[PATCH 7/8] KVM: use jump label to optimize checking for in kernel local apic presence.

2012-08-05 Thread Gleb Natapov
Usually all vcpus have local apic pointer initialized, so the check may
be completely skipped.

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |   62 +++---
 arch/x86/kvm/x86.c   |7 +-
 arch/x86/kvm/x86.h   |1 +
 3 files changed, 41 insertions(+), 29 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 952774c..c97284b 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -152,6 +152,13 @@ static inline int apic_enabled(struct kvm_lapic *apic)
return apic_sw_enabled(apic) && apic_hw_enabled(apic);
 }
 
+static inline bool vcpu_has_lapic(struct kvm_vcpu *vcpu)
+{
+   if (static_key_false(&kvm_no_apic_vcpu))
+   return vcpu->arch.apic;
+   return true;
+}
+
 #define LVT_MASK   \
(APIC_LVT_MASKED | APIC_SEND_PENDING | APIC_VECTOR_MASK)
 
@@ -204,7 +211,7 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu)
struct kvm_cpuid_entry2 *feat;
u32 v = APIC_VERSION;
 
-   if (!irqchip_in_kernel(vcpu->kvm))
+   if (!vcpu_has_lapic(vcpu))
return;
 
feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0);
@@ -305,7 +312,6 @@ static inline void apic_clear_isr(int vec, struct kvm_lapic 
*apic)
 
 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 {
-   struct kvm_lapic *apic = vcpu->arch.apic;
int highest_irr;
 
/* This may race with setting of irr in __apic_accept_irq() and
@@ -313,9 +319,9 @@ int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu)
 * will cause vmexit immediately and the value will be recalculated
 * on the next vmentry.
 */
-   if (!apic)
+   if (!vcpu_has_lapic(vcpu))
return 0;
-   highest_irr = apic_find_highest_irr(apic);
+   highest_irr = apic_find_highest_irr(vcpu->arch.apic);
 
return highest_irr;
 }
@@ -1061,9 +1067,7 @@ static int apic_mmio_write(struct kvm_io_device *this,
 
 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu)
 {
-   struct kvm_lapic *apic = vcpu->arch.apic;
-
-   if (apic)
+   if (vcpu_has_lapic(vcpu))
apic_reg_write(vcpu->arch.apic, APIC_EOI, 0);
 }
 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
@@ -1098,10 +1102,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu)
 u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
-   if (!apic)
-   return 0;
 
-   if (apic_lvtt_oneshot(apic) || apic_lvtt_period(apic))
+   if (!vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) ||
+   apic_lvtt_period(apic))
return 0;
 
return apic->lapic_timer.tscdeadline;
@@ -1110,10 +1113,9 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu)
 void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
-   if (!apic)
-   return;
 
-   if (apic_lvtt_oneshot(apic) || apic_lvtt_period(apic))
+   if (!vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) ||
+   apic_lvtt_period(apic))
return;
 
hrtimer_cancel(&apic->lapic_timer.timer);
@@ -1125,20 +1127,21 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned 
long cr8)
 {
struct kvm_lapic *apic = vcpu->arch.apic;
 
-   if (!apic)
+   if (!vcpu_has_lapic(vcpu))
return;
+
apic_set_tpr(apic, ((cr8 & 0x0f) << 4)
 | (apic_get_reg(apic, APIC_TASKPRI) & 4));
 }
 
 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu)
 {
-   struct kvm_lapic *apic = vcpu->arch.apic;
u64 tpr;
 
-   if (!apic)
+   if (!vcpu_has_lapic(vcpu))
return 0;
-   tpr = (u64) apic_get_reg(apic, APIC_TASKPRI);
+
+   tpr = (u64) apic_get_reg(vcpu->arch.apic, APIC_TASKPRI);
 
return (tpr & 0xf0) >> 4;
 }
@@ -1237,7 +1240,7 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 
 bool kvm_apic_present(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.apic && apic_hw_enabled(vcpu->arch.apic);
+   return vcpu_has_lapic(vcpu) && apic_hw_enabled(vcpu->arch.apic);
 }
 
 int kvm_lapic_enabled(struct kvm_vcpu *vcpu)
@@ -1258,10 +1261,11 @@ static bool lapic_is_periodic(struct kvm_lapic *apic)
 
 int apic_has_pending_timer(struct kvm_vcpu *vcpu)
 {
-   struct kvm_lapic *lapic = vcpu->arch.apic;
+   struct kvm_lapic *apic = vcpu->arch.apic;
 
-   if (lapic && apic_enabled(lapic) && apic_lvt_enabled(lapic, APIC_LVTT))
-   return atomic_read(&lapic->lapic_timer.pending);
+   if (vcpu_has_lapic(vcpu) && apic_enabled(apic) &&
+   apic_lvt_enabled(apic, APIC_LVTT))
+   return atomic_read(&apic->lapic_timer.pending);
 
return 0;
 }
@@ -1371,7 +1375,7 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
struct kvm_lapic *apic = vcpu->arch.apic;
int highest_irr;
 
-   if (!apic || !apic_enabled(apic))
+

[PATCH 4/8] Export jump_label_rate_limit()

2012-08-05 Thread Gleb Natapov
CC: Jason Baron 
CC: Ingo Molnar 
CC: Peter Zijlstra 
Signed-off-by: Gleb Natapov 
---
 kernel/jump_label.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 4304919..60f48fa 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -118,6 +118,7 @@ void jump_label_rate_limit(struct static_key_deferred *key,
key->timeout = rl;
INIT_DELAYED_WORK(&key->work, jump_label_update_timeout);
 }
+EXPORT_SYMBOL_GPL(jump_label_rate_limit);
 
 static int addr_conflict(struct jump_entry *entry, void *start, void *end)
 {
-- 
1.7.10

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Avi Kivity
On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> APIC code has a lot of checks for apic presence and apic HW/SW enable
> state.  Most common configuration is when each vcpu has in kernel apic
> and it is fully enabled. This path series uses jump labels to turn checks
> to nops in the common case. 
> 
> Gleb Natapov (8):
>   KVM: clean up kvm_(set|get)_apic_base
>   KVM: use kvm_lapic_set_base() to change apic_base
>   KVM: mark apic enabled on start up.
>   Export jump_label_rate_limit()
>   KVM: use jump label to optimize checking for HW enabled APIC in
> APIC_BASE MSR.
>   KVM: use jump label to optimize checking for SW enabled apic in
> spurious interrupt register
>   KVM: use jump label to optimize checking for in kernel local apic
> presence.
>   KVM: inline kvm_apic_present() and kvm_lapic_enabled()

Neat.

During guest boot up, some of these jump keys will change, no?  Does
this mean a stop_machine() or equivalent?  I'm worried about real-time
response or one guest being affected by another.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 04:33:02PM +0300, Avi Kivity wrote:
> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> > APIC code has a lot of checks for apic presence and apic HW/SW enable
> > state.  Most common configuration is when each vcpu has in kernel apic
> > and it is fully enabled. This path series uses jump labels to turn checks
> > to nops in the common case. 
> > 
> > Gleb Natapov (8):
> >   KVM: clean up kvm_(set|get)_apic_base
> >   KVM: use kvm_lapic_set_base() to change apic_base
> >   KVM: mark apic enabled on start up.
> >   Export jump_label_rate_limit()
> >   KVM: use jump label to optimize checking for HW enabled APIC in
> > APIC_BASE MSR.
> >   KVM: use jump label to optimize checking for SW enabled apic in
> > spurious interrupt register
> >   KVM: use jump label to optimize checking for in kernel local apic
> > presence.
> >   KVM: inline kvm_apic_present() and kvm_lapic_enabled()
> 
> Neat.
> 
> During guest boot up, some of these jump keys will change, no?  Does
> this mean a stop_machine() or equivalent?  I'm worried about real-time
> response or one guest being affected by another.
> 
Yes, SW enable bit changes during boot. The jump label triggerable by a
guest are rate limited though. So stop machine will not happen more then
once per second even with malicious guests.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Avi Kivity
On 08/05/2012 04:35 PM, Gleb Natapov wrote:
> On Sun, Aug 05, 2012 at 04:33:02PM +0300, Avi Kivity wrote:
>> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
>> > APIC code has a lot of checks for apic presence and apic HW/SW enable
>> > state.  Most common configuration is when each vcpu has in kernel apic
>> > and it is fully enabled. This path series uses jump labels to turn checks
>> > to nops in the common case. 
>> > 
>> > Gleb Natapov (8):
>> >   KVM: clean up kvm_(set|get)_apic_base
>> >   KVM: use kvm_lapic_set_base() to change apic_base
>> >   KVM: mark apic enabled on start up.
>> >   Export jump_label_rate_limit()
>> >   KVM: use jump label to optimize checking for HW enabled APIC in
>> > APIC_BASE MSR.
>> >   KVM: use jump label to optimize checking for SW enabled apic in
>> > spurious interrupt register
>> >   KVM: use jump label to optimize checking for in kernel local apic
>> > presence.
>> >   KVM: inline kvm_apic_present() and kvm_lapic_enabled()
>> 
>> Neat.
>> 
>> During guest boot up, some of these jump keys will change, no?  Does
>> this mean a stop_machine() or equivalent?  I'm worried about real-time
>> response or one guest being affected by another.
>> 
> Yes, SW enable bit changes during boot. The jump label triggerable by a
> guest are rate limited though. So stop machine will not happen more then
> once per second even with malicious guests.

I'm not talking about a malicious guest, just a guest that is booting up
normally but kills real-time response for another guest (or just induces
a large hiccup in a non-real-time guest, but we don't guarantee anything
for those).

We don't support real-time guests now, but Jan has plans.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 04:42:17PM +0300, Avi Kivity wrote:
> On 08/05/2012 04:35 PM, Gleb Natapov wrote:
> > On Sun, Aug 05, 2012 at 04:33:02PM +0300, Avi Kivity wrote:
> >> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> >> > APIC code has a lot of checks for apic presence and apic HW/SW enable
> >> > state.  Most common configuration is when each vcpu has in kernel apic
> >> > and it is fully enabled. This path series uses jump labels to turn checks
> >> > to nops in the common case. 
> >> > 
> >> > Gleb Natapov (8):
> >> >   KVM: clean up kvm_(set|get)_apic_base
> >> >   KVM: use kvm_lapic_set_base() to change apic_base
> >> >   KVM: mark apic enabled on start up.
> >> >   Export jump_label_rate_limit()
> >> >   KVM: use jump label to optimize checking for HW enabled APIC in
> >> > APIC_BASE MSR.
> >> >   KVM: use jump label to optimize checking for SW enabled apic in
> >> > spurious interrupt register
> >> >   KVM: use jump label to optimize checking for in kernel local apic
> >> > presence.
> >> >   KVM: inline kvm_apic_present() and kvm_lapic_enabled()
> >> 
> >> Neat.
> >> 
> >> During guest boot up, some of these jump keys will change, no?  Does
> >> this mean a stop_machine() or equivalent?  I'm worried about real-time
> >> response or one guest being affected by another.
> >> 
> > Yes, SW enable bit changes during boot. The jump label triggerable by a
> > guest are rate limited though. So stop machine will not happen more then
> > once per second even with malicious guests.
> 
> I'm not talking about a malicious guest, just a guest that is booting up
> normally but kills real-time response for another guest (or just induces
> a large hiccup in a non-real-time guest, but we don't guarantee anything
> for those).
> 
> We don't support real-time guests now, but Jan has plans.
> 
For such setup jump labels have to be compiled out from the kernel
completely. Anything that calls stop_machine does not play well with
real time.

Guest can cause stop machine on boot today already by detecting PMU and
configuring NMI watchdog.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Avi Kivity
On 08/05/2012 04:48 PM, Gleb Natapov wrote:
 >>
>> >> During guest boot up, some of these jump keys will change, no?  Does
>> >> this mean a stop_machine() or equivalent?  I'm worried about real-time
>> >> response or one guest being affected by another.
>> >> 
>> > Yes, SW enable bit changes during boot. The jump label triggerable by a
>> > guest are rate limited though. So stop machine will not happen more then
>> > once per second even with malicious guests.
>> 
>> I'm not talking about a malicious guest, just a guest that is booting up
>> normally but kills real-time response for another guest (or just induces
>> a large hiccup in a non-real-time guest, but we don't guarantee anything
>> for those).
>> 
>> We don't support real-time guests now, but Jan has plans.
>> 
> For such setup jump labels have to be compiled out from the kernel
> completely. Anything that calls stop_machine does not play well with
> real time.
> 
> Guest can cause stop machine on boot today already by detecting PMU and
> configuring NMI watchdog.

The host can prevent this by leaving disabling the guest pmu.  But
disabling jump labels for real-time kernels may be acceptable too.  We
can probably to it at run time by forcing the slow path at all times.
-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 05:00:37PM +0300, Avi Kivity wrote:
> On 08/05/2012 04:48 PM, Gleb Natapov wrote:
>  >>
> >> >> During guest boot up, some of these jump keys will change, no?  Does
> >> >> this mean a stop_machine() or equivalent?  I'm worried about real-time
> >> >> response or one guest being affected by another.
> >> >> 
> >> > Yes, SW enable bit changes during boot. The jump label triggerable by a
> >> > guest are rate limited though. So stop machine will not happen more then
> >> > once per second even with malicious guests.
> >> 
> >> I'm not talking about a malicious guest, just a guest that is booting up
> >> normally but kills real-time response for another guest (or just induces
> >> a large hiccup in a non-real-time guest, but we don't guarantee anything
> >> for those).
> >> 
> >> We don't support real-time guests now, but Jan has plans.
> >> 
> > For such setup jump labels have to be compiled out from the kernel
> > completely. Anything that calls stop_machine does not play well with
> > real time.
> > 
> > Guest can cause stop machine on boot today already by detecting PMU and
> > configuring NMI watchdog.
> 
> The host can prevent this by leaving disabling the guest pmu.  But
> disabling jump labels for real-time kernels may be acceptable too.  We
> can probably to it at run time by forcing the slow path at all times.
Yes, it is possible to add module option that will force slow path if
needed.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: mark apic enabled on start up.

2012-08-05 Thread Avi Kivity
On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> According to SDM apic is enabled on start up.
> 
> Signed-off-by: Gleb Natapov 
> ---
>  arch/x86/kvm/lapic.c |3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index 49f4ac0..c3f14fe 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -1311,7 +1311,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>HRTIMER_MODE_ABS);
>   apic->lapic_timer.timer.function = apic_timer_fn;
>  
> - kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE);
> + kvm_lapic_set_base(vcpu,
> + APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
>  

I'm guessing that in practice this is a no-op since qemu reset involves
programming this MSR.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/8] Export jump_label_rate_limit()

2012-08-05 Thread Avi Kivity
On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> CC: Jason Baron 
> CC: Ingo Molnar 
> CC: Peter Zijlstra 
> Signed-off-by: Gleb Natapov 
> ---
>  kernel/jump_label.c |1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 4304919..60f48fa 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -118,6 +118,7 @@ void jump_label_rate_limit(struct static_key_deferred 
> *key,
>   key->timeout = rl;
>   INIT_DELAYED_WORK(&key->work, jump_label_update_timeout);
>  }
> +EXPORT_SYMBOL_GPL(jump_label_rate_limit);
>  

Jason, please ack this if it is acceptable.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: mark apic enabled on start up.

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 05:14:59PM +0300, Avi Kivity wrote:
> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> > According to SDM apic is enabled on start up.
> > 
> > Signed-off-by: Gleb Natapov 
> > ---
> >  arch/x86/kvm/lapic.c |3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index 49f4ac0..c3f14fe 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -1311,7 +1311,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
> >  HRTIMER_MODE_ABS);
> > apic->lapic_timer.timer.function = apic_timer_fn;
> >  
> > -   kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE);
> > +   kvm_lapic_set_base(vcpu,
> > +   APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
> >  
> 
> I'm guessing that in practice this is a no-op since qemu reset involves
> programming this MSR.
> 
Qemu is not the only userspace :) But this is here to avoid unneeded
jump label patching truth to be told.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE MSR.

2012-08-05 Thread Avi Kivity
On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> Usually all APICs are HW enabled so the check can be optimized out.
> 
> Signed-off-by: Gleb Natapov 
> ---
>  arch/x86/kvm/lapic.c |   29 -
>  arch/x86/kvm/lapic.h |1 +
>  arch/x86/kvm/x86.c   |1 +
>  3 files changed, 30 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index c3f14fe..1aa5528 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include "kvm_cache_regs.h"
>  #include "irq.h"
>  #include "trace.h"
> @@ -117,9 +118,13 @@ static inline int __apic_test_and_clear_vector(int vec, 
> void *bitmap)
>   return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
>  }
>  
> +struct static_key_deferred apic_hw_disabled __read_mostly;

On top of file please.  Add all_ to the name to make it clear we're
talking about all apics.

> +
>  static inline int apic_hw_enabled(struct kvm_lapic *apic)
>  {
> - return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> + if (static_key_false(&apic_hw_disabled.key))
> + return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;

Hm, for the test to be readable, it needs to be

   if (static_key_false(&all_apics_hw_enabled))




-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: mark apic enabled on start up.

2012-08-05 Thread Avi Kivity
On 08/05/2012 05:17 PM, Gleb Natapov wrote:
> On Sun, Aug 05, 2012 at 05:14:59PM +0300, Avi Kivity wrote:
>> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
>> > According to SDM apic is enabled on start up.
>> > 
>> > Signed-off-by: Gleb Natapov 
>> > ---
>> >  arch/x86/kvm/lapic.c |3 ++-
>> >  1 file changed, 2 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> > index 49f4ac0..c3f14fe 100644
>> > --- a/arch/x86/kvm/lapic.c
>> > +++ b/arch/x86/kvm/lapic.c
>> > @@ -1311,7 +1311,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
>> > HRTIMER_MODE_ABS);
>> >apic->lapic_timer.timer.function = apic_timer_fn;
>> >  
>> > -  kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE);
>> > +  kvm_lapic_set_base(vcpu,
>> > +  APIC_DEFAULT_PHYS_BASE | MSR_IA32_APICBASE_ENABLE);
>> >  
>> 
>> I'm guessing that in practice this is a no-op since qemu reset involves
>> programming this MSR.
>> 
> Qemu is not the only userspace :) But this is here to avoid unneeded
> jump label patching truth to be told.

It's also correct, otherwise it wouldn't be here.

But won't the patching happen anyway? since it starts software disabled?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE MSR.

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 05:35:21PM +0300, Avi Kivity wrote:
> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> > Usually all APICs are HW enabled so the check can be optimized out.
> > 
> > Signed-off-by: Gleb Natapov 
> > ---
> >  arch/x86/kvm/lapic.c |   29 -
> >  arch/x86/kvm/lapic.h |1 +
> >  arch/x86/kvm/x86.c   |1 +
> >  3 files changed, 30 insertions(+), 1 deletion(-)
> > 
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index c3f14fe..1aa5528 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -34,6 +34,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include "kvm_cache_regs.h"
> >  #include "irq.h"
> >  #include "trace.h"
> > @@ -117,9 +118,13 @@ static inline int __apic_test_and_clear_vector(int 
> > vec, void *bitmap)
> > return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
> >  }
> >  
> > +struct static_key_deferred apic_hw_disabled __read_mostly;
> 
> On top of file please.  Add all_ to the name to make it clear we're
> talking about all apics.
> 
This is count of disabled apics really. So I think all_apic_hw_disabled
is misleading.

> > +
> >  static inline int apic_hw_enabled(struct kvm_lapic *apic)
> >  {
> > -   return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> > +   if (static_key_false(&apic_hw_disabled.key))
> > +   return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> 
> Hm, for the test to be readable, it needs to be
> 
>if (static_key_false(&all_apics_hw_enabled))
> 
Exactly. all_ makes it so because apic_hw_disabled is a counter that
counts disabled apics. So may be call it global_hw_disabled_apic_counter?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/8] KVM: mark apic enabled on start up.

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 05:39:33PM +0300, Avi Kivity wrote:
> On 08/05/2012 05:17 PM, Gleb Natapov wrote:
> > On Sun, Aug 05, 2012 at 05:14:59PM +0300, Avi Kivity wrote:
> >> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> >> > According to SDM apic is enabled on start up.
> >> > 
> >> > Signed-off-by: Gleb Natapov 
> >> > ---
> >> >  arch/x86/kvm/lapic.c |3 ++-
> >> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> > index 49f4ac0..c3f14fe 100644
> >> > --- a/arch/x86/kvm/lapic.c
> >> > +++ b/arch/x86/kvm/lapic.c
> >> > @@ -1311,7 +1311,8 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
> >> >   HRTIMER_MODE_ABS);
> >> >  apic->lapic_timer.timer.function = apic_timer_fn;
> >> >  
> >> > -kvm_lapic_set_base(vcpu, APIC_DEFAULT_PHYS_BASE);
> >> > +kvm_lapic_set_base(vcpu,
> >> > +APIC_DEFAULT_PHYS_BASE | 
> >> > MSR_IA32_APICBASE_ENABLE);
> >> >  
> >> 
> >> I'm guessing that in practice this is a no-op since qemu reset involves
> >> programming this MSR.
> >> 
> > Qemu is not the only userspace :) But this is here to avoid unneeded
> > jump label patching truth to be told.
> 
> It's also correct, otherwise it wouldn't be here.
> 
Of course.

> But won't the patching happen anyway? since it starts software disabled?
> 
Yes, but different label, so less patching.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE MSR.

2012-08-05 Thread Avi Kivity
On 08/05/2012 05:42 PM, Gleb Natapov wrote:
> On Sun, Aug 05, 2012 at 05:35:21PM +0300, Avi Kivity wrote:
>> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
>> > Usually all APICs are HW enabled so the check can be optimized out.
>> > 
>> > Signed-off-by: Gleb Natapov 
>> > ---
>> >  arch/x86/kvm/lapic.c |   29 -
>> >  arch/x86/kvm/lapic.h |1 +
>> >  arch/x86/kvm/x86.c   |1 +
>> >  3 files changed, 30 insertions(+), 1 deletion(-)
>> > 
>> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> > index c3f14fe..1aa5528 100644
>> > --- a/arch/x86/kvm/lapic.c
>> > +++ b/arch/x86/kvm/lapic.c
>> > @@ -34,6 +34,7 @@
>> >  #include 
>> >  #include 
>> >  #include 
>> > +#include 
>> >  #include "kvm_cache_regs.h"
>> >  #include "irq.h"
>> >  #include "trace.h"
>> > @@ -117,9 +118,13 @@ static inline int __apic_test_and_clear_vector(int 
>> > vec, void *bitmap)
>> >return __test_and_clear_bit(VEC_POS(vec), (bitmap) + REG_POS(vec));
>> >  }
>> >  
>> > +struct static_key_deferred apic_hw_disabled __read_mostly;
>> 
>> On top of file please.  Add all_ to the name to make it clear we're
>> talking about all apics.
>> 
> This is count of disabled apics really. So I think all_apic_hw_disabled
> is misleading.
> 
>> > +
>> >  static inline int apic_hw_enabled(struct kvm_lapic *apic)
>> >  {
>> > -  return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
>> > +  if (static_key_false(&apic_hw_disabled.key))
>> > +  return apic->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
>> 
>> Hm, for the test to be readable, it needs to be
>> 
>>if (static_key_false(&all_apics_hw_enabled))
>> 
> Exactly. all_ makes it so because apic_hw_disabled is a counter that
> counts disabled apics. So may be call it global_hw_disabled_apic_counter?
> 

The problem is how static_key_false() is defined.  It returns true if
the count > 0, opposite from what I'd expect.  So anything with counter
semantics will be confusing.  I guess we need to pick a neutral name
(apic_disabled_key or apic_disabled_slowpath or such) to force the
reader to look at the definitions.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 1/4] Provide userspace IO exit completion callback.

2012-08-05 Thread Gleb Natapov
On Thu, Aug 02, 2012 at 04:26:29PM -0300, Marcelo Tosatti wrote:
> On Mon, Jul 30, 2012 at 05:38:18PM +0300, Gleb Natapov wrote:
> > int r;
> > @@ -5554,9 +5568,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, 
> > struct kvm_run *kvm_run)
> > }
> > }
> >  
> > -   r = complete_mmio(vcpu);
> > -   if (r <= 0)
> > -   goto out;
> > +   if (unlikely(vcpu->arch.complete_userspace_io)) {
> > +   int (*cui)(struct kvm_vcpu *) = 
> > vcpu->arch.complete_userspace_io;
> > +   vcpu->arch.complete_userspace_io = NULL;
> > +   r = cui(vcpu);
> > +   if (r <= 0)
> > +   goto out;
> > +   }
> 
> Would it be worthwhile to add BUG/WARN_ONs here checking for
> variables that represent valid mmio/pio, but without
> complete_userspace_io function pointer set? (you do that in 
> the reverse case, inside the complete_userspace_io 
> function pointers).
There are never too much asserts :), But I wouldn't want to resend the
series just for that, so if there are other comments that will require
me to resend the series anyway I'll add asserts too.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/8] KVM: use jump label to optimize checking for HW enabled APIC in APIC_BASE MSR.

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 05:48:42PM +0300, Avi Kivity wrote:
> On 08/05/2012 05:42 PM, Gleb Natapov wrote:
> > On Sun, Aug 05, 2012 at 05:35:21PM +0300, Avi Kivity wrote:
> >> On 08/05/2012 03:58 PM, Gleb Natapov wrote:
> >> > Usually all APICs are HW enabled so the check can be optimized out.
> >> > 
> >> > Signed-off-by: Gleb Natapov 
> >> > ---
> >> >  arch/x86/kvm/lapic.c |   29 -
> >> >  arch/x86/kvm/lapic.h |1 +
> >> >  arch/x86/kvm/x86.c   |1 +
> >> >  3 files changed, 30 insertions(+), 1 deletion(-)
> >> > 
> >> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> >> > index c3f14fe..1aa5528 100644
> >> > --- a/arch/x86/kvm/lapic.c
> >> > +++ b/arch/x86/kvm/lapic.c
> >> > @@ -34,6 +34,7 @@
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > +#include 
> >> >  #include "kvm_cache_regs.h"
> >> >  #include "irq.h"
> >> >  #include "trace.h"
> >> > @@ -117,9 +118,13 @@ static inline int __apic_test_and_clear_vector(int 
> >> > vec, void *bitmap)
> >> >  return __test_and_clear_bit(VEC_POS(vec), (bitmap) + 
> >> > REG_POS(vec));
> >> >  }
> >> >  
> >> > +struct static_key_deferred apic_hw_disabled __read_mostly;
> >> 
> >> On top of file please.  Add all_ to the name to make it clear we're
> >> talking about all apics.
> >> 
> > This is count of disabled apics really. So I think all_apic_hw_disabled
> > is misleading.
> > 
> >> > +
> >> >  static inline int apic_hw_enabled(struct kvm_lapic *apic)
> >> >  {
> >> > -return (apic)->vcpu->arch.apic_base & MSR_IA32_APICBASE_ENABLE;
> >> > +if (static_key_false(&apic_hw_disabled.key))
> >> > +return apic->vcpu->arch.apic_base & 
> >> > MSR_IA32_APICBASE_ENABLE;
> >> 
> >> Hm, for the test to be readable, it needs to be
> >> 
> >>if (static_key_false(&all_apics_hw_enabled))
> >> 
> > Exactly. all_ makes it so because apic_hw_disabled is a counter that
> > counts disabled apics. So may be call it global_hw_disabled_apic_counter?
> > 
> 
> The problem is how static_key_false() is defined.  It returns true if
> the count > 0, opposite from what I'd expect.  So anything with counter
> semantics will be confusing.  I guess we need to pick a neutral name
> (apic_disabled_key or apic_disabled_slowpath or such) to force the
> reader to look at the definitions.
> 
There was a long thread about readability of static_key_true/false and
this is the result of it :( I read it this way: drop static_key_(true|false)
and read the if() to understand its meaning. Now look at static_key_(true|false)
to see what fast path will be in asm code.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 4/4] KVM: emulator: optimize "rep ins" handling.

2012-08-05 Thread Avi Kivity
On 07/30/2012 05:38 PM, Gleb Natapov wrote:
> Optimize "rep ins" by allowing emulator to write back more than one
> datum at a time. Introduce new operand type OP_MEM_STR which tells
> writeback() that dst contains pointer to an array that should be written
> back as opposite to just one data element.
> 
>  
>   if (ctxt->rep_prefix && (ctxt->d & String)) {
> + unsigned int count;
>   struct read_cache *r = &ctxt->io_read;
> - register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], 
> -1);
> + if ((ctxt->d & SrcMask) == SrcSI)
> + count = ctxt->src.count;
> + else
> + count = ctxt->dst.count;
> + register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX],
> + -count);
>  

count is unsigned.  Does it sign extend correctly in
register_address_increment()?

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 4/4] KVM: emulator: optimize "rep ins" handling.

2012-08-05 Thread Gleb Natapov
On Sun, Aug 05, 2012 at 06:03:12PM +0300, Avi Kivity wrote:
> On 07/30/2012 05:38 PM, Gleb Natapov wrote:
> > Optimize "rep ins" by allowing emulator to write back more than one
> > datum at a time. Introduce new operand type OP_MEM_STR which tells
> > writeback() that dst contains pointer to an array that should be written
> > back as opposite to just one data element.
> > 
> >  
> > if (ctxt->rep_prefix && (ctxt->d & String)) {
> > +   unsigned int count;
> > struct read_cache *r = &ctxt->io_read;
> > -   register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], 
> > -1);
> > +   if ((ctxt->d & SrcMask) == SrcSI)
> > +   count = ctxt->src.count;
> > +   else
> > +   count = ctxt->dst.count;
> > +   register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX],
> > +   -count);
> >  
> 
> count is unsigned.  Does it sign extend correctly in
> register_address_increment()?
> 
I think it sign extent before register_address_increment() when compiler
sees -count. count is in the range 1-1024 here, so there shouldn't be a
problem. By I welcome better suggestions.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 4/4] KVM: emulator: optimize "rep ins" handling.

2012-08-05 Thread Avi Kivity
On 08/05/2012 06:18 PM, Gleb Natapov wrote:
> On Sun, Aug 05, 2012 at 06:03:12PM +0300, Avi Kivity wrote:
>> On 07/30/2012 05:38 PM, Gleb Natapov wrote:
>> > Optimize "rep ins" by allowing emulator to write back more than one
>> > datum at a time. Introduce new operand type OP_MEM_STR which tells
>> > writeback() that dst contains pointer to an array that should be written
>> > back as opposite to just one data element.
>> > 
>> >  
>> >if (ctxt->rep_prefix && (ctxt->d & String)) {
>> > +  unsigned int count;
>> >struct read_cache *r = &ctxt->io_read;
>> > -  register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX], 
>> > -1);
>> > +  if ((ctxt->d & SrcMask) == SrcSI)
>> > +  count = ctxt->src.count;
>> > +  else
>> > +  count = ctxt->dst.count;
>> > +  register_address_increment(ctxt, &ctxt->regs[VCPU_REGS_RCX],
>> > +  -count);
>> >  
>> 
>> count is unsigned.  Does it sign extend correctly in
>> register_address_increment()?
>> 
> I think it sign extent before register_address_increment() when compiler
> sees -count. count is in the range 1-1024 here, so there shouldn't be a
> problem. By I welcome better suggestions.

There is actually no problem since the 'inc' parameter is signed.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM segfaults with 3.5 while installing ubuntu 12.04

2012-08-05 Thread Stefan Priebe

Am 05.08.2012 12:29, schrieb Avi Kivity:

On 08/05/2012 01:08 PM, Stefan Priebe wrote:

Am 01.08.2012 11:53, schrieb Avi Kivity:

On 08/01/2012 12:42 PM, Stefan Priebe - Profihost AG wrote:

Am 01.08.2012 11:33, schrieb Avi Kivity:

So here are 3 backtraces from booting the rescue system:
http://pastebin.com/raw.php?i=xCy2pEcP

To me they all look the same.


They are.  What version of qemu are you using?


latest stable-1.1 branch (1.1.1) - which works fine with latest RHEL6
kernel.


This could be due to a kernel bug, or due to a different code path taken
in qemu because of differing features exposed to kvm.

Please try qemu-kvm.git master and report.


qemu-kvm.git master just hangs after the 2nd screen with using 100% CPU
and doing nothing... so i can't test with qemu-kvm.master


Please provide your command line and I will try to reproduce.


ulimit -c unlimited; /usr/bin/qemu-x86_64 -chardev 
socket,id=qmp,path=/var/run/qemu-server/103.qmp,server,nowait -mon 
chardev=qmp,mode=control -pidfile /var/run/qemu-server/103.pid 
-daemonize -usbdevice tablet -name kvmcrash -smp sockets=1,cores=8 
-nodefaults -boot menu=on -vga cirrus -k de -drive 
file=/dev/VGiSCSItarget0/vm-103-disk-1,if=none,id=drive-virtio0,aio=native,cache=none 
-device 
virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100 
-drive 
file=/var/lib/vz/template/iso/ubuntu-12.04-server-amd64.iso,if=none,id=drive-ide0,media=cdrom,aio=native 
-device ide-cd,bus=ide.0,unit=0,drive=drive-ide0,id=ide0,bootindex=200 
-m 4096 -netdev 
type=tap,id=net0,ifname=tap103i0,script=/var/lib/qemu-server/pve-bridge,vhost=on 
-device 
virtio-net-pci,mac=BA:5B:86:AD:14:3A,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300



Which iso image are you running?

ubuntu-12.04-server-amd64.iso
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: KVM segfaults with 3.5 while installing ubuntu 12.04

2012-08-05 Thread Stefan Priebe

Am 05.08.2012 17:52, schrieb Stefan Priebe:

Am 05.08.2012 12:29, schrieb Avi Kivity:

On 08/05/2012 01:08 PM, Stefan Priebe wrote:

Am 01.08.2012 11:53, schrieb Avi Kivity:

On 08/01/2012 12:42 PM, Stefan Priebe - Profihost AG wrote:

Am 01.08.2012 11:33, schrieb Avi Kivity:

So here are 3 backtraces from booting the rescue system:
http://pastebin.com/raw.php?i=xCy2pEcP

To me they all look the same.


They are.  What version of qemu are you using?


latest stable-1.1 branch (1.1.1) - which works fine with latest RHEL6
kernel.


This could be due to a kernel bug, or due to a different code path
taken
in qemu because of differing features exposed to kvm.

Please try qemu-kvm.git master and report.


OK got it running it's just awfully slow and i was too impatient. It 
crashes at the part as 1.1.1 stable.


Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/8] use jump labels to streamline common APIC configuration

2012-08-05 Thread Eric Northup
On Sun, Aug 5, 2012 at 5:58 AM, Gleb Natapov  wrote:
> APIC code has a lot of checks for apic presence and apic HW/SW enable
> state.  Most common configuration is when each vcpu has in kernel apic
> and it is fully enabled. This path series uses jump labels to turn checks
> to nops in the common case.

What is the target workload and how does the performance compare?  As
a naive question, how different is it than just using gcc branch
hints?

[...]
-- 
Typing one-handed, please don't mistake brevity for rudeness.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reset problem vs. MMIO emulation, hypercalls, etc...

2012-08-05 Thread Benjamin Herrenschmidt
On Sun, 2012-08-05 at 11:55 +0300, Avi Kivity wrote:
> 
> I'm afraid I no longer know the details so closely, the code has
> changed
> quite a lot.  But the self-signal happens in kvm_cpu_exec(), see also
> env->exit_request.

Right, I think I eventually grasped it :-) It is fairly fragile however,
it basically relies that none of those things that leave the kernel in
an "incomplete" state (hcalls, mmio emulation, ...) return a non-zero
value, but instead only ever request an exit via exit_request, so that
we are guaranteed that the exec loop -will- go back, send that signal
and finally exit as a result of EINTR.

It also requires that the kernel tests & handles all those "completion"
early in VCPU_RUN before it does anything else really including testing
for signals.

The latter seems fine for us, the former was what we got wrong on ppc:
our hypercalls always cause exits via a non-zero return value for some
reason (I didn't write that code, not sure exactly why it was written
like that). Working on fixing that on qemu side now.

Thanks !

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Reset problem vs. MMIO emulation, hypercalls, etc...

2012-08-05 Thread David Gibson
On Sat, Aug 04, 2012 at 08:30:08AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2012-08-03 at 14:41 -0300, Marcelo Tosatti wrote:
> 
> > > Hrm, except that doing KVM_RUN with a signal is very cumbersome to do
> > > and I couldn't quite find the logic in qemu to do it ... but I might
> > > just have missed it. I can see indeed that in the migration case you
> > > want to actually complete the operation rather than just "abort it".
> > > 
> > > Any chance you can point me to the code that performs that trick qemu
> > > side for migration ?
> > 
> > kvm-all.c:
> > 
> > kvm_arch_pre_run(env, run);
> > if (env->exit_request) {
> > DPRINTF("interrupt exit requested\n");
> > /*
> >  * KVM requires us to reenter the kernel after IO exits to
> >  * complete
> >  * instruction emulation. This self-signal will ensure that
> >  * we
> >  * leave ASAP again.
> >  */
> > qemu_cpu_kick_self();
> > }
> > 
> > 
> > > Anthony seems to think that for reset we can just abort the operation
> > > state in the kernel when the MP state changes.
> 
> Ok, I see now, this is scary really... set a flag somewhere, pray that
> we are in the right part of the loop on elsewhere... oh well.
> 
> In fact there's some fun (through harmless) bits to be found, look at
> x86 for example:
> 
> if (env->exception_injected == EXCP08_DBLE) {
> /* this means triple fault */
> qemu_system_reset_request();
> env->exit_request = 1;
> return 0;
> }
> 
> Well, qemu_system_reset_request() calls cpu_stop_current() which calls
> cpu_exit() which also does:
> 
>env->exit_request = 1;
>  
> So obviously it will be well set by that time :-)
> 
> Now I can see how having it set during kvm_arch_process_async_events()
> will work as this is called right before the run loop. Similarily,
> EXIT_MMIO and EXIT_IO would work so they are a non issue.
> 
> Our problem I suspect with PAPR doing resets via hcalls is that
> our kvm_arch_handle_exit() does:
> 
> case KVM_EXIT_PAPR_HCALL:
> dprintf("handle PAPR hypercall\n");
> run->papr_hcall.ret = spapr_hypercall(env, run->papr_hcall.nr,
>   run->papr_hcall.args);
> ret = 1;
> break;
> 
> See the ret = 1 here ? That means that the caller (kvm_cpu_exec.c main
> loop) will exit immediately upon return. If we had instead returned 0,
> it would have looped, seen the "exit_requested" set by
> qemu_system_reset_request(), which would have then done the signal + run
> trick, completed the hypercall, and returned this time in a clean state.
> 
> So it seems (I don't have the machine at hand to test right now) that by
> just changing that ret = 1 to ret = 0, we might be fixing our problem,
> including the case where another vcpu is taking a hypercall at the time
> of the reset (this other CPU will also need to complete the operation
> before stopping).
> 
> David, is there any reason why you did ret = 1 to begin with ? For
> things like reset etc... the exit will happen as a result of
> env->exit_requested being set by cpu_exit().

Erm, there might have been, but I don't remember what it was.

> Are there other cases where you wish the hcall to go back to the main
> loop ? In that case, shouldn't it set env->exit_requested rather than
> returning 1 at that point ? (H_CEDE for example ?)

So, I'm still trying to nut out the implications for H_CEDE, and think
if there are any other hypercalls that might want to block the guest
for a time.  We were considering blocking H_PUT_TCE if qemu devices
had active dma maps on the previously mapped iovas.  I'm not sure if
the discussions that led to the inclusion of the qemu IOMMU code
decided that was wholly unnnecessary or just not necessary for the
time being.

> Paul, David, I don't have time to test that before Tuesday (I'm away on
> monday) but if you have a chance, revert the kernel change we did and
> try this qemu patch for reset:
> 
> --- a/target-ppc/kvm.c
> +++ b/target-ppc/kvm.c
> @@ -766,7 +766,7 @@ int kvm_arch_handle_exit(CPUPPCState *env, struct
> kvm_run *r
>  dprintf("handle PAPR hypercall\n");
>  run->papr_hcall.ret = spapr_hypercall(env, run->papr_hcall.nr,
>run->papr_hcall.args);
> -ret = 1;
> +ret = 0;
>  break;
>  #endif
>  default:

So, usually I think ret = 0 is correct, it lets us handle the
hypercall and get back to the guest without further fuss.  I think
right now, ret = 0 is always correct, since H_CEDE is always handled
in kernel.  But if we ever have qemu implemented hypercalls which want
to "block" then we might need a facility to do something else.  Maybe,
like I say, I haven't convinved myself I've thought of all the
implications yet.

> It might also need something like:
> 
> diff --git a/h