Qemu and virtio 1.0

2015-02-24 Thread Rusty Russell
OK, I am trying to experiment with virtio 1.0 support using the
latest kernel and MST's qemu tree:

https://git.kernel.org/cgit/virt/kvm/mst/qemu.git/?h=virtio-1.0

The first issue is that the device config space endianness was wrong
(see attached patch).

I'm now setting up a BE guest on my x86 laptop, and a BE and LE guest
on a BE powerpc machine, to check that all combinations work correctly.
If others test too, that would be appreciated!

Cheers,
Rusty.

From 95ac91554ed602f856a2a5fcc25eaffcad1b1c8d Mon Sep 17 00:00:00 2001
From: Rusty Russell 
Date: Tue, 24 Feb 2015 14:47:44 +1030
Subject: [PATCH] virtio_config_write*/virtio_config_read*: Don't endian swap
 for virtio 1.0.

Signed-off-by: Rusty Russell 

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 079944c..882a31b 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -662,7 +662,12 @@ uint32_t virtio_config_readw(VirtIODevice *vdev, uint32_t 
addr)
 
 k->get_config(vdev, vdev->config);
 
-val = lduw_p(vdev->config + addr);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+val = lduw_le_p(vdev->config + addr);
+} else {
+val = lduw_p(vdev->config + addr);
+}
 return val;
 }
 
@@ -677,7 +682,12 @@ uint32_t virtio_config_readl(VirtIODevice *vdev, uint32_t 
addr)
 
 k->get_config(vdev, vdev->config);
 
-val = ldl_p(vdev->config + addr);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+val = ldl_le_p(vdev->config + addr);
+} else {
+val = ldl_p(vdev->config + addr);
+}
 return val;
 }
 
@@ -706,7 +716,12 @@ void virtio_config_writew(VirtIODevice *vdev, uint32_t 
addr, uint32_t data)
 return;
 }
 
-stw_p(vdev->config + addr, val);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+stw_le_p(vdev->config + addr, val);
+} else {
+stw_p(vdev->config + addr, val);
+}
 
 if (k->set_config) {
 k->set_config(vdev, vdev->config);
@@ -722,7 +737,12 @@ void virtio_config_writel(VirtIODevice *vdev, uint32_t 
addr, uint32_t data)
 return;
 }
 
-stl_p(vdev->config + addr, val);
+/* Virtio 1.0 is always LE */
+if (virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+stl_le_p(vdev->config + addr, val);
+} else {
+stl_p(vdev->config + addr, val);
+}
 
 if (k->set_config) {
 k->set_config(vdev, vdev->config);
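
The if/else above now repeats in all four accessors; if that bothers
anyone, it could be folded into a small helper along these lines (a
sketch only, not part of the patch -- virtio_config_is_le() is a
made-up name):

/* Hypothetical helper: virtio 1.0 fixes the config space as
 * little-endian, legacy devices use the guest's natural endianness. */
static bool virtio_config_is_le(VirtIODevice *vdev)
{
    return virtio_has_feature(vdev, VIRTIO_F_VERSION_1);
}

Each accessor then shrinks to e.g.
    val = virtio_config_is_le(vdev) ? lduw_le_p(vdev->config + addr)
                                    : lduw_p(vdev->config + addr);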


Re: [PATCH] x86: svm: don't intercept CR0 TS or MP bit write

2015-02-24 Thread Joel Schopp

>> -clr_cr_intercept(svm, INTERCEPT_CR0_WRITE);
>>  } else {
>>  set_cr_intercept(svm, INTERCEPT_CR0_READ);
> (There is no point in checking fpu_active if cr0s are equal.)
>
>> -set_cr_intercept(svm, INTERCEPT_CR0_WRITE);
> KVM uses lazy FPU and the state is undefined before the first access.
> We set cr0.ts when !svm->vcpu.fpu_active to detect the first access, but
> if we allow the guest to clear cr0.ts without exiting, it can access FPU
> with undefined state.
Thanks for the valuable feedback.  It's apparent I hadn't thought
through the interaction with lazy FPU and will need to go back and
rethink my approach here.
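
(For readers following along: the lazy-FPU constraint described above
looks roughly like this -- a hedged sketch reusing names from the
quoted diff, not actual KVM code:)

        /* Until the guest's first FPU access, the FPU state is
         * undefined, so KVM keeps CR0.TS set: the first FPU insn then
         * faults (#NM) and the host can load valid state first.  That
         * only works if the guest cannot clear TS behind KVM's back,
         * so CR0 writes must stay intercepted while !fpu_active. */
        if (!svm->vcpu.fpu_active)
                set_cr_intercept(svm, INTERCEPT_CR0_WRITE);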


Re: [PATCH] KVM: Get rid of kvm_kvfree()

2015-02-24 Thread Bandan Das
Thomas Huth  writes:

> kvm_kvfree() provides exactly the same functionality as the
> new common kvfree() function - so let's simply replace the
> kvm function with the common function.

I assumed there would be a wrapper kvzalloc as well, seems not :)
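
(A common kvzalloc(), if it existed, would presumably just mirror
kvm_kvzalloc() -- a hypothetical sketch, not something in the tree as
of this mail:)

void *kvzalloc(unsigned long size)
{
        /* Hypothetical: large allocations fall back to vzalloc(),
         * which is exactly why kvfree() checks is_vmalloc_addr(). */
        if (size > PAGE_SIZE)
                return vzalloc(size);
        return kzalloc(size, GFP_KERNEL);
}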

> Signed-off-by: Thomas Huth 
> ---
>  arch/x86/kvm/x86.c   |8 
>  include/linux/kvm_host.h |1 -
>  virt/kvm/kvm_main.c  |   10 +-
>  3 files changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index bd7a70b..c5f7e03 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7429,7 +7429,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
> kvm_memory_slot *free,
>  
>   for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
>   if (!dont || free->arch.rmap[i] != dont->arch.rmap[i]) {
> - kvm_kvfree(free->arch.rmap[i]);
> + kvfree(free->arch.rmap[i]);
>   free->arch.rmap[i] = NULL;
>   }
>   if (i == 0)
> @@ -7437,7 +7437,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
> kvm_memory_slot *free,
>  
>   if (!dont || free->arch.lpage_info[i - 1] !=
>dont->arch.lpage_info[i - 1]) {
> - kvm_kvfree(free->arch.lpage_info[i - 1]);
> + kvfree(free->arch.lpage_info[i - 1]);
>   free->arch.lpage_info[i - 1] = NULL;
>   }
>   }
> @@ -7491,12 +7491,12 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct 
> kvm_memory_slot *slot,
>  
>  out_free:
>   for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
> - kvm_kvfree(slot->arch.rmap[i]);
> + kvfree(slot->arch.rmap[i]);
>   slot->arch.rmap[i] = NULL;
>   if (i == 0)
>   continue;
>  
> - kvm_kvfree(slot->arch.lpage_info[i - 1]);
> + kvfree(slot->arch.lpage_info[i - 1]);
>   slot->arch.lpage_info[i - 1] = NULL;
>   }
>   return -ENOMEM;
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index d12b210..0f574eb 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -658,7 +658,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
>  int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
>  
>  void *kvm_kvzalloc(unsigned long size);
> -void kvm_kvfree(const void *addr);
>  
>  #ifndef __KVM_HAVE_ARCH_VM_ALLOC
>  static inline struct kvm *kvm_arch_alloc_vm(void)
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index a109370..6f1a9c2 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -539,20 +539,12 @@ void *kvm_kvzalloc(unsigned long size)
>   return kzalloc(size, GFP_KERNEL);
>  }
>  
> -void kvm_kvfree(const void *addr)
> -{
> - if (is_vmalloc_addr(addr))
> - vfree(addr);
> - else
> - kfree(addr);
> -}
> -
>  static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
>  {
>   if (!memslot->dirty_bitmap)
>   return;
>  
> - kvm_kvfree(memslot->dirty_bitmap);
> + kvfree(memslot->dirty_bitmap);
>   memslot->dirty_bitmap = NULL;
>  }


[PATCH] KVM: Get rid of kvm_kvfree()

2015-02-24 Thread Thomas Huth
kvm_kvfree() provides exactly the same functionality as the
new common kvfree() function - so let's simply replace the
kvm function with the common function.
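
For reference, the common kvfree() (in mm/util.c at the time of
writing) matches the helper being removed one-to-one:

void kvfree(const void *addr)
{
        if (is_vmalloc_addr(addr))
                vfree(addr);
        else
                kfree(addr);
}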

Signed-off-by: Thomas Huth 
---
 arch/x86/kvm/x86.c   |8 
 include/linux/kvm_host.h |1 -
 virt/kvm/kvm_main.c  |   10 +-
 3 files changed, 5 insertions(+), 14 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index bd7a70b..c5f7e03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7429,7 +7429,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
kvm_memory_slot *free,
 
for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
if (!dont || free->arch.rmap[i] != dont->arch.rmap[i]) {
-   kvm_kvfree(free->arch.rmap[i]);
+   kvfree(free->arch.rmap[i]);
free->arch.rmap[i] = NULL;
}
if (i == 0)
@@ -7437,7 +7437,7 @@ void kvm_arch_free_memslot(struct kvm *kvm, struct 
kvm_memory_slot *free,
 
if (!dont || free->arch.lpage_info[i - 1] !=
 dont->arch.lpage_info[i - 1]) {
-   kvm_kvfree(free->arch.lpage_info[i - 1]);
+   kvfree(free->arch.lpage_info[i - 1]);
free->arch.lpage_info[i - 1] = NULL;
}
}
@@ -7491,12 +7491,12 @@ int kvm_arch_create_memslot(struct kvm *kvm, struct 
kvm_memory_slot *slot,
 
 out_free:
for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) {
-   kvm_kvfree(slot->arch.rmap[i]);
+   kvfree(slot->arch.rmap[i]);
slot->arch.rmap[i] = NULL;
if (i == 0)
continue;
 
-   kvm_kvfree(slot->arch.lpage_info[i - 1]);
+   kvfree(slot->arch.lpage_info[i - 1]);
slot->arch.lpage_info[i - 1] = NULL;
}
return -ENOMEM;
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d12b210..0f574eb 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -658,7 +658,6 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu);
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu);
 
 void *kvm_kvzalloc(unsigned long size);
-void kvm_kvfree(const void *addr);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 static inline struct kvm *kvm_arch_alloc_vm(void)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index a109370..6f1a9c2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -539,20 +539,12 @@ void *kvm_kvzalloc(unsigned long size)
return kzalloc(size, GFP_KERNEL);
 }
 
-void kvm_kvfree(const void *addr)
-{
-   if (is_vmalloc_addr(addr))
-   vfree(addr);
-   else
-   kfree(addr);
-}
-
 static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot)
 {
if (!memslot->dirty_bitmap)
return;
 
-   kvm_kvfree(memslot->dirty_bitmap);
+   kvfree(memslot->dirty_bitmap);
memslot->dirty_bitmap = NULL;
 }
 
-- 
1.7.1



Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-24 Thread Andrew Jones
On Tue, Feb 24, 2015 at 05:47:19PM +, Ard Biesheuvel wrote:
> On 24 February 2015 at 14:55, Andrew Jones  wrote:
> > On Fri, Feb 20, 2015 at 04:36:26PM +0100, Andrew Jones wrote:
> >> On Fri, Feb 20, 2015 at 02:37:25PM +, Ard Biesheuvel wrote:
> >> > On 20 February 2015 at 14:29, Andrew Jones  wrote:
> >> > > So looks like the 3 orders of magnitude greater number of traps
> >> > > (only to el2) don't impact kernel compiles.
> >> > >
> >> >
> >> > OK, good! That was what I was hoping for, obviously.
> >> >
> >> > > Then I thought I'd be able to quickly measure the number of cycles
> >> > > a trap to el2 takes with this kvm-unit-tests test
> >> > >
> >> > > int main(void)
> >> > > {
> >> > > unsigned long start, end;
> >> > > unsigned int sctlr;
> >> > >
> >> > > asm volatile(
> >> > > "   mrs %0, sctlr_el1\n"
> >> > > "   msr pmcr_el0, %1\n"
> >> > > : "=&r" (sctlr) : "r" (5));
> >> > >
> >> > > asm volatile(
> >> > > "   mrs %0, pmccntr_el0\n"
> >> > > "   msr sctlr_el1, %2\n"
> >> > > "   mrs %1, pmccntr_el0\n"
> >> > > : "=&r" (start), "=&r" (end) : "r" (sctlr));
> >> > >
> >> > > printf("%llx\n", end - start);
> >> > > return 0;
> >> > > }
> >> > >
> >> > > after applying this patch to kvm
> >> > >
> >> > > diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> > > index bb91b6fc63861..5de39d740aa58 100644
> >> > > --- a/arch/arm64/kvm/hyp.S
> >> > > +++ b/arch/arm64/kvm/hyp.S
> >> > > @@ -770,7 +770,7 @@
> >> > >
> >> > > mrs x2, mdcr_el2
> >> > > and x2, x2, #MDCR_EL2_HPMN_MASK
> >> > > -   orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> >> > > +// orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> >> > > orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
> >> > >
> >> > > // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
> >> > >
> >> > > But I get zero for the cycle count. Not sure what I'm missing.
> >> > >
> >> >
> >> > No clue tbh. Does the counter work as expected in the host?
> >> >
> >>
> >> Guess not. I dropped the test into a module_init and inserted
> >> it on the host. Always get zero for pmccntr_el0 reads. Or, if
> >> I set it to something non-zero with a write, then I always get
> >> that back - no increments. pmcr_el0 looks OK... I had forgotten
> >> to set bit 31 of pmcntenset_el0, but doing that still doesn't
> >> help. Anyway, I assume the problem is me. I'll keep looking to
> >> see what I'm missing.
> >>
> >
> > I returned to this and see that the problem was indeed me. I needed yet
> > another enable bit set (the filter register needed to be instructed to
> > count cycles while in el2). I've attached the code for the curious.
> > The numbers are mean=6999, std_dev=242. Run on the host, or in a guest
> > running on a host without this patch series (after TVM traps have been
> > disabled), I get a pretty consistent 40.
> >
> > I checked how many vm-sysreg traps we do during the kernel compile
> > benchmark. It's 124924. So it's a bit strange that we don't see the
> > benchmark taking 10 to 20 seconds longer on average. I should probably
> > double check my runs. In any case, while I like the approach of this
> > series, the overhead is looking non-negligible.
> >
> 
> Thanks a lot for producing these numbers. 125k x 7k == <1 billion
> cycles == <1 second on a >1 GHz machine, I think?
> Or am I missing something? How long does the actual compile take?
>

Wait, my fault. I dropped a pretty big divisor in my calculation. Don't
ask... I'll just go home and study one of my daughter's math books now...

So, I even have a 2.4 GHz machine, which explains why the benchmark times
are the same with and without this series (those times are provided earlier
in this thread; they're roughly 03:10). I'm glad you straightened me out. I
was second-guessing my benchmark results and considering redoing them.
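
(For the record, the corrected arithmetic: 124924 traps x ~6999 cycles
is about 8.7e8 cycles, i.e. roughly 0.36 s at 2.4 GHz -- lost in the
noise of a ~190 s compile.)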

Anyway, this series, at least wrt overhead, is looking good again.

Thanks,
drew


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Greg KH
On Tue, Feb 24, 2015 at 11:49:13PM +0530, Raghavendra K T wrote:
> On 02/24/2015 08:17 PM, Ingo Molnar wrote:
> >
> >* Greg KH  wrote:
> >
> >>On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
> >>>Paravirt spinlock clears slowpath flag after doing unlock.
> >>>As explained by Linus currently it does:
> >>> prev = *lock;
> >>> add_smp(&lock->tickets.head, TICKET_LOCK_INC);
> >>>
> >>> /* add_smp() is a full mb() */
> >>>
> >>> if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> >>> __ticket_unlock_slowpath(lock, prev);
> >>>
> >>>which is *exactly* the kind of things you cannot do with spinlocks,
> >>>because after you've done the "add_smp()" and released the spinlock
> >>>for the fast-path, you can't access the spinlock any more.  Exactly
> >>>because a fast-path lock might come in, and release the whole data
> >>>structure.
> >>>
> >>>Linus suggested that we should not do any writes to lock after unlock(),
> >>>and we can move slowpath clearing to fastpath lock.
> >>>
> >>>So this patch implements the fix with:
> >>>1. Moving slowpath flag to head (Oleg):
> >>>Unlocked locks don't care about the slowpath flag; therefore we can keep
> >>>it set after the last unlock, and clear it again on the first (try)lock.
> >>>-- this removes the write after unlock. note that keeping slowpath flag 
> >>>would
> >>>result in unnecessary kicks.
> >>>By moving the slowpath flag from the tail to the head ticket we also avoid
> >>>the need to access both the head and tail tickets on unlock.
> >>>
> >>>2. use xadd to avoid read/write after unlock that checks the need for
> >>>unlock_kick (Linus):
> >>>We further avoid the need for a read-after-release by using xadd;
> >>>the prev head value will include the slowpath flag and indicate if we
> >>>need to do PV kicking of suspended spinners -- on modern chips xadd
> >>>isn't (much) more expensive than an add + load.
> >>>
> >>>Result:
> >>>  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
> >>>  benchmark overcommit %improve
> >>>  kernbench  1x   -0.13
> >>>  kernbench  2x    0.02
> >>>  dbench 1x   -1.77
> >>>  dbench 2x   -0.63
> >>>
> >>>[Jeremy: hinted missing TICKET_LOCK_INC for kick]
> >>>[Oleg: Moving slowpath flag to head, ticket_equals idea]
> >>>[PeterZ: Detailed changelog]
> >>>
> >>>Reported-by: Sasha Levin 
> >>>Suggested-by: Linus Torvalds 
> >>>Signed-off-by: Raghavendra K T 
> >>>Reviewed-by: Oleg Nesterov 
> >>>Acked-by: David Vrabel 
> >>>---
> >>>  arch/x86/include/asm/spinlock.h | 94 
> >>> -
> >>>  arch/x86/kernel/kvm.c   |  7 ++-
> >>>  arch/x86/xen/spinlock.c |  7 ++-
> >>>  3 files changed, 58 insertions(+), 50 deletions(-)
> >>>
> >>>Changes for stable:
> >>>   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
> >>> compiler warnings (Linus, David Vrabel, PeterZ, Ingo)
> >>
> >>What is the git commit id of this in Linus's tree?  What
> >>stable tree(s) do you want this applied to?
> >
> >It's:
> >
> >  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
> 
> Yes, this is the original patch. Please note I have taken out the
> READ_ONCE changes from the original patch to avoid the build warnings
> mentioned below.
> (Those READ_ONCE changes were cosmetic and were not present in the
> previous versions.)
> 
> >
> >You'll also need this fix from Linus to avoid (harmless)
> >build warnings:
> >
> >  dd36929720f4 kernel: make READ_ONCE() valid on const arguments
> 
> So this may not be absolutely necessary with the current patch.

I'd prefer to be as close as possible to the upstream patch.  So if
applying both of these patches will work, I'd much rather do that.
Changing patches when backporting them to stable for no reason other
than to clean things up just confuses everyone involved.

Let's keep our messy history :)

thanks,

greg k-h


Re: [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0

2015-02-24 Thread Bandan Das
Jan Kiszka  writes:

> On 2015-02-24 17:30, Radim Krčmář wrote:
>> 2015-02-23 19:05+0100, Kashyap Chamarthy:
>>> Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
>>> applied it on both L0 and L1.
>>>
>>> Result: Same as before -- Booting L2 causes L1 to reboot. However, the
>>> stack trace from `dmesg` on L0 took a slightly different path than
>>> before -- it's using MSR handling:
>> 
>> Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
>> had it disabled.  L1 could then vmrun a L2 state that L0 would have to
>> emulate, but that doesn't work.  There are at least these solutions:
>> 
>>  1) don't expose unrestricted_guest when L0 doesn't have it
>
> Reminds me of a patch called "KVM: nVMX: Disable unrestricted mode if
> ept=0" by Bandan. I thought that would have caught it - apparently not.

Yeah... unrestricted guest could be disabled even when ept is enabled,
and I incorrectly didn't take that into account.

>>  2) fix unrestricted mode emulation code
>>  3) handle the failure without killing L1
>> 
>> I'd do just (1) -- emulating unrestricted mode is a loss.
>
> Agreed.
>
> Jan
>
>> 
>> I have done initial testing and at least qemu-sanity-check works now:
>> 
>> ---8<---
>> If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
>> L0.  L1 triple faulted when running L2 guest that required emulation.
>> 
>> Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
>> in L0's dmesg:
>>   WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 
>> nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
>> 
>> Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
>> the host doesn't have it enabled.
>> 
>> Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
>> Signed-off-by: Radim Krčmář 

We should Cc stable on this patch.

Bandan
>> ---
>>  arch/x86/kvm/vmx.c | 7 +--
>>  1 file changed, 5 insertions(+), 2 deletions(-)
>> 
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index f7b20b417a3a..dbabea21357b 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx 
>> *vmx)
>>  if (enable_ept) {
>>  /* nested EPT: emulate EPT also to L1 */
>>  vmx->nested.nested_vmx_secondary_ctls_high |=
>> -SECONDARY_EXEC_ENABLE_EPT |
>> -SECONDARY_EXEC_UNRESTRICTED_GUEST;
>> +SECONDARY_EXEC_ENABLE_EPT;
>>  vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
>>   VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
>>   VMX_EPT_INVEPT_BIT;
>> @@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct 
>> vcpu_vmx *vmx)
>>  } else
>>  vmx->nested.nested_vmx_ept_caps = 0;
>>  
>> +if (enable_unrestricted_guest)
>> +vmx->nested.nested_vmx_secondary_ctls_high |=
>> +SECONDARY_EXEC_UNRESTRICTED_GUEST;
>> +
>>  /* miscellaneous data */
>>  rdmsr(MSR_IA32_VMX_MISC,
>>  vmx->nested.nested_vmx_misc_low,
>> 


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T

On 02/24/2015 08:50 PM, Greg KH wrote:
> On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote:
>>
>> * Greg KH  wrote:
>>
>>> On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
>>>> Paravirt spinlock clears slowpath flag after doing unlock.
>>>> As explained by Linus currently it does:
>>>>  prev = *lock;
>>>>  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>>>>
>>>>  /* add_smp() is a full mb() */
>>>>
>>>>  if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
>>>>  __ticket_unlock_slowpath(lock, prev);
>>>>
>>>> which is *exactly* the kind of things you cannot do with spinlocks,
>>>> because after you've done the "add_smp()" and released the spinlock
>>>> for the fast-path, you can't access the spinlock any more.  Exactly
>>>> because a fast-path lock might come in, and release the whole data
>>>> structure.
>>>>
>>>> Linus suggested that we should not do any writes to lock after unlock(),
>>>> and we can move slowpath clearing to fastpath lock.
>>>>
>>>> So this patch implements the fix with:
>>>> 1. Moving slowpath flag to head (Oleg):
>>>> Unlocked locks don't care about the slowpath flag; therefore we can keep
>>>> it set after the last unlock, and clear it again on the first (try)lock.
>>>> -- this removes the write after unlock. note that keeping slowpath flag would
>>>> result in unnecessary kicks.
>>>> By moving the slowpath flag from the tail to the head ticket we also avoid
>>>> the need to access both the head and tail tickets on unlock.
>>>>
>>>> 2. use xadd to avoid read/write after unlock that checks the need for
>>>> unlock_kick (Linus):
>>>> We further avoid the need for a read-after-release by using xadd;
>>>> the prev head value will include the slowpath flag and indicate if we
>>>> need to do PV kicking of suspended spinners -- on modern chips xadd
>>>> isn't (much) more expensive than an add + load.
>>>>
>>>> Result:
>>>>   setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
>>>>   benchmark overcommit %improve
>>>>   kernbench  1x   -0.13
>>>>   kernbench  2x    0.02
>>>>   dbench 1x   -1.77
>>>>   dbench 2x   -0.63
>>>>
>>>> [Jeremy: hinted missing TICKET_LOCK_INC for kick]
>>>> [Oleg: Moving slowpath flag to head, ticket_equals idea]
>>>> [PeterZ: Detailed changelog]
>>>>
>>>> Reported-by: Sasha Levin 
>>>> Suggested-by: Linus Torvalds 
>>>> Signed-off-by: Raghavendra K T 
>>>> Reviewed-by: Oleg Nesterov 
>>>> Acked-by: David Vrabel 
>>>> ---
>>>>   arch/x86/include/asm/spinlock.h | 94 -
>>>>   arch/x86/kernel/kvm.c   |  7 ++-
>>>>   arch/x86/xen/spinlock.c |  7 ++-
>>>>   3 files changed, 58 insertions(+), 50 deletions(-)
>>>>
>>>> Changes for stable:
>>>>    - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
>>>>      compiler warnings (Linus, David Vrabel, PeterZ, Ingo)
>>>
>>> What is the git commit id of this in Linus's tree?  What
>>> stable tree(s) do you want this applied to?
>>
>> It's:
>>
>>   d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
>>
>> You'll also need this fix from Linus to avoid (harmless)
>> build warnings:
>>
>>   dd36929720f4 kernel: make READ_ONCE() valid on const arguments
>
> Great.  But what stable kernel trees should it be backported to?  Just
> 3.19?  Or anything older?

My patch was intended only for 3.19.

Though the paravirt changes went in in 3.12, the problem manifested
clearly only after some of the completion-related changes, but I leave
that decision to the experts here. (I'll send the necessary changes if
the patch is needed for older versions, because it may not apply
cleanly.)




Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T

On 02/24/2015 08:17 PM, Ingo Molnar wrote:
>
> * Greg KH  wrote:
>
>> On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
>>> Paravirt spinlock clears slowpath flag after doing unlock.
>>> As explained by Linus currently it does:
>>>  prev = *lock;
>>>  add_smp(&lock->tickets.head, TICKET_LOCK_INC);
>>>
>>>  /* add_smp() is a full mb() */
>>>
>>>  if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
>>>  __ticket_unlock_slowpath(lock, prev);
>>>
>>> which is *exactly* the kind of things you cannot do with spinlocks,
>>> because after you've done the "add_smp()" and released the spinlock
>>> for the fast-path, you can't access the spinlock any more.  Exactly
>>> because a fast-path lock might come in, and release the whole data
>>> structure.
>>>
>>> Linus suggested that we should not do any writes to lock after unlock(),
>>> and we can move slowpath clearing to fastpath lock.
>>>
>>> So this patch implements the fix with:
>>> 1. Moving slowpath flag to head (Oleg):
>>> Unlocked locks don't care about the slowpath flag; therefore we can keep
>>> it set after the last unlock, and clear it again on the first (try)lock.
>>> -- this removes the write after unlock. note that keeping slowpath flag would
>>> result in unnecessary kicks.
>>> By moving the slowpath flag from the tail to the head ticket we also avoid
>>> the need to access both the head and tail tickets on unlock.
>>>
>>> 2. use xadd to avoid read/write after unlock that checks the need for
>>> unlock_kick (Linus):
>>> We further avoid the need for a read-after-release by using xadd;
>>> the prev head value will include the slowpath flag and indicate if we
>>> need to do PV kicking of suspended spinners -- on modern chips xadd
>>> isn't (much) more expensive than an add + load.
>>>
>>> Result:
>>>   setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
>>>   benchmark overcommit %improve
>>>   kernbench  1x   -0.13
>>>   kernbench  2x    0.02
>>>   dbench 1x   -1.77
>>>   dbench 2x   -0.63
>>>
>>> [Jeremy: hinted missing TICKET_LOCK_INC for kick]
>>> [Oleg: Moving slowpath flag to head, ticket_equals idea]
>>> [PeterZ: Detailed changelog]
>>>
>>> Reported-by: Sasha Levin 
>>> Suggested-by: Linus Torvalds 
>>> Signed-off-by: Raghavendra K T 
>>> Reviewed-by: Oleg Nesterov 
>>> Acked-by: David Vrabel 
>>> ---
>>>   arch/x86/include/asm/spinlock.h | 94 -
>>>   arch/x86/kernel/kvm.c   |  7 ++-
>>>   arch/x86/xen/spinlock.c |  7 ++-
>>>   3 files changed, 58 insertions(+), 50 deletions(-)
>>>
>>> Changes for stable:
>>>    - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
>>>      compiler warnings (Linus, David Vrabel, PeterZ, Ingo)
>>
>> What is the git commit id of this in Linus's tree?  What
>> stable tree(s) do you want this applied to?
>
> It's:
>
>   d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock

Yes, this is the original patch. Please note I have taken out the
READ_ONCE changes from the original patch to avoid the build warnings
mentioned below.
(Those READ_ONCE changes were cosmetic and were not present in the
previous versions.)

> You'll also need this fix from Linus to avoid (harmless)
> build warnings:
>
>   dd36929720f4 kernel: make READ_ONCE() valid on const arguments

So this may not be absolutely necessary with the current patch.



Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-24 Thread Ard Biesheuvel
On 24 February 2015 at 14:55, Andrew Jones  wrote:
> On Fri, Feb 20, 2015 at 04:36:26PM +0100, Andrew Jones wrote:
>> On Fri, Feb 20, 2015 at 02:37:25PM +, Ard Biesheuvel wrote:
>> > On 20 February 2015 at 14:29, Andrew Jones  wrote:
>> > > So looks like the 3 orders of magnitude greater number of traps
>> > > (only to el2) don't impact kernel compiles.
>> > >
>> >
>> > OK, good! That was what I was hoping for, obviously.
>> >
>> > > Then I thought I'd be able to quickly measure the number of cycles
>> > > a trap to el2 takes with this kvm-unit-tests test
>> > >
>> > > int main(void)
>> > > {
>> > > unsigned long start, end;
>> > > unsigned int sctlr;
>> > >
>> > > asm volatile(
>> > > "   mrs %0, sctlr_el1\n"
>> > > "   msr pmcr_el0, %1\n"
>> > > : "=&r" (sctlr) : "r" (5));
>> > >
>> > > asm volatile(
>> > > "   mrs %0, pmccntr_el0\n"
>> > > "   msr sctlr_el1, %2\n"
>> > > "   mrs %1, pmccntr_el0\n"
>> > > : "=&r" (start), "=&r" (end) : "r" (sctlr));
>> > >
>> > > printf("%llx\n", end - start);
>> > > return 0;
>> > > }
>> > >
>> > > after applying this patch to kvm
>> > >
>> > > diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> > > index bb91b6fc63861..5de39d740aa58 100644
>> > > --- a/arch/arm64/kvm/hyp.S
>> > > +++ b/arch/arm64/kvm/hyp.S
>> > > @@ -770,7 +770,7 @@
>> > >
>> > > mrs x2, mdcr_el2
>> > > and x2, x2, #MDCR_EL2_HPMN_MASK
>> > > -   orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
>> > > +// orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
>> > > orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
>> > >
>> > > // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
>> > >
>> > > But I get zero for the cycle count. Not sure what I'm missing.
>> > >
>> >
>> > No clue tbh. Does the counter work as expected in the host?
>> >
>>
>> Guess not. I dropped the test into a module_init and inserted
>> it on the host. Always get zero for pmccntr_el0 reads. Or, if
>> I set it to something non-zero with a write, then I always get
>> that back - no increments. pmcr_el0 looks OK... I had forgotten
>> to set bit 31 of pmcntenset_el0, but doing that still doesn't
>> help. Anyway, I assume the problem is me. I'll keep looking to
>> see what I'm missing.
>>
>
> I returned to this and see that the problem was indeed me. I needed yet
> another enable bit set (the filter register needed to be instructed to
> count cycles while in el2). I've attached the code for the curious.
> The numbers are mean=6999, std_dev=242. Run on the host, or in a guest
> running on a host without this patch series (after TVM traps have been
> disabled), I get a pretty consistent 40.
>
> I checked how many vm-sysreg traps we do during the kernel compile
> benchmark. It's 124924. So it's a bit strange that we don't see the
> benchmark taking 10 to 20 seconds longer on average. I should probably
> double check my runs. In any case, while I like the approach of this
> series, the overhead is looking non-negligible.
>

Thanks a lot for producing these numbers. 125k x 7k == <1 billion
cycles == <1 second on a >1 GHz machine, I think?
Or am I missing something? How long does the actual compile take?

-- 
Ard.


[PATCH] vhost: cleanup iterator update logic

2015-02-24 Thread Michael S. Tsirkin
Recent iterator-related changes in vhost made it
harder to follow the logic fixing up the header.
In fact, the fixup always happens at the same
offset, sizeof(virtio_net_hdr): sometimes the
fixup iterator is advanced by copy_to_iter,
sometimes by iov_iter_advance.

Rearrange code to make this obvious.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 22 --
 1 file changed, 12 insertions(+), 10 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index afa06d2..ca70434 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -591,11 +591,6 @@ static void handle_rx(struct vhost_net *net)
 * TODO: support TSO.
 */
iov_iter_advance(&msg.msg_iter, vhost_hlen);
-   } else {
-   /* It'll come from socket; we'll need to patch
-* ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF
-*/
-   iov_iter_advance(&fixup, sizeof(hdr));
}
err = sock->ops->recvmsg(NULL, sock, &msg,
 sock_len, MSG_DONTWAIT | MSG_TRUNC);
@@ -609,11 +604,18 @@ static void handle_rx(struct vhost_net *net)
continue;
}
/* Supply virtio_net_hdr if VHOST_NET_F_VIRTIO_NET_HDR */
-   if (unlikely(vhost_hlen) &&
-   copy_to_iter(&hdr, sizeof(hdr), &fixup) != sizeof(hdr)) {
-   vq_err(vq, "Unable to write vnet_hdr at addr %p\n",
-  vq->iov->iov_base);
-   break;
+   if (unlikely(vhost_hlen)) {
+   if (copy_to_iter(&hdr, sizeof(hdr),
+&fixup) != sizeof(hdr)) {
+   vq_err(vq, "Unable to write vnet_hdr "
+  "at addr %p\n", vq->iov->iov_base);
+   break;
+   }
+   } else {
+   /* Header came from socket; we'll need to patch
+* ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF
+*/
+   iov_iter_advance(&fixup, sizeof(hdr));
}
/* TODO: Should check and handle checksum. */
 
-- 
MST


Re: [PATCH] KVM: fix possible coalesced_mmio_ring page leaks.

2015-02-24 Thread Marcelo Tosatti
On Thu, Feb 12, 2015 at 12:58:21PM +0800, Xiubo Li wrote:
> It forgets to free the coalesced_mmio_ring page when anon_inode_getfd
> fails.
> 
> Signed-off-by: Xiubo Li 
> ---
>  virt/kvm/kvm_main.c | 16 +++-
>  1 file changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 8579f18..85e8106 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -2784,16 +2784,22 @@ static int kvm_dev_ioctl_create_vm(unsigned long type)
>   return PTR_ERR(kvm);
>  #ifdef KVM_COALESCED_MMIO_PAGE_OFFSET
>   r = kvm_coalesced_mmio_init(kvm);
> - if (r < 0) {
> - kvm_put_kvm(kvm);
> - return r;
> - }
> + if (r < 0)
> + goto out_put_kvm;
>  #endif
>   r = anon_inode_getfd("kvm-vm", &kvm_vm_fops, kvm, O_RDWR | O_CLOEXEC);
>   if (r < 0)
> - kvm_put_kvm(kvm);
> + goto out_mmio_free;

kvm_put_kvm -> kvm_destroy_vm -> kvm_coalesced_mmio_free, i.e. the
coalesced MMIO ring page is already freed when the last reference is
dropped, so there is no leak here.



Re: [PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0

2015-02-24 Thread Jan Kiszka
On 2015-02-24 17:30, Radim Krčmář wrote:
> 2015-02-23 19:05+0100, Kashyap Chamarthy:
>> Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
>> applied it on both L0 and L1.
>>
>> Result: Same as before -- Booting L2 causes L1 to reboot. However, the
>> stack trace from `dmesg` on L0 took a slightly different path than
>> before -- it's using MSR handling:
> 
> Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
> had it disabled.  L1 could then vmrun a L2 state that L0 would have to
> emulate, but that doesn't work.  There are at least these solutions:
> 
>  1) don't expose unrestricted_guest when L0 doesn't have it

Reminds me of a patch called "KVM: nVMX: Disable unrestricted mode if
ept=0" by Bandan. I thought that would have caught it - apparently not.

>  2) fix unrestricted mode emulation code
>  3) handle the failure without killing L1
> 
> I'd do just (1) -- emulating unrestricted mode is a loss.

Agreed.

Jan

> 
> I have done initial testing and at least qemu-sanity-check works now:
> 
> ---8<---
> If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
> L0.  L1 triple faulted when running L2 guest that required emulation.
> 
> Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
> in L0's dmesg:
>   WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 
> nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()
> 
> Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
> the host doesn't have it enabled.
> 
> Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
> Signed-off-by: Radim Krčmář 
> ---
>  arch/x86/kvm/vmx.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index f7b20b417a3a..dbabea21357b 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx 
> *vmx)
>   if (enable_ept) {
>   /* nested EPT: emulate EPT also to L1 */
>   vmx->nested.nested_vmx_secondary_ctls_high |=
> - SECONDARY_EXEC_ENABLE_EPT |
> - SECONDARY_EXEC_UNRESTRICTED_GUEST;
> + SECONDARY_EXEC_ENABLE_EPT;
>   vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
>VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
>VMX_EPT_INVEPT_BIT;
> @@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx 
> *vmx)
>   } else
>   vmx->nested.nested_vmx_ept_caps = 0;
>  
> + if (enable_unrestricted_guest)
> + vmx->nested.nested_vmx_secondary_ctls_high |=
> + SECONDARY_EXEC_UNRESTRICTED_GUEST;
> +
>   /* miscellaneous data */
>   rdmsr(MSR_IA32_VMX_MISC,
>   vmx->nested.nested_vmx_misc_low,
> 

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


[PATCH] vhost: drop hard-coded num_buffers size

2015-02-24 Thread Michael S. Tsirkin
The 2 that we use for copy_to_iter comes from sizeof(u16);
it was already that way before the iov_iter update.
Fix it up, making it obvious that the size of the stack
access is right.

Signed-off-by: Michael S. Tsirkin 
---
 drivers/vhost/net.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index ca70434..2bbfc25 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -621,7 +621,8 @@ static void handle_rx(struct vhost_net *net)
 
num_buffers = cpu_to_vhost16(vq, headcount);
if (likely(mergeable) &&
-   copy_to_iter(&num_buffers, 2, &fixup) != 2) {
+   copy_to_iter(&num_buffers, sizeof num_buffers,
+&fixup) != sizeof num_buffers) {
vq_err(vq, "Failed num_buffers write");
vhost_discard_vq_desc(vq, headcount);
break;
-- 
MST


[PATCH] KVM: nVMX: mask unrestricted_guest if disabled on L0

2015-02-24 Thread Radim Krčmář
2015-02-23 19:05+0100, Kashyap Chamarthy:
> Tested with the _correct_ Kernel[1] (that has Radim's patch) now --
> applied it on both L0 and L1.
> 
> Result: Same as before -- Booting L2 causes L1 to reboot. However, the
> stack trace from `dmesg` on L0 took a slightly different path than
> before -- it's using MSR handling:

Thanks, the problem was deeper ... L1 enabled unrestricted mode while L0
had it disabled.  L1 could then vmrun a L2 state that L0 would have to
emulate, but that doesn't work.  There are at least these solutions:

 1) don't expose unrestricted_guest when L0 doesn't have it
 2) fix unrestricted mode emulation code
 3) handle the failure without killing L1

I'd do just (1) -- emulating unrestricted mode is a loss.

I have done initial testing and at least qemu-sanity-check works now:

---8<---
If EPT was enabled, unrestricted_guest was allowed in L1 regardless of
L0.  L1 triple faulted when running L2 guest that required emulation.

Another side effect was 'WARN_ON_ONCE(vmx->nested.nested_run_pending)'
in L0's dmesg:
  WARNING: CPU: 0 PID: 0 at arch/x86/kvm/vmx.c:9190 
nested_vmx_vmexit+0x96e/0xb00 [kvm_intel] ()

Prevent this scenario by masking SECONDARY_EXEC_UNRESTRICTED_GUEST when
the host doesn't have it enabled.

Fixes: 78051e3b7e35 ("KVM: nVMX: Disable unrestricted mode if ept=0")
Signed-off-by: Radim Krčmář 
---
 arch/x86/kvm/vmx.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f7b20b417a3a..dbabea21357b 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2476,8 +2476,7 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx 
*vmx)
if (enable_ept) {
/* nested EPT: emulate EPT also to L1 */
vmx->nested.nested_vmx_secondary_ctls_high |=
-   SECONDARY_EXEC_ENABLE_EPT |
-   SECONDARY_EXEC_UNRESTRICTED_GUEST;
+   SECONDARY_EXEC_ENABLE_EPT;
vmx->nested.nested_vmx_ept_caps = VMX_EPT_PAGE_WALK_4_BIT |
 VMX_EPTP_WB_BIT | VMX_EPT_2MB_PAGE_BIT |
 VMX_EPT_INVEPT_BIT;
@@ -2491,6 +2490,10 @@ static void nested_vmx_setup_ctls_msrs(struct vcpu_vmx 
*vmx)
} else
vmx->nested.nested_vmx_ept_caps = 0;
 
+   if (enable_unrestricted_guest)
+   vmx->nested.nested_vmx_secondary_ctls_high |=
+   SECONDARY_EXEC_UNRESTRICTED_GUEST;
+
/* miscellaneous data */
rdmsr(MSR_IA32_VMX_MISC,
vmx->nested.nested_vmx_misc_low,


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Greg KH
On Tue, Feb 24, 2015 at 03:47:37PM +0100, Ingo Molnar wrote:
> 
> * Greg KH  wrote:
> 
> > On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
> > > Paravirt spinlock clears slowpath flag after doing unlock.
> > > As explained by Linus currently it does:
> > > prev = *lock;
> > > add_smp(&lock->tickets.head, TICKET_LOCK_INC);
> > > 
> > > /* add_smp() is a full mb() */
> > > 
> > > if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> > > __ticket_unlock_slowpath(lock, prev);
> > > 
> > > which is *exactly* the kind of things you cannot do with spinlocks,
> > > because after you've done the "add_smp()" and released the spinlock
> > > for the fast-path, you can't access the spinlock any more.  Exactly
> > > because a fast-path lock might come in, and release the whole data
> > > structure.
> > > 
> > > Linus suggested that we should not do any writes to lock after unlock(),
> > > and we can move slowpath clearing to fastpath lock.
> > > 
> > > So this patch implements the fix with:
> > > 1. Moving slowpath flag to head (Oleg):
> > > Unlocked locks don't care about the slowpath flag; therefore we can keep
> > > it set after the last unlock, and clear it again on the first (try)lock.
> > > -- this removes the write after unlock. note that keeping slowpath flag 
> > > would
> > > result in unnecessary kicks.
> > > By moving the slowpath flag from the tail to the head ticket we also avoid
> > > the need to access both the head and tail tickets on unlock.
> > > 
> > > 2. use xadd to avoid read/write after unlock that checks the need for
> > > unlock_kick (Linus):
> > > We further avoid the need for a read-after-release by using xadd;
> > > the prev head value will include the slowpath flag and indicate if we
> > > need to do PV kicking of suspended spinners -- on modern chips xadd
> > > isn't (much) more expensive than an add + load.
> > > 
> > > Result:
> > >  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
> > >  benchmark overcommit %improve
> > >  kernbench  1x   -0.13
> > >  kernbench  2x    0.02
> > >  dbench 1x   -1.77
> > >  dbench 2x   -0.63
> > > 
> > > [Jeremy: hinted missing TICKET_LOCK_INC for kick]
> > > [Oleg: Moving slowpath flag to head, ticket_equals idea]
> > > [PeterZ: Detailed changelog]
> > > 
> > > Reported-by: Sasha Levin 
> > > Suggested-by: Linus Torvalds 
> > > Signed-off-by: Raghavendra K T 
> > > Reviewed-by: Oleg Nesterov 
> > > Acked-by: David Vrabel 
> > > ---
> > >  arch/x86/include/asm/spinlock.h | 94 
> > > -
> > >  arch/x86/kernel/kvm.c   |  7 ++-
> > >  arch/x86/xen/spinlock.c |  7 ++-
> > >  3 files changed, 58 insertions(+), 50 deletions(-)
> > > 
> > > Changes for stable:
> > >   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
> > > compiler warnings (Linus, David Vrabel, PeterZ, Ingo)
> > 
> > What is the git commit id of this in Linus's tree?  What 
> > stable tree(s) do you want this applied to?
> 
> It's:
> 
>  d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock
> 
> You'll also need this fix from Linus to avoid (harmless) 
> build warnings:
> 
>  dd36929720f4 kernel: make READ_ONCE() valid on const arguments

Great.  But what stable kernel trees should it be backported to?  Just
3.19?  Or anything older?

thanks,

greg k-h


Re: [RFC/RFT PATCH 0/3] arm64: KVM: work around incoherency with uncached guest mappings

2015-02-24 Thread Andrew Jones
On Fri, Feb 20, 2015 at 04:36:26PM +0100, Andrew Jones wrote:
> On Fri, Feb 20, 2015 at 02:37:25PM +, Ard Biesheuvel wrote:
> > On 20 February 2015 at 14:29, Andrew Jones  wrote:
> > > So looks like the 3 orders of magnitude greater number of traps
> > > (only to el2) don't impact kernel compiles.
> > >
> > 
> > OK, good! That was what I was hoping for, obviously.
> > 
> > > Then I thought I'd be able to quickly measure the number of cycles
> > > a trap to el2 takes with this kvm-unit-tests test
> > >
> > > int main(void)
> > > {
> > > unsigned long start, end;
> > > unsigned int sctlr;
> > >
> > > asm volatile(
> > > "   mrs %0, sctlr_el1\n"
> > > "   msr pmcr_el0, %1\n"
> > > : "=&r" (sctlr) : "r" (5));
> > >
> > > asm volatile(
> > > "   mrs %0, pmccntr_el0\n"
> > > "   msr sctlr_el1, %2\n"
> > > "   mrs %1, pmccntr_el0\n"
> > > : "=&r" (start), "=&r" (end) : "r" (sctlr));
> > >
> > > printf("%llx\n", end - start);
> > > return 0;
> > > }
> > >
> > > after applying this patch to kvm
> > >
> > > diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> > > index bb91b6fc63861..5de39d740aa58 100644
> > > --- a/arch/arm64/kvm/hyp.S
> > > +++ b/arch/arm64/kvm/hyp.S
> > > @@ -770,7 +770,7 @@
> > >
> > > mrs x2, mdcr_el2
> > > and x2, x2, #MDCR_EL2_HPMN_MASK
> > > -   orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> > > +// orr x2, x2, #(MDCR_EL2_TPM | MDCR_EL2_TPMCR)
> > > orr x2, x2, #(MDCR_EL2_TDRA | MDCR_EL2_TDOSA)
> > >
> > > // Check for KVM_ARM64_DEBUG_DIRTY, and set debug to trap
> > >
> > > But I get zero for the cycle count. Not sure what I'm missing.
> > >
> > 
> > No clue tbh. Does the counter work as expected in the host?
> >
> 
> Guess not. I dropped the test into a module_init and inserted
> it on the host. Always get zero for pmccntr_el0 reads. Or, if
> I set it to something non-zero with a write, then I always get
> that back - no increments. pmcr_el0 looks OK... I had forgotten
> to set bit 31 of pmcntenset_el0, but doing that still doesn't
> help. Anyway, I assume the problem is me. I'll keep looking to
> see what I'm missing.
>

I returned to this and see that the problem was indeed me. I needed yet
another enable bit set (the filter register needed to be instructed to
count cycles while in el2). I've attached the code for the curious.
The numbers are mean=6999, std_dev=242. Run on the host, or in a guest
running on a host without this patch series (after TVM traps have been
disabled), I get a pretty consistent 40.

I checked how many vm-sysreg traps we do during the kernel compile
benchmark. It's 124924. So it's a bit strange that we don't see the
benchmark taking 10 to 20 seconds longer on average. I should probably
double check my runs. In any case, while I like the approach of this
series, the overhead is looking non-negligible.

drew
#include <libcflat.h>	/* kvm-unit-tests' printf() etc. */

static void prep_cc(void)
{
        asm volatile(
        /* clear the cycle counter overflow flag */
        "       msr pmovsclr_el0, %0\n"
        /* PMCCFILTR_EL0.NSH: count cycles while in EL2 too */
        "       msr pmccfiltr_el0, %1\n"
        /* PMCNTENSET_EL0.C: enable the cycle counter */
        "       msr pmcntenset_el0, %2\n"
        /* PMCR_EL0 LC|C|E: 64-bit counter, reset counters, enable */
        "       msr pmcr_el0, %3\n"
        "       isb\n"
        :
        : "r" (1 << 31), "r" (1 << 27), "r" (1 << 31),
          "r" (1 << 6 | 1 << 2 | 1 << 0));
}

int main(void)
{
unsigned long start, end;
unsigned int sctlr;
int i, zeros = 0;

asm volatile("mrs %0, sctlr_el1" : "=&r" (sctlr));
prep_cc();

for (i = 0; i < 10; ++i) {
asm volatile(
"   mrs %0, pmccntr_el0\n"
"   msr sctlr_el1, %2\n"
"   mrs %1, pmccntr_el0\n"
"   isb\n"
: "=&r" (start), "=&r" (end) : "r" (sctlr));

if ((i % 10) == 0)
printf("\n");

printf(" %d", end - start);

if ((end - start) == 0) {
++zeros;
prep_cc();
}
}

printf("\nnum zero counts = %d\n", zeros);
return 0;
}


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Ingo Molnar

* Greg KH  wrote:

> On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
> > Paravirt spinlock clears slowpath flag after doing unlock.
> > As explained by Linus currently it does:
> > prev = *lock;
> > add_smp(&lock->tickets.head, TICKET_LOCK_INC);
> > 
> > /* add_smp() is a full mb() */
> > 
> > if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> > __ticket_unlock_slowpath(lock, prev);
> > 
> > which is *exactly* the kind of things you cannot do with spinlocks,
> > because after you've done the "add_smp()" and released the spinlock
> > for the fast-path, you can't access the spinlock any more.  Exactly
> > because a fast-path lock might come in, and release the whole data
> > structure.
> > 
> > Linus suggested that we should not do any writes to lock after unlock(),
> > and we can move slowpath clearing to fastpath lock.
> > 
> > So this patch implements the fix with:
> > 1. Moving slowpath flag to head (Oleg):
> > Unlocked locks don't care about the slowpath flag; therefore we can keep
> > it set after the last unlock, and clear it again on the first (try)lock.
> > -- this removes the write after unlock. note that keeping slowpath flag 
> > would
> > result in unnecessary kicks.
> > By moving the slowpath flag from the tail to the head ticket we also avoid
> > the need to access both the head and tail tickets on unlock.
> > 
> > 2. use xadd to avoid read/write after unlock that checks the need for
> > unlock_kick (Linus):
> > We further avoid the need for a read-after-release by using xadd;
> > the prev head value will include the slowpath flag and indicate if we
> > need to do PV kicking of suspended spinners -- on modern chips xadd
> > isn't (much) more expensive than an add + load.
> > 
> > Result:
> >  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
> >  benchmark overcommit %improve
> >  kernbench  1x   -0.13
> >  kernbench  2x    0.02
> >  dbench 1x   -1.77
> >  dbench 2x   -0.63
> > 
> > [Jeremy: hinted missing TICKET_LOCK_INC for kick]
> > [Oleg: Moving slowpath flag to head, ticket_equals idea]
> > [PeterZ: Detailed changelog]
> > 
> > Reported-by: Sasha Levin 
> > Suggested-by: Linus Torvalds 
> > Signed-off-by: Raghavendra K T 
> > Reviewed-by: Oleg Nesterov 
> > Acked-by: David Vrabel 
> > ---
> >  arch/x86/include/asm/spinlock.h | 94 
> > -
> >  arch/x86/kernel/kvm.c   |  7 ++-
> >  arch/x86/xen/spinlock.c |  7 ++-
> >  3 files changed, 58 insertions(+), 50 deletions(-)
> > 
> > Changes for stable:
> >   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
> > compiler warnings (Linus, David Vrabel, PeterZ, Ingo)
> 
> What is the git commit id of this in Linus's tree?  What 
> stable tree(s) do you want this applied to?

It's:

 d6abfdb20223 x86/spinlocks/paravirt: Fix memory corruption on unlock

You'll also need this fix from Linus to avoid (harmless) 
build warnings:

 dd36929720f4 kernel: make READ_ONCE() valid on const arguments

Thanks,

Ingo


Re: [PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Greg KH
On Tue, Feb 24, 2015 at 02:54:59PM +0530, Raghavendra K T wrote:
> Paravirt spinlock clears slowpath flag after doing unlock.
> As explained by Linus currently it does:
> prev = *lock;
> add_smp(&lock->tickets.head, TICKET_LOCK_INC);
> 
> /* add_smp() is a full mb() */
> 
> if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
> __ticket_unlock_slowpath(lock, prev);
> 
> which is *exactly* the kind of things you cannot do with spinlocks,
> because after you've done the "add_smp()" and released the spinlock
> for the fast-path, you can't access the spinlock any more.  Exactly
> because a fast-path lock might come in, and release the whole data
> structure.
> 
> Linus suggested that we should not do any writes to lock after unlock(),
> and we can move slowpath clearing to fastpath lock.
> 
> So this patch implements the fix with:
> 1. Moving slowpath flag to head (Oleg):
> Unlocked locks don't care about the slowpath flag; therefore we can keep
> it set after the last unlock, and clear it again on the first (try)lock.
> -- this removes the write after unlock. note that keeping slowpath flag would
> result in unnecessary kicks.
> By moving the slowpath flag from the tail to the head ticket we also avoid
> the need to access both the head and tail tickets on unlock.
> 
> 2. use xadd to avoid read/write after unlock that checks the need for
> unlock_kick (Linus):
> We further avoid the need for a read-after-release by using xadd;
> the prev head value will include the slowpath flag and indicate if we
> need to do PV kicking of suspended spinners -- on modern chips xadd
> isn't (much) more expensive than an add + load.
> 
> Result:
>  setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
>  benchmark overcommit %improve
>  kernbench  1x   -0.13
>  kernbench  2x    0.02
>  dbench 1x   -1.77
>  dbench 2x   -0.63
> 
> [Jeremy: hinted missing TICKET_LOCK_INC for kick]
> [Oleg: Moving slowpath flag to head, ticket_equals idea]
> [PeterZ: Detailed changelog]
> 
> Reported-by: Sasha Levin 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Raghavendra K T 
> Reviewed-by: Oleg Nesterov 
> Acked-by: David Vrabel 
> ---
>  arch/x86/include/asm/spinlock.h | 94 
> -
>  arch/x86/kernel/kvm.c   |  7 ++-
>  arch/x86/xen/spinlock.c |  7 ++-
>  3 files changed, 58 insertions(+), 50 deletions(-)
> 
> Changes for stable:
>   - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
> compiler warnings (Linus, David Vrabel, PeterZ, Ingo)

What is the git commit id of this in Linus's tree?  What stable tree(s)
do you want this applied to?

thanks,

greg k-h


Re: [PATCH] MIPS: KVM: Fix trace event to save PC directly

2015-02-24 Thread Steven Rostedt
On Tue, 24 Feb 2015 11:46:20 +
James Hogan  wrote:


> Let's save the actual PC in the structure so that the correct value is
> accessible later.
> 
> Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
> Signed-off-by: James Hogan 
> Cc: Paolo Bonzini 
> Cc: Ralf Baechle 
> Cc: Marcelo Tosatti 
> Cc: Gleb Natapov 
> Cc: Steven Rostedt 

Acked-by: Steven Rostedt 

-- Steve

> Cc: Ingo Molnar 
> Cc: linux-m...@linux-mips.org
> Cc: kvm@vger.kernel.org
> Cc:  # v3.10+
> ---
>  arch/mips/kvm/trace.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
>


[PATCH] MIPS: KVM: Fix trace event to save PC directly

2015-02-24 Thread James Hogan
Currently the guest exit trace event saves the VCPU pointer to the
structure, and the guest PC is retrieved by dereferencing it when the
event is printed rather than directly from the trace record. This isn't
safe as the printing may occur long afterwards, after the PC has changed
and potentially after the VCPU has been freed. Usually this results in
the same (wrong) PC being printed for multiple trace events. It also
isn't portable as userland has no way to access the VCPU data structure
when interpreting the trace record itself.

Let's save the actual PC in the structure so that the correct value is
accessible later.

Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM")
Signed-off-by: James Hogan 
Cc: Paolo Bonzini 
Cc: Ralf Baechle 
Cc: Marcelo Tosatti 
Cc: Gleb Natapov 
Cc: Steven Rostedt 
Cc: Ingo Molnar 
Cc: linux-m...@linux-mips.org
Cc: kvm@vger.kernel.org
Cc:  # v3.10+
---
 arch/mips/kvm/trace.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/kvm/trace.h b/arch/mips/kvm/trace.h
index c1388d40663b..bd6437f67dc0 100644
--- a/arch/mips/kvm/trace.h
+++ b/arch/mips/kvm/trace.h
@@ -24,18 +24,18 @@ TRACE_EVENT(kvm_exit,
TP_PROTO(struct kvm_vcpu *vcpu, unsigned int reason),
TP_ARGS(vcpu, reason),
TP_STRUCT__entry(
-   __field(struct kvm_vcpu *, vcpu)
+   __field(unsigned long, pc)
__field(unsigned int, reason)
),
 
TP_fast_assign(
-   __entry->vcpu = vcpu;
+   __entry->pc = vcpu->arch.pc;
__entry->reason = reason;
),
 
TP_printk("[%s]PC: 0x%08lx",
  kvm_mips_exit_types_str[__entry->reason],
- __entry->vcpu->arch.pc)
+ __entry->pc)
 );
 
 #endif /* _TRACE_KVM_H */
-- 
2.0.5



[PATCH for stable] x86/spinlocks/paravirt: Fix memory corruption on unlock

2015-02-24 Thread Raghavendra K T
Paravirt spinlock clears slowpath flag after doing unlock.
As explained by Linus currently it does:
prev = *lock;
add_smp(&lock->tickets.head, TICKET_LOCK_INC);

/* add_smp() is a full mb() */

if (unlikely(lock->tickets.tail & TICKET_SLOWPATH_FLAG))
__ticket_unlock_slowpath(lock, prev);

which is *exactly* the kind of things you cannot do with spinlocks,
because after you've done the "add_smp()" and released the spinlock
for the fast-path, you can't access the spinlock any more.  Exactly
because a fast-path lock might come in, and release the whole data
structure.

Linus suggested that we should not do any writes to lock after unlock(),
and we can move slowpath clearing to fastpath lock.

So this patch implements the fix with:
1. Moving slowpath flag to head (Oleg):
Unlocked locks don't care about the slowpath flag; therefore we can keep
it set after the last unlock, and clear it again on the first (try)lock.
-- this removes the write after unlock. note that keeping slowpath flag would
result in unnecessary kicks.
By moving the slowpath flag from the tail to the head ticket we also avoid
the need to access both the head and tail tickets on unlock.

2. use xadd to avoid read/write after unlock that checks the need for
unlock_kick (Linus):
We further avoid the need for a read-after-release by using xadd;
the prev head value will include the slowpath flag and indicate if we
need to do PV kicking of suspended spinners -- on modern chips xadd
isn't (much) more expensive than an add + load.

Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 benchmark overcommit %improve
 kernbench  1x   -0.13
 kernbench  2x    0.02
 dbench 1x   -1.77
 dbench 2x   -0.63

[Jeremy: hinted missing TICKET_LOCK_INC for kick]
[Oleg: Moving slowpath flag to head, ticket_equals idea]
[PeterZ: Detailed changelog]

Reported-by: Sasha Levin 
Suggested-by: Linus Torvalds 
Signed-off-by: Raghavendra K T 
Reviewed-by: Oleg Nesterov 
Acked-by: David Vrabel 
---
 arch/x86/include/asm/spinlock.h | 94 -
 arch/x86/kernel/kvm.c   |  7 ++-
 arch/x86/xen/spinlock.c |  7 ++-
 3 files changed, 58 insertions(+), 50 deletions(-)

Changes for stable:
  - Don't replace ACCESS_ONCE with READ_ONCE, which would cause horrendous
    compiler warnings (Linus, David Vrabel, PeterZ, Ingo)

Changes since V4:
  - one more usage of tickets_equal() (Oleg)
  - head > tail situation can lead to false contended check (Oleg)
  - smp_mb__after_atomic() added after slowpath_enter (Oleg)

Changes since V3:
  - Detailed changelog (PeterZ)
  - Replace ACCESS_ONCE with READ_ONCE (oleg)
  - Add xen changes (Oleg)
  - Correct break logic in unlock_wait() (Oleg)

Changes since V2:
  - Move the slowpath flag to head, this enables xadd usage in unlock code
and inturn we can get rid of read/write after  unlock (Oleg)
  - usage of ticket_equals (Oleg)

Changes since V1:
  - Add missing TICKET_LOCK_INC before unlock kick (fixes hang in overcommit: 
Jeremy).
  - Remove SLOWPATH_FLAG clearing in fast lock. (Jeremy)
  - clear SLOWPATH_FLAG in arch_spin_value_unlocked during comparison.

 Result:
 setup: 16core (32 cpu +ht sandy bridge 8GB 16vcpu guest)
 base = 3_19_rc7
 patched = 3_19_rc7_spinfix_v3

+---------+-----------+-----------+-----------+-----------+
 kernbench (Time taken in sec, lower is better)
+---------+-----------+-----------+-----------+-----------+
         base       %stdev      patched     %stdev    %improve
+---------+-----------+-----------+-----------+-----------+
1x    54.2300      3.0652      54.3008     4.0366    -0.13056
2x    90.1883      5.5509      90.1650     6.4336     0.02583
+---------+-----------+-----------+-----------+-----------+
+---------+-----------+-----------+-----------+-----------+
 dbench (Throughput, higher is better)
+---------+-----------+-----------+-----------+-----------+
         base       %stdev      patched     %stdev    %improve
+---------+-----------+-----------+-----------+-----------+
1x  7029.9188      2.5952    6905.0712     4.4737    -1.77595
2x  3254.2075     14.8291    3233.7137    26.8784    -0.62976
+---------+-----------+-----------+-----------+-----------+
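
To spell out item (2) above in code, the unlock fast path ends up
looking roughly like this -- a sketch using names from the existing
x86 ticketlock code, which may differ in detail from the final commit:

static __always_inline void arch_spin_unlock(arch_spinlock_t *lock)
{
        if (TICKET_SLOWPATH_FLAG &&
            static_key_false(&paravirt_ticketlocks_enabled)) {
                __ticket_t head;

                /* xadd() releases the lock and returns the previous
                 * head, which carries the slowpath flag -- no reads
                 * or writes to the lock after the release. */
                head = xadd(&lock->tickets.head, TICKET_LOCK_INC);

                if (unlikely(head & TICKET_SLOWPATH_FLAG)) {
                        head &= ~TICKET_SLOWPATH_FLAG;
                        __ticket_unlock_kick(lock, head + TICKET_LOCK_INC);
                }
        } else
                __add(&lock->tickets.head, TICKET_LOCK_INC,
                      UNLOCK_LOCK_PREFIX);
}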

diff --git a/arch/x86/include/asm/spinlock.h b/arch/x86/include/asm/spinlock.h
index 625660f..cf87de3 100644
--- a/arch/x86/include/asm/spinlock.h
+++ b/arch/x86/include/asm/spinlock.h
@@ -46,7 +46,7 @@ static __always_inline bool static_key_false(struct 
static_key *key);
 
 static inline void __ticket_enter_slowpath(arch_spinlock_t *lock)
 {
-   set_bit(0, (volatile unsigned long *)&lock->tickets.tail);
+   set_bit(0, (volatile unsigned long *)&lock->tickets.head);
 }
 
 #else  /* !CONFIG_PARAVIRT_SPINLOCKS */
@@ -60,10 +60,30 @@ static inline void __ticket_unlock_kick(arch_spinlock_t 
*lock,
 }
 
 #endif /* CONFIG_PARAVIRT_SPINLOCKS */
+static inline int