Re: [patch 08/16] KVM: x86: introduce facility to support vsyscall pvclock, via MSR

2012-11-02 Thread Marcelo Tosatti
On Fri, Nov 02, 2012 at 02:23:06PM +0400, Glauber Costa wrote:
> On 11/02/2012 01:39 AM, Marcelo Tosatti wrote:
> > On Thu, Nov 01, 2012 at 06:28:31PM +0400, Glauber Costa wrote:
> >> On 11/01/2012 02:47 AM, Marcelo Tosatti wrote:
> >>> Allow a guest to register a second location for the VCPU time info
> >>>
> >>> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
> >>> This is intended to allow the guest kernel to map this information
> >>> into a usermode accessible page, so that usermode can efficiently
> >>> calculate system time from the TSC without having to make a syscall.
> >>>
> >>> Signed-off-by: Marcelo Tosatti 
> >>>
> >>
> >> Changelog doesn't make a lot of sense. (especially from first line to the
> >> second). Please add in here the reasons why we can't (or decided not to)
> >> use the same page. The info in the last mail thread is good enough, just
> >> put it here.
> > 
> > Fixed.
> > 
> >>> Index: vsyscall/arch/x86/include/asm/kvm_para.h
> >>> ===
> >>> --- vsyscall.orig/arch/x86/include/asm/kvm_para.h
> >>> +++ vsyscall/arch/x86/include/asm/kvm_para.h
> >>> @@ -23,6 +23,7 @@
> >>>  #define KVM_FEATURE_ASYNC_PF 4
> >>>  #define KVM_FEATURE_STEAL_TIME   5
> >>>  #define KVM_FEATURE_PV_EOI   6
> >>> +#define KVM_FEATURE_USERSPACE_CLOCKSOURCE 7
> >>>  
> >>>  /* The last 8 bits are used to indicate how to interpret the flags field
> >>>   * in pvclock structure. If no bits are set, all flags are ignored.
> >>> @@ -39,6 +40,7 @@
> >>>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
> >>>  #define MSR_KVM_STEAL_TIME  0x4b564d03
> >>>  #define MSR_KVM_PV_EOI_EN  0x4b564d04
> >>> +#define MSR_KVM_USERSPACE_TIME  0x4b564d05
> >>>  
> >>
> >> I accept that it is possible that we may be better off with the page
> >> mapped twice.
> >>
> >> But why do we need an extra MSR? When, and why, would you enable the
> >> kernel-based pvclock, but disable the userspace pvclock?
> > 
> > Because there is no stable TSC available, for example (which cannot
> > be used to measure passage of time).
> > 
> 
> What you say is true, but completely unrelated. I am not talking about
> situations in which userspace pvclock is available and you end up not
> using it.
> 
> I am talking about situations in which it is available, you are capable
> of using it, but then decide for some reason to permanently disable it -
> as in not setting it up altogether.
> 
> It seems to me that if the host has code to deal with userspace pvclock,
> and you already coded the guest in a way that you may or may not use it
> (dependent on the value of the stable bit), you could very well only
> check for the cpuid flag, and do the guest setup if available - skipping
> this MSR dance altogether.
> 
> Now, of course, there is the problem of communicating the address in
> which the guest expects the page to be. Skipping the MSR setup would
> require it to be more or less at a fixed location. We could in principle
> lay them down together with the already existing pvclock structure. (But
> granted, I am not sure it is worth it...)
> 
> I think in general, this question deserves a bit more of attention. We
> are about to have just the perfect opportunity for this next week, so
> let's use it.

In essence you are proposing a different interface to communicate the
"userspace vsyscall pvclock area", other than the MSR, right?

If so:

1. What is the problem with the MSR interface?
2. What advantage does this new interface (which honestly I do not
understand) provide?
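
For context, the payoff being argued over is the usermode read path. Below is
a minimal sketch of how userspace would consume a mapped vcpu time info area
(the struct is the standard pvclock layout; how the pointer is obtained is
exactly what this thread is about):

/* Sketch only: userspace read of a mapped pvclock_vcpu_time_info. */
#include <stdint.h>

struct pvclock_vcpu_time_info {
	uint32_t version;
	uint32_t pad0;
	uint64_t tsc_timestamp;
	uint64_t system_time;
	uint32_t tsc_to_system_mul;
	int8_t   tsc_shift;
	uint8_t  flags;
	uint8_t  pad[2];
};

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;
	__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

/* returns guest system time in nanoseconds, without a syscall */
static uint64_t pvclock_read(volatile struct pvclock_vcpu_time_info *pvti)
{
	uint32_t version;
	uint64_t delta, time;

	do {
		version = pvti->version;	/* odd = update in progress */
		__asm__ __volatile__("" ::: "memory");
		delta = rdtsc() - pvti->tsc_timestamp;
		if (pvti->tsc_shift >= 0)
			delta <<= pvti->tsc_shift;
		else
			delta >>= -pvti->tsc_shift;
		/* the real code does a 64x32->96 bit multiply; __int128
		 * keeps this sketch honest for large deltas */
		time = pvti->system_time +
		       (uint64_t)(((unsigned __int128)delta *
				   pvti->tsc_to_system_mul) >> 32);
		__asm__ __volatile__("" ::: "memory");
	} while ((version & 1) || version != pvti->version);

	return time;
}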




Re: [PATCHv3 net-next 0/8] enable/disable zero copy tx dynamically

2012-11-02 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 1 Nov 2012 21:16:17 +0200

> 
> tun supports zero copy transmit since 
> 0690899b4d4501b3505be069b9a687e68ccbe15b,
> however you can only enable this mode if you know your workload does not
> trigger heavy guest to host/host to guest traffic - otherwise you
> get a (minor) performance regression.
> This patchset addresses this problem by notifying the owner
> device when the callback is invoked because of a data copy.
> This makes it possible to detect whether zero copy is appropriate
> dynamically: we start in zero copy mode, when we detect
> data copied we disable zero copy for a while.
> 
> With this patch applied, I get the same performance for
> guest to host and guest to guest both with and without zero copy tx.

Series applied, thanks Michael.
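
For readers new to the idea: the heuristic Michael describes above amounts to
a little bookkeeping around the completion callback. A toy, self-contained
model follows (thresholds and names are invented here, this is not the
vhost-net code):

#include <stdbool.h>
#include <stdio.h>

#define ZCOPY_DISABLE_THRESHOLD  8	/* hypothetical tuning values */
#define ZCOPY_RETRY_AFTER_PKTS   64

struct tx_state {
	unsigned copied_in_a_row;	/* completions that copied anyway */
	unsigned pkts_since_disable;
	bool zerocopy;
};

/* called from the completion callback with "was the buffer copied?" */
static void tx_complete(struct tx_state *s, bool data_was_copied)
{
	if (!data_was_copied) {
		s->copied_in_a_row = 0;
		return;
	}
	if (++s->copied_in_a_row >= ZCOPY_DISABLE_THRESHOLD) {
		s->zerocopy = false;	/* copies dominate: stop pinning pages */
		s->pkts_since_disable = 0;
	}
}

/* called per transmitted packet to decide which path to take */
static bool use_zerocopy(struct tx_state *s)
{
	if (!s->zerocopy && ++s->pkts_since_disable >= ZCOPY_RETRY_AFTER_PKTS) {
		s->zerocopy = true;	/* probe zero copy again later */
		s->copied_in_a_row = 0;
	}
	return s->zerocopy;
}

int main(void)
{
	struct tx_state s = { .zerocopy = true };
	int i;

	for (i = 0; i < 200; i++) {
		bool zc = use_zerocopy(&s);
		tx_complete(&s, i < 20);	/* pretend early packets get copied */
		if (i % 50 == 0)
			printf("pkt %3d: zerocopy=%d\n", i, zc);
	}
	return 0;
}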


Re: [PATCH 14/20] MIPS: Use the UM bit instead of the CU0 enable bit in the status register to figure out the stack for saving regs.

2012-11-02 Thread Maciej W. Rozycki
On Wed, 31 Oct 2012, Sanjay Lal wrote:

> diff --git a/arch/mips/include/asm/stackframe.h 
> b/arch/mips/include/asm/stackframe.h
> index cb41af5..59c9245 100644
> --- a/arch/mips/include/asm/stackframe.h
> +++ b/arch/mips/include/asm/stackframe.h
> @@ -30,7 +30,7 @@
>  #define STATMASK 0x1f
>  #endif
>  
> -#ifdef CONFIG_MIPS_MT_SMTC
> +#if defined(CONFIG_MIPS_MT_SMTC) || defined (CONFIG_MIPS_HW_FIBERS)
>  #include 
>  #endif /* CONFIG_MIPS_MT_SMTC */
>  
> @@ -162,9 +162,9 @@
>   .set    noat
>   .set    reorder
>   mfc0    k0, CP0_STATUS
> - sll     k0, 3           /* extract cu0 bit */
> + andi    k0, k0, 0x10    /* check user mode bit */
>   .set    noreorder
> - bltz    k0, 8f
> + beq     k0, $0, 8f
>   move    k1, sp
>   .set    reorder
>   /* Called from user mode, new stack. */

 Is there any reason this is needed?  If so, then given that this is generic
code, a corresponding piece has to be added to support the MIPS I ISA
processors that have the user mode bit in a different location.  
Presumably you'll update all the other places that fiddle with 
CP0.Status.CU0 too?

  Maciej


Re: [PATCH 00/20] KVM for MIPS32 Processors

2012-11-02 Thread Sanjay Lal

On Nov 1, 2012, at 10:51 AM, Avi Kivity wrote:

> On 10/31/2012 05:17 PM, Sanjay Lal wrote:
>> The following patchset implements KVM support for MIPS32R2 processors,
>> using Trap & Emulate, with basic runtime binary translation to improve
>> performance.  The goal has been to keep the Guest kernel changes to a
>> minimum.
>> 
>> The patch is against Linux 3.7-rc2.  
>> 
>> There is a companion patchset for QEMU that adds KVM support for the 
>> MIPS target.
>> 
>> KVM/MIPS should support MIPS32-R2 processors and beyond.
>> It has been tested on the following platforms:
>>  - Malta Board with FPGA based 34K (Little Endian).
>>  - Sigma Designs TangoX board with a 24K based 8654 SoC (Little Endian).
>>  - Malta Board with 74K @ 1GHz (Little Endian).
>>  - OVPSim MIPS simulator from Imperas emulating a Malta board with 
>>24Kc and 1074Kc cores (Little Endian).
>> 
>> Both Guest kernel and Guest Userspace execute in UM. The Guest address space 
>> is
>> as follows:
>> Guest User address space:   0x00000000 -> 0x40000000
>> Guest Kernel Unmapped:      0x40000000 -> 0x60000000
>> Guest Kernel Mapped:        0x60000000 -> 0x80000000
>> 
>> As a result, Guest Usermode virtual memory is limited to 1GB.
>> 
>> Release Notes
>> 
>> (1) 16K Page Size:
>>Both Host Kernel and Guest Kernel should have the same page size, 
>>currently at least 16K.  Note that due to cache aliasing issues, 
>>4K page sizes are NOT supported.
>> 
>> (2) No HugeTLB/Large Page Support:
>>Both the host kernel and Guest kernel should have the page size 
>>set to at least 16K.
>>This will be implemented in a future release.
>> 
>> (3) SMP Guests do not work
>>Linux-3.7-rc2 based SMP guest hangs due to the following code sequence 
>>in the generated TLB handlers:
>> LL/TLBP/SC
>>Since the TLBP instruction causes a trap the reservation gets cleared
>>when we ERET back to the guest. This causes the guest to hang in an 
>>infinite loop.
>>As a workaround, make sure that CONFIG_SMP is disabled for Guest kernels.
>>This will be fixed in a future release.
>> 
>> (4) FPU support:
>>Currently KVM/MIPS emulates a 24K CPU without a FPU.
>>This will be fixed in a future release
>> 
> 
> Thanks for posting this, new architectures are always a welcome addition.
> 
> Some general notes:
> - please read and follow Documentation/CodingStyle.  In general the
> patches are okay except for indentation (use tabs, not spaces, and set
> your editor tab width to 8).

I'll definitely be re-formatting the code based on the recommended coding style 
and running the patches through checkpatch.pl for v2 of the patch set.

Regards
Sanjay



Re: [PATCH 02/20] KVM/MIPS32: Arch specific KVM data structures.

2012-11-02 Thread Sanjay Lal

On Nov 1, 2012, at 11:04 AM, Avi Kivity wrote:

> On 10/31/2012 05:18 PM, Sanjay Lal wrote:
> 
>> +
>> +/* Special address that contains the comm page, used for reducing # of 
>> traps */
>> +#define KVM_GUEST_COMMPAGE_ADDR 0x0
>> +
>> +struct kvm_arch
>> +{
>> +/* Guest GVA->HPA page table */
>> +ulong *guest_pmap;
>> +ulong guest_pmap_npages;
>> +
>> +/* Wired host TLB used for the commpage */
>> +int commpage_tlb;
>> +
>> +pfn_t (*gfn_to_pfn) (struct kvm *kvm, gfn_t gfn);
>> +void (*release_pfn_clean) (pfn_t pfn);
>> +bool (*is_error_pfn) (pfn_t pfn);
> 
> Why this indirection?  Do those functions change at runtime?

On MIPS, kernel modules are executed from "mapped space", which requires TLBs.  
The TLB handling code is statically linked with the rest of the kernel 
(kvm_tlb.c) to avoid the possibility of double faulting. The problem is that 
the code references routines that are part of the KVM module, which are 
only available once the module is loaded, hence the indirection.
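
A minimal userspace model of that indirection, just to illustrate the pattern
(names and bodies are invented, not the actual kvm_tlb.c / module code):

/* Toy model: the statically linked half only holds function pointers,
 * and the loadable half installs them at init time. */
#include <stdio.h>

struct kvm_stub { int dummy; };

/* "built-in" side, e.g. the statically linked TLB refill code */
static long (*gfn_to_pfn_hook)(struct kvm_stub *kvm, long gfn);

static long builtin_tlb_refill(struct kvm_stub *kvm, long gfn)
{
	if (!gfn_to_pfn_hook)		/* module not loaded yet */
		return -1;
	return gfn_to_pfn_hook(kvm, gfn);
}

/* "module" side, only present once the KVM module is loaded */
static long module_gfn_to_pfn(struct kvm_stub *kvm, long gfn)
{
	return gfn + 0x1000;		/* stand-in for the real lookup */
}

static void module_init_hooks(void)
{
	gfn_to_pfn_hook = module_gfn_to_pfn;
}

int main(void)
{
	struct kvm_stub kvm = { 0 };

	printf("before load: %ld\n", builtin_tlb_refill(&kvm, 5));
	module_init_hooks();
	printf("after load:  %ld\n", builtin_tlb_refill(&kvm, 5));
	return 0;
}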

> 
>> +
>> +struct kvm_mips_callbacks {
>> +int (*handle_cop_unusable)(struct kvm_vcpu *vcpu);
>> +int (*handle_tlb_mod)(struct kvm_vcpu *vcpu);
>> +int (*handle_tlb_ld_miss)(struct kvm_vcpu *vcpu);
>> +int (*handle_tlb_st_miss)(struct kvm_vcpu *vcpu);
>> +int (*handle_addr_err_st)(struct kvm_vcpu *vcpu);
>> +int (*handle_addr_err_ld)(struct kvm_vcpu *vcpu);
>> +int (*handle_syscall)(struct kvm_vcpu *vcpu);
>> +int (*handle_res_inst)(struct kvm_vcpu *vcpu);
>> +int (*handle_break)(struct kvm_vcpu *vcpu);
>> +gpa_t (*gva_to_gpa)(gva_t gva);
>> +void (*queue_timer_int)(struct kvm_vcpu *vcpu);
>> +void (*dequeue_timer_int)(struct kvm_vcpu *vcpu);
>> +void (*queue_io_int)(struct kvm_vcpu *vcpu, struct kvm_mips_interrupt 
>> *irq);
>> +void (*dequeue_io_int)(struct kvm_vcpu *vcpu, struct kvm_mips_interrupt 
>> *irq);
>> +int (*irq_deliver)(struct kvm_vcpu *vcpu, unsigned int priority, 
>> uint32_t cause);
>> +int (*irq_clear)(struct kvm_vcpu *vcpu, unsigned int priority, uint32_t 
>> cause);
>> +int (*vcpu_ioctl_get_regs)(struct kvm_vcpu *vcpu, struct kvm_regs 
>> *regs);
>> +int (*vcpu_ioctl_set_regs)(struct kvm_vcpu *vcpu, struct kvm_regs 
>> *regs);
>> +int (*vcpu_init)(struct kvm_vcpu *vcpu);
>> +};
> 
> We use callbacks on x86 because we have two separate implementations
> (svm and vmx).  Will that be the case on MIPS? If not, use direct calls.

We will eventually have separate implementations based on the features 
supported by H/W.




Re: [PATCH 07/20] KVM/MIPS32: Dynamic binary translation of select privileged instructions.

2012-11-02 Thread Sanjay Lal

On Nov 1, 2012, at 11:24 AM, Avi Kivity wrote:

> On 10/31/2012 05:19 PM, Sanjay Lal wrote:
>> Currently, the following instructions are translated:
>> - CACHE (indexed)
>> - CACHE (va based): translated to a synci, overkill on D-CACHE operations, 
>> but still much faster than a trap.
>> - mfc0/mtc0: the virtual COP0 registers for the guest are implemented as 2-D 
>> array
>>  [COP#][SEL] and this is mapped into the guest kernel address space @ VA 0x0.
>>  mfc0/mtc0 operations are transformed to load/stores.
>> 
> 
> Seems to be more of binary patching, yes?  Binary translation usually
> involves hiding the translated code so the guest is not able to detect
> that it is patched.

Now that you mention it, I think binary patching would be more applicable.  If 
the "self-aware" guest ever compared the code it would realize that it has 
changed.

Regards
Sanjay





Re: [PATCH] target-i386: cpu: fix --disable-kvm compilation

2012-11-02 Thread Andreas Färber
Am 02.11.2012 17:25, schrieb Eduardo Habkost:
> This fixes the following:
>   target-i386/cpu.o: In function `kvm_cpu_fill_host':
>   target-i386/cpu.c:783: undefined reference to `kvm_state'
> 
> I didn't notice the problem before because GCC was optimizing the entire
> kvm_cpu_fill_host() function out (because all calls are conditional on
> kvm_enabled()).
> 
> * cpu_x86_fill_model_id() is used only if CONFIG_KVM is set, so #ifdef it
>   entirely to avoid compiler warnings.
> 
> * kvm_cpu_fill_host() should be called only if KVM is enabled, so
>   use #ifdef CONFIG_KVM around the entire function body.
> 
> Reported-by: Andreas Färber 
> Signed-off-by: Eduardo Habkost 

Acked-by: Andreas Färber 

If no one objects to this solution, unless Marcelo or Avi beat me, I'll
send out an urgent pull request as requested by Anthony.

Thanks,
Andreas

> ---
>  target-i386/cpu.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index c46286a..e1db639 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -758,6 +758,7 @@ static x86_def_t builtin_x86_defs[] = {
>  },
>  };
>  
> +#ifdef CONFIG_KVM
>  static int cpu_x86_fill_model_id(char *str)
>  {
>  uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
> @@ -772,6 +773,7 @@ static int cpu_x86_fill_model_id(char *str)
>  }
>  return 0;
>  }
> +#endif
>  
>  /* Fill a x86_def_t struct with information about the host CPU, and
>   * the CPU features supported by the host hardware + host kernel
> @@ -780,6 +782,7 @@ static int cpu_x86_fill_model_id(char *str)
>   */
>  static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
>  {
> +#ifdef CONFIG_KVM
>  KVMState *s = kvm_state;
>  uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
>  
> @@ -838,6 +841,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
>   * unsupported ones later.
>   */
>  x86_cpu_def->svm_features = -1;
> +#endif /* CONFIG_KVM */
>  }
>  
>  static int unavailable_host_feature(struct model_features_t *f, uint32_t 
> mask)
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


[PATCH] target-i386: cpu: fix --disable-kvm compilation

2012-11-02 Thread Eduardo Habkost
This fixes the following:
  target-i386/cpu.o: In function `kvm_cpu_fill_host':
  target-i386/cpu.c:783: undefined reference to `kvm_state'

I didn't notice the problem before because GCC was optimizing the entire
kvm_cpu_fill_host() function out (because all calls are conditional on
kvm_enabled()).

* cpu_x86_fill_model_id() is used only if CONFIG_KVM is set, so #ifdef it
  entirely to avoid compiler warnings.

* kvm_cpu_fill_host() should be called only if KVM is enabled, so
  use #ifdef CONFIG_KVM around the entire function body.

Reported-by: Andreas Färber 
Signed-off-by: Eduardo Habkost 
---
 target-i386/cpu.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index c46286a..e1db639 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -758,6 +758,7 @@ static x86_def_t builtin_x86_defs[] = {
 },
 };
 
+#ifdef CONFIG_KVM
 static int cpu_x86_fill_model_id(char *str)
 {
 uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
@@ -772,6 +773,7 @@ static int cpu_x86_fill_model_id(char *str)
 }
 return 0;
 }
+#endif
 
 /* Fill a x86_def_t struct with information about the host CPU, and
  * the CPU features supported by the host hardware + host kernel
@@ -780,6 +782,7 @@ static int cpu_x86_fill_model_id(char *str)
  */
 static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
 {
+#ifdef CONFIG_KVM
 KVMState *s = kvm_state;
 uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
 
@@ -838,6 +841,7 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
  * unsupported ones later.
  */
 x86_cpu_def->svm_features = -1;
+#endif /* CONFIG_KVM */
 }
 
 static int unavailable_host_feature(struct model_features_t *f, uint32_t mask)
-- 
1.7.11.7



Re: [Qemu-devel] [PATCH 27/28] target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID

2012-11-02 Thread Eduardo Habkost
On Fri, Nov 02, 2012 at 04:34:00PM +0100, Andreas Färber wrote:
> Am 31.10.2012 10:40, schrieb Marcelo Tosatti:
> > From: Eduardo Habkost 
> > 
> > Change the kvm_cpu_fill_host() function to use
> > kvm_arch_get_supported_cpuid() instead of running the CPUID instruction
> > directly, when checking for supported CPUID features.
> > 
> > This should solve two problems at the same time:
> > 
> >  * "-cpu host" was not enabling features that don't need support on
> >the host CPU (e.g. x2apic);
> >  * "check" and "enforce" options were not detecting problems when the
> >host CPU did support a feature, but the KVM kernel code didn't
> >support it.
> > 
> > Signed-off-by: Eduardo Habkost 
> > Signed-off-by: Marcelo Tosatti 
> > ---
> >  target-i386/cpu.c |   25 +++--
> >  1 files changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> > index 390ed47..4c84e9f 100644
> > --- a/target-i386/cpu.c
> > +++ b/target-i386/cpu.c
> > @@ -773,13 +773,13 @@ static int cpu_x86_fill_model_id(char *str)
> >   */
> >  static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
> >  {
> > +KVMState *s = kvm_state;
> 
> This broke the linux-user build:
> 
> target-i386/cpu.o: In function `kvm_cpu_fill_host':
> /home/andreas/QEMU/qemu-rcar/target-i386/cpu.c:783: undefined reference
> to `kvm_state'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-i386] Fehler 1
> make: *** [subdir-i386-linux-user] Fehler 2
> 
> Any idea how to fix?

This function should never be called without CONFIG_KVM, so we can
#ifdef out the whole function body. I will send a patch shortly.


> 
> Andreas
> 
> >  uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
> >  
> >  assert(kvm_enabled());
> >  
> >  x86_cpu_def->name = "host";
> >  host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
> > -x86_cpu_def->level = eax;
> >  x86_cpu_def->vendor1 = ebx;
> >  x86_cpu_def->vendor2 = edx;
> >  x86_cpu_def->vendor3 = ecx;
> > @@ -788,21 +788,24 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
> >  x86_cpu_def->family = ((eax >> 8) & 0x0F) + ((eax >> 20) & 0xFF);
> >  x86_cpu_def->model = ((eax >> 4) & 0x0F) | ((eax & 0xF) >> 12);
> >  x86_cpu_def->stepping = eax & 0x0F;
> > -x86_cpu_def->ext_features = ecx;
> > -x86_cpu_def->features = edx;
> > +
> > +x86_cpu_def->level = kvm_arch_get_supported_cpuid(s, 0x0, 0, R_EAX);
> > +x86_cpu_def->features = kvm_arch_get_supported_cpuid(s, 0x1, 0, R_EDX);
> > +x86_cpu_def->ext_features = kvm_arch_get_supported_cpuid(s, 0x1, 0, 
> > R_ECX);
> >  
> >  if (x86_cpu_def->level >= 7) {
> > -x86_cpu_def->cpuid_7_0_ebx_features = 
> > kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
> > +x86_cpu_def->cpuid_7_0_ebx_features =
> > +kvm_arch_get_supported_cpuid(s, 0x7, 0, R_EBX);
> >  } else {
> >  x86_cpu_def->cpuid_7_0_ebx_features = 0;
> >  }
> >  
> > -host_cpuid(0x80000000, 0, &eax, &ebx, &ecx, &edx);
> > -x86_cpu_def->xlevel = eax;
> > +x86_cpu_def->xlevel = kvm_arch_get_supported_cpuid(s, 0x80000000, 0, 
> > R_EAX);
> > +x86_cpu_def->ext2_features =
> > +kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_EDX);
> > +x86_cpu_def->ext3_features =
> > +kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_ECX);
> >  
> > -host_cpuid(0x80000001, 0, &eax, &ebx, &ecx, &edx);
> > -x86_cpu_def->ext2_features = edx;
> > -x86_cpu_def->ext3_features = ecx;
> >  cpu_x86_fill_model_id(x86_cpu_def->model_id);
> >  x86_cpu_def->vendor_override = 0;
> >  
> > @@ -811,11 +814,13 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
> >  x86_cpu_def->vendor2 == CPUID_VENDOR_VIA_2 &&
> >  x86_cpu_def->vendor3 == CPUID_VENDOR_VIA_3) {
> >  host_cpuid(0xC0000000, 0, &eax, &ebx, &ecx, &edx);
> > +eax = kvm_arch_get_supported_cpuid(s, 0xC0000000, 0, R_EAX);
> >  if (eax >= 0xC0000001) {
> >  /* Support VIA max extended level */
> >  x86_cpu_def->xlevel2 = eax;
> >  host_cpuid(0xC0000001, 0, &eax, &ebx, &ecx, &edx);
> > -x86_cpu_def->ext4_features = edx;
> > +x86_cpu_def->ext4_features =
> > +kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);
> >  }
> >  }
> >  
> > 
> 
> 
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg

-- 
Eduardo


Re: [Qemu-devel] [PATCH 27/28] target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID

2012-11-02 Thread Andreas Färber
Am 02.11.2012 16:34, schrieb Andreas Färber:
> Am 31.10.2012 10:40, schrieb Marcelo Tosatti:
>> From: Eduardo Habkost 
>>
>> Change the kvm_cpu_fill_host() function to use
>> kvm_arch_get_supported_cpuid() instead of running the CPUID instruction
>> directly, when checking for supported CPUID features.
>>
>> This should solve two problems at the same time:
>>
>>  * "-cpu host" was not enabling features that don't need support on
>>the host CPU (e.g. x2apic);
>>  * "check" and "enforce" options were not detecting problems when the
>>host CPU did support a feature, but the KVM kernel code didn't
>>support it.
>>
>> Signed-off-by: Eduardo Habkost 
>> Signed-off-by: Marcelo Tosatti 
>> ---
>>  target-i386/cpu.c |   25 +++--
>>  1 files changed, 15 insertions(+), 10 deletions(-)
>>
>> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
>> index 390ed47..4c84e9f 100644
>> --- a/target-i386/cpu.c
>> +++ b/target-i386/cpu.c
>> @@ -773,13 +773,13 @@ static int cpu_x86_fill_model_id(char *str)
>>   */
>>  static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
>>  {
>> +KVMState *s = kvm_state;
> 
> This broke the linux-user build:
> 
> target-i386/cpu.o: In function `kvm_cpu_fill_host':
> /home/andreas/QEMU/qemu-rcar/target-i386/cpu.c:783: undefined reference
> to `kvm_state'
> collect2: error: ld returned 1 exit status
> make[1]: *** [qemu-i386] Fehler 1
> make: *** [subdir-i386-linux-user] Fehler 2

As a quickfix this would work, but strikes me as ugly:

Signed-off-by: Andreas Färber 

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index c46286a..8663623 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -780,7 +780,11 @@ static int cpu_x86_fill_model_id(char *str)
  */
 static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
 {
+#ifdef CONFIG_KVM
 KVMState *s = kvm_state;
+#else
+KVMState *s = NULL;
+#endif
 uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;

 assert(kvm_enabled());

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] [PATCH 27/28] target-i386: kvm_cpu_fill_host: use GET_SUPPORTED_CPUID

2012-11-02 Thread Andreas Färber
Am 31.10.2012 10:40, schrieb Marcelo Tosatti:
> From: Eduardo Habkost 
> 
> Change the kvm_cpu_fill_host() function to use
> kvm_arch_get_supported_cpuid() instead of running the CPUID instruction
> directly, when checking for supported CPUID features.
> 
> This should solve two problems at the same time:
> 
>  * "-cpu host" was not enabling features that don't need support on
>the host CPU (e.g. x2apic);
>  * "check" and "enforce" options were not detecting problems when the
>host CPU did support a feature, but the KVM kernel code didn't
>support it.
> 
> Signed-off-by: Eduardo Habkost 
> Signed-off-by: Marcelo Tosatti 
> ---
>  target-i386/cpu.c |   25 +++--
>  1 files changed, 15 insertions(+), 10 deletions(-)
> 
> diff --git a/target-i386/cpu.c b/target-i386/cpu.c
> index 390ed47..4c84e9f 100644
> --- a/target-i386/cpu.c
> +++ b/target-i386/cpu.c
> @@ -773,13 +773,13 @@ static int cpu_x86_fill_model_id(char *str)
>   */
>  static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
>  {
> +KVMState *s = kvm_state;

This broke the linux-user build:

target-i386/cpu.o: In function `kvm_cpu_fill_host':
/home/andreas/QEMU/qemu-rcar/target-i386/cpu.c:783: undefined reference
to `kvm_state'
collect2: error: ld returned 1 exit status
make[1]: *** [qemu-i386] Fehler 1
make: *** [subdir-i386-linux-user] Fehler 2

Any idea how to fix?

Andreas

>  uint32_t eax = 0, ebx = 0, ecx = 0, edx = 0;
>  
>  assert(kvm_enabled());
>  
>  x86_cpu_def->name = "host";
>  host_cpuid(0x0, 0, &eax, &ebx, &ecx, &edx);
> -x86_cpu_def->level = eax;
>  x86_cpu_def->vendor1 = ebx;
>  x86_cpu_def->vendor2 = edx;
>  x86_cpu_def->vendor3 = ecx;
> @@ -788,21 +788,24 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
>  x86_cpu_def->family = ((eax >> 8) & 0x0F) + ((eax >> 20) & 0xFF);
>  x86_cpu_def->model = ((eax >> 4) & 0x0F) | ((eax & 0xF) >> 12);
>  x86_cpu_def->stepping = eax & 0x0F;
> -x86_cpu_def->ext_features = ecx;
> -x86_cpu_def->features = edx;
> +
> +x86_cpu_def->level = kvm_arch_get_supported_cpuid(s, 0x0, 0, R_EAX);
> +x86_cpu_def->features = kvm_arch_get_supported_cpuid(s, 0x1, 0, R_EDX);
> +x86_cpu_def->ext_features = kvm_arch_get_supported_cpuid(s, 0x1, 0, 
> R_ECX);
>  
>  if (x86_cpu_def->level >= 7) {
> -x86_cpu_def->cpuid_7_0_ebx_features = 
> kvm_arch_get_supported_cpuid(kvm_state, 0x7, 0, R_EBX);
> +x86_cpu_def->cpuid_7_0_ebx_features =
> +kvm_arch_get_supported_cpuid(s, 0x7, 0, R_EBX);
>  } else {
>  x86_cpu_def->cpuid_7_0_ebx_features = 0;
>  }
>  
> -host_cpuid(0x80000000, 0, &eax, &ebx, &ecx, &edx);
> -x86_cpu_def->xlevel = eax;
> +x86_cpu_def->xlevel = kvm_arch_get_supported_cpuid(s, 0x80000000, 0, 
> R_EAX);
> +x86_cpu_def->ext2_features =
> +kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_EDX);
> +x86_cpu_def->ext3_features =
> +kvm_arch_get_supported_cpuid(s, 0x80000001, 0, R_ECX);
>  
> -host_cpuid(0x80000001, 0, &eax, &ebx, &ecx, &edx);
> -x86_cpu_def->ext2_features = edx;
> -x86_cpu_def->ext3_features = ecx;
>  cpu_x86_fill_model_id(x86_cpu_def->model_id);
>  x86_cpu_def->vendor_override = 0;
>  
> @@ -811,11 +814,13 @@ static void kvm_cpu_fill_host(x86_def_t *x86_cpu_def)
>  x86_cpu_def->vendor2 == CPUID_VENDOR_VIA_2 &&
>  x86_cpu_def->vendor3 == CPUID_VENDOR_VIA_3) {
>  host_cpuid(0xC0000000, 0, &eax, &ebx, &ecx, &edx);
> +eax = kvm_arch_get_supported_cpuid(s, 0xC0000000, 0, R_EAX);
>  if (eax >= 0xC0000001) {
>  /* Support VIA max extended level */
>  x86_cpu_def->xlevel2 = eax;
>  host_cpuid(0xC0000001, 0, &eax, &ebx, &ecx, &edx);
> -x86_cpu_def->ext4_features = edx;
> +x86_cpu_def->ext4_features =
> +kvm_arch_get_supported_cpuid(s, 0xC0000001, 0, R_EDX);
>  }
>  }
>  
> 


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg


Re: [Qemu-devel] [PATCH v3 27/35] postcopy/outgoing: implement forward/backword prefault

2012-11-02 Thread Eric Blake
On 11/01/2012 11:24 PM, Isaku Yamahata wrote:
>>> +++ b/qapi-schema.json
>>> @@ -2095,7 +2095,8 @@
>>>  ##
>>>  { 'command': 'migrate',
>>>'data': {'uri': 'str', '*blk': 'bool', '*inc': 'bool', '*detach': 'bool' 
>>> ,
>>> -   '*postcopy': 'bool', '*nobg': 'bool'} }
>>> +   '*postcopy': 'bool', '*nobg': 'bool',
>>> +   '*forward': 'int', '*backward': 'int'} }
>>
>> Do we really want to be adding new options to migrate (and if so,
>> where's the documentation), or do we need a new monitor command similar
>> to migrate-set-capabilities or migrate-set-cache-size?
> 
> Okay, migrate-set-capabilities seems usable for boolean and scalable
> for future extension.
> On the other hand, migrate-set-cache-size takes only single integer
> as arguments. So it doesn't seem usable without modification.
> How about this?
> 
> { 'type': 'MigrationParameters',
>   'data': {'parameter': 'name': 'str', 'value': 'int' } }

More like:

{ 'enum': 'MigrationParameterName',
  'data': ['ParameterName'... ] }

{ 'type': 'MigrationParameter',
  'data': {'parameter': 'MigrationParameterName', 'value': 'int' } }

> 
> { 'command': 'migrate-set-parameters',
>    'data': { 'parameters': ['MigrationParameters'] } }

Yes, this seems more extensible.

> 
> 
> { 'command': 'query-migrate-parameters',
>   'returns': [['MigrationParameters']]}

One layer too many of [], but yes, this also seems reasonable.

-- 
Eric Blake   ebl...@redhat.com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [PATCH 3/3] apic: always update the in-kernel status after loading

2012-11-02 Thread Paolo Bonzini
Il 02/11/2012 16:17, Gerd Hoffmann ha scritto:
> On 11/02/12 16:13, Paolo Bonzini wrote:
>>> >> Hi,
>>> >>
 >>> I think deferring IRQ events to the point when the complete vmstate
 >>> is loaded is the cleaner and more robust approach.
>>> >>
>>> >> Agree.  Just schedule a bh in post_load.
>>> >> See also a229c0535bd336efaec786dd6e352a54e0a8187d
>> > 
>> > No, it cannot be a bh.  Right now incoming migration is blocking,
>> > but this will change in 1.3.  There is no guarantee that a
>> > bottom half will run after migration has completed.
> Then we'll need some new way to do this, maybe a new post_load handler
> which is called once _all_ state is loaded.

The simplest is a vm_clock timer that expires at time 0.

Paolo
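
A rough sketch of that suggestion as a device-side fragment
(qemu_new_timer_ns, qemu_mod_timer and vm_clock are the timer interfaces of
this era; MyDevState, its fields and mydev_update_irq are made-up
placeholders, not any existing device):

/* Sketch: defer the IRQ update to a vm_clock timer armed for time 0, so it
 * only runs once the whole vmstate has been loaded and the VM starts. */

static void mydev_deferred_irq(void *opaque)
{
    MyDevState *s = opaque;

    /* by now every device has finished loading its state */
    mydev_update_irq(s);
}

static int mydev_post_load(void *opaque, int version_id)
{
    MyDevState *s = opaque;

    if (!s->deferred_irq_timer)
        s->deferred_irq_timer =
            qemu_new_timer_ns(vm_clock, mydev_deferred_irq, s);
    qemu_mod_timer(s->deferred_irq_timer, 0);   /* expires at time 0 */
    return 0;
}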


Re: [PATCH 3/3] apic: always update the in-kernel status after loading

2012-11-02 Thread Gerd Hoffmann
On 11/02/12 16:13, Paolo Bonzini wrote:
>> Hi,
>>
>>> I think deferring IRQ events to the point when the complete vmstate
>>> is
>>> loaded is the cleaner and more robust approach.
>>
>> Agree.  Just schedule a bh in post_load.
>> See also a229c0535bd336efaec786dd6e352a54e0a8187d
> 
> No, it cannot be a bh.  Right now incoming migration is blocking,
> but this will change in 1.3.  There is no guarantee that a
> bottom half will run after migration has completed.

Then we'll need some new way to do this, maybe a new post_load handler
which is called once _all_ state is loaded.

cheers,
  Gerd



Re: [PATCH 3/3] apic: always update the in-kernel status after loading

2012-11-02 Thread Paolo Bonzini
> Hi,
> 
> > I think deferring IRQ events to the point when the complete vmstate
> > is
> > loaded is the cleaner and more robust approach.
> 
> Agree.  Just schedule a bh in post_load.
> See also a229c0535bd336efaec786dd6e352a54e0a8187d

No, it cannot be a bh.  Right now incoming migration is blocking,
but this will change in 1.3.  There is no guarantee that a
bottom half will run after migration has completed.

Paolo



Re: [PATCH 3/3] apic: always update the in-kernel status after loading

2012-11-02 Thread Gerd Hoffmann
  Hi,

> I think deferring IRQ events to the point when the complete vmstate is
> loaded is the cleaner and more robust approach.

Agree.  Just schedule a bh in post_load.
See also a229c0535bd336efaec786dd6e352a54e0a8187d

cheers,
  Gerd


Re: [PATCH 3/3] apic: always update the in-kernel status after loading

2012-11-02 Thread Jan Kiszka
On 2012-11-02 15:53, Paolo Bonzini wrote:
> Il 30/10/2012 19:21, Jan Kiszka ha scritto:
 Aren't we still dependent on the order of processing?  If the APIC is
 restored after the device, won't we get the same problem?
>>>
>>> Strictly speaking yes, but CPUs and APICs are always the first devices
>>> to be saved.
>> Hmm, thinking about this again: Why is the MSI event injected at all
>> during restore, specifically while the device models are in transitional
>> state. Can you explain this?
> 
> Because the (virtio-serial) port was connected on the source and
> disconnected on the destination, or vice versa.
> 
> In my simplified reproducer, I'm really using different command-lines on
> the source and destination, but it is not necessary.  For example, if
> you have a socket backend, the destination will usually be disconnected
> at the time the machine loads.
> 
> One alternative fix is a vm_clock timer that expires immediately.  It
> would fix both MSI and INTx, on the other hand I thought it was an APIC
> bug because the QEMU APIC works nicely.

I think deferring IRQ events to the point when the complete vmstate is
loaded is the cleaner and more robust approach.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH 3/3] apic: always update the in-kernel status after loading

2012-11-02 Thread Paolo Bonzini
Il 30/10/2012 19:21, Jan Kiszka ha scritto:
> > > Aren't we still dependent on the order of processing?  If the APIC is
> > > restored after the device, won't we get the same problem?
> > 
> > Strictly speaking yes, but CPUs and APICs are always the first devices
> > to be saved.
> Hmm, thinking about this again: Why is the MSI event injected at all
> during restore, specifically while the device models are in transitional
> state. Can you explain this?

Because the (virtio-serial) port was connected on the source and
disconnected on the destination, or vice versa.

In my simplified reproducer, I'm really using different command-lines on
the source and destination, but it is not necessary.  For example, if
you have a socket backend, the destination will usually be disconnected
at the time the machine loads.

One alternative fix is a vm_clock timer that expires immediately.  It
would fix both MSI and INTx, on the other hand I thought it was an APIC
bug because the QEMU APIC works nicely.

> Does the same pattern then also apply on INTx injection?

Yes.

Paolo


Re: [PATCH] seabios/pci: enable 64 bit bar on seabios

2012-11-02 Thread Kevin O'Connor
On Fri, Nov 02, 2012 at 01:42:08PM +0800, Xudong Hao wrote:
> 64 bit bar sizing and MMIO allocation. The 64 bit window is placed above high
> memory, top down from the end of guest physical address space.

Your patch seems to be against an old version of SeaBIOS.  The latest
SeaBIOS already supports 64bit pci bars.

-Kevin


[PATCH 3/3] KVM: remove unnecessary return value check

2012-11-02 Thread Guo Chao
No need to check the return value before breaking out of the switch.

Signed-off-by: Guo Chao 
---
 arch/x86/kvm/x86.c  |   32 
 virt/kvm/kvm_main.c |   30 --
 2 files changed, 62 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index d9d5b5d..2aac611 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2694,9 +2694,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
return PTR_ERR(u.lapic);
 
r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
-   if (r)
-   goto out;
-   r = 0;
break;
}
case KVM_INTERRUPT: {
@@ -2706,16 +2703,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (copy_from_user(&irq, argp, sizeof irq))
goto out;
r = kvm_vcpu_ioctl_interrupt(vcpu, &irq);
-   if (r)
-   goto out;
-   r = 0;
break;
}
case KVM_NMI: {
r = kvm_vcpu_ioctl_nmi(vcpu);
-   if (r)
-   goto out;
-   r = 0;
break;
}
case KVM_SET_CPUID: {
@@ -2726,8 +2717,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
if (copy_from_user(&cpuid, cpuid_arg, sizeof cpuid))
goto out;
r = kvm_vcpu_ioctl_set_cpuid(vcpu, &cpuid, cpuid_arg->entries);
-   if (r)
-   goto out;
break;
}
case KVM_SET_CPUID2: {
@@ -2739,8 +2728,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
goto out;
r = kvm_vcpu_ioctl_set_cpuid2(vcpu, &cpuid,
  cpuid_arg->entries);
-   if (r)
-   goto out;
break;
}
case KVM_GET_CPUID2: {
@@ -3205,8 +3192,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
switch (ioctl) {
case KVM_SET_TSS_ADDR:
r = kvm_vm_ioctl_set_tss_addr(kvm, arg);
-   if (r < 0)
-   goto out;
break;
case KVM_SET_IDENTITY_MAP_ADDR: {
u64 ident_addr;
@@ -3215,14 +3200,10 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (copy_from_user(&ident_addr, argp, sizeof ident_addr))
goto out;
r = kvm_vm_ioctl_set_identity_map_addr(kvm, ident_addr);
-   if (r < 0)
-   goto out;
break;
}
case KVM_SET_NR_MMU_PAGES:
r = kvm_vm_ioctl_set_nr_mmu_pages(kvm, arg);
-   if (r)
-   goto out;
break;
case KVM_GET_NR_MMU_PAGES:
r = kvm_vm_ioctl_get_nr_mmu_pages(kvm);
@@ -3313,8 +3294,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
get_irqchip_out:
kfree(chip);
-   if (r)
-   goto out;
break;
}
case KVM_SET_IRQCHIP: {
@@ -3336,8 +3315,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
r = 0;
set_irqchip_out:
kfree(chip);
-   if (r)
-   goto out;
break;
}
case KVM_GET_PIT: {
@@ -3364,9 +3341,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (!kvm->arch.vpit)
goto out;
r = kvm_vm_ioctl_set_pit(kvm, &u.ps);
-   if (r)
-   goto out;
-   r = 0;
break;
}
case KVM_GET_PIT2: {
@@ -3390,9 +3364,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (!kvm->arch.vpit)
goto out;
r = kvm_vm_ioctl_set_pit2(kvm, &u.ps2);
-   if (r)
-   goto out;
-   r = 0;
break;
}
case KVM_REINJECT_CONTROL: {
@@ -3401,9 +3372,6 @@ long kvm_arch_vm_ioctl(struct file *filp,
if (copy_from_user(&control, argp, sizeof(control)))
goto out;
r = kvm_vm_ioctl_reinject(kvm, &control);
-   if (r)
-   goto out;
-   r = 0;
break;
}
case KVM_XEN_HVM_CONFIG: {
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index f73efe0..baeabaa 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1929,10 +1929,6 @@ out_free1:
goto out;
}
r = kvm_arch_vcpu_ioctl_set_regs(vcpu, kvm_regs);
-   if (r)
-   goto out_free2;
-   r = 0;
-out_free2:
kfree(kvm_regs);
break;
}
@@ -1958,9 +1954,6 @@ out_free2:
goto out;

[PATCH 2/3] KVM: X86: fix return value of kvm_vm_ioctl_set_tss_addr()

2012-11-02 Thread Guo Chao
Return value of this function will be that of ioctl().

#include <stdio.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main () {
int fd;
fd = open ("/dev/kvm", 0);
fd = ioctl (fd, KVM_CREATE_VM, 0);
ioctl (fd, KVM_SET_TSS_ADDR, 0xf000);   
perror ("");
return 0;
}

Output is "Operation not permitted". That's not what 
we want. 

Return -EINVAL in this case.

Signed-off-by: Guo Chao 
---
 arch/x86/kvm/x86.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b3151ec..d9d5b5d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2944,7 +2944,7 @@ static int kvm_vm_ioctl_set_tss_addr(struct kvm *kvm, 
unsigned long addr)
int ret;
 
if (addr > (unsigned int)(-3 * PAGE_SIZE))
-   return -1;
+   return -EINVAL;
ret = kvm_x86_ops->set_tss_addr(kvm, addr);
return ret;
 }
-- 
1.7.9.5



[PATCH 1/3] KVM: do not kfree error pointer

2012-11-02 Thread Guo Chao
We should avoid kfree()ing error pointer in kvm_vcpu_ioctl() and
kvm_arch_vcpu_ioctl().

Signed-off-by: Guo Chao 
---
 arch/x86/kvm/x86.c  |   19 ++-
 virt/kvm/kvm_main.c |2 ++
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 224a7e7..b3151ec 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2687,14 +2687,11 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
break;
}
case KVM_SET_LAPIC: {
-   r = -EINVAL;
if (!vcpu->arch.apic)
goto out;
u.lapic = memdup_user(argp, sizeof(*u.lapic));
-   if (IS_ERR(u.lapic)) {
-   r = PTR_ERR(u.lapic);
-   goto out;
-   }
+   if (IS_ERR(u.lapic))
+   return PTR_ERR(u.lapic);
 
r = kvm_vcpu_ioctl_set_lapic(vcpu, u.lapic);
if (r)
@@ -2875,10 +2872,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
case KVM_SET_XSAVE: {
u.xsave = memdup_user(argp, sizeof(*u.xsave));
-   if (IS_ERR(u.xsave)) {
-   r = PTR_ERR(u.xsave);
-   goto out;
-   }
+   if (IS_ERR(u.xsave))
+   return PTR_ERR(u.xsave);
 
r = kvm_vcpu_ioctl_x86_set_xsave(vcpu, u.xsave);
break;
@@ -2900,10 +2895,8 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
case KVM_SET_XCRS: {
u.xcrs = memdup_user(argp, sizeof(*u.xcrs));
-   if (IS_ERR(u.xcrs)) {
-   r = PTR_ERR(u.xcrs);
-   goto out;
-   }
+   if (IS_ERR(u.xcrs))
+   return PTR_ERR(u.xcrs);
 
r = kvm_vcpu_ioctl_x86_set_xcrs(vcpu, u.xcrs);
break;
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index be70035..f73efe0 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1954,6 +1954,7 @@ out_free2:
kvm_sregs = memdup_user(argp, sizeof(*kvm_sregs));
if (IS_ERR(kvm_sregs)) {
r = PTR_ERR(kvm_sregs);
+   kvm_sregs = NULL;
goto out;
}
r = kvm_arch_vcpu_ioctl_set_sregs(vcpu, kvm_sregs);
@@ -2054,6 +2055,7 @@ out_free2:
fpu = memdup_user(argp, sizeof(*fpu));
if (IS_ERR(fpu)) {
r = PTR_ERR(fpu);
+   fpu = NULL;
goto out;
}
r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu);
-- 
1.7.9.5



Re: [patch 10/16] x86: vdso: pvclock gettime support

2012-11-02 Thread Glauber Costa
On 11/02/2012 04:33 AM, Marcelo Tosatti wrote:
> On Thu, Nov 01, 2012 at 07:42:43PM -0200, Marcelo Tosatti wrote:
>> On Thu, Nov 01, 2012 at 06:41:46PM +0400, Glauber Costa wrote:
>>> On 11/01/2012 02:47 AM, Marcelo Tosatti wrote:
 +#ifdef CONFIG_PARAVIRT_CLOCK
 +
 +static notrace const struct pvclock_vsyscall_time_info *get_pvti(int cpu)
 +{
 +  const aligned_pvti_t *pvti_base;
 +  int idx = cpu / (PAGE_SIZE/PVTI_SIZE);
 +  int offset = cpu % (PAGE_SIZE/PVTI_SIZE);
 +
 +  BUG_ON(PVCLOCK_FIXMAP_BEGIN + idx > PVCLOCK_FIXMAP_END);
 +
 +  pvti_base = (aligned_pvti_t *)__fix_to_virt(PVCLOCK_FIXMAP_BEGIN+idx);
 +
 +  return &pvti_base[offset].info;
 +}
 +
>>> Does BUG_ON() really do what you believe it does while in userspace
>>> context? We're not running with the kernel descriptors, so this will
>>> probably just kill the process without any explanation
>>
>> A coredump is generated which can be used to trace back to the ud2a
>> instruction in the vdso code.
> 
> All comments have been addressed. Let me know if there is anything else
> on v3 that you'd like to see done differently.
> 
Mainly:

1) stick a "v3" string in the subject. You didn't do it for v2, and I
got confused at some points while looking for the correct patches

2) The changelogs are, in general, a bit poor. I've pointed to the ones
that specifically pop out, but I would appreciate it if you would go over
them again, making them more informative.

3) Please make sure Peter is okay with the proposed notifier change.

4) Please consider allocating memory with __alloc_bootmem_node instead.




Re: [patch 08/16] KVM: x86: introduce facility to support vsyscall pvclock, via MSR

2012-11-02 Thread Glauber Costa
On 11/02/2012 01:39 AM, Marcelo Tosatti wrote:
> On Thu, Nov 01, 2012 at 06:28:31PM +0400, Glauber Costa wrote:
>> On 11/01/2012 02:47 AM, Marcelo Tosatti wrote:
>>> Allow a guest to register a second location for the VCPU time info
>>>
>>> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
>>> This is intended to allow the guest kernel to map this information
>>> into a usermode accessible page, so that usermode can efficiently
>>> calculate system time from the TSC without having to make a syscall.
>>>
>>> Signed-off-by: Marcelo Tosatti 
>>>
>>
>> Changelog doesn't make a lot of sense. (especially from first line to the
>> second). Please add in here the reasons why we can't (or decided not to)
>> use the same page. The info in the last mail thread is good enough, just
>> put it here.
> 
> Fixed.
> 
>>> Index: vsyscall/arch/x86/include/asm/kvm_para.h
>>> ===
>>> --- vsyscall.orig/arch/x86/include/asm/kvm_para.h
>>> +++ vsyscall/arch/x86/include/asm/kvm_para.h
>>> @@ -23,6 +23,7 @@
>>>  #define KVM_FEATURE_ASYNC_PF   4
>>>  #define KVM_FEATURE_STEAL_TIME 5
>>>  #define KVM_FEATURE_PV_EOI 6
>>> +#define KVM_FEATURE_USERSPACE_CLOCKSOURCE 7
>>>  
>>>  /* The last 8 bits are used to indicate how to interpret the flags field
>>>   * in pvclock structure. If no bits are set, all flags are ignored.
>>> @@ -39,6 +40,7 @@
>>>  #define MSR_KVM_ASYNC_PF_EN 0x4b564d02
>>>  #define MSR_KVM_STEAL_TIME  0x4b564d03
>>>  #define MSR_KVM_PV_EOI_EN  0x4b564d04
>>> +#define MSR_KVM_USERSPACE_TIME  0x4b564d05
>>>  
>>
>> I accept that it is possible that we may be better off with the page
>> mapped twice.
>>
>> But why do we need an extra MSR? When, and why, would you enable the
>> kernel-based pvclock, but disable the userspace pvclock?
> 
> Because there is no stable TSC available, for example (which cannot
> be used to measure passage of time).
> 

What you say is true, but completely unrelated. I am not talking about
situations in which userspace pvclock is available and you end up not
using it.

I am talking about situations in which it is available, you are capable
of using it, but then decide for some reason to permanently disable it -
as in not setting it up altogether.

It seems to me that if the host has code to deal with userspace pvclock,
and you already coded the guest in a way that you may or may not use it
(dependent on the value of the stable bit), you could very well only
check for the cpuid flag, and do the guest setup if available - skipping
this MSR dance altogether.

Now, of course, there is the problem of communicating the address in
which the guest expects the page to be. Skipping the MSR setup would
require it to be more or less at a fixed location. We could in principle
lay them down together with the already existing pvclock structure. (But
granted, I am not sure it is worth it...)

I think in general, this question deserves a bit more of attention. We
are about to have just the perfect opportunity for this next week, so
let's use it.




Re: [patch 09/18] KVM: x86: introduce facility to support vsyscall pvclock, via MSR

2012-11-02 Thread Glauber Costa
On 10/31/2012 07:12 AM, Marcelo Tosatti wrote:
> On Tue, Oct 30, 2012 at 11:39:32AM +0200, Avi Kivity wrote:
>> On 10/29/2012 08:40 PM, Marcelo Tosatti wrote:
>>> On Mon, Oct 29, 2012 at 10:44:41AM -0700, Jeremy Fitzhardinge wrote:
 On 10/29/2012 07:45 AM, Glauber Costa wrote:
> On 10/24/2012 05:13 PM, Marcelo Tosatti wrote:
>> Allow a guest to register a second location for the VCPU time info
>>
>> structure for each vcpu (as described by MSR_KVM_SYSTEM_TIME_NEW).
>> This is intended to allow the guest kernel to map this information
>> into a usermode accessible page, so that usermode can efficiently
>> calculate system time from the TSC without having to make a syscall.
>>
>> Signed-off-by: Marcelo Tosatti 
> Can you please be a bit more specific about why we need this? Why does
> the host need to provide us with two pages with the exact same data? Why
> can't just do it with mapping tricks in the guest?

 In Xen the pvclock structure is embedded within a pile of other stuff
 that shouldn't be mapped into guest memory, so providing for a second
 location allows it to be placed whereever is convenient for the guest.
 That's a restriction of the Xen ABI, but I don't know if it affects KVM.

 J
>>>
>>> It is possible to share the data for KVM in theory, but:
>>>
>>> - It is a small amount of memory. 
>>> - It requires aligning to page size (the in-kernel percpu array 
>>> is currently cacheline aligned).
>>> - It is possible to modify flags separately for userspace/kernelspace,
>>> if desired.
>>>
>>> This justifies the duplication IMO (code is simple and clean).
>>>
>>
>> What would be the changes required to remove the duplication?  If it's
>> just page alignment, then is seems even smaller.  In addition we avoid
>> expanding the ABI again.
> 
> This would require changing the kernel copy from percpu data, which
> there is no guarantee is linear (necessary for fixmap mapping), to
> dynamically allocated (which in turn can be tricky due to early boot
> clock requirement).
> 
> Hum, no thanks.
> 
You allocate it using bootmemory for vsyscall anyway. If they are
strictly in the same physical location, you are not allocating anything
extra.
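
As a rough sketch of what that could look like (struct
pvclock_vsyscall_time_info, PVCLOCK_FIXMAP_BEGIN and PVTI_SIZE mirror the
names in the posted vdso patch; the init function itself and its shape are
illustrative assumptions, not the actual series):

/* Sketch only: reserve the vsyscall copy of the per-cpu time info from boot
 * memory and expose it read-only to userspace through the fixmap. */
static struct pvclock_vsyscall_time_info *pvclock_vsyscall_pvti;

static int __init pvclock_vsyscall_init(unsigned int cpus)
{
	unsigned long size = PAGE_ALIGN(cpus * PVTI_SIZE);
	unsigned long pa, i;
	int idx = PVCLOCK_FIXMAP_BEGIN;

	/* bootmem keeps the array physically contiguous, which the fixmap
	 * mapping (and the early-boot clock) needs */
	pvclock_vsyscall_pvti = __alloc_bootmem(size, PAGE_SIZE,
						__pa(MAX_DMA_ADDRESS));
	pa = __pa(pvclock_vsyscall_pvti);

	for (i = 0; i < size; i += PAGE_SIZE, idx++)
		__set_fixmap(idx, pa + i, PAGE_KERNEL_VVAR);

	return 0;
}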


[RFC PATCH 4/4] KVM: ARM: Transparent huge pages and hugetlbfs support

2012-11-02 Thread Christoffer Dall
Support transparent huge pages in KVM/ARM. This requires quite a bit of
checking, and for qemu to take advantage of this, you need to
make sure qemu allocates pages aligned to the PMD size.

Signed-off-by: Christoffer Dall 
---
 arch/arm/include/asm/kvm_host.h |6 +-
 arch/arm/kvm/mmu.c  |  126 +++
 2 files changed, 103 insertions(+), 29 deletions(-)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 7127fe7..4eea228 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -34,9 +34,9 @@
 #define KVM_VCPU_MAX_FEATURES 0
 
 /* We don't currently support large pages. */
-#define KVM_HPAGE_GFN_SHIFT(x) 0
-#define KVM_NR_PAGE_SIZES  1
-#define KVM_PAGES_PER_HPAGE(x) (1UL<<31)
+#define KVM_HPAGE_GFN_SHIFT(_level)(((_level) - 1) * 21)
+#define KVM_HPAGE_SIZE (1UL << KVM_HPAGE_GFN_SHIFT(1))
+#define KVM_PAGES_PER_HPAGE(KVM_HPAGE_SIZE / PAGE_SIZE)
 
 struct kvm_vcpu;
 u32 *kvm_vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num, u32 mode);
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 96ab6a8..762647c 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -302,8 +303,7 @@ static void free_stage2_ptes(pmd_t *pmd, unsigned long addr)
pmd_page = virt_to_page(pmd);
 
for (i = 0; i < PTRS_PER_PMD; i++, addr += PMD_SIZE) {
-   BUG_ON(pmd_sect(*pmd));
-   if (!pmd_none(*pmd) && pmd_table(*pmd)) {
+   if (pmd_table(*pmd)) {
pte = pte_offset_kernel(pmd, addr);
free_guest_pages(pte, addr);
pte_free_kernel(NULL, pte);
@@ -470,7 +470,7 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
 {
pgd_t *pgd;
pud_t *pud;
-   pmd_t *pmd;
+   pmd_t *pmd, old_pmd;
pte_t *pte, old_pte;
 
/* Create 2nd stage page table mapping - Level 1 */
@@ -486,7 +486,22 @@ static int stage2_set_pte(struct kvm *kvm, struct 
kvm_mmu_memory_cache *cache,
} else
pmd = pmd_offset(pud, addr);
 
-   /* Create 2nd stage page table mapping - Level 2 */
+   /* Create 2nd stage section mappings (huge tlb pages) - Level 2 */
+   if (pte_huge(*new_pte) || pmd_huge(*pmd)) {
+   pte_t *huge_pte = (pte_t *)pmd;
+   BUG_ON(pmd_present(*pmd) && !pmd_huge(*pmd));
+
+   old_pmd = *pmd;
+   set_pte_ext(huge_pte, *new_pte, 0); /* new_pte really new_pmd */
+   if (pmd_present(old_pmd))
+   __kvm_tlb_flush_vmid(kvm);
+   else
+   get_page(virt_to_page(pmd));
+   return 0;
+   }
+
+   /* Create 2nd stage page mappings - Level 2 */
+   BUG_ON(pmd_present(*pmd) && pmd_huge(*pmd));
if (pmd_none(*pmd)) {
if (!cache)
return 0; /* ignore calls from kvm_set_spte_hva */
@@ -551,7 +566,8 @@ out:
return ret;
 }
 
-static void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
+static void coherent_icache_guest_page(struct kvm *kvm, hva_t hva,
+  unsigned long size)
 {
/*
 * If we are going to insert an instruction page and the icache is
@@ -563,24 +579,64 @@ static void coherent_icache_guest_page(struct kvm *kvm, 
gfn_t gfn)
 * damn shame - as written in the ARM ARM (DDI 0406C - Page B3-1384)
 */
if (icache_is_pipt()) {
-   unsigned long hva = gfn_to_hva(kvm, gfn);
-   __cpuc_coherent_user_range(hva, hva + PAGE_SIZE);
+   __cpuc_coherent_user_range(hva, hva + size);
} else if (!icache_is_vivt_asid_tagged()) {
/* any kind of VIPT cache */
__flush_icache_all();
}
 }
 
+static bool transparent_hugepage_adjust(struct kvm *kvm, pfn_t *pfnp,
+   phys_addr_t *ipap)
+{
+   pfn_t pfn = *pfnp;
+   gfn_t gfn = *ipap >> PAGE_SHIFT;
+
+   if (PageTransCompound(pfn_to_page(pfn))) {
+   unsigned long mask;
+   kvm_err("transparent huge page at: %#18llx\n",
+   (unsigned long long)*ipap);
+   /*
+* mmu_notifier_retry was successful and we hold the
+* mmu_lock here, so the pmd can't become splitting
+* from under us, and in turn
+* __split_huge_page_refcount() can't run from under
+* us and we can safely transfer the refcount from
+* PG_tail to PG_head as we switch the pfn from tail to
+* head.
+*/
+   mask = KVM_PAGES_PER_HPAGE - 1;
+   VM_BUG_ON((gfn & mask) != (pfn & mask));
+   if (pfn & mask) {
+   

[RFC PATCH 3/4] KVM: ARM: Improve stage2_clear_pte

2012-11-02 Thread Christoffer Dall
Factor out parts of the functionality to make the code more readable and
rename to unmap_stage2_range while supporting unmapping ranges in one
go.

Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/mmu.c |  122 +++-
 1 file changed, 83 insertions(+), 39 deletions(-)

diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index cb03d45..96ab6a8 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -365,59 +365,103 @@ void kvm_free_stage2_pgd(struct kvm *kvm)
kvm->arch.pgd = NULL;
 }
 
+static void clear_pud_entry(pud_t *pud)
+{
+   pmd_t *pmd_table = pmd_offset(pud, 0);
+   pud_clear(pud);
+   pmd_free(NULL, pmd_table);
+   put_page(virt_to_page(pud));
+}
+
+static void clear_pmd_entry(pmd_t *pmd)
+{
+   if (pmd_huge(*pmd)) {
+   pmd_clear(pmd);
+   } else {
+   pte_t *pte_table = pte_offset_kernel(pmd, 0);
+   pmd_clear(pmd);
+   pte_free_kernel(NULL, pte_table);
+   }
+   put_page(virt_to_page(pmd));
+}
+
+static bool pmd_empty(pmd_t *pmd)
+{
+   struct page *pmd_page = virt_to_page(pmd);
+   return page_count(pmd_page) == 1;
+}
+
+static void clear_pte_entry(pte_t *pte)
+{
+   set_pte_ext(pte, __pte(0), 0);
+   put_page(virt_to_page(pte));
+}
+
+static bool pte_empty(pte_t *pte)
+{
+   struct page *pte_page = virt_to_page(pte);
+   return page_count(pte_page) == 1;
+}
+
 /**
- * stage2_clear_pte -- Clear a stage-2 PTE.
- * @kvm:  The VM pointer
- * @addr: The physical address of the PTE
+ * unmap_stage2_range -- Clear stage2 page table entries to unmap a range
+ * @kvm:   The VM pointer
+ * @start: The intermediate physical base address of the range to unmap
+ * @size:  The size of the area to unmap
  *
- * Clear a stage-2 PTE, lowering the various ref-counts. Also takes
- * care of invalidating the TLBs.  Must be called while holding
- * mmu_lock, otherwise another faulting VCPU may come in and mess
- * things behind our back.
+ * Clear a range of stage-2 mappings, lowering the various ref-counts. Also
+ * takes care of invalidating the TLBs.  Must be called while holding
+ * mmu_lock, otherwise another faulting VCPU may come in and mess with things
+ * behind our backs.
  */
-static void stage2_clear_pte(struct kvm *kvm, phys_addr_t addr)
+static void unmap_stage2_range(struct kvm *kvm, phys_addr_t start, size_t size)
 {
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
pte_t *pte;
-   struct page *page;
-
-   pgd = kvm->arch.pgd + pgd_index(addr);
-   pud = pud_offset(pgd, addr);
-   if (pud_none(*pud))
-   return;
+   phys_addr_t addr = start, end = start + size;
+   size_t range;
 
-   pmd = pmd_offset(pud, addr);
-   if (pmd_none(*pmd))
-   return;
+   while (addr < end) {
+   pgd = kvm->arch.pgd + pgd_index(addr);
+   pud = pud_offset(pgd, addr);
+   if (pud_none(*pud)) {
+   addr += PUD_SIZE;
+   continue;
+   }
 
-   pte = pte_offset_kernel(pmd, addr);
-   set_pte_ext(pte, __pte(0), 0);
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd)) {
+   addr += PMD_SIZE;
+   continue;
+   }
 
-   page = virt_to_page(pte);
-   put_page(page);
-   if (page_count(page) != 1) {
-   kvm_tlb_flush_vmid(kvm);
-   return;
-   }
+   if (pmd_huge(*pmd)) {
+   clear_pmd_entry(pmd);
+   if (pmd_empty(pmd))
+   clear_pud_entry(pud);
+   addr += PMD_SIZE;
+   continue;
+   }
 
-   /* Need to remove pte page */
-   pmd_clear(pmd);
-   pte_free_kernel(NULL, (pte_t *)((unsigned long)pte & PAGE_MASK));
+   pte = pte_offset_kernel(pmd, addr);
+   clear_pte_entry(pte);
+   range = PAGE_SIZE;
+
+   /* If we emptied the pte, walk back up the ladder */
+   if (pte_empty(pte)) {
+   clear_pmd_entry(pmd);
+   range = PMD_SIZE;
+   if (pmd_empty(pmd)) {
+   clear_pud_entry(pud);
+   range = PUD_SIZE;
+   }
+   }
 
-   page = virt_to_page(pmd);
-   put_page(page);
-   if (page_count(page) != 1) {
-   kvm_tlb_flush_vmid(kvm);
-   return;
+   addr += range;
}
 
-   pud_clear(pud);
-   pmd_free(NULL, (pmd_t *)((unsigned long)pmd & PAGE_MASK));
-
-   page = virt_to_page(pud);
-   put_page(page);
kvm_tlb_flush_vmid(kvm);
 }
 
@@ -693,7 +737,7 @@ static void handle_hva_to_gpa(struct kvm *kvm,
 
 static void kvm_unmap_hva_handler(struct kvm *kvm, gpa_t gpa, void *

[RFC PATCH 2/4] KVM: ARM: Fixup trace ipa printing

2012-11-02 Thread Christoffer Dall
The arguments were shifted, and a 64-bit integer was printed as a 32-bit
integer.

Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/trace.h |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index c3d05f4..cd52640 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -42,14 +42,14 @@ TRACE_EVENT(kvm_exit,
 TRACE_EVENT(kvm_guest_fault,
TP_PROTO(unsigned long vcpu_pc, unsigned long hsr,
 unsigned long hxfar,
-unsigned long ipa),
+unsigned long long ipa),
TP_ARGS(vcpu_pc, hsr, hxfar, ipa),
 
TP_STRUCT__entry(
__field(unsigned long,  vcpu_pc )
__field(unsigned long,  hsr )
__field(unsigned long,  hxfar   )
-   __field(unsigned long,  ipa )
+   __field(   unsigned long long,  ipa )
),
 
TP_fast_assign(
@@ -60,9 +60,9 @@ TRACE_EVENT(kvm_guest_fault,
),
 
TP_printk("guest fault at PC %#08lx (hxfar %#08lx, "
- "ipa %#08lx, hsr %#08lx",
+ "ipa %#16llx, hsr %#08lx",
  __entry->vcpu_pc, __entry->hxfar,
- __entry->hsr, __entry->ipa)
+ __entry->ipa, __entry->hsr)
 );
 
 TRACE_EVENT(kvm_irq_line,
-- 
1.7.9.5



[RFC PATCH 1/4] KVM: ARM: Report support of mmu notifiers to user space

2012-11-02 Thread Christoffer Dall
This should have been added a long time ago, and is at least required
for user space to take advantage of hugetlbfs.

Signed-off-by: Christoffer Dall 
---
 arch/arm/kvm/arm.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 69bec17..9a7d2d6 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -199,6 +199,7 @@ int kvm_dev_ioctl_check_extension(long ext)
break;
 #endif
case KVM_CAP_USER_MEMORY:
+   case KVM_CAP_SYNC_MMU:
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
case KVM_CAP_ONE_REG:
r = 1;
-- 
1.7.9.5
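
For reference, checking the capability added above from userspace is a
single KVM_CHECK_EXTENSION ioctl on /dev/kvm.  A minimal standalone sketch
(the program itself is illustrative and not part of the series):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
	/* Open the KVM device and ask whether mmu notifiers are wired up. */
	int kvm = open("/dev/kvm", O_RDWR);
	if (kvm < 0) {
		perror("open /dev/kvm");
		return 1;
	}

	int ret = ioctl(kvm, KVM_CHECK_EXTENSION, KVM_CAP_SYNC_MMU);
	printf("KVM_CAP_SYNC_MMU: %s\n", ret > 0 ? "yes" : "no");
	return 0;
}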



[RFC PATCH 0/4] KVM: ARM: Support transparent huge pages and hugetlbfs

2012-11-02 Thread Christoffer Dall
The following series implements support for transparent huge pages and
hugetlbfs for KVM/ARM.  The patch series is based on
kvm-arm-v13-vgic-timers with Will Deacon's hugetlb branch series merged.

These patches can also be fetched from here:
git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-hugetlb

Christoffer Dall (4):
  KVM: ARM: Report support of mmu notifiers to user space
  KVM: ARM: Fixup trace ipa printing
  KVM: ARM: Improve stage2_clear_pte
  KVM: ARM: Transparent huge pages and hugetlbfs support

 arch/arm/include/asm/kvm_host.h |6 +-
 arch/arm/kvm/arm.c  |1 +
 arch/arm/kvm/mmu.c  |  248 +--
 arch/arm/kvm/trace.h|8 +-
 4 files changed, 191 insertions(+), 72 deletions(-)

-- 
1.7.9.5



Re: [patch 09/16] x86: kvm guest: pvclock vsyscall support

2012-11-02 Thread Glauber Costa
On 11/01/2012 02:47 AM, Marcelo Tosatti wrote:
> + info = pvclock_get_vsyscall_time_info(cpu);
> +
> + low = (int)__pa(info) | 1;
> + high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32);
> + ret = native_write_msr_safe(MSR_KVM_USERSPACE_TIME, low, high);
> + printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n",
> +cpu, high, low, txt);
> +

Why do you put info in the lower half, and hv_clock in the higher half?
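
For comparison, the more conventional encoding for this kind of MSR splits
a single 64-bit physical address across the two halves, as
MSR_KVM_SYSTEM_TIME_NEW does.  A minimal sketch of that variant follows;
the helper name kvm_register_userspace_time is invented for illustration
and is not part of the patch under review:

/* Sketch only: register one structure's guest physical address,
 * low half in EAX (with bit 0 as the enable bit), high half in EDX.
 * "info" is the per-cpu pointer returned by pvclock_get_vsyscall_time_info().
 */
static int kvm_register_userspace_time(void *info)
{
	u64 pa = __pa(info);
	u32 low  = (u32)pa | 1;		/* low 32 bits + enable bit */
	u32 high = (u32)(pa >> 32);	/* high 32 bits of the same address */

	return native_write_msr_safe(MSR_KVM_USERSPACE_TIME, low, high);
}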



Re: [PATCH] seabios/pci: enable 64 bit bar on seabios

2012-11-02 Thread Gerd Hoffmann
On 11/02/12 06:42, Xudong Hao wrote:
> 64 bit bar sizing and MMIO allocation. The 64 bit window is placed above high
> memory, top down from the end of guest physical address space.

What problem are you trying to fix?  The existing code should handle
64bit bars just fine.  By default they are placed below 4G for
compatibility reasons (to keep old 32bit guests happy).  When running out
of address space, seabios will try to map them above 4G to make room
below 4G.

Mapping your 64bit PCI bars above 4G unconditionally (for testing or
other reasons) can simply be done this way:

--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -599,7 +599,7 @@ static void pci_bios_map_devices(struct pci_bus *busses)
 {
 pcimem_start = RamSize;

-if (pci_bios_init_root_regions(busses)) {
+if (1 /* pci_bios_init_root_regions(busses) */) {
 struct pci_region r64_mem, r64_pref;
 r64_mem.list = NULL;
 r64_pref.list = NULL;

We might want to add a config option for this.
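
One possible shape for such an option, sketched with an invented Kconfig
name (CONFIG_PCIMEM64_FORCE is not an existing SeaBIOS symbol), would be
to replace the hard-coded test rather than comment out the call:

/* Sketch only.  With a Kconfig bool generating a 0/1 CONFIG_PCIMEM64_FORCE
 * macro, force the 64-bit windows when it is set, and otherwise keep the
 * current behaviour of using them only when the 32-bit window overflows. */
static int pci_bios_want_highmem(struct pci_bus *busses)
{
    return CONFIG_PCIMEM64_FORCE || pci_bios_init_root_regions(busses);
}

pci_bios_map_devices() would then test pci_bios_want_highmem(busses) in
place of the open-coded condition.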

cheers,
  Gerd
