Re: [RFC PATCH] arm/arm64: KVM: Fix BE accesses to GICv2 EISR and ELRSR regs

2014-10-15 Thread Victor Kamensky
On 14 October 2014 08:21, Victor Kamensky  wrote:
> On 14 October 2014 02:47, Marc Zyngier  wrote:
>> On Sun, Sep 28 2014 at 03:04:26 PM, Christoffer Dall 
>>  wrote:
>>> The EISR and ELRSR registers are 32-bit registers on GICv2, and we
>>> store these as an array of two such registers on the vgic vcpu struct.
>>> However, we access them as a single 64-bit value or as a bitmap pointer
>>> in the generic vgic code, which breaks BE support.
>>>
>>> Instead, store them as u64 values on the vgic structure and do the
>>> word-swapping in the assembly code, which already handles the byte order
>>> for BE systems.
>>>
>>> Signed-off-by: Christoffer Dall 
>>
>> (still going through my email backlog, hence the delay).
>>
>> This looks like a valuable fix. Haven't had a chance to try it (no BE
>> setup at hand), but maybe Victor can help reproduce this?
>
> I'll give it a spin.

Tested-by: Victor Kamensky 

Tested on v3.17 plus this fix on TC2 (V7) and Mustang (V8) with a BE
KVM host, trying different combinations of BE/LE V7/V8 guests. All looks
good.

Only with the latest qemu, in BE V8 mode, on v3.17 without this fix
was I able to reproduce the issue that Will spotted. With kvmtool and
older qemu, the V8 BE code never hit the vgic_v2_set_lr function, which
is why we did not run into it before. I guess the qemu pl011 fix
mentioned in 1f2bb4acc125 uncovered the vgic_v2_set_lr code path and
this BE issue. With this patch it works fine now.
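
For reference, here is a tiny standalone illustration (not KVM code; the
values are made up) of why reading two adjacent u32 registers back as a
single u64 gives different results on LE and BE hosts, which is exactly
the word swap the patch moves into the assembly save/restore code:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	/* Two adjacent 32-bit "registers", as in the old u32 elrsr[2] layout. */
	uint32_t regs[2] = { 0x000000ff, 0x00000000 };
	uint64_t combined;

	/* Reinterpreting the pair as one 64-bit value:
	 *   little-endian host: combined == 0x00000000000000ff
	 *   big-endian host:    combined == 0x000000ff00000000 */
	memcpy(&combined, regs, sizeof(combined));
	printf("combined = 0x%016llx\n", (unsigned long long)combined);
	return 0;
}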

Thanks,
Victor

> Thanks,
> Victor
>
>> Acked-by: Marc Zyngier 
>>
>> M.
>> --
>> Jazz is not dead. It just smells funny.


Re: reduce networking latency

2014-10-15 Thread David Xu
2014-09-29 5:04 GMT-04:00 Michael S. Tsirkin :
> On Wed, Sep 24, 2014 at 02:40:53PM -0400, David Xu wrote:
>> Hi Michael,
>>
>> I found this interesting project from KVM TODO website:
>>
>> allow handling short packets from softirq or VCPU context
>>  Plan:
>>We are going through the scheduler 3 times
>>(could be up to 5 if softirqd is involved)
>>Consider RX: host irq -> io thread -> VCPU thread ->
>>guest irq -> guest thread.
>>This adds a lot of latency.
>>We can cut it by some 1.5x if we do a bit of work
>>either in the VCPU or softirq context.
>>  Testing: netperf TCP RR - should be improved drastically
>>   netperf TCP STREAM guest to host - no regression
>>
>> Would you mind saying more about the work either in the vCPU or
>> softirq context?
>
> For TX, we might be able to execute it directly from VCPU context.
> For RX, from softirq context.
>

Which step is removed for TX and RX compared with vanilla? Or what's
the new path?

TX: guest thread -> host irq?
RX: host irq -> ?

Thanks.

>> Why is it only for short packet handling?
>
> That's just one idea to avoid doing too much work like this.
>
> Doing too much work in VCPU context would break pipelining,
> likely degrading stream performance.
> Work in softirq context is not accounted against the correct
> cgroups, doing a lot of work there will mean guest can steal
> CPU from other guests.
>
>> Thanks a
>> lot!
>>
>>
>> Regards,
>>
>> Cong


Re: Nested VMX guest tsc_deadline OOPS

2014-10-15 Thread Andy Lutomirski
On Tue, Oct 14, 2014 at 3:46 AM, Bandan Das  wrote:
> Andy Lutomirski  writes:
>
>> I don't know what's going on here, but a nested VMX boot using -cpu
>> host on SNB on L0 and L1 fails with the OOPS below.  I kind of suspect
>> that there's both a KVM bug and an APIC driver bug here.
>
> What kernel version is this ? What are you running for L1 and L2 ?

The problem seems to be that, under nested virtualization without
kvm-clock enabled, TSC calibration in the L2 guest fails completely.
I have no idea whether this is a bug in Linux, KVM, or QEMU.

The rest of the problem seems to be that the apic code falls over
completely on systems that advertise a TSC deadline timer if TSC
calibration fails.  I sent a patch to fix this.  With that patch
applied, L2 can boot, but it still has an unusable TSC.
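
For context, a minimal sketch of the kind of guard such a patch could add
(illustrative only, not Andy's actual patch; it assumes the calibrated
frequency is kept in a variable like lapic_timer_frequency):

	/* Skip registering the local APIC clockevent when calibration
	 * yielded a zero frequency, instead of feeding 0 into
	 * clockevents_config_and_register() and hitting the
	 * divide-by-zero shown in the trace below. */
	if (!lapic_timer_frequency) {
		pr_warn("APIC timer calibration failed, not registering clockevent\n");
		return;
	}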

--Andy

>
>> -cpu host,-tsc-deadline on L1 works around the OOPS.
>>
>> [0.184000] Freeing SMP alternatives memory: 32K (81ff
>> - 81ff8000)
>> [0.188000] [ cut here ]
>> [0.189000] WARNING: CPU: 0 PID: 1 at
>> arch/x86/kernel/apic/apic.c:1393 setup_local_APIC+0x26c/0x320()
>> [0.19] Modules linked in:
>> [0.192000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.17.0+ #459
>> [0.193000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [0.194000]  0009 8800070a3e40 818588a7
>> 
>> [0.199000]  8800070a3e78 8108db68 
>> 00f0
>> [0.205000]    0001
>> 8800070a3e88
>> [0.21] Call Trace:
>> [0.211000]  [] dump_stack+0x45/0x56
>> [0.212000]  [] warn_slowpath_common+0x78/0xa0
>> [0.213000]  [] warn_slowpath_null+0x15/0x20
>> [0.214000]  [] setup_local_APIC+0x26c/0x320
>> [0.215000]  [] native_smp_prepare_cpus+0x255/0x312
>> [0.216000]  [] kernel_init_freeable+0x94/0x1d1
>> [0.217000]  [] ? rest_init+0x80/0x80
>> [0.218000]  [] kernel_init+0x9/0xf0
>> [0.219000]  [] ret_from_fork+0x7c/0xb0
>> [0.22]  [] ? rest_init+0x80/0x80
>> [0.221000] ---[ end trace f5f19be5716e445c ]---
>> [0.234000] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
>> [0.244000] smpboot: CPU0: Intel(R) Core(TM) i7-3930K CPU @ 3.20GHz
>> (fam: 06, model: 2d, stepping: 06)
>> [0.248000] divide error:  [#1] SMP
>> [0.248000] Modules linked in:
>> [0.248000] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW
>> 3.17.0+ #459
>> [0.248000] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
>> [0.248000] task: 880007098000 ti: 8800070a task.ti:
>> 8800070a
>> [0.248000] RIP: 0010:[]  []
>> clockevents_config.part.3+0x1f/0xa0
>> [0.248000] RSP: :8800070a3e58  EFLAGS: 00010246
>> [0.248000] RAX:  RBX: 880007c0cd00 RCX: 
>> 
>> [0.248000] RDX:  RSI:  RDI: 
>> 
>> [0.248000] RBP: 8800070a3e70 R08: 0001 R09: 
>> 82036b90
>> [0.248000] R10: 00a7 R11: 00a6 R12: 
>> b018
>> [0.248000] R13: 0040 R14: b020 R15: 
>> 
>> [0.248000] FS:  () GS:880007c0()
>> knlGS:
>> [0.248000] CS:  0010 DS:  ES:  CR0: 80050033
>> [0.248000] CR2: 8800020dd000 CR3: 01e11000 CR4: 
>> 000406f0
>> [0.248000] Stack:
>> [0.248000]  880007c0cd00 b018 0040
>> 8800070a3e88
>> [0.248000]  810ebbcb 880007c0cd00 8800070a3e98
>> 8107646f
>> [0.248000]  8800070a3ed8 81f086d0 0053
>> 
>> [0.248000] Call Trace:
>> [0.248000]  [] 
>> clockevents_config_and_register+0x1b/0x30
>> [0.248000]  [] setup_APIC_timer+0x10f/0x170
>> [0.248000]  [] setup_boot_APIC_clock+0x4ae/0x4ba
>> [0.248000]  [] native_smp_prepare_cpus+0x2ef/0x312
>> [0.248000]  [] kernel_init_freeable+0x94/0x1d1
>> [0.248000]  [] ? rest_init+0x80/0x80
>> [0.248000]  [] kernel_init+0x9/0xf0
>> [0.248000]  [] ret_from_fork+0x7c/0xb0
>> [0.248000]  [] ? rest_init+0x80/0x80
>> [0.248000] Code: c3 66 66 2e 0f 1f 84 00 00 00 00 00 55 31 d2 89
>> f1 89 f6 41 b8 01 00 00 00 48 89 e5 41 55 41 54 53 48 89 fb 48 8b 7f
>> 70 48 89 f8 <48> f7 f6 48 85 c0 74 0b 48 3d 58 02 00 00 41 89 c0 77 4e
>> 4c 8d
>> [0.248000] RIP  [] clockevents_config.part.3+0x1f/0xa0
>> [0.248000]  RSP 
>> [0.249000] ---[ end trace f5f19be5716e445d ]---
>> [0.25] Kernel panic - not syncing: Attempted to kill init!
>> exitcode=0x000b
>> [0.25]
>> [0.25] ---[ end Kernel panic - not syncing: Attempted to kill
>> init! exitcode=0x000b
>> [0.25]
>>
>>
>> --Andy

Re: [PATCH v2 07/15] arm/arm64: KVM: make the value of ICC_SRE_EL1 a per-VM variable

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:48PM +0100, Andre Przywara wrote:
> ICC_SRE_EL1 is a system register allowing msr/mrs accesses to the
> GIC CPU interface for EL1 (guests). Currently we force it to 0, but
> for proper GICv3 support we have to allow guests to use it (depending
> on their selected virtual GIC model).
> So add ICC_SRE_EL1 to the list of saved/restored registers on a
> world switch, but actually disallow a guest to change it by only
> restoring a fixed, once-initialized value.
> This value depends on the GIC model userland has chosen for a guest.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arch/arm64/kernel/asm-offsets.c |1 +
>  arch/arm64/kvm/vgic-v3-switch.S |   14 +-
>  include/kvm/arm_vgic.h  |1 +
>  virt/kvm/arm/vgic-v3.c  |9 +++--
>  4 files changed, 18 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/kernel/asm-offsets.c b/arch/arm64/kernel/asm-offsets.c
> index 9a9fce0..9d34486 100644
> --- a/arch/arm64/kernel/asm-offsets.c
> +++ b/arch/arm64/kernel/asm-offsets.c
> @@ -140,6 +140,7 @@ int main(void)
>DEFINE(VGIC_V2_CPU_ELRSR,  offsetof(struct vgic_cpu, vgic_v2.vgic_elrsr));
>DEFINE(VGIC_V2_CPU_APR,offsetof(struct vgic_cpu, vgic_v2.vgic_apr));
>DEFINE(VGIC_V2_CPU_LR, offsetof(struct vgic_cpu, vgic_v2.vgic_lr));
> +  DEFINE(VGIC_V3_CPU_SRE,offsetof(struct vgic_cpu, vgic_v3.vgic_sre));
>DEFINE(VGIC_V3_CPU_HCR,offsetof(struct vgic_cpu, vgic_v3.vgic_hcr));
>DEFINE(VGIC_V3_CPU_VMCR,   offsetof(struct vgic_cpu, vgic_v3.vgic_vmcr));
>DEFINE(VGIC_V3_CPU_MISR,   offsetof(struct vgic_cpu, vgic_v3.vgic_misr));
> diff --git a/arch/arm64/kvm/vgic-v3-switch.S b/arch/arm64/kvm/vgic-v3-switch.S
> index d160469..617a012 100644
> --- a/arch/arm64/kvm/vgic-v3-switch.S
> +++ b/arch/arm64/kvm/vgic-v3-switch.S
> @@ -148,17 +148,18 @@
>   * x0: Register pointing to VCPU struct
>   */
>  .macro   restore_vgic_v3_state
> - // Disable SRE_EL1 access. Necessary, otherwise
> - // ICH_VMCR_EL2.VFIQEn becomes one, and FIQ happens...
> - msr_s   ICC_SRE_EL1, xzr
> - isb
> -
>   // Compute the address of struct vgic_cpu
>   add x3, x0, #VCPU_VGIC_CPU
>  
>   // Restore all interesting registers
>   ldr w4, [x3, #VGIC_V3_CPU_HCR]
>   ldr w5, [x3, #VGIC_V3_CPU_VMCR]
> + ldr w25, [x3, #VGIC_V3_CPU_SRE]
> +
> + msr_s   ICC_SRE_EL1, x25
> +
> + // make sure SRE is valid before writing the other registers
> + isb
>  
>   msr_s   ICH_HCR_EL2, x4
>   msr_s   ICH_VMCR_EL2, x5
> @@ -244,9 +245,12 @@
>   dsb sy
>  
>   // Prevent the guest from touching the GIC system registers
> + // if SRE isn't enabled for GICv3 emulation
> + cbnzx25, 1f
>   mrs_s   x5, ICC_SRE_EL2
>   and x5, x5, #~ICC_SRE_EL2_ENABLE
>   msr_s   ICC_SRE_EL2, x5
> +1:
>  .endm
>  
>  ENTRY(__save_vgic_v3_state)
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 7e7c99e..8aa8482 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -199,6 +199,7 @@ struct vgic_v3_cpu_if {
>  #ifdef CONFIG_ARM_GIC_V3
>   u32 vgic_hcr;
>   u32 vgic_vmcr;
> + u32 vgic_sre;   /* Restored only, change ignored */
>   u32 vgic_misr;  /* Saved only */
>   u32 vgic_eisr;  /* Saved only */
>   u32 vgic_elrsr; /* Saved only */
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index 86e8b99..8af8037 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -145,15 +145,20 @@ static void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, 
> struct vgic_vmcr *vmcrp)
>  
>  static void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  {
> + struct vgic_v3_cpu_if *vgic_v3;
> +
> + vgic_v3 = &vcpu->arch.vgic_cpu.vgic_v3;
>   /*
>* By forcing VMCR to zero, the GIC will restore the binary
>* points to their reset values. Anything else resets to zero
>* anyway.
>*/
> - vcpu->arch.vgic_cpu.vgic_v3.vgic_vmcr = 0;
> + vgic_v3->vgic_vmcr = 0;
> +
> + vgic_v3->vgic_sre   = 0;
   why the tab ^^^ ?
>  
>   /* Get the show on the road... */
> - vcpu->arch.vgic_cpu.vgic_v3.vgic_hcr = ICH_HCR_EN;
> + vgic_v3->vgic_hcr = ICH_HCR_EN;
>  }
>  
>  static const struct vgic_ops vgic_v3_ops = {
> -- 
> 1.7.9.5
> 

Besides the nit:

Reviewed-by: Christoffer Dall 


Re: [PATCH v2 06/15] arm/arm64: KVM: make the maximum number of vCPUs a per-VM value

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:47PM +0100, Andre Przywara wrote:
> Currently the maximum number of vCPUs supported is a global value
> limited by the used GIC model. GICv3 will lift this limit, but we
> still need to observe it for guests using GICv2.
> So the maximum number of vCPUs is a per-VM value, depending on the
> GIC model the guest uses.
> Store and check the value in struct kvm_arch, but keep it down to
> 8 for now.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arch/arm/include/asm/kvm_host.h   |1 +
>  arch/arm/kvm/arm.c|6 ++
>  arch/arm64/include/asm/kvm_host.h |3 +++
>  virt/kvm/arm/vgic-v2.c|5 +
>  virt/kvm/arm/vgic-v3.c|6 ++
>  5 files changed, 21 insertions(+)
> 
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index cf99ad0..0638419 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -67,6 +67,7 @@ struct kvm_arch {
>  
>   /* Interrupt controller */
>   struct vgic_distvgic;
> + int max_vcpus;
>  };
>  
>  #define KVM_NR_MEM_OBJS 40
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 157e1b6..190f05f 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -142,6 +142,7 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
>  
>   /* Mark the initial VMID generation invalid */
>   kvm->arch.vmid_gen = 0;
> + kvm->arch.max_vcpus = CONFIG_KVM_ARM_MAX_VCPUS;
>  
>   return ret;
>  out_free_stage2_pgd:
> @@ -223,6 +224,11 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 
> unsigned int id)
>   int err;
>   struct kvm_vcpu *vcpu;
>  
> + if (id >= kvm->arch.max_vcpus) {
> + err = -EINVAL;
> + goto out;
> + }
> +
>   vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL);
>   if (!vcpu) {
>   err = -ENOMEM;
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 017fbae..a325161 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -58,6 +58,9 @@ struct kvm_arch {
>   /* VTTBR value associated with above pgd and vmid */
>   u64vttbr;
>  
> + /* The maximum number of vCPUs depends on the used GIC model */
> + int max_vcpus;
> +
>   /* Interrupt controller */
>   struct vgic_distvgic;
>  
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index 90947c6..c7362ff 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -177,11 +177,16 @@ static struct vgic_params vgic_v2_params;
>  static bool vgic_v2_init_emul(struct kvm *kvm, int type)
>  {
>   struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops;
> + int nr_vcpus;
>  
>   switch (type) {
>   case KVM_DEV_TYPE_ARM_VGIC_V2:
> + nr_vcpus = atomic_read(&kvm->online_vcpus);
> + if (nr_vcpus > 8)
> + return false;
>   vm_ops->get_lr = vgic_v2_get_lr;
>   vm_ops->set_lr = vgic_v2_set_lr;
> + kvm->arch.max_vcpus = 8;
>   return true;
>   }
>  
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index a38339e..86e8b99 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -171,11 +171,17 @@ static const struct vgic_ops vgic_v3_ops = {
>  static bool vgic_v3_init_emul_compat(struct kvm *kvm, int type)
>  {
>   struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops;
> + int nr_vcpus;
>  
>   switch (type) {
>   case KVM_DEV_TYPE_ARM_VGIC_V2:
> + nr_vcpus = atomic_read(&kvm->online_vcpus);
> + if (nr_vcpus > 8)
> + return false;
> +

I have a feeling we could be seeing this error a bit; could we dedicate
an error code for the purpose or print a ratelimited warning or
something?
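
A minimal sketch of the ratelimited-warning option (illustrative only;
the message text and exact placement are made up):

	nr_vcpus = atomic_read(&kvm->online_vcpus);
	if (nr_vcpus > 8) {
		pr_warn_ratelimited("kvm: GICv2 emulation supports at most 8 vcpus (%d requested)\n",
				    nr_vcpus);
		return false;
	}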

>   vm_ops->get_lr = vgic_v3_get_lr;
>   vm_ops->set_lr = vgic_v3_set_lr;
> + kvm->arch.max_vcpus = 8;
>   return true;
>   }
>   return false;
> -- 
> 1.7.9.5
> 
Thanks,
-Christoffer


Re: [PATCH v2 05/15] arm/arm64: KVM: introduce per-VM ops

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:46PM +0100, Andre Przywara wrote:
> Currently we only have one virtual GIC model supported, so all guests
> use the same emulation code. With the addition of another model we
> end up with different guests using potentially different vGIC models,
> so we have to split up some functions to be per VM.
> Introduce a vgic_vm_ops struct to hold function pointers for those
> functions that are different and provide the necessary code to
> initialize them.
> This includes functions that depend on the emulated GIC model only
> and functions that depend on the combination of host and guest GIC.
> 
> Signed-off-by: Andre Przywara 
> ---
>  include/kvm/arm_vgic.h |   18 --
>  virt/kvm/arm/vgic-v2.c |   17 +++--
>  virt/kvm/arm/vgic-v3.c |   16 +++--
>  virt/kvm/arm/vgic.c|   92 
> +++-
>  4 files changed, 113 insertions(+), 30 deletions(-)
> 
> diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
> index 4feac9a..7e7c99e 100644
> --- a/include/kvm/arm_vgic.h
> +++ b/include/kvm/arm_vgic.h
> @@ -99,8 +99,6 @@ struct vgic_vmcr {
>  };
>  
>  struct vgic_ops {
> - struct vgic_lr  (*get_lr)(const struct kvm_vcpu *, int);
> - void(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
>   void(*sync_lr_elrsr)(struct kvm_vcpu *, int, struct vgic_lr);
>   u64 (*get_elrsr)(const struct kvm_vcpu *vcpu);
>   u64 (*get_eisr)(const struct kvm_vcpu *vcpu);
> @@ -123,6 +121,17 @@ struct vgic_params {
>   unsigned intmaint_irq;
>   /* Virtual control interface base address */
>   void __iomem*vctrl_base;
> + bool (*init_emul)(struct kvm *kvm, int type);
> +};
> +
> +struct vgic_vm_ops {
> + struct vgic_lr  (*get_lr)(const struct kvm_vcpu *, int);
> + void(*set_lr)(struct kvm_vcpu *, int, struct vgic_lr);
> + bool(*handle_mmio)(struct kvm_vcpu *, struct kvm_run *,
> +struct kvm_exit_mmio *);
> + bool(*queue_sgi)(struct kvm_vcpu *vcpu, int irq);
> + void(*unqueue_sgi)(struct kvm_vcpu *vcpu, int irq, int source);
> + int (*vgic_init)(struct kvm *kvm, const struct vgic_params *params);
>  };
>  
>  struct vgic_dist {
> @@ -131,6 +140,9 @@ struct vgic_dist {
>   boolin_kernel;
>   boolready;
>  
> + /* vGIC model the kernel emulates for the guest (GICv2 or GICv3) */
> + u32 vgic_model;
> +
>   int nr_cpus;
>   int nr_irqs;
>  
> @@ -168,6 +180,8 @@ struct vgic_dist {
>  
>   /* Bitmap indicating which CPU has something pending */
>   unsigned long   irq_pending_on_cpu;
> +
> + struct vgic_vm_ops  vm_ops;
>  #endif
>  };
>  
> diff --git a/virt/kvm/arm/vgic-v2.c b/virt/kvm/arm/vgic-v2.c
> index 01124ef..90947c6 100644
> --- a/virt/kvm/arm/vgic-v2.c
> +++ b/virt/kvm/arm/vgic-v2.c
> @@ -161,8 +161,6 @@ static void vgic_v2_enable(struct kvm_vcpu *vcpu)
>  }
>  
>  static const struct vgic_ops vgic_v2_ops = {
> - .get_lr = vgic_v2_get_lr,
> - .set_lr = vgic_v2_set_lr,
>   .sync_lr_elrsr  = vgic_v2_sync_lr_elrsr,
>   .get_elrsr  = vgic_v2_get_elrsr,
>   .get_eisr   = vgic_v2_get_eisr,
> @@ -176,6 +174,20 @@ static const struct vgic_ops vgic_v2_ops = {
>  
>  static struct vgic_params vgic_v2_params;
>  
> +static bool vgic_v2_init_emul(struct kvm *kvm, int type)
> +{
> + struct vgic_vm_ops *vm_ops = &kvm->arch.vgic.vm_ops;
> +
> + switch (type) {
> + case KVM_DEV_TYPE_ARM_VGIC_V2:
> + vm_ops->get_lr = vgic_v2_get_lr;
> + vm_ops->set_lr = vgic_v2_set_lr;
> + return true;
> + }
> +
> + return false;
> +}
> +
>  /**
>   * vgic_v2_probe - probe for a GICv2 compatible interrupt controller in DT
>   * @node:pointer to the DT node
> @@ -214,6 +226,7 @@ int vgic_v2_probe(struct device_node *vgic_node,
>   ret = -ENOMEM;
>   goto out;
>   }
> + vgic->init_emul = vgic_v2_init_emul;

this feels overly complicated, can't each host-support function just
export this function and we call it directly from the vgic creation
code?

That or just keep the host-specific functions the way they are and
handle the split within these functions?

Bah, I'm not sure, this patch is just terrible to review...  Perhaps
it's because we're doing some combination of introducing infrastructure
and also changing random bits and naming to be version-specific.

>  
>   vgic->nr_lr = readl_relaxed(vgic->vctrl_base + GICH_VTR);
>   vgic->nr_lr = (vgic->nr_lr & 0x3f) + 1;
> diff --git a/virt/kvm/arm/vgic-v3.c b/virt/kvm/arm/vgic-v3.c
> index 1c2c8ee..a38339e 100644
> --- a/virt/kvm/arm/vgic-v3.c
> +++ b/virt/kvm/arm/vgic-v3.c
> @@ -157,8 +157,6 @@ static void vgic_v3_enable(struct kvm_vcpu *vcpu)
>  }
>  
>  static const struct vgic

Re: [PATCH v2 04/15] arm/arm64: KVM: wrap 64 bit MMIO accesses with two 32 bit ones

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:45PM +0100, Andre Przywara wrote:
> Some GICv3 registers can and will be accessed as 64 bit registers.
> Currently the register handling code can only deal with 32 bit
> accesses, so we do two consecutive calls to cover this.
> 
> Signed-off-by: Andre Przywara 
> ---
>  virt/kvm/arm/vgic.c |   48 +---
>  1 file changed, 45 insertions(+), 3 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index 3b6f78d..bef9aa0 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -926,6 +926,48 @@ static bool vgic_validate_access(const struct vgic_dist 
> *dist,
>  }
>  
>  /*
> + * Call the respective handler function for the given range.
> + * We split up any 64 bit accesses into two consecutive 32 bit
> + * handler calls and merge the result afterwards.
> + */
> +static bool call_range_handler(struct kvm_vcpu *vcpu,
> +struct kvm_exit_mmio *mmio,
> +unsigned long offset,
> +const struct mmio_range *range)
> +{
> + u32 *data32 = (void *)mmio->data;
> + struct kvm_exit_mmio mmio32;
> + bool ret;
> +
> + if (likely(mmio->len <= 4))
> + return range->handle_mmio(vcpu, mmio, offset);
> +
> + /*
> +  * We assume that any access greater than 4 bytes is actually

Is this an assumption or something that will always hold true at this
point in the code?  If the former, which situations could it not hold
and what would happen?  If the latter, we should just state that.

> +  * 8 bytes long, caused by a 64-bit access
> +  */
> +
> + mmio32.len = 4;
> + mmio32.is_write = mmio->is_write;
> +
> + mmio32.phys_addr = mmio->phys_addr + 4;
> + if (mmio->is_write)
> + *(u32 *)mmio32.data = data32[1];
> + ret = range->handle_mmio(vcpu, &mmio32, offset + 4);
> + if (!mmio->is_write)
> + data32[1] = *(u32 *)mmio32.data;
> +
> + mmio32.phys_addr = mmio->phys_addr;
> + if (mmio->is_write)
> + *(u32 *)mmio32.data = data32[0];
> + ret |= range->handle_mmio(vcpu, &mmio32, offset);
> + if (!mmio->is_write)
> + data32[0] = *(u32 *)mmio32.data;

won't this break on a BE system?
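
For readers following along, a tiny standalone illustration (not KVM code)
of why fixed data32[0]/data32[1] indexing is endianness-dependent:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	uint64_t val = 0x1122334455667788ULL;
	uint32_t words[2];

	/* Split the 64-bit value into two 32-bit words, as
	 * call_range_handler() does with mmio->data:
	 *   little-endian host: words[0] == 0x55667788 (low half)
	 *   big-endian host:    words[0] == 0x11223344 (high half) */
	memcpy(words, &val, sizeof(words));
	printf("words[0] = 0x%08x, words[1] = 0x%08x\n", words[0], words[1]);
	return 0;
}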

> +
> + return ret;
> +}
> +
> +/*
>   * vgic_handle_mmio_range - handle an in-kernel MMIO access
>   * @vcpu:pointer to the vcpu performing the access
>   * @run: pointer to the kvm_run structure
> @@ -956,10 +998,10 @@ static bool vgic_handle_mmio_range(struct kvm_vcpu 
> *vcpu, struct kvm_run *run,
>   spin_lock(&vcpu->kvm->arch.vgic.lock);
>   offset -= range->base;
>   if (vgic_validate_access(dist, range, offset)) {
> - updated_state = range->handle_mmio(vcpu, mmio, offset);
> + updated_state = call_range_handler(vcpu, mmio, offset, range);
>   } else {
> - vgic_reg_access(mmio, NULL, offset,
> - ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED);
> + if (!mmio->is_write)
> + memset(mmio->data, 0, mmio->len);
>   updated_state = false;
>   }
>   spin_unlock(&vcpu->kvm->arch.vgic.lock);
> -- 
> 1.7.9.5
> 

Thanks,
-Christoffer


Re: [PATCH v2 03/15] arm/arm64: KVM: refactor vgic_handle_mmio() function

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:44PM +0100, Andre Przywara wrote:
> Currently we only need to deal with one MMIO region for the GIC
> emulation, but we soon need to extend this. Refactor the existing
> code to allow easier addition of different ranges without code
> duplication.
> 
> Signed-off-by: Andre Przywara 
> ---
>  virt/kvm/arm/vgic.c |   72 
> ---
>  1 file changed, 51 insertions(+), 21 deletions(-)
> 
> diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
> index bba8692..3b6f78d 100644
> --- a/virt/kvm/arm/vgic.c
> +++ b/virt/kvm/arm/vgic.c
> @@ -925,37 +925,28 @@ static bool vgic_validate_access(const struct vgic_dist 
> *dist,
>   return true;
>  }
>  
> -/**
> - * vgic_handle_mmio - handle an in-kernel MMIO access
> +/*
> + * vgic_handle_mmio_range - handle an in-kernel MMIO access
>   * @vcpu:pointer to the vcpu performing the access
>   * @run: pointer to the kvm_run structure
>   * @mmio:pointer to the data describing the access
> + * @ranges:  pointer to the register defining structure
> + * @mmio_base:   base address for this mapping
>   *
> - * returns true if the MMIO access has been performed in kernel space,
> - * and false if it needs to be emulated in user space.
> + * returns true if the MMIO access could be performed
>   */
> -bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> -   struct kvm_exit_mmio *mmio)
> +static bool vgic_handle_mmio_range(struct kvm_vcpu *vcpu, struct kvm_run 
> *run,
> + struct kvm_exit_mmio *mmio,
> + const struct mmio_range *ranges,
> + unsigned long mmio_base)

Now that we're chopping this up and about to add more logic based on
our struct mmio_range, I think we should really consider getting rid of
that comment about the kvm_bus_io_*() API or actually use that API.

>  {
>   const struct mmio_range *range;
>   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
> - unsigned long base = dist->vgic_dist_base;
>   bool updated_state;
>   unsigned long offset;
>  
> - if (!irqchip_in_kernel(vcpu->kvm) ||
> - mmio->phys_addr < base ||
> - (mmio->phys_addr + mmio->len) > (base + KVM_VGIC_V2_DIST_SIZE))
> - return false;
> -
> - /* We don't support ldrd / strd or ldm / stm to the emulated vgic */
> - if (mmio->len > 4) {
> - kvm_inject_dabt(vcpu, mmio->phys_addr);
> - return true;
> - }
> -
> - offset = mmio->phys_addr - base;
> - range = find_matching_range(vgic_dist_ranges, mmio, offset);
> + offset = mmio->phys_addr - mmio_base;
> + range = find_matching_range(ranges, mmio, offset);
>   if (unlikely(!range || !range->handle_mmio)) {
>   pr_warn("Unhandled access %d %08llx %d\n",
>   mmio->is_write, mmio->phys_addr, mmio->len);
> @@ -963,7 +954,7 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
> kvm_run *run,
>   }
>  
>   spin_lock(&vcpu->kvm->arch.vgic.lock);
> - offset = mmio->phys_addr - range->base - base;
> + offset -= range->base;
>   if (vgic_validate_access(dist, range, offset)) {
>   updated_state = range->handle_mmio(vcpu, mmio, offset);
>   } else {
> @@ -981,6 +972,45 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
> kvm_run *run,
>   return true;
>  }
>  
> +#define IS_IN_RANGE(addr, alen, base, len) \
> + (((addr) >= (base)) && (((addr) + (alen)) < ((base) + (len

that should be <= ((base) + (len)) right?

that's a lot of parentheses, how about creating a static inline instead?

you could also rename alen to access_len
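
A possible shape for that static inline, folding in the <= fix above
(sketch only; the name and parameter types are illustrative):

static inline bool is_in_range(phys_addr_t addr, unsigned long access_len,
			       phys_addr_t base, unsigned long len)
{
	return addr >= base && addr + access_len <= base + len;
}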

> +
> +static bool vgic_v2_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> + struct kvm_exit_mmio *mmio)
> +{
> + unsigned long base = vcpu->kvm->arch.vgic.vgic_dist_base;
> +
> + if (!IS_IN_RANGE(mmio->phys_addr, mmio->len, base,
> +  KVM_VGIC_V2_DIST_SIZE))
> + return false;
> +
> + /* GICv2 does not support accesses wider than 32 bits */
> + if (mmio->len > 4) {
> + kvm_inject_dabt(vcpu, mmio->phys_addr);
> + return true;
> + }
> +
> + return vgic_handle_mmio_range(vcpu, run, mmio, vgic_dist_ranges, base);
> +}
> +
> +/**
> + * vgic_handle_mmio - handle an in-kernel MMIO access for the GIC emulation
> + * @vcpu:  pointer to the vcpu performing the access
> + * @run:   pointer to the kvm_run structure
> + * @mmio:  pointer to the data describing the access
> + *
> + * returns true if the MMIO access has been performed in kernel space,
> + * and false if it needs to be emulated in user space.
> + */
> +bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
> +   struct kvm_exit_mmio *mmio)
> +{
> + if (!irqchip_in_kernel(vcpu->kvm))
> + return false;
> +
> + return vgic_v2_handle_mmio(v

Re: [PATCH v2 01/15] arm/arm64: KVM: rework MPIDR assignment and add accessors

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:42PM +0100, Andre Przywara wrote:
> The virtual MPIDR registers (containing topology information) for the
> guest are currently mapped linearily to the vcpu_id. Improve this
> mapping for arm64 by using three levels to not artificially limit the
> number of vCPUs. Also add an accessor to later allow easier access to
> a vCPU with a given MPIDR.
> Use this new accessor in the PSCI emulation.
> 
> Signed-off-by: Andre Przywara 
> ---
>  arch/arm/include/asm/kvm_emulate.h   |2 +-
>  arch/arm/include/asm/kvm_host.h  |2 ++
>  arch/arm/kvm/arm.c   |   15 +++
>  arch/arm/kvm/psci.c  |   15 ---
>  arch/arm64/include/asm/kvm_emulate.h |3 ++-
>  arch/arm64/include/asm/kvm_host.h|2 ++
>  arch/arm64/kvm/sys_regs.c|   11 +--
>  7 files changed, 35 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/arm/include/asm/kvm_emulate.h 
> b/arch/arm/include/asm/kvm_emulate.h
> index 69b7469..18e45f7 100644
> --- a/arch/arm/include/asm/kvm_emulate.h
> +++ b/arch/arm/include/asm/kvm_emulate.h
> @@ -159,7 +159,7 @@ static inline u32 kvm_vcpu_hvc_get_imm(struct kvm_vcpu 
> *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
>  {
> - return vcpu->arch.cp15[c0_MPIDR];
> + return vcpu->arch.cp15[c0_MPIDR] & MPIDR_HWID_BITMASK;

why?

>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> index 6dfb404..cf99ad0 100644
> --- a/arch/arm/include/asm/kvm_host.h
> +++ b/arch/arm/include/asm/kvm_host.h
> @@ -233,4 +233,6 @@ static inline void vgic_arch_setup(const struct 
> vgic_params *vgic)
>  int kvm_perf_init(void);
>  int kvm_perf_teardown(void);
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr);
> +
>  #endif /* __ARM_KVM_HOST_H__ */
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 7d5065e..addb990 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -990,6 +990,21 @@ static void check_kvm_target_cpu(void *ret)
>   *(int *)ret = kvm_target_cpu();
>  }
>  
> +struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
> +{
> + unsigned long c_mpidr;
> + struct kvm_vcpu *vcpu;
> + int i;
> +
> + mpidr &= MPIDR_HWID_BITMASK;
> + kvm_for_each_vcpu(i, vcpu, kvm) {
> + c_mpidr = kvm_vcpu_get_mpidr(vcpu);
> + if (c_mpidr == mpidr)
> + return vcpu;
> + }
> + return NULL;
> +}
> +
>  /**
>   * Initialize Hyp-mode and memory mappings on all CPUs.
>   */
> diff --git a/arch/arm/kvm/psci.c b/arch/arm/kvm/psci.c
> index 09cf377..49f0992 100644
> --- a/arch/arm/kvm/psci.c
> +++ b/arch/arm/kvm/psci.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * This is an implementation of the Power State Coordination Interface
> @@ -65,25 +66,17 @@ static void kvm_psci_vcpu_off(struct kvm_vcpu *vcpu)
>  static unsigned long kvm_psci_vcpu_on(struct kvm_vcpu *source_vcpu)
>  {
>   struct kvm *kvm = source_vcpu->kvm;
> - struct kvm_vcpu *vcpu = NULL, *tmp;
> + struct kvm_vcpu *vcpu = NULL;
>   wait_queue_head_t *wq;
>   unsigned long cpu_id;
>   unsigned long context_id;
> - unsigned long mpidr;
>   phys_addr_t target_pc;
> - int i;
>  
> - cpu_id = *vcpu_reg(source_vcpu, 1);
> + cpu_id = *vcpu_reg(source_vcpu, 1) & MPIDR_HWID_BITMASK;
>   if (vcpu_mode_is_32bit(source_vcpu))
>   cpu_id &= ~((u32) 0);
>  
> - kvm_for_each_vcpu(i, tmp, kvm) {
> - mpidr = kvm_vcpu_get_mpidr(tmp);
> - if ((mpidr & MPIDR_HWID_BITMASK) == (cpu_id & 
> MPIDR_HWID_BITMASK)) {
> - vcpu = tmp;
> - break;
> - }
> - }
> + vcpu = kvm_mpidr_to_vcpu(kvm, cpu_id);
>  
>   /*
>* Make sure the caller requested a valid CPU and that the CPU is
> diff --git a/arch/arm64/include/asm/kvm_emulate.h 
> b/arch/arm64/include/asm/kvm_emulate.h
> index fdc3e21..4b8d636 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -27,6 +27,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
>  unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
> @@ -179,7 +180,7 @@ static inline u8 kvm_vcpu_trap_get_fault(const struct 
> kvm_vcpu *vcpu)
>  
>  static inline unsigned long kvm_vcpu_get_mpidr(struct kvm_vcpu *vcpu)
>  {
> - return vcpu_sys_reg(vcpu, MPIDR_EL1);
> + return vcpu_sys_reg(vcpu, MPIDR_EL1) & MPIDR_HWID_BITMASK;

why?

>  }
>  
>  static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e10c45a..017fbae 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/

Re: [PATCH v2 02/15] arm/arm64: KVM: pass down user space provided GIC type into vGIC code

2014-10-15 Thread Christoffer Dall
On Thu, Aug 21, 2014 at 02:06:43PM +0100, Andre Przywara wrote:
> With the introduction of a second emulated GIC model we need to let
> userspace specify the GIC model to use for each VM. Pass the
> userspace provided value down into the vGIC code to differentiate
> later.
> 
> Signed-off-by: Andre Przywara 

Acked-by: Christoffer Dall 


Re: [PATCH v2] target-ppc: kvm: Fix memory overflow issue about strncat()

2014-10-15 Thread Alexander Graf


On 15.10.14 15:48, Chen Gang wrote:
> strncat() will append an additional '\0' to the destination buffer, so it
> needs 1 extra byte for that, or it may cause a memory overflow, just like
> other areas within QEMU already handle.
> 
> We can also use g_strdup_printf() instead of strncat(), which is easier
> to understand.
> 
> Signed-off-by: Chen Gang 

Thanks, applied to ppc-next. And sorry for not replying earlier.


Alex


[PATCH v2] target-ppc: kvm: Fix memory overflow issue about strncat()

2014-10-15 Thread Chen Gang
strncat() will append an additional '\0' to the destination buffer, so it
needs 1 extra byte for that, or it may cause a memory overflow, just like
other areas within QEMU already handle.

We can also use g_strdup_printf() instead of strncat(), which is easier
to understand.
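
For readers unfamiliar with the pitfall, a standalone illustration of
strncat()'s terminating '\0' (not QEMU code):

#include <stdio.h>
#include <string.h>

int main(void)
{
	char buf[8] = "abc";

	/* strncat(dst, src, n) copies up to n bytes of src and then always
	 * writes a terminating '\0', so dst needs strlen(dst) + n + 1 bytes.
	 * Passing "sizeof(buf) - strlen(buf)" as n, as the old code did,
	 * leaves no room for that terminator once the buffer fills up;
	 * the "- 1" below is what keeps this example in bounds. */
	strncat(buf, "defg", sizeof(buf) - strlen(buf) - 1);
	printf("%s\n", buf);
	return 0;
}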

Signed-off-by: Chen Gang 
---
 target-ppc/kvm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target-ppc/kvm.c b/target-ppc/kvm.c
index 66e7ce5..cea6a87 100644
--- a/target-ppc/kvm.c
+++ b/target-ppc/kvm.c
@@ -1782,7 +1782,7 @@ static int kvmppc_find_cpu_dt(char *buf, int buf_len)
  * format) */
 static uint64_t kvmppc_read_int_cpu_dt(const char *propname)
 {
-char buf[PATH_MAX];
+char buf[PATH_MAX], *tmp;
 union {
 uint32_t v32;
 uint64_t v64;
@@ -1794,10 +1794,10 @@ static uint64_t kvmppc_read_int_cpu_dt(const char 
*propname)
 return -1;
 }
 
-strncat(buf, "/", sizeof(buf) - strlen(buf));
-strncat(buf, propname, sizeof(buf) - strlen(buf));
+tmp = g_strdup_printf("%s/%s", buf, propname);
 
-f = fopen(buf, "rb");
+f = fopen(tmp, "rb");
+g_free(tmp);
 if (!f) {
 return -1;
 }
-- 
1.9.3


Re: [PATCH v11 2/6] KVM: Add generic implementation of kvm_vm_ioctl_get_dirty_log

2014-10-15 Thread Alexander Graf


On 23.09.14 02:54, Mario Smarduch wrote:
> Add support for a generic implementation of the dirty log read function.
> For now, x86_64 and ARMv7 share the generic dirty log read; other
> architectures call their architecture-specific functions.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  arch/arm/kvm/Kconfig  |1 +
>  arch/arm/kvm/arm.c|2 +-
>  arch/arm64/kvm/Kconfig|1 +
>  arch/ia64/kvm/Kconfig |1 +
>  arch/ia64/kvm/kvm-ia64.c  |2 +-
>  arch/mips/kvm/Kconfig |1 +
>  arch/mips/kvm/mips.c  |2 +-
>  arch/powerpc/kvm/Kconfig  |1 +
>  arch/powerpc/kvm/book3s.c |2 +-
>  arch/powerpc/kvm/booke.c  |2 +-
>  arch/s390/kvm/Kconfig |1 +
>  arch/s390/kvm/kvm-s390.c  |2 +-
>  arch/x86/kvm/x86.c|   86 --
>  include/linux/kvm_host.h  |6 +++
>  virt/kvm/Kconfig  |3 ++
>  virt/kvm/kvm_main.c   |   91 
> +
>  16 files changed, 112 insertions(+), 92 deletions(-)
> 
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 4be5bb1..cd9bb1c 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -23,6 +23,7 @@ config KVM
>   select HAVE_KVM_CPU_RELAX_INTERCEPT
>   select KVM_MMIO
>   select KVM_ARM_HOST
> + select HAVE_KVM_ARCH_DIRTY_LOG

I think we're better off treating the "common dirty log" option as the
odd case. So instead of adding this option for every arch, just set it
on x86 and later on arm. Then you can ...

>   depends on ARM_VIRT_EXT && ARM_LPAE && !CPU_BIG_ENDIAN
>   ---help---
> Support hosting virtualized guest machines. You will also
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 3c82b37..c52b2bd 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -774,7 +774,7 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
>   }
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
> *log)
>  {
>   return -EINVAL;
>  }
> diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
> index 8ba85e9..40a8d19 100644
> --- a/arch/arm64/kvm/Kconfig
> +++ b/arch/arm64/kvm/Kconfig
> @@ -26,6 +26,7 @@ config KVM
>   select KVM_ARM_HOST
>   select KVM_ARM_VGIC
>   select KVM_ARM_TIMER
> + select HAVE_KVM_ARCH_DIRTY_LOG
>   ---help---
> Support hosting virtualized guest machines.
>  
> diff --git a/arch/ia64/kvm/Kconfig b/arch/ia64/kvm/Kconfig
> index 990b864..217f10a 100644
> --- a/arch/ia64/kvm/Kconfig
> +++ b/arch/ia64/kvm/Kconfig
> @@ -28,6 +28,7 @@ config KVM
>   select HAVE_KVM_IRQ_ROUTING
>   select KVM_APIC_ARCHITECTURE
>   select KVM_MMIO
> + select HAVE_KVM_ARCH_DIRTY_LOG
>   ---help---
> Support hosting fully virtualized guest machines using hardware
> virtualization extensions.  You will need a fairly recent
> diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c
> index 6a4309b..3166df5 100644
> --- a/arch/ia64/kvm/kvm-ia64.c
> +++ b/arch/ia64/kvm/kvm-ia64.c
> @@ -1812,7 +1812,7 @@ static void kvm_ia64_sync_dirty_log(struct kvm *kvm,
>   spin_unlock(&kvm->arch.dirty_log_lock);
>  }
>  
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm,
>   struct kvm_dirty_log *log)
>  {
>   int r;
> diff --git a/arch/mips/kvm/Kconfig b/arch/mips/kvm/Kconfig
> index 30e334e..b57f49e 100644
> --- a/arch/mips/kvm/Kconfig
> +++ b/arch/mips/kvm/Kconfig
> @@ -20,6 +20,7 @@ config KVM
>   select PREEMPT_NOTIFIERS
>   select ANON_INODES
>   select KVM_MMIO
> + select HAVE_KVM_ARCH_DIRTY_LOG
>   ---help---
> Support for hosting Guest kernels.
> Currently supported on MIPS32 processors.
> diff --git a/arch/mips/kvm/mips.c b/arch/mips/kvm/mips.c
> index d687c6e..885fdfe 100644
> --- a/arch/mips/kvm/mips.c
> +++ b/arch/mips/kvm/mips.c
> @@ -791,7 +791,7 @@ out:
>  }
>  
>  /* Get (and clear) the dirty memory log for a memory slot. */
> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> +int kvm_arch_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log 
> *log)
>  {
>   struct kvm_memory_slot *memslot;
>   unsigned long ga, ga_end;
> diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
> index d6a53b9..4f28a82 100644
> --- a/arch/powerpc/kvm/Kconfig
> +++ b/arch/powerpc/kvm/Kconfig
> @@ -21,6 +21,7 @@ config KVM
>   select PREEMPT_NOTIFIERS
>   select ANON_INODES
>   select HAVE_KVM_EVENTFD
> + select HAVE_KVM_ARCH_DIRTY_LOG
>  
>  config KVM_BOOK3S_HANDLER
>   bool
> diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
> index c254c27..304faa1 100644
> --- a/arch/powerpc/kvm/book3s.c
> +++ b/arch/powerpc/kvm/book3s.c
> @@ -815,7 +815,7 @@ int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
>   r

Re: [PATCH v11 1/6] KVM: Add architecture-specific TLB flush implementations

2014-10-15 Thread Alexander Graf


On 23.09.14 02:54, Mario Smarduch wrote:
> Add support to declare an architecture-specific TLB flush function, for
> now ARMv7.
> 
> Signed-off-by: Mario Smarduch 
> ---
>  include/linux/kvm_host.h |1 +
>  virt/kvm/Kconfig |3 +++
>  virt/kvm/kvm_main.c  |4 
>  3 files changed, 8 insertions(+)
> 
> diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> index ec4e3bd..a49a6df 100644
> --- a/include/linux/kvm_host.h
> +++ b/include/linux/kvm_host.h
> @@ -592,6 +592,7 @@ void kvm_load_guest_fpu(struct kvm_vcpu *vcpu);
>  void kvm_put_guest_fpu(struct kvm_vcpu *vcpu);
>  
>  void kvm_flush_remote_tlbs(struct kvm *kvm);
> +void kvm_arch_flush_remote_tlbs(struct kvm *kvm);
>  void kvm_reload_remote_mmus(struct kvm *kvm);
>  void kvm_make_mclock_inprogress_request(struct kvm *kvm);
>  void kvm_make_scan_ioapic_request(struct kvm *kvm);
> diff --git a/virt/kvm/Kconfig b/virt/kvm/Kconfig
> index 13f2d19..f1efaa5 100644
> --- a/virt/kvm/Kconfig
> +++ b/virt/kvm/Kconfig
> @@ -34,3 +34,6 @@ config HAVE_KVM_CPU_RELAX_INTERCEPT
>  
>  config KVM_VFIO
> bool
> +
> +config HAVE_KVM_ARCH_TLB_FLUSH_ALL
> +   bool
> diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> index 4b6c01b..d0a24f5 100644
> --- a/virt/kvm/kvm_main.c
> +++ b/virt/kvm/kvm_main.c
> @@ -186,12 +186,16 @@ static bool make_all_cpus_request(struct kvm *kvm, 
> unsigned int req)
>  
>  void kvm_flush_remote_tlbs(struct kvm *kvm)
>  {
> +#ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL

Please make this an #ifndef on the outer definition and just define your
own copy of this function inside of arch/arm/kvm :).
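
A sketch of what that could look like (illustrative, based on the code in
this patch, not a tested change):

#ifndef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
void kvm_flush_remote_tlbs(struct kvm *kvm)
{
	long dirty_count = kvm->tlbs_dirty;

	smp_mb();
	if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
		++kvm->stat.remote_tlb_flush;
	cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
}
EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
#endif

with the architecture that selects HAVE_KVM_ARCH_TLB_FLUSH_ALL providing
its own kvm_flush_remote_tlbs() in arch code.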


Alex

> + kvm_arch_flush_remote_tlbs(kvm);
> +#else
>   long dirty_count = kvm->tlbs_dirty;
>  
>   smp_mb();
>   if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
>   ++kvm->stat.remote_tlb_flush;
>   cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> +#endif
>  }
>  EXPORT_SYMBOL_GPL(kvm_flush_remote_tlbs);
>  
> 


Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path

2014-10-15 Thread Gleb Natapov
On Mon, Oct 13, 2014 at 05:52:38AM -0300, Marcelo Tosatti wrote:
> On Fri, Oct 10, 2014 at 04:09:29PM +0300, Gleb Natapov wrote:
> > On Wed, Oct 08, 2014 at 04:22:31PM -0300, Marcelo Tosatti wrote:
> > > > > 
> > > > > Argh, lets try again:
> > > > > 
> > > > > skip_pinned = true
> > > > > --
> > > > > 
> > > > > mark page dirty, keep spte intact
> > > > > 
> > > > > called from get dirty log path.
> > > > > 
> > > > > skip_pinned = false
> > > > > ---
> > > > > reload remote mmu
> > > > > destroy pinned spte.
> > > > > 
> > > > > called from: dirty log enablement, rmap write protect (unused for 
> > > > > pinned
> > > > > sptes)
> > > > > 
> > > > > 
> > > > > Note this behaviour is your suggestion:
> > > > 
> > > > Yes, I remember that and I thought we will not need this skip_pinned
> > > > at all.  For rmap write protect case there shouldn't be any pinned 
> > > > pages,
> > > > but why does dirty log enablement set skip_pinned to false? Why not mark
> > > > pinned pages as dirty just like you do in get dirty log path?
> > > 
> > > Because if its a large spte, it must be nuked (or marked read-only,
> > > which for pinned sptes, is not possible).
> > > 
> > If a large page has one small page pinned inside it its spte will
> > be marked as pinned, correct?  
> 
> Correct.
> 
> > We did nuke large ptes here until very
> > recently: c126d94f2c90ed9d, but we cannot drop a pte here anyway without
> > kicking all vcpu from a guest mode, but do you need additional skip_pinned
> > parameter?  Why not check if spte is large instead?
> 
> Nuke only if large spte is found? Can do that, instead.
> 
> > So why not have per slot pinned page list (Xiao suggested the same) and do:
> 
> The interface is per-vcpu (that is registration of pinned pages is
> performed on a per-vcpu basis).
> 
PEBS is per cpu, but it does not mean that pinning should be per cpu, it
can be done globally with ref counting.

> > spte_write_protect() {
> >   if (is_pinned(spte) {
> > if (large(spte))
> >// cannot drop while vcpu are running
> >mmu_reload_pinned_vcpus();
> > else
> >return false;
> > }
> > 
> > 
> > get_dirty_log() {
> >  for_each(pinned pages i)
> > makr_dirty(i);
> > }
> 
> That is effectively the same as this patchset does, except that the spte
> pinned bit is checked at spte_write_protect, instead of looping over 
> page pinned list. Fail to see huge advantage there.
>
I think spte_write_protect is a strange place to mark pages dirty, but
otherwise yes the effect is the same, so definitely not a huge difference.
If global pinned list is a PITA in your opinion leave it as is.

> I'll drop the skip_pinned parameter and use is_large_pte check instead.
> 
Thanks,

--
Gleb.


Re: [PATCH net-next RFC 0/3] virtio-net: Conditionally enable tx interrupt

2014-10-15 Thread Jason Wang
On 10/15/2014 07:06 AM, Michael S. Tsirkin wrote:
> On Tue, Oct 14, 2014 at 02:53:27PM -0400, David Miller wrote:
>> > From: Jason Wang 
>> > Date: Sat, 11 Oct 2014 15:16:43 +0800
>> > 
>>> > > We free old transmitted packets in ndo_start_xmit() currently, so any
>>> > > packet must be orphaned also there. This was used to reduce the 
>>> > > overhead of
>>> > > tx interrupt to achieve better performance. But this may not work for 
>>> > > some
>>> > > protocols such as TCP stream. TCP depends on the value of sk_wmem_alloc 
>>> > > to
>>> > > implement various optimization for small packets stream such as TCP 
>>> > > small
>>> > > queue and auto corking. But orphaning packets early in ndo_start_xmit()
>>> > > disable such things more or less since sk_wmem_alloc was not accurate. 
>>> > > This
>>> > > lead extra low throughput for TCP stream of small writes.
>>> > > 
>>> > > This series tries to solve this issue by enable tx interrupts for all 
>>> > > TCP
>>> > > packets other than the ones with push bit or pure ACK. This is done 
>>> > > through
>>> > > the support of urgent descriptor which can force an interrupt for a
>>> > > specified packet. If tx interrupt was enabled for a packet, there's no 
>>> > > need
>>> > > to orphan it in ndo_start_xmit(), we can free it tx napi which is 
>>> > > scheduled
>>> > > by tx interrupt. Then sk_wmem_alloc was more accurate than before and 
>>> > > TCP
>>> > > can batch more for small write. More larger skb was produced by TCP in 
>>> > > this
>>> > > case to improve both throughput and cpu utilization.
>>> > > 
>>> > > Test shows great improvements on small write tcp streams. For most of 
>>> > > the
>>> > > other cases, the throughput and cpu utilization are the same in the
>>> > > past. Only few cases, more cpu utilization was noticed which needs more
>>> > > investigation.
>>> > > 
>>> > > Review and comments are welcomed.
>> > 
>> > I think proper accounting and queueing (at all levels, not just TCP
>> > sockets) is more important than trying to skim a bunch of cycles by
>> > avoiding TX interrupts.
>> > 
>> > Having an event to free the SKB is absolutely essential for the stack
>> > to operate correctly.
>> > 
>> > And with virtio-net you don't even have the excuse of "the HW
>> > unfortunately doesn't have an appropriate TX event."
>> > 
>> > So please don't play games, and instead use TX interrupts all the
>> > time.  You can mitigate them in various ways, but don't turn them on
>> > selectively based upon traffic type, that's terrible.
>> > 
>> > You can even use ->xmit_more to defer the TX interrupt indication to
>> > the final TX packet in the chain.
> I guess we can just defer the kick, interrupt will naturally be
> deferred as well.
> This should solve the problem for old hosts as well.

Interrupts were delayed but not reduced; to support this we need to
publish the avail idx as the used event. This should reduce tx interrupts
in the case of bulk dequeuing.

I will draft a new RFC series containing this.
>
> We'll also need to implement bql for this.
> Something like the below?
> Completely untested - posting here to see if I figured the
> API out correctly. Has to be applied on top of the previous patch.

Looks so. I believe it is better to have, but not a must.
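
For readers unfamiliar with BQL, a rough sketch of the accounting it adds
(illustrative only; this is not the patch Michael posted, and the helper
names here are made up):

#include <linux/netdevice.h>

/* Called from the transmit path after queueing an skb to the device. */
static void example_xmit_account(struct net_device *dev, int qnum,
				 struct sk_buff *skb)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

	/* Tell BQL how many bytes were handed to the hardware queue. */
	netdev_tx_sent_queue(txq, skb->len);
}

/* Called from the tx completion path after freeing sent packets. */
static void example_tx_complete_account(struct net_device *dev, int qnum,
					unsigned int pkts, unsigned int bytes)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);

	/* Report completed work so BQL can adjust the queue limit. */
	netdev_tx_completed_queue(txq, pkts, bytes);
}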


Re: [PATCH v4 03/25] virtio-pci: move freeze/restore to virtio core

2014-10-15 Thread Paul Bolle
On Mon, 2014-10-13 at 10:48 +0300, Michael S. Tsirkin wrote:
> This is in preparation to extending config changed event handling
> in core.
> Wrapping these in an API also seems to make for a cleaner code.
> 
> Signed-off-by: Michael S. Tsirkin 
> Reviewed-by: Cornelia Huck 

This patch landed in today's linux-next (next-20141015).

>  include/linux/virtio.h  |  6 +
>  drivers/virtio/virtio.c | 54 
> +
>  drivers/virtio/virtio_pci.c | 54 
> ++---
>  3 files changed, 62 insertions(+), 52 deletions(-)
> 
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index 3c19bd3..8df7ba8 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -78,6 +78,7 @@ bool virtqueue_is_broken(struct virtqueue *vq);
>  /**
>   * virtio_device - representation of a device using virtio
>   * @index: unique position on the virtio bus
> + * @failed: saved value for CONFIG_S_FAILED bit (for restore)

s/CONFIG_S_FAILED/VIRTIO_CONFIG_S_FAILED/?

>   * @dev: underlying device.
>   * @id: the device type identification (used to match it with a driver).
>   * @config: the configuration ops for this device.


Paul Bolle
