date:20120816

perf uncore lkvm woes

2012-08-16 Thread Pekka Enberg

Hello,
[0.248962] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc1+ #24
[penberg@tux ~]$ cat perf-kvmtool-issue
Hello,

Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
are doing uncore_init() on virtualized CPU which breaks boot.

Pekka

[penberg@tux kvm]$ ./vm run
  # lkvm run -k ../../arch/x86/boot/bzImage -m 448 -c 4 --name guest-30425
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
[0.00] Initializing cgroup subsys cpuset
[0.00] Initializing cgroup subsys cpu
[0.00] Linux version 3.6.0-rc1+ (penberg@tux) (gcc version
4.6.3 20120306 (Red Hat 4.6.3-2) (GCC) ) #24 SMP Thu Aug 16 09:55:41
EEST 2012
[0.00] Command line: noapic noacpi pci=conf1 reboot=k panic=1
i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 console=ttyS0
earlyprintk=serial i8042.noaux=1  root=/dev/root rw
rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p
init=/virt/init  ip=dhcp
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009fbff] usable
[0.00] BIOS-e820: [mem 0x0009fc00-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000f-0x000e] reserved
[0.00] BIOS-e820: [mem 0x0010-0x1bff] usable
[0.00] bootconsole [earlyser0] enabled
[0.00] NX (Execute Disable) protection: active
[0.00] DMI not present or invalid.
[0.00] No AGP bridge found
[0.00] e820: last_pfn = 0x1c000 max_arch_pfn = 0x4
[0.00] x86 PAT enabled: cpu 0, old 0x70106, new 0x7010600070106
[0.00] CPU MTRRs all blank - virtualized system.
[0.00] found SMP MP-table at [mem 0x000f0370-0x000f037f]
mapped at [880f0370]
[0.00] init_memory_mapping: [mem 0x-0x1bff]
[0.00] ACPI BIOS Bug: Error: A valid RSDP was not found
(20120711/tbxfroot-219)
[0.00] No NUMA configuration found
[0.00] Faking a node at [mem 0x-0x1bff]
[0.00] Initmem setup node 0 [mem 0x-0x1bff]
[0.00]   NODE_DATA [mem 0x1bffc000-0x1bff]
[0.00] Zone ranges:
[0.00]   DMA  [mem 0x0001-0x00ff]
[0.00]   DMA32[mem 0x0100-0x]
[0.00]   Normal   empty
[0.00] Movable zone start for each node
[0.00] Early memory node ranges
[0.00]   node   0: [mem 0x0001-0x0009efff]
[0.00]   node   0: [mem 0x0010-0x1bff]
[0.00] Intel MultiProcessor Specification v1.4
[0.00] MPTABLE: OEM ID: KVMCPU00
[0.00] MPTABLE: Product ID: 0.1
[0.00] MPTABLE: APIC at: 0xFEE0
[0.00] Processor #0 (Bootup-CPU)
[0.00] Processor #1
[0.00] Processor #2
[0.00] Processor #3
[0.00] IOAPIC[0]: apic_id 5, version 17, address 0xfec0, GSI 0-23
[0.00] Processors: 4
[0.00] smpboot: Allowing 4 CPUs, 0 hotplug CPUs
[0.00] PM: Registered nosave memory: 0009f000 - 000a
[0.00] PM: Registered nosave memory: 000a - 000f
[0.00] PM: Registered nosave memory: 000f - 000ff000
[0.00] PM: Registered nosave memory: 000ff000 - 0010
[0.00] e820: [mem 0x1c00-0x] available for PCI devices
[0.00] setup_percpu: NR_CPUS:64 nr_cpumask_bits:64
nr_cpu_ids:4 nr_node_ids:1
[0.00] PERCPU: Embedded 27 pages/cpu @88001bc0 s78272
r8192 d24128 u524288
[0.00] Built 1 zonelists in Node order, mobility grouping on.
Total pages: 112777
[0.00] Policy zone: DMA32
[0.00] Kernel command line: noapic noacpi pci=conf1 reboot=k
panic=1 i8042.direct=1 i8042.dumbkbd=1 i8042.nopnp=1 console=ttyS0
earlyprintk=serial i8042.noaux=1  root=/dev/root rw
rootflags=rw,trans=virtio,version=9p2000.L rootfstype=9p
init=/virt/init  ip=dhcp
[0.00] PID hash table entries: 2048 (order: 2, 16384 bytes)
[0.00] __ex_table already sorted, skipping sort
[0.00] xsave: enabled xstate_bv 0x7, cntxt size 0x340
[0.00] Checking aperture...
[0.00] No AGP bridge found
[0.00] Memory: 434972k/458752k available (7288k kernel code,
452k absent, 23328k reserved, 5691k data, 600k init)
[0.00] SLUB: Genslabs=15, HWalign=64, Order=0-3, MinObjects=0,
CPUs=4, Nodes=1
[0.00] Hierarchical RCU implementation.
[0.00]  RCU restricting CPUs from NR_CPUS=64 to nr_cpu_ids=4.
[0.00] NR_IRQS:4352 nr_irqs:712 16
[0.00] Console: colour *CGA 80x25
[0.00] console [ttyS0] enabled, bootconsole disabled
[0.00] console [ttyS0] enabled, bootconsole disabled
[0.00] tsc: Fast TSC calibration using PIT
[0.00] tsc: Detected 2691.428 MHz processor
[0.003002] Calibrating delay loop (skipped), value calculated
using timer

Re: perf uncore lkvm woes

2012-08-16 Thread Cyrill Gorcunov

On Thu, Aug 16, 2012 at 10:01:58AM +0300, Pekka Enberg wrote:
 Hello,
 [0.248962] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc1+ #24
 [penberg@tux ~]$ cat perf-kvmtool-issue
 Hello,
 
 Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
 are doing uncore_init() on virtualized CPU which breaks boot.

Hi, I guess some cpuid/msr bit is not cleared again ;) I'll take a look
once time permit.

Cyrill
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Cyrill Gorcunov

On Thu, Aug 16, 2012 at 11:07:43AM +0400, Cyrill Gorcunov wrote:
 On Thu, Aug 16, 2012 at 10:01:58AM +0300, Pekka Enberg wrote:
  Hello,
  [0.248962] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc1+ #24
  [penberg@tux ~]$ cat perf-kvmtool-issue
  Hello,
  
  Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
  are doing uncore_init() on virtualized CPU which breaks boot.
 
 Hi, I guess some cpuid/msr bit is not cleared again ;) I'll take a look
 once time permit.

If only I'm not missing something we've two options 1) either tune up
cpu model via cpuid interception in lkvm (which is bad I think)
2) provide some new kernel boot line option to not use unboxed pmu.

Cyrill
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Pekka Enberg

On Thu, Aug 16, 2012 at 10:01:58AM +0300, Pekka Enberg wrote:
  Hello,
  [0.248962] Pid: 0, comm: swapper/0 Not tainted 3.6.0-rc1+ #24
  [penberg@tux ~]$ cat perf-kvmtool-issue
  Hello,
  
  Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
  are doing uncore_init() on virtualized CPU which breaks boot.

On Thu, 16 Aug 2012, Cyrill Gorcunov wrote:
 Hi, I guess some cpuid/msr bit is not cleared again ;) I'll take a look
 once time permit.

Alternative fix would be to change our CPUID name to KVMKVMKVM or 
something to avoid going through these code paths.

Ingo?

Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Peter Zijlstra

On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
 Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
 are doing uncore_init() on virtualized CPU which breaks boot. 

I think you're the first.. I don't normally use kvm if I can at all
avoid it.

But I think its a 'simple' matter of kvm not emulating the entire
hardware. Afaik the uncore isn't enumerated and we simply assume MSR
presence based on cpu model.

Added Zheng Yan who wrote most of that stuff.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Yan, Zheng

On 08/16/2012 03:19 PM, Peter Zijlstra wrote:
 On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
 Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
 are doing uncore_init() on virtualized CPU which breaks boot. 
 
 I think you're the first.. I don't normally use kvm if I can at all
 avoid it.
 
 But I think its a 'simple' matter of kvm not emulating the entire
 hardware. Afaik the uncore isn't enumerated and we simply assume MSR
 presence based on cpu model.

The Intel uncore doc does not specify how to check if uncore exist.
How about disabling uncore on virtualized CPU?

Regards
Yan, Zheng
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Pekka Enberg

On 08/16/2012 03:19 PM, Peter Zijlstra wrote:
 On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
 Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
 are doing uncore_init() on virtualized CPU which breaks boot.

 I think you're the first.. I don't normally use kvm if I can at all
 avoid it.

 But I think its a 'simple' matter of kvm not emulating the entire
 hardware. Afaik the uncore isn't enumerated and we simply assume MSR
 presence based on cpu model.

On Thu, Aug 16, 2012 at 10:38 AM, Yan, Zheng
zheng.z@linux.intel.com wrote:
 The Intel uncore doc does not specify how to check if uncore exist.
 How about disabling uncore on virtualized CPU?

(CC'ing Avi.)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Cyrill Gorcunov

On Thu, Aug 16, 2012 at 10:41:53AM +0300, Pekka Enberg wrote:
 On 08/16/2012 03:19 PM, Peter Zijlstra wrote:
  On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
  Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
  are doing uncore_init() on virtualized CPU which breaks boot.
 
  I think you're the first.. I don't normally use kvm if I can at all
  avoid it.
 
  But I think its a 'simple' matter of kvm not emulating the entire
  hardware. Afaik the uncore isn't enumerated and we simply assume MSR
  presence based on cpu model.
 
 On Thu, Aug 16, 2012 at 10:38 AM, Yan, Zheng
 zheng.z@linux.intel.com wrote:
  The Intel uncore doc does not specify how to check if uncore exist.
  How about disabling uncore on virtualized CPU?
 
 (CC'ing Avi.)

Why not simply add bootline option for that? Would it be acceptible?

Cyrill
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Avi Kivity

On 08/16/2012 10:41 AM, Pekka Enberg wrote:
 On 08/16/2012 03:19 PM, Peter Zijlstra wrote:
 On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
 Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
 are doing uncore_init() on virtualized CPU which breaks boot.

 I think you're the first.. I don't normally use kvm if I can at all
 avoid it.

 But I think its a 'simple' matter of kvm not emulating the entire
 hardware. Afaik the uncore isn't enumerated and we simply assume MSR
 presence based on cpu model.
 
 On Thu, Aug 16, 2012 at 10:38 AM, Yan, Zheng
 zheng.z@linux.intel.com wrote:
 The Intel uncore doc does not specify how to check if uncore exist.
 How about disabling uncore on virtualized CPU?
 
 (CC'ing Avi.)
 

Seems reasonable, if unfortunate.  It's pretty easy to check for.

If those are separate MSRs, we can also trap the #GP and exit gracefully.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Avi Kivity

On 08/16/2012 10:46 AM, Cyrill Gorcunov wrote:
 On Thu, Aug 16, 2012 at 10:41:53AM +0300, Pekka Enberg wrote:
 On 08/16/2012 03:19 PM, Peter Zijlstra wrote:
  On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
  Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
  are doing uncore_init() on virtualized CPU which breaks boot.
 
  I think you're the first.. I don't normally use kvm if I can at all
  avoid it.
 
  But I think its a 'simple' matter of kvm not emulating the entire
  hardware. Afaik the uncore isn't enumerated and we simply assume MSR
  presence based on cpu model.
 
 On Thu, Aug 16, 2012 at 10:38 AM, Yan, Zheng
 zheng.z@linux.intel.com wrote:
  The Intel uncore doc does not specify how to check if uncore exist.
  How about disabling uncore on virtualized CPU?
 
 (CC'ing Avi.)
 
 Why not simply add bootline option for that? Would it be acceptible?
 

Most users just install a distro, they don't mess with kernel command lines.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 2/2] KVM: PPC: booke/bookehv: Add guest debug support

2012-08-16 Thread Bhushan Bharat-R65777

  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 3c14202..da71c84 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -25,6 +25,7 @@
   /* Select powerpc specific features in linux/kvm.h */  #define
  __KVM_HAVE_SPAPR_TCE  #define __KVM_HAVE_PPC_SMT
  +#define __KVM_HAVE_GUEST_DEBUG
 
   struct kvm_regs {
__u64 pc;
  @@ -265,10 +266,19 @@ struct kvm_fpu {  };
 
   struct kvm_debug_exit_arch {
  + __u32 exception;
  + __u32 pc;
  + __u32 status;
   };
 
  PC must be 64-bit.  What goes in status and exception?

status -  exit because of h/w breakpoint, watchpoint (read, write or both) and 
software breakpoint.
exception - returns the exception number. If the exit is not handled (say not 
h/w breakpoint or software breakpoint set for this address) by qemu then it is 
supposed to inject the exception to guest. This is how it is implemented for 
x86.

 
  ok
 
 
   /* for KVM_SET_GUEST_DEBUG */
   struct kvm_guest_debug_arch {
  + struct {
  + __u64 addr;
  + __u32 type;
  + __u32 pad1;
  + __u64 pad2;
  + } bp[16];
   };
 
  What goes in type?
 
  Type denote breakpoint, read watchpoint, write watchpoint or watchpoint 
  (both
 read and write). Will adding a comment to describe this is ok?
 
 Yes, please make sure all of this is well documented.
 
   /* definition of registers in kvm_run */ @@ -285,6 +295,17 @@
  struct kvm_sync_regs {
   #define KVM_CPU_3S_644
   #define KVM_CPU_E500MC   5
 
  +/* Debug related defines */
  +#define KVM_INST_GUESTGDB   0x7C00021C  /* ehpriv OC=0 */
 
  Will this work on all PPC?
 
  It certainly won't work on other architectures, so at a minimum it's
  KVM_PPC_INST_GUEST_GDB, but maybe it needs to be determined at runtime.
 
  How to determine at run time? adding another ioctl ?
 
 Or extend an existing one.  Is there any other information about debug
 capabilities that you expose -- number of hardware breakpoints supported, etc
 
  +#define KVM_GUESTDBG_USE_SW_BP  0x0001
  +#define KVM_GUESTDBG_USE_HW_BP  0x0002
 
  Where do these get used?  Any reason for these particular values?  If
  you're trying to create a partition where the upper half is generic
  and the lower half is arch-specific, say so.
 
  KVM_SET_GUEST_DEBUG ioctl used to set/unset debug interrupts, which
  have a u32 control element. We have inherited this mechanism from
  x86 implementation and it looks like lower 16 bits are generic (like
  KVM_GUESTDBG_ENBLE, KVM_GUESTDBG_SINGLESTEP etc and upper 16 bits are
  Architecture specific.
 
  I will add a comment to describe this.
 
 I don't think the sw/hw distinction belongs here -- it should be per 
 breakpoint.

KVM does not track the software breakpoint, so it is not per breakpoint.
In KVM, when KVM_GUESTDBG_USE_SW_BP flag is set and special trap instruction is 
executed by guest then exit to userspace.

 
  + run-exit_reason = KVM_EXIT_DEBUG;
  + run-debug.arch.pc = vcpu-arch.pc;
  + run-debug.arch.exception = exit_nr;
  + run-debug.arch.status = 0;
  + kvmppc_account_exit(vcpu, DEBUG_EXITS);
  + return RESUME_HOST;
 
  The interface isn't (clearly labelled as) booke specific, but you
  return booke- specific exception numbers.  How's userspace supposed
  to know what to do with them?  What do you plan on doing with them in QEMU?
 
  This is booke specific.
 
 Then put booke in the name,

Which data structure name should have booke?

 but what about it really needs to be booke specific?
 Why does QEMU care about the exception type?

Explained above.

Thanks
-Bharat

 
  +#ifndef CONFIG_PPC_FSL_BOOK3E
  + PPC_LD(r7, VCPU_HOST_DBG+KVMPPC_DBG_IAC3, r4)
  + PPC_LD(r8, VCPU_HOST_DBG+KVMPPC_DBG_IAC4, r4)
  + mtspr   SPRN_IAC3, r7
  + mtspr   SPRN_IAC4, r8
  +#endif
 
  Can you handle this at runtime with a feature section?
 
  Why you want this to make run time? Removing config_ ?
 
 Currently KVM hardcodes the target hardware in a way that is unacceptable in
 much of the rest of the kernel.  We have a long term goal to stop doing that,
 and we should avoid making it worse by adding random ifdefs for specific CPUs.
 
 -Scott

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Avi Kivity

On 08/15/2012 08:53 PM, Marcelo Tosatti wrote:
 On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
 On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
  
  Userspace may want to modify the ROM (for example, when programming a
  flash device).  It is also possible to map an hva range rw through one
  slot and ro through another.
  
  Right, can do that with multiple userspace maps to the same anonymous 
  memory region (see other email).
 
 Yes it's possible.  It requires that we move all memory allocation to be
 fd based, since userspace can't predict what memory will be dual-mapped
 (at least if emulated hardware allows this).
 
 It can:
 - Create named memory object, with associated fd.
 - Copy data from large anonymous memory region to named memory.

That doesn't work if dma is in progress (assigned device).  It also
doubles the amount of memory in use.

 - Unmap region that must be dual-mapped from large anonymous memory chunk.
 - Map named memory object at address.
 
 The last step can be replaced by adjusting KVM memory slots.
 
 The disadvantage of protection information in memory slots
 is that it duplicates functionality that is handled by 
 userspace mappings.

Agree.  So does the memory slots mechanism, and even dirty logging.

 
 Moreover, multiple memory maps are necessary for any
 split-qemu-into-smaller-pieces solutions.

Complex users can use complex mechanism, but let's keep the simple stuff
simple.

 
  Is this a reasonable
 requirement?  Do ksm/thp/autonuma work with this?
 
 As mentioned, only memory used for ROM purposes must be dual mapped. 
 
 I don't think there is any way to create multiple mappings 
 to one anonymous memory object ATM, but POSIX defines it
 (posix_typed_mem_open).
 
 The limitation of thp/ksm on shared memory also affects any other user
 of shared memory, so it should be fixed there.
 
 Also, QEMU ROM is allocated separately from RAM, correct?
 

Correct.  But the chipset is also able to to write-protect some ranges
in the 0xc-0x10 area via the PAM.  It is able to write-protect
both RAM and PCI memory (usually mapped to flash).



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Cyrill Gorcunov

On Thu, Aug 16, 2012 at 11:45:52AM +0300, Avi Kivity wrote:
  On Thu, Aug 16, 2012 at 10:38 AM, Yan, Zheng
  zheng.z@linux.intel.com wrote:
   The Intel uncore doc does not specify how to check if uncore exist.
   How about disabling uncore on virtualized CPU?
  
  (CC'ing Avi.)
  
  Why not simply add bootline option for that? Would it be acceptible?
 
 Most users just install a distro, they don't mess with kernel command lines.

The command line option might be added implicitly in qemu/lkvm.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

2012-08-16 Thread Avi Kivity

On 08/16/2012 08:14 AM, Xudong Hao wrote:
 Enable KVM FPU fully eager restore, if there is other FPU state which isn't 
 tracked by
 CR0.TS bit.
 
 Tested with these cases:
 1) SpecCPU2000 workload( 1 VM, 2 VMs)
 2) Program for floating point caculate

Is the motivation performance or correctness?

 +
  struct kvm_memory_alias {
   __u32 slot;  /* this has a different namespace than memory slots */
   __u32 flags;
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index b6379e5..2e628e5 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -5966,7 +5966,18 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
   vcpu-guest_fpu_loaded = 0;
   fpu_save_init(vcpu-arch.guest_fpu);
   ++vcpu-stat.fpu_reload;
 - kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
 + /*
 +  * Currently KVM trigger FPU restore by #NM (via CR0.TS),
 +  * till now only XCR0.bit0, XCR0.bit1, XCR0.bit2 is tracked
 +  * by TS bit, there might be other FPU state is not tracked
 +  * by TS bit. 

Which state is that?

 Here it only make FPU deactivate request and do 
 +  * FPU lazy restore for these cases: 1)xsave isn't enabled 
 +  * in guest, 2)all guest FPU states can be tracked by TS bit.
 +  * For others, doing fully FPU eager restore.
 +  */
 + if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) ||
 + !(vcpu-arch.xcr0  ~KVM_XSTATE_LAZY))
 + kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
   trace_kvm_fpu(0);
  }

Is there no way to track accesses to this extended state?

Although I expect that on modern hardware which exits rarely, eager fpu
reload might be more performant.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] KVM: PPC: 440: Implement mtdcrx

2012-08-16 Thread Avi Kivity

On 08/16/2012 01:42 AM, Alexander Graf wrote:
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
  arch/powerpc/kvm/44x_emulate.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c
 index c8c6157..ea57631 100644
 --- a/arch/powerpc/kvm/44x_emulate.c
 +++ b/arch/powerpc/kvm/44x_emulate.c
 @@ -28,6 +28,7 @@
  #include 44x_tlb.h
  
  #define XOP_MFDCR   323
 +#define XOP_MTDCRX  387
  #define XOP_MTDCR   451
  #define XOP_TLBSX   914
  #define XOP_ICCCI   966
 @@ -84,6 +85,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
  
   break;
  
 + case XOP_MTDCRX:
 + dcrn = kvmppc_get_gpr(vcpu, ra);
   case XOP_MTDCR:

It's customary to put a /* fallthrough */ comment to shut down any
alarms that may be firing off in readers' minds.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Pekka Enberg

On Thu, Aug 16, 2012 at 12:06 PM, Cyrill Gorcunov gorcu...@openvz.org wrote:
 Most users just install a distro, they don't mess with kernel command lines.

 The command line option might be added implicitly in qemu/lkvm.

That does not make sense for QEMU and we want less mandatory command line
options for LKVM too.

Pekka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Bug 42980] BUG in gfn_to_pfn_prot

2012-08-16 Thread bugzilla-daemon

https://bugzilla.kernel.org/show_bug.cgi?id=42980


Alan a...@lxorguk.ukuu.org.uk changed:

   What|Removed |Added

 Kernel Version|3.2.2-gentoo|3.4




--- Comment #16 from Alan a...@lxorguk.ukuu.org.uk  2012-08-16 09:32:17 ---
Thanks

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

2012-08-16 Thread Hao, Xudong

 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Avi Kivity
 Sent: Thursday, August 16, 2012 5:08 PM
 To: Hao, Xudong
 Cc: kvm@vger.kernel.org; Zhang, Xiantao
 Subject: Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU
 
 On 08/16/2012 08:14 AM, Xudong Hao wrote:
  Enable KVM FPU fully eager restore, if there is other FPU state which isn't
 tracked by
  CR0.TS bit.
 
  Tested with these cases:
  1) SpecCPU2000 workload( 1 VM, 2 VMs)
  2) Program for floating point caculate
 
 Is the motivation performance or correctness?
 

It's not performance improvement, it could be treated as a correctness. I do 
not say current code has issue, but just as code comment, it's for the other 
FPU state.

  +
   struct kvm_memory_alias {
  __u32 slot;  /* this has a different namespace than memory slots */
  __u32 flags;
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index b6379e5..2e628e5 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -5966,7 +5966,18 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
  vcpu-guest_fpu_loaded = 0;
  fpu_save_init(vcpu-arch.guest_fpu);
  ++vcpu-stat.fpu_reload;
  -   kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
  +   /*
  +* Currently KVM trigger FPU restore by #NM (via CR0.TS),
  +* till now only XCR0.bit0, XCR0.bit1, XCR0.bit2 is tracked
  +* by TS bit, there might be other FPU state is not tracked
  +* by TS bit.
 
 Which state is that?
 

Except the last 3 bits, other bit are these state.

  Here it only make FPU deactivate request and do
  +* FPU lazy restore for these cases: 1)xsave isn't enabled
  +* in guest, 2)all guest FPU states can be tracked by TS bit.
  +* For others, doing fully FPU eager restore.
  +*/
  +   if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) ||
  +   !(vcpu-arch.xcr0  ~KVM_XSTATE_LAZY))
  +   kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
  trace_kvm_fpu(0);
   }
 
 Is there no way to track accesses to this extended state?
 

Because I can't define the extended state now, so using this method. But just 
as I say, the extended state are NO-LAZY except the last 3 bit.

 Although I expect that on modern hardware which exits rarely, eager fpu
 reload might be more performant.
 
 
 --
 error compiling committee.c: too many arguments to function
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] Document IACx/DACx registers access using ONE_REG API

2012-08-16 Thread Alexander Graf


On 16.08.2012, at 05:37, Bharat Bhushan wrote:

 Patch to access the debug registers (IACx/DACx) using ONE_REG api
 was sent earlier. But that missed the respective documentation.
 
 Also corrected the index number referencing in section 4.69
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com

Thanks, applied to kvm-ppc-next.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] KVM: PPC: 440: Implement mtdcrx

2012-08-16 Thread Alexander Graf


On 16.08.2012, at 11:11, Avi Kivity wrote:

 On 08/16/2012 01:42 AM, Alexander Graf wrote:
 Signed-off-by: Alexander Graf ag...@suse.de
 ---
 arch/powerpc/kvm/44x_emulate.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c
 index c8c6157..ea57631 100644
 --- a/arch/powerpc/kvm/44x_emulate.c
 +++ b/arch/powerpc/kvm/44x_emulate.c
 @@ -28,6 +28,7 @@
 #include 44x_tlb.h
 
 #define XOP_MFDCR   323
 +#define XOP_MTDCRX  387
 #define XOP_MTDCR   451
 #define XOP_TLBSX   914
 #define XOP_ICCCI   966
 @@ -84,6 +85,8 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
 kvm_vcpu *vcpu,
 
  break;
 
 +case XOP_MTDCRX:
 +dcrn = kvmppc_get_gpr(vcpu, ra);
  case XOP_MTDCR:
 
 It's customary to put a /* fallthrough */ comment to shut down any
 alarms that may be firing off in readers' minds.

Yeah, I moved this over into function calls now. Makes the code easier to read 
:). And hopefully the compiler is smart enough to optimize it the same way.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Windows slow boot: contractor wanted

2012-08-16 Thread Richard Davies

Hi,

We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
contractor to track down and fix problems we have with large memory Windows
guests booting very slowly - they can take several hours.

We previously reported these problems in July (copied below) and they are
still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.

This is a serious issue for us which is causing significant pain to our
larger Windows VM customers when their servers are offline for many hours
during boot.

If anyone knowledgeable in the area would be interested in being paid to
work on this, or if you know someone who might be, I would be delighted to
hear from you.

Cheers,

Richard.


= Previous bug report

http://marc.info/?l=qemu-develm=134304194329745


We have been experiencing this problem for a while now too, using qemu-kvm
(currently at 1.1.1).

Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
produces the issue:

qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice 
tablet -vnc :99 -monitor stdio -hda test.img

The hardware consists of dual AMD Opteron 6128 processors (16 cores in
total) and 64GB of memory. This command line was tested on kernel 3.1.4. 

I've also tested with -no-hpet.

What I have seen is much as described: the memory fills out slowly, and top
on the host will show the process using 100% on all allocated CPU cores. The
most extreme case was a machine which took something between 6 and 8 hours
to boot.

This seems to be related to the assigned memory, as described, but also the
number of processor cores (which makes sense if we believe it's a timing
issue?). I have seen slow-booting guests improved by switching down to a
single or even two cores.

Matthew, I agree that this seems to be linked to the number of VMs running -
in fact, shutting down other VMs on a dedicated test host caused the machine
to start booting at a normal speed (with no reboot required).

However, the level of contention is never such that this could be explained
by the host simply being overcommitted.

If it helps anyone, there's an image of the hard drive I've been using to
test at:

http://46.20.114.253/

It's 5G of gzip file containing a fairly standard Windows 2008 trial
installation. Since it's in the trial period, anyone who wants to use it may
have to re-arm the trial: http://support.microsoft.com/kb/948472

Please let me know if I can provide any more information, or test anything.

Best wishes,

Owen Tuz
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Avi Kivity

On 08/16/2012 11:40 AM, Avi Kivity wrote:
 On 08/16/2012 10:41 AM, Pekka Enberg wrote:
 On 08/16/2012 03:19 PM, Peter Zijlstra wrote:
 On Thu, 2012-08-16 at 10:01 +0300, Pekka Enberg wrote:
 Has anyone seen this? It's kvmtool/next with 3.6.0-rc1. Looks like we
 are doing uncore_init() on virtualized CPU which breaks boot.

 I think you're the first.. I don't normally use kvm if I can at all
 avoid it.

 But I think its a 'simple' matter of kvm not emulating the entire
 hardware. Afaik the uncore isn't enumerated and we simply assume MSR
 presence based on cpu model.
 
 On Thu, Aug 16, 2012 at 10:38 AM, Yan, Zheng
 zheng.z@linux.intel.com wrote:
 The Intel uncore doc does not specify how to check if uncore exist.
 How about disabling uncore on virtualized CPU?
 
 (CC'ing Avi.)
 
 
 Seems reasonable, if unfortunate.  It's pretty easy to check for.
 
 If those are separate MSRs, we can also trap the #GP and exit gracefully.

Another option is to deal with them on the host side.  That has the
benefit of working with non-Linux guests too.

We can just ignore the MSR and print some warning.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/3] KVM: PPC: More 440 fixes

2012-08-16 Thread Alexander Graf

With these patches applied, I can successfully run a guest on my 460EX
board. So this is the official revival of the 440 KVM target ;-).

As a sidenote: These patches are required because I'm running on a
newer chip than the original development was based on. If you are on an
older 440 CPU that doesn't have FP or mtdcrx, these patches are not
necessary.

Alex

Alexander Graf (3):
  KVM: PPC: 440: Implement mtdcrx
  KVM: PPC: 440: Implement mfdcrx
  KVM: PPC: BookE: Support FPU on non-hv systems

 arch/powerpc/kvm/44x_emulate.c |  110 
 arch/powerpc/kvm/booke.c   |   11 
 2 files changed, 77 insertions(+), 44 deletions(-)

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] KVM: PPC: 440: Implement mfdcrx

2012-08-16 Thread Alexander Graf

We need mfdcrx to execute properly on 460 cores.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 - v2:

  - rework mfdcr into a function
---
 arch/powerpc/kvm/44x_emulate.c |   74 +++-
 1 files changed, 43 insertions(+), 31 deletions(-)

diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c
index 3843a75..1a793c4 100644
--- a/arch/powerpc/kvm/44x_emulate.c
+++ b/arch/powerpc/kvm/44x_emulate.c
@@ -27,6 +27,7 @@
 #include booke.h
 #include 44x_tlb.h
 
+#define XOP_MFDCRX  259
 #define XOP_MFDCR   323
 #define XOP_MTDCRX  387
 #define XOP_MTDCR   451
@@ -51,6 +52,43 @@ static int emulate_mtdcr(struct kvm_vcpu *vcpu, int rs, int 
dcrn)
}
 }
 
+static int emulate_mfdcr(struct kvm_vcpu *vcpu, int rt, int dcrn)
+{
+   /* The guest may access CPR0 registers to determine the timebase
+* frequency, and it must know the real host frequency because it
+* can directly access the timebase registers.
+*
+* It would be possible to emulate those accesses in userspace,
+* but userspace can really only figure out the end frequency.
+* We could decompose that into the factors that compute it, but
+* that's tricky math, and it's easier to just report the real
+* CPR0 values.
+*/
+   switch (dcrn) {
+   case DCRN_CPR0_CONFIG_ADDR:
+   kvmppc_set_gpr(vcpu, rt, vcpu-arch.cpr0_cfgaddr);
+   break;
+   case DCRN_CPR0_CONFIG_DATA:
+   local_irq_disable();
+   mtdcr(DCRN_CPR0_CONFIG_ADDR,
+ vcpu-arch.cpr0_cfgaddr);
+   kvmppc_set_gpr(vcpu, rt,
+  mfdcr(DCRN_CPR0_CONFIG_DATA));
+   local_irq_enable();
+   break;
+   default:
+   vcpu-run-dcr.dcrn = dcrn;
+   vcpu-run-dcr.data =  0;
+   vcpu-run-dcr.is_write = 0;
+   vcpu-arch.io_gpr = rt;
+   vcpu-arch.dcr_needed = 1;
+   kvmppc_account_exit(vcpu, DCR_EXITS);
+   return EMULATE_DO_DCR;
+   }
+
+   return EMULATE_DONE;
+}
+
 int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance)
 {
@@ -68,38 +106,12 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
switch (get_xop(inst)) {
 
case XOP_MFDCR:
-   /* The guest may access CPR0 registers to determine the 
timebase
-* frequency, and it must know the real host frequency 
because it
-* can directly access the timebase registers.
-*
-* It would be possible to emulate those accesses in 
userspace,
-* but userspace can really only figure out the end 
frequency.
-* We could decompose that into the factors that 
compute it, but
-* that's tricky math, and it's easier to just report 
the real
-* CPR0 values.
-*/
-   switch (dcrn) {
-   case DCRN_CPR0_CONFIG_ADDR:
-   kvmppc_set_gpr(vcpu, rt, 
vcpu-arch.cpr0_cfgaddr);
-   break;
-   case DCRN_CPR0_CONFIG_DATA:
-   local_irq_disable();
-   mtdcr(DCRN_CPR0_CONFIG_ADDR,
- vcpu-arch.cpr0_cfgaddr);
-   kvmppc_set_gpr(vcpu, rt,
-  mfdcr(DCRN_CPR0_CONFIG_DATA));
-   local_irq_enable();
-   break;
-   default:
-   run-dcr.dcrn = dcrn;
-   run-dcr.data =  0;
-   run-dcr.is_write = 0;
-   vcpu-arch.io_gpr = rt;
-   vcpu-arch.dcr_needed = 1;
-   kvmppc_account_exit(vcpu, DCR_EXITS);
-   emulated = EMULATE_DO_DCR;
-   }
+   emulated = emulate_mfdcr(vcpu, rt, dcrn);
+   break;
 
+   case XOP_MFDCRX:
+   emulated = emulate_mfdcr(vcpu, rt,
+   kvmppc_get_gpr(vcpu, ra));
break;
 
case XOP_MTDCR:
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] KVM: PPC: BookE: Support FPU on non-hv systems

2012-08-16 Thread Alexander Graf

When running on HV aware hosts, we can not trap when the guest sets the FP
bit, so we just let it do so when it wants to, because it has full access to
MSR.

For non-HV aware hosts with an FPU (like 440), we need to also adjust the
shadow MSR though. Otherwise the guest gets an FP unavailable trap even when
it really enabled the FP bit in MSR.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/booke.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 959aae9..5f0476a 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -122,6 +122,16 @@ static void kvmppc_vcpu_sync_spe(struct kvm_vcpu *vcpu)
 }
 #endif
 
+static void kvmppc_vcpu_sync_fpu(struct kvm_vcpu *vcpu)
+{
+#if defined(CONFIG_PPC_FPU)  !defined(CONFIG_KVM_BOOKE_HV)
+   /* We always treat the FP bit as enabled from the host
+  perspective, so only need to adjust the shadow MSR */
+   vcpu-arch.shadow_msr = ~MSR_FP;
+   vcpu-arch.shadow_msr |= vcpu-arch.shared-msr  MSR_FP;
+#endif
+}
+
 /*
  * Helper function for full MSR writes.  No need to call this if only
  * EE/CE/ME/DE/RI are changing.
@@ -138,6 +148,7 @@ void kvmppc_set_msr(struct kvm_vcpu *vcpu, u32 new_msr)
 
kvmppc_mmu_msr_notify(vcpu, old_msr);
kvmppc_vcpu_sync_spe(vcpu);
+   kvmppc_vcpu_sync_fpu(vcpu);
 }
 
 static void kvmppc_booke_queue_irqprio(struct kvm_vcpu *vcpu,
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] KVM: PPC: 440: Implement mtdcrx

2012-08-16 Thread Alexander Graf

We need mtdcrx to execute properly on 460 cores.

Signed-off-by: Alexander Graf ag...@suse.de

---

v1 - v2:

  - rework mtdcr into a function
---
 arch/powerpc/kvm/44x_emulate.c |   36 +++-
 1 files changed, 23 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/kvm/44x_emulate.c b/arch/powerpc/kvm/44x_emulate.c
index c8c6157..3843a75 100644
--- a/arch/powerpc/kvm/44x_emulate.c
+++ b/arch/powerpc/kvm/44x_emulate.c
@@ -28,11 +28,29 @@
 #include 44x_tlb.h
 
 #define XOP_MFDCR   323
+#define XOP_MTDCRX  387
 #define XOP_MTDCR   451
 #define XOP_TLBSX   914
 #define XOP_ICCCI   966
 #define XOP_TLBWE   978
 
+static int emulate_mtdcr(struct kvm_vcpu *vcpu, int rs, int dcrn)
+{
+   /* emulate some access in kernel */
+   switch (dcrn) {
+   case DCRN_CPR0_CONFIG_ADDR:
+   vcpu-arch.cpr0_cfgaddr = kvmppc_get_gpr(vcpu, rs);
+   return EMULATE_DONE;
+   default:
+   vcpu-run-dcr.dcrn = dcrn;
+   vcpu-run-dcr.data = kvmppc_get_gpr(vcpu, rs);
+   vcpu-run-dcr.is_write = 1;
+   vcpu-arch.dcr_needed = 1;
+   kvmppc_account_exit(vcpu, DCR_EXITS);
+   return EMULATE_DO_DCR;
+   }
+}
+
 int kvmppc_core_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu,
unsigned int inst, int *advance)
 {
@@ -85,20 +103,12 @@ int kvmppc_core_emulate_op(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
break;
 
case XOP_MTDCR:
-   /* emulate some access in kernel */
-   switch (dcrn) {
-   case DCRN_CPR0_CONFIG_ADDR:
-   vcpu-arch.cpr0_cfgaddr = kvmppc_get_gpr(vcpu, 
rs);
-   break;
-   default:
-   run-dcr.dcrn = dcrn;
-   run-dcr.data = kvmppc_get_gpr(vcpu, rs);
-   run-dcr.is_write = 1;
-   vcpu-arch.dcr_needed = 1;
-   kvmppc_account_exit(vcpu, DCR_EXITS);
-   emulated = EMULATE_DO_DCR;
-   }
+   emulated = emulate_mtdcr(vcpu, rs, dcrn);
+   break;
 
+   case XOP_MTDCRX:
+   emulated = emulate_mtdcr(vcpu, rs,
+   kvmppc_get_gpr(vcpu, ra));
break;
 
case XOP_TLBWE:
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Peter Zijlstra

On Thu, 2012-08-16 at 14:06 +0300, Avi Kivity wrote:

 Another option is to deal with them on the host side.  That has the
 benefit of working with non-Linux guests too.

Right, its an insane amount of MSRs though, but it could be done if
someone takes the time to enumerate them all.

If KVM then simply ignores all writes and returns all 0 on read we can
do the same we do for the regular PMU in check_hw_exists().

 We can just ignore the MSR and print some warning.

If you don't mind printing a warning every time a Linux guest boots ;-)


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

2012-08-16 Thread Avi Kivity

On 08/16/2012 12:48 PM, Hao, Xudong wrote:
 -Original Message-
 From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
 Behalf Of Avi Kivity
 Sent: Thursday, August 16, 2012 5:08 PM
 To: Hao, Xudong
 Cc: kvm@vger.kernel.org; Zhang, Xiantao
 Subject: Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

 On 08/16/2012 08:14 AM, Xudong Hao wrote:
  Enable KVM FPU fully eager restore, if there is other FPU state which isn't
 tracked by
  CR0.TS bit.

  Tested with these cases:
  1) SpecCPU2000 workload( 1 VM, 2 VMs)
  2) Program for floating point caculate

 Is the motivation performance or correctness?

 It's not performance improvement, it could be treated as a correctness. I do 
 not say current code has issue, but just as code comment, it's for the other 
 FPU state.

  +
   struct kvm_memory_alias {
 __u32 slot;  /* this has a different namespace than memory slots */
 __u32 flags;
  diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
  index b6379e5..2e628e5 100644
  --- a/arch/x86/kvm/x86.c
  +++ b/arch/x86/kvm/x86.c
  @@ -5966,7 +5966,18 @@ void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
 vcpu-guest_fpu_loaded = 0;
 fpu_save_init(vcpu-arch.guest_fpu);
 ++vcpu-stat.fpu_reload;
  -  kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
  +  /*
  +   * Currently KVM trigger FPU restore by #NM (via CR0.TS),
  +   * till now only XCR0.bit0, XCR0.bit1, XCR0.bit2 is tracked
  +   * by TS bit, there might be other FPU state is not tracked
  +   * by TS bit.

 Which state is that?

 Except the last 3 bits, other bit are these state.

  Here it only make FPU deactivate request and do
  +   * FPU lazy restore for these cases: 1)xsave isn't enabled
  +   * in guest, 2)all guest FPU states can be tracked by TS bit.
  +   * For others, doing fully FPU eager restore.
  +   */
  +  if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) ||
  +  !(vcpu-arch.xcr0  ~KVM_XSTATE_LAZY))
  +  kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
 trace_kvm_fpu(0);
   }

 Is there no way to track accesses to this extended state?

 Because I can't define the extended state now, so using this method. But just 
 as I say, the extended state are NO-LAZY except the last 3 bit.

Ok.  Please check that ~KVM_XSTATE_LAZY expands to 64-bits correctly,
maybe we need to cast it to u64 before negating it.

Note that we limit xcr0 to the bits allowed by the host, so the currect
kernel is safe even on hardware with state that isn't tracked by cr0.ts.
 But it's better to be safe here.

Joerg, IIRC LWP uses one of these bits?  Should it be added to the mask?

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread Avi Kivity

On 08/16/2012 02:17 PM, Peter Zijlstra wrote:
 On Thu, 2012-08-16 at 14:06 +0300, Avi Kivity wrote:
 
 Another option is to deal with them on the host side.  That has the
 benefit of working with non-Linux guests too.
 
 Right, its an insane amount of MSRs though, but it could be done if
 someone takes the time to enumerate them all.

It's tedious but that's life.

 
 If KVM then simply ignores all writes and returns all 0 on read we can
 do the same we do for the regular PMU in check_hw_exists().
 
 We can just ignore the MSR and print some warning.
 
 If you don't mind printing a warning every time a Linux guest boots ;-)

We can printk_once() it, or only under debug, now that we have dynamic
debug.  We also need to print the warning only if the counter is
enabled.  Will Linux configure uncore counters by default (why and which?)

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Windows slow boot: contractor wanted

2012-08-16 Thread Avi Kivity

On 08/16/2012 01:47 PM, Richard Davies wrote:
 Hi,
 
 We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
 contractor to track down and fix problems we have with large memory Windows
 guests booting very slowly - they can take several hours.
 
 We previously reported these problems in July (copied below) and they are
 still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
 
 This is a serious issue for us which is causing significant pain to our
 larger Windows VM customers when their servers are offline for many hours
 during boot.
 
 If anyone knowledgeable in the area would be interested in being paid to
 work on this, or if you know someone who might be, I would be delighted to
 hear from you.
 

I happen to be gainfully employed but maybe I can help.  Can you collect
a trace during the slow boot period and post in somewhere?  See
http://www.linux-kvm.org/page/Tracing for instructions.

4G/8way is not a particularly large guest.  What is the host
configuration (memory, core count)?

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

2012-08-16 Thread Hao, Xudong

 -Original Message-
 From: Avi Kivity [mailto:a...@redhat.com]
 Sent: Thursday, August 16, 2012 6:59 PM
 To: Hao, Xudong
 Cc: kvm@vger.kernel.org; Zhang, Xiantao; Roedel, Joerg
 Subject: Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

 On 08/16/2012 12:48 PM, Hao, Xudong wrote:
  -Original Message-
  From: kvm-ow...@vger.kernel.org [mailto:kvm-ow...@vger.kernel.org] On
  Behalf Of Avi Kivity
  Sent: Thursday, August 16, 2012 5:08 PM
  To: Hao, Xudong
  Cc: kvm@vger.kernel.org; Zhang, Xiantao
  Subject: Re: [PATCH] kvm/fpu: Enable fully eager restore kvm FPU

  On 08/16/2012 08:14 AM, Xudong Hao wrote:
   Enable KVM FPU fully eager restore, if there is other FPU state which 
   isn't
  tracked by
   CR0.TS bit.

   Tested with these cases:
   1) SpecCPU2000 workload( 1 VM, 2 VMs)
   2) Program for floating point caculate

  Is the motivation performance or correctness?

  It's not performance improvement, it could be treated as a correctness. I do
 not say current code has issue, but just as code comment, it's for the other 
 FPU
 state.

   +
struct kvm_memory_alias {
__u32 slot;  /* this has a different namespace than memory 
   slots */
__u32 flags;
   diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
   index b6379e5..2e628e5 100644
   --- a/arch/x86/kvm/x86.c
   +++ b/arch/x86/kvm/x86.c
   @@ -5966,7 +5966,18 @@ void kvm_put_guest_fpu(struct kvm_vcpu
 *vcpu)
vcpu-guest_fpu_loaded = 0;
fpu_save_init(vcpu-arch.guest_fpu);
++vcpu-stat.fpu_reload;
   -kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
   +/*
   + * Currently KVM trigger FPU restore by #NM (via CR0.TS),
   + * till now only XCR0.bit0, XCR0.bit1, XCR0.bit2 is tracked
   + * by TS bit, there might be other FPU state is not tracked
   + * by TS bit.

  Which state is that?

  Except the last 3 bits, other bit are these state.

   Here it only make FPU deactivate request and do
   + * FPU lazy restore for these cases: 1)xsave isn't enabled
   + * in guest, 2)all guest FPU states can be tracked by TS bit.
   + * For others, doing fully FPU eager restore.
   + */
   +if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) ||
   +!(vcpu-arch.xcr0  ~KVM_XSTATE_LAZY))
   +kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);
trace_kvm_fpu(0);
}

  Is there no way to track accesses to this extended state?

  Because I can't define the extended state now, so using this method. But 
  just
 as I say, the extended state are NO-LAZY except the last 3 bit.

 Ok.  Please check that ~KVM_XSTATE_LAZY expands to 64-bits correctly,
 maybe we need to cast it to u64 before negating it.

Thanks.

+   if (!kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE) ||
+   !(vcpu-arch.xcr0  ~((u64)KVM_XSTATE_LAZY)))
+   kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu);

 Note that we limit xcr0 to the bits allowed by the host, so the currect
 kernel is safe even on hardware with state that isn't tracked by cr0.ts.
  But it's better to be safe here.

 Joerg, IIRC LWP uses one of these bits?  Should it be added to the mask?

Bit 62? Maybe LWP should change to eager too, I'm not sure. Joerg?

 --
 error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code

2012-08-16 Thread Alexander Graf

From: Paul Mackerras pau...@samba.org

In handling the H_CEDE hypercall, if this vcpu has already been
prodded (with the H_PROD hypercall, which Linux guests don't in fact
use), we branch to a numeric label '1f'.  Unfortunately there is
another '1:' label before the one that we want to jump to.  This fixes
the problem by using a textual label, 'kvm_cede_prodded'.  It also
changes the label for another longish branch from '2:' to
'kvm_cede_exit' to avoid a possible future problem if code modifications
add another numeric '2:' label in between.

Signed-off-by: Paul Mackerras pau...@samba.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S 
b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5a84c8d..44b72fe 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -1421,13 +1421,13 @@ _GLOBAL(kvmppc_h_cede)
sync/* order setting ceded vs. testing prodded */
lbz r5,VCPU_PRODDED(r3)
cmpwi   r5,0
-   bne 1f
+   bne kvm_cede_prodded
li  r0,0/* set trap to 0 to say hcall is handled */
stw r0,VCPU_TRAP(r3)
li  r0,H_SUCCESS
std r0,VCPU_GPR(R3)(r3)
 BEGIN_FTR_SECTION
-   b   2f  /* just send it up to host on 970 */
+   b   kvm_cede_exit   /* just send it up to host on 970 */
 END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
 
/*
@@ -1446,7 +1446,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_206)
or  r4,r4,r0
PPC_POPCNTW(R7,R4)
cmpwr7,r8
-   bge 2f
+   bge kvm_cede_exit
stwcx.  r4,0,r6
bne 31b
li  r0,1
@@ -1555,7 +1555,8 @@ kvm_end_cede:
b   hcall_real_fallback
 
/* cede when already previously prodded case */
-1: li  r0,0
+kvm_cede_prodded:
+   li  r0,0
stb r0,VCPU_PRODDED(r3)
sync/* order testing prodded vs. clearing ceded */
stb r0,VCPU_CEDED(r3)
@@ -1563,7 +1564,8 @@ kvm_end_cede:
blr
 
/* we've ceded but we want to give control to the host */
-2: li  r3,H_TOO_HARD
+kvm_cede_exit:
+   li  r3,H_TOO_HARD
blr
 
 secondary_too_late:
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] KVM: PPC: Add cache flush on page map

2012-08-16 Thread Alexander Graf

When we map a page that wasn't icache cleared before, do so when first
mapping it in KVM using the same information bits as the Linux mapping
logic. That way we are 100% sure that any page we map does not have stale
entries in the icache.

Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/include/asm/kvm_host.h   |1 +
 arch/powerpc/include/asm/kvm_ppc.h|   12 
 arch/powerpc/kvm/book3s_32_mmu_host.c |3 +++
 arch/powerpc/kvm/book3s_64_mmu_host.c |2 ++
 arch/powerpc/kvm/e500_tlb.c   |3 +++
 arch/powerpc/mm/mem.c |1 +
 6 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index 50ea12f..a8bf5c6 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -33,6 +33,7 @@
 #include asm/kvm_asm.h
 #include asm/processor.h
 #include asm/page.h
+#include asm/cacheflush.h
 
 #define KVM_MAX_VCPUS  NR_CPUS
 #define KVM_MAX_VCORES NR_CPUS
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 0124937..e006f0b 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -219,4 +219,16 @@ void kvmppc_claim_lpid(long lpid);
 void kvmppc_free_lpid(long lpid);
 void kvmppc_init_lpid(unsigned long nr_lpids);
 
+static inline void kvmppc_mmu_flush_icache(pfn_t pfn)
+{
+   /* Clear i-cache for new pages */
+   struct page *page;
+   page = pfn_to_page(pfn);
+   if (!test_bit(PG_arch_1, page-flags)) {
+   flush_dcache_icache_page(page);
+   set_bit(PG_arch_1, page-flags);
+   }
+}
+
+
 #endif /* __POWERPC_KVM_PPC_H__ */
diff --git a/arch/powerpc/kvm/book3s_32_mmu_host.c 
b/arch/powerpc/kvm/book3s_32_mmu_host.c
index f922c29..837f13e 100644
--- a/arch/powerpc/kvm/book3s_32_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_32_mmu_host.c
@@ -211,6 +211,9 @@ next_pteg:
pteg1 |= PP_RWRX;
}
 
+   if (orig_pte-may_execute)
+   kvmppc_mmu_flush_icache(hpaddr  PAGE_SHIFT);
+
local_irq_disable();
 
if (pteg[rr]) {
diff --git a/arch/powerpc/kvm/book3s_64_mmu_host.c 
b/arch/powerpc/kvm/book3s_64_mmu_host.c
index 10fc8ec..0688b6b 100644
--- a/arch/powerpc/kvm/book3s_64_mmu_host.c
+++ b/arch/powerpc/kvm/book3s_64_mmu_host.c
@@ -126,6 +126,8 @@ int kvmppc_mmu_map_page(struct kvm_vcpu *vcpu, struct 
kvmppc_pte *orig_pte)
 
if (!orig_pte-may_execute)
rflags |= HPTE_R_N;
+   else
+   kvmppc_mmu_flush_icache(hpaddr  PAGE_SHIFT);
 
hash = hpt_hash(va, PTE_SIZE, MMU_SEGSIZE_256M);
 
diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index c510fc9..fb3bb3a 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -539,6 +539,9 @@ static inline void kvmppc_e500_shadow_map(struct 
kvmppc_vcpu_e500 *vcpu_e500,
 
kvmppc_e500_setup_stlbe(vcpu_e500-vcpu, gtlbe, tsize,
ref, gvaddr, stlbe);
+
+   /* Clear i-cache for new pages */
+   kvmppc_mmu_flush_icache(pfn);
 }
 
 /* XXX only map the one-one case, for now use TLB0 */
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index baaafde..fbdad0e 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -469,6 +469,7 @@ void flush_dcache_icache_page(struct page *page)
__flush_dcache_icache_phys(page_to_pfn(page)  PAGE_SHIFT);
 #endif
 }
+EXPORT_SYMBOL(flush_dcache_icache_page);
 
 void clear_user_page(void *page, unsigned long vaddr, struct page *pg)
 {
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PULL 3.6 0/3] ppc patch queue 2012-08-16 for 3.6

2012-08-16 Thread Alexander Graf

Hi Avi,

This is my patch queue for ppc patches that should go into 3.6.  Please pull.

  * Fix memset in e500_tlb
  * Fix icache flush when mapping executable pages
  * Fix wrong branch in book3s hv code

Alex


The following changes since commit 439793d4b3c99e550daebd868bbd58967c93d0b3:
  Gleb Natapov (1):
KVM: x86: update KVM_SAVE_MSRS_BEGIN to correct value

are available in the git repository at:

  git://github.com/agraf/linux-2.6.git for-upstream-master

Alan Cox (1):
  ppc: e500_tlb memset clears nothing

Alexander Graf (1):
  KVM: PPC: Add cache flush on page map

Paul Mackerras (1):
  KVM: PPC: Book3S HV: Fix incorrect branch in H_CEDE code

 arch/powerpc/include/asm/kvm_host.h |1 +
 arch/powerpc/include/asm/kvm_ppc.h  |   12 
 arch/powerpc/kvm/book3s_32_mmu_host.c   |3 +++
 arch/powerpc/kvm/book3s_64_mmu_host.c   |2 ++
 arch/powerpc/kvm/book3s_hv_rmhandlers.S |   12 +++-
 arch/powerpc/kvm/e500_tlb.c |   11 +++
 arch/powerpc/mm/mem.c   |1 +
 7 files changed, 33 insertions(+), 9 deletions(-)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] ppc: e500_tlb memset clears nothing

2012-08-16 Thread Alexander Graf

From: Alan Cox a...@linux.intel.com

Put the parameters the right way around

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=44031

Reported-by: David Binderman dcb...@hotmail.com
Signed-off-by: Alan Cox a...@linux.intel.com
Signed-off-by: Andrew Morton a...@linux-foundation.org
Signed-off-by: Alexander Graf ag...@suse.de
---
 arch/powerpc/kvm/e500_tlb.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index fb3bb3a..a2b6671 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -322,11 +322,11 @@ static inline void kvmppc_e500_ref_release(struct 
tlbe_ref *ref)
 static void clear_tlb1_bitmap(struct kvmppc_vcpu_e500 *vcpu_e500)
 {
if (vcpu_e500-g2h_tlb1_map)
-   memset(vcpu_e500-g2h_tlb1_map,
-  sizeof(u64) * vcpu_e500-gtlb_params[1].entries, 0);
+   memset(vcpu_e500-g2h_tlb1_map, 0,
+  sizeof(u64) * vcpu_e500-gtlb_params[1].entries);
if (vcpu_e500-h2g_tlb1_rmap)
-   memset(vcpu_e500-h2g_tlb1_rmap,
-  sizeof(unsigned int) * host_tlb_params[1].entries, 0);
+   memset(vcpu_e500-h2g_tlb1_rmap, 0,
+  sizeof(unsigned int) * host_tlb_params[1].entries);
 }
 
 static void clear_tlb_privs(struct kvmppc_vcpu_e500 *vcpu_e500)
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Alex Williamson

On Wed, 2012-08-15 at 13:59 -0600, Alex Williamson wrote:
 On Wed, 2012-08-15 at 22:22 +0300, Michael S. Tsirkin wrote:
  On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
   On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
 v8:
 
 Trying a new approach.  Nobody seems to like the internal IRQ
 source ID object and the interactions it implies between irqfd
 and eoifd, so let's get rid of it.  Instead, simply expose
 IRQ source IDs to userspace.  This lets the user be in charge
 of freeing them or hanging onto a source ID for later use.

In the end it turns out source ID is an optimization for shared
interrupts, isn't it?  Can't we apply the optimization transparently to
the user?  E.g. if we have some spare source IDs, allocate them, if we
run out, use a shared source ID?
   
   Let's think about shared source IDs a bit more.  I think it's wrong that
   irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
   irqfd users can share a source ID.  We do not get the logical OR of all
   users by putting them on the same source ID, we get last set wins.
   KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
   logical OR happens in userspace.  How would we not starve a user if we
   define KVM_IRQFD_SOURCE_ID?  What am I missing?
  
  That all irqfds are deasserted on EOI anyway.  So there's no point
  to do a logical OR.
 
 Ok, so the argument is:
 
 - edge irqfds (the code now) can share a source ID because there is no
 state.  Overlapping interrupt injects always cause one or more edge
 triggers.
 - your proposed level extension can only be asserted by the inject
 eventfd and is only de-asserted by EOI, which de-asserts and notifies
 all users.
 
 What prevents an edge irqfd being registered to the same GSI as a level
 irqfd, resulting in a de-assert that might result in the irr not being
 seen by the guest and therefore maybe not getting an EOI? (I think this
 is the same problem as why we can't use the exiting irqfd to insert a
 level interrupt)
 
 Having the de-assert only on EOI policy allows level irqfds to share a
 source ID, but do they all need to share a separate source ID from edge
 irqfds?
 
   So I'm inclined to say source IDs are a requirement for shared
   interrupts.
  
  Can yo show a specific example that breaks?
  I don't think it can exist.
 
 Only the edge vs level interaction if we define the policy above for
 de-assert.

Hmm, there is still a race w/ level.  If we have a number of
level-deassert-irqfds making use of the same gsi and sourceid and we
individually de-assert and notify, a re-assert could get lost if it
happens before all of the de-asserts have finished.  We either need
separate sourceids or we need to do a single de-assert followed by
multiple notifies.  Right?  Thanks,

Alex

   That means the re-use scheme becomes complicated (ex. we
   run out of IRQ source IDs, so we start looking for sharing by re-using a
   source ID used by a different GSI).  Do we want to do that in kernel or
   userspace?  This series allows userspace to deal with that complexity.
   Please let me know if I'm thinking incorrectly about source ID re-use.
   Thanks,
   
   Alex
  
  I think there is a misunderstanding.
  All deassert on ack irqfds can share a source ID.
  This is why I am now thinking deassert on ack behaviour
  should be set when irqfd is assigned.
 
 Maybe you were already thinking along the lines of a separate source ID
 for de-assert on ack irqfds vs normal irqfds then.  I think I missed
 that.  Thanks,
 
 Alex



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Michael S. Tsirkin

On Thu, Aug 16, 2012 at 06:34:52AM -0600, Alex Williamson wrote:
So I'm inclined to say source IDs are a requirement for shared
interrupts.
   
   Can yo show a specific example that breaks?
   I don't think it can exist.
  
  Only the edge vs level interaction if we define the policy above for
  de-assert.
 
 Hmm, there is still a race w/ level.  If we have a number of
 level-deassert-irqfds making use of the same gsi and sourceid and we
 individually de-assert and notify, a re-assert could get lost if it
 happens before all of the de-asserts have finished.
 We either need
 separate sourceids or we need to do a single de-assert followed by
 multiple notifies.  Right?  Thanks,
 
 Alex

Good catch, I agree, we need a single deassert.

I think I see how to implement this without reference counting and
stuff.  So we chain all auto-deassert irqfds for a given GSI together,
and have a single ack notifier.  When list becomes empty, remove the ack
notifier.

It's actually a good thing to do anyway, too many ack notifiers
would slow unrelated GSIs down.

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

vm pxe fail

2012-08-16 Thread Andrew Holway

Hallo

I have a kvm vm that I am attempting to boot from pxe. The dhcp works perfectly 
and I can see the VM in the pxe server arp. but the tftp just times out. I 
don't see any tftp traffic on either the physical host or on the pie server. I 
am using a bridged interface. I have tried using several virtual nic drivers, 
several different mac addresses and several different ips.  on the physical 
host I can get the pxelinux.0 file from the pxe server via tftp and can clearly 
see that traffic with tcpdump.

Ive tried using various virtual interfaces.

I can pxe boot my physical hosts with no problems.

I can tftp fine from the physical host and see the traffic with ethdump

Here is the terminal output from the VM: 
https://dl.dropbox.com/u/98200887/Screen%20Shot%202012-08-15%20at%206.41.12%20PM.png

Thanks,

Andrew

[root@node002 ~]# yum list | grep qemu
gpxe-roms-qemu.noarch   0.9.7-6.9.el6   @base   
qemu-img.x86_64 2:0.12.1.2-2.295.el6_3.1@updates
qemu-kvm.x86_64 2:0.12.1.2-2.295.el6_3.1@updates
qemu-guest-agent.x86_64 2:0.12.1.2-2.295.el6_3.1updates 
qemu-kvm-tools.x86_64   2:0.12.1.2-2.295.el6_3.1updates 

[root@node002 ~]# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes:   1baseT/Full 
Supports auto-negotiation: No
Advertised link modes:  1baseT/Full 
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: Unknown!
Duplex: Unknown! (255)
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown
Supports Wake-on: g
Wake-on: g
Current message level: 0x0014 (20)
Link detected: no

[root@node002 ~]# brctl show
bridge name bridge id   STP enabled interfaces
br0 8000.009c02241ae0   no  eth1
vnet0
virbr0  8000.525400a6d5aa   yes virbr0-nic

[root@node002 ~]# ethtool vnet0
Settings for vnet0:
Supported ports: [ ]
Supported link modes:   
Supports auto-negotiation: No
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Speed: 10Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: off
MDI-X: Unknown
Current message level: 0xffa1 (-95)
Link detected: yes

domain type='kvm'
  namevm004/name
  uuid4f03b09b-e834-bbf3-a6c2-1689f3156ef2/uuid
  memory unit='KiB'2097152/memory
  currentMemory unit='KiB'2097152/currentMemory
  vcpu placement='static'2/vcpu
  os
type arch='x86_64' machine='rhel6.3.0'hvm/type
boot dev='hd'/
  /os
  features
acpi/
apic/
pae/
  /features
  clock offset='utc'/
  on_poweroffdestroy/on_poweroff
  on_rebootrestart/on_reboot
  on_crashrestart/on_crash
  devices
emulator/usr/libexec/qemu-kvm/emulator
disk type='file' device='disk'
  driver name='qemu' type='raw' cache='none'/
  source file='/cm/shared/vm/vm004.img'/
  target dev='hda' bus='ide'/
  address type='drive' controller='0' bus='0' target='0' unit='0'/
/disk
disk type='block' device='cdrom'
  driver name='qemu' type='raw'/
  target dev='hdc' bus='ide'/
  readonly/
  address type='drive' controller='0' bus='1' target='0' unit='0'/
/disk
controller type='usb' index='0'
  address type='pci' domain='0x' bus='0x00' slot='0x01' 
function='0x2'/
/controller
controller type='ide' index='0'
  address type='pci' domain='0x' bus='0x00' slot='0x01' 
function='0x1'/
/controller
interface type='bridge'
  mac address='00:00:00:00:00:0d'/
  source bridge='br0'/
  model type='rtl8139'/
  address type='pci' domain='0x' bus='0x00' slot='0x03' 
function='0x0'/
/interface
serial type='pty'
  target port='0'/
/serial
console type='pty'
  target type='serial' port='0'/
/console
input type='mouse' bus='ps2'/
graphics type='vnc' port='-1' autoport='yes'/
video
  model type='cirrus' vram='9216' heads='1'/
  address type='pci' domain='0x' bus='0x00' slot='0x02' 
function='0x0'/
/video
memballoon model='virtio'
  address type='pci' domain='0x' bus='0x00' slot='0x04' 
function='0x0'/
/memballoon
  /devices
/domain


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vm pxe fail

2012-08-16 Thread Stefan Hajnoczi

On Thu, Aug 16, 2012 at 1:25 PM, Andrew Holway a.hol...@syseleven.de wrote:
 I have a kvm vm that I am attempting to boot from pxe. The dhcp works 
 perfectly and I can see the VM in the pxe server arp. but the tftp just times 
 out. I don't see any tftp traffic on either the physical host or on the pie 
 server. I am using a bridged interface. I have tried using several virtual 
 nic drivers, several different mac addresses and several different ips.  on 
 the physical host I can get the pxelinux.0 file from the pxe server via tftp 
 and can clearly see that traffic with tcpdump.

 Ive tried using various virtual interfaces.

 I can pxe boot my physical hosts with no problems.

 I can tftp fine from the physical host and see the traffic with ethdump

Have you run tcpdump on the tap interface?  (This is different from
running tcpdump on host eth0 because it is earlier in the network path
and happens before the software bridge.)

What do iptables -L -n and ebtables -L say?

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 14/19] pci-assign: Drop configure switches

2012-08-16 Thread Jan Kiszka

There are no other dependencies of device assignment except for
CONFIG_KVM and an x86 target. So there is also no point in controlling
this feature via configure.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 configure |   11 ---
 hw/i386/Makefile.objs |2 +-
 2 files changed, 1 insertions(+), 12 deletions(-)

diff --git a/configure b/configure
index 5c3ad51..87424ce 100755
--- a/configure
+++ b/configure
@@ -211,7 +211,6 @@ bsd_user=no
 guest_base=
 uname_release=
 mixemu=no
-kvm_cap_device_assignment=yes
 aix=no
 blobs=yes
 pkgversion= ($(kvm_version))
@@ -729,10 +728,6 @@ for opt do
   ;;
   --enable-tcg-interpreter) tcg_interpreter=yes
   ;;
-  --disable-kvm-device-assignment) kvm_cap_device_assignment=no
-  ;;
-  --enable-kvm-device-assignment) kvm_cap_device_assignment=yes
-  ;;
   --disable-cap-ng)  cap_ng=no
   ;;
   --enable-cap-ng) cap_ng=yes
@@ -1091,8 +1086,6 @@ echo   --disable-slirp  disable SLIRP userspace 
network connectivity
 echo   --disable-kvmdisable KVM acceleration support
 echo   --enable-kvm enable KVM acceleration support
 echo   --enable-tcg-interpreter enable TCG with bytecode interpreter (TCI)
-echo   --disable-kvm-device-assignment  disable KVM device assignment support
-echo   --enable-kvm-device-assignment   enable KVM device assignment support
 echo   --disable-nptl   disable usermode NPTL support
 echo   --enable-nptlenable usermode NPTL support
 echo   --enable-system  enable all system emulation targets
@@ -3118,7 +3111,6 @@ echo ATTR/XATTR support $attr
 echo Install blobs $blobs
 echo KVM support   $kvm
 echo TCG interpreter   $tcg_interpreter
-echo KVM device assig. $kvm_cap_device_assignment
 echo fdt support   $fdt
 echo preadv support$preadv
 echo fdatasync $fdatasync
@@ -3845,9 +3837,6 @@ case $target_arch2 in
   if test $vhost_net = yes ; then
 echo CONFIG_VHOST_NET=y  $config_target_mak
   fi
-  if test $kvm_cap_device_assignment = yes ; then
-echo CONFIG_KVM_DEVICE_ASSIGNMENT=y  $config_target_mak
-  fi
 fi
 esac
 case $target_arch2 in
diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index ad52387..29f3e6f 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -14,7 +14,7 @@ obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 obj-y += testdev.o
 obj-y += acpi.o acpi_piix4.o
 
-obj-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
+obj-$(CONFIG_KVM) += device-assignment.o
 
 
 obj-y := $(addprefix ../,$(obj-y))
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 08/19] pci-assign: Factor out kvm_device_msix_supported

2012-08-16 Thread Jan Kiszka

Encapsulate the ugly check if MSI-X assignment is supported in a
separate helper function.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |5 +
 target-i386/kvm.c  |7 +++
 target-i386/kvm_i386.h |1 +
 3 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 1d0af34..80ac2fc 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1283,10 +1283,7 @@ static int assigned_device_pci_cap_init(PCIDevice 
*pci_dev)
 }
 /* Expose MSI-X capability */
 pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_MSIX, 0);
-/* Would really like to test kvm_check_extension(, KVM_CAP_DEVICE_MSIX),
- * but the kernel doesn't expose it.  Instead do a dummy call to
- * KVM_ASSIGN_SET_MSIX_NR to see if it exists. */
-if (pos != 0  kvm_assign_set_msix_nr(kvm_state, NULL) == -EFAULT) {
+if (pos != 0  kvm_device_msix_supported(kvm_state)) {
 int bar_nr;
 uint32_t msix_table_entry;
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 04d1c7d..677a791 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2154,6 +2154,13 @@ int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id)
 KVM_DEV_IRQ_HOST_MSI);
 }
 
+bool kvm_device_msix_supported(KVMState *s)
+{
+/* The kernel lacks a corresponding KVM_CAP, so we probe by calling
+ * KVM_ASSIGN_SET_MSIX_NR with an invalid parameter. */
+return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
+}
+
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 {
 return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
index e827f5b..6f66b6d 100644
--- a/target-i386/kvm_i386.h
+++ b/target-i386/kvm_i386.h
@@ -27,6 +27,7 @@ int kvm_device_intx_deassign(KVMState *s, uint32_t dev_id, 
bool use_host_msi);
 int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, int virq);
 int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id);
 
+bool kvm_device_msix_supported(KVMState *s);
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id);
 
 #endif
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 10/19] pci-assign: Replace kvm_assign_set_msix_entry with kvm_device_msix_set_vector

2012-08-16 Thread Jan Kiszka

The refactored version cleanly hides the KVM IOCTL structure from the
users and also zeros out the padding field.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |7 ++-
 qemu-kvm.c |8 
 qemu-kvm.h |4 
 target-i386/kvm.c  |   13 +
 target-i386/kvm_i386.h |2 ++
 5 files changed, 17 insertions(+), 17 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 0e2f8e6..af8a5aa 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1045,7 +1045,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 uint16_t entries_nr = 0;
 int i, r = 0;
-struct kvm_assigned_msix_entry msix_entry;
 MSIXTableEntry *entry = adev-msix_table;
 
 /* Get the usable entry number for allocating */
@@ -1075,7 +1074,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 adev-irq_entries_nr = adev-msix_max;
 adev-entry = g_malloc0(adev-msix_max * sizeof(*(adev-entry)));
 
-msix_entry.assigned_dev_id = adev-dev_id;
 entry = adev-msix_table;
 for (i = 0; i  adev-msix_max; i++, entry++) {
 if (msix_masked(entry)) {
@@ -1098,9 +1096,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 
 kvm_add_routing_entry(kvm_state, adev-entry[i]);
 
-msix_entry.gsi = adev-entry[i].gsi;
-msix_entry.entry = i;
-r = kvm_assign_set_msix_entry(kvm_state, msix_entry);
+r = kvm_device_msix_set_vector(kvm_state, adev-dev_id, i,
+   adev-entry[i].gsi);
 if (r) {
 fprintf(stderr, fail to set MSI-X entry! %s\n, strerror(-r));
 break;
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 1a2a4fd..ec1911f 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -185,14 +185,6 @@ int kvm_get_irq_route_gsi(void)
 #endif
 }
 
-#ifdef KVM_CAP_DEVICE_MSIX
-int kvm_assign_set_msix_entry(KVMState *s,
-  struct kvm_assigned_msix_entry *entry)
-{
-return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
-}
-#endif
-
 #if !defined(TARGET_I386)
 void kvm_arch_init_irq_routing(KVMState *s)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 3fd6046..ad628d5 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -65,10 +65,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry 
*entry);
 int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
  struct kvm_irq_routing_entry *newentry);
 
-
-int kvm_assign_set_msix_entry(KVMState *s,
-  struct kvm_assigned_msix_entry *entry);
-
 #endif /* CONFIG_KVM */
 
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 676f45b..e9353ed 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2173,6 +2173,19 @@ int kvm_device_msix_init_vectors(KVMState *s, uint32_t 
dev_id,
 return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
 }
 
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+   int virq)
+{
+struct kvm_assigned_msix_entry msix_entry = {
+.assigned_dev_id = dev_id,
+.gsi = virq,
+.entry = vector,
+};
+
+memset(msix_entry.padding, 0, sizeof(msix_entry.padding));
+return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, msix_entry);
+}
+
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 {
 return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
index aac14eb..bd3b398 100644
--- a/target-i386/kvm_i386.h
+++ b/target-i386/kvm_i386.h
@@ -30,6 +30,8 @@ int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id);
 bool kvm_device_msix_supported(KVMState *s);
 int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
  uint32_t nr_vectors);
+int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
+   int virq);
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id);
 
 #endif
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 05/19] pci-assign: Factor out kvm_device_intx_assign

2012-08-16 Thread Jan Kiszka

Avoid passing kvm_assigned_irq on INTx assignment and separate this
function from (to-be-refactored) MSI/MSI-X assignment.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   16 ++--
 target-i386/kvm.c  |   24 
 target-i386/kvm_i386.h |2 ++
 3 files changed, 32 insertions(+), 10 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 0b33c04..d448fdc 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -861,11 +861,10 @@ static int assign_device(AssignedDevice *dev)
 
 static int assign_intx(AssignedDevice *dev)
 {
-struct kvm_assigned_irq assigned_irq_data;
 AssignedIRQType new_type;
 PCIINTxRoute intx_route;
 bool intx_host_msi;
-int r = 0;
+int r;
 
 /* Interrupt PIN 0 means don't use INTx */
 if (assigned_dev_pci_read_byte(dev-dev, PCI_INTERRUPT_PIN) == 0) {
@@ -881,7 +880,7 @@ static int assign_intx(AssignedDevice *dev)
 
 if (dev-intx_route.mode == intx_route.mode 
 dev-intx_route.irq == intx_route.irq) {
-return r;
+return 0;
 }
 
 switch (dev-assigned_irq_type) {
@@ -911,20 +910,17 @@ static int assign_intx(AssignedDevice *dev)
 }
 
 retry:
-memset(assigned_irq_data, 0, sizeof(assigned_irq_data));
-assigned_irq_data.assigned_dev_id = dev-dev_id;
-assigned_irq_data.guest_irq = intx_route.irq;
-assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
 if (dev-features  ASSIGNED_DEVICE_PREFER_MSI_MASK 
 dev-cap.available  ASSIGNED_DEVICE_CAP_MSI) {
-assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_MSI;
+intx_host_msi = true;
 new_type = ASSIGNED_IRQ_INTX_HOST_MSI;
 } else {
-assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_INTX;
+intx_host_msi = false;
 new_type = ASSIGNED_IRQ_INTX_HOST_INTX;
 }
 
-r = kvm_assign_irq(kvm_state, assigned_irq_data);
+r = kvm_device_intx_assign(kvm_state, dev-dev_id, intx_host_msi,
+   intx_route.irq);
 if (r  0) {
 if (r == -EIO  !(dev-features  ASSIGNED_DEVICE_PREFER_MSI_MASK) 
 dev-cap.available  ASSIGNED_DEVICE_CAP_MSI) {
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 058ed3f..e2041f4 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2092,6 +2092,30 @@ int kvm_device_pci_deassign(KVMState *s, uint32_t dev_id)
 return kvm_vm_ioctl(s, KVM_DEASSIGN_PCI_DEVICE, dev_data);
 }
 
+static int kvm_assign_irq_internal(KVMState *s, uint32_t dev_id,
+   uint32_t irq_type, uint32_t guest_irq)
+{
+struct kvm_assigned_irq assigned_irq;
+
+memset(assigned_irq, 0, sizeof(assigned_irq));
+assigned_irq.assigned_dev_id = dev_id;
+assigned_irq.guest_irq = guest_irq;
+assigned_irq.flags = irq_type;
+if (kvm_check_extension(s, KVM_CAP_ASSIGN_DEV_IRQ)) {
+return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, assigned_irq);
+} else {
+return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, assigned_irq);
+}
+}
+
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id, bool use_host_msi,
+   uint32_t guest_irq)
+{
+uint32_t irq_type = KVM_DEV_IRQ_GUEST_INTX |
+(use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX);
+return kvm_assign_irq_internal(s, dev_id, irq_type, guest_irq);
+}
+
 static int kvm_deassign_irq_internal(KVMState *s, uint32_t dev_id,
  uint32_t type)
 {
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
index fdecdd5..5a24168 100644
--- a/target-i386/kvm_i386.h
+++ b/target-i386/kvm_i386.h
@@ -19,6 +19,8 @@ int kvm_device_pci_assign(KVMState *s, PCIHostDeviceAddress 
*dev_addr,
   uint32_t flags, uint32_t *dev_id);
 int kvm_device_pci_deassign(KVMState *s, uint32_t dev_id);
 
+int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
+   bool use_host_msi, uint32_t guest_irq);
 int kvm_device_intx_deassign(KVMState *s, uint32_t dev_id, bool use_host_msi);
 
 int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 04/19] pci-assign: Refactor interrupt deassignment

2012-08-16 Thread Jan Kiszka

Introduce three new KVM services, kvm_device_intx/msi/msix_deassign, to
release assigned interrupts. These no longer require to pass
pre-initialized kvm_assigned_irq structures but only request required
parameters.

To trace which type of interrupt is currently assigned, use a new enum
AssignedIRQType instead of KVM constants.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   80 ++-
 qemu-kvm.c |5 ---
 qemu-kvm.h |   11 --
 target-i386/kvm.c  |   29 +
 target-i386/kvm_i386.h |6 +++
 5 files changed, 86 insertions(+), 45 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index ee64c33..0b33c04 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -106,6 +106,14 @@ typedef struct {
 uint32_t ctrl;
 } MSIXTableEntry;
 
+typedef enum AssignedIRQType {
+ASSIGNED_IRQ_NONE = 0,
+ASSIGNED_IRQ_INTX_HOST_INTX,
+ASSIGNED_IRQ_INTX_HOST_MSI,
+ASSIGNED_IRQ_MSI,
+ASSIGNED_IRQ_MSIX
+} AssignedIRQType;
+
 typedef struct AssignedDevice {
 PCIDevice dev;
 PCIHostDeviceAddress host;
@@ -117,7 +125,7 @@ typedef struct AssignedDevice {
 PCIDevRegions real_device;
 int run;
 PCIINTxRoute intx_route;
-int irq_requested_type;
+AssignedIRQType assigned_irq_type;
 int bound;
 struct {
 #define ASSIGNED_DEVICE_CAP_MSI (1  0)
@@ -854,7 +862,9 @@ static int assign_device(AssignedDevice *dev)
 static int assign_intx(AssignedDevice *dev)
 {
 struct kvm_assigned_irq assigned_irq_data;
+AssignedIRQType new_type;
 PCIINTxRoute intx_route;
+bool intx_host_msi;
 int r = 0;
 
 /* Interrupt PIN 0 means don't use INTx */
@@ -874,17 +884,26 @@ static int assign_intx(AssignedDevice *dev)
 return r;
 }
 
-memset(assigned_irq_data, 0, sizeof(assigned_irq_data));
-assigned_irq_data.assigned_dev_id = dev-dev_id;
-assigned_irq_data.guest_irq = intx_route.irq;
-if (dev-irq_requested_type) {
-assigned_irq_data.flags = dev-irq_requested_type;
-r = kvm_deassign_irq(kvm_state, assigned_irq_data);
-if (r) {
-perror(assign_intx: deassign);
-}
-dev-irq_requested_type = 0;
+switch (dev-assigned_irq_type) {
+case ASSIGNED_IRQ_INTX_HOST_INTX:
+case ASSIGNED_IRQ_INTX_HOST_MSI:
+intx_host_msi = dev-assigned_irq_type == ASSIGNED_IRQ_INTX_HOST_MSI;
+r = kvm_device_intx_deassign(kvm_state, dev-dev_id, intx_host_msi);
+break;
+case ASSIGNED_IRQ_MSI:
+r = kvm_device_msi_deassign(kvm_state, dev-dev_id);
+break;
+case ASSIGNED_IRQ_MSIX:
+r = kvm_device_msix_deassign(kvm_state, dev-dev_id);
+break;
+default:
+r = 0;
+break;
 }
+if (r) {
+perror(assign_intx: deassign);
+}
+dev-assigned_irq_type = ASSIGNED_IRQ_NONE;
 
 if (intx_route.mode == PCI_INTX_DISABLED) {
 dev-intx_route = intx_route;
@@ -892,12 +911,18 @@ static int assign_intx(AssignedDevice *dev)
 }
 
 retry:
+memset(assigned_irq_data, 0, sizeof(assigned_irq_data));
+assigned_irq_data.assigned_dev_id = dev-dev_id;
+assigned_irq_data.guest_irq = intx_route.irq;
 assigned_irq_data.flags = KVM_DEV_IRQ_GUEST_INTX;
 if (dev-features  ASSIGNED_DEVICE_PREFER_MSI_MASK 
-dev-cap.available  ASSIGNED_DEVICE_CAP_MSI)
+dev-cap.available  ASSIGNED_DEVICE_CAP_MSI) {
 assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_MSI;
-else
+new_type = ASSIGNED_IRQ_INTX_HOST_MSI;
+} else {
 assigned_irq_data.flags |= KVM_DEV_IRQ_HOST_INTX;
+new_type = ASSIGNED_IRQ_INTX_HOST_INTX;
+}
 
 r = kvm_assign_irq(kvm_state, assigned_irq_data);
 if (r  0) {
@@ -920,7 +945,7 @@ retry:
 }
 
 dev-intx_route = intx_route;
-dev-irq_requested_type = assigned_irq_data.flags;
+dev-assigned_irq_type = new_type;
 return r;
 }
 
@@ -958,23 +983,19 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
  PCI_MSI_FLAGS);
 int r;
 
-memset(assigned_irq_data, 0, sizeof assigned_irq_data);
-assigned_irq_data.assigned_dev_id = assigned_dev-dev_id;
-
 /* Some guests gratuitously disable MSI even if they're not using it,
  * try to catch this by only deassigning irqs if the guest is using
  * MSI or intends to start. */
-if ((assigned_dev-irq_requested_type  KVM_DEV_IRQ_GUEST_MSI) ||
+if (assigned_dev-assigned_irq_type == ASSIGNED_IRQ_MSI ||
 (ctrl_byte  PCI_MSI_FLAGS_ENABLE)) {
 
-assigned_irq_data.flags = assigned_dev-irq_requested_type;
 free_dev_irq_entries(assigned_dev);
-r = kvm_deassign_irq(kvm_state, assigned_irq_data);
+r = kvm_device_msi_deassign(kvm_state, assigned_dev-dev_id);
 /* -ENXIO means no assigned irq */
 if (r  r != -ENXIO)

[PATCH 16/19] pci-assign: Fix coding style issues

2012-08-16 Thread Jan Kiszka

No functional changes.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/kvm/pci-assign.c |  181 +-
 1 files changed, 105 insertions(+), 76 deletions(-)

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index 3611539..cfd859e 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -57,10 +57,10 @@
 #ifdef DEVICE_ASSIGNMENT_DEBUG
 #define DEBUG(fmt, ...)   \
 do {  \
-  fprintf(stderr, %s:  fmt, __func__ , __VA_ARGS__);\
+fprintf(stderr, %s:  fmt, __func__ , __VA_ARGS__);  \
 } while (0)
 #else
-#define DEBUG(fmt, ...) do { } while(0)
+#define DEBUG(fmt, ...)
 #endif
 
 typedef struct {
@@ -186,27 +186,27 @@ static uint64_t assigned_dev_ioport_rw(AssignedDevRegion 
*dev_region,
 DEBUG(out data=%lx, size=%d, e_phys=%lx, host=%x\n,
   *data, size, addr, port);
 switch (size) {
-case 1:
-outb(*data, port);
-break;
-case 2:
-outw(*data, port);
-break;
-case 4:
-outl(*data, port);
-break;
+case 1:
+outb(*data, port);
+break;
+case 2:
+outw(*data, port);
+break;
+case 4:
+outl(*data, port);
+break;
 }
 } else {
 switch (size) {
-case 1:
-val = inb(port);
-break;
-case 2:
-val = inw(port);
-break;
-case 4:
-val = inl(port);
-break;
+case 1:
+val = inb(port);
+break;
+case 2:
+val = inw(port);
+break;
+case 4:
+val = inl(port);
+break;
 }
 DEBUG(in data=%lx, size=%d, e_phys=%lx, host=%x\n,
   val, size, addr, port);
@@ -354,13 +354,14 @@ static uint32_t assigned_dev_pci_read(PCIDevice *d, int 
pos, int len)
 again:
 ret = pread(fd, val, len, pos);
 if (ret != len) {
-   if ((ret  0)  (errno == EINTR || errno == EAGAIN))
-   goto again;
+if ((ret  0)  (errno == EINTR || errno == EAGAIN)) {
+goto again;
+}
 
-   fprintf(stderr, %s: pread failed, ret = %zd errno = %d\n,
-   __func__, ret, errno);
+fprintf(stderr, %s: pread failed, ret = %zd errno = %d\n,
+__func__, ret, errno);
 
-   exit(1);
+exit(1);
 }
 
 return val;
@@ -380,16 +381,15 @@ static void assigned_dev_pci_write(PCIDevice *d, int pos, 
uint32_t val, int len)
 again:
 ret = pwrite(fd, val, len, pos);
 if (ret != len) {
-   if ((ret  0)  (errno == EINTR || errno == EAGAIN))
-   goto again;
+if ((ret  0)  (errno == EINTR || errno == EAGAIN)) {
+goto again;
+}
 
-   fprintf(stderr, %s: pwrite failed, ret = %zd errno = %d\n,
-   __func__, ret, errno);
+fprintf(stderr, %s: pwrite failed, ret = %zd errno = %d\n,
+__func__, ret, errno);
 
-   exit(1);
+exit(1);
 }
-
-return;
 }
 
 static void assigned_dev_emulate_config_read(AssignedDevice *dev,
@@ -418,21 +418,25 @@ static uint8_t pci_find_cap_offset(PCIDevice *d, uint8_t 
cap, uint8_t start)
 int status;
 
 status = assigned_dev_pci_read_byte(d, PCI_STATUS);
-if ((status  PCI_STATUS_CAP_LIST) == 0)
+if ((status  PCI_STATUS_CAP_LIST) == 0) {
 return 0;
+}
 
 while (max_cap--) {
 pos = assigned_dev_pci_read_byte(d, pos);
-if (pos  0x40)
+if (pos  0x40) {
 break;
+}
 
 pos = ~3;
 id = assigned_dev_pci_read_byte(d, pos + PCI_CAP_LIST_ID);
 
-if (id == 0xff)
+if (id == 0xff) {
 break;
-if (id == cap)
+}
+if (id == cap) {
 return pos;
+}
 
 pos += PCI_CAP_LIST_NEXT;
 }
@@ -447,8 +451,9 @@ static int assigned_dev_register_regions(PCIRegion 
*io_regions,
 PCIRegion *cur_region = io_regions;
 
 for (i = 0; i  regions_num; i++, cur_region++) {
-if (!cur_region-valid)
+if (!cur_region-valid) {
 continue;
+}
 
 /* handle memory io regions */
 if (cur_region-type  IORESOURCE_MEM) {
@@ -587,7 +592,7 @@ static int get_real_device(AssignedDevice *pci_dev, 
uint16_t r_seg,
 dev-region_number = 0;
 
 snprintf(dir, sizeof(dir), /sys/bus/pci/devices/%04x:%02x:%02x.%x/,
-r_seg, r_bus, r_dev, r_func);
+ r_seg, r_bus, r_dev, r_func);
 
 snprintf(name, sizeof(name), %sconfig,

[PATCH 02/19] pci-assign: Factor out kvm_device_pci_assign/deassign

2012-08-16 Thread Jan Kiszka

In contrast to the old wrappers, the new ones only take the required
parameters instead of a preinitialized kvm_assigned_pci_dev structure.
We also move the dev_id generation into kvm_device_pci_assign and store
the result in the AssignedDevice structure. It will serve as handle for
all upcoming kvm_device_* functions.

While refactoring these services, start moving KVM services where they
should finally end up in upstream QEMU: in the i386-specific KVM layer.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   46 --
 qemu-kvm.c |   14 --
 qemu-kvm.h |   24 
 target-i386/kvm.c  |   36 
 target-i386/kvm_i386.h |6 ++
 5 files changed, 54 insertions(+), 72 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 529e229..3f6196a 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -41,6 +41,7 @@
 #include range.h
 #include sysemu.h
 #include pci.h
+#include kvm_i386.h
 
 #define MSIX_PAGE_SIZE 0x1000
 
@@ -108,6 +109,7 @@ typedef struct {
 typedef struct AssignedDevice {
 PCIDevice dev;
 PCIHostDeviceAddress host;
+uint32_t dev_id;
 uint32_t features;
 int intpin;
 uint8_t debug_flags;
@@ -115,9 +117,6 @@ typedef struct AssignedDevice {
 PCIDevRegions real_device;
 int run;
 PCIINTxRoute intx_route;
-uint16_t h_segnr;
-uint8_t h_busnr;
-uint8_t h_devfn;
 int irq_requested_type;
 int bound;
 struct {
@@ -761,12 +760,6 @@ static void free_assigned_device(AssignedDevice *dev)
 free_dev_irq_entries(dev);
 }
 
-static uint32_t calc_assigned_dev_id(AssignedDevice *dev)
-{
-return (uint32_t)dev-h_segnr  16 | (uint32_t)dev-h_busnr  8 |
-   (uint32_t)dev-h_devfn;
-}
-
 static void assign_failed_examine(AssignedDevice *dev)
 {
 char name[PATH_MAX], dir[PATH_MAX], driver[PATH_MAX] = {}, *ns;
@@ -820,24 +813,17 @@ fail:
 
 static int assign_device(AssignedDevice *dev)
 {
-struct kvm_assigned_pci_dev assigned_dev_data;
+uint32_t flags = KVM_DEV_ASSIGN_ENABLE_IOMMU;
 int r;
 
 /* Only pass non-zero PCI segment to capable module */
 if (!kvm_check_extension(kvm_state, KVM_CAP_PCI_SEGMENT) 
-dev-h_segnr) {
+dev-host.domain) {
 fprintf(stderr, Can't assign device inside non-zero PCI segment 
 as this KVM module doesn't support it.\n);
 return -ENODEV;
 }
 
-memset(assigned_dev_data, 0, sizeof(assigned_dev_data));
-assigned_dev_data.assigned_dev_id = calc_assigned_dev_id(dev);
-assigned_dev_data.segnr = dev-h_segnr;
-assigned_dev_data.busnr = dev-h_busnr;
-assigned_dev_data.devfn = dev-h_devfn;
-
-assigned_dev_data.flags = KVM_DEV_ASSIGN_ENABLE_IOMMU;
 if (!kvm_check_extension(kvm_state, KVM_CAP_IOMMU)) {
 fprintf(stderr, No IOMMU found.  Unable to assign device \%s\\n,
 dev-dev.qdev.id);
@@ -846,10 +832,10 @@ static int assign_device(AssignedDevice *dev)
 
 if (dev-features  ASSIGNED_DEVICE_SHARE_INTX_MASK 
 kvm_has_intx_set_mask()) {
-assigned_dev_data.flags |= KVM_DEV_ASSIGN_PCI_2_3;
+flags |= KVM_DEV_ASSIGN_PCI_2_3;
 }
 
-r = kvm_assign_pci_device(kvm_state, assigned_dev_data);
+r = kvm_device_pci_assign(kvm_state, dev-host, flags, dev-dev_id);
 if (r  0) {
 fprintf(stderr, Failed to assign device \%s\ : %s\n,
 dev-dev.qdev.id, strerror(-r));
@@ -889,7 +875,7 @@ static int assign_irq(AssignedDevice *dev)
 }
 
 memset(assigned_irq_data, 0, sizeof(assigned_irq_data));
-assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(dev);
+assigned_irq_data.assigned_dev_id = dev-dev_id;
 assigned_irq_data.guest_irq = intx_route.irq;
 if (dev-irq_requested_type) {
 assigned_irq_data.flags = dev-irq_requested_type;
@@ -940,13 +926,9 @@ retry:
 
 static void deassign_device(AssignedDevice *dev)
 {
-struct kvm_assigned_pci_dev assigned_dev_data;
 int r;
 
-memset(assigned_dev_data, 0, sizeof(assigned_dev_data));
-assigned_dev_data.assigned_dev_id = calc_assigned_dev_id(dev);
-
-r = kvm_deassign_pci_device(kvm_state, assigned_dev_data);
+r = kvm_device_pci_deassign(kvm_state, dev-dev_id);
 if (r  0)
fprintf(stderr, Failed to deassign device \%s\ : %s\n,
 dev-dev.qdev.id, strerror(-r));
@@ -977,7 +959,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 int r;
 
 memset(assigned_irq_data, 0, sizeof assigned_irq_data);
-assigned_irq_data.assigned_dev_id = calc_assigned_dev_id(assigned_dev);
+assigned_irq_data.assigned_dev_id = assigned_dev-dev_id;
 
 /* Some guests gratuitously disable MSI even if they're not using it,
  * try to catch this by only deassigning irqs if the guest is using
@@ -1059,7 +1041,7 @@ static int

[PATCH 17/19] pci-assign: Replace exit() with hw_error()

2012-08-16 Thread Jan Kiszka

This is more appropriate and allows central handling.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/kvm/pci-assign.c |   10 ++
 1 files changed, 2 insertions(+), 8 deletions(-)

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index cfd859e..29a4671 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -358,10 +358,7 @@ again:
 goto again;
 }
 
-fprintf(stderr, %s: pread failed, ret = %zd errno = %d\n,
-__func__, ret, errno);
-
-exit(1);
+hw_error(pci read failed, ret = %zd errno = %d\n, ret, errno);
 }
 
 return val;
@@ -385,10 +382,7 @@ again:
 goto again;
 }
 
-fprintf(stderr, %s: pwrite failed, ret = %zd errno = %d\n,
-__func__, ret, errno);
-
-exit(1);
+hw_error(pci write failed, ret = %zd errno = %d\n, ret, errno);
 }
 }
 
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 11/19] pci-assign: Rework MSI-X route setup

2012-08-16 Thread Jan Kiszka

Use kvm_irqchip_add_msi_route and introduce kvm_irqchip_update_msi_route
to set up the required IRQ routes for MSI-X injections. This removes the
last direct interaction with the IRQ routing API of the KVM kernel so
that we can remove/unexport related services.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   71 ++-
 kvm-all.c  |   51 ---
 kvm-stub.c |9 ---
 kvm.h  |5 +--
 qemu-kvm.c |  128 
 qemu-kvm.h |   22 
 6 files changed, 71 insertions(+), 215 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index af8a5aa..7ffd26c 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -141,8 +141,6 @@ typedef struct AssignedDevice {
 uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
 int msi_virq_nr;
 int *msi_virq;
-int irq_entries_nr;
-struct kvm_irq_routing_entry *entry;
 MSIXTableEntry *msix_table;
 target_phys_addr_t msix_table_addr;
 uint16_t msix_max;
@@ -701,7 +699,7 @@ again:
 
 static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
 
-static void free_dev_irq_entries(AssignedDevice *dev)
+static void free_msi_virqs(AssignedDevice *dev)
 {
 int i;
 
@@ -714,15 +712,6 @@ static void free_dev_irq_entries(AssignedDevice *dev)
 g_free(dev-msi_virq);
 dev-msi_virq = NULL;
 dev-msi_virq_nr = 0;
-
-for (i = 0; i  dev-irq_entries_nr; i++) {
-if (dev-entry[i].type) {
-kvm_del_routing_entry(dev-entry[i]);
-}
-}
-g_free(dev-entry);
-dev-entry = NULL;
-dev-irq_entries_nr = 0;
 }
 
 static void free_assigned_device(AssignedDevice *dev)
@@ -778,7 +767,7 @@ static void free_assigned_device(AssignedDevice *dev)
 close(dev-real_device.config_fd);
 }
 
-free_dev_irq_entries(dev);
+free_msi_virqs(dev);
 }
 
 static void assign_failed_examine(AssignedDevice *dev)
@@ -1001,7 +990,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 if (r  r != -ENXIO)
 perror(assigned_dev_update_msi: deassign irq);
 
-free_dev_irq_entries(assigned_dev);
+free_msi_virqs(assigned_dev);
 
 assigned_dev-assigned_irq_type = ASSIGNED_IRQ_NONE;
 pci_device_set_intx_routing_notifier(pci_dev, NULL);
@@ -1046,6 +1035,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 uint16_t entries_nr = 0;
 int i, r = 0;
 MSIXTableEntry *entry = adev-msix_table;
+MSIMessage msg;
 
 /* Get the usable entry number for allocating */
 for (i = 0; i  adev-msix_max; i++, entry++) {
@@ -1069,45 +1059,38 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 return r;
 }
 
-free_dev_irq_entries(adev);
+free_msi_virqs(adev);
 
-adev-irq_entries_nr = adev-msix_max;
-adev-entry = g_malloc0(adev-msix_max * sizeof(*(adev-entry)));
+adev-msi_virq_nr = adev-msix_max;
+adev-msi_virq = g_malloc(adev-msix_max * sizeof(*adev-msi_virq));
 
 entry = adev-msix_table;
 for (i = 0; i  adev-msix_max; i++, entry++) {
+adev-msi_virq[i] = -1;
+
 if (msix_masked(entry)) {
 continue;
 }
 
-r = kvm_get_irq_route_gsi();
-if (r  0)
+msg.address = entry-addr_lo | ((uint64_t)entry-addr_hi  32);
+msg.data = entry-data;
+r = kvm_irqchip_add_msi_route(kvm_state, msg);
+if (r  0) {
 return r;
-
-adev-entry[i].gsi = r;
-adev-entry[i].type = KVM_IRQ_ROUTING_MSI;
-adev-entry[i].flags = 0;
-adev-entry[i].u.msi.address_lo = entry-addr_lo;
-adev-entry[i].u.msi.address_hi = entry-addr_hi;
-adev-entry[i].u.msi.data = entry-data;
+}
+adev-msi_virq[i] = r;
 
 DEBUG(MSI-X vector %d, gsi %d, addr %08x_%08x, data %08x\n, i,
   r, entry-addr_hi, entry-addr_lo, entry-data);
 
-kvm_add_routing_entry(kvm_state, adev-entry[i]);
-
 r = kvm_device_msix_set_vector(kvm_state, adev-dev_id, i,
-   adev-entry[i].gsi);
+   adev-msi_virq[i]);
 if (r) {
 fprintf(stderr, fail to set MSI-X entry! %s\n, strerror(-r));
 break;
 }
 }
 
-if (r == 0) {
-kvm_irqchip_commit_routes(kvm_state);
-}
-
 return r;
 }
 
@@ -1127,13 +1110,13 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
  * MSIX or intends to start. */
 if ((assigned_dev-assigned_irq_type == ASSIGNED_IRQ_MSIX) ||
 (ctrl_word  PCI_MSIX_FLAGS_ENABLE)) {
-
-free_dev_irq_entries(assigned_dev);
 r = kvm_device_msix_deassign(kvm_state, assigned_dev-dev_id);
 /* -ENXIO means no assigned irq */
 if (r  r != -ENXIO)
 perror(assigned_dev_update_msix: deassign irq);
 
+

[PATCH 07/19] pci-assign: Rework MSI assignment

2012-08-16 Thread Jan Kiszka

Introduce kvm_device_msi_assign and use upstream's
kvm_irqchip_add_msi_route/kvm_irqchip_release_virq to provide MSI
support for assigned devices.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   54 +--
 target-i386/kvm.c  |6 +
 target-i386/kvm_i386.h |1 +
 3 files changed, 36 insertions(+), 25 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index d448fdc..1d0af34 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -41,6 +41,7 @@
 #include range.h
 #include sysemu.h
 #include pci.h
+#include msi.h
 #include kvm_i386.h
 
 #define MSIX_PAGE_SIZE 0x1000
@@ -138,6 +139,8 @@ typedef struct AssignedDevice {
 } cap;
 uint8_t emulate_config_read[PCI_CONFIG_SPACE_SIZE];
 uint8_t emulate_config_write[PCI_CONFIG_SPACE_SIZE];
+int msi_virq_nr;
+int *msi_virq;
 int irq_entries_nr;
 struct kvm_irq_routing_entry *entry;
 MSIXTableEntry *msix_table;
@@ -702,6 +705,16 @@ static void free_dev_irq_entries(AssignedDevice *dev)
 {
 int i;
 
+for (i = 0; i  dev-msi_virq_nr; i++) {
+if (dev-msi_virq[i] = 0) {
+kvm_irqchip_release_virq(kvm_state, dev-msi_virq[i]);
+dev-msi_virq[i] = -1;
+}
+}
+g_free(dev-msi_virq);
+dev-msi_virq = NULL;
+dev-msi_virq_nr = 0;
+
 for (i = 0; i  dev-irq_entries_nr; i++) {
 if (dev-entry[i].type) {
 kvm_del_routing_entry(dev-entry[i]);
@@ -973,7 +986,6 @@ static void assigned_dev_update_irq_routing(PCIDevice *dev)
 
 static void assigned_dev_update_msi(PCIDevice *pci_dev)
 {
-struct kvm_assigned_irq assigned_irq_data;
 AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 uint8_t ctrl_byte = pci_get_byte(pci_dev-config + pci_dev-msi_cap +
  PCI_MSI_FLAGS);
@@ -984,43 +996,35 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
  * MSI or intends to start. */
 if (assigned_dev-assigned_irq_type == ASSIGNED_IRQ_MSI ||
 (ctrl_byte  PCI_MSI_FLAGS_ENABLE)) {
-
-free_dev_irq_entries(assigned_dev);
 r = kvm_device_msi_deassign(kvm_state, assigned_dev-dev_id);
 /* -ENXIO means no assigned irq */
 if (r  r != -ENXIO)
 perror(assigned_dev_update_msi: deassign irq);
 
+free_dev_irq_entries(assigned_dev);
+
 assigned_dev-assigned_irq_type = ASSIGNED_IRQ_NONE;
 pci_device_set_intx_routing_notifier(pci_dev, NULL);
 }
 
 if (ctrl_byte  PCI_MSI_FLAGS_ENABLE) {
 uint8_t *pos = pci_dev-config + pci_dev-msi_cap;
-
-assigned_dev-entry = g_malloc0(sizeof(*(assigned_dev-entry)));
-assigned_dev-entry-u.msi.address_lo =
-pci_get_long(pos + PCI_MSI_ADDRESS_LO);
-assigned_dev-entry-u.msi.address_hi = 0;
-assigned_dev-entry-u.msi.data = pci_get_word(pos + PCI_MSI_DATA_32);
-assigned_dev-entry-type = KVM_IRQ_ROUTING_MSI;
-r = kvm_get_irq_route_gsi();
-if (r  0) {
-perror(assigned_dev_update_msi: kvm_get_irq_route_gsi);
+MSIMessage msg;
+int virq;
+
+msg.address = pci_get_long(pos + PCI_MSI_ADDRESS_LO);
+msg.data = pci_get_word(pos + PCI_MSI_DATA_32);
+virq = kvm_irqchip_add_msi_route(kvm_state, msg);
+if (virq  0) {
+perror(assigned_dev_update_msi: kvm_irqchip_add_msi_route);
 return;
 }
-assigned_dev-entry-gsi = r;
 
-kvm_add_routing_entry(kvm_state, assigned_dev-entry);
-kvm_irqchip_commit_routes(kvm_state);
-   assigned_dev-irq_entries_nr = 1;
-
-memset(assigned_irq_data, 0, sizeof assigned_irq_data);
-assigned_irq_data.assigned_dev_id = assigned_dev-dev_id;
-assigned_irq_data.guest_irq = assigned_dev-entry-gsi;
-   assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSI | KVM_DEV_IRQ_GUEST_MSI;
-if (kvm_assign_irq(kvm_state, assigned_irq_data)  0) {
-perror(assigned_dev_enable_msi: assign irq);
+assigned_dev-msi_virq = g_malloc(sizeof(*assigned_dev-msi_virq));
+assigned_dev-msi_virq_nr = 1;
+assigned_dev-msi_virq[0] = virq;
+if (kvm_device_msi_assign(kvm_state, assigned_dev-dev_id, virq)  0) {
+perror(assigned_dev_update_msi: kvm_device_msi_assign);
 }
 
 assigned_dev-intx_route.mode = PCI_INTX_DISABLED;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 4941744..04d1c7d 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2142,6 +2142,12 @@ int kvm_device_intx_deassign(KVMState *s, uint32_t 
dev_id, bool use_host_msi)
 (use_host_msi ? KVM_DEV_IRQ_HOST_MSI : KVM_DEV_IRQ_HOST_INTX));
 }
 
+int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, int virq)
+{
+return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSI |
+  KVM_DEV_IRQ_GUEST_MSI, virq);

[PATCH 09/19] pci-assign: Replace kvm_assign_set_msix_nr with kvm_device_msix_init_vectors

2012-08-16 Thread Jan Kiszka

The refactored version cleanly hides the KVM IOCTL structure from the
users and also zeros out the padding field.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |7 ++-
 qemu-kvm.c |5 -
 qemu-kvm.h |1 -
 target-i386/kvm.c  |   12 
 target-i386/kvm_i386.h |2 ++
 5 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 80ac2fc..0e2f8e6 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1045,7 +1045,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 uint16_t entries_nr = 0;
 int i, r = 0;
-struct kvm_assigned_msix_nr msix_nr;
 struct kvm_assigned_msix_entry msix_entry;
 MSIXTableEntry *entry = adev-msix_table;
 
@@ -1064,9 +1063,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 return 0;
 }
 
-msix_nr.assigned_dev_id = adev-dev_id;
-msix_nr.entry_nr = entries_nr;
-r = kvm_assign_set_msix_nr(kvm_state, msix_nr);
+r = kvm_device_msix_init_vectors(kvm_state, adev-dev_id, entries_nr);
 if (r != 0) {
 fprintf(stderr, fail to set MSI-X entry number for MSIX! %s\n,
strerror(-r));
@@ -1078,7 +1075,7 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 adev-irq_entries_nr = adev-msix_max;
 adev-entry = g_malloc0(adev-msix_max * sizeof(*(adev-entry)));
 
-msix_entry.assigned_dev_id = msix_nr.assigned_dev_id;
+msix_entry.assigned_dev_id = adev-dev_id;
 entry = adev-msix_table;
 for (i = 0; i  adev-msix_max; i++, entry++) {
 if (msix_masked(entry)) {
diff --git a/qemu-kvm.c b/qemu-kvm.c
index 8416a8d..1a2a4fd 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -186,11 +186,6 @@ int kvm_get_irq_route_gsi(void)
 }
 
 #ifdef KVM_CAP_DEVICE_MSIX
-int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr)
-{
-return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
-}
-
 int kvm_assign_set_msix_entry(KVMState *s,
   struct kvm_assigned_msix_entry *entry)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index c247ad0..3fd6046 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -66,7 +66,6 @@ int kvm_update_routing_entry(struct kvm_irq_routing_entry 
*entry,
  struct kvm_irq_routing_entry *newentry);
 
 
-int kvm_assign_set_msix_nr(KVMState *s, struct kvm_assigned_msix_nr *msix_nr);
 int kvm_assign_set_msix_entry(KVMState *s,
   struct kvm_assigned_msix_entry *entry);
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 677a791..676f45b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2161,6 +2161,18 @@ bool kvm_device_msix_supported(KVMState *s)
 return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, NULL) == -EFAULT;
 }
 
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+ uint32_t nr_vectors)
+{
+struct kvm_assigned_msix_nr msix_nr = {
+.assigned_dev_id = dev_id,
+.entry_nr = nr_vectors,
+.padding = 0,
+};
+
+return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
+}
+
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 {
 return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
index 6f66b6d..aac14eb 100644
--- a/target-i386/kvm_i386.h
+++ b/target-i386/kvm_i386.h
@@ -28,6 +28,8 @@ int kvm_device_msi_assign(KVMState *s, uint32_t dev_id, int 
virq);
 int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id);
 
 bool kvm_device_msix_supported(KVMState *s);
+int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
+ uint32_t nr_vectors);
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id);
 
 #endif
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 06/19] qemu-kvm: Move kvm_device_intx_set_mask service

2012-08-16 Thread Jan Kiszka

Move kvm_device_intx_set_mask prototype and implementation to their
upstream positions.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 qemu-kvm.c |9 -
 qemu-kvm.h |2 --
 target-i386/kvm.c  |9 +
 target-i386/kvm_i386.h |1 +
 4 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 8bc9857..8416a8d 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -37,15 +37,6 @@ static int kvm_old_assign_irq(KVMState *s,
 return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, assigned_irq);
 }
 
-int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked)
-{
-struct kvm_assigned_pci_dev assigned_dev;
-
-assigned_dev.assigned_dev_id = dev_id;
-assigned_dev.flags = masked ? KVM_DEV_ASSIGN_MASK_INTX : 0;
-return kvm_vm_ioctl(s, KVM_ASSIGN_SET_INTX_MASK, assigned_dev);
-}
-
 #ifdef KVM_CAP_ASSIGN_DEV_IRQ
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index 1cdface..c247ad0 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -43,8 +43,6 @@
  */
 int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
 
-int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked);
-
 struct kvm_irq_routing_entry;
 
 void kvm_add_routing_entry(KVMState *s, struct kvm_irq_routing_entry *entry);
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e2041f4..4941744 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2116,6 +2116,15 @@ int kvm_device_intx_assign(KVMState *s, uint32_t dev_id, 
bool use_host_msi,
 return kvm_assign_irq_internal(s, dev_id, irq_type, guest_irq);
 }
 
+int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked)
+{
+struct kvm_assigned_pci_dev assigned_dev;
+
+assigned_dev.assigned_dev_id = dev_id;
+assigned_dev.flags = masked ? KVM_DEV_ASSIGN_MASK_INTX : 0;
+return kvm_vm_ioctl(s, KVM_ASSIGN_SET_INTX_MASK, assigned_dev);
+}
+
 static int kvm_deassign_irq_internal(KVMState *s, uint32_t dev_id,
  uint32_t type)
 {
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
index 5a24168..28f26bb 100644
--- a/target-i386/kvm_i386.h
+++ b/target-i386/kvm_i386.h
@@ -21,6 +21,7 @@ int kvm_device_pci_deassign(KVMState *s, uint32_t dev_id);
 
 int kvm_device_intx_assign(KVMState *s, uint32_t dev_id,
bool use_host_msi, uint32_t guest_irq);
+int kvm_device_intx_set_mask(KVMState *s, uint32_t dev_id, bool masked);
 int kvm_device_intx_deassign(KVMState *s, uint32_t dev_id, bool use_host_msi);
 
 int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id);
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 12/19] pci-assign: Factor out kvm_device_msix_assign

2012-08-16 Thread Jan Kiszka

Avoid passing kvm_assigned_irq on MSI-X assignment. Drop kvm_assign_irq
as it's now no longer used.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |9 +
 qemu-kvm.c |   27 ---
 qemu-kvm.h |   11 ---
 target-i386/kvm.c  |6 ++
 target-i386/kvm_i386.h |1 +
 5 files changed, 8 insertions(+), 46 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 7ffd26c..32a082d 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1096,15 +1096,11 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
*pci_dev)
 
 static void assigned_dev_update_msix(PCIDevice *pci_dev)
 {
-struct kvm_assigned_irq assigned_irq_data;
 AssignedDevice *assigned_dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 uint16_t ctrl_word = pci_get_word(pci_dev-config + pci_dev-msix_cap +
   PCI_MSIX_FLAGS);
 int r;
 
-memset(assigned_irq_data, 0, sizeof assigned_irq_data);
-assigned_irq_data.assigned_dev_id = assigned_dev-dev_id;
-
 /* Some guests gratuitously disable MSIX even if they're not using it,
  * try to catch this by only deassigning irqs if the guest is using
  * MSIX or intends to start. */
@@ -1122,16 +1118,13 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
 }
 
 if (ctrl_word  PCI_MSIX_FLAGS_ENABLE) {
-assigned_irq_data.flags = KVM_DEV_IRQ_HOST_MSIX |
-  KVM_DEV_IRQ_GUEST_MSIX;
-
 if (assigned_dev_update_msix_mmio(pci_dev)  0) {
 perror(assigned_dev_update_msix_mmio);
 return;
 }
 
 if (assigned_dev-msi_virq_nr  0) {
-if (kvm_assign_irq(kvm_state, assigned_irq_data)  0) {
+if (kvm_device_msix_assign(kvm_state, assigned_dev-dev_id)  0) {
 perror(assigned_dev_enable_msix: assign irq);
 return;
 }
diff --git a/qemu-kvm.c b/qemu-kvm.c
index e45e4a7..3dc56ea 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -30,33 +30,6 @@
 
 #define ALIGN(x, y) (((x)+(y)-1)  ~((y)-1))
 
-#ifdef KVM_CAP_DEVICE_ASSIGNMENT
-static int kvm_old_assign_irq(KVMState *s,
-  struct kvm_assigned_irq *assigned_irq)
-{
-return kvm_vm_ioctl(s, KVM_ASSIGN_IRQ, assigned_irq);
-}
-
-#ifdef KVM_CAP_ASSIGN_DEV_IRQ
-int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
-{
-int ret;
-
-ret = kvm_ioctl(s, KVM_CHECK_EXTENSION, KVM_CAP_ASSIGN_DEV_IRQ);
-if (ret  0) {
-return kvm_vm_ioctl(s, KVM_ASSIGN_DEV_IRQ, assigned_irq);
-}
-
-return kvm_old_assign_irq(s, assigned_irq);
-}
-#else
-int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq)
-{
-return kvm_old_assign_irq(s, assigned_irq);
-}
-#endif
-#endif
-
 #if !defined(TARGET_I386)
 void kvm_arch_init_irq_routing(KVMState *s)
 {
diff --git a/qemu-kvm.h b/qemu-kvm.h
index ae7a33c..f7d9cd5 100644
--- a/qemu-kvm.h
+++ b/qemu-kvm.h
@@ -32,17 +32,6 @@
 
 #include kvm.h
 
-/*!
- * \brief Assign IRQ for an assigned device
- *
- * Used for PCI device assignment, this function assigns IRQ numbers for
- * an physical device and guest IRQ handling.
- *
- * \param kvm Pointer to the current kvm_context
- * \param assigned_irq Parameters, like dev id, host irq, guest irq, etc
- */
-int kvm_assign_irq(KVMState *s, struct kvm_assigned_irq *assigned_irq);
-
 #endif /* CONFIG_KVM */
 
 #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index e9353ed..67635af 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -2186,6 +2186,12 @@ int kvm_device_msix_set_vector(KVMState *s, uint32_t 
dev_id, uint32_t vector,
 return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, msix_entry);
 }
 
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id)
+{
+return kvm_assign_irq_internal(s, dev_id, KVM_DEV_IRQ_HOST_MSIX |
+  KVM_DEV_IRQ_GUEST_MSIX, 0);
+}
+
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
 {
 return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
index bd3b398..f6ab82f 100644
--- a/target-i386/kvm_i386.h
+++ b/target-i386/kvm_i386.h
@@ -32,6 +32,7 @@ int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
  uint32_t nr_vectors);
 int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
int virq);
+int kvm_device_msix_assign(KVMState *s, uint32_t dev_id);
 int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id);
 
 #endif
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 13/19] qemu-kvm: Kill qemu-kvm.[ch]

2012-08-16 Thread Jan Kiszka

Hurray!

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |1 -
 kvm-all.c  |3 ---
 kvm.h  |7 ---
 qemu-kvm.c |   37 -
 qemu-kvm.h |   37 -
 5 files changed, 0 insertions(+), 85 deletions(-)
 delete mode 100644 qemu-kvm.c
 delete mode 100644 qemu-kvm.h

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 32a082d..5ef5629 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -31,7 +31,6 @@
 #include sys/mman.h
 #include sys/types.h
 #include sys/stat.h
-#include qemu-kvm.h
 #include hw.h
 #include pc.h
 #include qemu-error.h
diff --git a/kvm-all.c b/kvm-all.c
index 8ab47f1..badf1d8 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2043,6 +2043,3 @@ int kvm_on_sigbus(int code, void *addr)
 {
 return kvm_arch_on_sigbus(code, addr);
 }
-
-#undef PAGE_SIZE
-#include qemu-kvm.c
diff --git a/kvm.h b/kvm.h
index 0c09be8..2a68a52 100644
--- a/kvm.h
+++ b/kvm.h
@@ -146,7 +146,6 @@ int kvm_set_signal_mask(CPUArchState *env, const sigset_t 
*sigset);
 
 int kvm_on_sigbus_vcpu(CPUArchState *env, int code, void *addr);
 int kvm_on_sigbus(int code, void *addr);
-#endif /* NEED_CPU_H */
 
 /* internal API */
 
@@ -154,7 +153,6 @@ int kvm_ioctl(KVMState *s, int type, ...);
 
 int kvm_vm_ioctl(KVMState *s, int type, ...);
 
-#ifdef NEED_CPU_H
 int kvm_vcpu_ioctl(CPUArchState *env, int type, ...);
 
 /* Arch specific hooks */
@@ -280,9 +278,4 @@ int kvm_irqchip_add_irqfd(KVMState *s, int fd, int virq);
 int kvm_irqchip_remove_irqfd(KVMState *s, int fd, int virq);
 int kvm_irqchip_add_irq_notifier(KVMState *s, EventNotifier *n, int virq);
 int kvm_irqchip_remove_irq_notifier(KVMState *s, EventNotifier *n, int virq);
-
-#ifdef NEED_CPU_H
-#include qemu-kvm.h
-#endif
-
 #endif
diff --git a/qemu-kvm.c b/qemu-kvm.c
deleted file mode 100644
index 3dc56ea..000
--- a/qemu-kvm.c
+++ /dev/null
@@ -1,37 +0,0 @@
-/*
- * qemu/kvm integration
- *
- * Copyright (C) 2006-2008 Qumranet Technologies
- *
- * Licensed under the terms of the GNU GPL version 2 or higher.
- */
-#include config.h
-#include config-host.h
-
-#include assert.h
-#include string.h
-#include hw/hw.h
-#include sysemu.h
-#include qemu-common.h
-#include console.h
-#include block.h
-#include compatfd.h
-#include gdbstub.h
-#include monitor.h
-#include cpus.h
-
-#include qemu-kvm.h
-
-#define EXPECTED_KVM_API_VERSION 12
-
-#if EXPECTED_KVM_API_VERSION != KVM_API_VERSION
-#error libkvm: userspace and kernel version mismatch
-#endif
-
-#define ALIGN(x, y) (((x)+(y)-1)  ~((y)-1))
-
-#if !defined(TARGET_I386)
-void kvm_arch_init_irq_routing(KVMState *s)
-{
-}
-#endif
diff --git a/qemu-kvm.h b/qemu-kvm.h
deleted file mode 100644
index f7d9cd5..000
--- a/qemu-kvm.h
+++ /dev/null
@@ -1,37 +0,0 @@
-/*
- * qemu/kvm integration
- *
- * Copyright (C) 2006-2008 Qumranet Technologies
- *
- * Licensed under the terms of the GNU GPL version 2 or higher.
- */
-#ifndef THE_ORIGINAL_AND_TRUE_QEMU_KVM_H
-#define THE_ORIGINAL_AND_TRUE_QEMU_KVM_H
-
-#include cpu.h
-
-#include signal.h
-#include stdlib.h
-
-#ifdef CONFIG_KVM
-
-#include stdint.h
-
-#ifndef __user
-#define __user   /* temporary, until installed via make headers_install */
-#endif
-
-#include linux/kvm.h
-
-#include signal.h
-
-/* FIXME: share this number with kvm */
-/* FIXME: or dynamically alloc/realloc regions */
-#define KVM_MAX_NUM_MEM_REGIONS 32u
-#define MAX_VCPUS 16
-
-#include kvm.h
-
-#endif /* CONFIG_KVM */
-
-#endif
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 18/19] pci-assign: Drop unused or write-only variables

2012-08-16 Thread Jan Kiszka

Remove remainders of previous refactorings.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/kvm/pci-assign.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/hw/kvm/pci-assign.c b/hw/kvm/pci-assign.c
index 29a4671..4f5daf3 100644
--- a/hw/kvm/pci-assign.c
+++ b/hw/kvm/pci-assign.c
@@ -120,13 +120,10 @@ typedef struct AssignedDevice {
 uint32_t dev_id;
 uint32_t features;
 int intpin;
-uint8_t debug_flags;
 AssignedDevRegion v_addrs[PCI_NUM_REGIONS - 1];
 PCIDevRegions real_device;
-int run;
 PCIINTxRoute intx_route;
 AssignedIRQType assigned_irq_type;
-int bound;
 struct {
 #define ASSIGNED_DEVICE_CAP_MSI (1  0)
 #define ASSIGNED_DEVICE_CAP_MSIX (1  1)
@@ -146,7 +143,6 @@ typedef struct AssignedDevice {
 MemoryRegion mmio;
 char *configfd_name;
 int32_t bootindex;
-QLIST_ENTRY(AssignedDevice) next;
 } AssignedDevice;
 
 static void assigned_dev_update_irq_routing(PCIDevice *dev);
@@ -699,8 +695,6 @@ again:
 return 0;
 }
 
-static QLIST_HEAD(, AssignedDevice) devs = QLIST_HEAD_INITIALIZER(devs);
-
 static void free_msi_virqs(AssignedDevice *dev)
 {
 int i;
@@ -1767,7 +1761,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 /* handle interrupt routing */
 e_intx = dev-dev.config[0x3d] - 1;
 dev-intpin = e_intx;
-dev-run = 0;
 dev-intx_route.mode = PCI_INTX_DISABLED;
 dev-intx_route.irq = -1;
 
@@ -1784,7 +1777,6 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 }
 
 assigned_dev_load_option_rom(dev);
-QLIST_INSERT_HEAD(devs, dev, next);
 
 add_boot_device_path(dev-bootindex, pci_dev-qdev, NULL);
 
@@ -1801,7 +1793,6 @@ static void assigned_exitfn(struct PCIDevice *pci_dev)
 {
 AssignedDevice *dev = DO_UPCAST(AssignedDevice, dev, pci_dev);
 
-QLIST_REMOVE(dev, next);
 deassign_device(dev);
 free_assigned_device(dev);
 }
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 03/19] pci-assign: Rename assign_irq to assign_intx

2012-08-16 Thread Jan Kiszka

The previous name may incorrectly suggest that this function assigns all
types of IRQs though it's only dealing with legacy interrupts.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/device-assignment.c |   14 +++---
 1 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 3f6196a..ee64c33 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -851,7 +851,7 @@ static int assign_device(AssignedDevice *dev)
 return r;
 }
 
-static int assign_irq(AssignedDevice *dev)
+static int assign_intx(AssignedDevice *dev)
 {
 struct kvm_assigned_irq assigned_irq_data;
 PCIINTxRoute intx_route;
@@ -881,7 +881,7 @@ static int assign_irq(AssignedDevice *dev)
 assigned_irq_data.flags = dev-irq_requested_type;
 r = kvm_deassign_irq(kvm_state, assigned_irq_data);
 if (r) {
-perror(assign_irq: deassign);
+perror(assign_intx: deassign);
 }
 dev-irq_requested_type = 0;
 }
@@ -943,7 +943,7 @@ static void assigned_dev_update_irq_routing(PCIDevice *dev)
 Error *err = NULL;
 int r;
 
-r = assign_irq(assigned_dev);
+r = assign_intx(assigned_dev);
 if (r  0) {
 qdev_unplug(dev-qdev, err);
 assert(!err);
@@ -1008,7 +1008,7 @@ static void assigned_dev_update_msi(PCIDevice *pci_dev)
 assigned_dev-intx_route.irq = -1;
 assigned_dev-irq_requested_type = assigned_irq_data.flags;
 } else {
-assign_irq(assigned_dev);
+assign_intx(assigned_dev);
 }
 }
 
@@ -1141,7 +1141,7 @@ static void assigned_dev_update_msix(PCIDevice *pci_dev)
 assigned_dev-intx_route.irq = -1;
 assigned_dev-irq_requested_type = assigned_irq_data.flags;
 } else {
-assign_irq(assigned_dev);
+assign_intx(assigned_dev);
 }
 }
 
@@ -1769,8 +1769,8 @@ static int assigned_initfn(struct PCIDevice *pci_dev)
 if (r  0)
 goto out;
 
-/* assign irq for the device */
-r = assign_irq(dev);
+/* assign legacy INTx to the device */
+r = assign_intx(dev);
 if (r  0)
 goto assigned_out;
 
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 15/19] pci-assign: Move and rename source file

2012-08-16 Thread Jan Kiszka

Move device-assignment.c into hw/kvm, calling it pci-assign.c now, just
like its device name.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---
 hw/i386/Makefile.objs|3 ---
 hw/kvm/Makefile.objs |2 +-
 hw/{device-assignment.c = kvm/pci-assign.c} |   10 +-
 3 files changed, 6 insertions(+), 9 deletions(-)
 rename hw/{device-assignment.c = kvm/pci-assign.c} (99%)

diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs
index 29f3e6f..523e224 100644
--- a/hw/i386/Makefile.objs
+++ b/hw/i386/Makefile.objs
@@ -14,7 +14,4 @@ obj-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
 obj-y += testdev.o
 obj-y += acpi.o acpi_piix4.o
 
-obj-$(CONFIG_KVM) += device-assignment.o
-
-
 obj-y := $(addprefix ../,$(obj-y))
diff --git a/hw/kvm/Makefile.objs b/hw/kvm/Makefile.objs
index 226497a..f620d7f 100644
--- a/hw/kvm/Makefile.objs
+++ b/hw/kvm/Makefile.objs
@@ -1 +1 @@
-obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o
+obj-$(CONFIG_KVM) += clock.o apic.o i8259.o ioapic.o i8254.o pci-assign.o
diff --git a/hw/device-assignment.c b/hw/kvm/pci-assign.c
similarity index 99%
rename from hw/device-assignment.c
rename to hw/kvm/pci-assign.c
index 5ef5629..3611539 100644
--- a/hw/device-assignment.c
+++ b/hw/kvm/pci-assign.c
@@ -31,16 +31,16 @@
 #include sys/mman.h
 #include sys/types.h
 #include sys/stat.h
-#include hw.h
-#include pc.h
+#include hw/hw.h
+#include hw/pc.h
 #include qemu-error.h
 #include console.h
-#include loader.h
+#include hw/loader.h
 #include monitor.h
 #include range.h
 #include sysemu.h
-#include pci.h
-#include msi.h
+#include hw/pci.h
+#include hw/msi.h
 #include kvm_i386.h
 
 #define MSIX_PAGE_SIZE 0x1000
-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 00/19] pci-assign: Refactor for upstream merge

2012-08-16 Thread Jan Kiszka

With this series, we are getting very close to obsoleting qemu-kvm. It
refactors hw/device-assignment.c and the associated KVM helper functions
into a form that should allow merging them into QEMU. Once the series is
acceptable for qemu-kvm, I will break out the necessary uq/master
patches and push pci-assign to upstream.

The major step of this series is to define a regular set of kvm_device_*
services that encapsulate classic (i.e. KVM-based, non-VFIO) device
assignment features and export them to i386 targets only. There will
never be another arch using them, therefore I pushed them into this
corner. Moreover, the device assignment device now makes use of the new
KVM IRQ/MSI routing API and no longer pokes into the internals of that
layer. Finally, I moved the code into hw/kvm/pci-assign.c, dropped the
superfluous configure option and did some basic code cleanups (mostly
coding style) to bring things in shape.

Note that patch 1 is a simple bug fix that should likely be applied for
qemu-kvm-1.2 independently.

This series depends on [1] and [2] and QEMU upstream (2b97f88c92) being
merged into qemu-kvm.

Please review.

[1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/95528
[2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/95522

Jan Kiszka (19):
  pci-assign: Only clean up registered IO resources
  pci-assign: Factor out kvm_device_pci_assign/deassign
  pci-assign: Rename assign_irq to assign_intx
  pci-assign: Refactor interrupt deassignment
  pci-assign: Factor out kvm_device_intx_assign
  qemu-kvm: Move kvm_device_intx_set_mask service
  pci-assign: Rework MSI assignment
  pci-assign: Factor out kvm_device_msix_supported
  pci-assign: Replace kvm_assign_set_msix_nr with
kvm_device_msix_init_vectors
  pci-assign: Replace kvm_assign_set_msix_entry with
kvm_device_msix_set_vector
  pci-assign: Rework MSI-X route setup
  pci-assign: Factor out kvm_device_msix_assign
  qemu-kvm: Kill qemu-kvm.[ch]
  pci-assign: Drop configure switches
  pci-assign: Move and rename source file
  pci-assign: Fix coding style issues
  pci-assign: Replace exit() with hw_error()
  pci-assign: Drop unused or write-only variables
  pci-assign: Gracefully handle missing in-kernel irqchip support

 configure|   11 -
 hw/i386/Makefile.objs|3 -
 hw/kvm/Makefile.objs |2 +-
 hw/{device-assignment.c = kvm/pci-assign.c} |  502 +-
 kvm-all.c|   54 +++-
 kvm-stub.c   |9 -
 kvm.h|   12 +-
 qemu-kvm.c   |  233 
 qemu-kvm.h   |  112 --
 target-i386/kvm.c|  142 
 target-i386/kvm_i386.h   |   22 ++
 11 files changed, 461 insertions(+), 641 deletions(-)
 rename hw/{device-assignment.c = kvm/pci-assign.c} (84%)
 delete mode 100644 qemu-kvm.c
 delete mode 100644 qemu-kvm.h

-- 
1.7.3.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH][Autotest] client.shared: Adds VersionableClass.

2012-08-16 Thread Jiří Župka

VersionableClass provides class hierarchy which automatically select
right version of class. Class manipulation is used for this reason.

Closer description is in autotest/client/shared/base_utils.py
   class VersionableClass

Working example is in autotest/client/shared/base_utils_unittest.py

pull-request: https://github.com/autotest/autotest/pull/519

Signed-off-by: Jiří Župka jzu...@redhat.com
---
 client/shared/base_utils.py  |  263 ++
 client/shared/base_utils_unittest.py |  233 ++
 2 files changed, 496 insertions(+), 0 deletions(-)

diff --git a/client/shared/base_utils.py b/client/shared/base_utils.py
index b2064e1..e38baf6 100644
--- a/client/shared/base_utils.py
+++ b/client/shared/base_utils.py
@@ -2145,3 +2145,266 @@ def generate_random_string(length, 
ignore_str=string.punctuation,
 str += tmp
 length -= 1
 return str
+
+
+class VersionableClass(object):
+
+VersionableClass provides class hierarchy which automatically select right
+version of class. Class manipulation is used for this reason.
+By this reason is:
+Advantage) Only one version is working in one process. Class is changed in
+   whole process.
+Disadvantage) Only one version is working in one process.
+
+Example of usage (in base_utils_unittest):
+
+class FooC(object):
+pass
+
+#Not implemented get_version - not used for versioning.
+class VCP(FooC, VersionableClass):
+def __new__(cls, *args, **kargs):
+VCP.master_class = VCP
+return super(VCP, cls).__new__(cls, *args, **kargs)
+
+def foo(self):
+pass
+
+class VC2(VCP, VersionableClass):
+@staticmethod
+def get_version():
+return get_version_from_system
+
+@classmethod
+def is_right_version(cls, version):
+if version is not None:
+if version is satisfied:
+return True
+return False
+
+def func1(self):
+print func1
+
+def func2(self):
+print func2
+
+# get_version could be inherited.
+class VC3(VC2, VersionableClass):
+@classmethod
+def is_right_version(cls, version):
+if version is not None:
+if version+1 is satisfied:
+return True
+return False
+
+def func2(self):
+print func2_2
+
+class M(VCP):
+pass
+
+m = M()   # - When class is constructed the right version is
+  #automatically selected. In this case VC3 is selected.
+m.func2() # call VC3.func2(m)
+m.func1() # call VC2.func1(m)
+m.foo()   # call VC1.foo(m)
+
+# When controlled program version is changed then is necessary call
+ check_repair_versions or recreate object.
+
+m.check_repair_versions()
+
+# priority of class. (change place where is method searched first in group
+# of verisonable class.)
+
+class PP(VersionableClass):
+def __new__(cls, *args, **kargs):
+PP.master_class = PP
+return super(PP, cls).__new__(cls, *args, **kargs)
+
+class PP2(PP, VersionableClass):
+@staticmethod
+def get_version():
+return get_version_from_system
+
+@classmethod
+def is_right_version(cls, version):
+if version is not None:
+if version is satisfied:
+return True
+return False
+
+def func1(self):
+print PP func1
+
+class N(VCP, PP):
+pass
+
+n = N()
+
+n.func1() # - func2
+
+n.set_priority_class(PP, [VCP, PP])
+
+n.func1() # - PP func1
+
+Necessary for using:
+1) Subclass of versionable class must have implemented class methods
+  get_version and is_right_version. These two methods are necessary
+  for correct version section. Class without this method will be never
+  chosen like suitable class.
+
+2) Every class derived from master_class have to add to class definition
+  inheritance from VersionableClass. Direct inheritance from Versionable
+  Class is use like a mark for manipulation with VersionableClass.
+
+3) Master of VersionableClass have to defined class variable
+  cls.master_class.
+
+def __new__(cls, *args, **kargs):
+cls.check_repair_versions()
+return super(VersionableClass, cls).__new__(cls, *args, **kargs)
+
+#VersionableClass class management class.
+
+@classmethod
+def check_repair_versions(cls, master_classes=None):
+
+Check version of versionable class and if version not
+match repair version to correct version.
+
+@param master_classes: Check and repair only master_class.
+@type master_classes: list.
+
+if master_classes is None:
+master_classes =

Re: [Qemu-devel] Windows slow boot: contractor wanted

2012-08-16 Thread Benoît Canet

Le Thursday 16 Aug 2012 à 11:47:27 (+0100), Richard Davies a écrit :
 Hi,
 
 We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
 contractor to track down and fix problems we have with large memory Windows
 guests booting very slowly - they can take several hours.
 
 We previously reported these problems in July (copied below) and they are
 still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
 
 This is a serious issue for us which is causing significant pain to our
 larger Windows VM customers when their servers are offline for many hours
 during boot.
 
 If anyone knowledgeable in the area would be interested in being paid to
 work on this, or if you know someone who might be, I would be delighted to
 hear from you.
 
 Cheers,
 
 Richard.
 
 
 = Previous bug report
 
 http://marc.info/?l=qemu-develm=134304194329745
 
 
 We have been experiencing this problem for a while now too, using qemu-kvm
 (currently at 1.1.1).
 
 Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
 produces the issue:
 
 qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus 
 -usbdevice tablet -vnc :99 -monitor stdio -hda test.img
 
 The hardware consists of dual AMD Opteron 6128 processors (16 cores in
 total) and 64GB of memory. This command line was tested on kernel 3.1.4. 
 
 I've also tested with -no-hpet.
 
 What I have seen is much as described: the memory fills out slowly, and top
 on the host will show the process using 100% on all allocated CPU cores. The
 most extreme case was a machine which took something between 6 and 8 hours
 to boot.
 
 This seems to be related to the assigned memory, as described, but also the
 number of processor cores (which makes sense if we believe it's a timing
 issue?). I have seen slow-booting guests improved by switching down to a
 single or even two cores.
 
 Matthew, I agree that this seems to be linked to the number of VMs running -
 in fact, shutting down other VMs on a dedicated test host caused the machine
 to start booting at a normal speed (with no reboot required).
 
 However, the level of contention is never such that this could be explained
 by the host simply being overcommitted.
 
 If it helps anyone, there's an image of the hard drive I've been using to
 test at:
 
 http://46.20.114.253/
 
 It's 5G of gzip file containing a fairly standard Windows 2008 trial
 installation. Since it's in the trial period, anyone who wants to use it may
 have to re-arm the trial: http://support.microsoft.com/kb/948472
 
 Please let me know if I can provide any more information, or test anything.

For info the image boot pretty fast with qemu-kvm 1.1.1 and a 3.2.0-29 ubuntu 
kernel
on a core i7 with these parameters.

Benoît

 
 Best wishes,
 
 Owen Tuz
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: perf uncore lkvm woes

2012-08-16 Thread David Ahern


On 8/16/12 5:17 AM, Peter Zijlstra wrote:

If you don't mind printing a warning every time a Linux guest boots ;-)


That feature already exists today for perf related probing. e.g.,

[585929.678746] kvm [10752]: vcpu0 unhandled rdmsr: 0x345
[585929.709870] kvm_set_msr_common: 54 callbacks suppressed
[585929.709986] kvm [10752]: vcpu0 unhandled wrmsr: 0x680 data 0
[585929.710104] kvm [10752]: vcpu0 unhandled wrmsr: 0x6c0 data 0
[585929.710221] kvm [10752]: vcpu0 unhandled wrmsr: 0x681 data 0
[585929.710352] kvm [10752]: vcpu0 unhandled wrmsr: 0x6c1 data 0
[585929.710467] kvm [10752]: vcpu0 unhandled wrmsr: 0x682 data 0
[585929.710581] kvm [10752]: vcpu0 unhandled wrmsr: 0x6c2 data 0
[585929.710707] kvm [10752]: vcpu0 unhandled wrmsr: 0x683 data 0
[585929.710822] kvm [10752]: vcpu0 unhandled wrmsr: 0x6c3 data 0
[585929.710937] kvm [10752]: vcpu0 unhandled wrmsr: 0x684 data 0
[585929.711052] kvm [10752]: vcpu0 unhandled wrmsr: 0x6c4 data 0


David
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/19] pci-assign: Refactor for upstream merge

2012-08-16 Thread Avi Kivity

On 08/16/2012 04:54 PM, Jan Kiszka wrote:
 With this series, we are getting very close to obsoleting qemu-kvm. It
 refactors hw/device-assignment.c and the associated KVM helper functions
 into a form that should allow merging them into QEMU. Once the series is
 acceptable for qemu-kvm, I will break out the necessary uq/master
 patches and push pci-assign to upstream.
 
 The major step of this series is to define a regular set of kvm_device_*
 services that encapsulate classic (i.e. KVM-based, non-VFIO) device
 assignment features and export them to i386 targets only. There will
 never be another arch using them, therefore I pushed them into this
 corner. Moreover, the device assignment device now makes use of the new
 KVM IRQ/MSI routing API and no longer pokes into the internals of that
 layer. Finally, I moved the code into hw/kvm/pci-assign.c, dropped the
 superfluous configure option and did some basic code cleanups (mostly
 coding style) to bring things in shape.
 
 Note that patch 1 is a simple bug fix that should likely be applied for
 qemu-kvm-1.2 independently.
 
 This series depends on [1] and [2] and QEMU upstream (2b97f88c92) being
 merged into qemu-kvm.
 
 Please review.

From a quick review it looks ready to merge.  Of course I'd appreciate a
review from Alex or Michael as well.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/19] pci-assign: Refactor for upstream merge

2012-08-16 Thread Jan Kiszka

On 2012-08-16 16:34, Avi Kivity wrote:
 On 08/16/2012 04:54 PM, Jan Kiszka wrote:
 With this series, we are getting very close to obsoleting qemu-kvm. It
 refactors hw/device-assignment.c and the associated KVM helper functions
 into a form that should allow merging them into QEMU. Once the series is
 acceptable for qemu-kvm, I will break out the necessary uq/master
 patches and push pci-assign to upstream.

 The major step of this series is to define a regular set of kvm_device_*
 services that encapsulate classic (i.e. KVM-based, non-VFIO) device
 assignment features and export them to i386 targets only. There will
 never be another arch using them, therefore I pushed them into this
 corner. Moreover, the device assignment device now makes use of the new
 KVM IRQ/MSI routing API and no longer pokes into the internals of that
 layer. Finally, I moved the code into hw/kvm/pci-assign.c, dropped the
 superfluous configure option and did some basic code cleanups (mostly
 coding style) to bring things in shape.

 Note that patch 1 is a simple bug fix that should likely be applied for
 qemu-kvm-1.2 independently.

 This series depends on [1] and [2] and QEMU upstream (2b97f88c92) being
 merged into qemu-kvm.

 Please review.
 
From a quick review it looks ready to merge.  Of course I'd appreciate a
 review from Alex or Michael as well.

Great, thanks.

FWIW, as the upstream integration is technically trivial now, I pushed a
preview to

git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream

Will send out once reviews for this series are done.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [User question] Huge buffer size on KVM host

2012-08-16 Thread Martin Wawro


On Aug 15, 2012, at 2:57 PM, Avi Kivity wrote:

 
 We are using logical volumes and the cache is set to 'none'.
 
 Strange, that should work without any buffering.
 
 What the contents of
 
  /sys/block/sda/queue/hw_sector_size
 
 and
 
  /sys/block/sda/queue/logical_block_size
 
 ?
 

Hi Avi,

It seems that the kernel on that particular machine is too old, those entries 
are
not featured. We checked on a comparable setup with a newer kernel and both 
entries
were set to 512.

We also did have a third more thorough look on the caching. It turns out that 
the
virt-manager does not seem to honor the caching adjusted in the GUI correctly.
We disabled caching on all virtual devices for this particular VM and checking
with ps -fxal revealed, that only one of those devices (and a rather small 
one too)
had this set. We corrected this in the XML file directly and the buffer size
currently resides at around 1.8 GB after rebooting the VM (the only virtio 
device
not having the cache=none option set is now the (non-mounted) cdrom).


Best regards,
-- 


Martin Wawro   | Digital Medics GmbH 
Managing Director  |  Otto-Hahn-Str. 15, 44227 Dortmund, Germany
Tel. +49-231-9742-6622 |  Fax: +49-231-9742-6623
Key: 0xB0A225BD|Registered at AG Dortmund, HRB 19360




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v3] KVM: x86 emulator: access GPRs on demand

2012-08-16 Thread Avi Kivity

Instead of populating the the entire register file, read in registers
as they are accessed, and write back only the modified ones.  This
saves a VMREAD and VMWRITE on Intel (for rsp, since it is not usually
used during emulation), and a two 128-byte copies for the registers.

Signed-off-by: Avi Kivity a...@redhat.com
---

v3:
  fix misplaced parentheses in em_loop() and em_jcxz(), unbreaking those 
instructions.

v2:
  add APIs for managing the register cache.  This reduces the potential for 
confusion
between ctxt-regs_dirty and vcpu-arch.regs_dirty.
  move cache management to the entry points
  add missing writebacks to int and task switch emulation


 arch/x86/include/asm/kvm_emulate.h |  20 ++-
 arch/x86/kvm/emulate.c | 305 ++---
 arch/x86/kvm/x86.c |  45 +++---
 3 files changed, 223 insertions(+), 147 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h 
b/arch/x86/include/asm/kvm_emulate.h
index c764f43..282aee5 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -86,6 +86,19 @@ struct x86_instruction_info {
 
 struct x86_emulate_ops {
/*
+* read_gpr: read a general purpose register (rax - r15)
+*
+* @reg: gpr number.
+*/
+   ulong (*read_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg);
+   /*
+* write_gpr: write a general purpose register (rax - r15)
+*
+* @reg: gpr number.
+* @val: value to write.
+*/
+   void (*write_gpr)(struct x86_emulate_ctxt *ctxt, unsigned reg, ulong 
val);
+   /*
 * read_std: Read bytes of standard (non-emulated/special) memory.
 *   Used for descriptor reading.
 *  @addr:  [IN ] Linear address from which to read.
@@ -281,8 +294,10 @@ struct x86_emulate_ctxt {
bool rip_relative;
unsigned long _eip;
struct operand memop;
+   u32 regs_valid;  /* bitmaps of registers in _regs[] that can be read */
+   u32 regs_dirty;  /* bitmaps of registers in _regs[] that have been 
written */
/* Fields above regs are cleared together. */
-   unsigned long regs[NR_VCPU_REGS];
+   unsigned long _regs[NR_VCPU_REGS];
struct operand *memopp;
struct fetch_cache fetch;
struct read_cache io_read;
@@ -394,4 +409,7 @@ int emulator_task_switch(struct x86_emulate_ctxt *ctxt,
 u16 tss_selector, int idt_index, int reason,
 bool has_error_code, u32 error_code);
 int emulate_int_real(struct x86_emulate_ctxt *ctxt, int irq);
+void emulator_invalidate_register_cache(struct x86_emulate_ctxt *ctxt);
+void emulator_writeback_register_cache(struct x86_emulate_ctxt *ctxt);
+
 #endif /* _ASM_X86_KVM_X86_EMULATE_H */
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 10f0136..c6a6f7f 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -202,6 +202,42 @@ struct gprefix {
 #define EFLG_RESERVED_ZEROS_MASK 0xffc0802a
 #define EFLG_RESERVED_ONE_MASK 2
 
+static ulong reg_read(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+   if (!(ctxt-regs_valid  (1  nr))) {
+   ctxt-regs_valid |= 1  nr;
+   ctxt-_regs[nr] = ctxt-ops-read_gpr(ctxt, nr);
+   }
+   return ctxt-_regs[nr];
+}
+
+static ulong *reg_write(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+   ctxt-regs_valid |= 1  nr;
+   ctxt-regs_dirty |= 1  nr;
+   return ctxt-_regs[nr];
+}
+
+static ulong *reg_rmw(struct x86_emulate_ctxt *ctxt, unsigned nr)
+{
+   reg_read(ctxt, nr);
+   return reg_write(ctxt, nr);
+}
+
+static void writeback_registers(struct x86_emulate_ctxt *ctxt)
+{
+   unsigned reg;
+
+   for_each_set_bit(reg, (ulong *)ctxt-regs_dirty, 16)
+   ctxt-ops-write_gpr(ctxt, reg, ctxt-_regs[reg]);
+}
+
+static void invalidate_registers(struct x86_emulate_ctxt *ctxt)
+{
+   ctxt-regs_dirty = 0;
+   ctxt-regs_valid = 0;
+}
+
 /*
  * Instruction emulation:
  * Most instructions are emulated directly via a fragment of inline assembly
@@ -374,8 +410,8 @@ struct gprefix {
 #define __emulate_1op_rax_rdx(ctxt, _op, _suffix, _ex) \
do {\
unsigned long _tmp; \
-   ulong *rax = (ctxt)-regs[VCPU_REGS_RAX];  \
-   ulong *rdx = (ctxt)-regs[VCPU_REGS_RDX];  \
+   ulong *rax = reg_rmw((ctxt), VCPU_REGS_RAX);\
+   ulong *rdx = reg_rmw((ctxt), VCPU_REGS_RDX);\
\
__asm__ __volatile__ (  \
_PRE_EFLAGS(0, 5, 1)  \
@@ -773,14 +809,15 @@ static int do_insn_fetch(struct x86_emulate_ctxt *ctxt,
  * pointer into the block that

Re: [User question] Huge buffer size on KVM host

2012-08-16 Thread Avi Kivity

On 08/16/2012 05:54 PM, Martin Wawro wrote:
 
 On Aug 15, 2012, at 2:57 PM, Avi Kivity wrote:
 
 
 We are using logical volumes and the cache is set to 'none'.
 
 Strange, that should work without any buffering.
 
 What the contents of
 
  /sys/block/sda/queue/hw_sector_size
 
 and
 
  /sys/block/sda/queue/logical_block_size
 
 ?
 
 
 Hi Avi,
 
 It seems that the kernel on that particular machine is too old, those entries 
 are
 not featured. We checked on a comparable setup with a newer kernel and both 
 entries
 were set to 512.
 
 We also did have a third more thorough look on the caching. It turns out that 
 the
 virt-manager does not seem to honor the caching adjusted in the GUI correctly.
 We disabled caching on all virtual devices for this particular VM and checking
 with ps -fxal revealed, that only one of those devices (and a rather small 
 one too)
 had this set. We corrected this in the XML file directly and the buffer size
 currently resides at around 1.8 GB after rebooting the VM (the only virtio 
 device
 not having the cache=none option set is now the (non-mounted) cdrom).
 

cc += libvirt-list

Is there a reason that cdroms don't get cache=none?


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vm pxe fail

2012-08-16 Thread Andrew Holway


On Aug 16, 2012, at 3:54 PM, Stefan Hajnoczi wrote:

 On Thu, Aug 16, 2012 at 1:25 PM, Andrew Holway a.hol...@syseleven.de wrote:
 I have a kvm vm that I am attempting to boot from pxe. The dhcp works 
 perfectly and I can see the VM in the pxe server arp. but the tftp just 
 times out. I don't see any tftp traffic on either the physical host or on 
 the pie server. I am using a bridged interface. I have tried using several 
 virtual nic drivers, several different mac addresses and several different 
 ips.  on the physical host I can get the pxelinux.0 file from the pxe server 
 via tftp and can clearly see that traffic with tcpdump.
 
 Ive tried using various virtual interfaces.
 
 I can pxe boot my physical hosts with no problems.
 
 I can tftp fine from the physical host and see the traffic with ethdump
 
 Have you run tcpdump on the tap interface?  (This is different from
 running tcpdump on host eth0 because it is earlier in the network path
 and happens before the software bridge.)

Yes. I can just see DHCP traffic.

 
 What do iptables -L -n and ebtables -L say?
 

[root@node002 ~]# iptables -L -n 
Chain INPUT (policy ACCEPT)
target prot opt source   destination 
ACCEPT udp  --  0.0.0.0/00.0.0.0/0   udp dpt:53 
ACCEPT tcp  --  0.0.0.0/00.0.0.0/0   tcp dpt:53 
ACCEPT udp  --  0.0.0.0/00.0.0.0/0   udp dpt:67 
ACCEPT tcp  --  0.0.0.0/00.0.0.0/0   tcp dpt:67 

Chain FORWARD (policy ACCEPT)
target prot opt source   destination 
ACCEPT all  --  0.0.0.0/0192.168.122.0/24state 
RELATED,ESTABLISHED 
ACCEPT all  --  192.168.122.0/24 0.0.0.0/0   
ACCEPT all  --  0.0.0.0/00.0.0.0/0   
REJECT all  --  0.0.0.0/00.0.0.0/0   reject-with 
icmp-port-unreachable 
REJECT all  --  0.0.0.0/00.0.0.0/0   reject-with 
icmp-port-unreachable 

Chain OUTPUT (policy ACCEPT)
target prot opt source   destination  


[root@node002 ~]# ebtables -L
Bridge table: filter

Bridge chain: INPUT, entries: 0, policy: ACCEPT

Bridge chain: FORWARD, entries: 0, policy: ACCEPT

Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

[root@node002 ~]# tcpdump -i vnet0 udp
tcpdump: WARNING: vnet0: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:08:08.849344 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, Request 
from 00:00:00:00:00:0d (oui Ethernet), length 387
17:08:08.849413 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, Request 
from 00:00:00:00:00:0d (oui Ethernet), length 387
17:08:08.849661 IP master.cm.cluster.bootps  255.255.255.255.bootpc: 
BOOTP/DHCP, Reply, length 360
17:08:09.812645 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, Request 
from 00:00:00:00:00:0d (oui Ethernet), length 387
17:08:09.812709 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, Request 
from 00:00:00:00:00:0d (oui Ethernet), length 387
17:08:09.812903 IP master.cm.cluster.bootps  255.255.255.255.bootpc: 
BOOTP/DHCP, Reply, length 360
17:08:11.789993 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, Request 
from 00:00:00:00:00:0d (oui Ethernet), length 399
17:08:11.790107 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, Request 
from 00:00:00:00:00:0d (oui Ethernet), length 399
17:08:11.790294 IP master.cm.cluster.bootps  255.255.255.255.bootpc: 
BOOTP/DHCP, Reply, length 360


And then…….silence!



 Stefan


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 00/14] KVM/ARM Implementation

2012-08-16 Thread Christoffer Dall

The following series implements KVM support for ARM processors,
specifically on the Cortex A-15 platform.  Work is done in
collaboration between Columbia University, Virtual Open Systems and
ARM/Linaro.

The patch series applies to kvm/next, specifically commit:
 dbcb4e798072d114fe68813f39a9efd239ab99c0

This is Version 10 of the patch series, but the first two versions
were reviewed outside of the KVM mailing list. Changes can also be
pulled from:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v10

A non-flattened edition of the patch series can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-a15-v10-stage

WARNING: This patch series release breaks compatibility with QEMU as it
worked with kvm-a15-v9 due to the new reset and set target API.  Please
use the latest Linaro master branch (or the mirror from here):
 git://github.com/virtualopensystems/qemu.git kvm-a15-v10

The implementation is broken up into a logical set of patches, the first
are preparatory patches:
  1. ARM: Add mem_type prot_pte accessor
  2. ARM: ARM_VIRT_EXT config option
  3. ARM: Section based HYP idmaps
  4. ARM: Expose PMNC bitfields for KVM use

KVM guys, please consider pulling the KVM generic patches as early as
possible. Thanks.

The main implementation is broken up into separate patches, the first
containing a skeleton of files, makefile changes, the basic user space
interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
  5. Skeleton and reset hooks
  6. Hypervisor initialization
  7. Memory virtualization setup (hyp mode mappings and 2nd stage)
  8. Inject IRQs and FIQs from userspace
  9. World-switch implementation and Hyp exception vectors
 10. Emulation framework and coproc emulation
 11. Coproc user space API
 12. Handle guest user memory aborts
 13. Handle guest MMIO aborts
 14. Support guest wait-for-interrupt instructions

Testing:
Limited testing, but have run GCC inside guest, which compiled a small
hello-world program, which was successfully run. For v10 both ARM/Thumb-2
kernels were tested as both host/guest and both a compiled-in version
and a kernel module version of KVM was tested. Hardware still
unavailable to me, so all testing has been done on ARM Fast Models.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf


Additionally a few major milestones are coming up shortly:
 - Support Thumb MMIO emulation and test MMIO emulation code (under way)
 - Merge Marc Zyngier's patch series for VGIC and timers (review in
   progress)
 - Change from SMC based install to relying on booting the kernel in Hyp
   mode. (review of patches from Marc Zyngier underway)

Changes since v9:
 - Addressed reviewer comments (see mailing list archive)
 - Limit the user of .arch_extensiion sec/virt for compilers that need them
 - VFP/Neon Support (Antonios Motakis)
 - Run exit handling under preemption and still handle guest cache ops
 - Add support for IO mapping at Hyp level (VGIC prep)
 - Add support for IO mapping at Guest level (VGIC prep)
 - Remove backdoor call to irq_svc
 - Complete rework of CP15 handling and register reset (Rusty Russell)
 - Don't use HSTR for anything else than CR 15
 - New ioctl to set emulation target core (only A15 supported for now)
 - Support KVM_GET_MSRS / KVM_SET_MSRS
 - Add page accounting and page table eviction
 - Change pgd lock to spinlock and fix sleeping in atomic bugs
 - Check kvm_condition_valid for HVC traps of undefs
 - Added a naive implementation of kvm_unmap_hva_range

Changes since v8:
 - Support cache maintenance on SMP through set/way
 - Hyp mode idmaps are now section based and happen at kernel init
 - Handle aborts in Hyp mode
 - Inject undefined exceptions into the guest on error
 - Kernel-side reset of all crucial registers
 - Specifically state which target CPU is being virtualized
 - Exit statistics in debugfs
 - Some L2CTLR cp15 emulation cleanups
 - Support spte_hva for MMU notifiers and take write faults
 - FIX: Race condition in VMID generation
 - BUG: Run exit handling code with disabled preemption
 - Save/Restore abort fault register during world switch

Changes since v7:
 - Traps accesses to ACTLR
 - Do not trap WFE execution
 - Upgrade barriers and TLB operations to inner-shareable domain
 - Restrucure hyp_pgd related code to be more opaque
 - Random SMP fixes
 - Random BUG fixes
 - Improve commenting
 - Support module loading/unloading of KVM/ARM
 - Thumb-2 support for host kernel and KVM
 - Unaligned cross-page wide guest Thumb instruction fetching
 - Support ITSTATE fields in CPSR for Thumb guests
 - Document HCR settings

Changes since v6:
 - Support for MMU notifiers to not pin user pages in memory
 - Suport build with log debugging
 - Bugfix: v6 clobbered r7 in init code
 - Simplify hyp code mapping
 - Cleanup of register access code
 - Table-based CP15 emulation from Rusty

[PATCH v10 01/14] ARM: add mem_type prot_pte accessor

2012-08-16 Thread Christoffer Dall

From: Marc Zyngier marc.zyng...@arm.com

The KVM hypervisor mmu code requires access to the mem_type prot_pte
field when setting up page tables pointing to a device. Unfortunately,
the mem_type structure is opaque.

Add an accessor (get_mem_type_prot_pte()) to retrieve the prot_pte
value.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/mach/map.h |1 +
 arch/arm/mm/mmu.c   |6 ++
 2 files changed, 7 insertions(+)

diff --git a/arch/arm/include/asm/mach/map.h b/arch/arm/include/asm/mach/map.h
index a6efcdd..3787c9f 100644
--- a/arch/arm/include/asm/mach/map.h
+++ b/arch/arm/include/asm/mach/map.h
@@ -37,6 +37,7 @@ extern void iotable_init(struct map_desc *, int);
 
 struct mem_type;
 extern const struct mem_type *get_mem_type(unsigned int type);
+extern pteval_t get_mem_type_prot_pte(unsigned int type);
 /*
  * external interface to remap single page with appropriate type
  */
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 4c2d045..76bf4f5 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -301,6 +301,12 @@ const struct mem_type *get_mem_type(unsigned int type)
 }
 EXPORT_SYMBOL(get_mem_type);
 
+pteval_t get_mem_type_prot_pte(unsigned int type)
+{
+   return get_mem_type(type)-prot_pte;
+}
+EXPORT_SYMBOL(get_mem_type_prot_pte);
+
 /*
  * Adjust the PMD section entries according to the CPU in use.
  */

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 02/14] ARM: Add config option ARM_VIRT_EXT

2012-08-16 Thread Christoffer Dall

Select this option for ARM processors equipped with hardware
Virtualization Extensions.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/mm/Kconfig |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/arm/mm/Kconfig b/arch/arm/mm/Kconfig
index 101b968..037dc53 100644
--- a/arch/arm/mm/Kconfig
+++ b/arch/arm/mm/Kconfig
@@ -597,6 +597,16 @@ config ARM_LPAE
 
  If unsure, say N.
 
+config ARM_VIRT_EXT
+   bool Support for ARM Virtualization Extensions
+   depends on ARM_LPAE
+   help
+ Say Y if you have an ARMv7 processor supporting the ARM hardware
+ Virtualization extensions. KVM depends on this feature and will
+ not run without it being selected. If you say Y here, the kernel
+ will not boot on a machine without virtualization extensions and
+ will not boot as a KVM guest.
+
 config ARCH_PHYS_ADDR_T_64BIT
def_bool ARM_LPAE
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 03/14] ARM: Section based HYP idmap

2012-08-16 Thread Christoffer Dall

Add a HYP pgd to the core code (so it can benefit all Linux
hypervisors).

Populate this pgd with an identity mapping of the code contained
in the .hyp.idmap.text section

Offer a method to drop the this identity mapping through
hyp_idmap_teardown and re-create it through hyp_idmap_setup.

Make all the above depend on CONFIG_ARM_VIRT_EXT

Cc: Will Deacon will.dea...@arm.com
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/idmap.h|7 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |1 
 arch/arm/kernel/vmlinux.lds.S   |6 ++
 arch/arm/mm/idmap.c |   88 +++
 4 files changed, 89 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index bf863ed..a1ab8d6 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -11,4 +11,11 @@ extern pgd_t *idmap_pgd;
 
 void setup_mm_for_reboot(void);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+extern pgd_t *hyp_pgd;
+
+void hyp_idmap_teardown(void);
+void hyp_idmap_setup(void);
+#endif
+
 #endif /* __ASM_IDMAP_H */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h 
b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN(_AT(pmdval_t, 1)  54)
 #define PMD_SECT_AP_WRITE  (_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ   (_AT(pmdval_t, 0))
+#define PMD_SECT_AP1   (_AT(pmdval_t, 1)  6)
 #define PMD_SECT_TEX(x)(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 36ff15b..12fd2eb 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -19,7 +19,11 @@
ALIGN_FUNCTION();   \
VMLINUX_SYMBOL(__idmap_text_start) = .; \
*(.idmap.text)  \
-   VMLINUX_SYMBOL(__idmap_text_end) = .;
+   VMLINUX_SYMBOL(__idmap_text_end) = .;   \
+   ALIGN_FUNCTION();   \
+   VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
+   *(.hyp.idmap.text)  \
+   VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..7a944af 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,4 +1,6 @@
+#include linux/module.h
 #include linux/kernel.h
+#include linux/slab.h
 
 #include asm/cputype.h
 #include asm/idmap.h
@@ -59,11 +61,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, 
unsigned long end,
} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long 
end)
+static void identity_mapping_add(pgd_t *pgd, const char *text_start,
+const char *text_end, unsigned long prot)
 {
-   unsigned long prot, next;
+   unsigned long addr, end;
+   unsigned long next;
+
+   addr = virt_to_phys(text_start);
+   end = virt_to_phys(text_end);
+
+   pr_info(Setting up static %sidentity map for 0x%llx - 0x%llx\n,
+   prot ? HYP  : ,
+   (long long)addr, (long long)end);
+   prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-   prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
if (cpu_architecture() = CPU_ARCH_ARMv5TEJ  !cpu_is_xscale())
prot |= PMD_BIT4;
 
@@ -78,24 +89,77 @@ extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-   phys_addr_t idmap_start, idmap_end;
-
idmap_pgd = pgd_alloc(init_mm);
if (!idmap_pgd)
return -ENOMEM;
 
-   /* Add an identity mapping for the physical address of the section. */
-   idmap_start = virt_to_phys((void *)__idmap_text_start);
-   idmap_end = virt_to_phys((void *)__idmap_text_end);
-
-   pr_info(Setting up static identity map for 0x%llx - 0x%llx\n,
-   (long long)idmap_start, (long long)idmap_end);
-   identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+   identity_mapping_add(idmap_pgd, __idmap_text_start,
+__idmap_text_end, 0);
 
return 0;
 }
 early_initcall(init_static_idmap);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+pgd_t *hyp_pgd;
+EXPORT_SYMBOL_GPL(hyp_pgd);
+
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+
+   pud = pud_offset(pgd, addr);
+   pmd = pmd_offset(pud, addr);
+   pud_clear(pud);
+   clean_pmd_entry(pmd);
+   pmd_free(NULL,

[PATCH v10 04/14] ARM: Expose PMNC bitfields for KVM use

2012-08-16 Thread Christoffer Dall

From: Rusty Russell rusty.russ...@linaro.org

We want some of these for use in KVM, so pull them out of
arch/arm/kernel/perf_event_v7.c into their own asm/perf_bits.h.

Signed-off-by: Rusty Russell rusty.russ...@linaro.org
---
 arch/arm/include/asm/perf_bits.h |   56 ++
 arch/arm/kernel/perf_event_v7.c  |   51 +--
 2 files changed, 57 insertions(+), 50 deletions(-)
 create mode 100644 arch/arm/include/asm/perf_bits.h

diff --git a/arch/arm/include/asm/perf_bits.h b/arch/arm/include/asm/perf_bits.h
new file mode 100644
index 000..eeb266a
--- /dev/null
+++ b/arch/arm/include/asm/perf_bits.h
@@ -0,0 +1,56 @@
+#ifndef __ARM_PERF_BITS_H__
+#define __ARM_PERF_BITS_H__
+
+/*
+ * ARMv7 low level PMNC access
+ */
+
+/*
+ * Per-CPU PMNC: config reg
+ */
+#define ARMV7_PMNC_E   (1  0) /* Enable all counters */
+#define ARMV7_PMNC_P   (1  1) /* Reset all counters */
+#define ARMV7_PMNC_C   (1  2) /* Cycle counter reset */
+#define ARMV7_PMNC_D   (1  3) /* CCNT counts every 64th cpu cycle */
+#define ARMV7_PMNC_X   (1  4) /* Export to ETM */
+#define ARMV7_PMNC_DP  (1  5) /* Disable CCNT if non-invasive debug*/
+#defineARMV7_PMNC_N_SHIFT  11   /* Number of counters 
supported */
+#defineARMV7_PMNC_N_MASK   0x1f
+#defineARMV7_PMNC_MASK 0x3f /* Mask for writable bits */
+
+/*
+ * FLAG: counters overflow flag status reg
+ */
+#defineARMV7_FLAG_MASK 0x  /* Mask for writable 
bits */
+#defineARMV7_OVERFLOWED_MASK   ARMV7_FLAG_MASK
+
+/*
+ * PMXEVTYPER: Event selection reg
+ */
+#defineARMV7_EVTYPE_MASK   0xc0ff  /* Mask for writable 
bits */
+#defineARMV7_EVTYPE_EVENT  0xff/* Mask for EVENT bits 
*/
+
+/*
+ * Event filters for PMUv2
+ */
+#defineARMV7_EXCLUDE_PL1   (1  31)
+#defineARMV7_EXCLUDE_USER  (1  30)
+#defineARMV7_INCLUDE_HYP   (1  27)
+
+#ifndef __ASSEMBLY__
+static inline u32 armv7_pmnc_read(void)
+{
+   u32 val;
+   asm volatile(mrc p15, 0, %0, c9, c12, 0 : =r(val));
+   return val;
+}
+
+static inline void armv7_pmnc_write(u32 val)
+{
+   val = ARMV7_PMNC_MASK;
+   isb();
+   asm volatile(mcr p15, 0, %0, c9, c12, 0 : : r(val));
+}
+#endif
+
+#endif /* __ARM_PERF_BITS_H__ */
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index f04070b..09851b3 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -17,6 +17,7 @@
  */
 
 #ifdef CONFIG_CPU_V7
+#include asm/perf_bits.h
 
 static struct arm_pmu armv7pmu;
 
@@ -744,61 +745,11 @@ static const unsigned 
armv7_a7_perf_cache_map[PERF_COUNT_HW_CACHE_MAX]
 #defineARMV7_COUNTER_MASK  (ARMV7_MAX_COUNTERS - 1)
 
 /*
- * ARMv7 low level PMNC access
- */
-
-/*
  * Perf Event to low level counters mapping
  */
 #defineARMV7_IDX_TO_COUNTER(x) \
(((x) - ARMV7_IDX_COUNTER0)  ARMV7_COUNTER_MASK)
 
-/*
- * Per-CPU PMNC: config reg
- */
-#define ARMV7_PMNC_E   (1  0) /* Enable all counters */
-#define ARMV7_PMNC_P   (1  1) /* Reset all counters */
-#define ARMV7_PMNC_C   (1  2) /* Cycle counter reset */
-#define ARMV7_PMNC_D   (1  3) /* CCNT counts every 64th cpu cycle */
-#define ARMV7_PMNC_X   (1  4) /* Export to ETM */
-#define ARMV7_PMNC_DP  (1  5) /* Disable CCNT if non-invasive debug*/
-#defineARMV7_PMNC_N_SHIFT  11   /* Number of counters 
supported */
-#defineARMV7_PMNC_N_MASK   0x1f
-#defineARMV7_PMNC_MASK 0x3f /* Mask for writable bits */
-
-/*
- * FLAG: counters overflow flag status reg
- */
-#defineARMV7_FLAG_MASK 0x  /* Mask for writable 
bits */
-#defineARMV7_OVERFLOWED_MASK   ARMV7_FLAG_MASK
-
-/*
- * PMXEVTYPER: Event selection reg
- */
-#defineARMV7_EVTYPE_MASK   0xc0ff  /* Mask for writable 
bits */
-#defineARMV7_EVTYPE_EVENT  0xff/* Mask for EVENT bits 
*/
-
-/*
- * Event filters for PMUv2
- */
-#defineARMV7_EXCLUDE_PL1   (1  31)
-#defineARMV7_EXCLUDE_USER  (1  30)
-#defineARMV7_INCLUDE_HYP   (1  27)
-
-static inline u32 armv7_pmnc_read(void)
-{
-   u32 val;
-   asm volatile(mrc p15, 0, %0, c9, c12, 0 : =r(val));
-   return val;
-}
-
-static inline void armv7_pmnc_write(u32 val)
-{
-   val = ARMV7_PMNC_MASK;
-   isb();
-   asm volatile(mcr p15, 0, %0, c9, c12, 0 : : r(val));
-}
-
 static inline int armv7_pmnc_has_overflowed(u32 pmnc)
 {
return pmnc  ARMV7_OVERFLOWED_MASK;

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v10 05/14] KVM: ARM: Initial skeleton to compile KVM support

2012-08-16 Thread Christoffer Dall

Targets KVM support for Cortex A-15 processors.

Contains all the framework components, make files, header files and some
tracing functionality.

Only supported core is Cortex-A15 for now.

Contains minor reset hook driven from kvm_vcpu_set_target, which will
eventually be a custom ARM ioctl to set the core we are emulating.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

“Nothing to see here. Move along, move along...

Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/Kconfig   |2 
 arch/arm/Makefile  |1 
 arch/arm/include/asm/kvm.h |   79 +
 arch/arm/include/asm/kvm_arm.h |   28 +++
 arch/arm/include/asm/kvm_asm.h |   30 +++
 arch/arm/include/asm/kvm_coproc.h  |   24 +++
 arch/arm/include/asm/kvm_emulate.h |  108 
 arch/arm/include/asm/kvm_host.h|  160 ++
 arch/arm/kvm/Kconfig   |   44 +
 arch/arm/kvm/Makefile  |   23 +++
 arch/arm/kvm/arm.c |  317 
 arch/arm/kvm/coproc.c  |   22 ++
 arch/arm/kvm/emulate.c |  127 ++
 arch/arm/kvm/exports.c |   21 ++
 arch/arm/kvm/guest.c   |  163 +++
 arch/arm/kvm/init.S|   19 ++
 arch/arm/kvm/interrupts.S  |   19 ++
 arch/arm/kvm/mmu.c |   17 ++
 arch/arm/kvm/reset.c   |   74 
 arch/arm/kvm/trace.h   |   52 ++
 include/linux/kvm.h|1 
 21 files changed, 1331 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm.h
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/exports.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index e91c7cd..8cc2e41 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -2341,3 +2341,5 @@ source security/Kconfig
 source crypto/Kconfig
 
 source lib/Kconfig
+
+source arch/arm/kvm/Kconfig
diff --git a/arch/arm/Makefile b/arch/arm/Makefile
index 30eae87..3bcc414 100644
--- a/arch/arm/Makefile
+++ b/arch/arm/Makefile
@@ -255,6 +255,7 @@ core-$(CONFIG_VFP)  += arch/arm/vfp/
 # If we have a machine-specific directory, then include it in the build.
 core-y += arch/arm/kernel/ arch/arm/mm/ 
arch/arm/common/
 core-y += arch/arm/net/
+core-y += arch/arm/kvm/
 core-y += $(machdirs) $(platdirs)
 
 drivers-$(CONFIG_OPROFILE)  += arch/arm/oprofile/
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
new file mode 100644
index 000..bc5d72b
--- /dev/null
+++ b/arch/arm/include/asm/kvm.h
@@ -0,0 +1,79 @@
+/*
+ * Copyright (C) 2012 - Virtual Open Systems and Columbia University
+ * Author: Christoffer Dall c.d...@virtualopensystems.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License, version 2, as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.
+ */
+
+#ifndef __ARM_KVM_H__
+#define __ARM_KVM_H__
+
+#include asm/types.h
+
+#define __KVM_HAVE_GUEST_DEBUG
+
+/*
+ * Modes used for short-hand mode determinition in the world-switch code and
+ * in emulation code.
+ *
+ * Note: These indices do NOT correspond to the value of the CPSR mode bits!
+ */
+enum vcpu_mode {
+   MODE_FIQ = 0,
+   MODE_IRQ,
+   MODE_SVC,
+   MODE_ABT,
+   MODE_UND,
+   MODE_USR,
+   MODE_SYS
+};
+
+struct kvm_regs {
+   __u32 regs0_7[8];   /* Unbanked regs. (r0 - r7)*/
+   __u32 fiq_regs8_12[5];  /* Banked fiq regs. (r8 - r12) */
+   __u32 usr_regs8_12[5];  /* Banked usr

[PATCH v10 06/14] KVM: ARM: Hypervisor inititalization

2012-08-16 Thread Christoffer Dall

Sets up the required registers to run code in HYP-mode from the kernel.

By setting the HVBAR the kernel can execute code in Hyp-mode with
the MMU disabled. The HVBAR initially points to initialization code,
which initializes other Hyp-mode registers and enables the MMU
for Hyp-mode. Afterwards, the HVBAR is changed to point to KVM
Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.

Also provides memory mapping code to map required code pages, data structures,
and I/O regions  accessed in Hyp mode at the same virtual address as the host
kernel virtual addresses, but which conforms to the architectural requirements
for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
and is comprised of:
 - create_hyp_mappings(from, to);
 - create_hyp_io_mappings(from, to, phys_addr);
 - free_hyp_pmds();

Note: The initialization mechanism currently relies on an SMC #0 call
to the secure monitor, which was merely a fast way of getting to the
hypervisor. We are working on supporting Hyp mode boot of the kernel
and control of Hyp mode through a local kernel mechanism.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h  |  109 +
 arch/arm/include/asm/kvm_asm.h  |   25 +++
 arch/arm/include/asm/kvm_mmu.h  |   36 
 arch/arm/include/asm/pgtable-3level-hwdef.h |4 
 arch/arm/include/asm/pgtable-3level.h   |4 
 arch/arm/include/asm/pgtable.h  |1 
 arch/arm/kvm/arm.c  |  224 +++
 arch/arm/kvm/exports.c  |   16 ++
 arch/arm/kvm/init.S |  130 
 arch/arm/kvm/interrupts.S   |   48 ++
 arch/arm/kvm/mmu.c  |  189 +++
 mm/memory.c |2 
 12 files changed, 788 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 2f9d28e..6e46541 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -19,10 +19,119 @@
 #ifndef __ARM_KVM_ARM_H__
 #define __ARM_KVM_ARM_H__
 
+#include asm/types.h
+
 /* Supported Processor Types */
 #define CORTEX_A15 (0xC0F)
 
 /* Multiprocessor Affinity Register */
 #define MPIDR_CPUID(0x3  0)
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE(1  27)
+#define HCR_TVM(1  26)
+#define HCR_TTLB   (1  25)
+#define HCR_TPU(1  24)
+#define HCR_TPC(1  23)
+#define HCR_TSW(1  22)
+#define HCR_TAC(1  21)
+#define HCR_TIDCP  (1  20)
+#define HCR_TSC(1  19)
+#define HCR_TID3   (1  18)
+#define HCR_TID2   (1  17)
+#define HCR_TID1   (1  16)
+#define HCR_TID0   (1  15)
+#define HCR_TWE(1  14)
+#define HCR_TWI(1  13)
+#define HCR_DC (1  12)
+#define HCR_BSU(3  10)
+#define HCR_BSU_IS (1  10)
+#define HCR_FB (1  9)
+#define HCR_VA (1  8)
+#define HCR_VI (1  7)
+#define HCR_VF (1  6)
+#define HCR_AMO(1  5)
+#define HCR_IMO(1  4)
+#define HCR_FMO(1  3)
+#define HCR_PTW(1  2)
+#define HCR_SWIO   (1  1)
+#define HCR_VM 1
+
+/*
+ * The bits we set in HCR:
+ * TAC:Trap ACTLR
+ * TSC:Trap SMC
+ * TSW:Trap cache operations by set/way
+ * TWI:Trap WFI
+ * TIDCP:  Trap L2CTLR/L2ECTLR
+ * BSU_IS: Upgrade barriers to the inner shareable domain
+ * FB: Force broadcast of all maintainance operations
+ * AMO:Override CPSR.A and enable signaling with VA
+ * IMO:Override CPSR.I and enable signaling with VI
+ * FMO:Override CPSR.F and enable signaling with VF
+ * SWIO:   Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+   HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+   HCR_SWIO | HCR_TIDCP)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE  (1  30)
+#define HSCTLR_EE  (1  25)
+#define HSCTLR_FI  (1  21)
+#define HSCTLR_WXN (1  19)
+#define HSCTLR_I   (1  12)
+#define HSCTLR_C   (1  2)
+#define HSCTLR_A   (1  1)
+#define HSCTLR_M   1
+#define HSCTLR_MASK(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE  (1  31)
+#define TTBCR_IMP  (1  30)
+#define TTBCR_SH1  (3  28)
+#define TTBCR_ORGN1(3  26)

[PATCH v10 07/14] KVM: ARM: Memory virtualization setup

2012-08-16 Thread Christoffer Dall

This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Introduces new ARM-specific kernel memory types, PAGE_KVM_GUEST and
pgprot_guest variables used to map 2nd stage memory for KVM guests.

Each entry in TLBs and caches are tagged with a VMID identifier in
addition to ASIDs. The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and tlbs are invalidated when
the VMID space has been used to allow for more than 255 simultaenously
running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory to be able to synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers.  We
steal the mmu_memory_cache implementation from x86 and adapt for our
specific usage.

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_asm.h|2 
 arch/arm/include/asm/kvm_host.h   |   18 ++
 arch/arm/include/asm/kvm_mmu.h|9 +
 arch/arm/include/asm/pgtable-3level.h |9 +
 arch/arm/include/asm/pgtable.h|4 
 arch/arm/kvm/Kconfig  |1 
 arch/arm/kvm/arm.c|   38 +++
 arch/arm/kvm/exports.c|1 
 arch/arm/kvm/interrupts.S |8 +
 arch/arm/kvm/mmu.c|  373 +
 arch/arm/mm/mmu.c |3 
 11 files changed, 465 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 58d51e3..55b6446 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -34,6 +34,7 @@
 #define SMCHYP_HVBAR_W 0xfff0
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -48,6 +49,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index d7e3398..d86ce39 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -157,4 +157,22 @@ struct kvm_vcpu_stat {
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
const struct kvm_vcpu_init *init);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+   unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 8252921..11f4c3a 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -33,4 +33,13 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+ phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM_KVM_MMU_H__ */
diff --git a/arch/arm/include/asm/pgtable-3level.h 
b/arch/arm/include/asm/pgtable-3level.h
index 1169a8a..7351eee 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,6 +102,15 @@
  */
 #define L_PGD_SWAPPER  (_AT(pgdval_t, 1)  55)/* 
swapper_pg_dir entry */
 
+/*
+ * 2-nd stage PTE definitions for LPAE.
+ */
+#define L_PTE2_SHARED  L_PTE_SHARED
+#define L_PTE2_READ

[PATCH v10 08/14] KVM: ARM: Inject IRQs and FIQs from userspace

2012-08-16 Thread Christoffer Dall

From: Christoffer Dall cd...@cs.columbia.edu

Userspace can inject IRQs and FIQs through the KVM_IRQ_LINE VM ioctl.
This ioctl is used since the sematics are in fact two lines that can be
either raised or lowered on the VCPU - the IRQ and FIQ lines.

KVM needs to know which VCPU it must operate on and whether the FIQ or
IRQ line is raised/lowered. Hence both pieces of information is packed
in the kvm_irq_level-irq field. The irq fild value will be:
  IRQ: vcpu_index  1
  FIQ: (vcpu_index  1) | 1

This is documented in Documentation/kvm/api.txt.

The effect of the ioctl is simply to simply raise/lower the
corresponding irq_line field on the VCPU struct, which will cause the
world-switch code to raise/lower virtual interrupts when running the
guest on next switch. The wait_for_interrupt flag is also cleared for
raised IRQs or FIQs causing an idle VCPU to become active again. CPUs
in guest mode are kicked to make sure they refresh their interrupt status.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   12 ++---
 arch/arm/include/asm/kvm.h|9 +++
 arch/arm/include/asm/kvm_arm.h|1 +
 arch/arm/kvm/arm.c|   47 +
 include/linux/kvm.h   |1 +
 5 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index bf33aaa..8345b78 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -614,15 +614,19 @@ only go to the IOAPIC.  On ia64, a IOSAPIC is created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM uses two types of interrupt lines per CPU: IRQ and FIQ.  The value of the
+irq field should be (vcpu_index  1) for IRQs and ((vcpu_index  1) | 1) for
+FIQs. Level is used to raise/lower the line.
 
 struct kvm_irq_level {
union {
diff --git a/arch/arm/include/asm/kvm.h b/arch/arm/include/asm/kvm.h
index bc5d72b..4a3e25d 100644
--- a/arch/arm/include/asm/kvm.h
+++ b/arch/arm/include/asm/kvm.h
@@ -22,6 +22,15 @@
 #include asm/types.h
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
+
+/*
+ * KVM_IRQ_LINE macros to set/read IRQ/FIQ for specific VCPU index.
+ */
+enum KVM_ARM_IRQ_LINE_TYPE {
+   KVM_ARM_IRQ_LINE = 0,
+   KVM_ARM_FIQ_LINE = 1,
+};
 
 /*
  * Modes used for short-hand mode determinition in the world-switch code and
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 6e46541..0f641c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -74,6 +74,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_SWIO | HCR_TIDCP)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE  (1  30)
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 3f97e7c..8306587 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -24,6 +24,7 @@
 #include linux/fs.h
 #include linux/mman.h
 #include linux/sched.h
+#include linux/kvm.h
 #include trace/events/kvm.h
 
 #define CREATE_TRACE_POINTS
@@ -265,6 +266,7 @@ void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 {
+   vcpu-cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
@@ -305,6 +307,51 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
return -EINVAL;
 }
 
+int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
+{
+   unsigned int vcpu_idx;
+   struct kvm_vcpu *vcpu;
+   unsigned long *ptr;
+   bool set;
+   int bit_index;
+
+   vcpu_idx = irq_level-irq  1;
+   if (vcpu_idx = KVM_MAX_VCPUS)
+   return -EINVAL;
+
+   vcpu = kvm_get_vcpu(kvm, vcpu_idx);
+   if (!vcpu)
+   return -EINVAL;
+
+   trace_kvm_set_irq(irq_level-irq, irq_level-level, 0);
+
+   if ((irq_level-irq  1) == KVM_ARM_IRQ_LINE)
+   bit_index = ffs(HCR_VI) - 1;
+   else /* KVM_ARM_FIQ_LINE */
+   bit_index = ffs(HCR_VF) - 1;
+
+   ptr = (unsigned long *)vcpu-arch.irq_lines;
+   if (irq_level-level)
+

[PATCH v10 09/14] KVM: ARM: World-switch implementation

2012-08-16 Thread Christoffer Dall

Provides complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture necessary exception information and stores the information on
the VCPU and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Switching from host kernel to Hyp-mode:
   Switching to Hyp mode is done through a simple HVC instructions. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will store the necessary state on the Hyp stack, which will look like
   this (growing downwards, see the hyp_hvc handler):
 ...
 stack_page + 4: spsr (Host-SVC cpsr)
 stack_page: lr_usr
 --: stack bottom

Hyp-ABI: Switching from Hyp-mode to host kernel SVC mode:
   When returning from Hyp mode to SVC mode, another HVC instruction is
   executed from Hyp mode, which is taken in the hyp_svc handler. The
   bottom of the Hyp is derived from the Hyp stack pointer (only a single
   page aligned stack is used per CPU) and the initial SVC registers are
   used to restore the host state.

Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values is loaded onto the hardware. State, which is not loaded, but
theoretically modifiable by the guest is protected through the
virtualiation features to generate a trap and cause software emulation.
Upon guest returns, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp-stack onto the
hardware.

SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.

Reuse of VMIDs has been implemented by Antonios Motakis and adapated from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.

To support VFP/NEON we trap those instructions using the HPCTR. When
we trap, we switch the FPU.  After a guest exit, the VFP state is
returned to the host.  When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state.  We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel, if not, we still
trap cp10 and cp11 in order to inject an undefined instruction exception
whenever the guest tries to use VFP/NEON. VFP/NEON developed by
Antionios Motakis and Rusty Russell.

Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h  |   38 ++
 arch/arm/include/asm/kvm_host.h |   10 +
 arch/arm/kernel/asm-offsets.c   |   45 ++
 arch/arm/kvm/arm.c  |  166 +
 arch/arm/kvm/interrupts.S   |  711 +++
 5 files changed, 967 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 0f641c1..ee345a6 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -104,6 +104,18 @@
 #define TTBCR_T0SZ 3
 #define HTCR_MASK  (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)  (1  x)
+#define HSTR_TTEE  (1  16)
+#define HSTR_TJDBX (1  17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)   (1  x)
+#define HCPTR_TCP_MASK (0x3fff)
+#define HCPTR_TASE (1  15)
+#define HCPTR_TTA  (1  20)
+#define HCPTR_TCPAC(1  31)
+
 /* Hyp Debug Configuration Register bits */
 #define HDCR_TDRA  (1  11)
 #define HDCR_TDOSA (1  10)
@@ -134,5 +146,31 @@
 #define VTTBR_X(5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT   (26)
+#define HSR_EC (0x3fU  HSR_EC_SHIFT)
+#define HSR_IL (1U  25)
+#define HSR_ISS(HSR_IL - 1)
+#define HSR_ISV_SHIFT  (24)
+#define HSR_ISV(1U  HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN (0x00)
+#define HSR_EC_WFI (0x01)
+#define HSR_EC_CP15_32 (0x03)
+#define HSR_EC_CP15_64 (0x04)
+#define HSR_EC_CP14_MR (0x05)
+#define HSR_EC_CP14_LS (0x06)
+#define HSR_EC_CP_0_13 (0x07)
+#define HSR_EC_CP10_ID (0x08)
+#define HSR_EC_JAZELLE (0x09)
+#define HSR_EC_BXJ (0x0A)
+#define HSR_EC_CP14_64 (0x0C)
+#define HSR_EC_SVC_HYP (0x11)
+#define HSR_EC_HVC (0x12)
+#define HSR_EC_SMC (0x13)
+#define HSR_EC_IABT(0x20)
+#define HSR_EC_IABT_HYP(0x21)
+#define HSR_EC_DABT(0x24)
+#define HSR_EC_DABT_HYP(0x25)
 
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h

[PATCH v10 10/14] KVM: ARM: Emulation framework and CP15 emulation

2012-08-16 Thread Christoffer Dall

Adds a new important function in the main KVM/ARM code called
handle_exit() which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp-Syndrome-Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some of the reasons for an exit are CP15 accesses, which are
not allowed from the guest and this commit handles these exits by
emulating the intended operation in software and skip the guest
instruction.

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at cost of 4 bytes per vcpu.

2) Added comments on the table indicating how we handle each register, for
   simplicity of understanding.


Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |9 +
 arch/arm/include/asm/kvm_coproc.h  |7 
 arch/arm/include/asm/kvm_emulate.h |5 
 arch/arm/include/asm/kvm_host.h|5 
 arch/arm/kvm/arm.c |  166 ++
 arch/arm/kvm/coproc.c  |  572 
 arch/arm/kvm/emulate.c |  120 
 arch/arm/kvm/trace.h   |   28 ++
 8 files changed, 910 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ee345a6..ae586c1 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -76,6 +76,11 @@
HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE   (1  30)
+#define SCTLR_EE   (1  25)
+#define SCTLR_V(1  13)
+
 /* Hyp System Control Register (HSCTLR) bits */
 #define HSCTLR_TE  (1  30)
 #define HSCTLR_EE  (1  25)
@@ -153,6 +158,10 @@
 #define HSR_ISS(HSR_IL - 1)
 #define HSR_ISV_SHIFT  (24)
 #define HSR_ISV(1U  HSR_ISV_SHIFT)
+#define HSR_CV_SHIFT   (24)
+#define HSR_CV (1U  HSR_CV_SHIFT)
+#define HSR_COND_SHIFT (20)
+#define HSR_COND   (0xfU  HSR_COND_SHIFT)
 
 #define HSR_EC_UNKNOWN (0x00)
 #define HSR_EC_WFI (0x01)
diff --git a/arch/arm/include/asm/kvm_coproc.h 
b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..c451fb4 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,11 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index 9e29335..d914029 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -51,6 +51,11 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
return mode;
 }
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+
 /*
  * Return the SPSR for the specified mode of the virtual CPU.
  */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 5414eeb..778d2af 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -102,6 +102,7 @@ enum cp15_regs {
c5_AIFSR,   /* Auxilary Instruction Fault Status Register */
c6_DFAR,/* Data Fault Address Register */
c6_IFAR,/* Instruction Fault Address Register */
+   c9_L2CTLR,  /* Cortex A15 L2 Control Register */
c10_PRRR,   /* Primary Region Remap Register */
c10_NMRR,   /* Normal Memory Remap Register */
c12_VBAR,   /* Vector Base Address Register */
@@ -142,6 +143,10 @@ struct kvm_vcpu_arch {
 * Anything that is not used directly from assembly code goes
 * here.
 */
+   /* dcache set/way operation pending */
+   int last_pcpu;
+   cpumask_t require_dcache_flush;
+
/* IO related fields */
bool mmio_sign_extend;  /* for byte/halfword loads */
u32 mmio_rd;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 28bf2c2..8eec273 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -37,10 +37,13 @@
 #include asm/cputype.h
 #include asm/idmap.h
 #include asm/tlbflush.h
+#include asm/cacheflush.h
 #include

[PATCH v10 12/14] KVM: ARM: Handle guest faults in KVM

2012-08-16 Thread Christoffer Dall

Handles the guest faults in KVM by mapping in corresponding user pages
in the 2nd stage page tables.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |9 
 arch/arm/include/asm/kvm_asm.h |2 +
 arch/arm/kvm/mmu.c |  102 
 3 files changed, 112 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index ae586c1..4cff3b7 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,11 +158,20 @@
 #define HSR_ISS(HSR_IL - 1)
 #define HSR_ISV_SHIFT  (24)
 #define HSR_ISV(1U  HSR_ISV_SHIFT)
+#define HSR_FSC(0x3f)
+#define HSR_FSC_TYPE   (0x3c)
+#define HSR_WNR(1  6)
 #define HSR_CV_SHIFT   (24)
 #define HSR_CV (1U  HSR_CV_SHIFT)
 #define HSR_COND_SHIFT (20)
 #define HSR_COND   (0xfU  HSR_COND_SHIFT)
 
+#define FSC_FAULT  (0x04)
+#define FSC_PERM   (0x0c)
+
+/* Hyp Prefetch Fault Address Register (HPFAR/HDFAR) */
+#define HPFAR_MASK (~0xf)
+
 #define HSR_EC_UNKNOWN (0x00)
 #define HSR_EC_WFI (0x01)
 #define HSR_EC_CP15_32 (0x03)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 55b6446..85bd676 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -48,6 +48,8 @@ extern char __kvm_hyp_vector[];
 extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
+
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
index 6cb0e38..448fbd6 100644
--- a/arch/arm/kvm/mmu.c
+++ b/arch/arm/kvm/mmu.c
@@ -25,6 +25,7 @@
 #include asm/kvm_mmu.h
 #include asm/kvm_asm.h
 #include asm/mach/map.h
+#include asm/kvm_asm.h
 
 static DEFINE_MUTEX(kvm_hyp_pgd_mutex);
 
@@ -491,9 +492,108 @@ out:
return ret;
 }
 
+static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+ gfn_t gfn, struct kvm_memory_slot *memslot,
+ bool is_iabt)
+{
+   pte_t new_pte;
+   pfn_t pfn;
+   int ret;
+   bool write_fault, writable;
+   struct kvm_mmu_memory_cache *memcache = vcpu-arch.mmu_page_cache;
+
+   /* TODO: Use instr. decoding for non-ISV to determine r/w fault */
+   if (is_iabt)
+   write_fault = false;
+   else if ((vcpu-arch.hsr  HSR_ISV)  !(vcpu-arch.hsr  HSR_WNR))
+   write_fault = false;
+   else
+   write_fault = true;
+
+   if ((vcpu-arch.hsr  HSR_FSC_TYPE) == FSC_PERM  !write_fault) {
+   kvm_err(Unexpected L2 read permission error\n);
+   return -EFAULT;
+   }
+
+   pfn = gfn_to_pfn_prot(vcpu-kvm, gfn, write_fault, writable);
+
+   if (is_error_pfn(pfn)) {
+   put_page(pfn_to_page(pfn));
+   kvm_err(No host mapping: gfn %u (0x%08x)\n,
+   (unsigned int)gfn,
+   (unsigned int)gfn  PAGE_SHIFT);
+   return -EFAULT;
+   }
+
+   /* We need minimum second+third level pages */
+   ret = mmu_topup_memory_cache(memcache, 2, KVM_NR_MEM_OBJS);
+   if (ret)
+   return ret;
+   new_pte = pfn_pte(pfn, PAGE_KVM_GUEST);
+   if (writable)
+   new_pte |= L_PTE2_WRITE;
+   spin_lock(vcpu-kvm-arch.pgd_lock);
+   stage2_set_pte(vcpu-kvm, memcache, fault_ipa, new_pte);
+   spin_unlock(vcpu-kvm-arch.pgd_lock);
+
+   return ret;
+}
+
+/**
+ * kvm_handle_guest_abort - handles all 2nd stage aborts
+ * @vcpu:  the VCPU pointer
+ * @run:   the kvm_run structure
+ *
+ * Any abort that gets to the host is almost guaranteed to be caused by a
+ * missing second stage translation table entry, which can mean that either the
+ * guest simply needs more memory and we must allocate an appropriate page or 
it
+ * can mean that the guest tried to access I/O memory, which is emulated by 
user
+ * space. The distinction is based on the IPA causing the fault and whether 
this
+ * memory region has been registered as standard RAM by user space.
+ */
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-   return -EINVAL;
+   unsigned long hsr_ec;
+   unsigned long fault_status;
+   phys_addr_t fault_ipa;
+   struct kvm_memory_slot *memslot = NULL;
+   bool is_iabt;
+   gfn_t gfn;
+   int ret;
+
+   hsr_ec = vcpu-arch.hsr  HSR_EC_SHIFT;
+   is_iabt = (hsr_ec == HSR_EC_IABT);
+
+   /* Check that the second stage fault is a translation fault */
+   fault_status = (vcpu-arch.hsr  HSR_FSC_TYPE);
+   if (fault_status != FSC_FAULT  fault_status != FSC_PERM) {
+   kvm_err(Unsupported fault status: EC=%#lx DFCS=%#lx\n,
+

[PATCH v10 13/14] KVM: ARM: Handle I/O aborts

2012-08-16 Thread Christoffer Dall

When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |3 
 arch/arm/include/asm/kvm_emulate.h |2 
 arch/arm/include/asm/kvm_mmu.h |1 
 arch/arm/kvm/arm.c |6 +
 arch/arm/kvm/emulate.c |  273 
 arch/arm/kvm/mmu.c |  162 +
 arch/arm/kvm/trace.h   |   21 +++
 7 files changed, 466 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 4cff3b7..21cb240 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -158,8 +158,11 @@
 #define HSR_ISS(HSR_IL - 1)
 #define HSR_ISV_SHIFT  (24)
 #define HSR_ISV(1U  HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT  (16)
+#define HSR_SRT_MASK   (0xf  HSR_SRT_SHIFT)
 #define HSR_FSC(0x3f)
 #define HSR_FSC_TYPE   (0x3c)
+#define HSR_SSE(1  21)
 #define HSR_WNR(1  6)
 #define HSR_CV_SHIFT   (24)
 #define HSR_CV (1U  HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_emulate.h 
b/arch/arm/include/asm/kvm_emulate.h
index d914029..d899fbb 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -52,6 +52,8 @@ static inline enum vcpu_mode vcpu_mode(struct kvm_vcpu *vcpu)
 }
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+   unsigned long instr);
 void kvm_adjust_itstate(struct kvm_vcpu *vcpu);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 11f4c3a..c3f90b0 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -38,6 +38,7 @@ void kvm_free_stage2_pgd(struct kvm *kvm);
 int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
  phys_addr_t pa, unsigned long size);
 
+int kvm_handle_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
 
 void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 4eafdcd..31ddf56 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -565,6 +565,12 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
if (unlikely(!vcpu-arch.target))
return -ENOEXEC;
 
+   if (run-exit_reason == KVM_EXIT_MMIO) {
+   ret = kvm_handle_mmio_return(vcpu, vcpu-run);
+   if (ret)
+   return ret;
+   }
+
if (vcpu-sigset_active)
sigprocmask(SIG_SETMASK, vcpu-sigset, sigsaved);
 
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index 93bd3e2..cc5fa89 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -132,11 +132,284 @@ u32 *vcpu_reg_mode(struct kvm_vcpu *vcpu, u8 reg_num, 
u32 mode)
return reg_array + vcpu_reg_offsets[mode][reg_num];
 }
 
+/**
+ * Utility functions common for all emulation code
+ */
+
+/*
+ * This one accepts a matrix where the first element is the
+ * bits as they must be, and the second element is the bitmask.
+ */
+#define INSTR_NONE -1
+static int kvm_instr_index(u32 instr, u32 table[][2], int table_entries)
+{
+   int i;
+   u32 mask;
+
+   for (i = 0; i  table_entries; i++) {
+   mask = table[i][1];
+   if ((table[i][0]  mask) == (instr  mask))
+   return i;
+   }
+   return INSTR_NONE;
+}
+
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
return 0;
 }
 
+
+/**
+ * Load-Store instruction emulation
+ */
+
+/*
+ * Must be ordered with LOADS first and WRITES afterwards
+ * for easy distinction when doing MMIO.
+ */
+#define NUM_LD_INSTR  9

[PATCH v10 14/14] KVM: ARM: Guest wait-for-interrupts (WFI) support

2012-08-16 Thread Christoffer Dall

From: Christoffer Dall cd...@cs.columbia.edu

When the guest executes a WFI instruction the operation is trapped to
KVM, which emulates the instruction in software. There is no correlation
between a guest executing a WFI instruction and actually putting the
hardware into a low-power mode, since a KVM guest is essentially a
process and the WFI instruction can be seen as 'sleep' call from this
process. Therefore, we block the vcpu when the guest excecutes a wfi
instruction and the IRQ or FIQ lines are not raised.

When an interrupt comes in through KVM_IRQ_LINE (see previous patch) we
signal the VCPU thread and unflag the VCPU to no longer wait for
interrupts.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kvm/arm.c |   10 --
 arch/arm/kvm/emulate.c |   13 -
 arch/arm/kvm/trace.h   |   16 
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 31ddf56..09a6800 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -313,9 +313,16 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
return -EINVAL;
 }
 
+/**
+ * kvm_arch_vcpu_runnable - determine if the vcpu can be scheduled
+ * @v: The VCPU pointer
+ *
+ * If the guest CPU is not waiting for interrupts or an interrupt line is
+ * asserted, the CPU is by definition runnable.
+ */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return 0;
+   return !!v-arch.irq_lines;
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -581,7 +588,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 * Check conditions before entering the guest
 */
cond_resched();
-
update_vttbr(vcpu-kvm);
 
local_irq_disable();
diff --git a/arch/arm/kvm/emulate.c b/arch/arm/kvm/emulate.c
index cc5fa89..6cbdb08 100644
--- a/arch/arm/kvm/emulate.c
+++ b/arch/arm/kvm/emulate.c
@@ -154,9 +154,20 @@ static int kvm_instr_index(u32 instr, u32 table[][2], int 
table_entries)
return INSTR_NONE;
 }
 
+/**
+ * kvm_handle_wfi - handle a wait-for-interrupts instruction executed by a 
guest
+ * @vcpu:  the vcpu pointer
+ * @run:   the kvm_run structure pointer
+ *
+ * Simply sets the wait_for_interrupts flag on the vcpu structure, which will
+ * halt execution of world-switches and schedule other host processes until
+ * there is an incoming IRQ or FIQ to the VM.
+ */
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
-   return 0;
+   trace_kvm_wfi(vcpu-arch.regs.pc);
+   kvm_vcpu_block(vcpu);
+   return 1;
 }
 
 
diff --git a/arch/arm/kvm/trace.h b/arch/arm/kvm/trace.h
index 325106c..28ed1a1 100644
--- a/arch/arm/kvm/trace.h
+++ b/arch/arm/kvm/trace.h
@@ -90,6 +90,22 @@ TRACE_EVENT(kvm_emulate_cp15_imp,
__entry-CRm, __entry-Op2)
 );
 
+TRACE_EVENT(kvm_wfi,
+   TP_PROTO(unsigned long vcpu_pc),
+   TP_ARGS(vcpu_pc),
+
+   TP_STRUCT__entry(
+   __field(unsigned long,  vcpu_pc )
+   ),
+
+   TP_fast_assign(
+   __entry-vcpu_pc= vcpu_pc;
+   ),
+
+   TP_printk(guest executed wfi at: 0x%08lx, __entry-vcpu_pc)
+);
+
+
 #endif /* _TRACE_KVM_H */
 
 #undef TRACE_INCLUDE_PATH

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vm pxe fail

2012-08-16 Thread Stefan Hajnoczi

On Thu, Aug 16, 2012 at 4:09 PM, Andrew Holway a.hol...@syseleven.de wrote:

 On Aug 16, 2012, at 3:54 PM, Stefan Hajnoczi wrote:

 On Thu, Aug 16, 2012 at 1:25 PM, Andrew Holway a.hol...@syseleven.de wrote:
 I have a kvm vm that I am attempting to boot from pxe. The dhcp works 
 perfectly and I can see the VM in the pxe server arp. but the tftp just 
 times out. I don't see any tftp traffic on either the physical host or on 
 the pie server. I am using a bridged interface. I have tried using several 
 virtual nic drivers, several different mac addresses and several different 
 ips.  on the physical host I can get the pxelinux.0 file from the pxe 
 server via tftp and can clearly see that traffic with tcpdump.

 Ive tried using various virtual interfaces.

 I can pxe boot my physical hosts with no problems.

 I can tftp fine from the physical host and see the traffic with ethdump

 Have you run tcpdump on the tap interface?  (This is different from
 running tcpdump on host eth0 because it is earlier in the network path
 and happens before the software bridge.)

 Yes. I can just see DHCP traffic.


 What do iptables -L -n and ebtables -L say?


 [root@node002 ~]# iptables -L -n
 Chain INPUT (policy ACCEPT)
 target prot opt source   destination
 ACCEPT udp  --  0.0.0.0/00.0.0.0/0   udp dpt:53
 ACCEPT tcp  --  0.0.0.0/00.0.0.0/0   tcp dpt:53
 ACCEPT udp  --  0.0.0.0/00.0.0.0/0   udp dpt:67
 ACCEPT tcp  --  0.0.0.0/00.0.0.0/0   tcp dpt:67

 Chain FORWARD (policy ACCEPT)
 target prot opt source   destination
 ACCEPT all  --  0.0.0.0/0192.168.122.0/24state 
 RELATED,ESTABLISHED
 ACCEPT all  --  192.168.122.0/24 0.0.0.0/0
 ACCEPT all  --  0.0.0.0/00.0.0.0/0
 REJECT all  --  0.0.0.0/00.0.0.0/0   reject-with 
 icmp-port-unreachable
 REJECT all  --  0.0.0.0/00.0.0.0/0   reject-with 
 icmp-port-unreachable

 Chain OUTPUT (policy ACCEPT)
 target prot opt source   destination


 [root@node002 ~]# ebtables -L
 Bridge table: filter

 Bridge chain: INPUT, entries: 0, policy: ACCEPT

 Bridge chain: FORWARD, entries: 0, policy: ACCEPT

 Bridge chain: OUTPUT, entries: 0, policy: ACCEPT

 [root@node002 ~]# tcpdump -i vnet0 udp
 tcpdump: WARNING: vnet0: no IPv4 address assigned
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on vnet0, link-type EN10MB (Ethernet), capture size 65535 bytes
 17:08:08.849344 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, 
 Request from 00:00:00:00:00:0d (oui Ethernet), length 387
 17:08:08.849413 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, 
 Request from 00:00:00:00:00:0d (oui Ethernet), length 387
 17:08:08.849661 IP master.cm.cluster.bootps  255.255.255.255.bootpc: 
 BOOTP/DHCP, Reply, length 360
 17:08:09.812645 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, 
 Request from 00:00:00:00:00:0d (oui Ethernet), length 387
 17:08:09.812709 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, 
 Request from 00:00:00:00:00:0d (oui Ethernet), length 387
 17:08:09.812903 IP master.cm.cluster.bootps  255.255.255.255.bootpc: 
 BOOTP/DHCP, Reply, length 360
 17:08:11.789993 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, 
 Request from 00:00:00:00:00:0d (oui Ethernet), length 399
 17:08:11.790107 IP 0.0.0.0.bootpc  255.255.255.255.bootps: BOOTP/DHCP, 
 Request from 00:00:00:00:00:0d (oui Ethernet), length 399
 17:08:11.790294 IP master.cm.cluster.bootps  255.255.255.255.bootpc: 
 BOOTP/DHCP, Reply, length 360

Strange how the VM gets 3 DHCP Replies.  That makes it seem like the
VM isn't receiving/processing the DHCP Replies.

Your screenshot shows that the IP address and other network details
from DHCP are being received though...

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: [PATCH 2/2] KVM: PPC: booke/bookehv: Add guest debug support

2012-08-16 Thread Bhushan Bharat-R65777

 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 31, 2012 3:31 AM
 To: Bhushan Bharat-R65777
 Cc: Wood Scott-B07421; kvm-...@vger.kernel.org; kvm@vger.kernel.org;
 ag...@suse.de
 Subject: Re: [PATCH 2/2] KVM: PPC: booke/bookehv: Add guest debug support

 On 07/30/2012 02:37 AM, Bhushan Bharat-R65777 wrote:

  -Original Message-
  From: Wood Scott-B07421
  Sent: Friday, July 27, 2012 7:00 AM
  To: Bhushan Bharat-R65777
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
  Bhushan Bharat-
  R65777
  Subject: Re: [PATCH 2/2] KVM: PPC: booke/bookehv: Add guest debug
  support

  On 07/26/2012 12:32 AM, Bharat Bhushan wrote:
  This patch adds:
   1) KVM debug handler added for e500v2.
   2) Guest debug by qemu gdb stub.

  Does it make sense for these to both be in the same patch?  If
  there's common code used by both, that could be added first.

  ok

  Signed-off-by: Liu Yu yu@freescale.com
  Signed-off-by: Varun Sethi varun.se...@freescale.com
  [bharat.bhus...@freescale.com: Substantial changes]
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/include/asm/kvm.h|   21 +
   arch/powerpc/include/asm/kvm_host.h   |7 ++
   arch/powerpc/include/asm/kvm_ppc.h|2 +
   arch/powerpc/include/asm/reg_booke.h  |1 +
   arch/powerpc/kernel/asm-offsets.c |   31 ++-
   arch/powerpc/kvm/booke.c  |  146 
  +++---
   arch/powerpc/kvm/booke_interrupts.S   |  160
  -
   arch/powerpc/kvm/bookehv_interrupts.S |  141 
  -
   arch/powerpc/kvm/e500mc.c |3 +-
   arch/powerpc/kvm/powerpc.c|2 +-
   10 files changed, 492 insertions(+), 22 deletions(-)

  diff --git a/arch/powerpc/include/asm/kvm.h
  b/arch/powerpc/include/asm/kvm.h index 3c14202..da71c84 100644
  --- a/arch/powerpc/include/asm/kvm.h
  +++ b/arch/powerpc/include/asm/kvm.h
  @@ -25,6 +25,7 @@
   /* Select powerpc specific features in linux/kvm.h */  #define
  __KVM_HAVE_SPAPR_TCE  #define __KVM_HAVE_PPC_SMT
  +#define __KVM_HAVE_GUEST_DEBUG

   struct kvm_regs {
__u64 pc;
  @@ -265,10 +266,19 @@ struct kvm_fpu {  };

   struct kvm_debug_exit_arch {
  + __u32 exception;
  + __u32 pc;
  + __u32 status;
   };

  PC must be 64-bit.  What goes in status and exception?

  ok

   /* for KVM_SET_GUEST_DEBUG */
   struct kvm_guest_debug_arch {
  + struct {
  + __u64 addr;
  + __u32 type;
  + __u32 pad1;
  + __u64 pad2;
  + } bp[16];
   };

  What goes in type?

  Type denote breakpoint, read watchpoint, write watchpoint or watchpoint 
  (both
 read and write). Will adding a comment to describe this is ok?

 Yes, please make sure all of this is well documented.

   /* definition of registers in kvm_run */ @@ -285,6 +295,17 @@
  struct kvm_sync_regs {
   #define KVM_CPU_3S_644
   #define KVM_CPU_E500MC   5

  +/* Debug related defines */
  +#define KVM_INST_GUESTGDB   0x7C00021C  /* ehpriv OC=0 */

  Will this work on all PPC?

  It certainly won't work on other architectures, so at a minimum it's
  KVM_PPC_INST_GUEST_GDB, but maybe it needs to be determined at runtime.

  How to determine at run time? adding another ioctl ?

 Or extend an existing one.  Is there any other information about debug
 capabilities that you expose -- number of hardware breakpoints supported, etc?

  +#define KVM_GUESTDBG_USE_SW_BP  0x0001
  +#define KVM_GUESTDBG_USE_HW_BP  0x0002

  Where do these get used?  Any reason for these particular values?  If
  you're trying to create a partition where the upper half is generic
  and the lower half is arch-specific, say so.

  KVM_SET_GUEST_DEBUG ioctl used to set/unset debug interrupts, which
  have a u32 control element. We have inherited this mechanism from
  x86 implementation and it looks like lower 16 bits are generic (like
  KVM_GUESTDBG_ENBLE, KVM_GUESTDBG_SINGLESTEP etc and upper 16 bits are
  Architecture specific.

  I will add a comment to describe this.

 I don't think the sw/hw distinction belongs here -- it should be per 
 breakpoint.

  + run-exit_reason = KVM_EXIT_DEBUG;
  + run-debug.arch.pc = vcpu-arch.pc;
  + run-debug.arch.exception = exit_nr;
  + run-debug.arch.status = 0;
  + kvmppc_account_exit(vcpu, DEBUG_EXITS);
  + return RESUME_HOST;

  The interface isn't (clearly labelled as) booke specific, but you
  return booke- specific exception numbers.  How's userspace supposed
  to know what to do with them?  What do you plan on doing with them in QEMU?

  This is booke specific.

 Then put booke in the name, but what about it really needs to be booke 
 specific?
 Why does QEMU care about the exception type?

  +#ifndef CONFIG_PPC_FSL_BOOK3E
  + PPC_LD(r7, VCPU_HOST_DBG+KVMPPC_DBG_IAC3, r4)
  + PPC_LD(r8,

Re: [Qemu-devel] Windows slow boot: contractor wanted

2012-08-16 Thread Troy Benjegerdes

I'd be interested in working on this.. What I'd like to propose is to write
an automated regression test harness that will reboot the host hardware, and
start booting up guest VMs and report the time-to-boot, as well as relative
performance of the running VMs.

For best results, I'd need access to the specific hardware you are using.

I'd also like to release the test harness back to the community, so I would
like some feedback from the mailing list on what kinds of tests should be
written that would provide the best information for the KVM developers.

What do you want to know, and what is the most usefull data to record to
debug this and future performance regressions?

On Thu, Aug 16, 2012 at 11:47:27AM +0100, Richard Davies wrote:
 Hi,
 
 We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a
 contractor to track down and fix problems we have with large memory Windows
 guests booting very slowly - they can take several hours.
 
 We previously reported these problems in July (copied below) and they are
 still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1.
 
 This is a serious issue for us which is causing significant pain to our
 larger Windows VM customers when their servers are offline for many hours
 during boot.
 
 If anyone knowledgeable in the area would be interested in being paid to
 work on this, or if you know someone who might be, I would be delighted to
 hear from you.
 
 Cheers,
 
 Richard.
 
 
 = Previous bug report
 
 http://marc.info/?l=qemu-develm=134304194329745
 
 
 We have been experiencing this problem for a while now too, using qemu-kvm
 (currently at 1.1.1).
 
 Unfortunately, hv_relaxed doesn't seem to fix it. The following command line
 produces the issue:
 
 qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus 
 -usbdevice tablet -vnc :99 -monitor stdio -hda test.img
 
 The hardware consists of dual AMD Opteron 6128 processors (16 cores in
 total) and 64GB of memory. This command line was tested on kernel 3.1.4. 
 
 I've also tested with -no-hpet.
 
 What I have seen is much as described: the memory fills out slowly, and top
 on the host will show the process using 100% on all allocated CPU cores. The
 most extreme case was a machine which took something between 6 and 8 hours
 to boot.
 
 This seems to be related to the assigned memory, as described, but also the
 number of processor cores (which makes sense if we believe it's a timing
 issue?). I have seen slow-booting guests improved by switching down to a
 single or even two cores.
 
 Matthew, I agree that this seems to be linked to the number of VMs running -
 in fact, shutting down other VMs on a dedicated test host caused the machine
 to start booting at a normal speed (with no reboot required).
 
 However, the level of contention is never such that this could be explained
 by the host simply being overcommitted.
 
 If it helps anyone, there's an image of the hard drive I've been using to
 test at:
 
 http://46.20.114.253/
 
 It's 5G of gzip file containing a fairly standard Windows 2008 trial
 installation. Since it's in the trial period, anyone who wants to use it may
 have to re-arm the trial: http://support.microsoft.com/kb/948472
 
 Please let me know if I can provide any more information, or test anything.
 
 Best wishes,
 
 Owen Tuz
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Marcelo Tosatti

On Thu, Aug 16, 2012 at 12:03:01PM +0300, Avi Kivity wrote:
 On 08/15/2012 08:53 PM, Marcelo Tosatti wrote:
  On Wed, Aug 15, 2012 at 01:44:14PM +0300, Avi Kivity wrote:
  On 08/14/2012 06:51 PM, Marcelo Tosatti wrote:
   
   Userspace may want to modify the ROM (for example, when programming a
   flash device).  It is also possible to map an hva range rw through one
   slot and ro through another.
   
   Right, can do that with multiple userspace maps to the same anonymous 
   memory region (see other email).
  
  Yes it's possible.  It requires that we move all memory allocation to be
  fd based, since userspace can't predict what memory will be dual-mapped
  (at least if emulated hardware allows this).
  
  It can:
  - Create named memory object, with associated fd.
  - Copy data from large anonymous memory region to named memory.
 
 That doesn't work if dma is in progress (assigned device).  It also
 doubles the amount of memory in use.

  - Unmap region that must be dual-mapped from large anonymous memory chunk.
  - Map named memory object at address.
  
  The last step can be replaced by adjusting KVM memory slots.
  
  The disadvantage of protection information in memory slots
  is that it duplicates functionality that is handled by 
  userspace mappings.
 
 Agree.  So does the memory slots mechanism, and even dirty logging.

  Moreover, multiple memory maps are necessary for any
  split-qemu-into-smaller-pieces solutions.
 
 Complex users can use complex mechanism, but let's keep the simple stuff
 simple.
 
  
   Is this a reasonable
  requirement?  Do ksm/thp/autonuma work with this?
  
  As mentioned, only memory used for ROM purposes must be dual mapped. 
  
  I don't think there is any way to create multiple mappings 
  to one anonymous memory object ATM, but POSIX defines it
  (posix_typed_mem_open).
  
  The limitation of thp/ksm on shared memory also affects any other user
  of shared memory, so it should be fixed there.
  
  Also, QEMU ROM is allocated separately from RAM, correct?
  
 
 Correct.  But the chipset is also able to to write-protect some ranges
 in the 0xc-0x10 area via the PAM.  It is able to write-protect
 both RAM and PCI memory (usually mapped to flash).

You are convinced that adding read-write protection information to the
memory slots, which controls access by the guest, in addition to the
userspace host pagetables, is useful. OK.

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Marcelo Tosatti

On Thu, Aug 16, 2012 at 01:49:11PM +0800, Xiao Guangrong wrote:
 On 08/14/2012 11:25 PM, Marcelo Tosatti wrote:
  On Tue, Aug 14, 2012 at 10:58:07AM +0800, Xiao Guangrong wrote:
  On 08/14/2012 01:39 AM, Marcelo Tosatti wrote:
  On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
  On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
  On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
  Changelog:
  - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
  - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
 
  The test case can be found at:
  http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
 
  In current code, if we map a readonly memory space from host to guest
  and the page is not currently mapped in the host, we will get a 
  fault-pfn
  and async is not allowed, then the vm will crash.
 
  As Avi's suggestion, We introduce readonly memory region to map 
  ROM/ROMD
  to the guest, read access is happy for readonly memslot, write access 
  on
  readonly memslot will cause KVM_EXIT_MMIO exit.
 
  Memory slots whose QEMU mapping is write protected is supported
  today, as long as there are no write faults.
 
  What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
  again?
 
 
  It is happy to map !write host memory space to the readonly memslot,
  and they can coexist as well.
 
  readonly memslot checks the write-permission by seeing slot-flags and
  !write memory checks the write-permission in hva_to_pfn() function
  which checks vma-flags. It is no conflict.
 
  Yes, there is no conflict. The point is, if you can use the
  mmap(PROT_READ) interface (supporting read faults on read-only slots)
  for this behavior, what is the advantage of a new memslot flag?
 
 
  You can get the discussion at:
  https://lkml.org/lkml/2012/5/22/228
 
  I'm not saying mmap(PROT_READ) is the best interface, i am just asking
  why it is not.
 
  My fault. :(
 
 
  The initial objective was to fix a vm crash, can you explain that
  initial problem?
 
 
  The issue was trigged by this code:
 
  } else {
  if (async  (vma-vm_flags  VM_WRITE))
  *async = true;
  pfn = KVM_PFN_ERR_FAULT;
  }
 
  If the host memory region is readonly (!vma-vm_flags  VM_WRITE) and
  its physical page is swapped out (or the file data does not be read in),
  get_user_page_nowait will fail, above code reject to set async,
  then we will get a fault pfn and async=false.
 
  I guess this issue also exists in QEMU write protected mapping as
  you mentioned above.
 
  Yes, it does. As far as i understand, what that check does from a high
  level pov is:
 
  - Did get_user_pages_nowait() fail due to a swapped out page (in which 
  case we should try to swappin the page asynchronously), or due to 
  another reason (for which case an error should be returned).
 
  Using vma-vm_flags VM_WRITE for that is trying to guess why
  get_user_pages_nowait() failed, because it (gup_nowait return values) 
  does not provide sufficient information by itself.
 
 
  That is exactly what i did in the first version. :)
 
  You can see it and the reason why it switched to the new way (readonly 
  memslot)
  in the above website (the first message in thread).
  
  Userspace can create multiple mappings for the same memory region, for
  example via shared memory (shm_open), and have different protections for
  the two (or more) regions. I had old patch doing this, its attached.
  
 
 In this way, if guest try to write a readonly gfn, the vm will be crashed 
 since
 it will return FAULT_PFN on the page-fault path. VMM can not detect this kind
 of fault, we have these problems:
 - even if guest try to write ROM on a PCI device, the guest will die, but
   we'd ignore this write, it looks more like the real machine.
 
 - can not implement ROMD beacuse write to a ROMD is MMIO access
 
 Yes, we can rework get_user_page_nowait and get_user_pages_fast, let them
 tell us the fault reason, but it is more complex i think.
 
  Can't that be fixed separately? 
 
  Another issue which is also present with the mmap(PROT_READ) scheme is
  interaction with reexecute_instruction. That is, unless i am mistaken,
  reexecute_instruction can succeed (return true) on a region that is
  write protected. This breaks the write faults on read-only slots exit
  to userspace via EXIT_MMIO behaviour.
 
  Sorry, Why? After re-entry to the guest, it can not generate a correct 
  MMIO?
  
  reexecute_instruction validates presence of GPA by looking at registered
  memslots. But if the access is a write, and userspace memory map is
  read-only, reexecute_instruction should exit via MMIO.
  
  That is, reexecute_instruction must validate GPA using registered
  memslots AND additionaly userspace map permission, not only registered
  memslot.
  
 
 What will happen if we always retry a unhandleable instruction which try to

Re: [PATCH 00/19] pci-assign: Refactor for upstream merge

2012-08-16 Thread Michael S. Tsirkin

On Thu, Aug 16, 2012 at 05:34:51PM +0300, Avi Kivity wrote:
 On 08/16/2012 04:54 PM, Jan Kiszka wrote:
  With this series, we are getting very close to obsoleting qemu-kvm. It
  refactors hw/device-assignment.c and the associated KVM helper functions
  into a form that should allow merging them into QEMU. Once the series is
  acceptable for qemu-kvm, I will break out the necessary uq/master
  patches and push pci-assign to upstream.
  
  The major step of this series is to define a regular set of kvm_device_*
  services that encapsulate classic (i.e. KVM-based, non-VFIO) device
  assignment features and export them to i386 targets only. There will
  never be another arch using them, therefore I pushed them into this
  corner. Moreover, the device assignment device now makes use of the new
  KVM IRQ/MSI routing API and no longer pokes into the internals of that
  layer. Finally, I moved the code into hw/kvm/pci-assign.c, dropped the
  superfluous configure option and did some basic code cleanups (mostly
  coding style) to bring things in shape.
  
  Note that patch 1 is a simple bug fix that should likely be applied for
  qemu-kvm-1.2 independently.
  
  This series depends on [1] and [2] and QEMU upstream (2b97f88c92) being
  merged into qemu-kvm.
  
  Please review.
 
 From a quick review it looks ready to merge.  Of course I'd appreciate a
 review from Alex or Michael as well.

Looks good to me too.

Acked-by: Michael S. Tsirkin m...@redhat.com



 -- 
 error compiling committee.c: too many arguments to function
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-16 Thread Avi Kivity

On 08/16/2012 06:57 PM, Marcelo Tosatti wrote:
 
 Correct.  But the chipset is also able to to write-protect some ranges
 in the 0xc-0x10 area via the PAM.  It is able to write-protect
 both RAM and PCI memory (usually mapped to flash).
 
 You are convinced that adding read-write protection information to the
 memory slots, which controls access by the guest, in addition to the
 userspace host pagetables, is useful. OK.

In fact if we started from scratch I'd go for one huge slot, with
PROT_NONE for mmio and non-kvm APIs for dirty bitmaps, and use Linux mm
APIs to manage the details.  This would make kvm x86_64 only (no way to
access the PCI space on i386) but it would simplify a lot of the
internal translation layer.  But we're not there.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/19] pci-assign: Replace kvm_assign_set_msix_entry with kvm_device_msix_set_vector

2012-08-16 Thread Alex Williamson

On Thu, 2012-08-16 at 15:54 +0200, Jan Kiszka wrote:
 The refactored version cleanly hides the KVM IOCTL structure from the
 users and also zeros out the padding field.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/device-assignment.c |7 ++-
  qemu-kvm.c |8 
  qemu-kvm.h |4 
  target-i386/kvm.c  |   13 +
  target-i386/kvm_i386.h |2 ++
  5 files changed, 17 insertions(+), 17 deletions(-)
 
 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 0e2f8e6..af8a5aa 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1045,7 +1045,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  uint16_t entries_nr = 0;
  int i, r = 0;
 -struct kvm_assigned_msix_entry msix_entry;
  MSIXTableEntry *entry = adev-msix_table;
  
  /* Get the usable entry number for allocating */
 @@ -1075,7 +1074,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  adev-irq_entries_nr = adev-msix_max;
  adev-entry = g_malloc0(adev-msix_max * sizeof(*(adev-entry)));
  
 -msix_entry.assigned_dev_id = adev-dev_id;
  entry = adev-msix_table;
  for (i = 0; i  adev-msix_max; i++, entry++) {
  if (msix_masked(entry)) {
 @@ -1098,9 +1096,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  
  kvm_add_routing_entry(kvm_state, adev-entry[i]);
  
 -msix_entry.gsi = adev-entry[i].gsi;
 -msix_entry.entry = i;
 -r = kvm_assign_set_msix_entry(kvm_state, msix_entry);
 +r = kvm_device_msix_set_vector(kvm_state, adev-dev_id, i,
 +   adev-entry[i].gsi);
  if (r) {
  fprintf(stderr, fail to set MSI-X entry! %s\n, strerror(-r));
  break;
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 1a2a4fd..ec1911f 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -185,14 +185,6 @@ int kvm_get_irq_route_gsi(void)
  #endif
  }
  
 -#ifdef KVM_CAP_DEVICE_MSIX
 -int kvm_assign_set_msix_entry(KVMState *s,
 -  struct kvm_assigned_msix_entry *entry)
 -{
 -return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
 -}
 -#endif
 -
  #if !defined(TARGET_I386)
  void kvm_arch_init_irq_routing(KVMState *s)
  {
 diff --git a/qemu-kvm.h b/qemu-kvm.h
 index 3fd6046..ad628d5 100644
 --- a/qemu-kvm.h
 +++ b/qemu-kvm.h
 @@ -65,10 +65,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry 
 *entry);
  int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
   struct kvm_irq_routing_entry *newentry);
  
 -
 -int kvm_assign_set_msix_entry(KVMState *s,
 -  struct kvm_assigned_msix_entry *entry);
 -
  #endif /* CONFIG_KVM */
  
  #endif
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 676f45b..e9353ed 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -2173,6 +2173,19 @@ int kvm_device_msix_init_vectors(KVMState *s, uint32_t 
 dev_id,
  return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
  }
  
 +int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
 +   int virq)
 +{
 +struct kvm_assigned_msix_entry msix_entry = {
 +.assigned_dev_id = dev_id,
 +.gsi = virq,
 +.entry = vector,
 +};
 +
 +memset(msix_entry.padding, 0, sizeof(msix_entry.padding));

nit, I think this can be done w/o a memset.  .padding = { 0 }?  Thanks,

Alex


 +return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, msix_entry);
 +}
 +
  int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id)
  {
  return kvm_deassign_irq_internal(s, dev_id, KVM_DEV_IRQ_GUEST_MSIX |
 diff --git a/target-i386/kvm_i386.h b/target-i386/kvm_i386.h
 index aac14eb..bd3b398 100644
 --- a/target-i386/kvm_i386.h
 +++ b/target-i386/kvm_i386.h
 @@ -30,6 +30,8 @@ int kvm_device_msi_deassign(KVMState *s, uint32_t dev_id);
  bool kvm_device_msix_supported(KVMState *s);
  int kvm_device_msix_init_vectors(KVMState *s, uint32_t dev_id,
   uint32_t nr_vectors);
 +int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t vector,
 +   int virq);
  int kvm_device_msix_deassign(KVMState *s, uint32_t dev_id);
  
  #endif



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/19] pci-assign: Replace kvm_assign_set_msix_entry with kvm_device_msix_set_vector

2012-08-16 Thread Jan Kiszka

On 2012-08-16 18:21, Alex Williamson wrote:
 On Thu, 2012-08-16 at 15:54 +0200, Jan Kiszka wrote:
 The refactored version cleanly hides the KVM IOCTL structure from the
 users and also zeros out the padding field.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/device-assignment.c |7 ++-
  qemu-kvm.c |8 
  qemu-kvm.h |4 
  target-i386/kvm.c  |   13 +
  target-i386/kvm_i386.h |2 ++
  5 files changed, 17 insertions(+), 17 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 0e2f8e6..af8a5aa 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1045,7 +1045,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  uint16_t entries_nr = 0;
  int i, r = 0;
 -struct kvm_assigned_msix_entry msix_entry;
  MSIXTableEntry *entry = adev-msix_table;
  
  /* Get the usable entry number for allocating */
 @@ -1075,7 +1074,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  adev-irq_entries_nr = adev-msix_max;
  adev-entry = g_malloc0(adev-msix_max * sizeof(*(adev-entry)));
  
 -msix_entry.assigned_dev_id = adev-dev_id;
  entry = adev-msix_table;
  for (i = 0; i  adev-msix_max; i++, entry++) {
  if (msix_masked(entry)) {
 @@ -1098,9 +1096,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  
  kvm_add_routing_entry(kvm_state, adev-entry[i]);
  
 -msix_entry.gsi = adev-entry[i].gsi;
 -msix_entry.entry = i;
 -r = kvm_assign_set_msix_entry(kvm_state, msix_entry);
 +r = kvm_device_msix_set_vector(kvm_state, adev-dev_id, i,
 +   adev-entry[i].gsi);
  if (r) {
  fprintf(stderr, fail to set MSI-X entry! %s\n, strerror(-r));
  break;
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 1a2a4fd..ec1911f 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -185,14 +185,6 @@ int kvm_get_irq_route_gsi(void)
  #endif
  }
  
 -#ifdef KVM_CAP_DEVICE_MSIX
 -int kvm_assign_set_msix_entry(KVMState *s,
 -  struct kvm_assigned_msix_entry *entry)
 -{
 -return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
 -}
 -#endif
 -
  #if !defined(TARGET_I386)
  void kvm_arch_init_irq_routing(KVMState *s)
  {
 diff --git a/qemu-kvm.h b/qemu-kvm.h
 index 3fd6046..ad628d5 100644
 --- a/qemu-kvm.h
 +++ b/qemu-kvm.h
 @@ -65,10 +65,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry 
 *entry);
  int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
   struct kvm_irq_routing_entry *newentry);
  
 -
 -int kvm_assign_set_msix_entry(KVMState *s,
 -  struct kvm_assigned_msix_entry *entry);
 -
  #endif /* CONFIG_KVM */
  
  #endif
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 676f45b..e9353ed 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -2173,6 +2173,19 @@ int kvm_device_msix_init_vectors(KVMState *s, 
 uint32_t dev_id,
  return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
  }
  
 +int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t 
 vector,
 +   int virq)
 +{
 +struct kvm_assigned_msix_entry msix_entry = {
 +.assigned_dev_id = dev_id,
 +.gsi = virq,
 +.entry = vector,
 +};
 +
 +memset(msix_entry.padding, 0, sizeof(msix_entry.padding));
 
 nit, I think this can be done w/o a memset.  .padding = { 0 }?  Thanks,

I think to remember it has to be .padding = { 0, 0, 0 } (due to three
padding elements) to be standard conforming, but that would still be
nicer than the memset, indeed.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Michael S. Tsirkin

On Thu, Aug 16, 2012 at 07:29:40PM +0300, Avi Kivity wrote:
 On 08/15/2012 10:22 PM, Michael S. Tsirkin wrote:
  On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
  On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
   On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
v8:

Trying a new approach.  Nobody seems to like the internal IRQ
source ID object and the interactions it implies between irqfd
and eoifd, so let's get rid of it.  Instead, simply expose
IRQ source IDs to userspace.  This lets the user be in charge
of freeing them or hanging onto a source ID for later use.
   
   In the end it turns out source ID is an optimization for shared
   interrupts, isn't it?  Can't we apply the optimization transparently to
   the user?  E.g. if we have some spare source IDs, allocate them, if we
   run out, use a shared source ID?
  
  Let's think about shared source IDs a bit more.  I think it's wrong that
  irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
  irqfd users can share a source ID.  We do not get the logical OR of all
  users by putting them on the same source ID, we get last set wins.
  KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
  logical OR happens in userspace.  How would we not starve a user if we
  define KVM_IRQFD_SOURCE_ID?  What am I missing?
  
  That all irqfds are deasserted on EOI anyway.  So there's no point
  to do a logical OR.
  
  
 
 What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
 EOI can de-assert the irqfd source, but the line is kept high by the
 last KVM_IRQ_LINE invocation.

Exactly. So 1 ID for userspace and 1 for irqfd.

 
 -- 
 error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Alex Williamson

On Thu, 2012-08-16 at 19:32 +0300, Avi Kivity wrote:
 On 08/11/2012 01:37 AM, Alex Williamson wrote:
  v8:
  
  Trying a new approach.  Nobody seems to like the internal IRQ
  source ID object and the interactions it implies between irqfd
  and eoifd, so let's get rid of it.  Instead, simply expose
  IRQ source IDs to userspace.  This lets the user be in charge
  of freeing them or hanging onto a source ID for later use.  They
  can also detach and re-attach components at will.  It also opens
  up the possibility that userspace might want to use each IRQ
  source ID for more than one GSI (and avoids the kernel needing
  to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
  
  I really wanted to add a de-assert-only option to irqfd so the
  irq_ackfd could be fed directly into an irqfd, but I'm dependent
  on the ordering of de-assert _then_ signal an eventfd.  Keeping
  that ordering doesn't seem to be possible, especially since irqfd
  uses a workqueue, if I attempt to make that connection.  Thanks,
 
 I can't say I'm happy with exposing irq source IDs.  It's true that they
 correspond to a physical entity so they can't be said to be an
 implementation detail, but adding more ABIs has a cost and I can't say
 that I see another user for this.
 
 Can you provide a link to the combined irqfd+ackfd implementation?  I'm
 inclined now to go for the simplest solution possible.

As soon as I write it :)  Keeping lists to handle the one-to-many
deassert-to-notify will notch up the complexity, but it'll be
interesting to see how it compares.  Thanks,

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/19] pci-assign: Replace kvm_assign_set_msix_entry with kvm_device_msix_set_vector

2012-08-16 Thread Jan Kiszka

On 2012-08-16 18:30, Jan Kiszka wrote:
 On 2012-08-16 18:21, Alex Williamson wrote:
 On Thu, 2012-08-16 at 15:54 +0200, Jan Kiszka wrote:
 The refactored version cleanly hides the KVM IOCTL structure from the
 users and also zeros out the padding field.

 Signed-off-by: Jan Kiszka jan.kis...@siemens.com
 ---
  hw/device-assignment.c |7 ++-
  qemu-kvm.c |8 
  qemu-kvm.h |4 
  target-i386/kvm.c  |   13 +
  target-i386/kvm_i386.h |2 ++
  5 files changed, 17 insertions(+), 17 deletions(-)

 diff --git a/hw/device-assignment.c b/hw/device-assignment.c
 index 0e2f8e6..af8a5aa 100644
 --- a/hw/device-assignment.c
 +++ b/hw/device-assignment.c
 @@ -1045,7 +1045,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  AssignedDevice *adev = DO_UPCAST(AssignedDevice, dev, pci_dev);
  uint16_t entries_nr = 0;
  int i, r = 0;
 -struct kvm_assigned_msix_entry msix_entry;
  MSIXTableEntry *entry = adev-msix_table;
  
  /* Get the usable entry number for allocating */
 @@ -1075,7 +1074,6 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  adev-irq_entries_nr = adev-msix_max;
  adev-entry = g_malloc0(adev-msix_max * sizeof(*(adev-entry)));
  
 -msix_entry.assigned_dev_id = adev-dev_id;
  entry = adev-msix_table;
  for (i = 0; i  adev-msix_max; i++, entry++) {
  if (msix_masked(entry)) {
 @@ -1098,9 +1096,8 @@ static int assigned_dev_update_msix_mmio(PCIDevice 
 *pci_dev)
  
  kvm_add_routing_entry(kvm_state, adev-entry[i]);
  
 -msix_entry.gsi = adev-entry[i].gsi;
 -msix_entry.entry = i;
 -r = kvm_assign_set_msix_entry(kvm_state, msix_entry);
 +r = kvm_device_msix_set_vector(kvm_state, adev-dev_id, i,
 +   adev-entry[i].gsi);
  if (r) {
  fprintf(stderr, fail to set MSI-X entry! %s\n, strerror(-r));
  break;
 diff --git a/qemu-kvm.c b/qemu-kvm.c
 index 1a2a4fd..ec1911f 100644
 --- a/qemu-kvm.c
 +++ b/qemu-kvm.c
 @@ -185,14 +185,6 @@ int kvm_get_irq_route_gsi(void)
  #endif
  }
  
 -#ifdef KVM_CAP_DEVICE_MSIX
 -int kvm_assign_set_msix_entry(KVMState *s,
 -  struct kvm_assigned_msix_entry *entry)
 -{
 -return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_ENTRY, entry);
 -}
 -#endif
 -
  #if !defined(TARGET_I386)
  void kvm_arch_init_irq_routing(KVMState *s)
  {
 diff --git a/qemu-kvm.h b/qemu-kvm.h
 index 3fd6046..ad628d5 100644
 --- a/qemu-kvm.h
 +++ b/qemu-kvm.h
 @@ -65,10 +65,6 @@ int kvm_del_routing_entry(struct kvm_irq_routing_entry 
 *entry);
  int kvm_update_routing_entry(struct kvm_irq_routing_entry *entry,
   struct kvm_irq_routing_entry *newentry);
  
 -
 -int kvm_assign_set_msix_entry(KVMState *s,
 -  struct kvm_assigned_msix_entry *entry);
 -
  #endif /* CONFIG_KVM */
  
  #endif
 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index 676f45b..e9353ed 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -2173,6 +2173,19 @@ int kvm_device_msix_init_vectors(KVMState *s, 
 uint32_t dev_id,
  return kvm_vm_ioctl(s, KVM_ASSIGN_SET_MSIX_NR, msix_nr);
  }
  
 +int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t 
 vector,
 +   int virq)
 +{
 +struct kvm_assigned_msix_entry msix_entry = {
 +.assigned_dev_id = dev_id,
 +.gsi = virq,
 +.entry = vector,
 +};
 +
 +memset(msix_entry.padding, 0, sizeof(msix_entry.padding));

 nit, I think this can be done w/o a memset.  .padding = { 0 }?  Thanks,
 
 I think to remember it has to be .padding = { 0, 0, 0 } (due to three
 padding elements) to be standard conforming, but that would still be
 nicer than the memset, indeed.

I've found some minor inconsistencies in the IOCTL struct
zero-initializations. Will send v2, just waiting in case you have more
comments.

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Alex Williamson

On Thu, 2012-08-16 at 19:29 +0300, Avi Kivity wrote:
 On 08/15/2012 10:22 PM, Michael S. Tsirkin wrote:
  On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
  On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
   On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
v8:

Trying a new approach.  Nobody seems to like the internal IRQ
source ID object and the interactions it implies between irqfd
and eoifd, so let's get rid of it.  Instead, simply expose
IRQ source IDs to userspace.  This lets the user be in charge
of freeing them or hanging onto a source ID for later use.
   
   In the end it turns out source ID is an optimization for shared
   interrupts, isn't it?  Can't we apply the optimization transparently to
   the user?  E.g. if we have some spare source IDs, allocate them, if we
   run out, use a shared source ID?
  
  Let's think about shared source IDs a bit more.  I think it's wrong that
  irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
  irqfd users can share a source ID.  We do not get the logical OR of all
  users by putting them on the same source ID, we get last set wins.
  KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
  logical OR happens in userspace.  How would we not starve a user if we
  define KVM_IRQFD_SOURCE_ID?  What am I missing?
  
  That all irqfds are deasserted on EOI anyway.  So there's no point
  to do a logical OR.
  
  
 
 What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
 EOI can de-assert the irqfd source, but the line is kept high by the
 last KVM_IRQ_LINE invocation.

As I understand Michael's proposal, the shared irq source id used by
level-deassert-irqfds can only be asserted via an irqfd injection and
can only be de-asserted by the ack notifier.  If we let any other
interface have access to the irq source id it breaks.  If KVM_IRQ_LINE
picks up and extension to specify the irq source id, it would have to be
prevented from accessing this one.  Thanks,

Alex



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Avi Kivity

On 08/11/2012 01:37 AM, Alex Williamson wrote:
 v8:
 
 Trying a new approach.  Nobody seems to like the internal IRQ
 source ID object and the interactions it implies between irqfd
 and eoifd, so let's get rid of it.  Instead, simply expose
 IRQ source IDs to userspace.  This lets the user be in charge
 of freeing them or hanging onto a source ID for later use.  They
 can also detach and re-attach components at will.  It also opens
 up the possibility that userspace might want to use each IRQ
 source ID for more than one GSI (and avoids the kernel needing
 to manage that).  Per suggestions, EOIFD is now IRQ_ACKFD.
 
 I really wanted to add a de-assert-only option to irqfd so the
 irq_ackfd could be fed directly into an irqfd, but I'm dependent
 on the ordering of de-assert _then_ signal an eventfd.  Keeping
 that ordering doesn't seem to be possible, especially since irqfd
 uses a workqueue, if I attempt to make that connection.  Thanks,

I can't say I'm happy with exposing irq source IDs.  It's true that they
correspond to a physical entity so they can't be said to be an
implementation detail, but adding more ABIs has a cost and I can't say
that I see another user for this.

Can you provide a link to the combined irqfd+ackfd implementation?  I'm
inclined now to go for the simplest solution possible.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/19] pci-assign: Refactor for upstream merge

2012-08-16 Thread Alex Williamson

On Thu, 2012-08-16 at 15:54 +0200, Jan Kiszka wrote:
 With this series, we are getting very close to obsoleting qemu-kvm. It
 refactors hw/device-assignment.c and the associated KVM helper functions
 into a form that should allow merging them into QEMU. Once the series is
 acceptable for qemu-kvm, I will break out the necessary uq/master
 patches and push pci-assign to upstream.
 
 The major step of this series is to define a regular set of kvm_device_*
 services that encapsulate classic (i.e. KVM-based, non-VFIO) device
 assignment features and export them to i386 targets only. There will
 never be another arch using them, therefore I pushed them into this
 corner. Moreover, the device assignment device now makes use of the new
 KVM IRQ/MSI routing API and no longer pokes into the internals of that
 layer. Finally, I moved the code into hw/kvm/pci-assign.c, dropped the
 superfluous configure option and did some basic code cleanups (mostly
 coding style) to bring things in shape.
 
 Note that patch 1 is a simple bug fix that should likely be applied for
 qemu-kvm-1.2 independently.
 
 This series depends on [1] and [2] and QEMU upstream (2b97f88c92) being
 merged into qemu-kvm.
 
 Please review.
 
 [1] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/95528
 [2] http://thread.gmane.org/gmane.comp.emulators.kvm.devel/95522
 
 Jan Kiszka (19):
   pci-assign: Only clean up registered IO resources
   pci-assign: Factor out kvm_device_pci_assign/deassign
   pci-assign: Rename assign_irq to assign_intx
   pci-assign: Refactor interrupt deassignment
   pci-assign: Factor out kvm_device_intx_assign
   qemu-kvm: Move kvm_device_intx_set_mask service
   pci-assign: Rework MSI assignment
   pci-assign: Factor out kvm_device_msix_supported
   pci-assign: Replace kvm_assign_set_msix_nr with
 kvm_device_msix_init_vectors
   pci-assign: Replace kvm_assign_set_msix_entry with
 kvm_device_msix_set_vector
   pci-assign: Rework MSI-X route setup
   pci-assign: Factor out kvm_device_msix_assign
   qemu-kvm: Kill qemu-kvm.[ch]
   pci-assign: Drop configure switches
   pci-assign: Move and rename source file
   pci-assign: Fix coding style issues
   pci-assign: Replace exit() with hw_error()
   pci-assign: Drop unused or write-only variables
   pci-assign: Gracefully handle missing in-kernel irqchip support
 
  configure|   11 -
  hw/i386/Makefile.objs|3 -
  hw/kvm/Makefile.objs |2 +-
  hw/{device-assignment.c = kvm/pci-assign.c} |  502 
 +-
  kvm-all.c|   54 +++-
  kvm-stub.c   |9 -
  kvm.h|   12 +-
  qemu-kvm.c   |  233 
  qemu-kvm.h   |  112 --
  target-i386/kvm.c|  142 
  target-i386/kvm_i386.h   |   22 ++
  11 files changed, 461 insertions(+), 641 deletions(-)
  rename hw/{device-assignment.c = kvm/pci-assign.c} (84%)
  delete mode 100644 qemu-kvm.c
  delete mode 100644 qemu-kvm.h

Really nice overall

Acked-by: Alex Williamson alex.william...@redhat.com

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/19] pci-assign: Replace kvm_assign_set_msix_entry with kvm_device_msix_set_vector

2012-08-16 Thread Alex Williamson

On Thu, 2012-08-16 at 19:34 +0300, Avi Kivity wrote:
 On 08/16/2012 07:21 PM, Alex Williamson wrote:
   
  +int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t 
  vector,
  +   int virq)
  +{
  +struct kvm_assigned_msix_entry msix_entry = {
  +.assigned_dev_id = dev_id,
  +.gsi = virq,
  +.entry = vector,
  +};
  +
  +memset(msix_entry.padding, 0, sizeof(msix_entry.padding));
  
  nit, I think this can be done w/o a memset.  .padding = { 0 }?  Thanks,
 
 It can be done with a null statement.  The msix_entry initialization
 above zero-initializes all fields that are not mentioned in the initializer.

Yeah, I thought that was probably the case as well.  Thanks for
confirming.  The .padding = 0 in the previous patch could also be
dropped then.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Avi Kivity

On 08/15/2012 10:22 PM, Michael S. Tsirkin wrote:
 On Wed, Aug 15, 2012 at 11:36:31AM -0600, Alex Williamson wrote:
 On Wed, 2012-08-15 at 17:28 +0300, Michael S. Tsirkin wrote:
  On Fri, Aug 10, 2012 at 04:37:08PM -0600, Alex Williamson wrote:
   v8:
   
   Trying a new approach.  Nobody seems to like the internal IRQ
   source ID object and the interactions it implies between irqfd
   and eoifd, so let's get rid of it.  Instead, simply expose
   IRQ source IDs to userspace.  This lets the user be in charge
   of freeing them or hanging onto a source ID for later use.
  
  In the end it turns out source ID is an optimization for shared
  interrupts, isn't it?  Can't we apply the optimization transparently to
  the user?  E.g. if we have some spare source IDs, allocate them, if we
  run out, use a shared source ID?
 
 Let's think about shared source IDs a bit more.  I think it's wrong that
 irqfd uses KVM_USERSPACE_IRQ_SOURCE_ID, but I'm questioning whether all
 irqfd users can share a source ID.  We do not get the logical OR of all
 users by putting them on the same source ID, we get last set wins.
 KVM_USERSPACE_IRQ_SOURCE_ID is used for multiple inputs because the
 logical OR happens in userspace.  How would we not starve a user if we
 define KVM_IRQFD_SOURCE_ID?  What am I missing?
 
 That all irqfds are deasserted on EOI anyway.  So there's no point
 to do a logical OR.
 
 

What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
EOI can de-assert the irqfd source, but the line is kept high by the
last KVM_IRQ_LINE invocation.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Avi Kivity

On 08/16/2012 07:54 PM, Michael S. Tsirkin wrote:
 On Thu, Aug 16, 2012 at 07:39:35PM +0300, Avi Kivity wrote:
 On 08/16/2012 07:36 PM, Michael S. Tsirkin wrote:
 
  What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
  EOI can de-assert the irqfd source, but the line is kept high by the
  last KVM_IRQ_LINE invocation.
  
  Exactly. So 1 ID for userspace and 1 for irqfd.
 
 Gaa, this mess belongs in userspace.
 
 Not sure I understand what you refer to.
 
 I meant simply
 #define KVM_IRQFD_IRQ_SOURCE_ID1
 request it at kvm init.
 
 As opposed to using KVM_USERSPACE_IRQ_SOURCE_ID like we do now
 for edge.
 Does this seem acceptable to you?

I meant the pic/ioapic, not this particular bit.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v8 0/6] kvm: level irqfd support

2012-08-16 Thread Avi Kivity

On 08/16/2012 07:36 PM, Michael S. Tsirkin wrote:

 What if a level irqfd shares a line with a KVM_IRQ_LINE ioctl?  Then an
 EOI can de-assert the irqfd source, but the line is kept high by the
 last KVM_IRQ_LINE invocation.
 
 Exactly. So 1 ID for userspace and 1 for irqfd.

Gaa, this mess belongs in userspace.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 10/19] pci-assign: Replace kvm_assign_set_msix_entry with kvm_device_msix_set_vector

2012-08-16 Thread Avi Kivity

On 08/16/2012 07:21 PM, Alex Williamson wrote:
  
 +int kvm_device_msix_set_vector(KVMState *s, uint32_t dev_id, uint32_t 
 vector,
 +   int virq)
 +{
 +struct kvm_assigned_msix_entry msix_entry = {
 +.assigned_dev_id = dev_id,
 +.gsi = virq,
 +.entry = vector,
 +};
 +
 +memset(msix_entry.padding, 0, sizeof(msix_entry.padding));
 
 nit, I think this can be done w/o a memset.  .padding = { 0 }?  Thanks,

It can be done with a null statement.  The msix_entry initialization
above zero-initializes all fields that are not mentioned in the initializer.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

1 2 >

1 - 100 of 117 matches

Mail list logo