Re: [GIT PULL 0/5] KVM: s390: Let user space control the cpu states
On 18/07/14 16:49, Paolo Bonzini wrote:
On 15/07/2014 15:27, Christian Borntraeger wrote:
Paolo,

The following changes since commit 9f6226a762c7ae02f6a23a3d4fc552dafa57ea23:

  arch: x86: kvm: x86.c: Cleaning up variable is set more than once (2014-06-30 16:52:04 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-20140715

for you to fetch changes up to 6352e4d2dd9a349024a41356148eced553e1dce4:

  KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control (2014-07-10 14:11:17 +0200)

This series enables the KVM_(S|G)ET_MP_STATE ioctls on s390 to make the cpu
state settable by user space. This is necessary to avoid races in s390
SIGP/reset handling, which happen because some SIGP orders are handled in QEMU
while others are handled in the kernel. Together with the busy condition as a
return value of SIGP, races happen especially around starting and stopping of
CPUs. (For example, the program 'cpuplugd', which runs on several s390 distros,
does automatic onlining and offlining of cpus.)

As soon as the MPSTATE interface is used, user space takes complete control of
the cpu states. Otherwise the kernel uses the old way. Therefore, the new
kernel continues to work fine with old QEMUs.
David Hildenbrand (5):
      KVM: s390: allow only one SIGP STOP (AND STORE STATUS) at a time
      KVM: s390: move finalization of SIGP STOP orders to kvm_s390_vcpu_stop
      KVM: s390: remove __cpu_is_stopped and expose is_vcpu_stopped
      KVM: prepare for KVM_(S|G)ET_MP_STATE on other architectures
      KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control

 Documentation/virtual/kvm/api.txt | 31 ++-
 arch/s390/include/asm/kvm_host.h  |  1 +
 arch/s390/kvm/diag.c              |  3 ++-
 arch/s390/kvm/intercept.c         | 32 ++--
 arch/s390/kvm/kvm-s390.c          | 52 +++
 arch/s390/kvm/kvm-s390.h          | 10 ++--
 arch/s390/kvm/sigp.c              |  7 +-
 include/uapi/linux/kvm.h          |  7 +-
 8 files changed, 98 insertions(+), 45 deletions(-)

Alex, what do you think about this patch series? Does it make sense to use
KVM_GET/SET_MP_STATE, or would the one-reg interface be a better match? It's a
bit weird that running and halted map to the same mp_state on s390. I would be
more confident that KVM_GET/SET_MP_STATE is the right choice if it had at least
KVM_MP_STATE_RUNNABLE and KVM_MP_STATE_HALTED. Christian, where is the halted
state stored — is it in the PSW?

Yes, there is a bit in the PSW called wait. It is pretty much similar to the
HLT instruction: the CPU does not continue execution, but it will accept all
interrupts that are not fenced via control registers or the PSW. It's mostly
used for cpu_idle. KVM on s390 always does the wait in the kernel (IOW, we
always have something like halt_in_kernel), except for the disabled wait, which
boils down to no execution with all interrupts off. This is used for error
states of the OS and for a special case (we set the guest into the panic
state). So having such a state won't buy us much. It would even be wrong,
because we want our MP_STATE defines to be a 1:1 match of the states that are
defined in the architecture as proper CPU states. Some of the SIGP calls return
the state of the target CPU, and that depends on the CPU state as defined in
the architecture.
The wait bit does not have an influence on the return value. So instead of
modelling this as on x86, we actually want to model the mp_states as defined
for the architecture. From what I can see of the x86 defines, it's somewhat
similar: they match the x86 architecture, not the QEMU model. ONEREG would work
as well (you can make almost every interface work), but mp_state looks like a
better fit to me, because it is an interface for defining CPU states that are
not directly tied to runtime registers. Furthermore, the bits in the PSW and in
the registers are only considered by the HW if the CPU is in the operating
state. By using ONEREG, we would have a register that does not follow that
rule.

Christian

PS: See SA22-7832 chapter 4-1
(http://publibfi.boulder.ibm.com/epubs/pdf/dz9zr009.pdf — warning, it is big)

---snip---
The stopped, operating, load, and check-stop states are four mutually
exclusive states of the CPU. When the CPU is in the stopped state,
instructions and interruptions, other than the restart interruption, are not
executed. In the operating state, the CPU executes instructions and takes
interruptions, subject to the control of the program-status word (PSW) and
control registers, and in the manner specified by the setting of the
operator-facility rate control. The CPU is in the load state during the
initial-program-loading
Re: [PATCH v2 2/2] docs: update ivshmem device spec
On 20.07.2014 11:38, David Marchand wrote:
Add some notes on the parts needed to use ivshmem devices: more specifically,
explain the purpose of an ivshmem server and the basic concept to use the
ivshmem devices in guests.
Move some parts of the documentation and re-organise it.

Signed-off-by: David Marchand david.march...@6wind.com
Reviewed-by: Claudio Fontana claudio.font...@huawei.com
---
 docs/specs/ivshmem_device_spec.txt | 124 +++-
 1 file changed, 93 insertions(+), 31 deletions(-)

diff --git a/docs/specs/ivshmem_device_spec.txt b/docs/specs/ivshmem_device_spec.txt
index 667a862..f5f2b95 100644
--- a/docs/specs/ivshmem_device_spec.txt
+++ b/docs/specs/ivshmem_device_spec.txt
@@ -2,30 +2,103 @@
 Device Specification for Inter-VM shared memory device
 --

-The Inter-VM shared memory device is designed to share a region of memory to
-userspace in multiple virtual guests. The memory region does not belong to any
-guest, but is a POSIX memory object on the host. Optionally, the device may
-support sending interrupts to other guests sharing the same memory region.
+The Inter-VM shared memory device is designed to share a memory region (created
+on the host via the POSIX shared memory API) between multiple QEMU processes
+running different guests. In order for all guests to be able to pick up the
+shared memory area, it is modeled by QEMU as a PCI device exposing said memory
+to the guest as a PCI BAR.
+The memory region does not belong to any guest, but is a POSIX memory object on
+the host. The host can access this shared memory if needed.
+
+The device also provides an optional communication mechanism between guests
+sharing the same memory object. More details about that in the section 'Guest to
+guest communication' section.

 The Inter-VM PCI device
 ---

-*BARs*
+From the VM point of view, the ivshmem PCI device supports three BARs.
+
+- BAR0 is a 1 Kbyte MMIO region to support registers and interrupts when MSI is
+  not used.
+- BAR1 is used for MSI-X when it is enabled in the device.
+- BAR2 is used to access the shared memory object.
+
+It is your choice how to use the device but you must choose between two
+behaviors :
+
+- basically, if you only need the shared memory part, you will map BAR2.
+  This way, you have access to the shared memory in guest and can use it as you
+  see fit (memnic, for example, uses it in userland
+  http://dpdk.org/browse/memnic).
+
+- BAR0 and BAR1 are used to implement an optional communication mechanism
+  through interrupts in the guests. If you need an event mechanism between the
+  guests accessing the shared memory, you will most likely want to write a
+  kernel driver that will handle interrupts. See details in the section 'Guest
+  to guest communication' section.
+
+The behavior is chosen when starting your QEMU processes:
+- no communication mechanism needed, the first QEMU to start creates the shared
+  memory on the host, subsequent QEMU processes will use it.
+
+- communication mechanism needed, an ivshmem server must be started before any
+  QEMU processes, then each QEMU process connects to the server unix socket.
+
+For more details on the QEMU ivshmem parameters, see qemu-doc documentation.
+
+
+Guest to guest communication
+
+
+This section details the communication mechanism between the guests accessing
+the ivhsmem shared memory.

-The device supports three BARs. BAR0 is a 1 Kbyte MMIO region to support
-registers. BAR1 is used for MSI-X when it is enabled in the device. BAR2 is
-used to map the shared memory object from the host. The size of BAR2 is
-specified when the guest is started and must be a power of 2 in size.
+*ivshmem server*

-*Registers*
+This server code is available in qemu.git/contrib/ivshmem-server.

-The device currently supports 4 registers of 32-bits each. Registers
-are used for synchronization between guests sharing the same memory object when
-interrupts are supported (this requires using the shared memory server).
+The server must be started on the host before any guest.
+It creates a shared memory object then waits for clients to connect on an unix
+socket.

-The server assigns each VM an ID number and sends this ID number to the QEMU
-process when the guest starts.
+For each client (QEMU processes) that connects to the server:
+- the server assigns an ID for this client and sends this ID to him as the first
+  message,
+- the server sends a fd to the shared memory object to this client,
+- the server creates a new set of host eventfds associated to the new client and
+  sends this set to all already connected clients,
+- finally, the server sends all the eventfds sets for all clients to the new
+  client.
+
+The server signals all
Re: [GIT PULL 0/5] KVM: s390: Let user space control the cpu states
On 21/07/2014 09:47, Christian Borntraeger wrote:
So having such a state won't buy us much. It would even be wrong, because we
want our MP_STATE defines to be a 1:1 match of the states that are defined in
the architecture as proper CPU states. Some of the SIGP calls return the state
of the target CPU, and that depends on the CPU state as defined in the
architecture. The wait bit does not have an influence on the return value.

Thanks for the explanation.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm-all: Use 'tmpcpu' instead of 'cpu' in sub-looping to avoid 'cpu' be NULL
On 19/07/2014 03:21, Chen Gang wrote:
If kvm_arch_remove_sw_breakpoint() fails for every CPU in CPU_FOREACH(), the
loop leaves 'cpu' NULL, so the next kvm_arch_remove_sw_breakpoint() call in
QTAILQ_FOREACH_SAFE() would be passed a NULL 'cpu'. Since
kvm_arch_remove_sw_breakpoint() may assume 'cpu' is never NULL, use an
additional temporary variable for the inner loop to avoid this case.

Signed-off-by: Chen Gang gang.chen.5...@gmail.com
---
 kvm-all.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 3ae30ee..1402f4f 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2077,12 +2077,13 @@ void kvm_remove_all_breakpoints(CPUState *cpu)
 {
     struct kvm_sw_breakpoint *bp, *next;
     KVMState *s = cpu->kvm_state;
+    CPUState *tmpcpu;

     QTAILQ_FOREACH_SAFE(bp, &s->kvm_sw_breakpoints, entry, next) {
         if (kvm_arch_remove_sw_breakpoint(cpu, bp) != 0) {
             /* Try harder to find a CPU that currently sees the breakpoint. */
-            CPU_FOREACH(cpu) {
-                if (kvm_arch_remove_sw_breakpoint(cpu, bp) == 0) {
+            CPU_FOREACH(tmpcpu) {
+                if (kvm_arch_remove_sw_breakpoint(tmpcpu, bp) == 0) {
                     break;
                 }
             }

Applying to uq/master, thanks.

Paolo
RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+mihai.caraman=freescale@lists.ozlabs.org]
On Behalf Of mihai.cara...@freescale.com
Sent: Friday, July 18, 2014 12:06 PM
To: Alexander Graf; kvm-...@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org; kvm@vger.kernel.org
Subject: RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail

-----Original Message-----
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 17, 2014 5:21 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail

On 17.07.14 13:22, Mihai Caraman wrote:
On book3e, the guest's last instruction is read on the exit path using the
dedicated load-external-pid (lwepx) instruction. This load operation may fail
due to TLB eviction and execute-but-not-read entries.

This patch lays down the path for an alternative solution to read the guest's
last instruction, by allowing the kvmppc_get_last_inst() function to fail.
Architecture-specific implementations of kvmppc_load_last_inst() may read the
last guest instruction and instruct the emulation layer to re-execute the
guest in case of failure.

Make the kvmppc_get_last_inst() definition common between architectures.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
...
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
index e2fd5a1..7f9c634 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -47,6 +47,11 @@ enum emulation_result {
 	EMULATE_EXIT_USER,    /* emulation requires exit to user-space */
 };

+enum instruction_type {
+	INST_GENERIC,
+	INST_SC,		/* system call */
+};
+
 extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
 extern void kvmppc_handler_highmem(void);
@@ -62,6 +67,9 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu,
 			       u64 val, unsigned int bytes,
 			       int is_default_endian);

+extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,
+				 enum instruction_type type, u32 *inst);
+
 extern int kvmppc_emulate_instruction(struct kvm_run *run,
 				      struct kvm_vcpu *vcpu);
 extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
@@ -234,6 +242,23 @@ struct kvmppc_ops {
 extern struct kvmppc_ops *kvmppc_hv_ops;
 extern struct kvmppc_ops *kvmppc_pr_ops;

+static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu,
+				enum instruction_type type, u32 *inst)
+{
+	int ret = EMULATE_DONE;
+
+	/* Load the instruction manually if it failed to do so in the
+	 * exit path */
+	if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
+		ret = kvmppc_load_last_inst(vcpu, type, &vcpu->arch.last_inst);
+
+	*inst = (ret == EMULATE_DONE && kvmppc_need_byteswap(vcpu)) ?
+		swab32(vcpu->arch.last_inst) : vcpu->arch.last_inst;

This makes even less sense than the previous version. Either you treat inst
as "definitely overwritten" or as "preserves previous data on failure".

Both the v4 and v5 versions treat inst as definitely overwritten.

So either you unconditionally swap like you did before

Setting aside its symmetry, KVM_INST_FETCH_FAILED is operated on in host
endianness, so it doesn't need a byte swap.
I agree with your reasoning if last_inst were initialized and compared with
data in guest endianness, which is not yet the case for KVM_INST_FETCH_FAILED.

Alex, are you relying on the fact that the KVM_INST_FETCH_FAILED value is
symmetrical? With a non-symmetrical value like 0xDEADBEEF, and considering a
little-endian guest on a big-endian host, we would need to fix the kvm logic
to initialize and compare last_inst against the swapped value 0xEFBEADDE.
Your suggestion to unconditionally swap makes sense only with the above fix;
otherwise inst may end up with the swapped value 0xEFBEADDE, which is wrong.

-Mike
[GIT PULL] KVM changes for 3.16-rc7
Linus,

The following changes since commit cd3de83f147601356395b57a8673e9c5ff1e59d1:

  Linux 3.16-rc4 (2014-07-06 12:37:51 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/kvm.git tags/for-linus

for you to fetch changes up to bb18b526a9d8d4a3fe56f234d5013b9f6036978d:

  Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into kvm-master (2014-07-08 12:08:58 +0200)

These are mostly PPC changes for 3.16-new things. However, there is an x86
change too, and it is a regression from 3.14. As it only affects nested
virtualization and there were other changes in this area in 3.16, I am not
nominating it for 3.15-stable.

Alexander Graf (3):
      PPC: Add _GLOBAL_TOC for 32bit
      KVM: PPC: Book3S PR: Fix ABIv2 on LE
      KVM: PPC: RTAS: Do byte swaps explicitly

Aneesh Kumar K.V (1):
      KVM: PPC: BOOK3S: HV: Use base page size when comparing against slb value

Anton Blanchard (1):
      KVM: PPC: Assembly functions exported to modules need _GLOBAL_TOC()

Bandan Das (1):
      KVM: x86: Check for nested events if there is an injectable interrupt

Mihai Caraman (1):
      KVM: PPC: Book3E: Unlock mmu_lock when setting caching atttribute

Paolo Bonzini (1):
      Merge tag 'signed-for-3.16' of git://github.com/agraf/linux-2.6 into kvm-master

 arch/powerpc/include/asm/kvm_book3s_64.h | 19 +-
 arch/powerpc/include/asm/ppc_asm.h       |  2 +
 arch/powerpc/kvm/book3s_64_mmu_hv.c      |  2 +-
 arch/powerpc/kvm/book3s_hv_rm_mmu.c      |  7 +---
 arch/powerpc/kvm/book3s_hv_rmhandlers.S  |  2 +-
 arch/powerpc/kvm/book3s_interrupts.S     |  4 ++
 arch/powerpc/kvm/book3s_rmhandlers.S     |  6 ++-
 arch/powerpc/kvm/book3s_rtas.c           | 65 +---
 arch/powerpc/kvm/e500_mmu_host.c         |  3 +-
 arch/x86/kvm/x86.c                       | 12 ++
 10 files changed, 64 insertions(+), 58 deletions(-)
KVM call for agenda for 2014-07-22
Hi

Please send any topic that you are interested in covering.

Thanks, Juan.

Call details:
  15:00 CEST
  13:00 UTC
  09:00 EDT
  Every two weeks

If you need phone number details, contact me privately
[PATCH 2/2] KVM: nVMX: clean up nested_release_vmcs12 and code around it
Make nested_release_vmcs12 idempotent.

Tested-by: Wanpeng Li wanpeng...@linux.intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/vmx.c | 42 +-
 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 462334eaa3c0..3300f4f2da48 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -6109,20 +6109,27 @@ static int nested_vmx_check_permission(struct kvm_vcpu *vcpu)
 static inline void nested_release_vmcs12(struct vcpu_vmx *vmx)
 {
 	u32 exec_control;

+	if (vmx->nested.current_vmptr == -1ull)
+		return;
+
+	/* current_vmptr and current_vmcs12 are always set/reset together */
+	if (WARN_ON(vmx->nested.current_vmcs12 == NULL))
+		return;
+
 	if (enable_shadow_vmcs) {
-		if (vmx->nested.current_vmcs12 != NULL) {
-			/* copy to memory all shadowed fields in case
-			   they were modified */
-			copy_shadow_to_vmcs12(vmx);
-			vmx->nested.sync_shadow_vmcs = false;
-			exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
-			exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
-			vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
-			vmcs_write64(VMCS_LINK_POINTER, -1ull);
-		}
+		/* copy to memory all shadowed fields in case
+		   they were modified */
+		copy_shadow_to_vmcs12(vmx);
+		vmx->nested.sync_shadow_vmcs = false;
+		exec_control = vmcs_read32(SECONDARY_VM_EXEC_CONTROL);
+		exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
+		vmcs_write32(SECONDARY_VM_EXEC_CONTROL, exec_control);
+		vmcs_write64(VMCS_LINK_POINTER, -1ull);
 	}
 	kunmap(vmx->nested.current_vmcs12_page);
 	nested_release_page(vmx->nested.current_vmcs12_page);
+	vmx->nested.current_vmptr = -1ull;
+	vmx->nested.current_vmcs12 = NULL;
 }

 /*
@@ -6133,12 +6140,9 @@ static void free_nested(struct vcpu_vmx *vmx)
 {
 	if (!vmx->nested.vmxon)
 		return;
+
 	vmx->nested.vmxon = false;

-	if (vmx->nested.current_vmptr != -1ull) {
-		nested_release_vmcs12(vmx);
-		vmx->nested.current_vmptr = -1ull;
-		vmx->nested.current_vmcs12 = NULL;
-	}
+	nested_release_vmcs12(vmx);
 	if (enable_shadow_vmcs)
 		free_vmcs(vmx->nested.current_shadow_vmcs);
 	/* Unpin physical memory we referred to in current vmcs02 */
@@ -6175,11 +6179,8 @@ static int handle_vmclear(struct kvm_vcpu *vcpu)
 	if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMCLEAR, &vmptr))
 		return 1;

-	if (vmptr == vmx->nested.current_vmptr) {
+	if (vmptr == vmx->nested.current_vmptr)
 		nested_release_vmcs12(vmx);
-		vmx->nested.current_vmptr = -1ull;
-		vmx->nested.current_vmcs12 = NULL;
-	}

 	page = nested_get_page(vcpu, vmptr);
 	if (page == NULL) {
@@ -6521,9 +6522,8 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu)
 			skip_emulated_instruction(vcpu);
 			return 1;
 		}
-		if (vmx->nested.current_vmptr != -1ull)
-			nested_release_vmcs12(vmx);

+		nested_release_vmcs12(vmx);
 		vmx->nested.current_vmptr = vmptr;
 		vmx->nested.current_vmcs12 = new_vmcs12;
 		vmx->nested.current_vmcs12_page = page;
--
1.8.3.1
[PATCH 0/2] KVM: nVMX: fix lifetime issues for vmcs02 and shadow VMCS
I think that commit 26a865f4aa8e ("KVM: VMX: fix use after free of
vmx->loaded_vmcs", 2014-01-03) was wrong, as it introduced a use of a dangling
vmcs02. The first patch introduces what I think is the right fix, while the
second patch strengthens the invariants around nested_release_vmcs12.

Paolo Bonzini (2):
  KVM: nVMX: fix lifetime issues for vmcs02
  KVM: nVMX: clean up nested_release_vmcs12 and code around it

 arch/x86/kvm/vmx.c | 91 --
 1 file changed, 54 insertions(+), 37 deletions(-)

--
1.8.3.1
[PATCH 1/2] KVM: nVMX: fix lifetime issues for vmcs02
free_nested needs the loaded_vmcs to be valid if it is a vmcs02, in order to
detach it from the shadow vmcs. However, this is not available anymore after
commit 26a865f4aa8e ("KVM: VMX: fix use after free of vmx->loaded_vmcs",
2014-01-03). Revert that patch, and fix its problem by forcing a vmcs01 as the
active VMCS before freeing all the nested VMX state.

Reported-by: Wanpeng Li wanpeng...@linux.intel.com
Tested-by: Wanpeng Li wanpeng...@linux.intel.com
Signed-off-by: Paolo Bonzini pbonz...@redhat.com
---
 arch/x86/kvm/vmx.c | 49 +
 1 file changed, 33 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7534a9f67cc8..462334eaa3c0 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5772,22 +5772,27 @@ static void nested_free_vmcs02(struct vcpu_vmx *vmx, gpa_t vmptr)

 /*
  * Free all VMCSs saved for this vcpu, except the one pointed by
- * vmx->loaded_vmcs. These include the VMCSs in vmcs02_pool (except the one
- * currently used, if running L2), and vmcs01 when running L2.
+ * vmx->loaded_vmcs. We must be running L1, so vmx->loaded_vmcs
+ * must be &vmx->vmcs01.
  */
 static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
 {
 	struct vmcs02_list *item, *n;
+
+	WARN_ON(vmx->loaded_vmcs != &vmx->vmcs01);
 	list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
-		if (vmx->loaded_vmcs != &item->vmcs02)
-			free_loaded_vmcs(&item->vmcs02);
+		/*
+		 * Something will leak if the above WARN triggers.  Better than
+		 * a use-after-free.
+		 */
+		if (vmx->loaded_vmcs == &item->vmcs02)
+			continue;
+
+		free_loaded_vmcs(&item->vmcs02);
 		list_del(&item->list);
 		kfree(item);
+		vmx->nested.vmcs02_num--;
 	}
-	vmx->nested.vmcs02_num = 0;
-
-	if (vmx->loaded_vmcs != &vmx->vmcs01)
-		free_loaded_vmcs(&vmx->vmcs01);
 }

 /*
@@ -7557,13 +7562,31 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	vmx_complete_interrupts(vmx);
 }

+static void vmx_load_vmcs01(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	int cpu;
+
+	if (vmx->loaded_vmcs == &vmx->vmcs01)
+		return;
+
+	cpu = get_cpu();
+	vmx->loaded_vmcs = &vmx->vmcs01;
+	vmx_vcpu_put(vcpu);
+	vmx_vcpu_load(vcpu, cpu);
+	vcpu->cpu = cpu;
+	put_cpu();
+}
+
 static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);

 	free_vpid(vmx);
-	free_loaded_vmcs(vmx->loaded_vmcs);
+	leave_guest_mode(vcpu);
+	vmx_load_vmcs01(vcpu);
 	free_nested(vmx);
+	free_loaded_vmcs(vmx->loaded_vmcs);
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vmx);
@@ -8721,7 +8744,6 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 			      unsigned long exit_qualification)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	int cpu;
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);

 	/* trying to cancel vmlaunch/vmresume is a bug */
@@ -8746,12 +8768,7 @@ static void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 exit_reason,
 		       vmcs12->vm_exit_intr_error_code,
 		       KVM_ISA_VMX);

-	cpu = get_cpu();
-	vmx->loaded_vmcs = &vmx->vmcs01;
-	vmx_vcpu_put(vcpu);
-	vmx_vcpu_load(vcpu, cpu);
-	vcpu->cpu = cpu;
-	put_cpu();
+	vmx_load_vmcs01(vcpu);

 	vm_entry_controls_init(vmx, vmcs_read32(VM_ENTRY_CONTROLS));
 	vm_exit_controls_init(vmx, vmcs_read32(VM_EXIT_CONTROLS));
--
1.8.3.1
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Wed, Jul 09, 2014 at 04:12:53PM -0300, mtosa...@redhat.com wrote:
Reload remote vcpus' MMU from the GET_DIRTY_LOG codepath, before deleting a
pinned spte.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

---
 arch/x86/kvm/mmu.c | 29 +++-
 1 file changed, 23 insertions(+), 6 deletions(-)

Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c
===
--- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c	2014-07-09 11:23:59.290744490 -0300
+++ kvm.pinned-sptes/arch/x86/kvm/mmu.c	2014-07-09 11:24:58.449632435 -0300
@@ -1208,7 +1208,8 @@
  *
  * Return true if tlb need be flushed.
  */
-static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect)
+static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect,
+			       bool skip_pinned)
 {
 	u64 spte = *sptep;

@@ -1218,6 +1219,22 @@
 	rmap_printk("rmap_write_protect: spte %p %llx\n", sptep, *sptep);

+	if (is_pinned_spte(spte)) {
+		/* keep pinned spte intact, mark page dirty again */
+		if (skip_pinned) {
+			struct kvm_mmu_page *sp;
+			gfn_t gfn;
+
+			sp = page_header(__pa(sptep));
+			gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
+
+			mark_page_dirty(kvm, gfn);
+			return false;

Why not mark all pinned gfns as dirty in kvm_vm_ioctl_get_dirty_log() while
populating dirty_bitmap_buffer?

+		} else
+			mmu_reload_pinned_vcpus(kvm);

Can you explain why you need this?

+	}
+
 	if (pt_protect)
 		spte &= ~SPTE_MMU_WRITEABLE;
 	spte = spte & ~PT_WRITABLE_MASK;

@@ -1226,7 +1243,7 @@
 }

 static bool __rmap_write_protect(struct kvm *kvm, unsigned long *rmapp,
-				 bool pt_protect)
+				 bool pt_protect, bool skip_pinned)
 {
 	u64 *sptep;
 	struct rmap_iterator iter;
@@ -1235,7 +1252,7 @@
 	for (sptep = rmap_get_first(*rmapp, &iter); sptep;) {
 		BUG_ON(!(*sptep & PT_PRESENT_MASK));
-		flush |= spte_write_protect(kvm, sptep, pt_protect);
+		flush |= spte_write_protect(kvm, sptep, pt_protect, skip_pinned);
 		sptep = rmap_get_next(&iter);
 	}

@@ -1261,7 +1278,7 @@
 	while (mask) {
 		rmapp = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
 				      PT_PAGE_TABLE_LEVEL, slot);
-		__rmap_write_protect(kvm, rmapp, false);
+		__rmap_write_protect(kvm, rmapp, false, true);

 		/* clear the first set bit */
 		mask = mask - 1;
@@ -1280,7 +1297,7 @@
 	for (i = PT_PAGE_TABLE_LEVEL;
 	     i < PT_PAGE_TABLE_LEVEL + KVM_NR_PAGE_SIZES; ++i) {
 		rmapp = __gfn_to_rmap(gfn, i, slot);
-		write_protected |= __rmap_write_protect(kvm, rmapp, true);
+		write_protected |= __rmap_write_protect(kvm, rmapp, true, false);
 	}

 	return write_protected;
@@ -4565,7 +4582,7 @@
 	for (index = 0; index <= last_index; ++index, ++rmapp) {
 		if (*rmapp)
-			__rmap_write_protect(kvm, rmapp, false);
+			__rmap_write_protect(kvm, rmapp, false, false);

 		if (need_resched() || spin_needbreak(&kvm->mmu_lock))
 			cond_resched_lock(&kvm->mmu_lock);

--
	Gleb.
RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
-----Original Message-----
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:21 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

On 30.06.14 17:34, Mihai Caraman wrote:
Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec,
which share the same interrupt numbers.

Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
---
v2:
 - remove outdated definitions

 arch/powerpc/include/asm/kvm_asm.h    |  8 
 arch/powerpc/kvm/booke.c              | 17 +
 arch/powerpc/kvm/booke.h              |  4 ++--
 arch/powerpc/kvm/booke_interrupts.S   |  9 +
 arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
 arch/powerpc/kvm/e500.c               | 10 ++
 arch/powerpc/kvm/e500_emulate.c       | 10 ++
 7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
 /* E500 */
 #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines
- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-	BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

I think I'd prefer to keep them separate.

 #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
 #define BOOKE_INTERRUPT_DOORBELL 36
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ab62109..3c86d9b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu,
 	case BOOKE_IRQPRIO_ITLB_MISS:
 	case BOOKE_IRQPRIO_SYSCALL:
 	case BOOKE_IRQPRIO_FP_UNAVAIL:
-	case BOOKE_IRQPRIO_SPE_UNAVAIL:
-	case BOOKE_IRQPRIO_SPE_FP_DATA:
+	case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
+	case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:

#ifdef CONFIG_KVM_E500V2
    case ...SPE:
#else
    case ..ALTIVEC:
#endif

 	case BOOKE_IRQPRIO_SPE_FP_ROUND:
 	case BOOKE_IRQPRIO_AP_UNAVAIL:
 		allowed = 1;
@@ -977,18 +977,19 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		break;

 #ifdef CONFIG_SPE
-	case BOOKE_INTERRUPT_SPE_UNAVAIL: {
+	case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: {
 		if (vcpu->arch.shared->msr & MSR_SPE)
 			kvmppc_vcpu_enable_spe(vcpu);
 		else
 			kvmppc_booke_queue_irqprio(vcpu,
-						   BOOKE_IRQPRIO_SPE_UNAVAIL);
+					BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL);
 		r = RESUME_GUEST;
 		break;
 	}

-	case BOOKE_INTERRUPT_SPE_FP_DATA:
-		kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
+	case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
+		kvmppc_booke_queue_irqprio(vcpu,
+				BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
 		r = RESUME_GUEST;
 		break;

@@ -997,7 +998,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 		r = RESUME_GUEST;
 		break;
 #else
-	case BOOKE_INTERRUPT_SPE_UNAVAIL:
+	case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL:
 		/*
 		 * Guest wants SPE, but host kernel doesn't support it. Send
 		 * an unimplemented operation program check to the guest.
 		 */
@@ -1010,7 +1011,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,
 	 * These really should never happen without CONFIG_SPE,
 	 * as we should never enable the real MSR[SPE] in the guest.
 	 */
-	case BOOKE_INTERRUPT_SPE_FP_DATA:
+	case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
 	case BOOKE_INTERRUPT_SPE_FP_ROUND:
 		printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n",
 		       __func__, exit_nr, vcpu->arch.pc);
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index b632cd3..f182b32 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -32,8 +32,8 @@
 #define BOOKE_IRQPRIO_ALIGNMENT 2
 #define BOOKE_IRQPRIO_PROGRAM 3
 #define BOOKE_IRQPRIO_FP_UNAVAIL 4
-#define BOOKE_IRQPRIO_SPE_UNAVAIL 5
-#define BOOKE_IRQPRIO_SPE_FP_DATA 6
+#define
[PATCH] vhost: Add polling mode
Hello All, When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, which can be enabled via a kernel module parameter, poll_start_rate. When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue. We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter. This polling approach makes a lot of sense for new HW with posted-interrupts for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part. When systems are overloaded, there won't be enough cpu time for the various vhost threads to poll their guests' devices. For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. 
Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html Comments are welcome, Thank you, Razya From: Razya Ladelsky ra...@il.ibm.com Add an optional polling mode to continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. Signed-off-by: Razya Ladelsky ra...@il.ibm.com --- drivers/vhost/net.c |6 +- drivers/vhost/scsi.c |5 +- drivers/vhost/vhost.c | 247 +++-- drivers/vhost/vhost.h | 37 +++- 4 files changed, 277 insertions(+), 18 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 971a760..558aecb 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f) } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX); - vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); - vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev); + vhost_poll_init(n->poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, + vqs[VHOST_NET_VQ_TX]); + vhost_poll_init(n->poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, + vqs[VHOST_NET_VQ_RX]); f->private_data = n; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 4f4ffa4..56f0233 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) if (!vqs) goto err_vqs; - vhost_work_init(&vs->vs_completion_work, vhost_scsi_complete_cmd_work); - vhost_work_init(&vs->vs_event_work, tcm_vhost_evt_work); - + vhost_work_init(&vs->vs_completion_work, NULL, vhost_scsi_complete_cmd_work); + vhost_work_init(&vs->vs_event_work, NULL, tcm_vhost_evt_work); vs->vs_events_nr = 0; vs->vs_events_missed = false; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c90f437..678d766 100644 --- a/drivers/vhost/vhost.c +++ 
b/drivers/vhost/vhost.c @@ -24,9 +24,17 @@ #include <linux/slab.h> #include <linux/kthread.h> #include <linux/cgroup.h> +#include <linux/jiffies.h> #include <linux/module.h> #include "vhost.h" +static int poll_start_rate = 0; +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(poll_start_rate, "Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling."); + +static int poll_stop_idle = 3*HZ; /* 3 seconds */ +module_param(poll_stop_idle, int, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(poll_stop_idle, "Stop continuous polling of virtqueue after this many jiffies of no work."); enum { VHOST_MEMORY_MAX_NREGIONS = 64, @@ -58,27 +66,27 @@ static int vhost_poll_wakeup(wait_queue_t *wait, unsigned mode, int sync, return 0; }
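The start/stop heuristic in this patch — begin polling when the event rate reaches poll_start_rate per jiffy, stop after poll_stop_idle jiffies without new work — can be restated as a small userspace sketch. The struct and helper names below are illustrative, not the patch's actual API:

```c
#include <assert.h>
#include <stdbool.h>

#define HZ 100                      /* illustrative jiffy rate */

static int poll_start_rate = 1;     /* events per jiffy needed to start polling; 0 = never */
static int poll_stop_idle = 3 * HZ; /* stop after this many idle jiffies (3 seconds) */

struct vq_poll_state {
    bool polling;                    /* is this virtqueue currently polled? */
    unsigned long last_work_jiffies; /* last jiffy at which new buffers were found */
};

/* Start polling only if the recent kick rate justifies it. */
static bool should_start_polling(int events, int elapsed_jiffies)
{
    if (poll_start_rate == 0 || elapsed_jiffies == 0)
        return false;
    return events / elapsed_jiffies >= poll_start_rate;
}

/* Stop polling once the queue has been idle longer than poll_stop_idle. */
static bool should_stop_polling(const struct vq_poll_state *s,
                                unsigned long now_jiffies)
{
    return s->polling &&
           now_jiffies - s->last_work_jiffies > (unsigned long)poll_stop_idle;
}
```

With the defaults above, a queue producing two events per jiffy enters polling mode, and a polled queue idle for more than 300 jiffies falls back to kick-based notification.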
Re: [PATCH v2 1/2] contrib: add ivshmem client and server
On 07/20/2014 03:38 AM, David Marchand wrote: When using ivshmem devices, notifications between guests can be sent as interrupts using a ivshmem-server (typical use described in documentation). The client is provided as a debug tool. Signed-off-by: Olivier Matz olivier.m...@6wind.com Signed-off-by: David Marchand david.march...@6wind.com --- contrib/ivshmem-client/Makefile | 26 ++ +++ b/contrib/ivshmem-client/Makefile @@ -0,0 +1,26 @@ +# Copyright 2014 6WIND S.A. +# All rights reserved This file has no other license, and is therefore incompatible with GPLv2. You'll need to resubmit under an appropriately open license. +++ b/contrib/ivshmem-client/ivshmem-client.h @@ -0,0 +1,238 @@ +/* + * Copyright(c) 2014 6WIND S.A. + * All rights reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. I'm not a lawyer, but to me, this license is self-contradictory. You can't have "All rights reserved" and still be GPL, because the point of the GPL is that you are NOT reserving all rights, but explicitly granting your user various rights (on condition that they likewise grant those rights to others). But you're not the only file in the qemu code base with this questionable mix. -- Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org
Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server
On Mon, Jul 21, 2014 at 08:21:21AM -0600, Eric Blake wrote: On 07/20/2014 03:38 AM, David Marchand wrote: When using ivshmem devices, notifications between guests can be sent as interrupts using a ivshmem-server (typical use described in documentation). The client is provided as a debug tool. Signed-off-by: Olivier Matz olivier.m...@6wind.com Signed-off-by: David Marchand david.march...@6wind.com --- contrib/ivshmem-client/Makefile | 26 ++ +++ b/contrib/ivshmem-client/Makefile @@ -0,0 +1,26 @@ +# Copyright 2014 6WIND S.A. +# All rights reserved This file has no other license, and is therefore incompatible with GPLv2. You'll need to resubmit under an appropriately open license. +++ b/contrib/ivshmem-client/ivshmem-client.h @@ -0,0 +1,238 @@ +/* + * Copyright(c) 2014 6WIND S.A. + * All rights reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. I'm not a lawyer, but to me, this license is self-contradictory. You can't have "All rights reserved" and still be GPL, because the point of the GPL is that you are NOT reserving all rights, but explicitly granting your user various rights (on condition that they likewise grant those rights to others). But you're not the only file in the qemu code base with this questionable mix. In any case, adding the term 'All rights reserved' is said to be redundant and obsolete these days: https://en.wikipedia.org/wiki/All_rights_reserved#Obsolescence Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests] x86: Check DR6.RTM is writable
Il 15/07/2014 16:41, Nadav Amit ha scritto: A recently discovered bug shows DR6.RTM is fixed to one. The bug is only apparent when the host emulates the MOV-DR instruction or when the host debugs the guest kernel. This patch tests whether DR6.RTM is indeed accessible according to RTM support as reported by cpuid. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- x86/emulator.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/x86/emulator.c b/x86/emulator.c index 1fd0ca6..f68882f 100644 --- a/x86/emulator.c +++ b/x86/emulator.c @@ -3,6 +3,7 @@ #include "libcflat.h" #include "desc.h" #include "types.h" +#include "processor.h" #define memset __builtin_memset #define TESTDEV_IO_PORT 0xe0 @@ -870,6 +871,19 @@ static void test_nop(uint64_t *mem, uint8_t *insn_page, report("nop", outregs.rax == inregs.rax); } +static void test_mov_dr(uint64_t *mem, uint8_t *insn_page, + uint8_t *alt_insn_page, void *insn_ram) +{ + bool rtm_support = cpuid(7).b & (1 << 11); + unsigned long dr6_fixed_1 = rtm_support ? 0xfffe0ff0ul : 0xffff0ff0ul; + inregs = (struct regs){ .rax = 0 }; + MK_INSN(mov_to_dr6, "movq %rax, %dr6\n\t"); + trap_emulator(mem, alt_insn_page, &insn_mov_to_dr6); + MK_INSN(mov_from_dr6, "movq %dr6, %rax\n\t"); + trap_emulator(mem, alt_insn_page, &insn_mov_from_dr6); + report("mov_dr6", outregs.rax == dr6_fixed_1); +} + static void test_crosspage_mmio(volatile uint8_t *mem) { volatile uint16_t w, *pw; @@ -1072,6 +1086,7 @@ int main() test_movabs(mem, insn_page, alt_insn_page, insn_ram); test_smsw_reg(mem, insn_page, alt_insn_page, insn_ram); test_nop(mem, insn_page, alt_insn_page, insn_ram); + test_mov_dr(mem, insn_page, alt_insn_page, insn_ram); test_crosspage_mmio(mem); test_string_io_mmio(mem); Thanks, applying. Paolo
Re: [PATCH] KVM: x86: DR6/7.RTM cannot be written
Il 15/07/2014 16:37, Nadav Amit ha scritto: Haswell and newer Intel CPUs have support for RTM, and in that case DR6.RTM is not fixed to 1 and DR7.RTM is not fixed to zero. That is not the case in the current KVM implementation. This bug is apparent only if the MOV-DR instruction is emulated or the host also debugs the guest. This patch is a partial fix which enables DR6.RTM and DR7.RTM to be cleared and set respectively. It also sets DR6.RTM upon every debug exception. Obviously, it is not a complete fix, as debugging of RTM is still unsupported. Signed-off-by: Nadav Amit na...@cs.technion.ac.il --- arch/x86/include/asm/kvm_host.h | 8 +--- arch/x86/kvm/cpuid.h | 8 arch/x86/kvm/vmx.c | 4 ++-- arch/x86/kvm/x86.c | 22 -- 4 files changed, 31 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b8a4480..a84eaf7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -152,14 +152,16 @@ enum { #define DR6_BD (1 << 13) #define DR6_BS (1 << 14) -#define DR6_FIXED_1 0xffff0ff0 -#define DR6_VOLATILE 0x0000e00f +#define DR6_RTM (1 << 16) +#define DR6_FIXED_1 0xfffe0ff0 +#define DR6_INIT 0xffff0ff0 +#define DR6_VOLATILE 0x0001e00f #define DR7_BP_EN_MASK 0x000000ff #define DR7_GE (1 << 9) #define DR7_GD (1 << 13) #define DR7_FIXED_1 0x00000400 -#define DR7_VOLATILE 0xffff23ff +#define DR7_VOLATILE 0xffff2bff /* apic attention bits */ #define KVM_APIC_CHECK_VAPIC 0 diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h index f908731..a538059 100644 --- a/arch/x86/kvm/cpuid.h +++ b/arch/x86/kvm/cpuid.h @@ -95,4 +95,12 @@ static inline bool guest_cpuid_has_gbpages(struct kvm_vcpu *vcpu) best = kvm_find_cpuid_entry(vcpu, 0x80000001, 0); return best && (best->edx & bit(X86_FEATURE_GBPAGES)); } + +static inline bool guest_cpuid_has_rtm(struct kvm_vcpu *vcpu) +{ + struct kvm_cpuid_entry2 *best; + + best = kvm_find_cpuid_entry(vcpu, 7, 0); + return best && (best->ebx & bit(X86_FEATURE_RTM)); +} #endif diff --git a/arch/x86/kvm/vmx.c 
b/arch/x86/kvm/vmx.c index 0c9569b..1fd3598 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -4892,7 +4892,7 @@ static int handle_exception(struct kvm_vcpu *vcpu) if (!(vcpu->guest_debug & (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP))) { vcpu->arch.dr6 &= ~15; - vcpu->arch.dr6 |= dr6; + vcpu->arch.dr6 |= dr6 | DR6_RTM; if (!(dr6 & ~DR6_RESERVED)) /* icebp */ skip_emulated_instruction(vcpu); @@ -5151,7 +5151,7 @@ static int handle_dr(struct kvm_vcpu *vcpu) return 0; } else { vcpu->arch.dr7 &= ~DR7_GD; - vcpu->arch.dr6 |= DR6_BD; + vcpu->arch.dr6 |= DR6_BD | DR6_RTM; vmcs_writel(GUEST_DR7, vcpu->arch.dr7); kvm_queue_exception(vcpu, DB_VECTOR); return 1; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index f750b69..fae064f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -759,6 +759,15 @@ static void kvm_update_dr7(struct kvm_vcpu *vcpu) vcpu->arch.switch_db_regs |= KVM_DEBUGREG_BP_ENABLED; } +static u64 kvm_dr6_fixed(struct kvm_vcpu *vcpu) +{ + u64 fixed = DR6_FIXED_1; + + if (!guest_cpuid_has_rtm(vcpu)) + fixed |= DR6_RTM; + return fixed; +} + static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) { switch (dr) { @@ -774,7 +783,7 @@ static int __kvm_set_dr(struct kvm_vcpu *vcpu, int dr, unsigned long val) case 6: if (val & 0xffffffff00000000ULL) return -1; /* #GP */ - vcpu->arch.dr6 = (val & DR6_VOLATILE) | DR6_FIXED_1; + vcpu->arch.dr6 = (val & DR6_VOLATILE) | kvm_dr6_fixed(vcpu); kvm_update_dr6(vcpu); break; case 5: @@ -5115,7 +5124,8 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu, unsigned long rflag */ if (unlikely(rflags & X86_EFLAGS_TF)) { if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP) { - kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1; + kvm_run->debug.arch.dr6 = DR6_BS | DR6_FIXED_1 | + DR6_RTM; kvm_run->debug.arch.pc = vcpu->arch.singlestep_rip; kvm_run->debug.arch.exception = DB_VECTOR; kvm_run->exit_reason = KVM_EXIT_DEBUG; @@ -5128,7 +5138,7 @@ static void kvm_vcpu_check_singlestep(struct kvm_vcpu *vcpu,
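The DR6 write path in __kvm_set_dr above keeps only the volatile bits of the caller's value and ORs in the fixed-1 bits, with DR6.RTM fixed to 1 exactly when the guest lacks RTM. A minimal userspace restatement of that masking (constants copied from the patch; function names illustrative, not KVM's API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DR6_RTM      (1ull << 16)
#define DR6_FIXED_1  0xfffe0ff0ull   /* bits 4-11 always read as 1 */
#define DR6_VOLATILE 0x0001e00full   /* guest-writable bits, now including RTM */

/* Mirror of kvm_dr6_fixed(): without RTM, DR6.RTM also reads as 1. */
static uint64_t dr6_fixed(bool guest_has_rtm)
{
    uint64_t fixed = DR6_FIXED_1;

    if (!guest_has_rtm)
        fixed |= DR6_RTM;
    return fixed;
}

/* Mirror of the DR6 case in __kvm_set_dr(): keep writable bits only. */
static uint64_t set_dr6(uint64_t val, bool guest_has_rtm)
{
    return (val & DR6_VOLATILE) | dr6_fixed(guest_has_rtm);
}
```

Writing zero thus reads back as 0xfffe0ff0 on an RTM-capable guest and 0xffff0ff0 otherwise, which is exactly what the kvm-unit-tests patch in this thread checks for.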
Re: [PATCH v2 1/2] contrib: add ivshmem client and server
Hello Eric, On 07/21/2014 04:21 PM, Eric Blake wrote: On 07/20/2014 03:38 AM, David Marchand wrote: +# Copyright 2014 6WIND S.A. +# All rights reserved This file has no other license, and is therefore incompatible with GPLv2. You'll need to resubmit under an appropriately open license. missed the makefiles ... +++ b/contrib/ivshmem-client/ivshmem-client.h @@ -0,0 +1,238 @@ +/* + * Copyright(c) 2014 6WIND S.A. + * All rights reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. I'm not a lawyer, but to me, this license is self-contradictory. You can't have All rights reserved and still be GPL, because the point of the GPL is that you are NOT reserving all rights, but explicitly granting your user various rights (on condition that they likewise grant those rights to others). But you're not the only file in the qemu code base with this questionable mix. Hum, ok, will update. -- David Marchand
Re: [Qemu-devel] [PATCH v2 1/2] contrib: add ivshmem client and server
David Marchand david.march...@6wind.com writes: When using ivshmem devices, notifications between guests can be sent as interrupts using a ivshmem-server (typical use described in documentation). The client is provided as a debug tool. [...] diff --git a/contrib/ivshmem-client/ivshmem-client.c b/contrib/ivshmem-client/ivshmem-client.c new file mode 100644 index 000..32ef3ef --- /dev/null +++ b/contrib/ivshmem-client/ivshmem-client.c @@ -0,0 +1,418 @@ +/* + * Copyright(c) 2014 6WIND S.A. + * All rights reserved. + * + * This work is licensed under the terms of the GNU GPL, version 2. See + * the COPYING file in the top-level directory. + */ Do you have a compelling reason why you can't license under GPLv2+? If yes, please explain it to us. If no, please use * This work is licensed under the terms of the GNU GPL, version 2 or * later. See the COPYING file in the top-level directory. [...]
[GIT PULL 9/9] KVM: s390: add ipte to trace event decoding
IPTE intercept can happen, let's decode that. Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com --- arch/s390/include/uapi/asm/sie.h | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/s390/include/uapi/asm/sie.h b/arch/s390/include/uapi/asm/sie.h index 5d9cc19..d4096fd 100644 --- a/arch/s390/include/uapi/asm/sie.h +++ b/arch/s390/include/uapi/asm/sie.h @@ -108,6 +108,7 @@ exit_code_ipa0(0xB2, 0x17, "STETR"),\ exit_code_ipa0(0xB2, 0x18, "PC"), \ exit_code_ipa0(0xB2, 0x20, "SERVC"),\ + exit_code_ipa0(0xB2, 0x21, "IPTE"), \ exit_code_ipa0(0xB2, 0x28, "PT"), \ exit_code_ipa0(0xB2, 0x29, "ISKE"), \ exit_code_ipa0(0xB2, 0x2a, "RRBE"), \ -- 1.8.4.2
[GIT PULL 7/9] KVM: s390: document KVM_CAP_S390_IRQCHIP
From: Cornelia Huck cornelia.h...@de.ibm.com Let's document that this is a capability that may be enabled per-vm. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com --- Documentation/virtual/kvm/api.txt | 9 + 1 file changed, 9 insertions(+) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 7ab41e9..f1979c7 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -3023,3 +3023,12 @@ Parameters: args[0] is the XICS device fd args[1] is the XICS CPU number (server ID) for this vcpu This capability connects the vcpu to an in-kernel XICS device. + +6.8 KVM_CAP_S390_IRQCHIP + +Architectures: s390 +Target: vm +Parameters: none + +This capability enables the in-kernel irqchip for s390. Please refer to +4.24 KVM_CREATE_IRQCHIP for details. -- 1.8.4.2
[GIT PULL 6/9] KVM: document target of capability enablement
From: Cornelia Huck cornelia.h...@de.ibm.com Capabilities can be enabled on a vcpu or (since recently) on a vm. Document this and note for the existing capabilites whether they are per-vcpu or per-vm. Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: David Hildenbrand d...@linux.vnet.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com --- Documentation/virtual/kvm/api.txt | 18 +++--- 1 file changed, 15 insertions(+), 3 deletions(-) diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index a41465b..7ab41e9 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -2875,15 +2875,18 @@ The fields in each entry are defined as follows: 6. Capabilities that can be enabled --- -There are certain capabilities that change the behavior of the virtual CPU when -enabled. To enable them, please see section 4.37. Below you can find a list of -capabilities and what their effect on the vCPU is when enabling them. +There are certain capabilities that change the behavior of the virtual CPU or +the virtual machine when enabled. To enable them, please see section 4.37. +Below you can find a list of capabilities and what their effect on the vCPU or +the virtual machine is when enabling them. The following information is provided along with the description: Architectures: which instruction set architectures provide this ioctl. x86 includes both i386 and x86_64. + Target: whether this is a per-vcpu or per-vm capability. + Parameters: what parameters are accepted by the capability. Returns: the return value. General error numbers (EBADF, ENOMEM, EINVAL) @@ -2893,6 +2896,7 @@ The following information is provided along with the description: 6.1 KVM_CAP_PPC_OSI Architectures: ppc +Target: vcpu Parameters: none Returns: 0 on success; -1 on error @@ -2907,6 +2911,7 @@ When this capability is enabled, KVM_EXIT_OSI can occur. 
6.2 KVM_CAP_PPC_PAPR Architectures: ppc +Target: vcpu Parameters: none Returns: 0 on success; -1 on error @@ -2926,6 +2931,7 @@ When this capability is enabled, KVM_EXIT_PAPR_HCALL can occur. 6.3 KVM_CAP_SW_TLB Architectures: ppc +Target: vcpu Parameters: args[0] is the address of a struct kvm_config_tlb Returns: 0 on success; -1 on error @@ -2968,6 +2974,7 @@ For mmu types KVM_MMU_FSL_BOOKE_NOHV and KVM_MMU_FSL_BOOKE_HV: 6.4 KVM_CAP_S390_CSS_SUPPORT Architectures: s390 +Target: vcpu Parameters: none Returns: 0 on success; -1 on error @@ -2979,9 +2986,13 @@ handled in-kernel, while the other I/O instructions are passed to userspace. When this capability is enabled, KVM_EXIT_S390_TSCH will occur on TEST SUBCHANNEL intercepts. +Note that even though this capability is enabled per-vcpu, the complete +virtual machine is affected. + 6.5 KVM_CAP_PPC_EPR Architectures: ppc +Target: vcpu Parameters: args[0] defines whether the proxy facility is active Returns: 0 on success; -1 on error @@ -3007,6 +3018,7 @@ This capability connects the vcpu to an in-kernel MPIC device. 6.7 KVM_CAP_IRQ_XICS Architectures: ppc +Target: vcpu Parameters: args[0] is the XICS device fd args[1] is the XICS CPU number (server ID) for this vcpu -- 1.8.4.2
[GIT PULL 1/9] KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block
From: David Hildenbrand d...@linux.vnet.ibm.com This patch cleans up the code in handle_wait by reusing the common code function kvm_vcpu_block. signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() are sufficient for checking if we need to wake up that VCPU. kvm_vcpu_block uses these functions, so no checks are lost. The flag timer_due can be removed - kvm_cpu_has_pending_timer() tests whether the timer is pending, thus the vcpu is correctly woken up. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Acked-by: Christian Borntraeger borntrae...@de.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com --- arch/s390/include/asm/kvm_host.h | 1 - arch/s390/kvm/interrupt.c | 41 +--- arch/s390/kvm/kvm-s390.c | 3 +++ 3 files changed, 8 insertions(+), 37 deletions(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index c2ba020..b3acf28 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -305,7 +305,6 @@ struct kvm_s390_local_interrupt { struct list_head list; atomic_t active; struct kvm_s390_float_interrupt *float_int; - int timer_due; /* event indicator for waitqueue below */ wait_queue_head_t *wq; atomic_t *cpuflags; unsigned int action_bits; diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 90c8de2..5fd11ce 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -585,60 +585,32 @@ int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) { u64 now, sltime; - DECLARE_WAITQUEUE(wait, current); vcpu->stat.exit_wait_state++; - if (kvm_cpu_has_interrupt(vcpu)) - return 0; - __set_cpu_idle(vcpu); - spin_lock_bh(&vcpu->arch.local_int.lock); - vcpu->arch.local_int.timer_due = 0; - spin_unlock_bh(&vcpu->arch.local_int.lock); + /* fast path */ + if (kvm_cpu_has_pending_timer(vcpu) || kvm_arch_vcpu_runnable(vcpu)) + return 0; if 
(psw_interrupts_disabled(vcpu)) { VCPU_EVENT(vcpu, 3, "%s", "disabled wait"); - __unset_cpu_idle(vcpu); return -EOPNOTSUPP; /* disabled wait */ } + __set_cpu_idle(vcpu); if (!ckc_interrupts_enabled(vcpu)) { VCPU_EVENT(vcpu, 3, "%s", "enabled wait w/o timer"); goto no_timer; } now = get_tod_clock_fast() + vcpu->arch.sie_block->epoch; - if (vcpu->arch.sie_block->ckc < now) { - __unset_cpu_idle(vcpu); - return 0; - } - sltime = tod_to_ns(vcpu->arch.sie_block->ckc - now); - hrtimer_start(&vcpu->arch.ckc_timer, ktime_set (0, sltime) , HRTIMER_MODE_REL); VCPU_EVENT(vcpu, 5, "enabled wait via clock comparator: %llx ns", sltime); no_timer: srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); - spin_lock(&vcpu->arch.local_int.float_int->lock); - spin_lock_bh(&vcpu->arch.local_int.lock); - add_wait_queue(&vcpu->wq, &wait); - while (list_empty(&vcpu->arch.local_int.list) && - list_empty(&vcpu->arch.local_int.float_int->list) && - (!vcpu->arch.local_int.timer_due) && - !signal_pending(current) && - !kvm_s390_si_ext_call_pending(vcpu)) { - set_current_state(TASK_INTERRUPTIBLE); - spin_unlock_bh(&vcpu->arch.local_int.lock); - spin_unlock(&vcpu->arch.local_int.float_int->lock); - schedule(); - spin_lock(&vcpu->arch.local_int.float_int->lock); - spin_lock_bh(&vcpu->arch.local_int.lock); - } + kvm_vcpu_block(vcpu); __unset_cpu_idle(vcpu); - __set_current_state(TASK_RUNNING); - remove_wait_queue(&vcpu->wq, &wait); - spin_unlock_bh(&vcpu->arch.local_int.lock); - spin_unlock(&vcpu->arch.local_int.float_int->lock); vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); hrtimer_try_to_cancel(&vcpu->arch.ckc_timer); @@ -649,11 +621,8 @@ void kvm_s390_tasklet(unsigned long parm) { struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm; - spin_lock(&vcpu->arch.local_int.lock); - vcpu->arch.local_int.timer_due = 1; if (waitqueue_active(&vcpu->wq)) wake_up_interruptible(&vcpu->wq); - spin_unlock(&vcpu->arch.local_int.lock); } /* diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index fdf88f7..ecb1357 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c 
@@ -1068,6 +1068,9 @@ retry: goto retry; } + /* nothing to do, just clear the request */ + clear_bit(KVM_REQ_UNHALT, &vcpu->requests); + return 0; } -- 1.8.4.2
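The commit message argues that signal_pending(), kvm_cpu_has_pending_timer() and kvm_arch_vcpu_runnable() together cover every wake-up source the old hand-rolled wait loop checked. That reasoning reduces to a single predicate, sketched here in userspace with illustrative names (not KVM's actual types):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the per-vcpu state kvm_vcpu_block consults. */
struct vcpu_state {
    bool timer_pending;   /* kvm_cpu_has_pending_timer() */
    bool runnable;        /* kvm_arch_vcpu_runnable(): deliverable interrupt etc. */
    bool signal_pending;  /* signal_pending(current) */
};

/* The vcpu may keep sleeping only while no wake-up source fires. */
static bool vcpu_should_block(const struct vcpu_state *s)
{
    return !s->timer_pending && !s->runnable && !s->signal_pending;
}
```

Because each condition the removed while-loop tested maps onto one of these three checks, dropping the timer_due flag loses nothing: a firing hrtimer makes timer_pending true, which ends the block just as setting timer_due used to.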
[GIT PULL 5/9] KVM: s390: remove the tasklet used by the hrtimer
From: David Hildenbrand d...@linux.vnet.ibm.com We can get rid of the tasklet used for waking up a VCPU in the hrtimer code and wake up the VCPU directly. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com --- arch/s390/include/asm/kvm_host.h | 1 - arch/s390/kvm/interrupt.c | 13 + arch/s390/kvm/kvm-s390.c | 2 -- arch/s390/kvm/kvm-s390.h | 1 - 4 files changed, 1 insertion(+), 16 deletions(-) diff --git a/arch/s390/include/asm/kvm_host.h b/arch/s390/include/asm/kvm_host.h index b3acf28..773bef7 100644 --- a/arch/s390/include/asm/kvm_host.h +++ b/arch/s390/include/asm/kvm_host.h @@ -366,7 +366,6 @@ struct kvm_vcpu_arch { s390_fp_regs guest_fpregs; struct kvm_s390_local_interrupt local_int; struct hrtimer ckc_timer; - struct tasklet_struct tasklet; struct kvm_s390_pgm_info pgm; union { struct cpuid cpu_id; diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 65396e1..1be3d8d 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -629,23 +629,12 @@ void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu) } } -void kvm_s390_tasklet(unsigned long parm) -{ - struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm; - kvm_s390_vcpu_wakeup(vcpu); -} - -/* - * low level hrtimer wake routine. Because this runs in hardirq context - * we schedule a tasklet to do the real work. 
- */ enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer) { struct kvm_vcpu *vcpu; vcpu = container_of(timer, struct kvm_vcpu, arch.ckc_timer); - vcpu->preempted = true; - tasklet_schedule(&vcpu->arch.tasklet); + kvm_s390_vcpu_wakeup(vcpu); return HRTIMER_NORESTART; } diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index b29a031..dd902e6 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -649,8 +649,6 @@ int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu) return rc; } hrtimer_init(&vcpu->arch.ckc_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS); - tasklet_init(&vcpu->arch.tasklet, kvm_s390_tasklet, - (unsigned long) vcpu); vcpu->arch.ckc_timer.function = kvm_s390_idle_wakeup; get_cpu_id(&vcpu->arch.cpu_id); vcpu->arch.cpu_id.version = 0xff; diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 665eacc..3862fa2 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -138,7 +138,6 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm) int kvm_s390_handle_wait(struct kvm_vcpu *vcpu); void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu); enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer); -void kvm_s390_tasklet(unsigned long parm); void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu); void kvm_s390_deliver_pending_machine_checks(struct kvm_vcpu *vcpu); void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu); -- 1.8.4.2
[GIT PULL 2/9] KVM: s390: remove _bh locking from local_int.lock
From: David Hildenbrand d...@linux.vnet.ibm.com local_int.lock is not used in a bottom-half handler anymore, therefore we can turn it into an ordinary spin_lock at all occurrences. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com --- arch/s390/kvm/interrupt.c | 32 arch/s390/kvm/kvm-s390.c | 4 ++-- arch/s390/kvm/sigp.c | 20 ++-- 3 files changed, 28 insertions(+), 28 deletions(-) diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 5fd11ce..86575b4 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -544,13 +544,13 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *vcpu) int rc = 0; if (atomic_read(&li->active)) { - spin_lock_bh(&li->lock); + spin_lock(&li->lock); list_for_each_entry(inti, &li->list, list) if (__interrupt_is_deliverable(vcpu, inti)) { rc = 1; break; } - spin_unlock_bh(&li->lock); + spin_unlock(&li->lock); } if ((!rc) && atomic_read(&fi->active)) { @@ -645,13 +645,13 @@ void kvm_s390_clear_local_irqs(struct kvm_vcpu *vcpu) struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int; struct kvm_s390_interrupt_info *n, *inti = NULL; - spin_lock_bh(&li->lock); + spin_lock(&li->lock); list_for_each_entry_safe(inti, n, &li->list, list) { list_del(&inti->list); kfree(inti); } atomic_set(&li->active, 0); - spin_unlock_bh(&li->lock); + spin_unlock(&li->lock); /* clear pending external calls set by sigp interpretation facility */ atomic_clear_mask(CPUSTAT_ECALL_PEND, &vcpu->arch.sie_block->cpuflags); @@ -670,7 +670,7 @@ void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu) if (atomic_read(&li->active)) { do { deliver = 0; - spin_lock_bh(&li->lock); + spin_lock(&li->lock); list_for_each_entry_safe(inti, n, &li->list, list) { if (__interrupt_is_deliverable(vcpu, inti)) { list_del(&inti->list); @@ -681,7 +681,7 @@ } if 
(list_empty(&li->list)) atomic_set(&li->active, 0); - spin_unlock_bh(&li->lock); + spin_unlock(&li->lock); if (deliver) { __do_deliver_interrupt(vcpu, inti); kfree(inti); @@ -727,7 +727,7 @@ void kvm_s390_deliver_pending_machine_checks(struct kvm_vcpu *vcpu) if (atomic_read(&li->active)) { do { deliver = 0; - spin_lock_bh(&li->lock); + spin_lock(&li->lock); list_for_each_entry_safe(inti, n, &li->list, list) { if ((inti->type == KVM_S390_MCHK) && __interrupt_is_deliverable(vcpu, inti)) { @@ -739,7 +739,7 @@ } if (list_empty(&li->list)) atomic_set(&li->active, 0); - spin_unlock_bh(&li->lock); + spin_unlock(&li->lock); if (deliver) { __do_deliver_interrupt(vcpu, inti); kfree(inti); @@ -786,11 +786,11 @@ int kvm_s390_inject_program_int(struct kvm_vcpu *vcpu, u16 code) VCPU_EVENT(vcpu, 3, "inject: program check %d (from kernel)", code); trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, inti->type, code, 0, 1); - spin_lock_bh(&li->lock); + spin_lock(&li->lock); list_add(&inti->list, &li->list); atomic_set(&li->active, 1); BUG_ON(waitqueue_active(li->wq)); - spin_unlock_bh(&li->lock); + spin_unlock(&li->lock); return 0; } @@ -811,11 +811,11 @@ int kvm_s390_inject_prog_irq(struct kvm_vcpu *vcpu, inti->type = KVM_S390_PROGRAM_INT; memcpy(&inti->pgm, pgm_info, sizeof(inti->pgm)); - spin_lock_bh(&li->lock); + spin_lock(&li->lock); list_add(&inti->list, &li->list); atomic_set(&li->active, 1); BUG_ON(waitqueue_active(li->wq)); - spin_unlock_bh(&li->lock); + spin_unlock(&li->lock); return 0; } @@ -903,12 +903,12 @@ static int __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti) } dst_vcpu = kvm_get_vcpu(kvm,
[GIT PULL 8/9] KVM: s390: advertise KVM_CAP_S390_IRQCHIP
From: Cornelia Huck cornelia.h...@de.ibm.com

We should advertise all capabilities, including those that can be enabled.

Signed-off-by: Cornelia Huck cornelia.h...@de.ibm.com
Acked-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index dd902e6..339b34a 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -166,6 +166,7 @@ int kvm_dev_ioctl_check_extension(long ext)
 	case KVM_CAP_IOEVENTFD:
 	case KVM_CAP_DEVICE_CTRL:
 	case KVM_CAP_ENABLE_CAP_VM:
+	case KVM_CAP_S390_IRQCHIP:
 	case KVM_CAP_VM_ATTRIBUTES:
 	case KVM_CAP_MP_STATE:
 		r = 1;
--
1.8.4.2
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL 3/9] KVM: s390: remove _bh locking from start_stop_lock
From: David Hildenbrand d...@linux.vnet.ibm.com

The start_stop_lock is no longer acquired when in atomic context, therefore we can convert it into an ordinary spin_lock.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Acked-by: Cornelia Huck cornelia.h...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com
---
 arch/s390/kvm/kvm-s390.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index a7bda18..b29a031 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -1478,7 +1478,7 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_vcpu_start_stop(vcpu->vcpu_id, 1);

 	/* Only one cpu at a time may enter/leave the STOPPED state. */
-	spin_lock_bh(&vcpu->kvm->arch.start_stop_lock);
+	spin_lock(&vcpu->kvm->arch.start_stop_lock);
 	online_vcpus = atomic_read(&vcpu->kvm->online_vcpus);

 	for (i = 0; i < online_vcpus; i++) {
@@ -1504,7 +1504,7 @@ void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu)
 	 * Let's play safe and flush the VCPU at startup.
 	 */
 	vcpu->arch.sie_block->ihcpu = 0xffff;
-	spin_unlock_bh(&vcpu->kvm->arch.start_stop_lock);
+	spin_unlock(&vcpu->kvm->arch.start_stop_lock);
 	return;
 }

@@ -1518,7 +1518,7 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 	trace_kvm_s390_vcpu_start_stop(vcpu->vcpu_id, 0);

 	/* Only one cpu at a time may enter/leave the STOPPED state. */
-	spin_lock_bh(&vcpu->kvm->arch.start_stop_lock);
+	spin_lock(&vcpu->kvm->arch.start_stop_lock);
 	online_vcpus = atomic_read(&vcpu->kvm->online_vcpus);

 	/* Need to lock access to action_bits to avoid a SIGP race condition */
@@ -1547,7 +1547,7 @@ void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu)
 		__enable_ibs_on_vcpu(started_vcpu);
 	}

-	spin_unlock_bh(&vcpu->kvm->arch.start_stop_lock);
+	spin_unlock(&vcpu->kvm->arch.start_stop_lock);
 	return;
 }
--
1.8.4.2
[GIT PULL 0/9] KVM: s390: Fixes and cleanups for 3.17
Paolo,

this should be the last bunch of s390 patches for 3.17 (on top of the mp_state changes). Please consider applying.

The following changes since commit 6352e4d2dd9a349024a41356148eced553e1dce4:

  KVM: s390: implement KVM_(S|G)ET_MP_STATE for user space state control (2014-07-10 14:11:17 +0200)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux.git tags/kvm-s390-20140721

for you to fetch changes up to e59d120f96687a606db0513c427f10e30a427cc4:

  KVM: s390: add ipte to trace event decoding (2014-07-21 13:22:47 +0200)

Bugfixes
- add IPTE to trace event decoder
- document and advertise KVM_CAP_S390_IRQCHIP

Cleanups
- Reuse kvm_vcpu_block for s390
- Get rid of tasklet for wakeup processing

Christian Borntraeger (1):
      KVM: s390: add ipte to trace event decoding

Cornelia Huck (3):
      KVM: document target of capability enablement
      KVM: s390: document KVM_CAP_S390_IRQCHIP
      KVM: s390: advertise KVM_CAP_S390_IRQCHIP

David Hildenbrand (5):
      KVM: s390: cleanup handle_wait by reusing kvm_vcpu_block
      KVM: s390: remove _bh locking from local_int.lock
      KVM: s390: remove _bh locking from start_stop_lock
      KVM: s390: move vcpu wakeup code to a central point
      KVM: s390: remove the tasklet used by the hrtimer

 Documentation/virtual/kvm/api.txt | 27 --
 arch/s390/include/asm/kvm_host.h  |  2 -
 arch/s390/include/uapi/asm/sie.h  |  1 +
 arch/s390/kvm/interrupt.c         | 100 --
 arch/s390/kvm/kvm-s390.c          | 18 ---
 arch/s390/kvm/kvm-s390.h          |  2 +-
 arch/s390/kvm/sigp.c              | 36 ++
 7 files changed, 82 insertions(+), 104 deletions(-)
[GIT PULL 4/9] KVM: s390: move vcpu wakeup code to a central point
From: David Hildenbrand d...@linux.vnet.ibm.com Let's move the vcpu wakeup code to a central point. We should set the vcpu-preempted flag only if the target is actually sleeping and before the real wakeup happens. Otherwise the preempted flag might be set, when not necessary. This may result in immediate reschedules after schedule() in some scenarios. The wakeup code doesn't require the local_int.lock to be held. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Acked-by: Cornelia Huck cornelia.h...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com --- arch/s390/kvm/interrupt.c | 24 +++- arch/s390/kvm/kvm-s390.h | 1 + arch/s390/kvm/sigp.c | 20 ++-- 3 files changed, 22 insertions(+), 23 deletions(-) diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c index 86575b4..65396e1 100644 --- a/arch/s390/kvm/interrupt.c +++ b/arch/s390/kvm/interrupt.c @@ -617,12 +617,22 @@ no_timer: return 0; } +void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu) +{ + if (waitqueue_active(vcpu-wq)) { + /* +* The vcpu gave up the cpu voluntarily, mark it as a good +* yield-candidate. 
+*/ + vcpu-preempted = true; + wake_up_interruptible(vcpu-wq); + } +} + void kvm_s390_tasklet(unsigned long parm) { struct kvm_vcpu *vcpu = (struct kvm_vcpu *) parm; - - if (waitqueue_active(vcpu-wq)) - wake_up_interruptible(vcpu-wq); + kvm_s390_vcpu_wakeup(vcpu); } /* @@ -905,10 +915,8 @@ static int __inject_vm(struct kvm *kvm, struct kvm_s390_interrupt_info *inti) li = dst_vcpu-arch.local_int; spin_lock(li-lock); atomic_set_mask(CPUSTAT_EXT_INT, li-cpuflags); - if (waitqueue_active(li-wq)) - wake_up_interruptible(li-wq); - kvm_get_vcpu(kvm, sigcpu)-preempted = true; spin_unlock(li-lock); + kvm_s390_vcpu_wakeup(kvm_get_vcpu(kvm, sigcpu)); unlock_fi: spin_unlock(fi-lock); mutex_unlock(kvm-lock); @@ -1059,11 +1067,9 @@ int kvm_s390_inject_vcpu(struct kvm_vcpu *vcpu, if (inti-type == KVM_S390_SIGP_STOP) li-action_bits |= ACTION_STOP_ON_STOP; atomic_set_mask(CPUSTAT_EXT_INT, li-cpuflags); - if (waitqueue_active(vcpu-wq)) - wake_up_interruptible(vcpu-wq); - vcpu-preempted = true; spin_unlock(li-lock); mutex_unlock(vcpu-kvm-lock); + kvm_s390_vcpu_wakeup(vcpu); return 0; } diff --git a/arch/s390/kvm/kvm-s390.h b/arch/s390/kvm/kvm-s390.h index 33a0e4b..665eacc 100644 --- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -136,6 +136,7 @@ static inline int kvm_s390_user_cpu_state_ctrl(struct kvm *kvm) } int kvm_s390_handle_wait(struct kvm_vcpu *vcpu); +void kvm_s390_vcpu_wakeup(struct kvm_vcpu *vcpu); enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer); void kvm_s390_tasklet(unsigned long parm); void kvm_s390_deliver_pending_interrupts(struct kvm_vcpu *vcpu); diff --git a/arch/s390/kvm/sigp.c b/arch/s390/kvm/sigp.c index 946992f..c6f1c2b 100644 --- a/arch/s390/kvm/sigp.c +++ b/arch/s390/kvm/sigp.c @@ -125,8 +125,9 @@ static int __sigp_external_call(struct kvm_vcpu *vcpu, u16 cpu_addr) return rc ? 
rc : SIGP_CC_ORDER_CODE_ACCEPTED; } -static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action) +static int __inject_sigp_stop(struct kvm_vcpu *dst_vcpu, int action) { + struct kvm_s390_local_interrupt *li = dst_vcpu-arch.local_int; struct kvm_s390_interrupt_info *inti; int rc = SIGP_CC_ORDER_CODE_ACCEPTED; @@ -151,8 +152,7 @@ static int __inject_sigp_stop(struct kvm_s390_local_interrupt *li, int action) atomic_set(li-active, 1); li-action_bits |= action; atomic_set_mask(CPUSTAT_STOP_INT, li-cpuflags); - if (waitqueue_active(li-wq)) - wake_up_interruptible(li-wq); + kvm_s390_vcpu_wakeup(dst_vcpu); out: spin_unlock(li-lock); @@ -161,7 +161,6 @@ out: static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action) { - struct kvm_s390_local_interrupt *li; struct kvm_vcpu *dst_vcpu = NULL; int rc; @@ -171,9 +170,8 @@ static int __sigp_stop(struct kvm_vcpu *vcpu, u16 cpu_addr, int action) dst_vcpu = kvm_get_vcpu(vcpu-kvm, cpu_addr); if (!dst_vcpu) return SIGP_CC_NOT_OPERATIONAL; - li = dst_vcpu-arch.local_int; - rc = __inject_sigp_stop(li, action); + rc = __inject_sigp_stop(dst_vcpu, action); VCPU_EVENT(vcpu, 4, sent sigp stop to cpu %x, cpu_addr); @@ -258,8 +256,7 @@ static int __sigp_set_prefix(struct kvm_vcpu *vcpu, u16 cpu_addr, u32 address, list_add_tail(inti-list, li-list); atomic_set(li-active, 1); - if (waitqueue_active(li-wq)) -
Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)
Hi Marcelo, On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote: struct kvm_vcpu_arch { /* * rip and regs accesses must go through @@ -392,6 +402,9 @@ struct kvm_mmu_memory_cache mmu_page_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + struct list_head pinned_mmu_pages; + atomic_t nr_pinned_ranges; + I’m not sure per-cpu pinned list is a good idea, since currently all vcpu are using the same page table, the per-list can not reduce lock-contention, why not make it be global to vm instead? struct fpu guest_fpu; u64 xcr0; u64 guest_supported_xcr0; Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 12:05:34.837161264 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 12:09:21.856684314 -0300 @@ -148,6 +148,9 @@ #define SPTE_HOST_WRITEABLE (1ULL PT_FIRST_AVAIL_BITS_SHIFT) #define SPTE_MMU_WRITEABLE(1ULL (PT_FIRST_AVAIL_BITS_SHIFT + 1)) +#define SPTE_PINNED (1ULL (PT64_SECOND_AVAIL_BITS_SHIFT)) + +#define SPTE_PINNED_BIT PT64_SECOND_AVAIL_BITS_SHIFT #define SHADOW_PT_INDEX(addr, level) PT64_INDEX(addr, level) @@ -327,6 +330,11 @@ return pte PT_PRESENT_MASK !is_mmio_spte(pte); } +static int is_pinned_spte(u64 spte) +{ + return spte SPTE_PINNED is_shadow_present_pte(spte); +} + static int is_large_pte(u64 pte) { return pte PT_PAGE_SIZE_MASK; @@ -1176,6 +1184,16 @@ kvm_flush_remote_tlbs(vcpu-kvm); } +static bool vcpu_has_pinned(struct kvm_vcpu *vcpu) +{ + return atomic_read(vcpu-arch.nr_pinned_ranges); +} + +static void mmu_reload_pinned_vcpus(struct kvm *kvm) +{ + make_all_cpus_request(kvm, KVM_REQ_MMU_RELOAD, vcpu_has_pinned); +} + /* * Write-protect on the specified @sptep, @pt_protect indicates whether * spte write-protection is caused by protecting shadow page table. 
@@ -1268,7 +1286,8 @@ } static int kvm_unmap_rmapp(struct kvm *kvm, unsigned long *rmapp, -struct kvm_memory_slot *slot, unsigned long data) +struct kvm_memory_slot *slot, unsigned long data, +bool age) { u64 *sptep; struct rmap_iterator iter; @@ -1278,6 +1297,14 @@ BUG_ON(!(*sptep PT_PRESENT_MASK)); rmap_printk(kvm_rmap_unmap_hva: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(*sptep)) { + /* don't nuke pinned sptes if page aging: return + * young=yes instead. + */ + if (age) + return 1; + mmu_reload_pinned_vcpus(kvm); + } drop_spte(kvm, step); This has a window between zapping spte and re-pin spte, so guest will fail at this time. need_tlb_flush = 1; } @@ -1286,7 +1313,8 @@ } static int kvm_set_pte_rmapp(struct kvm *kvm, unsigned long *rmapp, - struct kvm_memory_slot *slot, unsigned long data) + struct kvm_memory_slot *slot, unsigned long data, + bool age) { u64 *sptep; struct rmap_iterator iter; @@ -1304,6 +1332,9 @@ need_flush = 1; + if (is_pinned_spte(*sptep)) + mmu_reload_pinned_vcpus(kvm); + if (pte_write(*ptep)) { drop_spte(kvm, sptep); sptep = rmap_get_first(*rmapp, iter); @@ -1334,7 +1365,8 @@ int (*handler)(struct kvm *kvm, unsigned long *rmapp, struct kvm_memory_slot *slot, -unsigned long data)) +unsigned long data, +bool age)) { int j; int ret = 0; @@ -1374,7 +1406,7 @@ rmapp = __gfn_to_rmap(gfn_start, j, memslot); for (; idx = idx_end; ++idx) - ret |= handler(kvm, rmapp++, memslot, data); + ret |= handler(kvm, rmapp++, memslot, data, false); } } @@ -1385,7 +1417,8 @@ unsigned long data, int (*handler)(struct kvm *kvm, unsigned long *rmapp, struct kvm_memory_slot *slot, - unsigned long data)) + unsigned long data, + bool age)) { return kvm_handle_hva_range(kvm, hva, hva + 1, data, handler); } @@ -1406,7 +1439,8 @@ } static int kvm_age_rmapp(struct kvm
Re: [patch 3/4] KVM: MMU: reload request from GET_DIRTY_LOG path
On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote: Reload remote vcpus MMU from GET_DIRTY_LOG codepath, before deleting a pinned spte. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com --- arch/x86/kvm/mmu.c | 29 +++-- 1 file changed, 23 insertions(+), 6 deletions(-) Index: kvm.pinned-sptes/arch/x86/kvm/mmu.c === --- kvm.pinned-sptes.orig/arch/x86/kvm/mmu.c 2014-07-09 11:23:59.290744490 -0300 +++ kvm.pinned-sptes/arch/x86/kvm/mmu.c 2014-07-09 11:24:58.449632435 -0300 @@ -1208,7 +1208,8 @@ * * Return true if tlb need be flushed. */ -static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) +static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect, +bool skip_pinned) { u64 spte = *sptep; @@ -1218,6 +1219,22 @@ rmap_printk(rmap_write_protect: spte %p %llx\n, sptep, *sptep); + if (is_pinned_spte(spte)) { + /* keep pinned spte intact, mark page dirty again */ + if (skip_pinned) { + struct kvm_mmu_page *sp; + gfn_t gfn; + + sp = page_header(__pa(sptep)); + gfn = kvm_mmu_page_get_gfn(sp, sptep - sp-spt); + + mark_page_dirty(kvm, gfn); + return false; + } else + mmu_reload_pinned_vcpus(kvm); + } + + if (pt_protect) spte = ~SPTE_MMU_WRITEABLE; spte = spte ~PT_WRITABLE_MASK; This is also a window between marking spte readonly and re-ping… IIUC, I think all spte spte can not be zapped and write-protected at any time -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 4/4] KVM: MMU: pinned sps are not candidates for deletion.
On Jul 10, 2014, at 3:12 AM, mtosa...@redhat.com wrote:
> Skip pinned shadow pages when selecting pages to zap.

It seems there is no way to prevent changing pinned sptes on the zap-all path? I am thinking we could move pinned sptes to another list (e.g. pinned_shadow_pages) instead of the active list, so that they cannot be touched by any other free paths. What do you think?
Re: [PATCH 1/2 V3] PCI: introduce device assignment interface and refactory related code
On 07/11/2014 05:30 AM, Ethan Zhao wrote: This patch introduces two new device assignment functions pci_iov_assign_device(), pci_iov_deassign_device() along with the existed one pci_vfs_assigned() They construct the VFs assignment management interface, used to assign/ deassign device to VM and query the VFs reference counter. instead of direct manipulation of device flag. This patch refashioned the related code and make them atomic. v3: change the naming of device assignment helpers, because they work for all kind of PCI device, not only SR-IOV (david.vra...@citrix.com) v2: reorder the patchset and make it bisectable and atomic, steps clear between interface defination and implemenation according to the suggestion from alex.william...@redhat.com Signed-off-by: Ethan Zhao ethan.z...@oracle.com --- drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 17 ++--- drivers/pci/iov.c | 20 drivers/xen/xen-pciback/pci_stub.c |4 ++-- include/linux/pci.h|4 virt/kvm/assigned-dev.c|2 +- virt/kvm/iommu.c |4 ++-- 6 files changed, 31 insertions(+), 20 deletions(-) diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c index 02c11a7..781040e 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c +++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c @@ -693,22 +693,9 @@ complete_reset: static bool i40e_vfs_are_assigned(struct i40e_pf *pf) { struct pci_dev *pdev = pf-pdev; - struct pci_dev *vfdev; - - /* loop through all the VFs to see if we own any that are assigned */ - vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, I40E_DEV_ID_VF , NULL); - while (vfdev) { - /* if we don't own it we don't care */ - if (vfdev-is_virtfn pci_physfn(vfdev) == pdev) { - /* if it is assigned we cannot release it */ - if (vfdev-dev_flags PCI_DEV_FLAGS_ASSIGNED) - return true; - } - vfdev = pci_get_device(PCI_VENDOR_ID_INTEL, -I40E_DEV_ID_VF, -vfdev); - } + if (pci_vfs_assigned(pdev)) + return true; return false; } This portion for i40e 
should be in one patch by itself. It shouldn't be included in the bits below. Normally this would go through netdev. The rest of this below would go through linux-pci. diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c index de7a747..090f827 100644 --- a/drivers/pci/iov.c +++ b/drivers/pci/iov.c @@ -644,6 +644,26 @@ int pci_vfs_assigned(struct pci_dev *dev) EXPORT_SYMBOL_GPL(pci_vfs_assigned); /** + * pci_iov_assign_device - assign device to VM + * @pdev: the device to be assigned. + */ +void pci_iov_assign_device(struct pci_dev *pdev) +{ + pdev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +} +EXPORT_SYMBOL_GPL(pci_iov_assign_device); + +/** + * pci_iov_deassign_device - deasign device from VM + * @pdev: the device to be deassigned. + */ +void pci_iov_deassign_device(struct pci_dev *pdev) +{ + pdev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; +} +EXPORT_SYMBOL_GPL(pci_iov_deassign_device); + +/** * pci_sriov_set_totalvfs -- reduce the TotalVFs available * @dev: the PCI PF device * @numvfs: number that should be used for TotalVFs supported The two functions above don't have anything to do with IOV. You can direct assign a device that doesn't even support SR-IOV or MR-IOV. You might be better off defining this as something like pci_set_flag_assigned/pci_clear_flag_assigned. I would likely also make them inline and possibly move them to pci.h since it would likely result in less actual code after you consider the overhead to push everything on the stack prior to making the call. 
diff --git a/drivers/xen/xen-pciback/pci_stub.c b/drivers/xen/xen-pciback/pci_stub.c index 62fcd48..27e00d1 100644 --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref) xen_pcibk_config_free_dyn_fields(dev); xen_pcibk_config_free_dev(dev); - dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; + pci_iov_deassign_device(dev); pci_dev_put(dev); kfree(psdev); @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_dbg(dev-dev, reset device\n); xen_pcibk_reset_device(dev); - dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_iov_assign_device(dev); return 0; config_release: diff --git a/include/linux/pci.h b/include/linux/pci.h index aab57b4..5ece6d6 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1603,6
buildbot failure in kvm on ia64
The Buildbot has detected a new failure on builder ia64 while building kvm. Full details are available at: http://buildbot.b1-systems.de/kvm/builders/ia64/builds/1315 Buildbot URL: http://buildbot.b1-systems.de/kvm/ Buildslave for this Build: b1_kvm_1 Build Reason: The Nightly scheduler named 'nightly_master' triggered this build Build Source Stamp: [branch master] HEAD Blamelist: BUILD FAILED: failed git sincerely, -The Buildbot
Re: [patch 2/4] KVM: MMU: allow pinning spte translations (TDP-only)
On 07/22/2014 05:46 AM, Xiao Guangrong wrote:
>> +	if (is_pinned_spte(*sptep)) {
>> +		/* don't nuke pinned sptes if page aging: return
>> +		 * young=yes instead.
>> +		 */
>> +		if (age)
>> +			return 1;
>> +		mmu_reload_pinned_vcpus(kvm);
>> +	}
>>  	drop_spte(kvm, sptep);
> This has a window between zapping the spte and re-pinning it, so the guest will fail at this time.

I got it, mmu_reload_pinned_vcpus() will kick all vcpus out of the guest and pin the pages again... so it is ok. :)
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 06/25/2014 11:19 AM, Benjamin Herrenschmidt wrote:
> On Wed, 2014-06-25 at 11:05 +0800, Mike Qiu wrote:
>> Maybe /sys/kernel/debug/powerpc/errinjct is better here, because the PCI domain number is supplied in the parameters, so there is no need to provide an errinjct entry for each PCI domain. Another reason is that error injection will not be only for PCI (in the future), so it is better not to put it under a PCI domain entry. It is also simpler for userland tools to have a fixed path.
> I don't like this. I much prefer to have dedicated error injection files in their respective locations, something for PCI under the corresponding PCI bridge etc...

So PowerNV error injection will be designed to rely on debugfs being configured, right?

Thanks,
Mike

> Cheers,
> Ben.
RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail
-----Original Message-----
From: Linuxppc-dev [mailto:linuxppc-dev-bounces+mihai.caraman=freescale@lists.ozlabs.org] On Behalf Of mihai.cara...@freescale.com
Sent: Friday, July 18, 2014 12:06 PM
To: Alexander Graf; kvm-ppc@vger.kernel.org
Cc: linuxppc-...@lists.ozlabs.org; k...@vger.kernel.org
Subject: RE: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail

> -----Original Message-----
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 17, 2014 5:21 PM
> To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org
> Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH v5 4/5] KVM: PPC: Alow kvmppc_get_last_inst() to fail
>
> On 17.07.14 13:22, Mihai Caraman wrote:
>> On book3e, the guest's last instruction is read on the exit path using the dedicated load external pid (lwepx) instruction. This load operation may fail due to TLB eviction and execute-but-not-read entries.
>>
>> This patch lays down the path for an alternative solution to read the guest's last instruction, by allowing the kvmppc_get_last_inst() function to fail. Architecture-specific implementations of kvmppc_load_last_inst() may read the last guest instruction and instruct the emulation layer to re-execute the guest in case of failure.
>>
>> Make the kvmppc_get_last_inst() definition common between architectures.
>>
>> Signed-off-by: Mihai Caraman mihai.cara...@freescale.com
>> ---
...
diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h index e2fd5a1..7f9c634 100644 --- a/arch/powerpc/include/asm/kvm_ppc.h +++ b/arch/powerpc/include/asm/kvm_ppc.h @@ -47,6 +47,11 @@ enum emulation_result { EMULATE_EXIT_USER,/* emulation requires exit to user- space */ }; +enum instruction_type { + INST_GENERIC, + INST_SC,/* system call */ +}; + extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu); extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu); extern void kvmppc_handler_highmem(void); @@ -62,6 +67,9 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct kvm_vcpu *vcpu, u64 val, unsigned int bytes, int is_default_endian); +extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu, + enum instruction_type type, u32 *inst); + extern int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu); extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu); @@ -234,6 +242,23 @@ struct kvmppc_ops { extern struct kvmppc_ops *kvmppc_hv_ops; extern struct kvmppc_ops *kvmppc_pr_ops; +static inline int kvmppc_get_last_inst(struct kvm_vcpu *vcpu, + enum instruction_type type, u32 *inst) +{ + int ret = EMULATE_DONE; + + /* Load the instruction manually if it failed to do so in the + * exit path */ + if (vcpu-arch.last_inst == KVM_INST_FETCH_FAILED) + ret = kvmppc_load_last_inst(vcpu, type, vcpu- arch.last_inst); + + + *inst = (ret == EMULATE_DONE kvmppc_need_byteswap(vcpu)) ? + swab32(vcpu-arch.last_inst) : vcpu-arch.last_inst; This makes even less sense than the previous version. Either you treat inst as definitely overwritten or as preserves previous data on failure. Both v4 and v5 versions treat inst as definitely overwritten. So either you unconditionally swap like you did before If we make abstraction of its symmetry, KVM_INST_FETCH_FAILED is operated in host endianness, so it doesn't need byte swap. 
I agree with your reasoning if last_inst is initialized and compared with data in guest endianness, which is not the case yet for KVM_INST_FETCH_FAILED. Alex, are you relying on the fact that the KVM_INST_FETCH_FAILED value is symmetrical? With a non-symmetrical value like 0xDEADBEEF, and considering a little-endian guest on a big-endian host, we need to fix the kvm logic to initialize and compare last_inst with the 0xEFBEADDE swapped value. Your suggestion to unconditionally swap makes sense only with the above fix, otherwise inst may end up with the 0xEFBEADDE swapped value, which is wrong.

-Mike
RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Thursday, July 03, 2014 3:21 PM To: Caraman Mihai Claudiu-B02008; kvm-ppc@vger.kernel.org Cc: k...@vger.kernel.org; linuxppc-...@lists.ozlabs.org Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers On 30.06.14 17:34, Mihai Caraman wrote: Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec which share the same interrupt numbers. Signed-off-by: Mihai Caraman mihai.cara...@freescale.com --- v2: - remove outdated definitions arch/powerpc/include/asm/kvm_asm.h| 8 arch/powerpc/kvm/booke.c | 17 + arch/powerpc/kvm/booke.h | 4 ++-- arch/powerpc/kvm/booke_interrupts.S | 9 + arch/powerpc/kvm/bookehv_interrupts.S | 4 ++-- arch/powerpc/kvm/e500.c | 10 ++ arch/powerpc/kvm/e500_emulate.c | 10 ++ 7 files changed, 30 insertions(+), 32 deletions(-) diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h index 9601741..c94fd33 100644 --- a/arch/powerpc/include/asm/kvm_asm.h +++ b/arch/powerpc/include/asm/kvm_asm.h @@ -56,14 +56,6 @@ /* E500 */ #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33 -/* - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines - */ -#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \ - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST I think I'd prefer to keep them separate. 
#define BOOKE_INTERRUPT_SPE_FP_ROUND 34 #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35 #define BOOKE_INTERRUPT_DOORBELL 36 diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..3c86d9b 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu *vcpu, case BOOKE_IRQPRIO_ITLB_MISS: case BOOKE_IRQPRIO_SYSCALL: case BOOKE_IRQPRIO_FP_UNAVAIL: - case BOOKE_IRQPRIO_SPE_UNAVAIL: - case BOOKE_IRQPRIO_SPE_FP_DATA: + case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL: + case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST: #ifdef CONFIG_KVM_E500V2 case ...SPE: #else case ..ALTIVEC: #endif case BOOKE_IRQPRIO_SPE_FP_ROUND: case BOOKE_IRQPRIO_AP_UNAVAIL: allowed = 1; @@ -977,18 +977,19 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, break; #ifdef CONFIG_SPE - case BOOKE_INTERRUPT_SPE_UNAVAIL: { + case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: { if (vcpu-arch.shared-msr MSR_SPE) kvmppc_vcpu_enable_spe(vcpu); else kvmppc_booke_queue_irqprio(vcpu, - BOOKE_IRQPRIO_SPE_UNAVAIL); + BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL); r = RESUME_GUEST; break; } - case BOOKE_INTERRUPT_SPE_FP_DATA: - kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA); + case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST: + kvmppc_booke_queue_irqprio(vcpu, + BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST); r = RESUME_GUEST; break; @@ -997,7 +998,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, r = RESUME_GUEST; break; #else - case BOOKE_INTERRUPT_SPE_UNAVAIL: + case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: /* * Guest wants SPE, but host kernel doesn't support it. Send * an unimplemented operation program check to the guest. @@ -1010,7 +1011,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu, * These really should never happen without CONFIG_SPE, * as we should never enable the real MSR[SPE] in the guest. 
*/ - case BOOKE_INTERRUPT_SPE_FP_DATA: + case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST: case BOOKE_INTERRUPT_SPE_FP_ROUND: printk(KERN_CRIT %s: unexpected SPE interrupt %u at %08lx\n, __func__, exit_nr, vcpu-arch.pc); diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h index b632cd3..f182b32 100644 --- a/arch/powerpc/kvm/booke.h +++ b/arch/powerpc/kvm/booke.h @@ -32,8 +32,8 @@ #define BOOKE_IRQPRIO_ALIGNMENT 2 #define BOOKE_IRQPRIO_PROGRAM 3 #define BOOKE_IRQPRIO_FP_UNAVAIL 4 -#define BOOKE_IRQPRIO_SPE_UNAVAIL 5 -#define BOOKE_IRQPRIO_SPE_FP_DATA 6 +#define
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
>> I don't like this. I much prefer to have dedicated error injection files in their respective locations, something for PCI under the corresponding PCI bridge etc...
> So PowerNV error injection will be designed to rely on debugfs being configured, right?

Not necessarily. If we create a better debugfs layout for our PHBs, then yes. It might be useful to provide more info in there, for example access to some of the counters ... But on the other hand, for error injection in general, I wonder if we should be under sysfs instead... something to study a bit.

Cheers,
Ben.
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 07/22/2014 06:49 AM, Benjamin Herrenschmidt wrote:
> On Mon, 2014-07-21 at 16:06 +0800, Mike Qiu wrote:
>>> I don't like this. I much prefer to have dedicated error injection files in their respective locations, something for PCI under the corresponding PCI bridge etc...
>> So PowerNV error injection will be designed to rely on debugfs being configured, right?
> Not necessarily. If we create a better debugfs layout for our PHBs, then yes. It might be useful to provide more info in there, for example access to some of the counters ... But on the other hand, for error injection in general, I wonder if we should be under sysfs instead... something to study a bit.

In pHyp, general error injection uses a syscall:

#define __NR_rtas 255

I don't know if it is a good idea to reuse this syscall for PowerNV. At least, it is another choice that does not rely on sysfs.

Thanks,
Mike

> Cheers,
> Ben.
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On Tue, 2014-07-22 at 11:10 +0800, Mike Qiu wrote:
> In pHyp, general error injection uses a syscall:
>
> #define __NR_rtas 255
>
> I don't know if it is a good idea to reuse this syscall for PowerNV. At least, it is another choice that does not rely on sysfs.

No, we certainly don't want that RTAS stuff. I thought Linux had some kind of error injection infrastructure nowadays... somebody needs to have a look.

Cheers,
Ben.
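The generic infrastructure alluded to here does exist: CONFIG_FAULT_INJECTION exposes per-capability knobs (probability, interval, times, ...) through debugfs, documented under Documentation/fault-injection/. A minimal sketch of tuning the `failslab` capability, assuming debugfs is mounted at the usual path and the kernel was built with fault injection (otherwise the sketch just reports that the knobs are absent):

```shell
# Sketch: configuring Linux's generic fault-injection capability "failslab".
# Assumes root, debugfs mounted at /sys/kernel/debug, and a kernel built
# with CONFIG_FAULT_INJECTION / CONFIG_FAILSLAB; otherwise only report.
FI=/sys/kernel/debug/failslab
if [ -w "$FI/probability" ]; then
    echo 10  > "$FI/probability"   # fail ~10% of eligible slab allocations
    echo 100 > "$FI/interval"      # at most one failure per 100 attempts
    status="failslab configured"
else
    status="fault-injection knobs not available on this kernel"
fi
echo "$status"
```

Whether that framework is a good fit for hardware-level PCI error injection (as opposed to software fault injection in kernel paths) is exactly the open question in this thread.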
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On Tue, Jul 22, 2014 at 11:10:42AM +0800, Mike Qiu wrote:
> In pHyp, general error injection uses a syscall:
>
> #define __NR_rtas 255
>
> I don't know if it is a good idea to reuse this syscall for PowerNV. At least, it is another choice that does not rely on sysfs.

We won't use a syscall for routing the error injection on PowerNV any more. Generally speaking, we will use ioctl commands, or a subcode of the EEH ioctl command that was invented for EEH support of VFIO devices in QEMU. For the utility (errinjct) running on PowerNV, we will use debugfs entries. I have preliminary code for that, but haven't had a chance to polish it yet. Let me send it to you so that you can start working from there.

Thanks,
Gavin
Re: [PATCH v1 2/3] powerpc/powernv: Support PCI error injection
On 07/22/2014 11:26 AM, Gavin Shan wrote:
> We won't use a syscall for routing the error injection on PowerNV any more. Generally speaking, we will use ioctl commands, or a subcode of the EEH ioctl command that was invented for EEH support of VFIO devices in QEMU. For the utility (errinjct) running on PowerNV, we will use debugfs entries. I have preliminary code for that, but haven't had a chance to polish it yet. Let me send it to you so that you can start working from there.
>
> Thanks,
> Gavin

OK, thanks

Mike