Re: Commit f9afbd45b0d0 broke mips r4k.
On 06/13/2013 04:00:02 AM, Ralf Baechle wrote:
> On Wed, Jun 12, 2013 at 09:35:16PM -0500, Rob Landley wrote:
>
>> My aboriginal linux project builds tiny linux systems to run under qemu, producing as close to the same system as possible across a bunch of different architectures. The above change broke the mips r4k build I've been running under qemu.
>>
>> Here's a toolchain and reproduction sequence:
>>
>>   wget http://landley.net/aboriginal/bin/cross-compiler-mips.tar.bz2
>>   tar xvjf cross-compiler-mips.tar.bz2
>>   export PATH=$PWD/cross-compiler-mips/bin:$PATH
>>   make ARCH=mips allnoconfig KCONFIG_ALLCONFIG=miniconfig.mips
>>   make CROSS_COMPILE=mips- ARCH=mips
>>
>> (The file miniconfig.mips is attached.)
>>
>> It ends:
>>
>>   CC      init/version.o
>>   LD      init/built-in.o
>> arch/mips/built-in.o: In function `local_r4k_flush_cache_page':
>> c-r4k.c:(.text+0xe278): undefined reference to `kvm_local_flush_tlb_all'
>> c-r4k.c:(.text+0xe278): relocation truncated to fit: R_MIPS_26 against `kvm_local_flush_tlb_all'
>> arch/mips/built-in.o: In function `local_flush_tlb_range':
>> (.text+0xe938): undefined reference to `kvm_local_flush_tlb_all'
>> arch/mips/built-in.o: In function `local_flush_tlb_range':
>> (.text+0xe938): relocation truncated to fit: R_MIPS_26 against `kvm_local_flush_tlb_all'
>> arch/mips/built-in.o: In function `local_flush_tlb_mm':
>> (.text+0xed38): undefined reference to `kvm_local_flush_tlb_all'
>> arch/mips/built-in.o: In function `local_flush_tlb_mm':
>> (.text+0xed38): relocation truncated to fit: R_MIPS_26 against `kvm_local_flush_tlb_all'
>> kernel/built-in.o: In function `__schedule':
>> core.c:(.sched.text+0x16a0): undefined reference to `kvm_local_flush_tlb_all'
>> core.c:(.sched.text+0x16a0): relocation truncated to fit: R_MIPS_26 against `kvm_local_flush_tlb_all'
>> mm/built-in.o: In function `use_mm':
>> (.text+0x182c8): undefined reference to `kvm_local_flush_tlb_all'
>> mm/built-in.o: In function `use_mm':
>> (.text+0x182c8): relocation truncated to fit: R_MIPS_26 against `kvm_local_flush_tlb_all'
>> fs/built-in.o:(.text+0x7b50): more undefined references to `kvm_local_flush_tlb_all' follow
>> fs/built-in.o: In function `flush_old_exec':
>> (.text+0x7b50): relocation truncated to fit: R_MIPS_26 against `kvm_local_flush_tlb_all'
>>
>> Revert the above commit and it builds to the end.
>
> Commit d414976d1ca721456f7b7c603a8699d117c2ec07 [MIPS: include: mmu_context.h: Replace VIRTUALIZATION with KVM] fixes the issue and was pulled by Linus only yesterday. I cannot reproduce the error following your recipe using the latest Linux/MIPS tree.

Confirmed, that fixed it.

Thanks,

Rob

-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 19/31] mips/kvm: Add host definitions for MIPS VZ based host.
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 20/31] mips/kvm: Hook into TLB fault handlers.
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 21/31] mips/kvm: Allow set_except_vector() to be used from MIPSVZ code.
On Fri, Jun 07, 2013 at 04:03:25PM -0700, David Daney wrote:

> From: David Daney david.da...@cavium.com
>
> We need to move it out of __init so we don't have section mismatch problems.
>
> Signed-off-by: David Daney david.da...@cavium.com
> ---
>  arch/mips/include/asm/uasm.h | 2 +-
>  arch/mips/kernel/traps.c     | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/arch/mips/include/asm/uasm.h b/arch/mips/include/asm/uasm.h
> index 370d967..90b4f5e 100644
> --- a/arch/mips/include/asm/uasm.h
> +++ b/arch/mips/include/asm/uasm.h
> @@ -11,7 +11,7 @@
>
>  #include <linux/types.h>
>
> -#ifdef CONFIG_EXPORT_UASM
> +#if defined(CONFIG_EXPORT_UASM) || IS_ENABLED(CONFIG_KVM_MIPSVZ)
>  #include <linux/export.h>
>  #define __uasminit
>  #define __uasminitdata

I'd rather keep KVM bits out of uasm.h. A select EXPORT_UASM in Kconfig would have been cleaner - but read below.

> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c
> index f008795..fca0a2f 100644
> --- a/arch/mips/kernel/traps.c
> +++ b/arch/mips/kernel/traps.c
> @@ -1457,7 +1457,7 @@ unsigned long ebase;
>  unsigned long exception_handlers[32];
>  unsigned long vi_handlers[64];
>
> -void __init *set_except_vector(int n, void *addr)
> +void __uasminit *set_except_vector(int n, void *addr)

A __uasminit tag is a bit unobvious because set_except_vector is not part of uasm. I could understand __cpuinit - but of course that doesn't sort your problem. Maybe we should just drop the __init tag altogether?

Or if we really want set_except_vector to become part of the uasm subsystem, then probably its declaration should move from setup.h to uasm.h.

  Ralf
Re: [PATCH 22/31] mips/kvm: Split get_new_mmu_context into two parts.
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 23/31] mips/kvm: Hook into CP unusable exception handler.
On Fri, Jun 07, 2013 at 04:03:27PM -0700, David Daney wrote:

> From: David Daney david.da...@cavium.com
>
> The MIPS VZ KVM code needs this to be able to manage the FPU.
>
> Signed-off-by: David Daney david.da...@cavium.com

Looks good,

Acked-by: Ralf Baechle r...@linux-mips.org

However I get cold shivers at the thought of SMTC FPU management with VZ, it sounds like a source of new entertainment ... But thinking about this is something for another rainy day, not now.

  Ralf
Re: [PATCH 24/31] mips/kvm: Add thread_struct fields used by MIPSVZ hosts.
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 25/31] mips/kvm: Add some asm-offsets constants used by MIPSVZ.
Patch looks ok but why not combine this patch with the previous one? Ralf
Re: [PATCH 26/31] mips/kvm: Split up Kconfig and Makefile definitions in preperation for MIPSVZ.
The Trademark guys (and readability in general) would probably be happier if MIPSTE was spelled as MIPS_TE and, for that matter, MIPSVZ as MIPS_VZ?

Other than that,

Acked-by: Ralf Baechle r...@linux-mips.org

  Ralf
Re: [PATCH 31/31] mips/kvm: Allow for upto 8 KVM vcpus per vm.
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 27/31] mips/kvm: Gate the use of kvm_local_flush_tlb_all() by KVM_MIPSTE
On Fri, Jun 07, 2013 at 04:03:31PM -0700, David Daney wrote:

> From: David Daney david.da...@cavium.com
>
> Only the trap-and-emulate KVM code needs a special TLB flusher. All other configurations should use the regular version.
>
> Signed-off-by: David Daney david.da...@cavium.com
> ---
>  arch/mips/include/asm/mmu_context.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/mips/include/asm/mmu_context.h b/arch/mips/include/asm/mmu_context.h
> index 5609a32..04d0b74 100644
> --- a/arch/mips/include/asm/mmu_context.h
> +++ b/arch/mips/include/asm/mmu_context.h
> @@ -117,7 +117,7 @@ get_new_asid(unsigned long cpu)
>      if (! ((asid += ASID_INC) & ASID_MASK) ) {
>          if (cpu_has_vtag_icache)
>              flush_icache_all();
> -#ifdef CONFIG_VIRTUALIZATION
> +#if IS_ENABLED(CONFIG_KVM_MIPSTE)
>          kvm_local_flush_tlb_all();      /* start new asid cycle */
>  #else
>          local_flush_tlb_all();          /* start new asid cycle */

Sanjay, it would seem this is actually a bug if KVM is built as a module and should be fixed for 3.10?

Acked-by: Ralf Baechle r...@linux-mips.org

  Ralf
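Editorial aside on the #ifdef versus IS_ENABLED() change above: the difference only shows up when an option is built as a module. The following is a minimal user-space sketch, assuming the usual Kconfig convention that =y defines CONFIG_FOO as 1 and =m defines CONFIG_FOO_MODULE as 1. The DEMO_ and demo_ names and the CONFIG_DEMO_* options are hypothetical stand-ins modelled loosely on the helpers in the kernel's linux/kconfig.h; this is not the kernel's actual code.

```c
#include <assert.h>

/*
 * Simplified stand-ins for the kernel's kconfig helpers (assumption:
 * same token-pasting trick, reduced to the minimum needed here).
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_second_arg(ignored, val, ...) val
#define demo_is_set__(arg1_or_junk) demo_second_arg(arg1_or_junk 1, 0, 0)
#define demo_is_set_(value) demo_is_set__(DEMO_ARG_PLACEHOLDER_##value)
#define demo_is_set(cfg) demo_is_set_(cfg)
#define DEMO_IS_ENABLED(opt) (demo_is_set(opt) || demo_is_set(opt##_MODULE))

/* Hypothetical configuration: one built-in option, one modular option. */
#define CONFIG_DEMO_BUILTIN 1          /* as if set to =y */
#define CONFIG_DEMO_MODULAR_MODULE 1   /* as if set to =m */
/* CONFIG_DEMO_OFF is deliberately left undefined (option disabled). */

/* A plain #ifdef on the bare name does not see the =m case. */
#ifdef CONFIG_DEMO_MODULAR
#define DEMO_IFDEF_SEES_MODULAR 1
#else
#define DEMO_IFDEF_SEES_MODULAR 0
#endif

int demo_builtin_enabled(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_BUILTIN); }
int demo_modular_enabled(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_MODULAR); }
int demo_off_enabled(void) { return DEMO_IS_ENABLED(CONFIG_DEMO_OFF); }
int demo_ifdef_sees_modular(void) { return DEMO_IFDEF_SEES_MODULAR; }

/* Self-check covering the three cases discussed in the text. */
void demo_kconfig_self_check(void)
{
    assert(demo_builtin_enabled());      /* =y: enabled */
    assert(demo_modular_enabled());      /* =m: enabled as well */
    assert(!demo_off_enabled());         /* unset: disabled */
    assert(!demo_ifdef_sees_modular());  /* #ifdef misses =m */
}
```

The sketch only illustrates the preprocessor semantics; whether a symbol called under such a guard is actually available to built-in code still depends on where it is compiled.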
Re: [PATCH 30/31] mips/kvm: Enable MIPSVZ in Kconfig/Makefile
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 29/31] mips/kvm: Add MIPSVZ support.
Acked-by: Ralf Baechle r...@linux-mips.org Ralf
Re: [PATCH 00/31] KVM/MIPS: Implement hardware virtualization via the MIPS-VZ extensions.
On Sun, Jun 09, 2013 at 04:23:51PM -0700, David Daney wrote:

> Come to think of it, emulating SGI hardware might be an interesting case. There may be old IRIX systems and applications that could be running low on real hardware. Some of those systems take up a whole room and draw a lot of power. They might run faster and at much lower power consumption on a modern 48-Way SMP SoC based system.

Many SGI MIPS systems have RTCs powered by built-in batteries with a nominal lifetime of ten years and for which no more replacements are available. This is beginning to limit usable SGI MIPS systems to those who know how to solve these issues with a Dremel and a soldering iron.

That said, SGI platforms are all more or less weird custom architectures so the platform emulation - let alone the firmware blobs - will be a chunk of work.

  Ralf
Re: [PATCH 28/31] mips/kvm: Only use KVM_COALESCED_MMIO_PAGE_OFFSET with KVM_MIPSTE
On Fri, Jun 07, 2013 at 04:03:32PM -0700, David Daney wrote:

> From: David Daney david.da...@cavium.com
>
> The forthcoming MIPSVZ code doesn't currently use this, so it must only be enabled for KVM_MIPSTE.
>
> Signed-off-by: David Daney david.da...@cavium.com
> ---
>  arch/mips/include/asm/kvm_host.h | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/arch/mips/include/asm/kvm_host.h b/arch/mips/include/asm/kvm_host.h
> index 505b804..9f209e1 100644
> --- a/arch/mips/include/asm/kvm_host.h
> +++ b/arch/mips/include/asm/kvm_host.h
> @@ -25,7 +25,9 @@
>  /* memory slots that does not exposed to userspace */
>  #define KVM_PRIVATE_MEM_SLOTS 0
>
> +#ifdef CONFIG_KVM_MIPSTE
>  #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
> +#endif

What if KVM_MIPSTE and KVM_MIPSVZ both get enabled?

  Ralf
Re: [virt-tools-list] cache write back barriers
On Fri, Jun 14, 2013 at 12:53:04PM +0200, Stefan Hajnoczi wrote: On Thu, Jun 13, 2013 at 10:47:32AM +0200, folkert wrote:

Hi, In virt-manager I saw that there's the option for cache writeback for storage devices. I'm wondering: does this also make kvm ignore write barriers invoked by the virtual machine?

Looking at current git, the cache types supported by virt-manager are:

- none
- writethrough
- writeback
- default [virt-manager only, not in virt-install]

These translate directly into the libvirt driver ... cache=... field which you can find documented here: http://libvirt.org/formatdomain.html#elementsDisks

As far as I can tell (from looking at libvirt sources) as long as you have a modern qemu these will translate to the same names on the qemu command line.

No, that would be unsafe. When the guest issues a flush then QEMU will ensure that data reaches the disk with -drive cache=writeback.

Aha, so writeback behaves like consumer hard disks with a write cache on them.

In answer to the original question by 'folkert': In that case maybe an extra note could be added to virt-manager (excellent software by the way!) saying that write-back is safe if the client VM supports barriers. Agree?

I suspect the problem with doing this is it depends on the hypervisor. Likely for qemu and Xen (since it uses a qemu device model) this would be true. Possibly not for other hypervisors that virt-manager can control.

Generally speaking, it would be nice to document these properly and also how they are implemented in different hypervisors, because I know I for one don't find these settings very obvious. So, patches welcome!

Rich.

-- Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones virt-p2v converts physical machines to virtual machines. Boot with a live CD or over the network (PXE) and turn machines into KVM guests.
http://libguestfs.org/virt-v2v
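Editorial aside on the flush semantics discussed in this thread: seen from inside the guest, the request that matters is the explicit flush an application issues after writing. Below is a minimal sketch, assuming a POSIX guest; the helper name demo_durable_write is hypothetical. The thread states that QEMU honours guest flushes with cache=writeback; how a given guest kernel and filesystem turn fsync() into a disk flush is outside what this sketch shows.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write buf to path, then ask for the data to be
 * forced to stable storage. Returns 0 on success, -1 on any failure.
 */
int demo_durable_write(const char *path, const char *buf)
{
    size_t len, done = 0;
    int fd;

    assert(path != NULL && buf != NULL);
    len = strlen(buf);

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    /* write() may be partial, so loop until everything is handed over. */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n <= 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    /* fsync() is the point where the application asks for durability. */
    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Without the fsync() call, the data may sit in a write cache for some time, which is the same trade-off as a consumer hard disk with its write cache enabled.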
[PATCH qom-cpu v2 02/29] kvm: Change cpu_synchronize_state() argument to CPUState
Change Monitor::mon_cpu to CPUState as well. In cpu_synchronize_all_states() use qemu_for_each_cpu() now. Reviewed-by: liguang lig.f...@cn.fujitsu.com Signed-off-by: Andreas Färber afaer...@suse.de --- cpus.c | 8 gdbstub.c | 8 hw/i386/kvm/apic.c | 2 +- hw/i386/kvmvapic.c | 4 ++-- hw/misc/vmport.c| 2 +- hw/ppc/ppce500_spin.c | 2 +- include/sysemu/kvm.h| 4 ++-- monitor.c | 6 +++--- target-i386/helper.c| 4 ++-- target-i386/kvm.c | 2 +- target-ppc/mmu-hash64.c | 2 +- target-ppc/translate.c | 2 +- target-s390x/kvm.c | 9 + 13 files changed, 28 insertions(+), 27 deletions(-) diff --git a/cpus.c b/cpus.c index c232265..3260f09 100644 --- a/cpus.c +++ b/cpus.c @@ -407,10 +407,10 @@ void hw_error(const char *fmt, ...) void cpu_synchronize_all_states(void) { -CPUArchState *cpu; +CPUArchState *env; -for (cpu = first_cpu; cpu; cpu = cpu-next_cpu) { -cpu_synchronize_state(cpu); +for (env = first_cpu; env; env = env-next_cpu) { +cpu_synchronize_state(ENV_GET_CPU(env)); } } @@ -1219,7 +1219,7 @@ CpuInfoList *qmp_query_cpus(Error **errp) CPUState *cpu = ENV_GET_CPU(env); CpuInfoList *info; -cpu_synchronize_state(env); +cpu_synchronize_state(cpu); info = g_malloc0(sizeof(*info)); info-value = g_malloc0(sizeof(*info-value)); diff --git a/gdbstub.c b/gdbstub.c index 94c78ce..bbae06d 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -2033,7 +2033,7 @@ static void gdb_breakpoint_remove_all(void) static void gdb_set_cpu_pc(GDBState *s, target_ulong pc) { -cpu_synchronize_state(s-c_cpu); +cpu_synchronize_state(ENV_GET_CPU(s-c_cpu)); #if defined(TARGET_I386) s-c_cpu-eip = pc; #elif defined (TARGET_PPC) @@ -2232,7 +2232,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf) } break; case 'g': -cpu_synchronize_state(s-g_cpu); +cpu_synchronize_state(ENV_GET_CPU(s-g_cpu)); env = s-g_cpu; len = 0; for (addr = 0; addr num_g_regs; addr++) { @@ -2243,7 +2243,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf) put_packet(s, buf); break; case 'G': 
-cpu_synchronize_state(s-g_cpu); +cpu_synchronize_state(ENV_GET_CPU(s-g_cpu)); env = s-g_cpu; registers = mem_buf; len = strlen(p) / 2; @@ -2411,7 +2411,7 @@ static int gdb_handle_packet(GDBState *s, const char *line_buf) env = find_cpu(thread); if (env != NULL) { CPUState *cpu = ENV_GET_CPU(env); -cpu_synchronize_state(env); +cpu_synchronize_state(cpu); len = snprintf((char *)mem_buf, sizeof(mem_buf), CPU#%d [%s], cpu-cpu_index, cpu-halted ? halted : running); diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c index 8f80425..bd0bdd8 100644 --- a/hw/i386/kvm/apic.c +++ b/hw/i386/kvm/apic.c @@ -129,7 +129,7 @@ static void do_inject_external_nmi(void *data) uint32_t lvt; int ret; -cpu_synchronize_state(s-cpu-env); +cpu_synchronize_state(cpu); lvt = s-lvt[APIC_LVT_LINT1]; if (!(lvt APIC_LVT_MASKED) ((lvt 8) 7) == APIC_DM_NMI) { diff --git a/hw/i386/kvmvapic.c b/hw/i386/kvmvapic.c index 655483b..f93629f 100644 --- a/hw/i386/kvmvapic.c +++ b/hw/i386/kvmvapic.c @@ -456,7 +456,7 @@ void vapic_report_tpr_access(DeviceState *dev, CPUState *cs, target_ulong ip, X86CPU *cpu = X86_CPU(cs); CPUX86State *env = cpu-env; -cpu_synchronize_state(env); +cpu_synchronize_state(cs); if (evaluate_tpr_instruction(s, env, ip, access) 0) { if (s-state == VAPIC_ACTIVE) { @@ -627,7 +627,7 @@ static void vapic_write(void *opaque, hwaddr addr, uint64_t data, hwaddr rom_paddr; VAPICROMState *s = opaque; -cpu_synchronize_state(env); +cpu_synchronize_state(CPU(x86_env_get_cpu(env))); /* * The VAPIC supports two PIO-based hypercalls, both via port 0x7E. 
diff --git a/hw/misc/vmport.c b/hw/misc/vmport.c index 57b71f5..8363dfd 100644 --- a/hw/misc/vmport.c +++ b/hw/misc/vmport.c @@ -66,7 +66,7 @@ static uint64_t vmport_ioport_read(void *opaque, hwaddr addr, unsigned char command; uint32_t eax; -cpu_synchronize_state(env); +cpu_synchronize_state(CPU(x86_env_get_cpu(env))); eax = env-regs[R_EAX]; if (eax != VMPORT_MAGIC) diff --git a/hw/ppc/ppce500_spin.c b/hw/ppc/ppce500_spin.c index 1290d37..ea65414 100644 --- a/hw/ppc/ppce500_spin.c +++ b/hw/ppc/ppce500_spin.c @@ -98,7 +98,7 @@ static void spin_kick(void *data) hwaddr map_size = 64 * 1024 * 1024; hwaddr map_start; -cpu_synchronize_state(env); +cpu_synchronize_state(cpu); stl_p(curspin-pir, env-spr[SPR_PIR]); env-nip = ldq_p(curspin-addr) (map_size - 1);
[PATCH qom-cpu v2 20/29] kvm: Change kvm_remove_all_breakpoints() argument to CPUState
Signed-off-by: Andreas Färber afaer...@suse.de
---
 gdbstub.c            | 2 +-
 include/sysemu/kvm.h | 2 +-
 kvm-all.c            | 5 ++---
 kvm-stub.c           | 2 +-
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 3101a43..9e7f7a1 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -2019,7 +2019,7 @@ static void gdb_breakpoint_remove_all(void)
     CPUArchState *env;
 
     if (kvm_enabled()) {
-        kvm_remove_all_breakpoints(gdbserver_state->c_cpu);
+        kvm_remove_all_breakpoints(ENV_GET_CPU(gdbserver_state->c_cpu));
         return;
     }
 
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index fe8bc40..c767488 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -163,7 +163,7 @@ int kvm_insert_breakpoint(CPUArchState *current_env, target_ulong addr,
                           target_ulong len, int type);
 int kvm_remove_breakpoint(CPUArchState *current_env, target_ulong addr,
                           target_ulong len, int type);
-void kvm_remove_all_breakpoints(CPUArchState *current_env);
+void kvm_remove_all_breakpoints(CPUState *current_cpu);
 int kvm_update_guest_debug(CPUArchState *env, unsigned long reinject_trap);
 #ifndef _WIN32
 int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset);
diff --git a/kvm-all.c b/kvm-all.c
index 90b89cd..b3ba6aa 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1979,9 +1979,8 @@ int kvm_remove_breakpoint(CPUArchState *current_env, target_ulong addr,
     return 0;
 }
 
-void kvm_remove_all_breakpoints(CPUArchState *current_env)
+void kvm_remove_all_breakpoints(CPUState *current_cpu)
 {
-    CPUState *current_cpu = ENV_GET_CPU(current_env);
     struct kvm_sw_breakpoint *bp, *next;
     KVMState *s = current_cpu->kvm_state;
     CPUArchState *env;
@@ -2026,7 +2025,7 @@ int kvm_remove_breakpoint(CPUArchState *current_env, target_ulong addr,
     return -EINVAL;
 }
 
-void kvm_remove_all_breakpoints(CPUArchState *current_env)
+void kvm_remove_all_breakpoints(CPUState *current_cpu)
 {
 }
 #endif /* !KVM_CAP_SET_GUEST_DEBUG */
diff --git a/kvm-stub.c b/kvm-stub.c
index 5457fe8..f614f92 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -95,7 +95,7 @@ int kvm_remove_breakpoint(CPUArchState *current_env, target_ulong addr,
     return -EINVAL;
 }
 
-void kvm_remove_all_breakpoints(CPUArchState *current_env)
+void kvm_remove_all_breakpoints(CPUState *current_cpu)
 {
 }
-- 
1.8.1.4
[PATCH qom-cpu v2 11/29] kvm: Change kvm_cpu_exec() argument to CPUState
Signed-off-by: Andreas Färber afaer...@suse.de
---
 cpus.c               | 2 +-
 include/sysemu/kvm.h | 2 +-
 kvm-all.c            | 3 +--
 kvm-stub.c           | 4 ++--
 4 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/cpus.c b/cpus.c
index bbaf13c..47ab818 100644
--- a/cpus.c
+++ b/cpus.c
@@ -752,7 +752,7 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
 
     while (1) {
         if (cpu_can_run(cpu)) {
-            r = kvm_cpu_exec(env);
+            r = kvm_cpu_exec(cpu);
             if (r == EXCP_DEBUG) {
                 cpu_handle_guest_debug(env);
             }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 5adb044..fe8bc40 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -147,9 +147,9 @@ int kvm_has_gsi_routing(void);
 int kvm_has_intx_set_mask(void);
 
 int kvm_init_vcpu(CPUState *cpu);
+int kvm_cpu_exec(CPUState *cpu);
 
 #ifdef NEED_CPU_H
-int kvm_cpu_exec(CPUArchState *env);
 
 #if !defined(CONFIG_USER_ONLY)
 void *kvm_ram_alloc(ram_addr_t size);
diff --git a/kvm-all.c b/kvm-all.c
index 1675311..90b89cd 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1602,9 +1602,8 @@ void kvm_cpu_synchronize_post_init(CPUState *cpu)
     cpu->kvm_vcpu_dirty = false;
 }
 
-int kvm_cpu_exec(CPUArchState *env)
+int kvm_cpu_exec(CPUState *cpu)
 {
-    CPUState *cpu = ENV_GET_CPU(env);
     struct kvm_run *run = cpu->kvm_run;
     int ret, run_ret;
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 50af700..5457fe8 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -54,9 +54,9 @@ void kvm_cpu_synchronize_post_init(CPUState *cpu)
 {
 }
 
-int kvm_cpu_exec(CPUArchState *env)
+int kvm_cpu_exec(CPUState *cpu)
 {
-    abort ();
+    abort();
 }
 
 int kvm_has_sync_mmu(void)
-- 
1.8.1.4
[PATCH qom-cpu v2 09/29] cpu: Turn cpu_dump_{state,statistics}() into CPUState hooks
Make cpustats monitor command available unconditionally. Signed-off-by: Andreas Färber afaer...@suse.de --- bsd-user/main.c | 3 ++- cpus.c| 2 +- exec.c| 3 ++- include/exec/cpu-all.h| 10 -- include/qemu/log.h| 2 +- include/qom/cpu.h | 42 ++ kvm-all.c | 4 ++-- linux-user/main.c | 38 +++--- monitor.c | 13 ++--- qom/cpu.c | 22 +- stubs/cpus.c | 1 + target-alpha/cpu-qom.h| 2 ++ target-alpha/cpu.c| 1 + target-alpha/helper.c | 6 -- target-arm/arm-semi.c | 3 ++- target-arm/cpu-qom.h | 3 +++ target-arm/cpu.c | 1 + target-arm/translate.c| 6 -- target-cris/cpu-qom.h | 3 +++ target-cris/cpu.c | 1 + target-cris/helper.c | 4 +++- target-cris/translate.c | 6 -- target-i386/cpu-qom.h | 3 +++ target-i386/cpu.c | 1 + target-i386/helper.c | 7 --- target-lm32/cpu-qom.h | 2 ++ target-lm32/cpu.c | 1 + target-lm32/translate.c | 6 -- target-m68k/cpu-qom.h | 2 ++ target-m68k/cpu.c | 1 + target-m68k/translate.c | 6 -- target-microblaze/cpu-qom.h | 2 ++ target-microblaze/cpu.c | 1 + target-microblaze/helper.c| 4 +++- target-microblaze/translate.c | 6 -- target-mips/cpu-qom.h | 2 ++ target-mips/cpu.c | 1 + target-mips/translate.c | 6 -- target-moxie/cpu.c| 3 ++- target-moxie/cpu.h| 2 ++ target-moxie/helper.c | 4 +++- target-moxie/translate.c | 6 -- target-openrisc/cpu.c | 1 + target-openrisc/cpu.h | 2 ++ target-openrisc/translate.c | 12 +++- target-ppc/cpu-qom.h | 4 target-ppc/translate.c| 15 +-- target-ppc/translate_init.c | 2 ++ target-s390x/cpu-qom.h| 2 ++ target-s390x/cpu.c| 1 + target-s390x/translate.c | 6 -- target-sh4/cpu-qom.h | 2 ++ target-sh4/cpu.c | 1 + target-sh4/translate.c| 7 --- target-sparc/cpu-qom.h| 2 ++ target-sparc/cpu.c| 7 +-- target-unicore32/cpu-qom.h| 2 ++ target-unicore32/cpu.c| 1 + target-unicore32/translate.c | 6 -- target-xtensa/cpu-qom.h | 2 ++ target-xtensa/cpu.c | 1 + target-xtensa/op_helper.c | 4 +++- target-xtensa/translate.c | 6 -- 63 files changed, 242 insertions(+), 86 deletions(-) diff --git a/bsd-user/main.c b/bsd-user/main.c index 0da3ab9..b13803e 100644 
--- a/bsd-user/main.c +++ b/bsd-user/main.c @@ -511,6 +511,7 @@ static void flush_windows(CPUSPARCState *env) void cpu_loop(CPUSPARCState *env) { +CPUState *cs = CPU(sparc_env_get_cpu(env)); int trapnr, ret, syscall_nr; //target_siginfo_t info; @@ -659,7 +660,7 @@ void cpu_loop(CPUSPARCState *env) badtrap: #endif printf (Unhandled trap: 0x%x\n, trapnr); -cpu_dump_state(env, stderr, fprintf, 0); +cpu_dump_state(cs, stderr, fprintf, 0); exit (1); } process_pending_signals (env); diff --git a/cpus.c b/cpus.c index a5b0e46..bbaf13c 100644 --- a/cpus.c +++ b/cpus.c @@ -397,7 +397,7 @@ void hw_error(const char *fmt, ...) for (env = first_cpu; env != NULL; env = env-next_cpu) { cpu = ENV_GET_CPU(env); fprintf(stderr, CPU #%d:\n, cpu-cpu_index); -cpu_dump_state(env, stderr, fprintf, CPU_DUMP_FPU); +cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_FPU); } va_end(ap); abort(); diff --git a/exec.c b/exec.c index c0fa5a3..2b99bb9 100644 --- a/exec.c +++ b/exec.c @@ -517,6 +517,7 @@ void cpu_single_step(CPUArchState *env, int enabled) void cpu_abort(CPUArchState *env, const char *fmt, ...) { +CPUState *cpu = ENV_GET_CPU(env); va_list ap; va_list ap2; @@ -525,7 +526,7 @@ void cpu_abort(CPUArchState *env, const char *fmt, ...) fprintf(stderr, qemu: fatal: ); vfprintf(stderr, fmt, ap); fprintf(stderr, \n); -cpu_dump_state(env, stderr, fprintf, CPU_DUMP_FPU | CPU_DUMP_CCOP); +cpu_dump_state(cpu, stderr, fprintf, CPU_DUMP_FPU | CPU_DUMP_CCOP); if (qemu_log_enabled()) { qemu_log(qemu: fatal: ); qemu_log_vprintf(fmt, ap2); diff --git a/include/exec/cpu-all.h b/include/exec/cpu-all.h index e1cc62e..35bdf85 100644 --- a/include/exec/cpu-all.h +++ b/include/exec/cpu-all.h @@ -355,16 +355,6 @@ int page_check_range(target_ulong start, target_ulong len, int flags); CPUArchState *cpu_copy(CPUArchState *env); -#define CPU_DUMP_CODE 0x0001 -#define CPU_DUMP_FPU
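Editorial aside on the "CPUState hooks" idea in this patch: the general pattern is a per-class table of function pointers that a generic entry point dispatches through, so callers only need the architecture-independent CPU pointer. The sketch below illustrates that pattern under stated assumptions; all Demo and demo_ names are hypothetical, and treating a missing hook as a no-op is an assumption of this sketch rather than something the truncated diff above confirms.

```c
#include <assert.h>
#include <stdio.h>

typedef struct DemoCPU DemoCPU;

/* Per-class table of hooks; a NULL entry means "not implemented". */
typedef struct DemoCPUClass {
    void (*dump_state)(DemoCPU *cpu, FILE *f, int flags);
} DemoCPUClass;

struct DemoCPU {
    const DemoCPUClass *cc;  /* class this CPU belongs to */
    int cpu_index;
    unsigned long pc;
    int dump_calls;          /* counts hook invocations, demo only */
};

/* One architecture's implementation of the hook. */
static void demo_arch_dump_state(DemoCPU *cpu, FILE *f, int flags)
{
    fprintf(f, "CPU #%d: pc=0x%lx flags=0x%x\n",
            cpu->cpu_index, cpu->pc, (unsigned)flags);
    cpu->dump_calls++;
}

static const DemoCPUClass demo_arch_class = { demo_arch_dump_state };
static const DemoCPUClass demo_bare_class = { NULL };

/* Generic entry point: dispatches through the class, if a hook is set. */
void demo_cpu_dump_state(DemoCPU *cpu, FILE *f, int flags)
{
    assert(cpu != NULL && cpu->cc != NULL);
    if (cpu->cc->dump_state) {
        cpu->cc->dump_state(cpu, f, flags);
    }
}

/* Dumps one CPU of each class and returns how many hooks actually ran. */
int demo_dump_two_cpus(FILE *f)
{
    DemoCPU a = { &demo_arch_class, 0, 0x1000, 0 };
    DemoCPU b = { &demo_bare_class, 1, 0x2000, 0 };

    demo_cpu_dump_state(&a, f, 0);
    demo_cpu_dump_state(&b, f, 0);
    return a.dump_calls + b.dump_calls;
}
```

With this shape, a caller such as a monitor command can be compiled unconditionally, because it never references an architecture-specific function directly.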
[PATCH qom-cpu v2 07/29] kvm: Change kvm_set_signal_mask() argument to CPUState
CPUArchState is no longer needed.

Signed-off-by: Andreas Färber afaer...@suse.de
---
 cpus.c               | 3 ++-
 include/sysemu/kvm.h | 2 +-
 kvm-all.c            | 3 +--
 kvm-stub.c           | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/cpus.c b/cpus.c
index fe19962..fca5e1f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -570,6 +570,7 @@ static void dummy_signal(int sig)
 
 static void qemu_kvm_init_cpu_signals(CPUArchState *env)
 {
+    CPUState *cpu = ENV_GET_CPU(env);
     int r;
     sigset_t set;
     struct sigaction sigact;
@@ -581,7 +582,7 @@ static void qemu_kvm_init_cpu_signals(CPUArchState *env)
     pthread_sigmask(SIG_BLOCK, NULL, &set);
     sigdelset(&set, SIG_IPI);
     sigdelset(&set, SIGBUS);
-    r = kvm_set_signal_mask(env, &set);
+    r = kvm_set_signal_mask(cpu, &set);
     if (r) {
         fprintf(stderr, "kvm_set_signal_mask: %s\n", strerror(-r));
         exit(1);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 06da2b3..5adb044 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -166,7 +166,7 @@ int kvm_remove_breakpoint(CPUArchState *current_env, target_ulong addr,
 void kvm_remove_all_breakpoints(CPUArchState *current_env);
 int kvm_update_guest_debug(CPUArchState *env, unsigned long reinject_trap);
 #ifndef _WIN32
-int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset);
+int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset);
 #endif
 
 int kvm_on_sigbus_vcpu(CPUState *cpu, int code, void *addr);
diff --git a/kvm-all.c b/kvm-all.c
index e7202ff..88297b1 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2034,9 +2034,8 @@ void kvm_remove_all_breakpoints(CPUArchState *current_env)
 }
 #endif /* !KVM_CAP_SET_GUEST_DEBUG */
 
-int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset)
+int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset)
 {
-    CPUState *cpu = ENV_GET_CPU(env);
     struct kvm_signal_mask *sigmask;
     int r;
 
diff --git a/kvm-stub.c b/kvm-stub.c
index 128faf7..50af700 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -100,7 +100,7 @@ void kvm_remove_all_breakpoints(CPUArchState *current_env)
 }
 
 #ifndef _WIN32
-int kvm_set_signal_mask(CPUArchState *env, const sigset_t *sigset)
+int kvm_set_signal_mask(CPUState *cpu, const sigset_t *sigset)
 {
     abort();
 }
-- 
1.8.1.4
[PATCH qom-cpu v2 10/29] kvm: Change kvm_handle_internal_error() argument to CPUState
Signed-off-by: Andreas Färber afaer...@suse.de
---
 kvm-all.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 1cd4573..1675311 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1520,10 +1520,8 @@ static void kvm_handle_io(uint16_t port, void *data, int direction, int size,
     }
 }
 
-static int kvm_handle_internal_error(CPUArchState *env, struct kvm_run *run)
+static int kvm_handle_internal_error(CPUState *cpu, struct kvm_run *run)
 {
-    CPUState *cpu = ENV_GET_CPU(env);
-
     fprintf(stderr, "KVM internal error.");
     if (kvm_check_extension(kvm_state, KVM_CAP_INTERNAL_ERROR_DATA)) {
         int i;
@@ -1685,7 +1683,7 @@ int kvm_cpu_exec(CPUArchState *env)
             ret = -1;
             break;
         case KVM_EXIT_INTERNAL_ERROR:
-            ret = kvm_handle_internal_error(env, run);
+            ret = kvm_handle_internal_error(cpu, run);
             break;
         default:
             DPRINTF("kvm_arch_handle_exit\n");
-- 
1.8.1.4
[PATCH qom-cpu v2 01/29] kvm: Change kvm_cpu_synchronize_state() argument to CPUState
It no longer relies on CPUArchState since 20d695a.

Reviewed-by: liguang lig.f...@cn.fujitsu.com
Signed-off-by: Andreas Färber afaer...@suse.de
---
 hw/ppc/spapr_rtas.c  |  2 +-
 include/sysemu/kvm.h |  4 ++--
 kvm-all.c            |  4 +---
 kvm-stub.c           |  2 +-
 target-i386/kvm.c    | 10 +++++-----
 5 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/hw/ppc/spapr_rtas.c b/hw/ppc/spapr_rtas.c
index f4bd3c9..42ed7dc 100644
--- a/hw/ppc/spapr_rtas.c
+++ b/hw/ppc/spapr_rtas.c
@@ -184,7 +184,7 @@ static void rtas_start_cpu(sPAPREnvironment *spapr,
     /* This will make sure qemu state is up to date with kvm, and
      * mark it dirty so our changes get flushed back before the
      * new cpu enters */
-    kvm_cpu_synchronize_state(env);
+    kvm_cpu_synchronize_state(cs);
 
     env->msr = (1ULL << MSR_SF) | (1ULL << MSR_ME);
     env->nip = start;
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 8b19322..3e1db28 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -259,14 +259,14 @@ int kvm_check_extension(KVMState *s, unsigned int extension);
 
 uint32_t kvm_arch_get_supported_cpuid(KVMState *env, uint32_t function,
                                       uint32_t index, int reg);
-void kvm_cpu_synchronize_state(CPUArchState *env);
+void kvm_cpu_synchronize_state(CPUState *cpu);
 
 /* generic hooks - to be moved/refactored once there are more users */
 
 static inline void cpu_synchronize_state(CPUArchState *env)
 {
     if (kvm_enabled()) {
-        kvm_cpu_synchronize_state(env);
+        kvm_cpu_synchronize_state(ENV_GET_CPU(env));
     }
 }
 
diff --git a/kvm-all.c b/kvm-all.c
index 405480e..e7202ff 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -1585,10 +1585,8 @@ static void do_kvm_cpu_synchronize_state(void *arg)
     }
 }
 
-void kvm_cpu_synchronize_state(CPUArchState *env)
+void kvm_cpu_synchronize_state(CPUState *cpu)
 {
-    CPUState *cpu = ENV_GET_CPU(env);
-
     if (!cpu->kvm_vcpu_dirty) {
         run_on_cpu(cpu, do_kvm_cpu_synchronize_state, cpu);
     }
diff --git a/kvm-stub.c b/kvm-stub.c
index 22eaff0..128faf7 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -42,7 +42,7 @@ void kvm_flush_coalesced_mmio_buffer(void)
 {
 }
 
-void kvm_cpu_synchronize_state(CPUArchState *env)
+void kvm_cpu_synchronize_state(CPUState *cpu)
 {
 }
 
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 9ffb6ca..0b0adfd 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -1857,7 +1857,7 @@ int kvm_arch_process_async_events(CPUState *cs)
 
         cs->interrupt_request &= ~CPU_INTERRUPT_MCE;
 
-        kvm_cpu_synchronize_state(env);
+        kvm_cpu_synchronize_state(cs);
 
         if (env->exception_injected == EXCP08_DBLE) {
             /* this means triple fault */
@@ -1888,16 +1888,16 @@ int kvm_arch_process_async_events(CPUState *cs)
         cs->halted = 0;
     }
     if (cs->interrupt_request & CPU_INTERRUPT_INIT) {
-        kvm_cpu_synchronize_state(env);
+        kvm_cpu_synchronize_state(cs);
         do_cpu_init(cpu);
     }
     if (cs->interrupt_request & CPU_INTERRUPT_SIPI) {
-        kvm_cpu_synchronize_state(env);
+        kvm_cpu_synchronize_state(cs);
         do_cpu_sipi(cpu);
     }
     if (cs->interrupt_request & CPU_INTERRUPT_TPR) {
         cs->interrupt_request &= ~CPU_INTERRUPT_TPR;
-        kvm_cpu_synchronize_state(env);
+        kvm_cpu_synchronize_state(cs);
         apic_handle_tpr_access_report(env->apic_state, env->eip,
                                       env->tpr_access_type);
     }
@@ -2184,7 +2184,7 @@ bool kvm_arch_stop_on_emulation_error(CPUState *cs)
     X86CPU *cpu = X86_CPU(cs);
     CPUX86State *env = &cpu->env;
 
-    kvm_cpu_synchronize_state(env);
+    kvm_cpu_synchronize_state(cs);
     return !(env->cr[0] & CR0_PE_MASK) ||
            ((env->segs[R_CS].selector & 3) != 3);
 }
--
1.8.1.4

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
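The patch above moves the CPUArchState-to-CPUState conversion out of kvm_cpu_synchronize_state() and into the callers that still hold only env. A minimal standalone sketch of that pattern follows. The struct layouts, field names and helper names are hypothetical stand-ins rather than QEMU's definitions, and env_get_cpu() only imitates what ENV_GET_CPU() is assumed to do here (recover the enclosing object from a pointer to an embedded member).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMU's types; fields are illustrative only. */
typedef struct CPUState {
    bool kvm_vcpu_dirty;
} CPUState;

typedef struct CPUArchState {
    unsigned long nip;
} CPUArchState;

typedef struct ArchCPU {
    CPUState parent;     /* generic, architecture-independent part */
    CPUArchState env;    /* architecture-specific register state */
} ArchCPU;

/* Recover the generic CPU object from a pointer to the embedded env.
 * Assumption: this mirrors the intent of ENV_GET_CPU(). */
static CPUState *env_get_cpu(CPUArchState *env)
{
    assert(env != NULL);
    ArchCPU *cpu = (ArchCPU *)((char *)env - offsetof(ArchCPU, env));
    return &cpu->parent;
}

/* New-style signature: takes the generic state directly, so it no longer
 * needs to know anything about CPUArchState. */
static void sync_state(CPUState *cpu)
{
    if (!cpu->kvm_vcpu_dirty) {
        /* A real implementation would fetch registers from KVM here. */
        cpu->kvm_vcpu_dirty = true;
    }
}

/* Thin wrapper for callers that only hold env: the conversion now lives
 * at the call site instead of inside sync_state(). */
static void sync_state_env(CPUArchState *env)
{
    sync_state(env_get_cpu(env));
}
```

Callers that already have a CPUState pointer (cs in the hunks above) call sync_state() directly; only the generic cpu_synchronize_state()-style wrapper pays for the conversion.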
Re: [PATCH 1/4] KVM: PPC: Add support for multiple-TCE hcalls
On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:

> This adds real mode handlers for the H_PUT_TCE_INDIRECT and
> H_STUFF_TCE hypercalls for QEMU emulated devices such as IBMVIO
> devices or emulated PCI. These calls allow adding multiple entries
> (up to 512) into the TCE table in one call which saves time on
> transition to/from real mode.
>
> This adds a tce_tmp cache to kvm_vcpu_arch to save valid TCEs
> (copied from user and verified) before writing the whole list into
> the TCE table. This cache will be utilized more in the upcoming
> VFIO/IOMMU support to continue TCE list processing in the virtual
> mode in the case if the real mode handler failed for some reason.
>
> This adds a guest physical to host real address converter
> and calls the existing H_PUT_TCE handler. The converting function
> is going to be fully utilized by upcoming VFIO supporting patches.
>
> This also implements the KVM_CAP_PPC_MULTITCE capability,
> so in order to support the functionality of this patch, QEMU
> needs to query for this capability and set the hcall-multi-tce
> hypertas property only if the capability is present, otherwise
> there will be serious performance degradation.
>
> Cc: David Gibson da...@gibson.dropbear.id.au
> Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
> Signed-off-by: Paul Mackerras pau...@samba.org

Only a few minor nits. Ben already commented on implementation details.

> ---
> Changelog:
> 2013/06/05:
> * fixed mistype about IBMVIO in the commit message
> * updated doc and moved it to another section
> * changed capability number
>
> 2013/05/21:
> * added kvm_vcpu_arch::tce_tmp
> * removed cleanup if put_indirect failed, instead we do not even start
> writing to TCE table if we cannot get TCEs from the user and they are
> invalid
> * kvmppc_emulated_h_put_tce is split to kvmppc_emulated_put_tce
> and kvmppc_emulated_validate_tce (for the previous item)
> * fixed bug with failthrough for H_IPI
> * removed all get_user() from real mode handlers
> * kvmppc_lookup_pte() added (instead of making lookup_linux_pte public)
> ---
>  Documentation/virtual/kvm/api.txt       |  17 ++
>  arch/powerpc/include/asm/kvm_host.h     |   2 +
>  arch/powerpc/include/asm/kvm_ppc.h      |  16 +-
>  arch/powerpc/kvm/book3s_64_vio.c        | 118 ++
>  arch/powerpc/kvm/book3s_64_vio_hv.c     | 266 +++
>  arch/powerpc/kvm/book3s_hv.c            |  39 +
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S |   6 +
>  arch/powerpc/kvm/book3s_pr_papr.c       |  37 -
>  arch/powerpc/kvm/powerpc.c              |   3 +
>  include/uapi/linux/kvm.h                |   1 +
>  10 files changed, 473 insertions(+), 32 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 5f91eda..6c082ff 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2362,6 +2362,23 @@ calls by the guest for that service will be passed to userspace to be
>  handled.
>
>
> +4.83 KVM_CAP_PPC_MULTITCE
> +
> +Capability: KVM_CAP_PPC_MULTITCE
> +Architectures: ppc
> +Type: vm
> +
> +This capability tells the guest that multiple TCE entry add/remove hypercalls
> +handling is supported by the kernel. This significanly accelerates DMA
> +operations for PPC KVM guests.
> +
> +Unlike other capabilities in this section, this one does not have an ioctl.
> +Instead, when the capability is present, the H_PUT_TCE_INDIRECT and
> +H_STUFF_TCE hypercalls are to be handled in the host kernel and not passed to
> +the guest. Othwerwise it might be better for the guest to continue using H_PUT_TCE
> +hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).

While this describes perfectly well what the consequences are of the
patches, it does not describe properly what the CAP actually expresses.
The CAP only says this kernel is able to handle H_PUT_TCE_INDIRECT and
H_STUFF_TCE hypercalls directly. All other consequences are nice to
document, but the semantics of the CAP are missing.

We also usually try to keep KVM behavior unchanged with regards to older
versions until a CAP is enabled. In this case I don't think it matters
all that much, so I'm fine with declaring it as enabled by default.
Please document that this is a change in behavior versus older KVM
versions though.

> +
> +
>  5. The kvm_run structure
>
> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> index af326cd..85d8f26 100644
> --- a/arch/powerpc/include/asm/kvm_host.h
> +++ b/arch/powerpc/include/asm/kvm_host.h
> @@ -609,6 +609,8 @@ struct kvm_vcpu_arch {
>          spinlock_t tbacct_lock;
>          u64 busy_stolen;
>          u64 busy_preempt;
> +
> +        unsigned long *tce_tmp;    /* TCE cache for TCE_PUT_INDIRECT hall */
>  #endif
>  };

[...]

[...]

> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> index 550f592..a39039a 100644
> --- a/arch/powerpc/kvm/book3s_hv.c
> +++ b/arch/powerpc/kvm/book3s_hv.c
> @@ -568,6 +568,30 @@ int
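The changelog's point about tce_tmp (do not start writing to the TCE table unless the whole list has been fetched and verified) is a validate-then-commit pattern. A minimal standalone sketch follows, with loud assumptions: put_tce_list(), tce_is_valid() and the reserved-bit mask are hypothetical and chosen for illustration, and the real handlers also translate guest addresses, which is omitted here. Only the 512-entry bound comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCE_BATCH_MAX     512u     /* upper bound quoted in the commit message */
#define TCE_RESERVED_BITS 0xffcull /* assumption: bits 2..11 must be zero */

/* Hypothetical validity rule, for illustration only. */
static int tce_is_valid(uint64_t tce)
{
    return (tce & TCE_RESERVED_BITS) == 0;
}

/* Write n entries from list into table starting at index first.
 * Pass 1 validates every entry into scratch; the table is only touched
 * in pass 2, so a bad entry never leaves a half-updated table behind.
 * scratch must have room for at least n entries.
 * Returns 0 on success, -1 on any error. */
static int put_tce_list(uint64_t *table, size_t table_len, size_t first,
                        const uint64_t *list, size_t n, uint64_t *scratch)
{
    size_t i;

    assert(table && list && scratch);
    if (n > TCE_BATCH_MAX || first > table_len || n > table_len - first)
        return -1;

    /* Pass 1: validate and stage; no side effects on the table. */
    for (i = 0; i < n; i++) {
        if (!tce_is_valid(list[i]))
            return -1;
        scratch[i] = list[i];
    }

    /* Pass 2: commit the staged entries. */
    for (i = 0; i < n; i++)
        table[first + i] = scratch[i];

    return 0;
}
```

Because the failure path has nothing to undo, no cleanup code is needed, which matches the "removed cleanup if put_indirect failed" item in the changelog.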
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On 05.06.2013, at 08:11, Alexey Kardashevskiy wrote:

> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
> and H_STUFF_TCE requests without passing them to QEMU, which should
> save time on switching to QEMU and back.
>
> Both real and virtual modes are supported - whenever the kernel
> fails to handle TCE request, it passes it to the virtual mode.
> If it the virtual mode handlers fail, then the request is passed
> to the user mode, for example, to QEMU.
>
> This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
> in-kernel handling of IOMMU map/unmap.
>
> Tests show that this patch increases transmission speed from 220MB/s
> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
>
> Cc: David Gibson da...@gibson.dropbear.id.au
> Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru
> Signed-off-by: Paul Mackerras pau...@samba.org
>
> ---
>
> Changes:
> 2013/06/05:
> * changed capability number
> * changed ioctl number
> * update the doc article number
>
> 2013/05/20:
> * removed get_user() from real mode handlers
> * kvm_vcpu_arch::tce_tmp usage extended. Now real mode handler puts there
> translated TCEs, tries realmode_get_page() on those and if it fails, it
> passes control over the virtual mode handler which tries to finish
> the request handling
> * kvmppc_lookup_pte() now does realmode_get_page() protected by BUSY bit
> on a page
> * The only reason to pass the request to user mode now is when the user mode
> did not register TCE table in the kernel, in all other cases the virtual mode
> handler is expected to do the job
> ---
>  Documentation/virtual/kvm/api.txt   |  28 +
>  arch/powerpc/include/asm/kvm_host.h |   3 +
>  arch/powerpc/include/asm/kvm_ppc.h  |   2 +
>  arch/powerpc/include/uapi/asm/kvm.h |   7 ++
>  arch/powerpc/kvm/book3s_64_vio.c    | 198 ++-
>  arch/powerpc/kvm/book3s_64_vio_hv.c | 193 +-
>  arch/powerpc/kvm/powerpc.c          |  12 +++
>  include/uapi/linux/kvm.h            |   2 +
>  8 files changed, 439 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> index 6c082ff..e962e3b 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2379,6 +2379,34 @@ the guest. Othwerwise it might be better for the guest to continue using H_PUT_T
>  hypercall (if KVM_CAP_SPAPR_TCE or KVM_CAP_SPAPR_TCE_IOMMU are present).
>
>
> +4.84 KVM_CREATE_SPAPR_TCE_IOMMU
> +
> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
> +Architectures: powerpc
> +Type: vm ioctl
> +Parameters: struct kvm_create_spapr_tce_iommu (in)
> +Returns: 0 on success, -1 on error
> +
> +This creates a link between IOMMU group and a hardware TCE (translation
> +control entry) table. This link lets the host kernel know what IOMMU
> +group (i.e. TCE table) to use for the LIOBN number passed with
> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
> +
> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
> +struct kvm_create_spapr_tce_iommu {
> +        __u64 liobn;
> +        __u32 iommu_id;
> +        __u32 flags;
> +};
> +
> +No flag is supported at the moment.
> +
> +When the guest issues TCE call on a liobn for which a TCE table has been
> +registered, the kernel will handle it in real mode, updating the hardware
> +TCE table. TCE table calls for other liobns will cause a vm exit and must
> +be handled by userspace.

Ok, please walk me through the security model you have in mind here.
Basically what this ioctl does is that it creates a guest TCE table that
reflects its changes into a host TCE table whenever it gets modified.

So far so good. Now I don't see any checks that verify whether iommu_id
is actually good to use from that user's access rights. Just because I
have access to /dev/kvm I don't necessarily have access to an iommu
control device.

So the least I can see would be a local DoS attack where one user space
program with only access to /dev/kvm can simply kill any access to
another process's device by overflowing a host iommu TCE table with junk
entries. There's even a certain chance of an information disclosure
exploit here where a malicious user space program could get itself all
network traffic DMA'd from another VM.

How does this work on the host level? What is the security token to take
control of a host TCE table?


Alex
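The three-level fallback described in the quoted commit message (real mode first, then virtual mode, then user mode such as QEMU) can be pictured as a small dispatch chain. The sketch below is only an illustration of that ordering: every type, enum and function name in it is hypothetical, and the two example handlers exist purely so the dispatch can be exercised.

```c
#include <assert.h>

/* Outcome of one handler level (hypothetical names). */
enum tce_result { TCE_HANDLED, TCE_TOO_HARD };

/* Which level ended up servicing the request. */
enum tce_level { LEVEL_REAL_MODE, LEVEL_VIRTUAL_MODE, LEVEL_USER_MODE };

struct tce_request {
    unsigned long liobn;  /* logical bus number the guest passed */
    unsigned long ioba;   /* I/O bus address */
    unsigned long tce;    /* entry to install */
};

typedef enum tce_result (*tce_handler_fn)(const struct tce_request *req);

/* Try the cheapest path first and fall back one level at a time.
 * LEVEL_USER_MODE means "exit to userspace and let it handle the call". */
static enum tce_level dispatch_tce(const struct tce_request *req,
                                   tce_handler_fn real_mode,
                                   tce_handler_fn virt_mode)
{
    assert(req && real_mode && virt_mode);
    if (real_mode(req) == TCE_HANDLED)
        return LEVEL_REAL_MODE;
    if (virt_mode(req) == TCE_HANDLED)
        return LEVEL_VIRTUAL_MODE;
    return LEVEL_USER_MODE;
}

/* Example handlers used only to exercise the dispatch. */
static enum tce_result handler_ok(const struct tce_request *req)
{
    (void)req;
    return TCE_HANDLED;
}

static enum tce_result handler_fail(const struct tce_request *req)
{
    (void)req;
    return TCE_TOO_HARD;
}
```

Note that this ordering says nothing about authorization, which is the gap the reply above is asking about.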
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Mon, 2013-06-17 at 08:39 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
> > +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
> > +                struct kvm_create_spapr_tce_iommu *args)
> > +{
> > +        struct kvmppc_spapr_tce_table *tt = NULL;
> > +        struct iommu_group *grp;
> > +        struct iommu_table *tbl;
> > +
> > +        /* Find an IOMMU table for the given ID */
> > +        grp = iommu_group_get_by_id(args->iommu_id);
> > +        if (!grp)
> > +                return -ENXIO;
> > +
> > +        tbl = iommu_group_get_iommudata(grp);
> > +        if (!tbl)
> > +                return -ENXIO;
>
> So Alex Graf pointed out here, there is a security issue here, or are
> we missing something ?
>
> What prevents a malicious program that has access to /dev/kvm from
> taking over random iommu groups (including host used ones) that way?
>
> What is the security model of that whole iommu stuff to begin with ?

IOMMU groups themselves don't provide security, they're accessed by
interfaces like VFIO, which provide the security. Given a brief look, I
agree, this looks like a possible backdoor. The typical VFIO way to
handle this would be to pass a VFIO file descriptor here to prove that
the process has access to the IOMMU group. This is how /dev/vfio/vfio
gains the ability to set up an IOMMU domain and do mappings with the
SET_CONTAINER ioctl using a group fd. Thanks,

Alex
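For readers unfamiliar with the VFIO flow mentioned above, the userspace side looks roughly like the sketch below: the process can only obtain a group fd if file permissions on the group node allow it, and that fd is what gets attached to a container. The ioctl names and struct come from linux/vfio.h; the function name vfio_attach_group() and its error-handling shape are assumptions made for illustration. How KVM would consume such an fd is the open question in this thread and is not shown.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/vfio.h>

/* Open a VFIO container and attach the group at group_path to it.
 * On success returns the group fd and stores the container fd in
 * *container_out; on failure returns -1 and leaves *container_out alone.
 * Hypothetical helper: the name and return convention are illustrative. */
int vfio_attach_group(const char *group_path, int *container_out)
{
    struct vfio_group_status status = { .argsz = sizeof(status) };
    int container, group;

    assert(group_path && container_out);

    container = open("/dev/vfio/vfio", O_RDWR);
    if (container < 0)
        return -1;

    if (ioctl(container, VFIO_GET_API_VERSION) != VFIO_API_VERSION)
        goto fail_container;

    /* Opening the group node is the access check: it only succeeds if
     * the caller has permission on that specific group. */
    group = open(group_path, O_RDWR);
    if (group < 0)
        goto fail_container;

    /* The group must be viable (all its devices bound to VFIO). */
    if (ioctl(group, VFIO_GROUP_GET_STATUS, &status) < 0 ||
        !(status.flags & VFIO_GROUP_FLAGS_VIABLE))
        goto fail_group;

    if (ioctl(group, VFIO_GROUP_SET_CONTAINER, &container) < 0)
        goto fail_group;

    *container_out = container;
    return group;

fail_group:
    close(group);
fail_container:
    close(container);
    return -1;
}
```

A caller would pass a path such as /dev/vfio/N, where N is the IOMMU group number; a bare integer ID like the iommu_id field in the patch carries no such proof of access.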
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Sun, 2013-06-16 at 21:13 -0600, Alex Williamson wrote:
> IOMMU groups themselves don't provide security, they're accessed by
> interfaces like VFIO, which provide the security. Given a brief look, I
> agree, this looks like a possible backdoor. The typical VFIO way to
> handle this would be to pass a VFIO file descriptor here to prove that
> the process has access to the IOMMU group. This is how /dev/vfio/vfio
> gains the ability to set up an IOMMU domain and do mappings with the
> SET_CONTAINER ioctl using a group fd. Thanks,

How do you envision that in the kernel ? IE. I'm in KVM code, gets that
vfio fd, what do I do with it ?

Basically, KVM needs to know that the user is allowed to use that iommu
group. I don't think we want KVM however to call into VFIO directly
right ?

Cheers,
Ben.
Re: [PATCH 3/4] KVM: PPC: Add support for IOMMU in-kernel handling
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
> +long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
> +                struct kvm_create_spapr_tce_iommu *args)
> +{
> +        struct kvmppc_spapr_tce_table *tt = NULL;
> +        struct iommu_group *grp;
> +        struct iommu_table *tbl;
> +
> +        /* Find an IOMMU table for the given ID */
> +        grp = iommu_group_get_by_id(args->iommu_id);
> +        if (!grp)
> +                return -ENXIO;
> +
> +        tbl = iommu_group_get_iommudata(grp);
> +        if (!tbl)
> +                return -ENXIO;

So Alex Graf pointed out here, there is a security issue here, or are
we missing something ?

What prevents a malicious program that has access to /dev/kvm from
taking over random iommu groups (including host used ones) that way?

What is the security model of that whole iommu stuff to begin with ?

Cheers,
Ben.