Re: [BUG] Guest kernel divide error in kvm_unlock_kick
Paolo Bonzini pbonz...@redhat.com wrote:
> On 11/09/2014 19:03, Chris Webb wrote:
> > Paolo Bonzini pbonz...@redhat.com wrote:
> > > This is a hypercall that should have kicked VCPU 3 (see rcx). Can you please apply this patch and gather a trace of the host (using trace-cmd -e kvm qemu-kvm arguments)?
> >
> > Sure, no problem. I've built the trace-cmd tool against udis86 (I hope) and have put the resulting trace.dat at http://cdw.me.uk/tmp/trace.dat
> >
> > This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was having trouble persuading the -smp 4 qemu to crash as reliably under tracing. (Something timing related?) Otherwise the qemu-system-x86 command line is exactly as before.
>
> Do you by chance have CONFIG_DEBUG_RODATA set? In that case, the fix is simply not to set it.

Absolutely right: my host and guest kernels do have CONFIG_DEBUG_RODATA set! Your patch to use alternatives for VMCALL vs VMMCALL definitely fixed the divide-by-zero crashes I saw.

Given that I can easily use either (or both) of these solutions, is it more efficient to turn off CONFIG_DEBUG_RODATA in the guest kernel so kvm can fix up the instructions in-place, or is using alternatives for VMCALL/VMMCALL as implemented by your patch just as good?

Best wishes,

Chris.
--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
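As a rough aid for reading the oops dumps in this thread, the sketch below scans a "Code:" line for the two hypercall encodings. The byte patterns are an assumption taken from the vendor instruction references rather than from the thread itself: 0f 01 c1 for Intel's VMCALL and 0f 01 d9 for AMD's VMMCALL. The helper name find_hypercalls is made up for illustration.

```python
# Assumed encodings (from the Intel and AMD manuals, not from this thread).
VMCALL = bytes.fromhex("0f01c1")   # Intel VT-x hypercall
VMMCALL = bytes.fromhex("0f01d9")  # AMD SVM hypercall


def find_hypercalls(code_line):
    """Return sorted (offset, name) pairs for hypercall opcodes in an oops Code: dump.

    code_line is the space-separated hex text after "Code:". Angle brackets
    that some kernels put around the faulting byte are dropped before parsing.
    """
    text = code_line.replace("<", "").replace(">", "")
    code = bytes.fromhex("".join(text.split()))
    hits = []
    for name, pattern in (("VMCALL", VMCALL), ("VMMCALL", VMMCALL)):
        start = code.find(pattern)
        while start != -1:
            hits.append((start, name))
            start = code.find(pattern, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    # Short excerpt of the Code: bytes quoted in the crash reports.
    print(find_hypercalls("b8 05 00 00 00 0f 01 c1 0f 1f 00"))  # [(5, 'VMCALL')]
```

Run against the dumps posted here, this only finds the Intel form, which fits the picture of an AMD host having to fix up the instruction at run time, and of that fix-up being blocked when the guest's kernel text is read-only.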
Re: [BUG] Guest kernel divide error in kvm_unlock_kick
Paolo Bonzini pbonz...@redhat.com wrote: This is a hypercall that should have kicked VCPU 3 (see rcx). Can you please apply this patch and gather a trace of the host (using trace-cmd -e kvm qemu-kvm arguments)? Sure, no problem. I've built the trace-cmd tool against udis86 (I hope) and have put the resulting trace.dat at http://cdw.me.uk/tmp/trace.dat This is actually for a -smp 2 qemu (failing to kick VCPU 1?) as I was having trouble persuading the -smp 4 qemu to crash as reliably under tracing. (Something timing related?) Otherwise the qemu-system-x86 command line is exactly as before. The guest kernel crash message which corresponds to this trace was: divide error: [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 618 Comm: mkdir Not tainted 3.16.2-guest #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 task: 88007c997080 ti: 88007c614000 task.ti: 88007c614000 RIP: 0010:[81037fe2] [81037fe2] kvm_unlock_kick+0x72/0x80 RSP: 0018:88007c617d40 EFLAGS: 00010046 RAX: 0005 RBX: RCX: 0001 RDX: 0001 RSI: 88007fd11c40 RDI: RBP: 88007fd11c40 R08: 81b98940 R09: 0001 R10: R11: 0007 R12: 00f6 R13: 0001 R14: 0001 R15: 00011c40 FS: 7f43eb1ed700() GS:88007fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 7f43eace0a30 CR3: 01a12000 CR4: 000406f0 Stack: 88007c994380 88007c9949aa 0046 81689715 810f3174 0001 ea0001f16320 ea0001f17860 88007c99e1e8 88007c997080 0001 Call Trace: [81689715] ? _raw_spin_unlock+0x45/0x70 [810f3174] ? try_to_wake_up+0x2a4/0x330 [81101e2c] ? __wake_up_common+0x4c/0x80 [81102418] ? __wake_up_sync_key+0x38/0x60 [810d873a] ? do_notify_parent+0x19a/0x280 [810f4d56] ? sched_move_task+0xb6/0x190 [810cb4fc] ? do_exit+0xa1c/0xab0 [810cc344] ? do_group_exit+0x34/0xb0 [810cc3cb] ? SyS_exit_group+0xb/0x10 [8168a16d] ? 
system_call_fastpath+0x1a/0x1f Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00 RIP [81037fe2] kvm_unlock_kick+0x72/0x80 RSP 88007c617d40 ---[ end trace bf5a4445f9decdbb ]--- Fixing recursive fault but reboot is needed! BUG: scheduling while atomic: mkdir/618/0x0006 Modules linked in: CPU: 0 PID: 618 Comm: mkdir Tainted: G D 3.16.2-guest #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 c022d302 81684029 810ee956 81686266 00011c40 88007c617fd8 00011c40 88007c997080 0006 0046 Call Trace: [81684029] ? dump_stack+0x49/0x6a [810ee956] ? __schedule_bug+0x46/0x60 [81686266] ? __schedule+0x5a6/0x7c0 [816828cd] ? printk+0x59/0x75 [810cb33b] ? do_exit+0x85b/0xab0 [816828cd] ? printk+0x59/0x75 [8100614a] ? oops_end+0x7a/0x100 [810033e5] ? do_error_trap+0x85/0x110 [81037fe2] ? kvm_unlock_kick+0x72/0x80 [8114a358] ? __alloc_pages_nodemask+0x108/0xa60 [8168b57e] ? divide_error+0x1e/0x30 [81037fe2] ? kvm_unlock_kick+0x72/0x80 [81689715] ? _raw_spin_unlock+0x45/0x70 [810f3174] ? try_to_wake_up+0x2a4/0x330 [81101e2c] ? __wake_up_common+0x4c/0x80 [81102418] ? __wake_up_sync_key+0x38/0x60 [810d873a] ? do_notify_parent+0x19a/0x280 [810f4d56] ? sched_move_task+0xb6/0x190 [810cb4fc] ? do_exit+0xa1c/0xab0 [810cc344] ? do_group_exit+0x34/0xb0 [810cc3cb] ? SyS_exit_group+0xb/0x10 [8168a16d] ? system_call_fastpath+0x1a/0x1f Best wishes, Chris.-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[BUG] Guest kernel divide error in kvm_unlock_kick
I've reported this bug before, which reliably crashes a guest kernel shortly after boot, but have just reconfirmed that it is still present with Linux 3.16.2 guest and host kernels and Qemu 2.1. Running a 3.16.2 x86-64 SMP guest kernel on qemu-2.1, with kvm enabled and -cpu host on a 3.16.2 AMD Opteron host, I'm seeing a reliable kernel panic from the guest shortly after boot. I think is happening in kvm_unlock_kick() in the paravirt_ops code: divide error: [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 743 Comm: syslogd Not tainted 3.16.2-guest #2 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014 task: 88007c972580 ti: 88007cb7c000 task.ti: 88007cb7c000 RIP: 0010:[81037fe2] [81037fe2] kvm_unlock_kick+0x72/0x80 RSP: :88007fc03ec8 EFLAGS: 00010046 RAX: 0005 RBX: RCX: 0003 RDX: 0003 RSI: 81a466a0 RDI: RBP: 81a466a0 R08: 81b98940 R09: 0246 R10: 0400 R11: R12: 00ea R13: 0009 R14: 0002 R15: 88007fc0d300 FS: 7f2a6473e700() GS:88007fc0() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 004a8240 CR3: 7ac75000 CR4: 000406f0 Stack: 81a46400 0246 0001 8168979d 0282 81110d97 0007 88007cb7ffd8 88007c972580 4b0782e8 0002 81a0b0c8 Call Trace: IRQ [8168979d] ? _raw_spin_unlock_irqrestore+0x5d/0x80 [81110d97] ? rcu_process_callbacks+0x337/0x4f0 [810cde2d] ? __do_softirq+0xfd/0x210 [810ce06e] ? irq_exit+0x7e/0xa0 [8103063b] ? smp_apic_timer_interrupt+0x3b/0x50 [8168b04d] ? apic_timer_interrupt+0x6d/0x80 EOI [8114180b] ? filemap_map_pages+0x17b/0x240 [811418c0] ? filemap_map_pages+0x230/0x240 [811679e2] ? do_read_fault.isra.70+0x2a2/0x320 [811696cc] ? handle_mm_fault+0x37c/0xd00 [8103bb45] ? __do_page_fault+0x185/0x4c0 [8168b958] ? async_page_fault+0x28/0x30 [813b9610] ? __put_user_4+0x20/0x30 [8168b958] ? 
async_page_fault+0x28/0x30 Code: c0 ca a7 81 48 8d 04 0b 48 8b 30 48 39 ee 75 c9 0f b6 40 08 44 38 e0 75 c0 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 0f 1f 00 5b 5d 41 5c c3 0f 1f 00 48 c7 c0 10 cf 00 00 RIP [81037fe2] kvm_unlock_kick+0x72/0x80 RSP 88007fc03ec8 ---[ end trace be08885ac2c94c6a ]--- Kernel panic - not syncing: Fatal exception in interrupt My host kernel config is http://cdw.me.uk/tmp/host-config.txt and the guest config is http://cdw.me.uk/tmp/guest-config.txt with qemu command line: qemu-system-x86 -enable-kvm -cpu host -machine q35 -m 2048 -name $1 \ -smp sockets=1,cores=4 -pidfile /run/$1.pid -runas nobody \ -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \ -append console=ttyS0 root=/dev/vda \ -drive file=/dev/guest/$1,cache=none,format=raw,if=virtio \ -device virtio-rng-pci \ -device virtio-net-pci,netdev=nic,mac=$( /sys/class/net/$1/address) \ -netdev tap,id=nic,fd=3 3/dev/tap$( /sys/class/net/$1/ifindex) I can stop this crash by disabling CONFIG_PARAVIRT_SPINLOCKS in my guest kernel, running with -cpu qemu64 instead of -cpu host, or running with -smp 1 instead of -smp 4. (Removing/changing the -machine q35 makes no difference.) 
/proc/cpuinfo on the host has 8 of these: processor : 0 vendor_id : AuthenticAMD cpu family : 21 model : 2 model name : AMD Opteron(tm) Processor 6328 stepping: 0 microcode : 0x600081c cpu MHz : 3200.000 cache size : 2048 KB physical id : 0 siblings: 8 core id : 0 cpu cores : 4 apicid : 32 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1 bogomips: 6399.70 TLB size: 1536 4K pages clflush size: 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro and on the guest, has 4 of these: processor : 0 vendor_id : AuthenticAMD cpu family :
Re: Divide error in kvm_unlock_kick()
I see kernel 3.15 is now out, so I retested with 3.15 guest and host. I'm still getting exactly the same guest kernel panic: a divide error in kvm_unlock_kick with -cpu host, but not with -cpu qemu64: divide error: [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 781 Comm: mkdir Not tainted 3.15.0-guest #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011 task: 88007cbf6180 ti: 88088000 task.ti: 88088000 RIP: 0010:[8102d1e0] [8102d1e0] kvm_unlock_kick+0x63/0x6b RSP: :88007fc83d38 EFLAGS: 00010046 RAX: 0005 RBX: RCX: 0002 RDX: 0002 RSI: 88007fd11d80 RDI: 81994840 RBP: 88007fd11d80 R08: R09: 81994840 R10: 88007c480c88 R11: 0005 R12: cec0 R13: 88007d38332a R14: 0002 R15: 88007d382d00 FS: 7fdabf7fd700() GS:88007fc8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fd0643f6509 CR3: 7c028000 CR4: 000406e0 Stack: 00011d80 0002 88007fd11d80 8156f83f 810dba53 0046 88007fd0 88007d3bbe70 81845da8 0003 Call Trace: IRQ [8156f83f] ? _raw_spin_unlock+0x32/0x55 [810dba53] ? try_to_wake_up+0x1ed/0x20f [810e78b8] ? autoremove_wake_function+0x9/0x2a [810e739a] ? __wake_up_common+0x47/0x73 [810e7547] ? __wake_up+0x33/0x44 [8110f10b] ? irq_work_run+0x72/0x8f [81006079] ? smp_irq_work_interrupt+0x26/0x2b [8157185d] ? irq_work_interrupt+0x6d/0x80 [810dba64] ? try_to_wake_up+0x1fe/0x20f [8102ad01] ? native_apic_msr_read+0x6/0x4e [8156f89f] ? _raw_spin_unlock_irqrestore+0x3d/0x65 [810f2de3] ? rcu_process_callbacks+0x15e/0x47d [810cccf3] ? execute_in_process_context+0x55/0x55 [810bdb98] ? __do_softirq+0xe0/0x1e6 [810bde23] ? irq_exit+0x3c/0x81 [810270e4] ? smp_apic_timer_interrupt+0x3b/0x46 [8157135d] ? 
apic_timer_interrupt+0x6d/0x80 EOI
Code: 0c c5 c0 b8 87 81 49 8d 04 0c 48 8b 30 48 39 ee 75 ca 8a 40 08 38 d8 75 c3 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 5b 5d 41 5c c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a 00 00
RIP [8102d1e0] kvm_unlock_kick+0x63/0x6b
RSP 88007fc83d38
---[ end trace 949b1bf47cc57d09 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)
---[ end Kernel panic - not syncing: Fatal exception in interrupt

I'm at a complete loss as to what to do next to debug this. Any help would be extremely gratefully received!

I've put 3.15 host and guest configs here:

  http://cdw.me.uk/tmp/3.15-guest-config.txt
  http://cdw.me.uk/tmp/3.15-host-config.txt

dmesg just after boot here:

  http://cdw.me.uk/tmp/3.15-guest-dmesg.txt
  http://cdw.me.uk/tmp/3.15-host-dmesg.txt

and /proc/cpuinfo from both host and guest here:

  http://cdw.me.uk/tmp/3.15-guest-cpuinfo.txt
  http://cdw.me.uk/tmp/3.15-host-cpuinfo.txt

The qemu command line was

  qemu-system-x86 -enable-kvm -cpu host -machine q35 -m 2048 -name omega \
    -smp sockets=1,cores=4 -pidfile /run/omega.pid -runas nobody \
    -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \
    -append "console=ttyS0 root=/dev/vda" \
    -drive file=/dev/guest/omega,cache=none,format=raw,if=virtio \
    -device virtio-rng-pci \
    -device virtio-net-pci,netdev=nic,mac=02:14:72:3c:69:54 \
    -netdev tap,id=nic,fd=3,vhost=on 3<>/dev/tapNNN

but removing the -machine q35 and -device virtio-rng-pci doesn't affect the crash. Dropping to -smp 1, running with -cpu qemu64, or compiling the guest kernel without paravirtualised spinlock support does remove the panic, albeit at the cost of performance.

Best wishes,

Chris.
Re: Divide error in kvm_unlock_kick()
I realised my original bug report was for a guest kernel compiled without frame pointers which might be unhelpful, so I enabled CONFIG_DEBUG_INFO and CONFIG_FRAME_POINTER, but I don't think this has made the backtrace any more detailed. Is there anything more I can do to pinpoint what might be going on here? Cheers, Chris. divide error: [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 1013 Comm: mkdir Not tainted 3.14.4-guest #21 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011 task: 88007c8cf400 ti: 88007c7c6000 task.ti: 88007c7c6000 RIP: 0010:[8102ea86] [8102ea86] kvm_unlock_kick+0x69/0x73 RSP: :88007fc83ca8 EFLAGS: 00010046 RAX: 0005 RBX: RCX: 0002 RDX: 0002 RSI: 88007fd11d40 RDI: 8198f840 RBP: 88007fc83cc0 R08: R09: 8198f840 R10: b5e0 R11: 0005 R12: 88007fd11d40 R13: cec0 R14: 88007d382b80 R15: 0002 FS: 7f4c6e265700() GS:88007fc8() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f4c6dc9a080 CR3: 7c62e000 CR4: 000406e0 Stack: 00011d40 88007fd11d40 0002 88007fc83cd0 815852d0 88007fc83d20 810dd694 88007fd0 0046 88007d383172 88007d3abe68 0003 Call Trace: IRQ [815852d0] _raw_spin_unlock+0x36/0x5b [810dd694] try_to_wake_up+0x1f4/0x217 [810dd6f6] default_wake_function+0xd/0xf [810e99f0] autoremove_wake_function+0xd/0x2f [810e944f] __wake_up_common+0x50/0x7c [810e962f] __wake_up+0x34/0x46 [810f3b45] rsp_wakeup+0x1c/0x1e [81112e31] irq_work_run+0x77/0x9b [810063e2] smp_irq_work_interrupt+0x2a/0x31 [8158739d] irq_work_interrupt+0x6d/0x80 [81585336] ? _raw_spin_unlock_irqrestore+0x41/0x6a [810f5402] rcu_process_callbacks+0x162/0x486 [810c4140] ? 
run_timer_softirq+0x19f/0x1c0 [810be612] __do_softirq+0xe1/0x1e9 [810be8b7] irq_exit+0x40/0x87 [810283f1] smp_apic_timer_interrupt+0x3f/0x4b [81586e9d] apic_timer_interrupt+0x6d/0x80 EOI Code: c5 40 50 87 81 49 8d 44 0d 00 48 8b 30 4c 39 e6 75 c9 8a 40 08 38 d8 75 c2 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 5b 41 5c 41 5d 5d c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a RIP [8102ea86] kvm_unlock_kick+0x69/0x73 RSP 88007fc83ca8 ---[ end trace ed563ea2dedc59b5 ]--- Kernel panic - not syncing: Fatal exception in interrupt Shutting down cpus with NMI Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Divide error in kvm_unlock_kick()
Chris Webb ch...@arachsys.com wrote:
> My CPU flags inside the crashing guest look like this:
>
> fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1
>
> whereas in a (working) -cpu qemu64 guest, they look like this:
>
> fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor lahf_lm cmp_legacy svm abm sse4a

I thought I'd try to bisect on processor flags to see which was/were implicated. The extra flags from -cpu host compared to -cpu qemu64 are:

  3dnowprefetch aes arat avx bmi1 cr8_legacy extd_apicid f16c fma fma4 fxsr_opt misalignsse mmxext npt nrip_save osvw pclmulqdq pdpe1gb rep_good sse4_1 sse4_2 ssse3 tbm tsc_adjust vme xop xsave

I can add all of these to -cpu qemu64 with the +FLAG,... syntax and obtain a working guest, but qemu doesn't recognise a handful of them:

  CPU feature tsc_adjust not found
  CPU feature arat not found
  CPU feature cr8_legacy not found
  CPU feature extd_apicid not found
  CPU feature rep_good not found
  CPU feature tsc_adjust not found
  Failed to access perfctr msr (MSR c0010001 is )
  [...]

Doing this results in a working, non-crashing guest, which suggests the behaviour is triggered by one of tsc_adjust, arat, cr8_legacy, extd_apicid or rep_good. However, because qemu doesn't recognise the flags, I can't run with -cpu host,-tsc_adjust,-arat,... to investigate further. :(

Cheers,

Chris.
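The flag comparison described above can be reproduced mechanically. This is only a sketch: the helper names extra_flags and cpu_argument are invented for illustration, and the argument format simply follows the +FLAG,... syntax mentioned in the message.

```python
def extra_flags(host_flags, baseline_flags):
    """Return the flags present in host_flags but missing from baseline_flags.

    Both arguments are whitespace-separated strings, as found on the
    "flags" line of /proc/cpuinfo. The result is sorted alphabetically.
    """
    return sorted(set(host_flags.split()) - set(baseline_flags.split()))


def cpu_argument(model, flags):
    """Build a -cpu argument that adds each flag to the model with +FLAG."""
    return model + "".join(",+" + flag for flag in flags)


if __name__ == "__main__":
    # Tiny made-up flag lists; paste the real "flags" lines in their place.
    host = "fpu vme de pse aes avx"
    qemu64 = "fpu de pse"
    added = extra_flags(host, qemu64)
    print(added)                           # ['aes', 'avx', 'vme']
    print(cpu_argument("qemu64", added))   # qemu64,+aes,+avx,+vme
```

Halving the list of added flags and retesting narrows the search, although, as noted above, that only works for flags qemu knows by name.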
Re: Divide error in kvm_unlock_kick()
Paolo Bonzini pbonz...@redhat.com wrote: Il 29/05/2014 19:45, Chris Webb ha scritto: Chris Webb ch...@arachsys.com wrote: My CPU flags inside the crashing guest look like this: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1 whereas in a (working) -cpu qemu64 guest, they look like this: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor lahf_lm cmp_legacy svm abm sse4a I thought I'd try to bisect on processor flags to see which was/were implicated. Can you dump the full /proc/cpuinfo? On the host, it looks like this: processor : 0 vendor_id : AuthenticAMD cpu family : 21 model : 2 model name : AMD Opteron(tm) Processor 6328 stepping: 0 microcode : 0x600081c cpu MHz : 3200.000 cache size : 2048 KB physical id : 0 siblings: 8 core id : 0 cpu cores : 4 apicid : 32 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid amd_dcm aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1 bogomips: 6399.89 TLB size: 1536 4K pages clflush size: 64 cache_alignment : 64 address sizes : 48 bits physical, 48 bits virtual power 
management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro [ x8 for processor 0 - 7; full dump at http://cdw.me.uk/tmp/host-cpuinfo.txt ] and on the guest it looks like: processor : 0 vendor_id : AuthenticAMD cpu family : 21 model : 2 model name : AMD Opteron(tm) Processor 6328 stepping: 0 microcode : 0x165 cpu MHz : 3199.946 cache size : 2048 KB physical id : 0 siblings: 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1 bogomips: 6399.89 TLB size: 1536 4K pages clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: [ x4 for processor 0 - 3; full dump at http://cdw.me.uk/tmp/guest-cpuinfo.txt ] Many thanks in advance for any pointers. Best wishes, Chris.-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Divide error in kvm_unlock_kick()
Running a 3.14.4 x86-64 SMP guest kernel on qemu-2.0, with kvm enabled and -cpu host on a 3.14.4 AMD Opteron host, I'm seeing a reliable kernel panic from the guest shortly after boot. I think is happening in kvm_unlock_kick() in the paravirt_ops code: divide error: [#1] PREEMPT SMP Modules linked in: CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.14.4-guest #16 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS Bochs 01/01/2011 task: 88007d384880 ti: 88007d3b2000 task.ti: 88007d3b2000 RIP: 0010:[8102f0cc] [8102f0cc] kvm_unlock_kick+0x63/0x6b RSP: 0018:88007fc83db0 EFLAGS: 00010046 RAX: 0005 RBX: RCX: 0003 RDX: 0003 RSI: 88007fd91d40 RDI: 0008 RBP: 88007fd91d40 R08: R09: 8198e840 R10: 88007cbc7400 R11: 88007cbc9d00 R12: cec0 R13: 0001 R14: 88007fd91d40 R15: 0001 FS: 7ff42a4d3700() GS:88007fc8() knlGS: CS: 0010 DS: ES: CR0: 8005003b CR2: 7ff42a290006 CR3: 7c76d000 CR4: 000406e0 Stack: 88007fd11d40 88007d361cc0 88007fc8d240 81563990 810e42a6 00038102fa73 0282 88007fd12668 88007fc83ecc 00ff 006b Call Trace: IRQ [81563990] ? _raw_spin_unlock+0x57/0x61 [810e42a6] ? load_balance+0x4ff/0x783 [810e4681] ? rebalance_domains+0x157/0x20c [810e4841] ? run_rebalance_domains+0x10b/0x148 [810be7c1] ? __do_softirq+0xec/0x1fe [810beacc] ? irq_exit+0x48/0x8d [815658dd] ? reschedule_interrupt+0x6d/0x80 EOI [8100a842] ? hard_enable_TSC+0x2e/0x2e [8102fbe1] ? native_safe_halt+0x2/0x3 [8100a853] ? default_idle+0x11/0x14 [810ed4e7] ? cpu_startup_entry+0x153/0x1d2 [810277ad] ? 
start_secondary+0x220/0x23c
Code: 0c c5 40 50 87 81 49 8d 04 0c 48 8b 30 48 39 ee 75 ca 8a 40 08 38 d8 75 c3 48 c7 c0 22 b0 00 00 31 db 0f b7 0c 08 b8 05 00 00 00 0f 01 c1 5b 5d 41 5c c3 4c 8d 54 24 08 48 83 e4 f0 b9 0a 00 00
RIP [8102f0cc] kvm_unlock_kick+0x63/0x6b
RSP 88007fc83db0
---[ end trace 2278d9742b4dff74 ]---
Kernel panic - not syncing: Fatal exception in interrupt
Shutting down cpus with NMI
Kernel Offset: 0x0 from 0x8100 (relocation range: 0x8000-0x9fff)

My host kernel config is

  http://cdw.me.uk/tmp/host-config.txt

and the guest config is

  http://cdw.me.uk/tmp/guest-config.txt

with qemu command line:

  qemu-system-x86 -enable-kvm -cpu qemu64 -machine q35 -m 2048 -name $1 \
    -smp sockets=1,cores=4 -pidfile /run/$1.pid -runas nobody \
    -serial stdio -vga none -vnc none -kernel /boot/vmlinuz-guest \
    -append "console=ttyS0 root=/dev/vda" \
    -drive file=/dev/guest/$1,cache=none,format=raw,if=virtio \
    -device virtio-net-pci,netdev=nic,mac=$(< /sys/class/net/$1/address) \
    -netdev tap,id=nic,fd=3 3<>/dev/tap$(< /sys/class/net/$1/ifindex)

I can stop this crash by disabling CONFIG_PARAVIRT_SPINLOCKS in my guest kernel, running with -cpu qemu64 instead of -cpu host, or running with -smp 1 instead of -smp 4. (Removing/changing the -machine q35 makes no difference.)

My CPU flags inside the crashing guest look like this:

  fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic popcnt aes xsave avx f16c hypervisor lahf_lm cmp_legacy svm cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw xop fma4 tbm arat npt nrip_save tsc_adjust bmi1

whereas in a (working) -cpu qemu64 guest, they look like this:

  fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx lm nopl pni cx16 x2apic popcnt hypervisor lahf_lm cmp_legacy svm abm sse4a

I tried enabling CONFIG_PARAVIRT_DEBUG, but no extra information was reported.

Very happy to do any testing at my end which might help track down what's going on here.

Best wishes,

Chris.
qemu-kvm guest which won't 'cont' (emulation failure?)
I have a qemu-kvm guest (apparently an Ubuntu 11.04 x86-64 install) which has stopped and refuses to continue:

  (qemu) info status
  VM status: paused
  (qemu) cont
  (qemu) info status
  VM status: paused

The host is running linux 2.6.39.2 with qemu-kvm 0.14.1 on a 24-core Opteron 6176 box, and has nine other 2GB production guests on it running absolutely fine.

It's been a while since I've seen one of these. When I last saw a cluster of them, they were emulation failures (big real mode instructions, maybe?). I also remember a message about abnormal exit in the dmesg previously, but I don't have that here. This time, there is no host kernel output at all, just the paused guest.

I have qemu monitor access and can even strace the relevant qemu process if necessary: is it possible to use this to diagnose what's caused this guest to stop, e.g. the unsupported instruction if it's an emulation failure?

Cheers,

Chris.
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Kevin Wolf kw...@redhat.com writes:
> On 24.10.2011 12:00, Chris Webb wrote:
> > I have qemu monitor access and can even strace the relevant qemu process if necessary: is it possible to use this to diagnose what's caused this guest to stop, e.g. the unsupported instruction if it's an emulation failure?
>
> Another common cause for stopped VMs are I/O errors, for example writes to a sparse image when the disk is full.

This guest is backed by LVM LVs so I don't think they can return ENOSPC, but I could imagine read errors, so I've just done a trivial test to make sure I can read them end-to-end:

  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:0 of=/dev/null bs=1M
  3136+0 records in
  3136+0 records out
  3288334336 bytes (3.3 GB) copied, 20.898 s, 157 MB/s
  0015# dd if=/dev/mapper/guest\:e549f8e1-4c0e-4dea-826a-e4b877282c07\:ide\:0\:1 of=/dev/null bs=1M
  276+0 records in
  276+0 records out
  289406976 bytes (289 MB) copied, 1.85218 s, 156 MB/s

Is there any way to ask qemu why a guest has stopped, so I can distinguish IO problems from emulation problems from anything else?

Cheers,

Chris.
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Kevin Wolf kw...@redhat.com writes:
> In qemu 1.0 we'll have an extended 'info status' that includes the stop reason, but 0.14 doesn't have this yet (was committed to git master only recently).

Right, okay. I might take a look at cherry-picking and back-porting that to our version of qemu-kvm if it's not too entangled with other changes. It would be very useful in these situations.

> If you attach a QMP monitor (see QMP/README, don't forget to send the capabilities command, it's part of creating the connection) you will receive messages for I/O errors, though.

Thanks. I don't think I can do this with an already-running qemu-kvm that's in a stopped state, can I? Only with a new qemu-kvm invocation, and then wait to try to catch the problem again?

Cheers,

Chris.
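For anyone following the QMP suggestion, here is a minimal sketch of the handshake Kevin describes: read the greeting, send qmp_capabilities, then issue a command. It assumes qemu was started with a QMP monitor on a UNIX socket; the socket path in the usage example is hypothetical, and the fields returned by query-status depend on the qemu version.

```python
import json
import socket


def qmp_message(command, arguments=None):
    """Encode one QMP command as a newline-terminated JSON line."""
    message = {"execute": command}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode("utf-8")


def query_status(sock_path):
    """Connect to a QMP UNIX socket, negotiate capabilities, return the status reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        stream = sock.makefile("rwb")
        json.loads(stream.readline())                  # greeting banner
        stream.write(qmp_message("qmp_capabilities"))  # mandatory first command
        stream.flush()
        json.loads(stream.readline())                  # reply to qmp_capabilities
        stream.write(qmp_message("query-status"))
        stream.flush()
        while True:
            reply = json.loads(stream.readline())
            # Asynchronous events may arrive first; skip them.
            if "return" in reply or "error" in reply:
                return reply


if __name__ == "__main__":
    # Hypothetical socket path; adjust to match the -qmp option in use.
    print(query_status("/run/guest-qmp.sock"))
```

Events such as I/O error notifications are delivered on the same connection, so a long-running listener would read lines in a loop rather than return after the first reply.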
Re: [Qemu-devel] qemu-kvm guest which won't 'cont' (emulation failure?)
Kevin Wolf kw...@redhat.com writes:
> Good point... The only other thing that I can think of would be attaching gdb and setting a breakpoint in vm_stop() or something.

Perfect, that seems to have identified what's going on very nicely:

  (gdb) break vm_stop
  Breakpoint 1 at 0x407d10: file /home/root/packages/qemu-kvm/src-UMBurO/cpus.c, line 318.
  (gdb) fg
  Continuing.

  Breakpoint 1, vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
  318 /home/root/packages/qemu-kvm/src-UMBurO/cpus.c: No such file or directory.
      in /home/root/packages/qemu-kvm/src-UMBurO/cpus.c
  (gdb) bt
  #0 vm_stop (reason=0) at /home/root/packages/qemu-kvm/src-UMBurO/cpus.c:318
  #1 0x0058585f in ide_handle_rw_error (s=0x20330d8, error=28, op=8) at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:468
  #2 0x00588376 in ide_dma_cb (opaque=0x20330d8, ret=<value optimized out>) at /home/root/packages/qemu-kvm/src-UMBurO/hw/ide/core.c:494
  #3 0x00590092 in dma_bdrv_cb (opaque=0x2043a10, ret=-28) at /home/root/packages/qemu-kvm/src-UMBurO/dma-helpers.c:94
  #4 0x0044d64a in qcow2_aio_write_cb (opaque=0x2034900, ret=-28) at block/qcow2.c:714
  #5 0x0043df6d in posix_aio_process_queue (opaque=<value optimized out>) at posix-aio-compat.c:462
  #6 0x0043e07d in posix_aio_read (opaque=0x17c8110) at posix-aio-compat.c:503
  #7 0x00415fca in main_loop_wait (nonblocking=<value optimized out>) at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1383
  #8 0x0042ca37 in kvm_main_loop () at /home/root/packages/qemu-kvm/src-UMBurO/qemu-kvm.c:1589
  #9 0x004170a3 in main (argc=32, argv=<value optimized out>, envp=<value optimized out>) at /home/root/packages/qemu-kvm/src-UMBurO/vl.c:1429

I see what's happened here: we're not explicitly setting format=raw when we start that guest and someone's uploaded a qcow2 image directly to a block device. Ouch.

Sorry for the noise!

Best wishes,

Chris.
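The error=28 and ret=-28 values in the backtrace can be decoded with the standard errno table. This is just a convenience sketch; the helper name describe_errno is invented here, and the numeric-to-name mapping is that of the platform running the script (on Linux, 28 is ENOSPC).

```python
import errno
import os


def describe_errno(ret):
    """Map a (possibly negated) errno value to its symbolic name and message."""
    code = abs(ret)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # qcow2_aio_write_cb reported ret=-28 in the backtrace.
    print(describe_errno(-28))
```

An out-of-space error on a write fits the diagnosis in the message: a qcow2 image opened with format probing can try to grow past the end of the fixed-size block device that holds it.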
Re: Host where KSM appears to save a negative amount of memory
Hugh Dickins hu...@google.com writes:
> KSM chooses to show the numbers pages_shared and pages_sharing as exclusive counts: pages_sharing indicates the saving being made. So it would be perfectly reasonable to add those two numbers together to get the total number of pages sharing, the number you expected it to show; but it doesn't make sense to subtract shared from sharing.

Hi. Many thanks for your helpful and detailed explanation. I've fixed our monitoring to correctly use just pages_sharing to measure the savings. I think I just assumed the meanings of pages_shared and pages_sharing from their names. This means that ksm has been saving even more memory than we thought on our hosts in the past!

Best wishes,

Chris.
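Following Hugh's explanation, a monitoring check might look like the sketch below. The function names are illustrative, and the default page size of 4096 bytes is an assumption based on the 4k pages mentioned in the original report.

```python
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def read_ksm_counter(name, base=KSM_DIR):
    """Read one integer counter such as pages_shared or pages_sharing."""
    return int((base / name).read_text())


def ksm_summary(pages_shared, pages_sharing, page_size=4096):
    """Summarise KSM counters using the exclusive-count interpretation.

    pages_sharing alone is the number of pages saved; adding pages_shared
    gives the total number of mappings that point at merged pages.
    """
    return {
        "saved_bytes": pages_sharing * page_size,
        "total_mappings": pages_shared + pages_sharing,
    }


if __name__ == "__main__":
    shared = read_ksm_counter("pages_shared")
    sharing = read_ksm_counter("pages_sharing")
    print(ksm_summary(shared, sharing))
```

The old calculation (pages_sharing minus pages_shared) understated the saving by pages_shared pages, which is why it could go negative on a host with few duplicates per shared page.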
Host where KSM appears to save a negative amount of memory
We're running KSM on kernel 2.6.39.2 with hosts running a number of qemu-kvm virtual machines, and it has consistently been saving us a useful amount of RAM. To monitor the effective amount of memory saved, I've been looking at the difference between /sys/kernel/mm/ksm/pages_sharing and pages_shared.

On a typical 32GB host, this has been coming out as at least a hundred thousand or so, which is presumably half to one gigabyte worth of 4k pages.

However, this morning we've spotted something odd: a host where pages_sharing is smaller than pages_shared, giving a negative saving by the above calculation:

  # cat /sys/kernel/mm/ksm/pages_sharing
  104
  # cat /sys/kernel/mm/ksm/pages_shared
  1761313

I think this means my interpretation of these values must be wrong, as I presumably can't have more pages being shared than instances of their use!

Can anyone shed any light on what might be going on here for me? Am I misinterpreting these values, or does this look like it might be an accounting bug? (If the latter, what useful debug info can I extract from the system to help identify it?)

Best wishes,

Chris.
High CPU use of -usbdevice tablet (was Re: KVM usability)
Avi Kivity a...@redhat.com writes:
On 03/02/2010 11:34 AM, Jernej Simončič wrote:
On Tuesday, March 2, 2010, 9:21:18, Chris Webb wrote:

I remember about a year ago, someone asserting on the list that -usbdevice tablet was very CPU intensive even when not in use, and should be avoided if mouse support wasn't needed, e.g. on non-graphical VMs. Was that actually a significant hit, and is it still true today?

It would appear that this is still the case, at least on slower hosts - on Atom Z530 (1,6GHz), the XP VM uses ~30% CPU when idle with -usbdevice tablet, but only ~4% without it. However, on a faster host (Core2 Quad 2,66GHz), there's practically no difference (Vista x64 VM uses ~1% CPU when idle regardless of -usbdevice tablet).

Looks like the tablet is set to 100 Hz polling rate. We may be able to get away with 30 Hz or even less (ep_bInterval, in ms, in hw/usb-wacom.c).

Hi Avi. Sorry for the very late follow-up, but I decided to experiment with this. The cpu impact of the usb tablet device shows up fairly clearly on a crude test on my (relatively low-spec) desktop. Running an idle Fedora 11 livecd on qemu-kvm 0.12.3, top shows around 0.1% of my cpu in use, but this increases to roughly 5% when specifying -usbdevice tablet, and more detailed examination with perf record/report suggests about a factor of thirty too.

It's actually a more general symptom with USB or at least HID devices by the look of things: although -usb doesn't increase CPU use on its own, the same increase in load can also be triggered by -usbdevice keyboard or mouse. However, running with all three of -usbdevice mouse, keyboard and tablet doesn't increase load any more than just one of these.

Changing the USB tablet polling interval from 10ms to 100ms in both hw/usb-wacom.c and hw/usb-hid.c made no difference to the load. The only visible effects were an increase in the bInterval shown by lsusb -v in the guest and the hint of jerky mouse movement I expected from setting this value so high. A similar change to the polling interval for the keyboard and mouse also made no difference to their performance impact.

Taking the FRAME_TIMER_FREQ down to 100 in hw/usb-uhci.c does seem to reduce the CPU load quite a bit, but at the expense of making the USB tablet (and presumably all other USB devices) very laggy. Could there be some bug here that causes the usb hid devices to wake qemu at the maximum rate possible (FRAME_TIMER_FREQ?) rather than the configured polling interval?

Best wishes, Chris.

PS Vmmouse works fine as an absolute pointing device in the place of -usbdevice tablet without the performance impact, but this isn't supported out of the box with typical linux live CDs (e.g. Fedora 11 and 12 or Knoppix) so unfortunately it's probably less suitable as a default configuration to expose to end-users.
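To make the suspicion about wake-up rates concrete, here is a small Python sketch of the arithmetic. It is not derived from the qemu source: the function name is hypothetical, and the 1000 Hz frame timer is an assumption based on the 1 ms full-speed USB frame, not a value quoted in this thread.

```python
def wakeups_per_second(frame_timer_hz, b_interval_ms, honours_interval):
    """Estimate how often the emulated controller does work for one device.

    frame_timer_hz   -- rate at which the host controller's frame timer fires
    b_interval_ms    -- polling interval advertised by the device (bInterval)
    honours_interval -- True if polling respects bInterval, False if every
                        frame tick results in work (the suspected behaviour)
    """
    if honours_interval:
        # Polling cannot happen more often than the frame timer fires.
        return min(frame_timer_hz, 1000.0 / b_interval_ms)
    return float(frame_timer_hz)


ASSUMED_FRAME_TIMER_HZ = 1000  # assumption: one tick per 1 ms USB frame

for interval_ms in (10, 100):
    expected = wakeups_per_second(ASSUMED_FRAME_TIMER_HZ, interval_ms, True)
    suspected = wakeups_per_second(ASSUMED_FRAME_TIMER_HZ, interval_ms, False)
    print("bInterval %3d ms: %6.1f/s if honoured, %6.1f/s if not"
          % (interval_ms, expected, suspected))
```

If the load really followed the second column, changing bInterval would make no difference while lowering the frame timer frequency would, which matches the observations above.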
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Chris Webb ch...@arachsys.com writes:

Okay. What I was driving at in describing these systems as 'already broken' is that they will already lose data (in this sense) if they're run on bare metal with normal commodity SATA disks with their 32MB write caches on. That configuration surely describes the vast majority of PC-class desktops and servers! If I understand correctly, your point here is that the small cache on a real SATA drive gives a relatively small time window for data loss, whereas the worry with cache=writeback is that the host page cache can be gigabytes, so the time window for unsynced data to be lost is potentially enormous. Isn't the fix for that just forcing periodic sync on the host to bound-above the time window for unsynced data loss in the guest?

For the benefit of the archives, it turns out the simplest fix for this is already implemented as a vm sysctl in linux. Set vm.dirty_bytes to 33554432, and the size of dirty page cache is bounded above by 32MB, so we are simulating exactly the case of a SATA drive with a 32MB writeback-cache. Unless I'm missing something, the risk to guest OSes in this configuration should therefore be exactly the same as the risk from running on normal commodity hardware with such drives and no expensive battery-backed RAM.

Cheers, Chris.
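As a worked example of the sizing, the following Python sketch converts a cache size into the byte count that vm.dirty_bytes expects. The helper name is hypothetical, and it assumes the "32MB" in the message means 32 MiB.

```python
def dirty_bytes_setting(cache_mib):
    """Return the byte limit and a sysctl.conf-style line for that limit.

    cache_mib is the write-cache size to imitate, taken here as MiB
    (an assumption; drive vendors may count in decimal megabytes).
    """
    limit = cache_mib * 1024 * 1024
    return limit, "vm.dirty_bytes = %d" % limit


limit, line = dirty_bytes_setting(32)
print(line)
```

The resulting line can be placed in the host's sysctl configuration, or the same value can be applied at run time with sysctl -w.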
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Avi Kivity a...@redhat.com writes:
On 03/22/2010 11:04 PM, Chris Webb wrote:

Unless I'm missing something, the risk to guest OSes in this configuration should therefore be exactly the same as the risk from running on normal commodity hardware with such drives and no expensive battery-backed RAM.

A host crash will destroy your data. If your machine is connected to a UPS, only a firmware crash can destroy your data.

Yes, that's a good point: in this configuration a host crash is equivalent to a power failure rather than an OS crash in terms of data loss.

Cheers, Chris.
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Anthony Liguori anth...@codemonkey.ws writes:

This really gets down to your definition of safe behaviour. As it stands, if you suffer a power outage, it may lead to guest corruption. While we are correct in advertising a write-cache, write-caches are volatile and should a drive lose power, it could lead to data corruption. Enterprise disks tend to have battery backed write caches to prevent this. In the set up you're emulating, the host is acting as a giant write cache. Should your host fail, you can get data corruption.

Hi Anthony. I suspected my post might spark an interesting discussion! Before considering anything like this, we did quite a bit of testing with OSes in qemu-kvm guests running filesystem-intensive work, using an ipmitool power off to kill the host. I didn't manage to corrupt any ext3, ext4 or NTFS filesystems despite these efforts.

Is your claim here that:-

  (a) qemu doesn't emulate a disk write cache correctly; or

  (b) operating systems are inherently unsafe running on top of a disk with a write-cache; or

  (c) installations that are already broken and lose data with a physical drive with a write-cache can lose much more in this case because the write cache is much bigger?

Following Christoph Hellwig's patch series from last September, I'm pretty convinced that (a) isn't true apart from the inability to disable the write-cache at run-time, which is something that neither recent linux nor windows seem to want to do out of the box.

Given that modern SATA drives come with fairly substantial write-caches nowadays which operating systems leave on without widespread disaster, I don't really believe in (b) either, at least for the ide and scsi case. Filesystems know they have to flush the disk cache to avoid corruption. (Virtio makes the write cache invisible to the OS except in linux 2.6.32+ so I know virtio-blk has to be avoided for current windows and obsolete linux when writeback caching is on.)
I can certainly imagine (c) might be the case, although when I use strace to watch the IO to the block device, I see pretty regular fdatasyncs being issued by the guests, interleaved with the writes, so I'm not sure how likely the problem would be in practice. Perhaps my test guests were unrepresentatively well-behaved.

However, the potentially unlimited time-window for loss of incorrectly unsynced data is also something one could imagine fixing at the qemu level. Perhaps I should be implementing something like cache=writeback,flushtimeout=N which, upon a write being issued to the block device, starts an N second timer if it isn't already running. The timer is destroyed on flush, and if it expires before it's destroyed, a gratuitous flush is sent. Do you think this is worth doing? Just a simple 'while sleep 10; do sync; done' on the host even!

We've used cache=none and cache=writethrough, and whilst performance is fine with a single guest accessing a disk, when we chop the disks up with LVM and run even a small handful of guests, the constant seeking to serve tiny synchronous IOs leads to truly abysmal throughput: we've seen less than 700kB/s streaming write rates within guests when the backing store is capable of 100MB/s.

With cache=writeback, there's still IO contention between guests, but the write granularity is a bit coarser, so the host's elevator seems to get a bit more of a chance to help us out and we can at least squeeze out 5-10MB/s from two or three concurrently running guests, getting a total of 20-30% of the performance of the underlying block device rather than a total of around 5%.

Cheers, Chris.
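The proposed flushtimeout=N behaviour can be modelled in a few lines of Python. This is a sketch of the policy as described in the message, not qemu code: no such option is claimed to exist, the class and method names are hypothetical, and time is passed in explicitly so the model is easy to reason about.

```python
class FlushTimer:
    """Model of the proposed policy, driven by explicit timestamps."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.deadline = None       # None means no timer is running
        self.forced_flushes = 0    # gratuitous flushes sent on expiry

    def _expire(self, now):
        # If the timer ran out before a guest flush arrived, force one.
        if self.deadline is not None and now >= self.deadline:
            self.forced_flushes += 1
            self.deadline = None

    def on_write(self, now):
        self._expire(now)
        # Start the timer only if it is not already running.
        if self.deadline is None:
            self.deadline = now + self.timeout

    def on_flush(self, now):
        self._expire(now)
        # A guest flush destroys the timer.
        self.deadline = None

    def tick(self, now):
        self._expire(now)


timer = FlushTimer(timeout=10)
timer.on_write(0)    # timer starts, deadline at t=10
timer.on_write(3)    # already running, deadline unchanged
timer.on_flush(5)    # guest flushed in time, timer destroyed
timer.on_write(6)    # new timer, deadline at t=16
timer.tick(20)       # no flush arrived, so one is forced
print(timer.forced_flushes)
```

With this policy, unflushed data is never older than N seconds plus however long the forced flush takes, which is the bound the periodic sync loop approximates from outside qemu.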
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Avi Kivity a...@redhat.com writes:
On 03/15/2010 10:23 PM, Chris Webb wrote:

Wasteful duplication of page cache between guest and host notwithstanding, turning on cache=writeback is a spectacular performance win for our guests.

Is this with qcow2, raw file, or direct volume access?

This is with direct access to logical volumes. No file systems or qcow2 in the stack. Our typical host has a couple of SATA disks, combined in md RAID1, chopped up into volumes with LVM2 (really just dm linear targets). The performance measured outside qemu is excellent, inside qemu-kvm is fine too until multiple guests are trying to access their drives at once, but then everything starts to grind badly.

I can understand it for qcow2, but for direct volume access this shouldn't happen. The guest schedules as many writes as it can, followed by a sync. The host (and disk) can then reschedule them whether they are in the writeback cache or in the block layer, and must sync in the same way once completed.

I don't really understand what's going on here, but I wonder if the underlying problem might be that all the O_DIRECT/O_SYNC writes from the guests go down into the same block device at the bottom of the device mapper stack, and thus can't be reordered with respect to one another. For our purposes,

  Guest A   Guest B        Guest A   Guest B        Guest A   Guest B
  write A1                 write A1                           write B1
            write B1       write A2                 write A1
  write A2                           write B1       write A2

are all equivalent, but the system isn't allowed to reorder in this way because there isn't a separate request queue for each logical volume, just the one at the bottom. (I don't know whether nested request queues would behave remotely reasonably either, though!)
Also, if my guest kernel issues (say) three small writes, one at the start of the disk, one in the middle, one at the end, and then does a flush, can virtio really express this as one non-contiguous O_DIRECT write (the three components of which can be reordered by the elevator with respect to one another) rather than three distinct O_DIRECT writes which can't be permuted? Can qemu issue a write like that? cache=writeback + flush allows this to be optimised by the block layer in the normal way. Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Anthony Liguori anth...@codemonkey.ws writes:
On 03/17/2010 10:14 AM, Chris Webb wrote:

(c) installations that are already broken and lose data with a physical drive with a write-cache can lose much more in this case because the write cache is much bigger?

This is the closest to the most accurate. It basically boils down to this: most enterprises use disks with battery backed write caches. Having the host act as a giant write cache means that you can lose data. I agree that a well behaved file system will not become corrupt, but my contention is that for many types of applications, data loss == corruption and not all file systems are well behaved. And it's certainly valid to argue about whether common filesystems are broken but from a purely pragmatic perspective, this is going to be the case.

Okay. What I was driving at in describing these systems as 'already broken' is that they will already lose data (in this sense) if they're run on bare metal with normal commodity SATA disks with their 32MB write caches on. That configuration surely describes the vast majority of PC-class desktops and servers!

If I understand correctly, your point here is that the small cache on a real SATA drive gives a relatively small time window for data loss, whereas the worry with cache=writeback is that the host page cache can be gigabytes, so the time window for unsynced data to be lost is potentially enormous. Isn't the fix for that just forcing periodic sync on the host to bound-above the time window for unsynced data loss in the guest?

Cheers, Chris.
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Avi Kivity a...@redhat.com writes: Chris, can you carry out an experiment? Write a program that pwrite()s a byte to a file at the same location repeatedly, with the file opened using O_SYNC. Measure the write rate, and run blktrace on the host to see what the disk (/dev/sda, not the volume) sees. Should be a (write, flush, write, flush) per pwrite pattern or similar (for writing the data and a journal block, perhaps even three writes will be needed). Then scale this across multiple guests, measure and trace again. If we're lucky, the flushes will be coalesced, if not, we need to work on it. Sure, sounds like an excellent plan. I don't have a test machine at the moment as the last host I was using for this has gone into production, but I'm due to get another one to install later today or first thing tomorrow which would be ideal for doing this. I'll follow up with the results once I have them. Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
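A minimal version of the guest-side test program suggested above might look like the following Python sketch. The function name, the default count and the use of a temporary directory are assumptions for illustration; for a real measurement the file should live on the guest disk under test, and the blktrace side on the host is not covered here.

```python
import os
import tempfile
import time


def measure_sync_writes(path, count=1000):
    """pwrite() one byte at offset 0 `count` times on an O_SYNC descriptor.

    Returns the achieved rate in writes per second.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_SYNC, 0o600)
    try:
        start = time.monotonic()
        for _ in range(count):
            # Same byte, same location, every time.
            os.pwrite(fd, b"x", 0)
        elapsed = time.monotonic() - start
    finally:
        os.close(fd)
    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        rate = measure_sync_writes(os.path.join(tmp, "probe"), count=200)
        print("%.1f synchronous writes per second" % rate)
```

Running one copy per guest and comparing the per-guest rate against the single-guest figure gives the scaling measurement asked for.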
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Vivek Goyal vgo...@redhat.com writes: Are you using CFQ in the host? What is the host kernel version? I am not sure what is the problem here but you might want to play with IO controller and put these guests in individual cgroups and see if you get better throughput even with cache=writethrough. Hi. We're using the deadline IO scheduler on 2.6.32.7. We got better performance from deadline than from cfq when we last tested, which was admittedly around the 2.6.30 timescale so is now a rather outdated measurement. If the problem is that if sync writes from different guests get intermixed resulting in more seeks, IO controller might help as these writes will now go on different group service trees and in CFQ, we try to service requests from one service tree at a time for a period before we switch the service tree. Thanks for the suggestion: I'll have a play with this. I currently use /sys/kernel/uids/N/cpu_share with one UID per guest to divide up the CPU between guests, but this could just as easily be done with a cgroup per guest if a side-effect is to provide a hint about IO independence to CFQ. Best wishes, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH][RFC/T/D] Unmapped page cache control - via boot parameter
Avi Kivity a...@redhat.com writes: On 03/15/2010 10:07 AM, Balbir Singh wrote: Yes, it is a virtio call away, but is the cost of paying twice in terms of memory acceptable? Usually, it isn't, which is why I recommend cache=off. Hi Avi. One observation about your recommendation for cache=none: We run hosts of VMs accessing drives backed by logical volumes carved out from md RAID1. Each host has 32GB RAM and eight cores, divided between (say) twenty virtual machines, which pretty much fill the available memory on the host. Our qemu-kvm is new enough that IDE and SCSI drives with writeback caching turned on get advertised to the guest as having a write-cache, and FLUSH gets translated to fsync() by qemu. (Consequently cache=writeback isn't acting as cache=neverflush like it would have done a year ago. I know that comparing performance for cache=none against that unsafe behaviour would be somewhat unfair!) Wasteful duplication of page cache between guest and host notwithstanding, turning on cache=writeback is a spectacular performance win for our guests. For example, even IDE with cache=writeback easily beats virtio with cache=none in most of the guest filesystem performance tests I've tried. The anecdotal feedback from clients is also very strongly in favour of cache=writeback. With a host full of cache=none guests, IO contention between guests is hugely problematic with non-stop seek from the disks to service tiny O_DIRECT writes (especially without virtio), many of which needn't have been synchronous if only there had been some way for the guest OS to tell qemu that. Running with cache=writeback seems to reduce the frequency of disk flush per guest to a much more manageable level, and to allow the host's elevator to optimise writing out across the guests in between these flushes. Cheers, Chris. 
Re: [Qemu-devel] [PATCH] Fix SIGFPE for vnc display of width/height = 1
Chris Webb ch...@arachsys.com writes: During boot, the screen gets resized to height 1 and a mouse click at this point will cause a division by zero when calculating the absolute pointer position from the pixel (x, y). Return a click in the middle of the screen instead in this case. I think this probably ought to be a candidate for 0.12-stable too. We're seeing these crashes for real from time-to-time so it's not just a theoretical problem. Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: Another VNC crash, qemu-kvm-0.12.3
Alexander Graf ag...@suse.de writes:
On 05.03.2010, at 17:52, Chris Webb wrote:

Of course, if the screen width or height is 1, it doesn't really matter what the value of the mouse position for the click is, so something as simple as

  diff --git a/vnc.c b/vnc.c
  --- a/vnc.c
  +++ b/vnc.c
  @@ -1421,8 +1421,10 @@
           dz = 1;
       if (vs->absolute) {
  -        kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
  -                        y * 0x7FFF / (ds_get_height(vs->ds) - 1),
  +        kbd_mouse_event(ds_get_width(vs->ds) > 1 ?
  +                          x * 0x7FFF / (ds_get_width(vs->ds) - 1) : 0x4000,
  +                        ds_get_height(vs->ds) > 1 ?
  +                          y * 0x7FFF / (ds_get_height(vs->ds) - 1) : 0x4000,
                           dz, buttons);
       } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
           x -= 0x7FFF;

will fix the symptom: the division by zero. The underlying cause of a 9x1 display surface is a bit mysterious though.

Is it? When booting the screen gets resized to something like 9x1 for a few ms. Try putting debug code in the resize callback - you'll see it.

Ah, okay. In that case, this patch could well be the correct fix rather than just a work-around. I'll have a look for any other places in vnc.c that might do a similar division-by-zero for small screen sizes at the same point.

Best wishes, Chris.
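The guard in the patch quoted in this message can be exercised in isolation. The Python sketch below mirrors the scaling arithmetic for one axis; the function name scale_axis is hypothetical and the code is a model of the expression in the diff, not part of qemu.

```python
def scale_axis(pos, size, full_scale=0x7FFF, midpoint=0x4000):
    """Map a pixel coordinate to the absolute pointer range for one axis.

    For size > 1 this is pos * 0x7FFF / (size - 1) with integer division,
    as in the original expression. For a degenerate size of 1 the divisor
    would be zero, so the middle of the range is returned instead.
    """
    if size > 1:
        return pos * full_scale // (size - 1)
    return midpoint


print(scale_axis(0, 800))     # left edge of an 800 pixel wide screen
print(scale_axis(799, 800))   # right edge maps to full scale
print(scale_axis(4, 9))       # middle column of the transient 9x1 surface
print(scale_axis(0, 1))       # height 1: no division, midpoint returned
```

The last call is the case that raised SIGFPE before the patch, since size - 1 is zero.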
Re: Another VNC crash, qemu-kvm-0.12.3
Anthony Liguori anth...@codemonkey.ws writes:
On 03/01/2010 12:14 PM, Chris Webb wrote:

We've just seen another VNC related qemu-kvm crash, this time an arithmetic exception at vnc.c:1424 in the newly released qemu-kvm 0.12.3.

  [...]
  1423     if (vs->absolute) {
  1424         kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
  1425                         y * 0x7FFF / (ds_get_height(vs->ds) - 1),
  1426                         dz, buttons);
  1427     } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
  1428         x -= 0x7FFF;
  [...]

and sure enough:

  (gdb) p vs->ds->surface->width
  $1 = 9
  (gdb) p vs->ds->surface->height
  $2 = 1

What a 9x1 display surface is doing on this guest is a mystery to me, but you definitely can't divide by one less than its height!

Can you reproduce this reliably? If so, what's the procedure?

No, I'm afraid not, although I have had a thorough play myself with a variety of VNC clients in an attempt to reproduce. The background here is that we're running a public hosting service where customers can install and run their own OSes on their own qemu-kvm virtual machines. I don't even know what VNC client (if any) was connected at the time. I only see the core dump if the qemu-kvm crashes.

Of course, if the screen width or height is 1, it doesn't really matter what the value of the mouse position for the click is, so something as simple as

  diff --git a/vnc.c b/vnc.c
  --- a/vnc.c
  +++ b/vnc.c
  @@ -1421,8 +1421,10 @@
           dz = 1;
       if (vs->absolute) {
  -        kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
  -                        y * 0x7FFF / (ds_get_height(vs->ds) - 1),
  +        kbd_mouse_event(ds_get_width(vs->ds) > 1 ?
  +                          x * 0x7FFF / (ds_get_width(vs->ds) - 1) : 0x4000,
  +                        ds_get_height(vs->ds) > 1 ?
  +                          y * 0x7FFF / (ds_get_height(vs->ds) - 1) : 0x4000,
                           dz, buttons);
       } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
           x -= 0x7FFF;

will fix the symptom: the division by zero. The underlying cause of a 9x1 display surface is a bit mysterious though.

Cheers, Chris.
Re: KVM usability
Dustin Kirkland kirkl...@canonical.com writes: On Mon, 2010-03-01 at 15:59 -0600, Anthony Liguori wrote: Defaulting usb to on and defaulting to a usb tablet is a reasonable thing to do IMHO. \o/ Definitely a better user experience. I remember about a year ago, someone asserting on the list that -usbdevice tablet was very CPU intensive even when not in use, and should be avoided if mouse support wasn't needed, e.g. on non-graphical VMs. Was that actually a significant hit, and is it still true today? Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM usability
Ingo Molnar mi...@elte.hu writes: Yes, you are quite correct: udev has been argued to be a prime candidate for tools/. (and some other kernel utilities as well) A small, static set of userspace like klibc (only 5M unpacked!) with enough tools for rolling up in a standard initramfs would be especially nice, and vastly less difficult to import than qemu. It's a pain in the neck to have to build two versions of lots of bits of userspace: one stripped down and statically linked for initramfs and one full-featured for the main system. However, trying to avoid initramfs altogether is an increasingly losing battle these days, and for quite understandable reasons. klibc + md* + mini lvm2 (enough to activate volumes) perhaps? Cheers, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Another VNC crash, qemu-kvm-0.12.3
We've just seen another VNC related qemu-kvm crash, this time an arithmetic exception at vnc.c:1424 in the newly released qemu-kvm 0.12.3.

  [...]
  1423     if (vs->absolute) {
  1424         kbd_mouse_event(x * 0x7FFF / (ds_get_width(vs->ds) - 1),
  1425                         y * 0x7FFF / (ds_get_height(vs->ds) - 1),
  1426                         dz, buttons);
  1427     } else if (vnc_has_feature(vs, VNC_FEATURE_POINTER_TYPE_CHANGE)) {
  1428         x -= 0x7FFF;
  [...]

and sure enough:

  (gdb) p vs->ds->surface->width
  $1 = 9
  (gdb) p vs->ds->surface->height
  $2 = 1

What a 9x1 display surface is doing on this guest is a mystery to me, but you definitely can't divide by one less than its height!

  (gdb) p *vs
  $3 = {csock = 19, ds = 0x1c60fa0, dirty = {{4294967295, 4294967295, 4294967295, 4294967295, 4294967295} <repeats 2048 times>}, vd = 0x26a0110, need_update = 1, force_update = 0, features = 67, absolute = 1, last_x = -1, last_y = -1, vnc_encoding = 5, tight_quality = 9 '\t', tight_compression = 9 '\t', major = 3, minor = 8, challenge = ¹{\177\226\200kÕjéPñÄA¤o), output = {capacity = 925115, offset = 0, buffer = 0x28ba4b0 }, input = {capacity = 5120, offset = 6, buffer = 0x28b90a0 \005}, write_pixels = 0x4bb9e0 <vnc_write_pixels_generic>, send_hextile_tile = 0x4bcdf0 <send_hextile_tile_generic_32>, clientds = {flags = 0 '\0', width = 800, height = 600, linesize = 3200, data = 0x7fcd00ab6010 , pf = { bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004', depth = 24 '\030', rmask = 0, gmask = 0, bmask = 0, amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\0', ashift = 24 '\030', rmax = 255 'ÿ', gmax = 255 'ÿ', bmax = 255 'ÿ', amax = 255 'ÿ', rbits = 8 '\b', gbits = 8 '\b', bbits = 8 '\b', abits = 8 '\b'}}, audio_cap = 0x0, as = {freq = 44100, nchannels = 2, fmt = AUD_FMT_S16, endianness = 0}, read_handler = 0x4beac0 <protocol_client_msg>, read_handler_expect = 6, modifiers_state = '\0' <repeats 255 times>, zlib = {capacity = 0, offset = 0, buffer = 0x0}, zlib_tmp = {capacity = 0, offset = 0, buffer = 0x0}, zlib_stream = {{next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}, next = 0x0}

  (gdb) p *vs->ds
  $4 = {surface = 0x1c81f40, opaque = 0x26a0110, gui_timer = 0x0, allocator = 0x8199d0, listeners = 0x1c95fa0, mouse_set = 0, cursor_define = 0, next = 0x0}

  (gdb) p *vs->ds->surface
  $5 = {flags = 2 '\002', width = 9, height = 1, linesize = 36, data = 0x7fcd00ab6010 , pf = { bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004', depth = 24 '\030', rmask = 16711680, gmask = 65280, bmask = 255, amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\0', ashift = 24 '\030', rmax = 255 'ÿ', gmax = 255 'ÿ', bmax = 255 'ÿ', amax = 255 'ÿ', rbits = 8 '\b', gbits = 8 '\b', bbits = 8 '\b', abits = 8 '\b'}}

Cheers, Chris.
Re: [Qemu-devel] Re: qemu-kvm 0.12.2 VNC segfault
Avi Kivity a...@redhat.com writes: On 02/21/2010 07:23 PM, Chris Webb wrote: Some sort of race where a client disconnects and is removed from the client list while the vnc_refresh() loop is iterating over it, maybe? Looks like c727a05459, and high time for 0.12.3. Anthony? Ah yes, looks like this was exactly the case that commit was trying to prevent. Thanks! Best wishes, Chris. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
qemu-kvm 0.12.2 VNC segfault
I've just had a segfault from one of the qemu-kvm virtual machines we run. This is qemu-kvm 0.12.2 running with the in-kernel kvm modules on linux 2.6.32.7 on a dual quad-core Xeon E5420 machine, with ksm enabled. The backtrace looks like

  #0  vnc_update_client (vs=0x83f0, has_dirty=18) at vnc.c:908
  #1  0x004c015b in vnc_refresh (opaque=<value optimized out>) at vnc.c:2305
  #2  0x00405f50 in qemu_run_timers (ptimer_head=0x836cc0, current_time=1606536889) at /packages/qemu-kvm-0.12/src-gktOMQ/vl.c:1127
  #3  0x00408edf in main_loop_wait (timeout=1000) at /packages/qemu-kvm-0.12/src-gktOMQ/vl.c:4036
  #4  0x00421d7a in kvm_main_loop () at /packages/qemu-kvm-0.12/src-gktOMQ/qemu-kvm.c:2121
  #5  0x0040b755 in main (argc=<value optimized out>, argv=0x7fffcc2fa1b8, envp=<value optimized out>) at /packages/qemu-kvm-0.12/src-gktOMQ/vl.c:4209

and the segfault itself is rather puzzling.

  #0  vnc_update_client (vs=0x83f0, has_dirty=18) at vnc.c:908
  908         if (vs->need_update && vs->csock != -1) {
  (gdb) p vs
  $1 = (VncState *) 0x83f0
  (gdb) p *vs
  Cannot access memory at address 0x83f0

The call site in vnc_refresh() looks like:

  vs = vd->clients;
  while (vs != NULL) {
      rects += vnc_update_client(vs, has_dirty);
      vs = vs->next;
  }

but when I go up a stack frame and look at the vd over which this loop would be iterating:

  (gdb) up
  #1  0x004c015b in vnc_refresh (opaque=<value optimized out>) at vnc.c:2305
  2305        rects += vnc_update_client(vs, has_dirty);
  (gdb) p *vd->clients
  $2 = {csock = 17, ds = 0x19b2760, dirty = {{0, 0, 0, 0} <repeats 293 times>, {50331648, 0, 0, 0}, {50331648, 0, 0, 0}, {50331648, 0, 0, 0}, {50331648, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {16777216, 0, 0, 0}, {50331648, 0, 0, 0}, {0, 0, 0, 0} <repeats 1742 times>}, vd = 0x1ef60b0, need_update = 0, force_update = 0, features = 0, absolute = 0, last_x = -1, last_y = -1, vnc_encoding = 0, tight_quality = 0 '\0', tight_compression = 0 '\0', major = 0, minor = 0, challenge = '\0' <repeats 15 times>, output = {capacity = 1036, offset = 0, buffer = 0x1ec7420 RFB 003.008\n¦\177}, input = {capacity = 0, offset = 0, buffer = 0x0}, write_pixels = 0, send_hextile_tile = 0, clientds = {flags = 0 '\0', width = 0, height = 0, linesize = 0, data = 0x0, pf = {bits_per_pixel = 0 '\0', bytes_per_pixel = 0 '\0', depth = 0 '\0', rmask = 0, gmask = 0, bmask = 0, amask = 0, rshift = 0 '\0', gshift = 0 '\0', bshift = 0 '\0', ashift = 0 '\0', rmax = 0 '\0', gmax = 0 '\0', bmax = 0 '\0', amax = 0 '\0', rbits = 0 '\0', gbits = 0 '\0', bbits = 0 '\0', abits = 0 '\0'}}, audio_cap = 0x0, as = {freq = 44100, nchannels = 2, fmt = AUD_FMT_S16, endianness = 0}, read_handler = 0x4bdb30 <protocol_version>, read_handler_expect = 12, modifiers_state = '\0' <repeats 255 times>, zlib = {capacity = 0, offset = 0, buffer = 0x0}, zlib_tmp = {capacity = 0, offset = 0, buffer = 0x0}, zlib_stream = {{next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}, next = 0x0}
  (gdb) p vd->clients.next
  $3 = (VncState *) 0x0

So the first client in vd is fine, and the next pointer is set to zero, not 0x83f0. Some sort of race where a client disconnects and is removed from the client list while the vnc_refresh() loop is iterating over it, maybe?

Cheers, Chris.
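The suspected failure mode, a list node being removed while a loop is still walking the list, can be modelled without any qemu code. The Python sketch below is purely illustrative: the class and function names are hypothetical, it does not claim to reproduce what any particular qemu commit does, and Python cannot show a real use-after-free, so removal simply clears the node's next pointer and marks it freed.

```python
class Client:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.freed = False


class Display:
    def __init__(self, names):
        # Build a singly linked list of clients in the given order.
        self.clients = None
        for name in reversed(names):
            node = Client(name)
            node.next = self.clients
            self.clients = node

    def remove(self, victim):
        """Unlink victim and poison it, standing in for free()."""
        prev, node = None, self.clients
        while node is not None:
            if node is victim:
                if prev is None:
                    self.clients = node.next
                else:
                    prev.next = node.next
                node.next = None
                node.freed = True
                return
            prev, node = node, node.next


def refresh(display, disconnecting):
    """Visit every client; a visit may remove the client being visited."""
    visited = []
    node = display.clients
    while node is not None:
        following = node.next      # read before the visit can free node
        visited.append(node.name)
        if node.name in disconnecting:
            display.remove(node)
        node = following           # never touch node.next after the visit
    return visited


display = Display(["a", "b", "c"])
print(refresh(display, {"b"}))     # all three visited, "b" removed on the way
print(refresh(display, set()))     # only the survivors remain
```

Saving the successor first is only safe when a visit can remove the current node and nothing else; if a visit could also remove the following node, a different scheme such as deferred removal would be needed.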
-- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp> writes:

> We use JGroups (Java library) for reliable multicast communication in
> our cluster manager daemon. We don't worry about the performance much
> since the cluster manager daemon is not involved in the I/O path. We
> might think about moving to corosync if it is more stable than
> JGroups.

I'd love to see this running on top of corosync too. Corosync is a well-tested, stable cluster manager, and doesn't have the JVM dependency of JGroups, so it feels more suitable for building 'thin virtualisation fabrics'.

Cheers,

Chris.
Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Chris Webb <ch...@arachsys.com> writes:

> MORITA Kazutaka <morita.kazut...@lab.ntt.co.jp> writes:
>
> > We use JGroups (Java library) for reliable multicast communication in
> > our cluster manager daemon. We don't worry about the performance much
> > since the cluster manager daemon is not involved in the I/O path. We
> > might think about moving to corosync if it is more stable than
> > JGroups.
>
> I'd love to see this running on top of corosync too. Corosync is a well
> tested, stable cluster manager, and doesn't have the JVM dependency of
> jgroups so feels more suitable for building 'thin virtualisation
> fabrics'.

Very exciting project, by the way!

Best wishes,

Chris.
Re: [Qemu-devel] Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
Javier Guerra <jav...@guerrag.com> writes:

> i'd just want to add my '+1 votes' on both getting rid of JVM
> dependency and using block devices (usually LVM) instead of ext3/btrfs

If the chunks into which the virtual drives are split are quite small (say the 64MB used by Hadoop), LVM may be a less appropriate choice. It doesn't cope well with very large numbers of very small logical volumes.

Best wishes,

Chris.
Re: qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
Chris Webb ch...@arachsys.com writes: With the following applied, VNC connections and disconnections still work correctly, so it doesn't horribly break anything, but I can't immediately confirm whether it will cure the rare segfaults as I haven't yet found a rapid way of reproducing the crashes other than by waiting for one. Just to follow up on this: the backported patch has cured the vast majority of VNC crashes we've been seeing on 0.10.6, although I've still seen this earlier today: Core was generated by `qemu-kvm -m 512 -smp 1 -uuid d6f2cb13-7421-4baa-a978-eda9bec9d075 -pidfile /var'. Program terminated with signal 11, Segmentation fault. [New process 16847] [New process 16855] (gdb) bt #0 0x7fe42e9c6cb1 in memcpy () from /lib/libc.so.6 #1 0x004917e4 in vnc_write (vs=0x31a7f50, data=0x7fffe3a19230, len=2) at vnc.c:323 #2 0x004919bf in vnc_write_u16 (vs=0x7fe2f8cae023, value=value optimized out) at vnc.c:1035 #3 0x00491bf3 in vnc_framebuffer_update (vs=0x7fe2f8cae023, x=-475950544, y=2, w=16385, h=1, encoding=6) at vnc.c:286 #4 0x00496660 in send_framebuffer_update (vs=0x7fe2f8cae023, x=-475950544, y=196, w=208, h=1) at vnc.c:598 #5 0x00496f65 in vnc_update_client (opaque=value optimized out) at vnc.c:754 #6 0x0040822a in main_loop_wait (timeout=value optimized out) at /packages/qemu-kvm+vncfix/src-nUlCId/vl.c:1240 #7 0x0051753a in kvm_main_loop () at /packages/qemu-kvm+vncfix/src-nUlCId/qemu-kvm.c:596 #8 0x0040c8a5 in main (argc=value optimized out, argv=value optimized out, envp=value optimized out) at /packages/qemu-kvm+vncfix/src-nUlCId/vl.c:3850 (gdb) f 1 #1 0x004917e4 in vnc_write (vs=0x31a7f50, data=0x7fffe3a19230, len=2) at vnc.c:323 323 memcpy(buffer-buffer + buffer-offset, data, len); (gdb) f 1 #1 0x004917e4 in vnc_write (vs=0x31a7f50, data=0x7fffe3a19230, len=2) at vnc.c:323 323 memcpy(buffer-buffer + buffer-offset, data, len); (gdb) p *vs $1 = {timer = 0x2b90b20, csock = 18, ds = 0x28a1a20, vd = 0x28b0fc0, need_update = 1, dirty_row = {{0, 0, 0, 0} 
repeats 197 times, {65535, 262128, 0, 0}, {4294967295, 1, 0, 0}, {4294967288, 262143, 0, 0}, {4294443008, 262143, 0, 0}, {131071, 262128, 0, 0}, {4294967295, 1, 0, 0}, {4294967292, 262143, 0, 0}, {4294443008, 262143, 0, 0}, {131071, 262136, 0, 0}, {4294967295, 1, 0, 0}, {4294967292, 262143, 0, 0}, {4294443008, 262143, 0, 0}, { 131071, 262136, 0, 0}, {4294967295, 1, 0, 0}, {4294967292, 262143, 0, 0}, {4294705152, 262143, 0, 0}, {131071, 262136, 0, 0}, {4294967295, 1, 0, 0}, {4294967294, 262143, 0, 0}, {4294705152, 262143, 0, 0}, {131071, 262140, 0, 0}, {4294967295, 1, 0, 0}, {4294967294, 262143, 0, 0}, {4294836224, 262143, 0, 0}, {131071, 262140, 0, 0}, { 4294967295, 1, 0, 0}, {4294967294, 262143, 0, 0}, {4294836224, 262143, 0, 0}, {131071, 262140, 0, 0}, { 4294967295, 1, 0, 0}, {4294967295, 262143, 0, 0}, {4294836224, 262143, 0, 0}, {131071, 262142, 0, 0}, { 4294967295, 1, 0, 0}, {4294967295, 262143, 0, 0}, {4294901760, 262143, 0, 0}, {131071, 262142, 0, 0}, { 4294967295, 1, 0, 0}, {4294967295, 262143, 0, 0}, {4294901760, 262143, 0, 0}, {131071, 262142, 0, 0}, { 4294967295, 131073, 0, 0}, {4294967295, 262143, 0, 0}, {4294901760, 262143, 0, 0}, {131071, 262143, 0, 0}, { 4294967295, 131073, 0, 0}, {4294967295, 262143, 0, 0}, {4294934528, 262143, 0, 0}, {131071, 262143, 0, 0}, { 4294967295, 131075, 0, 0}, {4294967295, 262143, 0, 0}, {4294934528, 262143, 0, 0}, {131071, 262143, 0, 0}, { 4294967295, 196611, 0, 0}, {4294967295, 262143, 0, 0}, {4294934528, 262143, 0, 0}, {2147614719, 262143, 0, 0}, { 4294967295, 196611, 0, 0}, {4294967295, 262143, 0, 0}, {4294950912, 262143, 0, 0}, {2147614719, 262143, 0, 0}, { 4294967295, 196611, 0, 0}, {4294967295, 262143, 0, 0}, {4294950912, 262143, 0, 0}, {2147614719, 262143, 0, 0}, { 4294967295, 229379, 0, 0}, {4294967295, 262143, 0, 0}, {4294950912, 262143, 0, 0}, {3221356543, 262143, 0, 0}, { 4294967295, 229379, 0, 0}, {4294967295, 262143, 0, 0}, {4294950912, 262143, 0, 0}, {3221356543, 262143, 0, 0}, { 4294967295, 229377, 0, 0}, 
{4294967295, 262143, 0, 0}, {4294959104, 262143, 0, 0}, {3221356543, 262143, 0, 0}, { 4294967295, 245761, 0, 0}, {4294967295, 262143, 0, 0}, {4294959104, 262143, 0, 0}, {3758227455, 262143, 0, 0}, { 4294967295, 245761, 0, 0}, {4294967295, 262143, 0, 0}, {4294959104, 262143, 0, 0}, {3758227455, 262143, 0, 0}, { 4294967295, 245761, 0, 0}, {4294967295, 262143, 0, 0}, {4294963200, 262143, 0, 0}, {3758227455, 262143, 0, 0}, { 4294967295, 253953, 0, 0}, {4294967295, 262143, 0, 0}, {4294963200, 262143, 0, 0}, {4026662911, 262143, 0, 0}, { 4294967295, 253953, 0, 0}, {4294967295, 262143, 0, 0}, {4294963200, 262143, 0, 0}, {4026662911, 262143, 0, 0}, { 4294967295, 253953, 0, 0
Re: qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
Avi Kivity a...@redhat.com writes: master branch has a patch that fixes a use-after-free when disconnecting. Unfortunately it doesn't port cleanly to stable-0.10. I've collected quite a few more core dumps from segfaults of client virtual machines now, all of which are VNC related and could quite plausibly be use of a VncState after it has been freed. I looked at Gerd's patch [198a00: vnc: rework VncState release workflow] and have taken a stab at the equivalent patch for stable qemu qemu-kvm 0.10. With the following applied, VNC connections and disconnections still work correctly, so it doesn't horribly break anything, but I can't immediately confirm whether it will cure the rare segfaults as I haven't yet found a rapid way of reproducing the crashes other than by waiting for one. diff --git a/vnc.c b/vnc.c --- a/vnc.c +++ b/vnc.c @@ -200,6 +200,8 @@ static void vnc_write_u8(VncState *vs, uint8_t value); static void vnc_flush(VncState *vs); static void vnc_update_client(void *opaque); +static void vnc_disconnect_start(VncState *vs); +static void vnc_disconnect_finish(VncState *vs); static void vnc_client_read(void *opaque); static void vnc_colordepth(VncState *vs); @@ -633,8 +635,6 @@ static void vnc_copy(VncState *vs, int src_x, int src_y, int dst_x, int dst_y, int w, int h) { -vnc_update_client(vs); - vnc_write_u8(vs, 0); /* msg id */ vnc_write_u8(vs, 0); vnc_write_u16(vs, 1); /* number of rects */ @@ -647,13 +647,21 @@ static void vnc_dpy_copy(DisplayState *ds, int src_x, int src_y, int dst_x, int dst_y, int w, int h) { VncDisplay *vd = ds-opaque; -VncState *vs = vd-clients; -while (vs != NULL) { +VncState *vs, *vn; + +for (vs = vd-clients; vs != NULL; vs = vn) { +vn = vs-next; +if (vnc_has_feature(vs, VNC_FEATURE_COPYRECT)) { +vnc_update_client(vs); +/* vs might be free()ed here */ +} +} + +for (vs = vd-clients; vs != NULL; vs = vs-next) { if (vnc_has_feature(vs, VNC_FEATURE_COPYRECT)) vnc_copy(vs, src_x, src_y, dst_x, dst_y, w, h); else /* TODO */ 
vnc_update(vs, dst_x, dst_y, w, h); -vs = vs-next; } } @@ -763,6 +771,8 @@ if (vs-csock != -1) { qemu_mod_timer(vs-timer, qemu_get_clock(rt_clock) + VNC_REFRESH_INTERVAL); +} else { +vnc_disconnect_finish(vs); } } @@ -832,6 +842,47 @@ } } +static void vnc_disconnect_start(VncState *vs) +{ +if (vs-csock == -1) +return; +qemu_set_fd_handler2(vs-csock, NULL, NULL, NULL, NULL); +closesocket(vs-csock); +vs-csock = -1; +} + +static void vnc_disconnect_finish(VncState *vs) +{ +qemu_del_timer(vs-timer); +qemu_free_timer(vs-timer); +if (vs-input.buffer) qemu_free(vs-input.buffer); +if (vs-output.buffer) qemu_free(vs-output.buffer); +#ifdef CONFIG_VNC_TLS +if (vs-tls_session) { +gnutls_deinit(vs-tls_session); +vs-tls_session = NULL; +} +#endif /* CONFIG_VNC_TLS */ +audio_del(vs); + +VncState *p, *parent = NULL; +for (p = vs-vd-clients; p != NULL; p = p-next) { +if (p == vs) { +if (parent) +parent-next = p-next; +else +vs-vd-clients = p-next; +break; +} +parent = p; +} +if (!vs-vd-clients) +dcl-idle = 1; + +qemu_free(vs-old_data); +qemu_free(vs); +} + static int vnc_client_io_error(VncState *vs, int ret, int last_errno) { if (ret == 0 || ret == -1) { @@ -849,36 +900,7 @@ } VNC_DEBUG(Closing down client sock %d %d\n, ret, ret 0 ? 
last_errno : 0); - qemu_set_fd_handler2(vs-csock, NULL, NULL, NULL, NULL); - closesocket(vs-csock); -qemu_del_timer(vs-timer); -qemu_free_timer(vs-timer); -if (vs-input.buffer) qemu_free(vs-input.buffer); -if (vs-output.buffer) qemu_free(vs-output.buffer); -#ifdef CONFIG_VNC_TLS - if (vs-tls_session) { - gnutls_deinit(vs-tls_session); - vs-tls_session = NULL; - } -#endif /* CONFIG_VNC_TLS */ -audio_del(vs); - -VncState *p, *parent = NULL; -for (p = vs-vd-clients; p != NULL; p = p-next) { -if (p == vs) { -if (parent) -parent-next = p-next; -else -vs-vd-clients = p-next; -break; -} -parent = p; -} -if (!vs-vd-clients) -dcl-idle = 1; - -qemu_free(vs-old_data); -qemu_free(vs); +vnc_disconnect_start(vs); return 0; } @@ -887,7 +909,8 @@ static void vnc_client_error(VncState *vs) { -vnc_client_io_error(vs, -1, EINVAL); +VNC_DEBUG(Closing down client sock: protocol error\n); +vnc_disconnect_start(vs); } static void vnc_client_write(void *opaque) @@ -947,8 +970,11 @@ #endif /* CONFIG_VNC_TLS */
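The central idea of the backport above is to split teardown into two phases: the I/O error paths only close and mark the socket, and the actual unlinking and freeing happens later from the periodic update callback. The following is a minimal sketch of that pattern, assuming hypothetical `Conn` and `Server` types and invented function names. It is not the QEMU code; the real patch also removes timers, frees buffers, and tears down TLS state.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical connection record and owner; not QEMU's types. */
typedef struct Conn {
    int fd;                 /* -1 once disconnect has started */
    struct Conn *next;
} Conn;

typedef struct Server {
    Conn *clients;
    int freed;              /* number of completed teardowns, for illustration */
} Server;

/* Push a new connection onto the front of the server's client list. */
static Conn *conn_add(Server *s, int fd)
{
    Conn *c = malloc(sizeof(*c));
    assert(c != NULL);
    c->fd = fd;
    c->next = s->clients;
    s->clients = c;
    return c;
}

/* Phase 1: mark the connection dead. Safe to call repeatedly and from
 * any I/O path, because nothing is freed here. A real implementation
 * would also unregister handlers and close the socket. */
static void disconnect_start(Conn *c)
{
    if (c->fd == -1)
        return;
    c->fd = -1;
}

/* Phase 2: unlink and free. Called only from the periodic callback,
 * where no other code is still holding the pointer. */
static void disconnect_finish(Server *s, Conn *c)
{
    Conn **pp;
    for (pp = &s->clients; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == c) {
            *pp = c->next;
            break;
        }
    }
    free(c);
    s->freed++;
}

/* Periodic callback: returns 1 if the connection is still alive (and
 * would be re-armed), 0 if it was torn down. */
static int on_timer(Server *s, Conn *c)
{
    if (c->fd != -1)
        return 1;
    disconnect_finish(s, c);
    return 0;
}
```

The design choice is that only one code path ever frees a connection, so callers further up the stack that still hold the pointer after an I/O error see a closed descriptor rather than freed memory.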
Re: qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
Chris Webb ch...@arachsys.com writes: Avi Kivity a...@redhat.com writes: I understand it's hard, but it's nearly impossible to work out the problem from so little data, so please do make the effort to obtain dumps. We're trying for this at the moment, but since we can't change the rlimit for the running qemu-kvm processes (?), we'll have to wait until one of the new ones dies, which may take some time. I'll follow up when I do have something. We've been lucky and relatively quickly got a core dump from one of the new qemu-kvms with the non-zero core file rlimit. A backtrace looks like this: (gdb) bt #0 0x004068f7 in qemu_mod_timer (ts=0x30d1f30, expire_time=430489) at /packages/qemu-kvm/src-f39tF1/vl.c:1161 #1 0x00495dd5 in vnc_update_client (opaque=value optimized out) at vnc.c:765 #2 0x004081da in main_loop_wait (timeout=value optimized out) at /packages/qemu-kvm/src-f39tF1/vl.c:1240 #3 0x0051613a in kvm_main_loop () at /packages/qemu-kvm/src-f39tF1/qemu-kvm.c:596 #4 0x0040c7b7 in main (argc=value optimized out, argv=value optimized out, envp=value optimized out) at /packages/qemu-kvm/src-f39tF1/vl.c:3850 The segfault appears to be a null pointer dereference. 
ts-clock is NULL and line 1161 uses ts-clock-type: (gdb) p ts $4 = (QEMUTimer *) 0x30d1f30 (gdb) p ts-clock $5 = (QEMUClock *) 0x0 The VncState in vnc_update_client is as follows: (gdb) f 1 #1 0x00495dd5 in vnc_update_client (opaque=value optimized out) at vnc.c:765 765 qemu_mod_timer(vs-timer, qemu_get_clock(rt_clock) + VNC_REFRESH_INTERVAL); (gdb) p *vs $12 = {timer = 0x30d1f30, csock = -986235208, ds = 0x0, vd = 0x0, need_update = 1, dirty_row = {{0, 0, 4294967295, 4294967295} repeats 768 times, {4294967295, 4294967295, 4294967295, 4294967295} repeats 1280 times}, old_data = 0x7f9b8276f010 Address 0x7f9b8276f010 out of bounds, features = 98, absolute = 1, last_x = -1, last_y = -1, vnc_encoding = 5, tight_quality = 6 '\006', tight_compression = 1 '\001', major = 3, minor = 3, challenge = \032\314i\257\302t1(\320\312\263\024pH\226, output = {capacity = 1545078, offset = 684, buffer = 0x3107860 }, input = {capacity = 5120, offset = 0, buffer = 0x3106450 \020\220(\003}, write_pixels = 0x490b50 vnc_write_pixels_generic, send_hextile_tile = 0x492030 send_hextile_tile_generic_32, clientds = {flags = 0 '\0', width = 800, height = 600, linesize = 3200, data = 0x7f9b82944010 Address 0x7f9b82944010 out of bounds, pf = {bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004', depth = 24 '\030', rmask = 0, gmask = 0, bmask = 0, amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\0', ashift = 24 '\030', rmax = 255 '\377', gmax = 255 '\377', bmax = 255 '\377', amax = 255 '\377', rbits = 8 '\b', gbits = 8 '\b', bbits = 8 '\b', abits = 8 '\b'}}, serverds = { flags = 2 '\002', width = 1024, height = 768, linesize = 4096, data = 0x7f9b8246e010 , pf = { bits_per_pixel = 32 ' ', bytes_per_pixel = 4 '\004', depth = 24 '\030', rmask = 16711680, gmask = 65280, bmask = 255, amask = 0, rshift = 16 '\020', gshift = 8 '\b', bshift = 0 '\0', ashift = 24 '\030', rmax = 255 '\377', gmax = 255 '\377', bmax = 255 '\377', amax = 255 '\377', rbits = 8 '\b', gbits = 8 '\b', bbits = 8 '\b', 
abits = 8 '\b'}}, audio_cap = 0x0, as = {freq = 44100, nchannels = 2, fmt = AUD_FMT_S16, endianness = 0}, read_handler = 0x494b40 protocol_client_msg, read_handler_expect = 1, modifiers_state = '\0' repeats 255 times, zlib = {capacity = 0, offset = 0, buffer = 0x0}, zlib_tmp = { capacity = 0, offset = 0, buffer = 0x0}, zlib_stream = {{next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}, {next_in = 0x0, avail_in = 0, total_in = 0, next_out = 0x0, avail_out = 0, total_out = 0, msg = 0x0, state = 0x0, zalloc = 0, zfree = 0, opaque = 0x0, data_type = 0, adler = 0, reserved = 0}}, next = 0x0} I'm afraid I only have one of these, so I can't say whether the other segfaults were exactly the same or different (other than knowing the source line matched), but I'll keep my eye out for more core dumps. qemu-kvm command line for this guest would have been qemu-kvm -m 1024 -smp 1
Re: qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
Chris Webb <ch...@arachsys.com> writes:

> The segfault appears to be a null pointer dereference. ts->clock is NULL
> and line 1161 uses ts->clock->type:
>
>   (gdb) p ts
>   $4 = (QEMUTimer *) 0x30d1f30
>   (gdb) p ts->clock
>   $5 = (QEMUClock *) 0x0

Sorry, meant to paste this too:

  (gdb) p *ts
  $1 = {clock = 0x0, expire_time = 49, cb = 0x2b63630, opaque = 0x30fe000, next = 0x495b40}

Cheers,

Chris.
Re: qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
Avi Kivity <a...@redhat.com> writes:

> csock looks corrupted, should be -1 or an fd. Was a vnc client
> connected? Was the guest playing with the display resolution?

Yes, I think in this case there was a vncviewer connected, and the guest had started booting up into Windows, which changes the resolution a couple of times.

Best wishes,

Chris.
Re: qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
Chris Webb <ch...@arachsys.com> writes:

> Avi Kivity <a...@redhat.com> writes:
>
> > csock looks corrupted, should be -1 or an fd. Was a vnc client
> > connected? Was the guest playing with the display resolution?
>
> Yes, I think in this case there was a vncviewer connected, and the guest
> had started booting up into windows, which changes the resolution a
> couple of times.

Also, I think the vncviewer might actually have been disconnecting at about the time the segfault happened.

Cheers,

Chris.
qemu-kvm segfaults in qemu_del_timer (0.10.5 and 0.10.6)
I have a couple of clusters hosting qemu-kvm virtual machines. One of these clusters consists of dual quad-core Xeon E5420s (vmx), the other consists of dual quad-core Barcelona Opterons (svm), and both are running x86-64 Linux 2.6.30.4 with the kvm modules included with the upstream kernel compiled in.

Running qemu-kvm 0.10.5, I was seeing occasional segfaults from the virtual machines, perhaps two or three a day across each cluster. The guest OS didn't appear to be a factor, as both Linux and Windows VMs have crashed. I then switched to the recently released qemu-kvm 0.10.6, and am still seeing these segfaults.

It's very hard for me to arrange for core dumps on these live clusters, and the segfaults are hard to reproduce on test machines because they are rare. However, I have unstripped copies of the respective binaries and have used gdb to translate the segfault ip into a source file and line number, which I hope might be useful.

On both clusters and for each version of qemu-kvm, segfaults are happening at lines #1161 and #1163 of vl.c:

  [...]
  /* stop a timer, but do not dealloc it */
  void qemu_del_timer(QEMUTimer *ts)
  {
      QEMUTimer **pt, *t;

      /* NOTE: this code must be signal safe because
         qemu_timer_expired() can be called from a signal. */
  HERE ==>    pt = &active_timers[ts->clock->type];
      for(;;) {
  HERE ==>        t = *pt;
          if (!t)
              break;
          if (t == ts) {
              *pt = t->next;
              break;
          }
          pt = &t->next;
      }
  }
  [...]

For qemu-kvm 0.10.5, I have large numbers of segfaults in both locations. For qemu-kvm 0.10.6, my sample is much smaller, but the segfaults I have are all at line #1161, not #1163.

Final data-point: prior to the 0.10.5 upgrade, we had been successfully running a (fairly old) kvm-83 userspace without experiencing this segfault problem.

Any help fixing this would be gratefully received!

Cheers,

Chris.

PS One other place I have seen a segfault in 0.10.6 since we rolled it out is at line #141 of hw/scsi-disk.c, but this has only happened once---very rare compared to the problem I describe above.
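For readers unfamiliar with the pointer-to-pointer idiom used by the loop quoted above, here is a minimal self-contained sketch on a hypothetical `Timer` type (not QEMU's `QEMUTimer`). It unlinks a node without freeing it, in the same shape as the quoted function. The type, the function name, and the return value are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical timer node; not QEMU's QEMUTimer. */
typedef struct Timer {
    int id;
    struct Timer *next;
} Timer;

/* Unlink ts from the list rooted at *head without freeing it.
 * pt always points at the link that leads to the node being examined,
 * so removing the head and removing an interior node are the same
 * operation. Returns 1 if ts was found, 0 otherwise. */
static int timer_unlink(Timer **head, Timer *ts)
{
    Timer **pt = head;

    assert(head != NULL);
    for (;;) {
        Timer *t = *pt;
        if (t == NULL)
            return 0;
        if (t == ts) {
            *pt = t->next;
            return 1;
        }
        pt = &t->next;
    }
}
```

The loop itself only dereferences list links, so it is safe as long as the list is intact. It offers no protection if `ts` itself is a dangling pointer, which is consistent with a crash on the very first line where `ts` is dereferenced.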
Re: Trouble understanding net config options
Michael Jinks <michael.ji...@gmail.com> writes:

> How do I make a guest use a specific tap? Quoting from my initial post,
> my -net options are:
>
>   -net nic -net tap,name=tap11 -net nic -net tap,name=tap12

You want

  -net nic,vlan=0 -net tap,vlan=0,ifname=tap11 -net nic,vlan=1 -net tap,vlan=1,ifname=tap12

to get the effect that (I think) you're looking for: one nic connected to tap11 using vlan0 and one nic connected to tap12 using vlan1.

Without the vlan parameters, everything's on vlan0, so you get two nics and two tap interfaces all connected together inside qemu on a single virtual switch.

Best wishes,

Chris.
Two VNC patches
I sent this pair of VNC-related patches to the qemu-devel list a couple of weeks back and I'm not sure whether they've got lost in the cracks or were in some way not acceptable and need fixing up. The first one is a straightforward bug-fix, and the second is a trivial convenience feature in the monitor which I imagine ought to be fairly uncontroversial? Cheers, Chris. ---BeginMessage--- Fix off-by-one bug limiting VNC passwords to 7 characters instead of 8 monitor_readline expects buf_size to include the terminating \0, but do_change_vnc in monitor.c calls it as though it doesn't. The other site where monitor_readline reads a password (in vl.c) passes the buffer length correctly. Signed-off-by: Chris Webb [EMAIL PROTECTED] --- monitor.c |3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/monitor.c b/monitor.c index 22360fc..a252838 100644 --- a/monitor.c +++ b/monitor.c @@ -433,8 +433,7 @@ static void do_change_vnc(const char *target) if (strcmp(target, passwd) == 0 || strcmp(target, password) == 0) { char password[9]; - monitor_readline(Password: , 1, password, sizeof(password)-1); - password[sizeof(password)-1] = '\0'; + monitor_readline(Password: , 1, password, sizeof(password)); if (vnc_display_password(NULL, password) 0) term_printf(could not set VNC server password\n); } else { ---End Message--- ---BeginMessage--- Accept password as an argument to 'change vnc password' monitor command This allows easier use of the change vnc password monitor command from management scripts, without having to implement expect(1)-like behaviour. 
Signed-off-by: Chris Webb [EMAIL PROTECTED] --- monitor.c | 14 +- qemu-doc.texi |8 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/monitor.c b/monitor.c index a252838..f6a2783 100644 --- a/monitor.c +++ b/monitor.c @@ -428,12 +428,16 @@ static void do_change_block(const char *device, const char *filename, const char qemu_key_check(bs, filename); } -static void do_change_vnc(const char *target) +static void do_change_vnc(const char *target, const char *arg) { if (strcmp(target, passwd) == 0 || strcmp(target, password) == 0) { char password[9]; - monitor_readline(Password: , 1, password, sizeof(password)); + if (arg) { + strncpy(password, arg, sizeof(password)); + password[sizeof(password) - 1] = '\0'; + } else + monitor_readline(Password: , 1, password, sizeof(password)); if (vnc_display_password(NULL, password) 0) term_printf(could not set VNC server password\n); } else { @@ -442,12 +446,12 @@ static void do_change_vnc(const char *target) } } -static void do_change(const char *device, const char *target, const char *fmt) +static void do_change(const char *device, const char *target, const char *arg) { if (strcmp(device, vnc) == 0) { - do_change_vnc(target); + do_change_vnc(target, arg); } else { - do_change_block(device, target, fmt); + do_change_block(device, target, arg); } } diff --git a/qemu-doc.texi b/qemu-doc.texi index 1735d92..ca3b181 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -1233,11 +1233,11 @@ and @var{options} are described at @ref{sec_invocation}. eg (qemu) change vnc localhost:1 @end example [EMAIL PROTECTED] change vnc password [EMAIL PROTECTED] change vnc password [EMAIL PROTECTED] -Change the password associated with the VNC server. The monitor will prompt for -the new password to be entered. VNC passwords are only significant upto 8 letters. -eg. +Change the password associated with the VNC server. If the new password is not +supplied, the monitor will prompt for it to be entered. 
VNC passwords are only +significant up to 8 letters. eg @example (qemu) change vnc password ---End Message---
[RESEND] [PATCH v2] Accept password as an argument to 'change vnc password'
Accept password as an argument to 'change vnc password' monitor command This allows easier use of the change vnc password monitor command from management scripts, without having to implement expect(1)-like behaviour. Signed-off-by: Chris Webb [EMAIL PROTECTED] --- monitor.c | 14 +- qemu-doc.texi |8 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/monitor.c b/monitor.c index a252838..f6a2783 100644 --- a/monitor.c +++ b/monitor.c @@ -428,12 +428,16 @@ static void do_change_block(const char *device, const char *filename, const char qemu_key_check(bs, filename); } -static void do_change_vnc(const char *target) +static void do_change_vnc(const char *target, const char *arg) { if (strcmp(target, passwd) == 0 || strcmp(target, password) == 0) { char password[9]; - monitor_readline(Password: , 1, password, sizeof(password)); + if (arg) { + strncpy(password, arg, sizeof(password)); + password[sizeof(password) - 1] = '\0'; + } else + monitor_readline(Password: , 1, password, sizeof(password)); if (vnc_display_password(NULL, password) 0) term_printf(could not set VNC server password\n); } else { @@ -442,12 +446,12 @@ static void do_change_vnc(const char *target) } } -static void do_change(const char *device, const char *target, const char *fmt) +static void do_change(const char *device, const char *target, const char *arg) { if (strcmp(device, vnc) == 0) { - do_change_vnc(target); + do_change_vnc(target, arg); } else { - do_change_block(device, target, fmt); + do_change_block(device, target, arg); } } diff --git a/qemu-doc.texi b/qemu-doc.texi index 1735d92..ca3b181 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -1233,11 +1233,11 @@ and @var{options} are described at @ref{sec_invocation}. eg (qemu) change vnc localhost:1 @end example [EMAIL PROTECTED] change vnc password [EMAIL PROTECTED] change vnc password [EMAIL PROTECTED] -Change the password associated with the VNC server. The monitor will prompt for -the new password to be entered. 
VNC passwords are only significant upto 8 letters. -eg. +Change the password associated with the VNC server. If the new password is not +supplied, the monitor will prompt for it to be entered. VNC passwords are only +significant up to 8 letters. eg @example (qemu) change vnc password -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Fix off-by-one bug limiting VNC passwords to 7 chars
Thiemo Seufer [EMAIL PROTECTED] writes:

> Chris Webb wrote:
> [...]
> > -        monitor_readline("Password: ", 1, password, sizeof(password)-1);
> > +        monitor_readline("Password: ", 1, password, sizeof(password));
> >          password[sizeof(password)-1] = '\0';
>
> The next line can go as well, the string is already NULL terminated.

You're quite right. I'll update the two patches to reflect this change.

Cheers,

Chris.
[PATCH v2] Accept password as an argument to 'change vnc password'
Accept password as an argument to 'change vnc password' monitor command This allows easier use of the change vnc password monitor command from management scripts, without having to implement expect(1)-like behaviour. Signed-off-by: Chris Webb [EMAIL PROTECTED] --- monitor.c | 14 +- qemu-doc.texi |8 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/monitor.c b/monitor.c index a252838..f6a2783 100644 --- a/monitor.c +++ b/monitor.c @@ -428,12 +428,16 @@ static void do_change_block(const char *device, const char *filename, const char qemu_key_check(bs, filename); } -static void do_change_vnc(const char *target) +static void do_change_vnc(const char *target, const char *arg) { if (strcmp(target, passwd) == 0 || strcmp(target, password) == 0) { char password[9]; - monitor_readline(Password: , 1, password, sizeof(password)); + if (arg) { + strncpy(password, arg, sizeof(password)); + password[sizeof(password) - 1] = '\0'; + } else + monitor_readline(Password: , 1, password, sizeof(password)); if (vnc_display_password(NULL, password) 0) term_printf(could not set VNC server password\n); } else { @@ -442,12 +446,12 @@ static void do_change_vnc(const char *target) } } -static void do_change(const char *device, const char *target, const char *fmt) +static void do_change(const char *device, const char *target, const char *arg) { if (strcmp(device, vnc) == 0) { - do_change_vnc(target); + do_change_vnc(target, arg); } else { - do_change_block(device, target, fmt); + do_change_block(device, target, arg); } } diff --git a/qemu-doc.texi b/qemu-doc.texi index 1735d92..ca3b181 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -1233,11 +1233,11 @@ and @var{options} are described at @ref{sec_invocation}. eg (qemu) change vnc localhost:1 @end example [EMAIL PROTECTED] change vnc password [EMAIL PROTECTED] change vnc password [EMAIL PROTECTED] -Change the password associated with the VNC server. The monitor will prompt for -the new password to be entered. 
VNC passwords are only significant upto 8 letters. -eg. +Change the password associated with the VNC server. If the new password is not +supplied, the monitor will prompt for it to be entered. VNC passwords are only +significant up to 8 letters. eg @example (qemu) change vnc password -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
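The v2 patch keeps an explicit terminator after `strncpy` on the argument path, and that is needed: `strncpy` does not write a terminating `\0` when the source is at least as long as the size it is given. A minimal sketch of the same copy-and-truncate step follows. `VNC_PASSWORD_MAX` and `copy_password` are hypothetical names introduced here, not identifiers from monitor.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical constant: the thread states that only 8 characters of a
 * VNC password are significant. */
#define VNC_PASSWORD_MAX 8

/* Copy arg into out, truncating to VNC_PASSWORD_MAX characters.
 * strncpy leaves out unterminated when arg has VNC_PASSWORD_MAX + 1 or
 * more characters, so the last byte is always set explicitly. */
static void copy_password(char out[VNC_PASSWORD_MAX + 1], const char *arg)
{
    assert(arg != NULL);
    strncpy(out, arg, VNC_PASSWORD_MAX + 1);
    out[VNC_PASSWORD_MAX] = '\0';
}
```

With a 9-byte buffer this yields the first 8 characters of any longer argument, matching the documented "significant up to 8 letters" behaviour.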
[PATCH] Fix off-by-one bug limiting VNC passwords to 7 chars
Fix off-by-one bug limiting VNC passwords to 7 characters instead of 8

monitor_readline expects buf_size to include the terminating \0, but do_change_vnc in monitor.c calls it as though it doesn't. The other site where monitor_readline reads a password (in vl.c) passes the buffer length correctly.

Signed-off-by: Chris Webb [EMAIL PROTECTED]
---
 monitor.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/monitor.c b/monitor.c
index 22360fc..6ae5729 100644
--- a/monitor.c
+++ b/monitor.c
@@ -433,7 +433,7 @@ static void do_change_vnc(const char *target)
     if (strcmp(target, "passwd") == 0 ||
         strcmp(target, "password") == 0) {
         char password[9];
-        monitor_readline("Password: ", 1, password, sizeof(password)-1);
+        monitor_readline("Password: ", 1, password, sizeof(password));
         password[sizeof(password)-1] = '\0';
         if (vnc_display_password(NULL, password) < 0)
             term_printf("could not set VNC server password\n");
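The off-by-one is easy to reproduce with any reader that follows the convention described in the commit message, where the size argument includes room for the terminator. The sketch below uses a hypothetical `read_line` function that takes its input from a string rather than the monitor. It is not `monitor_readline`, only a stand-in that follows the same size convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reader: stores at most buf_size - 1 characters from
 * input (stopping at newline or end of string) and always terminates
 * buf. Returns the number of characters stored. */
static size_t read_line(const char *input, char *buf, size_t buf_size)
{
    size_t n = 0;

    assert(buf_size > 0);
    while (input[n] != '\0' && input[n] != '\n' && n + 1 < buf_size) {
        buf[n] = input[n];
        n++;
    }
    buf[n] = '\0';
    return n;
}
```

With `char password[9]`, passing `sizeof(password) - 1` leaves room for only 7 characters plus the terminator, while passing `sizeof(password)` allows the full 8.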
[PATCH] Accept password as an argument to 'change vnc password'
Accept password as an argument to 'change vnc password' monitor command

This allows easier use of the change vnc password monitor command from
management scripts, without having to implement expect(1)-like behaviour.

Signed-off-by: Chris Webb [EMAIL PROTECTED]
---
 monitor.c     |   13 ++++++++-----
 qemu-doc.texi |    8 ++++----
 2 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/monitor.c b/monitor.c
index 22360fc..8ac73c1 100644
--- a/monitor.c
+++ b/monitor.c
@@ -428,12 +428,15 @@ static void do_change_block(const char *device, const char *filename, const char
     qemu_key_check(bs, filename);
 }
 
-static void do_change_vnc(const char *target)
+static void do_change_vnc(const char *target, const char *arg)
 {
     if (strcmp(target, "passwd") == 0 ||
         strcmp(target, "password") == 0) {
         char password[9];
-        monitor_readline("Password: ", 1, password, sizeof(password));
+        if (arg)
+            strncpy(password, arg, sizeof(password));
+        else
+            monitor_readline("Password: ", 1, password, sizeof(password));
         password[sizeof(password)-1] = '\0';
         if (vnc_display_password(NULL, password) < 0)
             term_printf("could not set VNC server password\n");
@@ -443,12 +446,12 @@ static void do_change_vnc(const char *target)
     }
 }
 
-static void do_change(const char *device, const char *target, const char *fmt)
+static void do_change(const char *device, const char *target, const char *arg)
 {
     if (strcmp(device, "vnc") == 0) {
-        do_change_vnc(target);
+        do_change_vnc(target, arg);
     } else {
-        do_change_block(device, target, fmt);
+        do_change_block(device, target, arg);
     }
 }
diff --git a/qemu-doc.texi b/qemu-doc.texi
index 1735d92..ca3b181 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1233,11 +1233,11 @@ and @var{options} are described at @ref{sec_invocation}.
 eg
 (qemu) change vnc localhost:1
 @end example
[EMAIL PROTECTED] change vnc password
[EMAIL PROTECTED] change vnc password [EMAIL PROTECTED]
-Change the password associated with the VNC server. The monitor will prompt for
-the new password to be entered. VNC passwords are only significant upto 8 letters.
-eg.
+Change the password associated with the VNC server. If the new password is not
+supplied, the monitor will prompt for it to be entered. VNC passwords are only
+significant up to 8 letters. eg
 @example
 (qemu) change vnc password
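The bounded copy in the patch above relies on a 9-byte buffer: 8 significant characters plus the terminating NUL. The following is a minimal standalone sketch of that pattern, not code from the patch itself; the helper name copy_vnc_password and the macro VNC_PASSWORD_BUF are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <string.h>

/* 8 significant characters plus the terminating NUL, as in the patch. */
#define VNC_PASSWORD_BUF 9

/*
 * Hypothetical helper (not part of QEMU): copy at most 8 characters of
 * src into dst. strncpy() does not NUL-terminate when src is longer
 * than the bound, so the last byte is always set explicitly, mirroring
 * the password[sizeof(password)-1] = '\0' line in the patch.
 */
void copy_vnc_password(char dst[VNC_PASSWORD_BUF], const char *src)
{
    assert(dst != NULL && src != NULL);
    strncpy(dst, src, VNC_PASSWORD_BUF);
    dst[VNC_PASSWORD_BUF - 1] = '\0';
}
```

With this helper, a longer argument is silently truncated to its first 8 characters, which matches the documented behaviour that VNC passwords are only significant up to 8 letters.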
Unsupported delivery mode 7
We're running kvm-78 in production on Linux 2.6.27 x86_64 on dual quad-core Opteron 'Barcelona' machines. Our kvm modules are built from the kvm-78 sources rather than the older version bundled with the kernel, and we're using the NPT features of the processors.

For the most part, everything is performing very well and running reliably. However, occasionally a guest will hang as it starts (or is reset) with a large number of messages of the form

  Unsupported delivery mode 7

in the dmesg. Following this, killing and relaunching the qemu process is usually sufficient to get a working guest.

I'm aware that our versions of the kvm kernel modules and userspace are not the latest release, but because we're running long-lived guests on behalf of clients, it's quite a major operation to upgrade. Does this look like a known bug which has already been fixed, or should I try to reproduce it properly on a test machine with an ability to debug, use magic sysrq, etc? (It seems impossible to reproduce on my lower-spec desktop machine, for what it's worth. Normally I'd reproduce kernel problems in a KVM virtual machine---but that's obviously not an option here!)

Am I right in suspecting it might be connected to interrupt delivery following page migration when a guest moves from one processor to another, and that a workaround might be to taskset guests to one or other physical CPU until we're able to upgrade to a more recent version of KVM?

Many thanks in advance for any advice anyone can offer.

Cheers,

Chris.
Re: qemu-send.c (was Re: Since we're sharing, here's my kvmctl script)
Javier Guerra Giraldez [EMAIL PROTECTED] writes:

> On Wednesday 11 June 2008, Chris Webb wrote:
> > Hi. I have a small 'qemu-send' utility for talking to a running qemu/kvm
> > process whose monitor console listens on a filesystem socket, which I think
> > might be a useful building block when extending these kinds of script to do
> > things like migration, pausing, and so on. The source is attached.
>
> there's a utility called socat that lets you send text to/from TCP sockets
> and unix-domain sockets. it can even (temporarily) attach the terminal, or
> use GNU's readline to regain interactive control of KVM/Qemu

Hi. Yes, I'm aware of socat, netcat, tcpclient et al. and even have a similar pair of little unix/tcp/udp/syslogging utilities myself called sk/skd which I initially used for scripting our local kvm management system.

However, it's a little bit clumsy to use these tools correctly from a shell script if you want to get back the command output intact. You need to open your connection to the unix server socket, wait for the prompt (skipping the welcome banner), send the command, copy the response out until you get a line '(qemu) ', then disconnect. For the same reason that you can't do

    echo -e "GET / HTTP/1.1\n\n" > /dev/tcp/www.google.com/80
    cat < /dev/tcp/www.google.com/80

having to write

    exec 3<>/dev/tcp/www.google.com/80
    echo -e "GET / HTTP/1.1\n\n" >&3
    cat <&3

instead, you need to avoid disconnecting from the socket in the middle of the command/response exchange.

(In fact, with qemu, it nearly works anyway: the new connection gets all the output and the next prompt from the old one before the new banner, so you just have a couple of extra prompts, a command echo and a banner at the top and bottom to filter away. However, I'd be very reluctant to rely on this behaviour, and in particular on it not losing output between connections. The method I implemented in qemu-send.c should be robust against changes in the way qemu handles its monitor sockets.)
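The exchange described above (connect, wait for the first prompt, send the command, copy the response until the next prompt, disconnect) can be sketched in C roughly as follows. This is an illustrative sketch only and is not the attached qemu-send.c: the function names are hypothetical, the prompt is assumed to be exactly the string '(qemu) ', and the command echo is left in the returned text rather than filtered out.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define PROMPT "(qemu) "   /* assumed monitor prompt, as quoted in the message */

/* Return 1 if the len bytes in buf end with the monitor prompt, else 0. */
int ends_with_prompt(const char *buf, size_t len)
{
    size_t plen = strlen(PROMPT);
    return len >= plen && memcmp(buf + len - plen, PROMPT, plen) == 0;
}

/* Read into buf until it ends with the prompt, EOF occurs, or buf is full.
 * The result is NUL-terminated. Returns the byte count, or -1 on error. */
ssize_t read_until_prompt(int fd, char *buf, size_t cap)
{
    size_t len = 0;
    assert(cap > 0);
    while (len < cap - 1) {
        ssize_t n = read(fd, buf + len, cap - 1 - len);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                      /* peer closed the connection */
        len += (size_t)n;
        if (ends_with_prompt(buf, len))
            break;
    }
    buf[len] = '\0';
    return (ssize_t)len;
}

/* Connect to a unix-domain stream socket at path. Returns fd or -1. */
int connect_monitor(const char *path)
{
    struct sockaddr_un addr;
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* One command/response exchange on a single connection: skip the banner up
 * to the first prompt, send cmd, then collect output up to the next prompt.
 * The trailing prompt is removed; short writes are treated as errors. */
ssize_t monitor_exchange(int fd, const char *cmd, char *out, size_t cap)
{
    size_t clen = strlen(cmd);
    ssize_t n;

    if (read_until_prompt(fd, out, cap) < 0)
        return -1;
    if (write(fd, cmd, clen) != (ssize_t)clen || write(fd, "\n", 1) != 1)
        return -1;
    n = read_until_prompt(fd, out, cap);
    if (n < 0)
        return -1;
    if (ends_with_prompt(out, (size_t)n)) {
        n -= (ssize_t)strlen(PROMPT);
        out[n] = '\0';
    }
    return n;
}
```

A caller would open the socket once with connect_monitor(), run monitor_exchange() with a command such as "info status", and close the descriptor afterwards, so the connection stays open for the whole command/response exchange. Matching the prompt only at the end of the data received so far is a simplification; a response that happens to end a read with the same seven characters would be cut short.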
To get the convenient syntax and behaviour I wanted, it felt easier and cleaner to write the few lines of C needed for a standalone utility rather than introduce a parsing shell script/function plus a dependency on one of sk/socat/netcat/tcpclient. I suspect also that I'm just more comfortable in C than sh; YMMV!

Cheers,

Chris.