Re: [Qemu-devel] Windows slow boot: contractor wanted
Hi Rik,

Are there any more tests which I can usefully do for you? I notice that 3.6.0-rc4 is out - are there changes from rc3 which are worth me retesting?

Cheers, Richard.

Richard Davies wrote:

Rik van Riel wrote: Can you get a backtrace to that _raw_spin_lock_irqsave, to see from where it is running into lock contention? It would be good to know whether it is isolate_freepages_block, yield_to, kvm_vcpu_on_spin or something else...

Hi Rik,

I got into a slow boot situation on 3.6.0-rc3, ran perf record -g -a for a while, then ran perf report with the output below. This trace looks more like the second perf top trace that I sent on Saturday (there were two in my email, and they were different from each other as well as from those on 3.5.2). The symptoms were a bit different too - the VM boots appeared to be completely locked up rather than just slow, and I couldn't quit qemu-kvm at the monitor - I had to restart the host. So perhaps this one is actually a deadlock rather than just slow?

Cheers, Richard.

# captured on: Sun Aug 26 10:08:28 2012
# os release : 3.6.0-rc3-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131971760 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 1040676441385
#
# Overhead  Command   Shared Object        Symbol
# ........  ........  ...................  ......
    90.01%  qemu-kvm  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
            |
            --- _raw_spin_lock_irqsave
               |
               |--99.99%-- isolate_migratepages_range
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.33
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--54.91%-- 0x1010002
               |          |
               |           --45.09%-- 0x1010006
                --0.01%-- [...]

     4.66%  qemu-kvm  [kernel.kallsyms]    [k] sub_preempt_count
            |
            --- sub_preempt_count
               |
               |--99.77%-- _raw_spin_unlock_irqrestore
               |          |
               |          |--99.99%-- compact_checklock_irqsave
               |          |          isolate_migratepages_range
               |          |          compact_zone
               |          |          compact_zone_order
               |          |          try_to_compact_pages
               |          |          __alloc_pages_direct_compact
               |          |          __alloc_pages_nodemask
               |          |          alloc_pages_vma
               |          |          do_huge_pmd_anonymous_page
               |          |          handle_mm_fault
               |          |          __get_user_pages
               |          |          get_user_page_nowait
               |          |          hva_to_pfn.isra.33
               |          |          __gfn_to_pfn
Re: [Qemu-devel] Windows slow boot: contractor wanted
Rik van Riel wrote: Can you get a backtrace to that _raw_spin_lock_irqsave, to see from where it is running into lock contention? It would be good to know whether it is isolate_freepages_block, yield_to, kvm_vcpu_on_spin or something else...

Hi Rik,

I got into a slow boot situation on 3.6.0-rc3, ran perf record -g -a for a while, then ran perf report with the output below. This trace looks more like the second perf top trace that I sent on Saturday (there were two in my email, and they were different from each other as well as from those on 3.5.2). The symptoms were a bit different too - the VM boots appeared to be completely locked up rather than just slow, and I couldn't quit qemu-kvm at the monitor - I had to restart the host. So perhaps this one is actually a deadlock rather than just slow?

Cheers, Richard.

# captured on: Sun Aug 26 10:08:28 2012
# os release : 3.6.0-rc3-elastic
# perf version : 3.5.2
# arch : x86_64
# nrcpus online : 16
# nrcpus avail : 16
# cpudesc : AMD Opteron(tm) Processor 6128
# cpuid : AuthenticAMD,16,9,1
# total memory : 131971760 kB
# cmdline : /home/root/bin/perf record -g -a
# event : name = cycles, type = 0, config = 0x0, config1 = 0x0, config2 = 0x0, excl_usr = 0, excl_kern = 0, id = { 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64 }
# HEADER_CPU_TOPOLOGY info available, use -I to display
# HEADER_NUMA_TOPOLOGY info available, use -I to display
#
# Samples: 2M of event 'cycles'
# Event count (approx.): 1040676441385
#
# Overhead  Command   Shared Object        Symbol
# ........  ........  ...................  ......
    90.01%  qemu-kvm  [kernel.kallsyms]    [k] _raw_spin_lock_irqsave
            |
            --- _raw_spin_lock_irqsave
               |
               |--99.99%-- isolate_migratepages_range
               |          compact_zone
               |          compact_zone_order
               |          try_to_compact_pages
               |          __alloc_pages_direct_compact
               |          __alloc_pages_nodemask
               |          alloc_pages_vma
               |          do_huge_pmd_anonymous_page
               |          handle_mm_fault
               |          __get_user_pages
               |          get_user_page_nowait
               |          hva_to_pfn.isra.33
               |          __gfn_to_pfn
               |          gfn_to_pfn_async
               |          try_async_pf
               |          tdp_page_fault
               |          kvm_mmu_page_fault
               |          pf_interception
               |          handle_exit
               |          kvm_arch_vcpu_ioctl_run
               |          kvm_vcpu_ioctl
               |          do_vfs_ioctl
               |          sys_ioctl
               |          system_call_fastpath
               |          ioctl
               |          |
               |          |--54.91%-- 0x1010002
               |          |
               |           --45.09%-- 0x1010006
                --0.01%-- [...]

     4.66%  qemu-kvm  [kernel.kallsyms]    [k] sub_preempt_count
            |
            --- sub_preempt_count
               |
               |--99.77%-- _raw_spin_unlock_irqrestore
               |          |
               |          |--99.99%-- compact_checklock_irqsave
               |          |          isolate_migratepages_range
               |          |          compact_zone
               |          |          compact_zone_order
               |          |          try_to_compact_pages
               |          |          __alloc_pages_direct_compact
               |          |          __alloc_pages_nodemask
               |          |          alloc_pages_vma
               |          |          do_huge_pmd_anonymous_page
               |          |          handle_mm_fault
               |          |          __get_user_pages
               |          |          get_user_page_nowait
               |          |          hva_to_pfn.isra.33
               |          |          __gfn_to_pfn
               |          |          gfn_to_pfn_async
               |          |          try_async_pf
               |          |          tdp_page_fault
               |          |          kvm_mmu_page_fault
               |          |          pf_interception
Re: [Qemu-devel] Windows slow boot: contractor wanted
Rik van Riel wrote:

Richard Davies wrote:

Avi Kivity wrote:

Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

That's the page compaction code. Mel Gorman and I have been working to fix that; the latest fixes and improvements are in the -mm kernel already.

Hi Rik,

Are you talking about these patches?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84
http://marc.info/?l=linux-mm&m=134521289221259

If so, I believe those are in 3.6.0-rc3, so I tested with that. Unfortunately, I can still get the slow boots and perf top showing _raw_spin_lock_irqsave. Here are two perf top traces on 3.6.0-rc3.
They do look a bit different from 3.5.2, but _raw_spin_lock_irqsave is still at the top:

PerfTop: 35272 irqs/sec kernel:98.1% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    61.85%  [kernel]         [k] _raw_spin_lock_irqsave
     7.18%  [kernel]         [k] sub_preempt_count
     5.03%  [kernel]         [k] isolate_freepages_block
     2.49%  [kernel]         [k] yield_to
     2.05%  [kernel]         [k] memcmp
     2.01%  [kernel]         [k] compact_zone
     1.76%  [kernel]         [k] add_preempt_count
     1.52%  [kernel]         [k] _raw_spin_lock
     1.31%  [kernel]         [k] kvm_vcpu_on_spin
     0.92%  [kernel]         [k] svm_vcpu_run
     0.78%  [kernel]         [k] __rcu_read_unlock
     0.76%  [kernel]         [k] migrate_pages
     0.68%  [kernel]         [k] kvm_vcpu_yield_to
     0.46%  [kernel]         [k] pid_task
     0.42%  [kernel]         [k] isolate_migratepages_range
     0.41%  [kernel]         [k] kvm_arch_vcpu_ioctl_run
     0.40%  [kernel]         [k] clear_page_c
     0.40%  [kernel]         [k] get_pid_task
     0.40%  [kernel]         [k] get_parent_ip
     0.39%  [kernel]         [k] __zone_watermark_ok
     0.34%  [kernel]         [k] trace_hardirqs_off
     0.34%  [kernel]         [k] trace_hardirqs_on
     0.32%  [kernel]         [k] _raw_spin_unlock_irqrestore
     0.27%  [kernel]         [k] _raw_spin_unlock
     0.22%  [kernel]         [k] mod_zone_page_state
     0.21%  [kernel]         [k] rcu_note_context_switch
     0.21%  [kernel]         [k] trace_preempt_on
     0.21%  [kernel]         [k] trace_preempt_off
     0.19%  [kernel]         [k] in_lock_functions
     0.16%  [kernel]         [k] __srcu_read_lock
     0.14%  [kernel]         [k] ktime_get
     0.11%  [kernel]         [k] get_pageblock_flags_group
     0.11%  [kernel]         [k] compact_checklock_irqsave
     0.11%  [kernel]         [k] find_busiest_group
     0.10%  [kernel]         [k] __srcu_read_unlock
     0.09%  [kernel]         [k] __rcu_read_lock
     0.09%  libc-2.10.1.so   [.] 0x00072c9d
     0.09%  [kernel]         [k] cpumask_next_and
     0.08%  [kernel]         [k] smp_call_function_many
     0.08%  [kernel]         [k] read_tsc
     0.08%  [kernel]         [k] kmem_cache_alloc
     0.08%  libc-2.10.1.so   [.] strcmp
     0.08%  [kernel]         [k] generic_smp_call_function_interrupt
     0.07%  [kernel]         [k] __schedule
     0.07%  qemu-kvm         [.] main_loop_wait
     0.07%  [kernel]         [k] __hrtimer_start_range_ns
     0.06%  qemu-kvm         [.] qemu_iohandler_poll
     0.06%  [kernel]         [k] ktime_get_update_offsets
     0.06%  [kernel]         [k] ktime_add_safe
     0.06%  [kernel]         [k] find_next_bit
     0.06%  [kernel]         [k] irq_exit
     0.06%  [kernel]         [k] select_task_rq_fair
     0.06%  [kernel]         [k] handle_exit
     0.05%  [kernel]         [k] update_curr
     0.05%  [kernel]         [k] flush_tlb_func
     0.05%  perf             [.] dso__find_symbol
     0.05%  [kernel]         [k] kvm_check_async_pf_completion
     0.05%  [kernel]         [k] rcu_check_callbacks
     0.05%  [kernel]         [k] apic_update_ppr
     0.05%  [kernel]         [k] irq_enter
     0.04%  [kernel]         [k] copy_user_generic_string
     0.04%  [kernel]         [k] copy_page_c
     0.04%  [kernel]         [k] rcu_idle_exit_common.isra.34
     0.04%  [kernel]         [k] load_balance
     0.04%  [kernel]         [k] rb_erase
     0.04%  libc-2.10.1.so   [.] __select
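As an aside for anyone comparing snapshots like the one above, the combined share of lock/compaction symbols can be totalled mechanically from a saved copy. A minimal sketch, run here over a small hand-picked excerpt of the entries quoted above (on a live host you would save the real perf output to the file instead):

```shell
# Sum the overhead of spinlock/compaction-related symbols from a saved
# "perf top" snapshot. perftop.sample holds a few entries from the
# snapshot quoted above, purely for illustration.
cat > perftop.sample <<'EOF'
    61.85%  [kernel]  [k] _raw_spin_lock_irqsave
     5.03%  [kernel]  [k] isolate_freepages_block
     2.49%  [kernel]  [k] yield_to
     2.01%  [kernel]  [k] compact_zone
     1.31%  [kernel]  [k] kvm_vcpu_on_spin
EOF
# awk coerces "61.85%" to 61.85, so the percentages can be summed directly.
awk '/_raw_spin_lock|compact|isolate_/ { total += $1 }
     END { printf "lock/compaction-related: %.2f%%\n", total }' perftop.sample
# -> lock/compaction-related: 68.89%
```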
Re: [Qemu-devel] Windows slow boot: contractor wanted
Troy Benjegerdes wrote:

Is there a way to capture/reproduce this 'slow boot' behavior with a simple regression test? I'd like to know if it happens on a single-physical-CPU-socket machine, or just on dual-sockets.

Yes, definitely. These two emails earlier in the thread give a fairly complete description of what I am doing - please do ask any further questions:

http://marc.info/?l=qemu-devel&m=134511429415347
http://marc.info/?l=qemu-devel&m=134520701317153

Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/25/2012 01:45 PM, Richard Davies wrote:

Are you talking about these patches?

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commit;h=c67fe3752abe6ab47639e2f9b836900c3dc3da84
http://marc.info/?l=linux-mm&m=134521289221259

If so, I believe those are in 3.6.0-rc3, so I tested with that. Unfortunately, I can still get the slow boots and perf top showing _raw_spin_lock_irqsave. Here are two perf top traces on 3.6.0-rc3. They do look a bit different from 3.5.2, but _raw_spin_lock_irqsave is still at the top:

PerfTop: 35272 irqs/sec kernel:98.1% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    61.85%  [kernel]  [k] _raw_spin_lock_irqsave
     7.18%  [kernel]  [k] sub_preempt_count
     5.03%  [kernel]  [k] isolate_freepages_block
     2.49%  [kernel]  [k] yield_to
     2.05%  [kernel]  [k] memcmp
     2.01%  [kernel]  [k] compact_zone
     1.76%  [kernel]  [k] add_preempt_count
     1.52%  [kernel]  [k] _raw_spin_lock
     1.31%  [kernel]  [k] kvm_vcpu_on_spin
     0.92%  [kernel]  [k] svm_vcpu_run

However, the compaction code is not as prominent as before. Can you get a backtrace to that _raw_spin_lock_irqsave, to see from where it is running into lock contention? It would be good to know whether it is isolate_freepages_block, yield_to, kvm_vcpu_on_spin or something else...

-- All rights reversed
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/21/2012 06:21 PM, Richard Davies wrote:

Avi Kivity wrote:

Richard Davies wrote: We're running host kernel 3.5.1 and qemu-kvm 1.1.1. I hadn't thought about it, but I agree this is related to cpu overcommit. The slow boots are intermittent (and infrequent) with cpu overcommit, whereas I don't think it occurs without cpu overcommit. In addition, if there is a slow boot ongoing, and you kill some other VMs to reduce cpu overcommit, then this will sometimes speed it up. I guess the question is why, even with overcommit, most boots are fine, but some small fraction then go slow?

Could be a bug. The scheduler and the spin-loop handling code fight each other instead of working well. Please provide snapshots of 'perf top' while a slow boot is in progress.

Below are two 'perf top' snapshots during a slow boot, which appear to me to support your idea of a spin-lock problem. There are a lot more "unprocessable samples recorded" messages at the end of each snapshot which I haven't included. I think these may be from the guest OS - the kernel is listed, and qemu-kvm itself is listed on some other traces which I did, although not these. Richard.

PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)

    35.80%  [kernel]  [k] _raw_spin_lock_irqsave
    21.64%  [kernel]  [k] isolate_freepages_block

Please disable ksm, and if this function persists in the profile, reduce some memory from the guests.

     5.91%  [kernel]  [k] yield_to
     4.95%  [kernel]  [k] _raw_spin_lock
     3.37%  [kernel]  [k] kvm_vcpu_on_spin

Except for isolate_freepages_block, all functions up to here have to do with dealing with cpu overcommit. But let's deal with them after we see a profile with isolate_freepages_block removed.

-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/22/2012 03:40 PM, Richard Davies wrote:

I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

In the slowest boot that I have so far (1-2 minutes), this is the perf top output:

PerfTop: 26741 irqs/sec kernel:97.5% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    53.94%  [kernel]  [k] clear_page_c
     2.77%  [kernel]  [k] svm_vcpu_put
     2.60%  [kernel]  [k] svm_vcpu_run
     1.79%  [kernel]  [k] sub_preempt_count
     1.56%  [kernel]  [k] svm_vcpu_load
     1.44%  [kernel]  [k] __schedule
     1.36%  [kernel]  [k] kvm_arch_vcpu_ioctl_run
     1.34%  [kernel]  [k] resched_task
     1.32%  [kernel]  [k] _raw_spin_lock
     0.98%  [kernel]  [k] trace_preempt_on
     0.95%  [kernel]  [k] get_parent_ip
     0.94%  [kernel]  [k] yield_to

This is pretty normal; Windows is touching memory, so clear_page_c() is called to scrub it.

-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/22/2012 05:41 PM, Richard Davies wrote:

Avi Kivity wrote:

Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

Then it's still memory starved. Please provide /proc/zoneinfo while this is happening.

-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/22/2012 10:41 AM, Richard Davies wrote:

Avi Kivity wrote:

Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

That's the page compaction code. Mel Gorman and I have been working to fix that; the latest fixes and improvements are in the -mm kernel already.
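The compaction load Rik describes is driven here by transparent-hugepage allocations (the call chains elsewhere in the thread run through do_huge_pmd_anonymous_page). As a rough, read-only way to see how much THP a host is actually using and how often compaction runs, one can read the standard counters; a sketch, assuming ordinary Linux procfs paths (the exact set of compact_* counters varies by kernel version, hence the fallback):

```shell
# MemTotal / AnonHugePages: total RAM and how much anonymous memory is
# currently backed by transparent huge pages.
grep -E 'MemTotal|AnonHugePages' /proc/meminfo
# compact_* counters: how often compaction ran, stalled, or failed.
# Not all kernels expose these, so don't treat a miss as an error.
grep -E '^compact_' /proc/vmstat || true
```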
Re: [Qemu-devel] Windows slow boot: contractor wanted
Rik van Riel wrote:

Richard Davies wrote: I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

That's the page compaction code. Mel Gorman and I have been working to fix that; the latest fixes and improvements are in the -mm kernel already.

Hi Rik,

That's good news. Can you point me to specific patches which we can backport to a 3.5.2 kernel, to test whether they fix our problem?

Thanks, Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
Richard Davies wrote:

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

Avi Kivity wrote:

Then it's still memory starved. Please provide /proc/zoneinfo while this is happening.

Is there a way to capture/reproduce this 'slow boot' behavior with a simple regression test? I'd like to know if it happens on a single-physical-CPU-socket machine, or just on dual-sockets.

I'm also observing an interesting phenomenon here: kernel development can move so fast as to make regression testing pointless. ;)
Re: [Qemu-devel] Windows slow boot: contractor wanted
Avi Kivity wrote:

Richard Davies wrote: Below are two 'perf top' snapshots during a slow boot, which appear to me to support your idea of a spin-lock problem. ...

PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)

    35.80%  [kernel]  [k] _raw_spin_lock_irqsave
    21.64%  [kernel]  [k] isolate_freepages_block

Please disable ksm, and if this function persists in the profile, reduce some memory from the guests.

     5.91%  [kernel]  [k] yield_to
     4.95%  [kernel]  [k] _raw_spin_lock
     3.37%  [kernel]  [k] kvm_vcpu_on_spin

Except for isolate_freepages_block, all functions up to here have to do with dealing with cpu overcommit. But let's deal with them after we see a profile with isolate_freepages_block removed.

I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.
In the slowest boot that I have so far (1-2 minutes), this is the perf top output:

PerfTop: 26741 irqs/sec kernel:97.5% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    53.94%  [kernel]         [k] clear_page_c
     2.77%  [kernel]         [k] svm_vcpu_put
     2.60%  [kernel]         [k] svm_vcpu_run
     1.79%  [kernel]         [k] sub_preempt_count
     1.56%  [kernel]         [k] svm_vcpu_load
     1.44%  [kernel]         [k] __schedule
     1.36%  [kernel]         [k] kvm_arch_vcpu_ioctl_run
     1.34%  [kernel]         [k] resched_task
     1.32%  [kernel]         [k] _raw_spin_lock
     0.98%  [kernel]         [k] trace_preempt_on
     0.95%  [kernel]         [k] get_parent_ip
     0.94%  [kernel]         [k] yield_to
     0.88%  [kernel]         [k] __switch_to
     0.87%  [kernel]         [k] get_page_from_freelist
     0.81%  [kernel]         [k] in_lock_functions
     0.76%  [kernel]         [k] add_preempt_count
     0.72%  [kernel]         [k] kvm_vcpu_on_spin
     0.69%  [kernel]         [k] free_pages_prepare
     0.59%  [kernel]         [k] find_highest_vector
     0.57%  [kernel]         [k] rcu_note_context_switch
     0.55%  [kernel]         [k] paging64_walk_addr_generic
     0.54%  [kernel]         [k] __srcu_read_lock
     0.49%  [kernel]         [k] trace_preempt_off
     0.47%  [kernel]         [k] reschedule_interrupt
     0.45%  [kernel]         [k] sched_clock_cpu
     0.40%  [kernel]         [k] trace_hardirqs_on
     0.38%  [kernel]         [k] clear_huge_page
     0.37%  [kernel]         [k] prep_compound_page
     0.32%  [kernel]         [k] x86_emulate_instruction
     0.32%  [kernel]         [k] _raw_spin_lock_irq
     0.31%  [kernel]         [k] __srcu_read_unlock
     0.31%  [kernel]         [k] trace_hardirqs_off
     0.30%  [kernel]         [k] pick_next_task_fair
     0.29%  [kernel]         [k] kvm_find_cpuid_entry
     0.28%  [kernel]         [k] x86_decode_insn
     0.26%  [kernel]         [k] kvm_cpu_has_pending_timer
     0.26%  [kernel]         [k] init_emulate_ctxt
     0.25%  [kernel]         [k] kvm_vcpu_yield_to
     0.24%  [kernel]         [k] clear_buddies
     0.24%  [kernel]         [k] gs_change
     0.23%  [kernel]         [k] handle_exit
     0.22%  qemu-kvm         [.] vnc_refresh_server_surface
     0.22%  [kernel]         [k] update_min_vruntime
     0.22%  [kernel]         [k] gfn_to_memslot
     0.22%  [kernel]         [k] x86_emulate_insn
     0.19%  [kernel]         [k] kvm_sched_out
     0.19%  [kernel]         [k] pid_task
     0.18%  [kernel]         [k] _raw_spin_unlock
     0.18%  libc-2.10.1.so   [.] strcmp
     0.17%  [kernel]         [k] get_pid_task
     0.17%  [kernel]         [k] yield_task_fair
     0.17%  [kernel]         [k] default_send_IPI_mask_sequence_phys
     0.16%  [kernel]         [k] __rcu_read_unlock
     0.16%  [kernel]         [k] kvm_get_cr8
     0.16%  [kernel]         [k] native_sched_clock
     0.16%  [kernel]         [k] do_insn_fetch
     0.15%  [kernel]         [k] set_next_entity
     0.14%  [kernel]         [k] update_rq_clock
     0.14%  [kernel]         [k] __enqueue_entity
     0.14%  [kernel]         [k] kvm_read_guest
     0.13%  qemu-kvm         [.] g_hash_table_lookup
     0.13%  [kernel]         [k] rb_erase
     0.12%  [kernel]         [k] decode_operand
     0.12%  libz.so.1.2.3    [.] 0x6451
     0.12%  [kernel]         [k] update_curr
Re: [Qemu-devel] Windows slow boot: contractor wanted
Avi Kivity wrote:

Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
Avi Kivity wrote:

Richard Davies wrote:

Avi Kivity wrote:

Richard Davies wrote: I can trigger the slow boots without KSM and they have the same profile, with _raw_spin_lock_irqsave and isolate_freepages_block at the top. I reduced to 3x 20GB 8-core VMs on a 128GB host (rather than 3x 40GB 8-core VMs), and haven't managed to get a really slow boot yet (5 minutes). I'll post again when I get one.

I think you can go higher than that. But 120GB on a 128GB host is pushing it.

I've now triggered a very slow boot at 3x 36GB 8-core VMs on a 128GB host (i.e. 108GB on a 128GB host). It has the same profile with _raw_spin_lock_irqsave and isolate_freepages_block at the top.

Then it's still memory starved. Please provide /proc/zoneinfo while this is happening.

Here are two copies of /proc/zoneinfo, a minute or so apart, during a situation where there are 3x 36GB 8-core VMs on a 128GB host, with two of the three VMs slow booting.

Node 0, zone      DMA
  pages free     3968
        min      3
        low      3
        high     4
        scanned  0
        spanned  4080
        present  3904
    nr_free_pages 3968
    nr_inactive_anon 0
    nr_active_anon 0
    nr_inactive_file 0
    nr_active_file 0
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    0
    nr_file_pages 0
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 0
    nr_slab_unreclaimable 0
    nr_page_table_pages 0
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   0
    nr_written   0
    numa_hit     0
    numa_miss    0
    numa_foreign 0
    numa_interleave 0
    numa_local   0
    numa_other   0
    nr_anon_transparent_hugepages 0
        protection: (0, 3502, 32230, 32230)
  pagesets
    cpu: 0   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 1   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 2   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 3   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 4   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 5   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 6   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 7   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 8   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 9   count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 10  count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 11  count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 12  count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 13  count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 14  count: 0  high: 0  batch: 1  vm stats threshold: 10
    cpu: 15  count: 0  high: 0  batch: 1  vm stats threshold: 10
  all_unreclaimable: 1
  start_pfn:         16
  inactive_ratio:    1
Node 0, zone    DMA32
  pages free     29798
        min      917
        low      1146
        high     1375
        scanned  0
        spanned  1044480
        present  896720
    nr_free_pages 29798
    nr_inactive_anon 0
    nr_active_anon 817152
    nr_inactive_file 29243
    nr_active_file 574
    nr_unevictable 0
    nr_mlock     0
    nr_anon_pages 0
    nr_mapped    1
    nr_file_pages 29817
    nr_dirty     0
    nr_writeback 0
    nr_slab_reclaimable 26
    nr_slab_unreclaimable 2
    nr_page_table_pages 244
    nr_kernel_stack 0
    nr_unstable  0
    nr_bounce    0
    nr_vmscan_write 0
    nr_vmscan_immediate_reclaim 0
    nr_writeback_temp 0
    nr_isolated_anon 0
    nr_isolated_file 0
    nr_shmem     0
    nr_dirtied   30546
    nr_written   30546
    numa_hit     42617
    numa_miss    124755
    numa_foreign 0
    numa_interleave 0
    numa_local   42023
    numa_other   125349
    nr_anon_transparent_hugepages 1596
        protection: (0, 0, 28728, 28728)
  pagesets
    cpu: 0   count: 0  high: 186  batch: 31  vm stats threshold: 60
    cpu: 1   count: 0  high: 186  batch: 31  vm stats threshold: 60
    cpu: 2   count: 0  high: 186  batch: 31  vm
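To compare dumps like the ones above at a glance, the free-vs-watermark figures can be pulled out per zone. A minimal sketch, run here over a small excerpt of the numbers quoted above (on the host itself you would feed it /proc/zoneinfo directly); a zone whose free count hovers near its min watermark is the one driving reclaim and compaction:

```shell
# Excerpt of /proc/zoneinfo-style output, taken from the dump above,
# purely so this sketch is self-contained.
cat > zoneinfo.sample <<'EOF'
Node 0, zone      DMA
  pages free     3968
        min      3
        low      3
        high     4
Node 0, zone    DMA32
  pages free     29798
        min      917
        low      1146
        high     1375
EOF
# For each zone, print free pages against the min watermark.
awk '/^Node/                        { zone = $4 }
     $1 == "pages" && $2 == "free"  { free = $3 }
     $1 == "min"                    { printf "%-6s free=%-6s min=%s\n", zone, free, $2 }' zoneinfo.sample
```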
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/20/2012 04:56 PM, Richard Davies wrote:

We're running host kernel 3.5.1 and qemu-kvm 1.1.1. I hadn't thought about it, but I agree this is related to cpu overcommit. The slow boots are intermittent (and infrequent) with cpu overcommit, whereas I don't think it occurs without cpu overcommit. In addition, if there is a slow boot ongoing, and you kill some other VMs to reduce cpu overcommit, then this will sometimes speed it up. I guess the question is why, even with overcommit, most boots are fine, but some small fraction then go slow?

Could be a bug. The scheduler and the spin-loop handling code fight each other instead of working well. Please provide snapshots of 'perf top' while a slow boot is in progress.

-- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
Avi Kivity wrote:

Richard Davies wrote: We're running host kernel 3.5.1 and qemu-kvm 1.1.1. I hadn't thought about it, but I agree this is related to cpu overcommit. The slow boots are intermittent (and infrequent) with cpu overcommit, whereas I don't think it occurs without cpu overcommit. In addition, if there is a slow boot ongoing, and you kill some other VMs to reduce cpu overcommit, then this will sometimes speed it up. I guess the question is why, even with overcommit, most boots are fine, but some small fraction then go slow?

Could be a bug. The scheduler and the spin-loop handling code fight each other instead of working well. Please provide snapshots of 'perf top' while a slow boot is in progress.

Below are two 'perf top' snapshots during a slow boot, which appear to me to support your idea of a spin-lock problem. There are a lot more "unprocessable samples recorded" messages at the end of each snapshot which I haven't included. I think these may be from the guest OS - the kernel is listed, and qemu-kvm itself is listed on some other traces which I did, although not these. Richard.
PerfTop: 62249 irqs/sec kernel:96.9% exact: 0.0% [4000Hz cycles], (all, 16 CPUs)
--------------------------------------------------------------------------------

    35.80%  [kernel]         [k] _raw_spin_lock_irqsave
    21.64%  [kernel]         [k] isolate_freepages_block
     5.91%  [kernel]         [k] yield_to
     4.95%  [kernel]         [k] _raw_spin_lock
     3.37%  [kernel]         [k] kvm_vcpu_on_spin
     2.74%  [kernel]         [k] add_preempt_count
     2.45%  [kernel]         [k] _raw_spin_unlock
     2.33%  [kernel]         [k] sub_preempt_count
     2.18%  [kernel]         [k] svm_vcpu_run
     2.17%  [kernel]         [k] kvm_vcpu_yield_to
     1.89%  [kernel]         [k] memcmp
     1.50%  [kernel]         [k] get_pid_task
     1.26%  [kernel]         [k] kvm_arch_vcpu_ioctl_run
     1.16%  [kernel]         [k] pid_task
     0.70%  [kernel]         [k] rcu_note_context_switch
     0.70%  [kernel]         [k] trace_hardirqs_on
     0.52%  [kernel]         [k] __rcu_read_unlock
     0.51%  [kernel]         [k] trace_preempt_on
     0.47%  [kernel]         [k] __srcu_read_lock
     0.43%  [kernel]         [k] get_parent_ip
     0.42%  [kernel]         [k] get_pageblock_flags_group
     0.38%  [kernel]         [k] in_lock_functions
     0.34%  [kernel]         [k] trace_preempt_off
     0.34%  [kernel]         [k] trace_hardirqs_off
     0.29%  [kernel]         [k] clear_page_c
     0.23%  [kernel]         [k] __srcu_read_unlock
     0.20%  [kernel]         [k] __rcu_read_lock
     0.14%  [kernel]         [k] handle_exit
     0.11%  libc-2.10.1.so   [.] strcmp
     0.11%  [kernel]         [k] _raw_spin_unlock_irqrestore
     0.11%  [kernel]         [k] _raw_spin_lock_irq
     0.11%  [kernel]         [k] find_highest_vector
     0.09%  [kernel]         [k] ktime_get
     0.08%  [kernel]         [k] copy_page_c
     0.08%  [kernel]         [k] pause_interception
     0.08%  [kernel]         [k] kmem_cache_alloc
     0.08%  [kernel]         [k] resched_task
     0.08%  perf             [.] dso__find_symbol
     0.06%  [kernel]         [k] compaction_alloc
     0.06%  libc-2.10.1.so   [.] 0x00076dab
     0.06%  [kernel]         [k] read_tsc
     0.06%  perf             [.] add_hist_entry
     0.05%  [kernel]         [k] svm_read_l1_tsc
     0.05%  [kernel]         [k] native_read_tsc
     0.05%  perf             [.] sort__dso_cmp
     0.05%  [kernel]         [k] copy_user_generic_string
     0.05%  [kernel]         [k] ktime_get_update_offsets
     0.04%  [kernel]         [k] kvm_check_async_pf_completion
     0.04%  [kernel]         [k] __schedule
     0.04%  [kernel]         [k] __rcu_pending
     0.04%  [kernel]         [k] svm_complete_interrupts
     0.04%  [kernel]         [k] perf_pmu_disable
     0.04%  [kernel]         [k] isolate_migratepages_range
     0.04%  [kernel]         [k] sched_clock_cpu
     0.04%  [kernel]         [k] kvm_cpu_has_pending_timer
     0.04%  [kernel]         [k] apic_timer_interrupt
     0.04%  [vdso]           [.] 0x7fff2e1ff607
     0.04%  [kernel]         [k] apic_update_ppr
     0.04%  [kernel]         [k] do_select
     0.04%  [kernel]         [k] svm_scale_tsc
     0.04%  [kernel]         [k] system_call_after_swapgs
     0.03%  [kernel]         [k] kvm_lapic_get_cr8
     0.03%  perf             [.] sort__sym_cmp
     0.03%  [kernel]         [k] find_next_bit
     0.03%  [kernel]         [k] kvm_set_cr8
     0.03%  [kernel]         [k] rcu_check_callbacks

[... repeated "unprocessable samples recorded" messages trimmed ...]
Re: [Qemu-devel] Windows slow boot: contractor wanted
Do you have any way to determine what CPU groups the different VMs are running on? If you end up in an overcommit situation where half the 'virtual' cpus are on one AMD socket, and the other half are on a different AMD socket, then you'll be thrashing the hypertransport link. At Cray we were very carefull to never overcommit runnable processes to CPUS, and generally locked processes to a single cpu. Have a read of http://berrange.com/posts/2010/02/12/controlling-guest-cpu-numa-affinity-in-libvirt-with-qemu-kvm-xen/ I'm going to speculate that when things don't work very well you end up with memory from a booting guest scattered across many different NUMA nodes/cpus, and then it really won't matter how good the spin loop/scheduler code is because you are bound by the additional latency and bandwidth limitations of running on one socekt and accessing half the memory that's resident on a different socket. On Tue, Aug 21, 2012 at 04:21:07PM +0100, Richard Davies wrote: Avi Kivity wrote: Richard Davies wrote: We're running host kernel 3.5.1 and qemu-kvm 1.1.1. I hadn't though about it, but I agree this is related to cpu overcommit. The slow boots are intermittent (and infrequent) with cpu overcommit whereas I don't think it occurs without cpu overcommit. In addition, if there is a slow boot ongoing, and you kill some other VMs to reduce cpu overcommit then this will sometimes speed it up. I guess the question is why even with overcommit most boots are fine, but some small fraction then go slow? Could be a bug. The scheduler and the spin-loop handling code fight each other instead of working well. Please provide snapshots of 'perf top' while a slow boot is in progress. Below are two 'perf top' snapshots during a slow boot, which appear to me to support your idea of a spin-lock problem. There are a lot more unprocessable samples recorded messages at the end of each snapshot which I haven't included. 
I think these may be from the guest OS - the kernel is listed, and qemu-kvm itself is listed on some other traces which I did, although not these. Richard.

PerfTop: 62249 irqs/sec  kernel:96.9%  exact: 0.0% [4000Hz cycles], (all, 16 CPUs)

  35.80% [kernel]         [k] _raw_spin_lock_irqsave
  21.64% [kernel]         [k] isolate_freepages_block
   5.91% [kernel]         [k] yield_to
   4.95% [kernel]         [k] _raw_spin_lock
   3.37% [kernel]         [k] kvm_vcpu_on_spin
   2.74% [kernel]         [k] add_preempt_count
   2.45% [kernel]         [k] _raw_spin_unlock
   2.33% [kernel]         [k] sub_preempt_count
   2.18% [kernel]         [k] svm_vcpu_run
   2.17% [kernel]         [k] kvm_vcpu_yield_to
   1.89% [kernel]         [k] memcmp
   1.50% [kernel]         [k] get_pid_task
   1.26% [kernel]         [k] kvm_arch_vcpu_ioctl_run
   1.16% [kernel]         [k] pid_task
   0.70% [kernel]         [k] rcu_note_context_switch
   0.70% [kernel]         [k] trace_hardirqs_on
   0.52% [kernel]         [k] __rcu_read_unlock
   0.51% [kernel]         [k] trace_preempt_on
   0.47% [kernel]         [k] __srcu_read_lock
   0.43% [kernel]         [k] get_parent_ip
   0.42% [kernel]         [k] get_pageblock_flags_group
   0.38% [kernel]         [k] in_lock_functions
   0.34% [kernel]         [k] trace_preempt_off
   0.34% [kernel]         [k] trace_hardirqs_off
   0.29% [kernel]         [k] clear_page_c
   0.23% [kernel]         [k] __srcu_read_unlock
   0.20% [kernel]         [k] __rcu_read_lock
   0.14% [kernel]         [k] handle_exit
   0.11% libc-2.10.1.so   [.] strcmp
   0.11% [kernel]         [k] _raw_spin_unlock_irqrestore
   0.11% [kernel]         [k] _raw_spin_lock_irq
   0.11% [kernel]         [k] find_highest_vector
   0.09% [kernel]         [k] ktime_get
   0.08% [kernel]         [k] copy_page_c
   0.08% [kernel]         [k] pause_interception
   0.08% [kernel]         [k] kmem_cache_alloc
   0.08% [kernel]         [k] resched_task
   0.08% perf             [.] dso__find_symbol
   0.06% [kernel]         [k] compaction_alloc
   0.06% libc-2.10.1.so   [.] 0x00076dab
   0.06% [kernel]         [k] read_tsc
   0.06% perf             [.] add_hist_entry
   0.05% [kernel]         [k] svm_read_l1_tsc
   0.05% [kernel]         [k] native_read_tsc
   0.05% perf             [.] sort__dso_cmp
   0.05% [kernel]         [k] copy_user_generic_string
   0.05% [kernel]         [k] ktime_get_update_offsets
   0.04% [kernel]         [k] kvm_check_async_pf_completion
   0.04% [kernel]         [k] __schedule
   0.04% [kernel]         [k] __rcu_pending
   0.04% [kernel]         [k] svm_complete_interrupts
   0.04% [kernel]         [k] perf_pmu_disable
   0.04% [kernel]         [k] isolate_migratepages_range
   0.04% [kernel]         [k] sched_clock_cpu
   0.04% [kernel]         [k] kvm_cpu_has_pending_timer
   0.04% [kernel]         [k] apic_timer_interrupt
   0.04% [vdso]           [.] 0x7fff2e1ff607
   0.04% [kernel]         [k] apic_update_ppr
   0.04% [kernel]         [k] do_select
   0.04% [kernel]         [k] svm_scale_tsc
   0.04% [kernel]         [k] system_call_after_swapgs
   0.03% [kernel]         [k] kvm_lapic_get_cr8
   0.03% perf             [.] sort__sym_cmp
   0.03% [kernel]         [k] find_next_bit
   0.03% [kernel]         [k] kvm_set_cr8
   0.03% [kernel]         [k] rcu_check_callbacks

9750 unprocessable samples recorded.
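For reference, the kind of capture behind listings like the one above can be reproduced with perf. This is a minimal sketch, not the exact invocation used in the thread: the 30-second window is an arbitrary choice, and the commands are assembled as strings rather than executed here.

```shell
# Sketch: system-wide, call-graph profile of a slow boot, then a text report.
# Built as strings (dry run) so nothing is actually profiled by this snippet.
record_cmd="perf record -g -a -- sleep 30"   # -g: call graphs, -a: all CPUs
report_cmd="perf report --stdio"             # render the recorded profile as text
printf '%s\n%s\n' "$record_cmd" "$report_cmd"
```

Running `perf top` instead gives the live view shown in the snapshot above.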
Re: [Qemu-devel] Windows slow boot: contractor wanted
Brian Jackson wrote: Richard Davies wrote: The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw What memory options have you tried? (KSM, hugepages, -mem-preallocate)? The host kernel has KSM and CONFIG_TRANSPARENT_HUGEPAGE=y and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y. Our qemu-kvm command lines are as above, so we aren't using -mem-prealloc. We'll try that. Is this only with 2008? (is that regular? R2?) It is intermittent. We definitely see it with 2008 R2, and I believe with 2008 as well. We don't have many customers running earlier versions of Windows. Have you tried any of the hyperv features/hints? We have tried -cpu host and -cpu host,hv_relaxed as above, which both exhibit the bug. What other hyperv options do you think we should try? Richard.
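The THP and KSM settings discussed above can be read back at runtime. A small sketch, assuming the standard sysfs locations (guarded, since either feature may be absent on a given host):

```shell
# Print the current transparent hugepage policy and KSM run state, if present.
thp_file=/sys/kernel/mm/transparent_hugepage/enabled
ksm_file=/sys/kernel/mm/ksm/run
for f in "$thp_file" "$ksm_file"; do
  if [ -r "$f" ]; then
    printf '%s: %s\n' "$f" "$(cat "$f")"
  else
    printf '%s: not available on this host\n' "$f"
  fi
done
```

On a kernel built with CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y, the first file typically shows "[always] madvise never".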
Re: [Qemu-devel] Windows slow boot: contractor wanted
Avi Kivity wrote: Richard Davies wrote: Hi Avi, Thanks to you and several others for offering help. We will work with Avi at first, but are grateful for all the other offers of help. We have a number of other qemu-related projects which we'd be interested in getting done, and will get in touch with these names (and anyone else who comes forward) to see if any are of interest to you. This slow boot problem is intermittent and varys in how slow the boots are, but I managed to trigger it this morning with medium slow booting (5-10 minutes) and link to the requested traces below. The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw We are running with hv_relaxed since this was suggested in the previous thread, but we see intermittent slow boots with and without this flag. All 3 VMs are booting slowly for most of the attached capture, which I started after confirming the slow boots and stopped as soon as the first of them (15665) had booted. In terms of visible symptoms, the VMs are showing the Windows boot progress bar, which is moving very slowly. In top, the VMs are at 400% CPU and their resident state size (RES) memory is slowly counting up until it reaches the full VM size, at which point they finish booting. 
Here are the trace files: http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root) http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow) http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd) http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file) http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report) Please let me know if there is anything else which I can provide? There are tons of PAUSE exits indicating cpu overcommit (and indeed you are overcommitted by about 50%). What host kernel version are you running? Does this reproduce without overcommit? We're running host kernel 3.5.1 and qemu-kvm 1.1.1. I hadn't thought about it, but I agree this is related to cpu overcommit. The slow boots are intermittent (and infrequent) with cpu overcommit, whereas I don't think it occurs without cpu overcommit. In addition, if there is a slow boot ongoing, and you kill some other VMs to reduce cpu overcommit, then this will sometimes speed it up. I guess the question is why, even with overcommit, most boots are fine, but some small fraction then go slow? Richard.
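The "overcommitted by about 50%" figure follows directly from the numbers in this thread: three 8-vCPU guests on a 16-core host.

```shell
# Arithmetic behind the overcommit estimate for this host.
vcpus=$(( 3 * 8 ))                                      # 24 runnable vCPUs
cores=16                                                # dual Opteron 6128
overcommit_pct=$(( (vcpus - cores) * 100 / cores ))     # (24-16)/16 = 50%
echo "$vcpus vCPUs on $cores cores: ${overcommit_pct}% overcommit"
# -> 24 vCPUs on 16 cores: 50% overcommit
```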
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/17/2012 03:36 PM, Richard Davies wrote: Hi Avi, Thanks to you and several others for offering help. We will work with Avi at first, but are grateful for all the other offers of help. We have a number of other qemu-related projects which we'd be interested in getting done, and will get in touch with these names (and anyone else who comes forward) to see if any are of interest to you. This slow boot problem is intermittent and varys in how slow the boots are, but I managed to trigger it this morning with medium slow booting (5-10 minutes) and link to the requested traces below. The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw 40+40+40=120, pretty close to your server specs. Are you swapping? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
Avi Kivity wrote: Richard Davies wrote: The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): 40+40+40=120, pretty close to your server specs. Are you swapping? No - you can see on the top screenshot that there's no swap in use. Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/17/2012 03:36 PM, Richard Davies wrote: Hi Avi, Thanks to you and several others for offering help. We will work with Avi at first, but are grateful for all the other offers of help. We have a number of other qemu-related projects which we'd be interested in getting done, and will get in touch with these names (and anyone else who comes forward) to see if any are of interest to you. This slow boot problem is intermittent and varys in how slow the boots are, but I managed to trigger it this morning with medium slow booting (5-10 minutes) and link to the requested traces below. The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw We are running with hv_relaxed since this was suggested in the previous thread, but we see intermittent slow boots with and without this flag. All 3 VMs are booting slowly for most of the attached capture, which I started after confirming the slow boots and stopped as soon as the first of them (15665) had booted. In terms of visible symptoms, the VMs are showing the Windows boot progress bar, which is moving very slowly. In top, the VMs are at 400% CPU and their resident state size (RES) memory is slowly counting up until it reaches the full VM size, at which point they finish booting. 
Here are the trace files: http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root) http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow) http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd) http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file) http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report) Please let me know if there is anything else which I can provide? There are tons of PAUSE exits indicating cpu overcommit (and indeed you are overcommitted by about 50%). What host kernel version are you running? Does this reproduce without overcommit? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
Hi Robert, Robert Vineyard wrote: Not sure if you've tried this, but I noticed massive performance gains (easily booting 2-3 times as fast) by converting from RAW disk images to direct-mapped raw partitions and making sure that IOMMU support was enabled in the BIOS and in the kernel at boot time. Thanks for the suggestions, but unfortunately we do have IOMMU support enabled, and in production (rather than this test case), we run from LVM LVs, which are effectively direct raw partitions, and still have this slow boot problem. Thanks anyway, Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
On Friday 17 August 2012 07:36:42 Richard Davies wrote: Hi Avi, Thanks to you and several others for offering help. We will work with Avi at first, but are grateful for all the other offers of help. We have a number of other qemu-related projects which we'd be interested in getting done, and will get in touch with these names (and anyone else who comes forward) to see if any are of interest to you. This slow boot problem is intermittent and varys in how slow the boots are, but I managed to trigger it this morning with medium slow booting (5-10 minutes) and link to the requested traces below. The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw We are running with hv_relaxed since this was suggested in the previous thread, but we see intermittent slow boots with and without this flag. All 3 VMs are booting slowly for most of the attached capture, which I started after confirming the slow boots and stopped as soon as the first of them (15665) had booted. In terms of visible symptoms, the VMs are showing the Windows boot progress bar, which is moving very slowly. In top, the VMs are at 400% CPU and their resident state size (RES) memory is slowly counting up until it reaches the full VM size, at which point they finish booting. What memory options have you tried? (KSM, hugepages, -mem-preallocate)? 
Is this only with 2008? (is that regular? R2?) Have you tried any of the hyperv features/hints? Here are the trace files: http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root) http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow) http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd) http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file) http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report) Please let me know if there is anything else which I can provide? Thank you, Richard. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
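One concrete form of the -mem-prealloc suggestion raised above would be backing guest RAM with hugetlbfs and preallocating it at startup. This is a hedged sketch, not the thread's tested configuration: the mount point is the conventional hugetlbfs location, the disk image is a placeholder, and the commands are built as strings rather than run.

```shell
# Sketch: hugetlbfs-backed, preallocated guest memory (dry run).
hugedir=/dev/hugepages
mount_cmd="mount -t hugetlbfs hugetlbfs $hugedir"
qemu_cmd="qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed -mem-path $hugedir -mem-prealloc -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw"
printf '%s\n%s\n' "$mount_cmd" "$qemu_cmd"
```

Preallocation moves the cost of faulting in 40 GB of guest RAM from boot time (where this thread observes RES slowly counting up) to VM startup, at the price of committing the memory immediately.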
Re: [Qemu-devel] Windows slow boot: contractor wanted
Hi Avi, Thanks to you and several others for offering help. We will work with Avi at first, but are grateful for all the other offers of help. We have a number of other qemu-related projects which we'd be interested in getting done, and will get in touch with these names (and anyone else who comes forward) to see if any are of interest to you. This slow boot problem is intermittent and varies in how slow the boots are, but I managed to trigger it this morning with medium slow booting (5-10 minutes) and link to the requested traces below. The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1. In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw We are running with hv_relaxed since this was suggested in the previous thread, but we see intermittent slow boots with and without this flag. All 3 VMs are booting slowly for most of the attached capture, which I started after confirming the slow boots and stopped as soon as the first of them (15665) had booted. In terms of visible symptoms, the VMs are showing the Windows boot progress bar, which is moving very slowly. In top, the VMs are at 400% CPU and their resident state size (RES) memory is slowly counting up until it reaches the full VM size, at which point they finish booting.
Here are the trace files: http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root) http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow) http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd) http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file) http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report) Please let me know if there is anything else which I can provide? Thank you, Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
Richard, Not sure if you've tried this, but I noticed massive performance gains (easily booting 2-3 times as fast) by converting from RAW disk images to direct-mapped raw partitions and making sure that IOMMU support was enabled in the BIOS and in the kernel at boot time. The obvious downside to using raw partitions is a loss of flexibility and portability across physical machines, but in some cases the trade-offs may be worth it. I never ran any formal benchmarks, but it felt like about a 50% performance boost going from RAW disk images to raw partitions (don't even think about using QCOW2 disk images for Windows, your VMs will still be booting next week...). The real gains, which I can't yet fully explain, came from passing iommu=on intel_iommu=on to the host kernel on bootup. I believe the boot option to enable IOMMU support may be different on AMD hardware. Granted, this is on a much smaller VM than you're using (Windows 7 x64 with two vCPUs and 4 GB of vRAM), but might be worth investigating. Good luck! -- Robert Vineyard On 08/17/2012 08:36 AM, Richard Davies wrote: Hi Avi, Thanks to you and several others for offering help. We will work with Avi at first, but are grateful for all the other offers of help. We have a number of other qemu-related projects which we'd be interested in getting done, and will get in touch with these names (and anyone else who comes forward) to see if any are of interest to you. This slow boot problem is intermittent and varies in how slow the boots are, but I managed to trigger it this morning with medium slow booting (5-10 minutes) and link to the requested traces below. The host in question has 128GB RAM and dual AMD Opteron 6128 (16 cores total). It is running kernel 3.5.1 and qemu-kvm 1.1.1.
In this morning's test, we have 3 guests, all booting Windows with 40GB RAM and 8 cores each (we have seen small VMs go slow as I originally said, but it is easier to trigger with big VMs): pid 15665: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test1.raw pid 15676: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :98 -monitor stdio -hda test2.raw pid 15653: qemu-kvm -nodefaults -m 40960 -smp 8 -cpu host,hv_relaxed \ -vga cirrus -usbdevice tablet -vnc :97 -monitor stdio -hda test3.raw We are running with hv_relaxed since this was suggested in the previous thread, but we see intermittent slow boots with and without this flag. All 3 VMs are booting slowly for most of the attached capture, which I started after confirming the slow boots and stopped as soon as the first of them (15665) had booted. In terms of visible symptoms, the VMs are showing the Windows boot progress bar, which is moving very slowly. In top, the VMs are at 400% CPU and their resident state size (RES) memory is slowly counting up until it reaches the full VM size, at which point they finish booting. Here are the trace files: http://users.org.uk/slow-win-boot-1/ps.txt (ps auxwwwf as root) http://users.org.uk/slow-win-boot-1/top.txt (top with 2 VMs still slow) http://users.org.uk/slow-win-boot-1/trace-console.txt (running trace-cmd) http://users.org.uk/slow-win-boot-1/trace.dat (the 1.7G trace data file) http://users.org.uk/slow-win-boot-1/trace-report.txt (the 4G trace report) Please let me know if there is anything else which I can provide? Thank you, Richard.
Re: [Qemu-devel] Windows slow boot: contractor wanted
On 08/16/2012 01:47 PM, Richard Davies wrote: Hi, We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a contractor to track down and fix problems we have with large memory Windows guests booting very slowly - they can take several hours. We previously reported these problems in July (copied below) and they are still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1. This is a serious issue for us which is causing significant pain to our larger Windows VM customers when their servers are offline for many hours during boot. If anyone knowledgeable in the area would be interested in being paid to work on this, or if you know someone who might be, I would be delighted to hear from you. I happen to be gainfully employed but maybe I can help. Can you collect a trace during the slow boot period and post it somewhere? See http://www.linux-kvm.org/page/Tracing for instructions. 4G/8way is not a particularly large guest. What is the host configuration (memory, core count)? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Windows slow boot: contractor wanted
On Thursday 16 Aug 2012 at 11:47:27 (+0100), Richard Davies wrote: Hi, We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a contractor to track down and fix problems we have with large memory Windows guests booting very slowly - they can take several hours. We previously reported these problems in July (copied below) and they are still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1. This is a serious issue for us which is causing significant pain to our larger Windows VM customers when their servers are offline for many hours during boot. If anyone knowledgeable in the area would be interested in being paid to work on this, or if you know someone who might be, I would be delighted to hear from you. Cheers, Richard. = Previous bug report http://marc.info/?l=qemu-devel&m=134304194329745 We have been experiencing this problem for a while now too, using qemu-kvm (currently at 1.1.1). Unfortunately, hv_relaxed doesn't seem to fix it. The following command line produces the issue: qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img The hardware consists of dual AMD Opteron 6128 processors (16 cores in total) and 64GB of memory. This command line was tested on kernel 3.1.4. I've also tested with -no-hpet. What I have seen is much as described: the memory fills out slowly, and top on the host will show the process using 100% on all allocated CPU cores. The most extreme case was a machine which took something between 6 and 8 hours to boot. This seems to be related to the assigned memory, as described, but also the number of processor cores (which makes sense if we believe it's a timing issue?). I have seen slow-booting guests improved by switching down to a single or even two cores.
Matthew, I agree that this seems to be linked to the number of VMs running - in fact, shutting down other VMs on a dedicated test host caused the machine to start booting at a normal speed (with no reboot required). However, the level of contention is never such that this could be explained by the host simply being overcommitted. If it helps anyone, there's an image of the hard drive I've been using to test at: http://46.20.114.253/ It's a 5GB gzip file containing a fairly standard Windows 2008 trial installation. Since it's in the trial period, anyone who wants to use it may have to re-arm the trial: http://support.microsoft.com/kb/948472 Please let me know if I can provide any more information, or test anything. For info, the image boots pretty fast with qemu-kvm 1.1.1 and a 3.2.0-29 ubuntu kernel on a core i7 with these parameters. Benoît Best wishes, Owen Tuz
Re: [Qemu-devel] Windows slow boot: contractor wanted
I'd be interested in working on this. What I'd like to propose is to write an automated regression test harness that will reboot the host hardware, start booting up guest VMs, and report the time-to-boot, as well as the relative performance of the running VMs. For best results, I'd need access to the specific hardware you are using. I'd also like to release the test harness back to the community, so I would like some feedback from the mailing list on what kinds of tests should be written that would provide the best information for the KVM developers. What do you want to know, and what is the most useful data to record to debug this and future performance regressions? On Thu, Aug 16, 2012 at 11:47:27AM +0100, Richard Davies wrote: Hi, We run a cloud hosting provider using qemu-kvm 1.1, and are keen to find a contractor to track down and fix problems we have with large memory Windows guests booting very slowly - they can take several hours. We previously reported these problems in July (copied below) and they are still present with Linux kernel 3.5.1 and qemu-kvm 1.1.1. This is a serious issue for us which is causing significant pain to our larger Windows VM customers when their servers are offline for many hours during boot. If anyone knowledgeable in the area would be interested in being paid to work on this, or if you know someone who might be, I would be delighted to hear from you. Cheers, Richard. = Previous bug report http://marc.info/?l=qemu-devel&m=134304194329745 We have been experiencing this problem for a while now too, using qemu-kvm (currently at 1.1.1). Unfortunately, hv_relaxed doesn't seem to fix it. The following command line produces the issue: qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img The hardware consists of dual AMD Opteron 6128 processors (16 cores in total) and 64GB of memory. This command line was tested on kernel 3.1.4. I've also tested with -no-hpet.
What I have seen is much as described: the memory fills out slowly, and top on the host will show the process using 100% on all allocated CPU cores. The most extreme case was a machine which took something between 6 and 8 hours to boot. This seems to be related to the assigned memory, as described, but also the number of processor cores (which makes sense if we believe it's a timing issue?). I have seen slow-booting guests improved by switching down to a single or even two cores. Matthew, I agree that this seems to be linked to the number of VMs running - in fact, shutting down other VMs on a dedicated test host caused the machine to start booting at a normal speed (with no reboot required). However, the level of contention is never such that this could be explained by the host simply being overcommitted. If it helps anyone, there's an image of the hard drive I've been using to test at: http://46.20.114.253/ It's 5G of gzip file containing a fairly standard Windows 2008 trial installation. Since it's in the trial period, anyone who wants to use it may have to re-arm the trial: http://support.microsoft.com/kb/948472 Please let me know if I can provide any more information, or test anything. Best wishes, Owen Tuz