Re: KVM with hugepages generate huge load with two guests
Hi,

So, nobody has any idea what's going wrong with all these massive IRQs and spin_locks that cause the virtual machines to almost completely stop? :(

Thanks,
Dmitry

On Wed, Dec 1, 2010 at 5:38 AM, Dmitry Golubev wrote:
> Hi,
>
> Sorry it took me so long to reply - there are only a few moments when I can poke a production server and I need to notify people in advance about that :(
>
>> Can you post kvm_stat output while slowness is happening? 'perf top' on the
>> host? and on the guest?
>
> I took 'perf top' and the first thing I saw is that while the guest is on acpi_pm, it shows a more or less normal amount of IRQs (under 1000/s); however, when I switched back to the default (which is nohz with kvm_clock), there are 40 times (!!!) more IRQs under normal operation (about 40 000/s). When the slowdown is happening, there are a lot of _spin_lock events and a lot of messages like: "WARNING: failed to keep up with mmap data. Last read 810 msecs ago."
>
> As I said before, switching to acpi_pm does not save the day, but it makes the situation a lot more workable (i.e., the servers recover faster from the period of slowness). During slowdowns on acpi_pm I also see "_spin_lock".
>
> Raw data follows:
>
> vmstat 5 on the host:
>
> procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
>  r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
>  0  0      0 131904  13952 205872    0    0     0    24  2495  9813  6  3 91  0
>  0  0      0 132984  13952 205872    0    0     0    47  2596  9851  5  3 91  1
>  1  0      0 132148  13952 205872    0    0     0    54  2644 10559  3  3 93  1
>  0  1      0 129084  13952 205872    0    0     0    38  3039  9752  7  3 87  2
>  6  0      0 126388  13952 205872    0    0     0   311 15619  9009 42 17 39  2
>  9  0      0 125868  13960 205872    0    0     6    86  4659  6504 98  2  0  0
>  8  0      0 123320  13960 205872    0    0     0    26  4682  6649 98  2  0  0
>  8  0      0 126252  13960 205872    0    0     0   124  4923  6776 98  2  0  0
>  8  0      0 125376  13960 205872    0    0   136    11  4287  5865 98  2  0  0
>  9  0      0 123812  13960 205872    0    0   205    51  4497  6134 98  2  0  0
>  8  0      0 126020  13960 205872    0    0   904    26  4483  5999 98  2  0  0
>  8  0      0 124052  13960 205872    0    0    15    10  4397  6200 98  2  0  0
>  8  0      0 125928  13960 205872    0    0    14    41  4335  5823 98  2  0  0
>  8  0      0 126184  13960 205872    0    0     6    14  4966  6588 98  2  0  0
>  8  0      0 123588  13960 205872    0    0   143    18  5234  6891 98  2  0  0
>  8  0      0 126640  13960 205872    0    0     6    91  5554  7334 98  2  0  0
>  8  0      0 123144  13960 205872    0    0   146    11  5235  7145 98  2  0  0
>  8  0      0 125856  13968 205872    0    0  1282    98  5481  7159 98  2  0  0
>  9 19      0 124124  13968 205872    0    0   782  2433  8587  8987 97  3  0  0
>  8  0      0 122584  13968 205872    0    0   432    90  5359  6960 98  2  0  0
>  8  0      0 125320  13968 205872    0    0  3074    52  5448  7095 97  3  0  0
>  8  0      0 121436  13968 205872    0    0  2519    81  5714  7279 98  2  0  0
>  8  0      0 124436  13968 205872    0    0     1    56  5242  6864 98  2  0  0
>  8  0      0 111324  13968 205872    0    0     2    22 10660  6686 97  3  0  0
>  8  0      0 107824  13968 205872    0    0     0    24 14329  8147 97  3  0  0
>  8  0      0 110420  13968 205872    0    0     0    68 13486  6985 98  2  0  0
>  8  0      0 110024  13968 205872    0    0     0    19 13085  6659 98  2  0  0
>  8  0      0 109932  13968 205872    0    0     0     3 12952  6415 98  2  0  0
>  8  0      0 108552  13968 205880    0    0     2    41 13400  7349 98  2  0  0
>
> A few shots with kvm_stat on the host:
>
> Every 2.0s: kvm_stat -1                         Wed Dec 1 04:45:47 2010
>
> efer_reload                  0         0
> exits                 56264102     14074
> fpu_reload              311506        50
> halt_exits             4733166       935
> halt_wakeup            3845079       840
> host_state_reload      8795964      4085
> hypercalls                   0         0
> insn_emulation        13573212      7249
> insn_emulation_fail          0         0
> invlpg                 1846050        20
> io_exits               3579406       843
> irq_exits              3038887      4879
> irq_injections         5242157      3681
> irq_window              124361       540
> largepages                2253         0
> mmio_exits               64274        20
> mmu_cache_miss          664011        16
> mmu_flooded             164506         1
> mmu_pde_zapped          212686         8
> mmu_pte_updated         729268         0
> mmu_pte_write         81323616       551
> mmu_recycled               277         0
> mmu_shadow_zapped       652691        23
> mmu_unsync                5630         8
> nmi_injections               0         0
> nmi_window                   0
Re: KVM with hugepages generate huge load with two guests
Hi,

Sorry it took me so long to reply - there are only a few moments when I can poke a production server and I need to notify people in advance about that :(

> Can you post kvm_stat output while slowness is happening? 'perf top' on the
> host? and on the guest?

I took 'perf top' and the first thing I saw is that while the guest is on acpi_pm, it shows a more or less normal amount of IRQs (under 1000/s); however, when I switched back to the default (which is nohz with kvm_clock), there are 40 times (!!!) more IRQs under normal operation (about 40 000/s). When the slowdown is happening, there are a lot of _spin_lock events and a lot of messages like: "WARNING: failed to keep up with mmap data. Last read 810 msecs ago."

As I said before, switching to acpi_pm does not save the day, but it makes the situation a lot more workable (i.e., the servers recover faster from the period of slowness). During slowdowns on acpi_pm I also see "_spin_lock".

Raw data follows:

vmstat 5 on the host:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  0      0 131904  13952 205872    0    0     0    24  2495  9813  6  3 91  0
 0  0      0 132984  13952 205872    0    0     0    47  2596  9851  5  3 91  1
 1  0      0 132148  13952 205872    0    0     0    54  2644 10559  3  3 93  1
 0  1      0 129084  13952 205872    0    0     0    38  3039  9752  7  3 87  2
 6  0      0 126388  13952 205872    0    0     0   311 15619  9009 42 17 39  2
 9  0      0 125868  13960 205872    0    0     6    86  4659  6504 98  2  0  0
 8  0      0 123320  13960 205872    0    0     0    26  4682  6649 98  2  0  0
 8  0      0 126252  13960 205872    0    0     0   124  4923  6776 98  2  0  0
 8  0      0 125376  13960 205872    0    0   136    11  4287  5865 98  2  0  0
 9  0      0 123812  13960 205872    0    0   205    51  4497  6134 98  2  0  0
 8  0      0 126020  13960 205872    0    0   904    26  4483  5999 98  2  0  0
 8  0      0 124052  13960 205872    0    0    15    10  4397  6200 98  2  0  0
 8  0      0 125928  13960 205872    0    0    14    41  4335  5823 98  2  0  0
 8  0      0 126184  13960 205872    0    0     6    14  4966  6588 98  2  0  0
 8  0      0 123588  13960 205872    0    0   143    18  5234  6891 98  2  0  0
 8  0      0 126640  13960 205872    0    0     6    91  5554  7334 98  2  0  0
 8  0      0 123144  13960 205872    0    0   146    11  5235  7145 98  2  0  0
 8  0      0 125856  13968 205872    0    0  1282    98  5481  7159 98  2  0  0
 9 19      0 124124  13968 205872    0    0   782  2433  8587  8987 97  3  0  0
 8  0      0 122584  13968 205872    0    0   432    90  5359  6960 98  2  0  0
 8  0      0 125320  13968 205872    0    0  3074    52  5448  7095 97  3  0  0
 8  0      0 121436  13968 205872    0    0  2519    81  5714  7279 98  2  0  0
 8  0      0 124436  13968 205872    0    0     1    56  5242  6864 98  2  0  0
 8  0      0 111324  13968 205872    0    0     2    22 10660  6686 97  3  0  0
 8  0      0 107824  13968 205872    0    0     0    24 14329  8147 97  3  0  0
 8  0      0 110420  13968 205872    0    0     0    68 13486  6985 98  2  0  0
 8  0      0 110024  13968 205872    0    0     0    19 13085  6659 98  2  0  0
 8  0      0 109932  13968 205872    0    0     0     3 12952  6415 98  2  0  0
 8  0      0 108552  13968 205880    0    0     2    41 13400  7349 98  2  0  0

A few shots with kvm_stat on the host:

Every 2.0s: kvm_stat -1                         Wed Dec 1 04:45:47 2010

efer_reload                  0         0
exits                 56264102     14074
fpu_reload              311506        50
halt_exits             4733166       935
halt_wakeup            3845079       840
host_state_reload      8795964      4085
hypercalls                   0         0
insn_emulation        13573212      7249
insn_emulation_fail          0         0
invlpg                 1846050        20
io_exits               3579406       843
irq_exits              3038887      4879
irq_injections         5242157      3681
irq_window              124361       540
largepages                2253         0
mmio_exits               64274        20
mmu_cache_miss          664011        16
mmu_flooded             164506         1
mmu_pde_zapped          212686         8
mmu_pte_updated         729268         0
mmu_pte_write         81323616       551
mmu_recycled               277         0
mmu_shadow_zapped       652691        23
mmu_unsync                5630         8
nmi_injections               0         0
nmi_window                   0         0
pf_fixed              17470658       218
pf_guest              13352205        81
remote_tlb_flush       1898930        96
request_irq                  0         0
signal_exits                 0         0
tlb_flush              5827433       108

Every 2.0s: kvm_stat -1                         Wed Dec 1 04:47:33 2010

efer_reload                  0         0
exits                 58
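For reference, the clocksource half of the experiment above does not require a reboot: the guest kernel exposes it through sysfs (a minimal sketch, assuming the standard sysfs layout of a 2.6.32 guest; nohz=off and highres=off still need boot parameters):

  # inside the guest: see which clocksources the kernel detected, and which is in use
  $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
  $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource

  # switch to acpi_pm on the fly (as root)
  # echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource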
Re: KVM with hugepages generate huge load with two guests
Thanks for the answer.

> Are you sure it is hugepages related?

Well, empirically it looked like either hugepages-related or a regression from qemu-kvm 0.12.3 -> 0.12.5, as this did not happen until I upgraded (needed to avoid disk corruption caused by a bug in 0.12.3) and enabled hugepages. However, as the frequency of the problem does seem related to the memory each guest consumes (more memory = the faster the problem appears), and in the beginning the memory consumption of the guests might not have hit some kind of threshold, maybe it is not really hugepages related.

> Can you post kvm_stat output while slowness is happening? 'perf top' on the
> host? and on the guest?

OK, I will test this and write back.

Thanks,
Dmitry
Re: KVM with hugepages generate huge load with two guests
On 11/21/2010 02:24 AM, Dmitry Golubev wrote:
> Hi,
>
> Seems that nobody is interested in this bug :(

It's because the information is somewhat confused. There's a way to prepare bug reports that gets developers competing to see who solves it first.

> Anyway I wanted to add a bit more to this investigation.
>
> Once I put "nohz=off highres=off clocksource=acpi_pm" in the guest kernel options, the guests started to behave better - they do not stay in the slow state, but rather get there for some seconds (usually up to a minute, but sometimes 2-3 minutes) and then get out of it (this cycle repeats once in a while - every approx 3-6 minutes).
>
> Once the situation became stable, so that I am able to leave the guests without too much worry, I also noticed that sometimes the predicted swapping occurs, although rarely (I waited about half an hour to catch the first swapping on the host).
>
> Here is a fragment of vmstat. Note that when the first column shows 8-9, the slowness and huge load happen. You can also see how it appears and disappears (with nohz and kvm-clock it did not get out of the slowness period, and with the tsc clock the probability of getting out is significantly lower):

Are you sure it is hugepages related?

Can you post kvm_stat output while slowness is happening? 'perf top' on the host? and on the guest?

-- 
error compiling committee.c: too many arguments to function
Re: KVM with hugepages generate huge load with two guests
> Just out of curiosity: did you try updating the BIOS on your
> motherboard? The issue you're facing seems to be quite unique,
> and I've seen more than once how various different weird issues
> were fixed just by updating the BIOS. Provided they actually
> did their own homework and fixed something and released the fixes
> too... ;)

Thank you for the reply, I really appreciate that somebody found time to answer. Unfortunately for this investigation, I managed to upgrade the BIOS a few months ago. I just checked - there are no newer versions.

I do see, however, that many people advise changing to the acpi_pm clocksource (and, thus, disabling the nohz option) in case similar problems are experienced - I did not invent this workaround (got the idea here: http://forum.proxmox.com/threads/5144-100-CPU-on-host-VM-hang-every-night?p=29143#post29143 ). Looks like an ancient bug. I even upgraded my qemu-kvm to version 0.13 without any significant changes to this behavior.

It is really weird, however, how one guest can work fine, but two start messing with each other. Shouldn't there be some kind of isolation between them? They both start to behave exactly the same at exactly the same time. And it does not happen once a month or a year, but pretty frequently.

Thanks,
Dmitry
Re: KVM with hugepages generate huge load with two guests
21.11.2010 03:24, Dmitry Golubev wrote:
> Hi,
>
> Seems that nobody is interested in this bug :(
>
> Anyway I wanted to add a bit more to this investigation.
>
> Once I put "nohz=off highres=off clocksource=acpi_pm" in guest kernel
> options, the guests started to behave better - they do not stay in the
> slow state, but rather get there for some seconds (usually up to a
> minute, but sometimes 2-3 minutes) and then get out of it (this cycle

Just out of curiosity: did you try updating the BIOS on your motherboard? The issue you're facing seems to be quite unique, and I've seen more than once how various different weird issues were fixed just by updating the BIOS. Provided they actually did their own homework and fixed something and released the fixes too... ;)

P.S. I'm Not A Guru (tm) :)

/mjt
Re: KVM with hugepages generate huge load with two guests
Hi,

Seems that nobody is interested in this bug :(

Anyway I wanted to add a bit more to this investigation.

Once I put "nohz=off highres=off clocksource=acpi_pm" in the guest kernel options, the guests started to behave better - they do not stay in the slow state, but rather get there for some seconds (usually up to a minute, but sometimes 2-3 minutes) and then get out of it (this cycle repeats once in a while - every approx 3-6 minutes).

Once the situation became stable, so that I am able to leave the guests without too much worry, I also noticed that sometimes the predicted swapping occurs, although rarely (I waited about half an hour to catch the first swapping on the host).

Here is a fragment of vmstat. Note that when the first column shows 8-9, the slowness and huge load happen. You can also see how it appears and disappears (with nohz and kvm-clock it did not get out of the slowness period, and with the tsc clock the probability of getting out is significantly lower):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 8  0      0  60456  19708 253688    0    0     6   170  5771  1712 97  3  0  0
 9  5      0  58752  19708 253688    0    0    11    57  6457  1500 96  4  0  0
 8  0      0  58192  19708 253688    0    0    55   106  5112  1588 98  3  0  0
 8  0      0  58068  19708 253688    0    0    21     0  2609  1498 100 0  0  0
 8  2      0  57728  19708 253688    0    0     9    96  2645  1620 100 0  0  0
 8  0      0  53852  19716 253680    0    0     2   186  6321  1935 97  4  0  0
 8  0      0  49636  19716 253688    0    0     0    45  3482  1484 99  1  0  0
 8  0      0  49452  19716 253688    0    0     0    34  3253  1851 100 0  0  0
 4  1   1468 126252  16780 182256   53  317   393   788 29318  3498 79 21  0  0
 4  0   1468 135596  16780 182332    0    0     7   360 26782  2459 79 21  0  0
 1  0   1468 169720  16780 182340    0    0    75    81 22024  3194 40 15 42  3
 3  0   1464 167608  16780 182340    6    0    26  1579  9404  5526 22  8 35 35
 0  0   1460 164232  16780 182504    0    0    85   170  4955  3345 21  5 69  5
 0  0   1460 163636  16780 182504    0    0     0    90  1288  1855  5  2 90  3
 1  0   1460 164836  16780 182504    0    0     0    34  1166  1789  4  2 93  1
 1  0   1452 165628  16780 182504    0    0   285    70  1981  2692 10  2 83  4
 1  0   1452 160044  16952 184840    6    0   832   146  5046  3303 11  6 76  7
 1  0   1452 161416  16960 184840    0    0    19   170  1732  2577 10  2 74 13
 0  1   1452 161920  16960 184840    0    0   111    53  1084  1986  0  1 96  3
 0  0   1452 161332  16960 184840    0    0   254    34   856  1505  2  1 95  3
 1  0   1452 159168  16960 184840    0    0   366    46  2137  2774  3  2 94  1
 1  0   1452 157408  16968 184840    0    0     0    69  2423  2991  9  5 84  2
 0  0   1444 157876  16968 184840    0    0     0    45  6343  3079 24 10 65  1
 0  0   1428 159644  16968 184844    6    0     8    52   724  1276  0  0 98  2
 0  0   1428 160336  16968 184844    0    0    31    98  1115  1835  1  1 92  6
 1  0   1428 161360  16968 184844    0    0     0    45  1333  1849  2  1 95  2
 0  0   1428 162092  16968 184844    0    0     0   408  3517  4267 11  2 78  8
 1  1   1428 163868  16968 184844    0    0    24   121  1714  2036 10  2 86  2
 1  3   1428 161292  16968 184844    0    0     3   143  2906  3503 16  4 77  3
 0  0   1428 156448  16976 184836    0    0     1   781  5661  4464 16  7 74  3
 1  0   1428 156924  16976 184844    0    0   588    92  2341  3845  7  2 87  4
 0  0   1428 158816  16976 184844    0    0    27   119  2052  3830  5  1 89  4
 0  0   1428 161420  16976 184844    0    0     1    56  3923  3132 26  4 68  1
 0  0   1428 162724  16976 184844    0    0    10   107  2806  3558 10  2 86  2
 1  0   1428 165244  16976 184844    0    0    34   155  2084  2469  8  2 78 12
 0  0   1428 165204  16976 184844    0    0   390   282  9568  4924 17 11 55 17
 1  0   1392 163864  16976 185064   10   20   218   411 11762 16591  6  9 68 17
 8  0   1384 164992  16984 185056    0    0     9    88  7540  5761 73  6 17  4
 8  0   1384 163620  16984 185076    0    0     1    89 21936 45040 90 10  0  0
 8  0   1384 165324  16992 185076    0    0     5   194  3330  1678 99  1  0  0
 8  0   1384 165704  16992 185076    0    0     1    54  2651  1457 99  1  0  0
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 8  0   1384 163016  17000 185076    0    0     0   126  4988  1536 97  3  0  0
 9  1   1384 162608  17000 185076    0    0    34   477 20106  2351 83 17  0  0
 0  0   1384 184052  17000 185076    0    0   102  1198 48951  3628 48 38  6  8
 0  0   1384 183088  17008 185076    0    0     8   156  1228  1419  2  2 82 14
 0  0   1384 184436  17008 185164    0    0    28   113  3176  2785 12  7 75  6
 0  0   1384 184568  17008 185164    0    0    30   107  1547  1821  4  3 87  6
 4  2   1228 228808  17008
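For reference, the "nohz=off highres=off clocksource=acpi_pm" workaround above goes on the guest kernel command line; on a GRUB-legacy guest of that era the entry would look roughly like this (a sketch only - the kernel image names and root device below are placeholders, not taken from this report):

  # /boot/grub/menu.lst (guest)
  title  Linux 2.6.32 (timer workaround)
  kernel /vmlinuz-2.6.32 root=/dev/vda1 ro nohz=off highres=off clocksource=acpi_pm
  initrd /initrd.img-2.6.32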
Re: KVM with hugepages generate huge load with two guests
Hi,

Sorry to bother you again. I have more info:

> 1. router with 32MB of RAM (hugepages) and 1VCPU
...
> Is it too much to have 3 guests with hugepages?

OK, this router is also out of the equation - I have disabled hugepages for it. There should also be additional pages available to the guests because of that.

I think this should be pretty reproducible... Two exactly similar 64bit Linux 2.6.32 guests, with 3500MB of virtual RAM and 4 VCPUs each, running on a Core2Quad (4 real cores) machine with 8GB of RAM and 3546 2MB hugepages, on a 64bit Linux 2.6.35 host (libvirt 0.8.3) from Ubuntu Maverick.

Still no swapping, and the effect is pretty much the same: one guest runs well; two guests work for some minutes - then slow down a few hundred times, showing huge load both inside (unlimited rapid growth of loadaverage) and outside (the host load is not making it unresponsive though - but it is loaded to the max). Load growth on the host is instant and finite (the 'r' column change indicates this sudden rise):

# vmstat 5
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 1  3      0 194220  30680  76712    0    0   319    28  2633  1960  6  6 67 20
 1  2      0 193776  30680  76712    0    0     4   231 55081 78491  3 39 17 41
10  1      0 185508  30680  76712    0    0     4    87 53042 34212 55 27  9  9
12  0      0 185180  30680  76712    0    0     2    95 41007 21990 84 16  0  0

Thanks,
Dmitry

On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev wrote:
> Hi,
>
> Maybe you remember that I wrote a few weeks ago about a KVM cpu load problem with hugepages. The problem was left hanging, however I now have some new information. The description remains the same, however I have decreased both the guest memory and the amount of hugepages:
>
> RAM = 8GB, hugepages = 3546
>
> Total of 2 virtual machines:
> 1. router with 32MB of RAM (hugepages) and 1 VCPU
> 2. linux guest with 3500MB of RAM (hugepages) and 4 VCPUs
>
> Everything works fine until I start the second linux guest with the same 3500MB of guest RAM, also in hugepages and also 4 VCPUs. The rest of the description is the same as before: after a while the host shows a loadaverage of about 8 (on a Core2Quad) and it seems that both big guests consume exactly the same amount of resources. The host seems responsive though. Inside the guests, however, things are not so good - the load skyrockets to at least 20. The guests are not responsive and even a 'ps' executes inappropriately slowly (it may take a few minutes - here, however, load builds up and it seems that the machine becomes slower with time, unlike the host, which shows the jump in resource consumption instantly). It also seems that the more memory the guests use, the faster the problem appears. Still, at least a gig of RAM is free in each guest and there is no swap activity inside the guests.
>
> The most important thing - why I went back and quoted an older message than the last one - is that there is no more swap activity on the host, so the previous track of thought may also be wrong and I have returned to the beginning. There is plenty of RAM now and swap on the host is always at 0 as seen in 'top'. And there is 100% cpu load, equally shared between the two large guests. To stop the load I can destroy either large guest. Additionally, I have just discovered that suspending any large guest works as well. Moreover, after a resume, the load does not come back for a while. Both methods stop the high load instantly (faster than a second).
>
> As you were asking for a 'top' inside the guest, here it is:
>
> top - 03:27:27 up 42 min, 1 user, load average: 18.37, 7.68, 3.12
> Tasks: 197 total, 23 running, 174 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 89.2%sy, 0.0%ni, 10.5%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
> Mem: 3510912k total, 1159760k used, 2351152k free, 62568k buffers
> Swap: 4194296k total, 0k used, 4194296k free, 484492k cached
>
>   PID USER      PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
> 12303 root      20  0     0    0    0 R  100  0.0 0:33.72 vpsnetclean
> 11772 99        20  0  149m  11m 2104 R   82  0.3 0:15.10 httpd
> 10906 99        20  0  149m  11m 2124 R   73  0.3 0:11.52 httpd
> 10247 99        20  0  149m  11m 2128 R   31  0.3 0:05.39 httpd
>  3916 root      20  0 86468  11m 1476 R   16  0.3 0:15.14 cpsrvd-ssl
> 10919 99        20  0  149m  11m 2124 R    8  0.3 0:03.43 httpd
> 11296 99        20  0  149m  11m 2112 R    7  0.3 0:03.26 httpd
> 12265 99        20  0  149m  11m 2088 R    7  0.3 0:08.01 httpd
> 12317 root      20  0 99.6m 1384  716 R    7  0.0 0:06.57 crond
> 12326 503       20  0  8872   96   72 R    7  0.0 0:01.13 php
>  3634 root      20  0 74804 1176  596 R    6  0.0 0:12.15 crond
> 11864 32005     20  0 87224  13m 2528 R    6  0.4 0:30.84 cpsrvd-ssl
> 12275 root      20  0 30628 9976 1364 R    6  0.3 0:24.68 cpgs_chk
> 11305 99 20
Re: KVM with hugepages generate huge load with two guests
Hi,

Maybe you remember that I wrote a few weeks ago about a KVM cpu load problem with hugepages. The problem was left hanging, however I now have some new information. The description remains the same, however I have decreased both the guest memory and the amount of hugepages:

RAM = 8GB, hugepages = 3546

Total of 2 virtual machines:
1. router with 32MB of RAM (hugepages) and 1 VCPU
2. linux guest with 3500MB of RAM (hugepages) and 4 VCPUs

Everything works fine until I start the second linux guest with the same 3500MB of guest RAM, also in hugepages and also 4 VCPUs. The rest of the description is the same as before: after a while the host shows a loadaverage of about 8 (on a Core2Quad) and it seems that both big guests consume exactly the same amount of resources. The host seems responsive though. Inside the guests, however, things are not so good - the load skyrockets to at least 20. The guests are not responsive and even a 'ps' executes inappropriately slowly (it may take a few minutes - here, however, load builds up and it seems that the machine becomes slower with time, unlike the host, which shows the jump in resource consumption instantly). It also seems that the more memory the guests use, the faster the problem appears. Still, at least a gig of RAM is free in each guest and there is no swap activity inside the guests.

The most important thing - why I went back and quoted an older message than the last one - is that there is no more swap activity on the host, so the previous track of thought may also be wrong and I have returned to the beginning. There is plenty of RAM now and swap on the host is always at 0 as seen in 'top'. And there is 100% cpu load, equally shared between the two large guests. To stop the load I can destroy either large guest. Additionally, I have just discovered that suspending any large guest works as well. Moreover, after a resume, the load does not come back for a while. Both methods stop the high load instantly (faster than a second).

As you were asking for a 'top' inside the guest, here it is:

top - 03:27:27 up 42 min, 1 user, load average: 18.37, 7.68, 3.12
Tasks: 197 total, 23 running, 174 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 89.2%sy, 0.0%ni, 10.5%id, 0.0%wa, 0.0%hi, 0.2%si, 0.0%st
Mem: 3510912k total, 1159760k used, 2351152k free, 62568k buffers
Swap: 4194296k total, 0k used, 4194296k free, 484492k cached

  PID USER      PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
12303 root      20  0     0    0    0 R  100  0.0 0:33.72 vpsnetclean
11772 99        20  0  149m  11m 2104 R   82  0.3 0:15.10 httpd
10906 99        20  0  149m  11m 2124 R   73  0.3 0:11.52 httpd
10247 99        20  0  149m  11m 2128 R   31  0.3 0:05.39 httpd
 3916 root      20  0 86468  11m 1476 R   16  0.3 0:15.14 cpsrvd-ssl
10919 99        20  0  149m  11m 2124 R    8  0.3 0:03.43 httpd
11296 99        20  0  149m  11m 2112 R    7  0.3 0:03.26 httpd
12265 99        20  0  149m  11m 2088 R    7  0.3 0:08.01 httpd
12317 root      20  0 99.6m 1384  716 R    7  0.0 0:06.57 crond
12326 503       20  0  8872   96   72 R    7  0.0 0:01.13 php
 3634 root      20  0 74804 1176  596 R    6  0.0 0:12.15 crond
11864 32005     20  0 87224  13m 2528 R    6  0.4 0:30.84 cpsrvd-ssl
12275 root      20  0 30628 9976 1364 R    6  0.3 0:24.68 cpgs_chk
11305 99        20  0  149m  11m 2104 R    6  0.3 0:02.53 httpd
12278 root      20  0  8808 1328  968 R    6  0.0 0:04.63 sim
 1534 root      20  0     0    0    0 S    6  0.0 0:03.29 flush-254:2
 3626 root      20  0  149m  13m 5324 R    6  0.4 0:27.62 httpd
12279 32008     20  0 87472 7668 2480 R    6  0.2 0:27.63 munin-update
10243 99        20  0  149m  11m 2128 R    5  0.3 0:08.47 httpd
12321 root      20  0 99.6m 1460  792 R    5  0.0 0:07.43 crond
12325 root      20  0 74804  672   92 R    5  0.0 0:00.76 crond
 1531 root      20  0     0    0    0 S    2  0.0 0:02.26 kjournald
    1 root      20  0 10316  756  620 S    0  0.0 0:02.10 init
    2 root      20  0     0    0    0 S    0  0.0 0:00.01 kthreadd
    3 root      RT  0     0    0    0 S    0  0.0 0:01.08 migration/0
    4 root      20  0     0    0    0 S    0  0.0 0:00.02 ksoftirqd/0
    5 root      RT  0     0    0    0 S    0  0.0 0:00.00 watchdog/0
    6 root      RT  0     0    0    0 S    0  0.0 0:00.47 migration/1
    7 root      20  0     0    0    0 S    0  0.0 0:00.03 ksoftirqd/1
    8 root      RT  0     0    0    0 S    0  0.0 0:00.00 watchdog/1

The tasks are changing in the 'top' view, so it is nothing like a single task hanging - it is more like a machine working off swap. The problem is, however, that according to vmstat there is no swap activity during this time.

Should I try to decrease the RAM I give to my guests even more? Is it too much to have 3 guests with hugepages? Should I try something else? Unfortunately it is a production system and I can't play with it very much.

Here is 'top' on the host:

top - 03:32:12 up 25
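A quick sanity check of the hugepage arithmetic in the configuration described above (2MB pages, two 3500MB guests plus the 32MB router, 3546 pages reserved) shows how little slack is left for anything qemu allocates outside the hugetlbfs-backed guest RAM:

  $ page_mb=2; reserved=3546
  $ guests=$(( 2 * 3500 / page_mb ))         # two 3500MB guests -> 3500 pages
  $ router=$(( 32 / page_mb ))               # 32MB router guest ->   16 pages
  $ echo $(( reserved - guests - router ))   # spare pages: 30, i.e. ~60MB
  30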
Re: KVM with hugepages generate huge load with two guests
> Please don't top post.

Sorry.

> Please use 'top' to find out which processes are busy, the aggregate
> statistics don't help to find out what the problem is.

The thing is - all more or less active processes become busy, like httpd, etc. - I can't identify any single process that generates all the load. I see at least 10 different processes in the list that look busy in each guest... From what I see, there is nothing out of the ordinary in the guest 'top', except that the whole guest becomes extremely slow. But OK, I will try to repeat the problem a few hours later and send you the whole 'top' output if it is required.

Thanks,
Dmitry
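When a guest is this sluggish, an interactive 'top' can take minutes to paint; batch mode writes one complete snapshot and exits, which is easier to capture and post to the list (standard procps options, nothing guest-specific):

  # inside the guest: one full snapshot of all tasks, saved to a file
  $ top -b -n 1 > guest-top.txt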
Re: KVM with hugepages generate huge load with two guests
On 10/03/2010 10:24 PM, Dmitry Golubev wrote:
> So, I started anew. I decreased the memory allocated to each guest to
> 3500MB (from 3550MB as I told earlier), but have not decreased the
> number of hugepages - it is still 3696.

Please don't top post.

Please use 'top' to find out which processes are busy, the aggregate statistics don't help to find out what the problem is.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: KVM with hugepages generate huge load with two guests
So, I started anew. I decreased the memory allocated to each guest to 3500MB (from 3550MB as I told earlier), but have not decreased the number of hugepages - it is still 3696.

On one host I started one guest. It looked like this:

HugePages_Total:  3696
HugePages_Free:   1933
HugePages_Rsvd:     19
HugePages_Surp:      0
Hugepagesize:     2048 kB

top - 22:05:53 up 2 days, 3:44, 1 user, load average: 0.29, 0.33, 0.29
Tasks: 131 total, 1 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.9%us, 4.6%sy, 0.0%ni, 90.8%id, 1.0%wa, 0.0%hi, 2.7%si, 0.0%st
Mem:   8193472k total,  8118248k used,    75224k free,    29036k buffers
Swap: 11716412k total,        0k used, 11716412k free,    75864k cached

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  0      0  74668  29036  75864    0    0     1     8    54    51  1  7 91  1

Now I am starting the second virtual machine, and that's what happens:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  0      0  74272  29216  76664    0    0     0     0   447   961  0  0 100 0
 0  0      0  73172  29216  77464    0    0   192    16   899  1575  1  1 96  2
 0  0      0  72528  29224  77464    0    0     0    14   475  1022  1  0 99  0
 0  0      0  72720  29232  77456    0    0     0    49   519   999  0  0 97  3
 1  0     52  77988  28776  40492    0   10   119   117   988  2285  8  9 72 11
 4  0     52  68868  28784  40492    0    0  2854    38  7452  2817 17 16 67  1
 2  0     52  66052  28784  40984    0    0  1906    18 24057  4620 25 20 48  7
 1  0     52  67044  28792  40984    0    0  1630    35  3175  3966  9 12 72  7
 0  0     52  63684  28800  40980    0    0  1433   228  6021  4479 10 11 65 14
 0  1     52  65516  28800  40984    0    0  1288   109  4143  4179 10 10 58 21
 2  2     52  62216  28808  40984    0    0  1698   241  4357  4183  9  8 58 25
 2  2     52  60292  28816  40984    0    0  2874   258 11538  5324 15 14 39 33
 2  2     52  57352  28816  40984    0    0  5303   278  8528  5176  9 11 39 42
 0  7     52  54000  28824  40980    0    0  5263   249 10580  6214 16 10 32 42
 0  4    396  55180  19740  40188    0   70 10304   315  7359  9633 19  8 44 28
 1  0    320  61520  19748  40480    0    0  5361   302  2509  5743 23  2 50 25
 1  5    316  59940  19748  40728    0    0  2343     8  2225  4690 13  3 75 10
 3  1    316  55616  19748  40728    0    0  4435   215  7660  6057 15  6 51 28
 0 16   2528  53596  17392  38468    0  529   832   834  6600  4675  8  5 11 76
 3  0   2404  56176  17392  38480    1    0  6530   301  8371  5646 20  7 14 59
 2  5   7480  58012  14836  33720   13 1082  3666  3155 12290  7752 17 10 20 54
 2  1   7340  59628  14836  33884    0    0  5550   690  9513  7258 13  9 38 41
 2  1   7288  59124  14844  34472    0    0  1524   481  4597  4688  5  6 58 31
 0  3   7284  58848  14844  34472    0    0  1365   364  2171  3813  3  2 58 38
 0  1   7056  59324  14844  34472    7    0   841   372  2159  3940  3  2 48 47
 0 30   7056  54456  14844  34472    0    0     2   248  1402  2705  2  1 85 13
 0  1   6892  55336  14828  38396    1    0   888   268  1927  4124  2  2 41 55
 0  0   6892  57808  14060  36988    0    0    17    92   948  1682  1  1 93  5
 0  0   6888  58616  14060  37696    0    0   140    43   747  1566  1  1 94  5
 1  0   6884  59444  14060  37696    0    0     7    14   942  1747  3  1 95  1
 1  0   6884  58820  14060  37696    0    0     0    46   722  1480  1  1 97  2
 0  0   6884  58608  14060  37696    0    0     0    41   858  1564  3  1 93  3
 3  8   6884  51752  14060  37792    0    0   354   147  8243  2447 20  7 71  2
 2  0   6880  52840  14060  37792    0    0   604   281 10430  5859 21 15 50 14
 0  0   6880  55176  14060  37792    0    0   699   232  3271  3656 20  4 66 10
 0  0   6880  56120  14060  37792    0    0     0   280  1064  2116  1  1 85 14
 0  0   6880  55628  14060  37792    0    0     0     0   616  1367  1  0 98  0
 1  0   6880  56388  14060  37792    0    0     0    18   689  1381  1  1 97  2

Contrary to what I expected - given that in the previous case I had only 6 unreserved pages, I thought I would have 56 now - I have 156 free unreserved pages:

HugePages_Total:  3696
HugePages_Free:   1113
HugePages_Rsvd:    957

Then at one moment both guests almost stopped working for a minute or so - both went up to huge load and became unresponsive. I didn't get to catch how they looked in 'top', but they did not use any swap themselves (they have at least 1GB of free memory each) and their load average went to something like 10. vmstat from the host looked like this:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo    in    cs us sy id wa
 0  0   6740  61948  11140  34104    0    0     0     3   663  1435  1  0 99
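The HugePages_* counters quoted above come straight from /proc/meminfo, so the Free/Rsvd drift can be watched alongside vmstat while waiting for the slowdown (standard interface, nothing KVM-specific):

  # on the host: refresh the hugepage accounting every 5 seconds
  $ watch -n 5 'grep ^Huge /proc/meminfo'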
Re: KVM with hugepages generate huge load with two guests
On 09/30/2010 11:07 AM, Dmitry Golubev wrote:
> Hi,
>
> I am not sure what's really happening, but every few hours
> (unpredictable) two virtual machines (Linux 2.6.32) start to generate
> huge cpu loads. It looks like some kind of loop is unable to complete
> or something...

What does 'top' inside the guest show when this is happening?

-- 
error compiling committee.c: too many arguments to function
Re: KVM with hugepages generate huge load with two guests
02.10.2010 03:50, Dmitry Golubev wrote:
> Hi,
>
> Thanks for the reply. Well, although there is plenty of RAM left (about
> 100MB), some swap space was used during the operation:
>
> Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
> Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

If you want to see swapping, run vmstat with, say, a 5-second interval:

  $ vmstat 5

The amount of swap used is interesting, but the amount of swapins/swapouts per second is much more so.

JFYI.

/mjt
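With the standard vmstat layout, si and so are the 7th and 8th fields, so a bare log of just the swap rates can be kept with something like the following (a sketch assuming that column layout):

  # print swapins/swapouts per interval, skipping the two header lines
  $ vmstat 5 | awk 'NR > 2 { print "si=" $7, "so=" $8 }'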
Re: KVM with hugepages generate huge load with two guests
OK, I have repeated the problem. The two machines were working fine for a few hours without some services running (these would take up some gigabyte additionally in total); I ran these services again and some 40 minutes later the problem reappeared (it may be a coincidence, though, but I don't think so). In the top output it looks like this:

top - 03:38:10 up 2 days, 20:08, 1 user, load average: 9.60, 6.92, 5.36
Tasks: 143 total, 3 running, 140 sleeping, 0 stopped, 0 zombie
Cpu(s): 85.7%us, 4.2%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 10.0%si, 0.0%st
Mem:   8193472k total,  8056700k used,   136772k free,     4912k buffers
Swap: 11716412k total,    64884k used, 11651528k free,    55640k cached

  PID USER      PR NI  VIRT RES  SHR S %CPU %MEM     TIME+ COMMAND
21306 libvirt-  20  0 3781m 10m 2408 S  190  0.1  31:36.09 kvm
 4984 libvirt-  20  0 3771m 19m 1440 S  180  0.2 390:30.04 kvm

Compared to the previous shot I sent before (that was taken a few hours ago), you will not see much difference, in my opinion. Note that I have 8GB of RAM and together both VMs take up 7GB. There is nothing else running on the server, except the VMs and the cluster software (drbd, pacemaker etc). Right now the drbd sync process is taking some cpu resources - that is why the libvirt processes do not show as 200% (physically, it is a quad-core processor).

Is almost 1GB really not enough for KVM to support two 3.5GB guests? I see 136MB of free memory right now - it is not even used...

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 2:50 AM, Dmitry Golubev wrote:
> Hi,
>
> Thanks for the reply. Well, although there is plenty of RAM left (about 100MB), some swap space was used during the operation:
>
> Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
> Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached
>
> I am not sure why, though. Are you saying that there are bursts of memory usage that push some pages to swap and they are not unswapped although used? I will try to replicate the problem now and send you some better printout from the moment the problem happens. I have not noticed anything unusual when I was watching the system - there was plenty of RAM free and a few megabytes in swap... Is there any kind of check I can try while the problem is occurring? Or should I free 50-100MB from hugepages and the system shall be stable again?
>
> Thanks,
> Dmitry
>
> On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti wrote:
>> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>>> Hi,
>>>
>>> I am not sure what's really happening, but every few hours (unpredictable) two virtual machines (Linux 2.6.32) start to generate huge cpu loads. It looks like some kind of loop is unable to complete or something...
>>>
>>> So the idea is:
>>>
>>> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests running on a linux 2.6.35 x64 (ubuntu maverick) host with a Q6600 Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small 32bit linux virtual machine (16MB of ram) with a router inside (I doubt it contributes to the problem).
>>>
>>> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I reserved 3696 huge pages (page size is 2MB) on the server, and I am running the main guests each having 3550MB of virtual memory. The third guest, as I wrote before, takes 16MB of virtual memory.
>>>
>>> 3. Once run, the guests reserve huge pages for themselves normally. As mem-prealloc is the default, they grab all the memory they should have, leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all times - so as I understand they should not want to get any more, right?
>>>
>>> 4. All virtual machines run perfectly normally without any disturbances for a few hours. They do not, however, use all their memory, so maybe the issue arises when they pass some kind of threshold.
>>>
>>> 5. At some point in time both guests exhibit cpu load over the top (16-24). At the same time, the host works perfectly well, showing a load of 8 and that both kvm processes use CPU equally and fully. This point in time is unpredictable - it can be anything from one to twenty hours, but it will be less than a day. Sometimes the load disappears in a moment, but usually it stays like that, and everything works extremely slowly (even a 'ps' command takes some 2-5 minutes).
>>>
>>> 6. If I am patient, I can start rebooting the guest systems - once they have restarted, everything returns to normal. If I destroy one of the guests (virsh destroy), the other one starts working normally at once (!).
>>>
>>> I am relatively new to kvm and I am absolutely lost here. I have not experienced such problems before, but recently I upgraded from ubuntu lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5) and started to use hugepages. These two virtual machines are not normally ru
Re: KVM with hugepages generate huge load with two guests
Hi,

Thanks for the reply. Well, although there is plenty of RAM left (about 100MB), some swap space was used during the operation:

Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

I am not sure why, though. Are you saying that there are bursts of memory usage that push some pages to swap and they are not unswapped although used? I will try to replicate the problem now and send you some better printout from the moment the problem happens. I have not noticed anything unusual when I was watching the system - there was plenty of RAM free and a few megabytes in swap... Is there any kind of check I can try while the problem is occurring? Or should I free 50-100MB from hugepages and the system shall be stable again?

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti wrote:
> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>> Hi,
>>
>> I am not sure what's really happening, but every few hours (unpredictable) two virtual machines (Linux 2.6.32) start to generate huge cpu loads. It looks like some kind of loop is unable to complete or something...
>>
>> So the idea is:
>>
>> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests running on a linux 2.6.35 x64 (ubuntu maverick) host with a Q6600 Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small 32bit linux virtual machine (16MB of ram) with a router inside (I doubt it contributes to the problem).
>>
>> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I reserved 3696 huge pages (page size is 2MB) on the server, and I am running the main guests each having 3550MB of virtual memory. The third guest, as I wrote before, takes 16MB of virtual memory.
>>
>> 3. Once run, the guests reserve huge pages for themselves normally. As mem-prealloc is the default, they grab all the memory they should have, leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all times - so as I understand they should not want to get any more, right?
>>
>> 4. All virtual machines run perfectly normally without any disturbances for a few hours. They do not, however, use all their memory, so maybe the issue arises when they pass some kind of threshold.
>>
>> 5. At some point in time both guests exhibit cpu load over the top (16-24). At the same time, the host works perfectly well, showing a load of 8 and that both kvm processes use CPU equally and fully. This point in time is unpredictable - it can be anything from one to twenty hours, but it will be less than a day. Sometimes the load disappears in a moment, but usually it stays like that, and everything works extremely slowly (even a 'ps' command takes some 2-5 minutes).
>>
>> 6. If I am patient, I can start rebooting the guest systems - once they have restarted, everything returns to normal. If I destroy one of the guests (virsh destroy), the other one starts working normally at once (!).
>>
>> I am relatively new to kvm and I am absolutely lost here. I have not experienced such problems before, but recently I upgraded from ubuntu lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5) and started to use hugepages. These two virtual machines are not normally run on the same host system (I have a corosync/pacemaker cluster with drbd storage), but when one of the hosts is not available, they start running on the same host. That is the reason I have not noticed this earlier.
>>
>> Unfortunately, I don't have any spare hardware to experiment with, and this is a production system, so my debugging options are rather limited.
>>
>> Do you have any ideas, what could be wrong?
>
> Is there swapping activity on the host when this happens?
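One way to tell whether it is the qemu-kvm processes themselves being pushed to swap (rather than host cache pressure) is the per-process VmSwap counter in /proc, present on recent kernels (2.6.34+, so it should exist on this 2.6.35 host); a sketch, where the pgrep pattern is only a guess at the process name:

  # on the host: per-process swap usage of each guest
  $ for pid in $(pgrep -f qemu); do
  >   printf '%s: ' "$pid"; grep VmSwap "/proc/$pid/status"
  > done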
Re: KVM with hugepages generate huge load with two guests
On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
> Hi,
>
> I am not sure what's really happening, but every few hours (unpredictable) two virtual machines (Linux 2.6.32) start to generate huge cpu loads. It looks like some kind of loop is unable to complete or something...
>
> So the idea is:
>
> 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests running on a linux 2.6.35 x64 (ubuntu maverick) host with a Q6600 Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small 32bit linux virtual machine (16MB of ram) with a router inside (I doubt it contributes to the problem).
>
> 2. All these machines use hugetlbfs. The server has 8GB of RAM, I reserved 3696 huge pages (page size is 2MB) on the server, and I am running the main guests each having 3550MB of virtual memory. The third guest, as I wrote before, takes 16MB of virtual memory.
>
> 3. Once run, the guests reserve huge pages for themselves normally. As mem-prealloc is the default, they grab all the memory they should have, leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all times - so as I understand they should not want to get any more, right?
>
> 4. All virtual machines run perfectly normally without any disturbances for a few hours. They do not, however, use all their memory, so maybe the issue arises when they pass some kind of threshold.
>
> 5. At some point in time both guests exhibit cpu load over the top (16-24). At the same time, the host works perfectly well, showing a load of 8 and that both kvm processes use CPU equally and fully. This point in time is unpredictable - it can be anything from one to twenty hours, but it will be less than a day. Sometimes the load disappears in a moment, but usually it stays like that, and everything works extremely slowly (even a 'ps' command takes some 2-5 minutes).
>
> 6. If I am patient, I can start rebooting the guest systems - once they have restarted, everything returns to normal. If I destroy one of the guests (virsh destroy), the other one starts working normally at once (!).
>
> I am relatively new to kvm and I am absolutely lost here. I have not experienced such problems before, but recently I upgraded from ubuntu lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5) and started to use hugepages. These two virtual machines are not normally run on the same host system (I have a corosync/pacemaker cluster with drbd storage), but when one of the hosts is not available, they start running on the same host. That is the reason I have not noticed this earlier.
>
> Unfortunately, I don't have any spare hardware to experiment with, and this is a production system, so my debugging options are rather limited.
>
> Do you have any ideas, what could be wrong?

Is there swapping activity on the host when this happens?
KVM with hugepages generate huge load with two guests
Hi,

I am not sure what's really happening, but every few hours (unpredictable) two virtual machines (Linux 2.6.32) start to generate huge cpu loads. It looks like some kind of loop is unable to complete or something...

So the idea is:

1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests running on a linux 2.6.35 x64 (ubuntu maverick) host with a Q6600 Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3, and another small 32bit linux virtual machine (16MB of ram) with a router inside (I doubt it contributes to the problem).

2. All these machines use hugetlbfs. The server has 8GB of RAM, I reserved 3696 huge pages (page size is 2MB) on the server, and I am running the main guests each having 3550MB of virtual memory. The third guest, as I wrote before, takes 16MB of virtual memory.

3. Once run, the guests reserve huge pages for themselves normally. As mem-prealloc is the default, they grab all the memory they should have, leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all times - so as I understand they should not want to get any more, right?

4. All virtual machines run perfectly normally without any disturbances for a few hours. They do not, however, use all their memory, so maybe the issue arises when they pass some kind of threshold.

5. At some point in time both guests exhibit cpu load over the top (16-24). At the same time, the host works perfectly well, showing a load of 8 and that both kvm processes use CPU equally and fully. This point in time is unpredictable - it can be anything from one to twenty hours, but it will be less than a day. Sometimes the load disappears in a moment, but usually it stays like that, and everything works extremely slowly (even a 'ps' command takes some 2-5 minutes).

6. If I am patient, I can start rebooting the guest systems - once they have restarted, everything returns to normal. If I destroy one of the guests (virsh destroy), the other one starts working normally at once (!).

I am relatively new to kvm and I am absolutely lost here. I have not experienced such problems before, but recently I upgraded from ubuntu lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5) and started to use hugepages. These two virtual machines are not normally run on the same host system (I have a corosync/pacemaker cluster with drbd storage), but when one of the hosts is not available, they start running on the same host. That is the reason I have not noticed this earlier.

Unfortunately, I don't have any spare hardware to experiment with, and this is a production system, so my debugging options are rather limited.

Do you have any ideas, what could be wrong?

Thanks,
Dmitry
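For readers who want to reproduce the setup described above: it boils down to reserving the pages on the host, mounting hugetlbfs, and pointing libvirt/qemu-kvm at the mount. A rough host-side sketch (the page count follows this report; the mount point is a common choice, and libvirt additionally needs to know about it, e.g. via hugetlbfs_mount in qemu.conf, depending on the distribution):

  # reserve 3696 2MB pages and expose them via hugetlbfs
  # echo 3696 > /proc/sys/vm/nr_hugepages
  # mkdir -p /dev/hugepages
  # mount -t hugetlbfs hugetlbfs /dev/hugepages

  # then, per guest, in the libvirt domain XML:
  #   <memoryBacking><hugepages/></memoryBacking>
  # which makes libvirt start qemu-kvm with -mem-path pointing at the mount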