Re: KVM with hugepages generate huge load with two guests

2010-12-13 Thread Dmitry Golubev
Hi,

So, nobody has any idea what's going wrong with all these massive IRQs
and spin_locks that cause virtual machines to almost completely stop?
:(

Thanks,
Dmitry

On Wed, Dec 1, 2010 at 5:38 AM, Dmitry Golubev lastg...@gmail.com wrote:
 Hi,

 Sorry it took so long to reply to you - there are only a few moments when I
 can poke a production server, and I need to notify people in advance
 about that :(

 Can you post kvm_stat output while slowness is happening? 'perf top' on the 
 host?  and on the guest?

 I took 'perf top', and the first thing I saw is that while the guest is on
 acpi_pm it shows a more or less normal amount of IRQs (under 1000/s);
 however, when I switched back to the default (which is nohz with
 kvm_clock), there are 40 times (!!!) more IRQs under normal operation
 (about 40,000/s). When the slowdown is happening, there are a lot of
 _spin_lock events and a lot of messages like: WARNING: failed to keep
 up with mmap data.  Last read 810 msecs ago.

 As I said before, switching to acpi_pm does not save the day, but it
 makes the situation a lot more workable (i.e., the servers recover faster
 from the periods of slowness). During slowdowns on acpi_pm I also see
 _spin_lock events.

 Raw data follows:



 vmstat 5 on the host:

 procs ---memory-- ---swap-- -io -system-- cpu
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  0  0      0 131904  13952 205872    0    0     0    24 2495 9813  6  3 91  0
  0  0      0 132984  13952 205872    0    0     0    47 2596 9851  5  3 91  1
  1  0      0 132148  13952 205872    0    0     0    54 2644 10559  3  3 93  1
  0  1      0 129084  13952 205872    0    0     0    38 3039 9752  7  3 87  2
  6  0      0 126388  13952 205872    0    0     0   311 15619 9009 42 17 39  2
  9  0      0 125868  13960 205872    0    0     6    86 4659 6504 98  2  0  0
  8  0      0 123320  13960 205872    0    0     0    26 4682 6649 98  2  0  0
  8  0      0 126252  13960 205872    0    0     0   124 4923 6776 98  2  0  0
  8  0      0 125376  13960 205872    0    0   136    11 4287 5865 98  2  0  0
  9  0      0 123812  13960 205872    0    0   205    51 4497 6134 98  2  0  0
  8  0      0 126020  13960 205872    0    0   904    26 4483 5999 98  2  0  0
  8  0      0 124052  13960 205872    0    0    15    10 4397 6200 98  2  0  0
  8  0      0 125928  13960 205872    0    0    14    41 4335 5823 98  2  0  0
  8  0      0 126184  13960 205872    0    0     6    14 4966 6588 98  2  0  0
  8  0      0 123588  13960 205872    0    0   143    18 5234 6891 98  2  0  0
  8  0      0 126640  13960 205872    0    0     6    91 5554 7334 98  2  0  0
  8  0      0 123144  13960 205872    0    0   146    11 5235 7145 98  2  0  0
  8  0      0 125856  13968 205872    0    0  1282    98 5481 7159 98  2  0  0
  9 19      0 124124  13968 205872    0    0   782  2433 8587 8987 97  3  0  0
  8  0      0 122584  13968 205872    0    0   432    90 5359 6960 98  2  0  0
  8  0      0 125320  13968 205872    0    0  3074    52 5448 7095 97  3  0  0
  8  0      0 121436  13968 205872    0    0  2519    81 5714 7279 98  2  0  0
  8  0      0 124436  13968 205872    0    0     1    56 5242 6864 98  2  0  0
  8  0      0 111324  13968 205872    0    0     2    22 10660 6686 97  3  0  0
  8  0      0 107824  13968 205872    0    0     0    24 14329 8147 97  3  0  0
  8  0      0 110420  13968 205872    0    0     0    68 13486 6985 98  2  0  0
  8  0      0 110024  13968 205872    0    0     0    19 13085 6659 98  2  0  0
  8  0      0 109932  13968 205872    0    0     0     3 12952 6415 98  2  0  0
  8  0      0 108552  13968 205880    0    0     2    41 13400 7349 98  2  0  0

 Few shots with kvm_stat on the host:

 Every 2.0s: kvm_stat -1

  Wed Dec  1 04:45:47 2010

 efer_reload                    0         0
 exits                   56264102     14074
 fpu_reload                311506        50
 halt_exits               4733166       935
 halt_wakeup              3845079       840
 host_state_reload        8795964      4085
 hypercalls                     0         0
 insn_emulation          13573212      7249
 insn_emulation_fail            0         0
 invlpg                   1846050        20
 io_exits                 3579406       843
 irq_exits                3038887      4879
 irq_injections           5242157      3681
 irq_window                124361       540
 largepages                  2253         0
 mmio_exits                 64274        20
 mmu_cache_miss            664011        16
 mmu_flooded               164506         1
 mmu_pde_zapped            212686         8
 mmu_pte_updated           729268         0
 mmu_pte_write           81323616       551
 mmu_recycled                 277         0
 mmu_shadow_zapped         652691        23
 mmu_unsync                  5630         8
 nmi_injections                 0         0
 nmi_window                     0         0
 pf_fixed                17470658       218
 pf_guest

Re: KVM with hugepages generate huge load with two guests

2010-11-30 Thread Dmitry Golubev
Hi,

Sorry it took so long to reply to you - there are only a few moments when I
can poke a production server, and I need to notify people in advance
about that :(

 Can you post kvm_stat output while slowness is happening? 'perf top' on the 
 host?  and on the guest?

I took 'perf top', and the first thing I saw is that while the guest is on
acpi_pm it shows a more or less normal amount of IRQs (under 1000/s);
however, when I switched back to the default (which is nohz with
kvm_clock), there are 40 times (!!!) more IRQs under normal operation
(about 40,000/s). When the slowdown is happening, there are a lot of
_spin_lock events and a lot of messages like: WARNING: failed to keep
up with mmap data.  Last read 810 msecs ago.
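
For reference, the observations above come from commands along these lines
(a rough sketch - the exact options depend on the perf and kvm_stat versions
installed):

# on the host: KVM exit counters and the hottest kernel symbols
kvm_stat -1
perf top

# inside the guest: active clocksource and interrupt rate
cat /sys/devices/system/clocksource/clocksource0/current_clocksource
vmstat 1                  # the 'in' column is interrupts per second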

As I said before, switching to acpi_pm does not save the day, but it
makes the situation a lot more workable (i.e., the servers recover faster
from the periods of slowness). During slowdowns on acpi_pm I also see
_spin_lock events.

Raw data follows:



vmstat 5 on the host:

procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 131904  13952 205872    0    0     0    24 2495 9813  6  3 91  0
 0  0      0 132984  13952 205872    0    0     0    47 2596 9851  5  3 91  1
 1  0      0 132148  13952 205872    0    0     0    54 2644 10559  3  3 93  1
 0  1      0 129084  13952 205872    0    0     0    38 3039 9752  7  3 87  2
 6  0      0 126388  13952 205872    0    0     0   311 15619 9009 42 17 39  2
 9  0      0 125868  13960 205872    0    0     6    86 4659 6504 98  2  0  0
 8  0      0 123320  13960 205872    0    0     0    26 4682 6649 98  2  0  0
 8  0      0 126252  13960 205872    0    0     0   124 4923 6776 98  2  0  0
 8  0      0 125376  13960 205872    0    0   136    11 4287 5865 98  2  0  0
 9  0      0 123812  13960 205872    0    0   205    51 4497 6134 98  2  0  0
 8  0      0 126020  13960 205872    0    0   904    26 4483 5999 98  2  0  0
 8  0      0 124052  13960 205872    0    0    15    10 4397 6200 98  2  0  0
 8  0      0 125928  13960 205872    0    0    14    41 4335 5823 98  2  0  0
 8  0      0 126184  13960 205872    0    0     6    14 4966 6588 98  2  0  0
 8  0      0 123588  13960 205872    0    0   143    18 5234 6891 98  2  0  0
 8  0      0 126640  13960 205872    0    0     6    91 5554 7334 98  2  0  0
 8  0      0 123144  13960 205872    0    0   146    11 5235 7145 98  2  0  0
 8  0      0 125856  13968 205872    0    0  1282    98 5481 7159 98  2  0  0
 9 19      0 124124  13968 205872    0    0   782  2433 8587 8987 97  3  0  0
 8  0      0 122584  13968 205872    0    0   432    90 5359 6960 98  2  0  0
 8  0      0 125320  13968 205872    0    0  3074    52 5448 7095 97  3  0  0
 8  0      0 121436  13968 205872    0    0  2519    81 5714 7279 98  2  0  0
 8  0      0 124436  13968 205872    0    0     1    56 5242 6864 98  2  0  0
 8  0      0 111324  13968 205872    0    0     2    22 10660 6686 97  3  0  0
 8  0      0 107824  13968 205872    0    0     0    24 14329 8147 97  3  0  0
 8  0      0 110420  13968 205872    0    0     0    68 13486 6985 98  2  0  0
 8  0      0 110024  13968 205872    0    0     0    19 13085 6659 98  2  0  0
 8  0      0 109932  13968 205872    0    0     0     3 12952 6415 98  2  0  0
 8  0      0 108552  13968 205880    0    0     2    41 13400 7349 98  2  0  0

Few shots with kvm_stat on the host:

Every 2.0s: kvm_stat -1

  Wed Dec  1 04:45:47 2010

efer_reload                    0         0
exits                   56264102     14074
fpu_reload                311506        50
halt_exits               4733166       935
halt_wakeup              3845079       840
host_state_reload        8795964      4085
hypercalls                     0         0
insn_emulation          13573212      7249
insn_emulation_fail            0         0
invlpg                   1846050        20
io_exits                 3579406       843
irq_exits                3038887      4879
irq_injections           5242157      3681
irq_window                124361       540
largepages                  2253         0
mmio_exits                 64274        20
mmu_cache_miss            664011        16
mmu_flooded               164506         1
mmu_pde_zapped            212686         8
mmu_pte_updated           729268         0
mmu_pte_write           81323616       551
mmu_recycled                 277         0
mmu_shadow_zapped         652691        23
mmu_unsync                  5630         8
nmi_injections                 0         0
nmi_window                     0         0
pf_fixed                17470658       218
pf_guest1335220581
remote_tlb_flush 189893096
request_irq                    0         0
signal_exits                   0         0
tlb_flush                5827433       108

Every 2.0s: kvm_stat -1

  Wed Dec  1 04:47:33 2010

efer_reload                    0         0
exits   

Re: KVM with hugepages generate huge load with two guests

2010-11-21 Thread Dmitry Golubev
 Just out of curiosity: did you try updating the BIOS on your
 motherboard?  The issue you're facing seems to be quite unique,
 and I've seen more than once how various weird issues
 were fixed just by updating the BIOS.  Provided they actually
 did their own homework, fixed something, and released the fixes
 too... ;)

Thank you for the reply, I really appreciate that somebody found time to
answer. Unfortunately for this investigation, I already upgraded the BIOS
a few months ago. I just checked - there are no newer versions.

I do see, however, that many people advise changing to the acpi_pm
clocksource (and thus disabling the nohz option) when similar problems
are experienced - I did not invent this workaround (got the idea here:
http://forum.proxmox.com/threads/5144-100-CPU-on-host-VM-hang-every-night?p=29143#post29143
). It looks like an ancient bug. I even upgraded my qemu-kvm to version
0.13 without any significant change in this behavior.
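
For anyone wanting to try the same workaround, the guest clocksource can be
switched at runtime through sysfs, or at boot time on the kernel command line
(a sketch; the parameter names assume a 2.6.3x guest kernel):

# inside the guest, at runtime
cat /sys/devices/system/clocksource/clocksource0/available_clocksource
echo acpi_pm > /sys/devices/system/clocksource/clocksource0/current_clocksource

# or permanently, via the guest kernel command line (e.g. in GRUB):
#   clocksource=acpi_pm nohz=off highres=off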

It is really weird, however, how one guest can work fine but two start
messing with each other. Shouldn't there be some kind of isolation
between them? They both start to behave exactly the same way at exactly
the same time, and it does not happen once a month or a year, but
pretty frequently.

Thanks,
Dmitry


Re: KVM with hugepages generate huge load with two guests

2010-11-21 Thread Dmitry Golubev
Thanks for the answer.

 Are you sure it is hugepages related?

Well, empirically it looked like either a hugepages-related issue or a
regression between qemu-kvm 0.12.3 and 0.12.5, as this did not happen until
I upgraded (needed to avoid disk corruption caused by a bug in 0.12.3)
and enabled hugepages. However, since the frequency of the problem does
seem related to how much memory each guest consumes (more memory = the
problem appears faster), and in the beginning the guests' memory
consumption might simply not have hit some kind of threshold, maybe it
is not really hugepages related.
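
One way to narrow that down (a sketch, assuming the guests are
libvirt-managed; 'guest-b' is a placeholder domain name) would be to drop
hugepage backing for just one of the two big guests and see whether the
slowdown still appears:

virsh edit guest-b        # remove the <memoryBacking><hugepages/></memoryBacking> element
virsh shutdown guest-b && virsh start guest-b
grep Huge /proc/meminfo   # HugePages_Free should rise once that guest stops using the pool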

 Can you post kvm_stat output while slowness is happening? 'perf top' on the 
 host?  and on the guest?

OK, I will test this and write back.

Thanks,
Dmitry


Re: KVM with hugepages generate huge load with two guests

2010-11-20 Thread Dmitry Golubev
 185212   340   243 9 1591 1212 10 14 76  1
 9  0   1228 223644  17016 18516400  2872   857 18515 5134 45 20  9 26
 0  3   1228 224840  17016 18516400  1080   786 8281 5490 35 12 21 33
 2  0   1228 222032  17016 18516400  118499 21056 3713 26 17 48  9
 1  0   1228 221784  17016 18516400  207569 3089 3749  9  7 73 11
 3  0   1228 220544  17016 18516400  1501   150 3815 3520  7  8 73 12
 3  0   1228 219736  17024 18516400  1129   103 7726 4177 20 11 60  9
 0  4   1228 217224  17024 18516400  2844   211 6068 4643  9  7 60 23

Thanks,
Dmitry

On Thu, Nov 18, 2010 at 8:53 AM, Dmitry Golubev lastg...@gmail.com wrote:
 Hi,

 Sorry to bother you again. I have more info:

 1. router with 32MB of RAM (hugepages) and 1VCPU
 ...
 Is it too much to have 3 guests with hugepages?

 OK, this router is also out of the equation - I disabled hugepages for it.
 That should also leave additional pages available to the guests. I think
 this should be pretty reproducible... Two identical 64-bit Linux 2.6.32
 guests with 3500MB of virtual RAM and 4 VCPUs each, running on a
 Core2Quad (4 real cores) machine with 8GB of RAM and 3546 2MB hugepages,
 on a 64-bit Linux 2.6.35 host (libvirt 0.8.3) from Ubuntu Maverick.

 Still no swapping, and the effect is pretty much the same: one guest
 runs well, two guests work for some minutes - then they slow down a few
 hundred times, showing huge load both inside (unlimited rapid growth of
 load average) and outside (the host load does not make it unresponsive,
 though - but it is loaded to the max). The load growth on the host is
 instant and finite (the change in the 'r' column indicates this sudden rise):

 # vmstat 5
 procs ---memory-- ---swap-- -io -system-- cpu
  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
  1  3      0 194220  30680  76712    0    0   319    28 2633 1960  6  6 67 20
  1  2      0 193776  30680  76712    0    0     4   231 55081 78491  3 39 17 41
 10  1      0 185508  30680  76712    0    0     4    87 53042 34212 55 27  9  9
 12  0      0 185180  30680  76712    0    0     2    95 41007 21990 84 16  0  0

 Thanks,
 Dmitry

 On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev lastg...@gmail.com wrote:
 Hi,

 Maybe you remember that I wrote a few weeks ago about a KVM CPU load
 problem with hugepages. The problem was left hanging; however, I now
 have some new information. The description remains the same, but I have
 decreased both the guest memory and the amount of hugepages:

 Ram = 8GB, hugepages = 3546

 Total of 2 virtual machines:
 1. router with 32MB of RAM (hugepages) and 1VCPU
 2. linux guest with 3500MB of RAM (hugepages) and 4VCPU

 Everything works fine until I start the second Linux guest with the
 same 3500MB of guest RAM, also in hugepages and also with 4 VCPUs. The
 rest of the description is the same as before: after a while the host
 shows a load average of about 8 (on a Core2Quad) and it seems that both
 big guests consume exactly the same amount of resources. The host seems
 responsive, though. Inside the guests, however, things are not so good -
 the load skyrockets to at least 20. The guests are not responsive and
 even a 'ps' runs inordinately slowly (it may take a few minutes - here,
 however, the load builds up and it seems that the machine becomes slower
 with time, unlike the host, which shows the jump in resource consumption
 instantly). It also seems that the more memory the guests use, the faster
 the problem appears. Still, at least a gig of RAM is free in each guest
 and there is no swap activity inside the guests.

 The most important thing - the reason I went back and quoted an older
 message than the last one - is that there is no more swap activity on the
 host, so the previous line of thought may also be wrong, and I have
 returned to the beginning. There is plenty of RAM now and swap on the
 host is always at 0 as seen in 'top'. And there is 100% CPU load, shared
 equally between the two large guests. To stop the load I can destroy
 either large guest. Additionally, I have just discovered that suspending
 either large guest works as well. Moreover, after a resume, the load does
 not come back for a while. Both methods stop the high load instantly
 (in under a second). As you were asking for a 'top' from inside the
 guest, here it is:

 top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
 Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
 Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  
 0.0%st
 Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
 Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 12303 root      20   0     0    0    0 R  100  0.0   0:33.72 vpsnetclean
 11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
 10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
 10247 99        20   0  149m  11m

Re: KVM with hugepages generate huge load with two guests

2010-11-17 Thread Dmitry Golubev
Hi,

Sorry to bother you again. I have more info:

 1. router with 32MB of RAM (hugepages) and 1VCPU
...
 Is it too much to have 3 guests with hugepages?

OK, this router is also out of the equation - I disabled hugepages for it.
That should also leave additional pages available to the guests. I think
this should be pretty reproducible... Two identical 64-bit Linux 2.6.32
guests with 3500MB of virtual RAM and 4 VCPUs each, running on a
Core2Quad (4 real cores) machine with 8GB of RAM and 3546 2MB hugepages,
on a 64-bit Linux 2.6.35 host (libvirt 0.8.3) from Ubuntu Maverick.
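
For completeness, the host-side hugepage setup behind those numbers looks
roughly like this (a sketch; the mount point and the way libvirt picks it up
vary by distribution):

echo 3546 > /proc/sys/vm/nr_hugepages    # or: sysctl vm.nr_hugepages=3546
mkdir -p /dev/hugepages
mount -t hugetlbfs hugetlbfs /dev/hugepages
grep Huge /proc/meminfo                  # HugePages_Total should now read 3546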

Still no swapping, and the effect is pretty much the same: one guest
runs well, two guests work for some minutes - then they slow down a few
hundred times, showing huge load both inside (unlimited rapid growth of
load average) and outside (the host load does not make it unresponsive,
though - but it is loaded to the max). The load growth on the host is
instant and finite (the change in the 'r' column indicates this sudden rise):

# vmstat 5
procs ---memory-- ---swap-- -io -system-- cpu
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  3      0 194220  30680  76712    0    0   319    28 2633 1960  6  6 67 20
 1  2      0 193776  30680  76712    0    0     4   231 55081 78491  3 39 17 41
10  1      0 185508  30680  76712    0    0     4    87 53042 34212 55 27  9  9
12  0      0 185180  30680  76712    0    0     2    95 41007 21990 84 16  0  0

Thanks,
Dmitry

On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev lastg...@gmail.com wrote:
 Hi,

 Maybe you remember that I wrote a few weeks ago about a KVM CPU load
 problem with hugepages. The problem was left hanging; however, I now
 have some new information. The description remains the same, but I have
 decreased both the guest memory and the amount of hugepages:

 Ram = 8GB, hugepages = 3546

 Total of 2 virtual machines:
 1. router with 32MB of RAM (hugepages) and 1VCPU
 2. linux guest with 3500MB of RAM (hugepages) and 4VCPU

 Everything works fine until I start the second Linux guest with the
 same 3500MB of guest RAM, also in hugepages and also with 4 VCPUs. The
 rest of the description is the same as before: after a while the host
 shows a load average of about 8 (on a Core2Quad) and it seems that both
 big guests consume exactly the same amount of resources. The host seems
 responsive, though. Inside the guests, however, things are not so good -
 the load skyrockets to at least 20. The guests are not responsive and
 even a 'ps' runs inordinately slowly (it may take a few minutes - here,
 however, the load builds up and it seems that the machine becomes slower
 with time, unlike the host, which shows the jump in resource consumption
 instantly). It also seems that the more memory the guests use, the faster
 the problem appears. Still, at least a gig of RAM is free in each guest
 and there is no swap activity inside the guests.

 The most important thing - the reason I went back and quoted an older
 message than the last one - is that there is no more swap activity on the
 host, so the previous line of thought may also be wrong, and I have
 returned to the beginning. There is plenty of RAM now and swap on the
 host is always at 0 as seen in 'top'. And there is 100% CPU load, shared
 equally between the two large guests. To stop the load I can destroy
 either large guest. Additionally, I have just discovered that suspending
 either large guest works as well. Moreover, after a resume, the load does
 not come back for a while. Both methods stop the high load instantly
 (in under a second). As you were asking for a 'top' from inside the
 guest, here it is:

 top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
 Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
 Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
 Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
 Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 12303 root      20   0     0    0    0 R  100  0.0   0:33.72 vpsnetclean
 11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
 10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
 10247 99        20   0  149m  11m 2128 R   31  0.3   0:05.39 httpd
  3916 root      20   0 86468  11m 1476 R   16  0.3   0:15.14 cpsrvd-ssl
 10919 99        20   0  149m  11m 2124 R    8  0.3   0:03.43 httpd
 11296 99        20   0  149m  11m 2112 R    7  0.3   0:03.26 httpd
 12265 99        20   0  149m  11m 2088 R    7  0.3   0:08.01 httpd
 12317 root      20   0 99.6m 1384  716 R    7  0.0   0:06.57 crond
 12326 503       20   0  8872   96   72 R    7  0.0   0:01.13 php
  3634 root      20   0 74804 1176  596 R    6  0.0   0:12.15 crond
 11864 32005     20   0 87224  13m 2528 R    6  0.4   0:30.84 cpsrvd-ssl
 12275 root      20   0 30628 9976 1364 R    6  0.3   0:24.68 cpgs_chk
 11305 99        20   0  149m  11m 2104 R    6  0.3   0:02.53

Re: KVM with hugepages generate huge load with two guests

2010-11-16 Thread Dmitry Golubev
 25 days, 23:38,  2 users,  load average: 8.50, 5.07, 10.39
Tasks: 133 total,   1 running, 132 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.1%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   8193472k total,  8071776k used,   121696k free,    45296k buffers
Swap: 11716412k total,        0k used, 11714844k free,   197236k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8426 libvirt-  20   0 3771m  27m 3904 S  199  0.3  10:28.33 kvm
 8374 libvirt-  20   0 3815m  32m 3908 S  199  0.4   8:11.53 kvm
 1557 libvirt-  20   0  225m 7720 2092 S    1  0.1 436:54.45 kvm
   72 root      20   0     0    0    0 S    0  0.0   6:22.54 kondemand/3
  379 root      20   0     0    0    0 S    0  0.0  58:20.99 md3_raid5
    1 root      20   0 23768 1944 1228 S    0  0.0   0:00.95 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.24 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:12.66 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:07.58 migration/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT   0     0    0    0 S    0  0.0   0:15.05 migration/1
    7 root      20   0     0    0    0 S    0  0.0   0:19.64 ksoftirqd/1
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root      RT   0     0    0    0 S    0  0.0   0:07.21 migration/2
   10 root      20   0     0    0    0 S    0  0.0   0:41.74 ksoftirqd/2
   11 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/2
   12 root      RT   0     0    0    0 S    0  0.0   0:13.62 migration/3
   13 root      20   0     0    0    0 S    0  0.0   0:24.63 ksoftirqd/3
   14 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/3
   15 root      20   0     0    0    0 S    0  0.0   1:17.11 events/0
   16 root      20   0     0    0    0 S    0  0.0   1:33.30 events/1
   17 root      20   0     0    0    0 S    0  0.0   4:15.28 events/2
   18 root      20   0     0    0    0 S    0  0.0   1:13.49 events/3
   19 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset
   20 root      20   0     0    0    0 S    0  0.0   0:00.02 khelper
   21 root      20   0     0    0    0 S    0  0.0   0:00.00 netns
   22 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   23 root      20   0     0    0    0 S    0  0.0   0:00.00 pm
   25 root      20   0     0    0    0 S    0  0.0   0:02.47 sync_supers
   26 root      20   0     0    0    0 S    0  0.0   0:03.86 bdi-default


Please help...

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti mtosa...@redhat.com wrote:

 On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
  Hi,
 
  I am not sure what's really happening, but every few hours
  (unpredictable) two virtual machines (Linux 2.6.32) start to generate
  huge cpu loads. It looks like some kind of loop is unable to complete
  or something...
 
  So the idea is:
 
  1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
  running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
  Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small
  32bit linux virtual machine (16MB of ram) with a router inside (i
  doubt it contributes to the problem).
 
  2. All these machines use hugetlbfs. The server has 8GB of RAM, I
  reserved 3696 huge pages (page size is 2MB) on the server, and I am
  running the main guests each having 3550MB of virtual memory. The
  third guest, as I wrote before, takes 16MB of virtual memory.
 
  3. Once run, the guests reserve huge pages for themselves normally. As
  mem-prealloc is default, they grab all the memory they should have,
  leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
  times - so as I understand they should not want to get any more,
  right?
 
  4. All virtual machines run perfectly normal without any disturbances
  for few hours. They do not, however, use all their memory, so maybe
  the issue arises when they pass some kind of a threshold.
 
  5. At some point of time both guests exhibit cpu load over the top
  (16-24). At the same time, host works perfectly well, showing load of
  8 and that both kvm processes use CPU equally and fully. This point of
  time is unpredictable - it can be anything from one to twenty hours,
  but it will be less than a day. Sometimes the load disappears in a
  moment, but usually it stays like that, and everything works extremely
  slow (even a 'ps' command executes some 2-5 minutes).
 
  6. If I am patient, I can start rebooting the guest systems - once
  they have restarted, everything returns to normal. If I destroy one of
  the guests (virsh destroy), the other one starts working normally at
  once (!).
 
  I am relatively new to kvm and I am absolutely lost here. I have not
  experienced such problems before, but recently I upgraded from ubuntu
  lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
  and started to use hugepages

Re: KVM with hugepages generate huge load with two guests

2010-10-04 Thread Dmitry Golubev
 Please don't top post.

Sorry

 Please use 'top' to find out which processes are busy; the aggregate
 statistics don't help to find out what the problem is.

The thing is - all the more or less active processes become busy (httpd,
etc.) - I can't identify any single process that generates all
the load. I see at least 10 different processes in the list that look
busy in each guest... From what I see, there is nothing out of the
ordinary in the guest 'top', except that the whole guest becomes extremely
slow. But OK, I will try to reproduce the problem in a few hours and
send you the whole 'top' output if it is required.
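
A non-interactive capture along these lines (a sketch) would avoid fighting
with an interactive 'top' that is itself too slow to use during the problem:

# inside the guest, once the load spikes
top -b -n 1 -c > /tmp/top-$(date +%s).txt
ps axo pid,stat,pcpu,wchan:30,comm --sort=-pcpu | head -30 > /tmp/ps-$(date +%s).txt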

Thanks,
Dmitry


Re: KVM with hugepages generate huge load with two guests

2010-10-03 Thread Dmitry Golubev
 1648 99  1  0  0
 8  0   7528  58992   7096  3192000 045 2536 1745 100  0  0  0
 9  1   7528  59024   7096  3192000 138 2548 1635 99  1  0  0
 8  0   7528  58768   7096  3192000 044 2496 1741 100  0  0  0
 8  0   7528  59388   7096  3192000 018 2429 1617 100  0  0  0
 8  0   7528  58868   7096  3192000 051 2600 1745 100  0  0  0
 8  0   7528  59156   7104  3192000 047 2441 1682 100  0  0  0
 9  0   7528  57380   7104  3192000 040 2709 1690 100  1  0  0
 8  0   7528  58056   7104  3192000 1 8 3127 1629 100  0  0  0
 9  0   7528  57544   7104  3192000 036 2615 1704 100  0  0  0
 8  0   7528  57196   7104  3192000 026 2530 1710 100  0  0  0
 8  0   7528  59792   6836  2864800 042 2613 1761 100  0  0  0
 8  0   7528  59156   6844  29036007864 2641 1757 100  0  0  0
 8  0   7528  59576   6844  29164002626 2462 1632 100  0  0  0
 8  0   7528  59716   6844  2916400 0 8 2414 1706 100  0  0  0
 8  0   7528  59600   6844  2916400 048 2505 1649 100  0  0  0
 8  0   7528  59600   6844  2916400 0 9 2373 1648 99  1  0  0
 8  0   7528  59492   6844  2916400 010 2387 1564 100  1  0  0
 8  0   7528  59624   6844  2916400 040 2551 1691 100  0  0  0
 8  0   7528  59080   6844  2916400 050 2733 1643 100  0  0  0
 8  0   7528  58956   6844  2916400 029 2823 1652 100  0  0  0
 8  1   7528  58624   6844  2916400 034 2478 1633 100  0  0  0
 8  0   7528  58716   6844  2916400 039 2398 1688 99  1  0  0
 8  0   7528  57592   6844  3022800   21230 2373 1666 100  1  0  0
 8  0   7528  57468   6852  3022800 051 2453 1695 100  0  0  0
 8  0   7528  58244   6852  3022800 026 2756 1617 99  1  0  0
 8  0   7528  58244   6852  3022800 0   112 3872 1952 99  1  0  0
 9  1   7528  58320   6852  3022800 048 2718 1719 100  0  0  0
 8  0   7528  58204   6852  3022800 017 2692 1697 100  0  0  0
 8  0   7528  59220   6852  3022800 548 4666 1651 98  2  0  0
 9  0   7528  57716   6852  3022800 0   101 5128 1874 98  2  0  0
 9  0   7528  55692   6860  3022800 5   100 5875 1825 97  3  0  0
 9  1   7528  55668   6860  3022800 0   156 3910 1960 99  1  0  0
 8  0   7528  55668   6860  3022800 038 2578 1671 100  0  0  0
 8  0   7528  55600   6860  3022800 281 2783 1888 100  0  0  0
 9  1   7528  59660   5188  2832000 050 2601 1918 100  0  0  0
 8  0   7528  63280   5196  2832800 263 4347 1855 99  1  0  0
 8  0   7528  62560   5196  2832800 0   101 3383 1748 99  1  0  0
 9  0   7528  62132   5196  2832800 150 2656 1724 100  0  0  0

One guest showed this:
top - 23:11:35 up 53 min,  1 user,  load average: 20.42, 14.61, 7.26
Tasks: 205 total,  40 running, 165 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us, 99.8%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3510920k total,   972876k used,  2538044k free,    30256k buffers
Swap:  4194296k total,        0k used,  4194296k free,   321288k cached

The other one:
top - 23:11:12 up 1 day,  9:19,  1 user,  load average: 19.38, 14.54, 7.40
Tasks: 219 total,  15 running, 204 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.2%us, 99.6%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.1%si,  0.0%st
Mem:   3510920k total,  1758688k used,  1752232k free,   298000k buffers
Swap:  4194296k total,        0k used,  4194296k free,   577068k cached

Thanks,
Dmitry

On Sun, Oct 3, 2010 at 12:28 PM, Avi Kivity a...@redhat.com wrote:
  On 09/30/2010 11:07 AM, Dmitry Golubev wrote:

 Hi,

 I am not sure what's really happening, but every few hours
 (unpredictable) two virtual machines (Linux 2.6.32) start to generate
 huge cpu loads. It looks like some kind of loop is unable to complete
 or something...


 What does 'top' inside the guest show when this is happening?

 --
 error compiling committee.c: too many arguments to function




Re: KVM with hugepages generate huge load with two guests

2010-10-01 Thread Dmitry Golubev
Hi,

Thanks for the reply. Well, although there is plenty of RAM left (about
100MB), some swap space was used during the operation:

Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

I am not sure why, though. Are you saying that there are bursts of
memory usage that push some pages to swap, and that those pages are not
unswapped afterwards even though they are used? I will try to replicate
the problem now and send you a better printout from the moment the
problem happens. I have not noticed anything unusual when I was watching
the system - there was plenty of RAM free and a few megabytes in swap...
Is there any kind of check I can try while the problem is occurring? Or
should I free 50-100MB from hugepages so that the system is stable again?
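
A few quick checks that could be run on the host while the problem is
occurring (a sketch; 'kvm' as the process name is an assumption taken from
the 'top' output in this thread, and the VmSwap field in /proc/PID/status
needs a reasonably recent kernel):

grep Huge /proc/meminfo            # hugepage pool: total / free / reserved
vmstat 5 5                         # si/so columns show swap-in / swap-out activity
for p in $(pgrep kvm); do
    echo -n "$p: "; grep VmSwap /proc/$p/status
done                               # how much of each kvm process sits in swap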

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
 Hi,

 I am not sure what's really happening, but every few hours
 (unpredictable) two virtual machines (Linux 2.6.32) start to generate
 huge cpu loads. It looks like some kind of loop is unable to complete
 or something...

 So the idea is:

 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
 running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
 Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small
 32bit linux virtual machine (16MB of ram) with a router inside (i
 doubt it contributes to the problem).

 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
 reserved 3696 huge pages (page size is 2MB) on the server, and I am
 running the main guests each having 3550MB of virtual memory. The
 third guest, as I wrote before, takes 16MB of virtual memory.

 3. Once run, the guests reserve huge pages for themselves normally. As
 mem-prealloc is default, they grab all the memory they should have,
 leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
 times - so as I understand they should not want to get any more,
 right?

 4. All virtual machines run perfectly normal without any disturbances
 for few hours. They do not, however, use all their memory, so maybe
 the issue arises when they pass some kind of a threshold.

 5. At some point of time both guests exhibit cpu load over the top
 (16-24). At the same time, host works perfectly well, showing load of
 8 and that both kvm processes use CPU equally and fully. This point of
 time is unpredictable - it can be anything from one to twenty hours,
 but it will be less than a day. Sometimes the load disappears in a
 moment, but usually it stays like that, and everything works extremely
 slow (even a 'ps' command executes some 2-5 minutes).

 6. If I am patient, I can start rebooting the guest systems - once
 they have restarted, everything returns to normal. If I destroy one of
 the guests (virsh destroy), the other one starts working normally at
 once (!).

 I am relatively new to kvm and I am absolutely lost here. I have not
 experienced such problems before, but recently I upgraded from ubuntu
 lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
 and started to use hugepages. These two virtual machines are not
 normally run on the same host system (i have a corosync/pacemaker
 cluster with drbd storage), but when one of the hosts is not
 available, they start running on the same host. That is the reason I
 have not noticed this earlier.

 Unfortunately, I don't have any spare hardware to experiment and this
 is a production system, so my debugging options are rather limited.

 Do you have any ideas, what could be wrong?

 Is there swapping activity on the host when this happens?




Re: KVM with hugepages generate huge load with two guests

2010-10-01 Thread Dmitry Golubev
OK, I have reproduced the problem. The two machines were working fine
for a few hours with some services not running (these would take up
roughly an extra gigabyte in total); I started those services again and
some 40 minutes later the problem reappeared (it may be a coincidence,
but I don't think so). In the 'top' output it looks like this:

top - 03:38:10 up 2 days, 20:08,  1 user,  load average: 9.60, 6.92, 5.36
Tasks: 143 total,   3 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s): 85.7%us,  4.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 10.0%si,  0.0%st
Mem:   8193472k total,  8056700k used,   136772k free,     4912k buffers
Swap: 11716412k total,    64884k used, 11651528k free,    55640k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21306 libvirt-  20   0 3781m  10m 2408 S  190  0.1  31:36.09 kvm
 4984 libvirt-  20   0 3771m  19m 1440 S  180  0.2 390:30.04 kvm

Compared to the previous snapshot I sent (that was taken a few hours
ago), you will not see much difference, in my opinion.

Note that I have 8GB of RAM and the two VMs together take up 7GB. There is
nothing else running on the server except the VMs and the cluster
software (drbd, pacemaker, etc). Right now the drbd sync process is
taking some CPU resources - that is why the libvirt processes do not
show as 200% (physically, it is a quad-core processor). Is almost 1GB
really not enough for KVM to support two 3.5GB guests? I see 136MB of
free memory right now - it is not even used...
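
Back-of-the-envelope arithmetic for what is left outside the hugepage pool
(a rough sketch that ignores per-guest qemu overhead and kernel memory):

echo $(( 3696 * 2 ))           # 7392 MB locked up as 2MB hugepages
echo $(( 8192 - 3696 * 2 ))    # ~800 MB left for the host kernel, qemu overhead, drbd and page cache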

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 2:50 AM, Dmitry Golubev lastg...@gmail.com wrote:
 Hi,

 Thanks for the reply. Well, although there is plenty of RAM left (about
 100MB), some swap space was used during the operation:

 Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
 Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

 I am not sure why, though. Are you saying that there are bursts of
 memory usage that push some pages to swap, and that those pages are not
 unswapped afterwards even though they are used? I will try to replicate
 the problem now and send you a better printout from the moment the
 problem happens. I have not noticed anything unusual when I was watching
 the system - there was plenty of RAM free and a few megabytes in swap...
 Is there any kind of check I can try while the problem is occurring? Or
 should I free 50-100MB from hugepages so that the system is stable again?

 Thanks,
 Dmitry

 On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti mtosa...@redhat.com wrote:
 On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
 Hi,

 I am not sure what's really happening, but every few hours
 (unpredictable) two virtual machines (Linux 2.6.32) start to generate
 huge cpu loads. It looks like some kind of loop is unable to complete
 or something...

 So the idea is:

 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
 running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
 Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small
 32bit linux virtual machine (16MB of ram) with a router inside (i
 doubt it contributes to the problem).

 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
 reserved 3696 huge pages (page size is 2MB) on the server, and I am
 running the main guests each having 3550MB of virtual memory. The
 third guest, as I wrote before, takes 16MB of virtual memory.

 3. Once run, the guests reserve huge pages for themselves normally. As
 mem-prealloc is default, they grab all the memory they should have,
 leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
 times - so as I understand they should not want to get any more,
 right?

 4. All virtual machines run perfectly normal without any disturbances
 for few hours. They do not, however, use all their memory, so maybe
 the issue arises when they pass some kind of a threshold.

 5. At some point of time both guests exhibit cpu load over the top
 (16-24). At the same time, host works perfectly well, showing load of
 8 and that both kvm processes use CPU equally and fully. This point of
 time is unpredictable - it can be anything from one to twenty hours,
 but it will be less than a day. Sometimes the load disappears in a
 moment, but usually it stays like that, and everything works extremely
 slow (even a 'ps' command executes some 2-5 minutes).

 6. If I am patient, I can start rebooting the guest systems - once
 they have restarted, everything returns to normal. If I destroy one of
 the guests (virsh destroy), the other one starts working normally at
 once (!).

 I am relatively new to kvm and I am absolutely lost here. I have not
 experienced such problems before, but recently I upgraded from ubuntu
 lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
 and started to use hugepages. These two virtual machines are not
 normally run on the same host system (i have a corosync/pacemaker
 cluster with drbd storage), but when one of the hosts is not
 available

KVM with hugepages generate huge load with two guests

2010-09-30 Thread Dmitry Golubev
Hi,

I am not sure what's really happening, but every few hours
(unpredictable) two virtual machines (Linux 2.6.32) start to generate
huge cpu loads. It looks like some kind of loop is unable to complete
or something...

So the idea is:

1. I have two Linux 2.6.32 x64 (OpenVZ, Proxmox project) guests
running on a Linux 2.6.35 x64 (Ubuntu Maverick) host with a Q6600
Core2Quad, on qemu-kvm 0.12.5 and libvirt 0.8.3, plus one small
32-bit Linux virtual machine (16MB of RAM) with a router inside (I
doubt it contributes to the problem).

2. All these machines use hugetlbfs. The server has 8GB of RAM, I
reserved 3696 huge pages (page size is 2MB) on the server, and I am
running the main guests each with 3550MB of virtual memory. The
third guest, as I wrote before, takes 16MB of virtual memory.

3. Once run, the guests reserve huge pages for themselves normally. As
mem-prealloc is the default, they grab all the memory they should have,
leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) at all
times - so as I understand it, they should not want to get any more,
right?

4. All virtual machines run perfectly normally, without any disturbances,
for a few hours. They do not, however, use all their memory, so maybe
the issue arises when they pass some kind of threshold.

5. At some point both guests exhibit CPU load through the roof
(16-24). At the same time, the host works perfectly well, showing a load
of 8, with both kvm processes using the CPU equally and fully. This point
in time is unpredictable - it can be anything from one to twenty hours,
but it will be less than a day. Sometimes the load disappears in a
moment, but usually it stays like that, and everything works extremely
slowly (even a 'ps' command takes some 2-5 minutes).

6. If I am patient, I can start rebooting the guest systems - once
they have restarted, everything returns to normal. If I destroy one of
the guests (virsh destroy), the other one starts working normally at
once (!).
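
For the record, those recovery actions map to libvirt commands roughly like
these (a sketch; 'guest-a' is a placeholder domain name):

virsh list                 # find the running domain names
virsh destroy guest-a      # forcibly stop one large guest - the other recovers at once
virsh suspend guest-a      # pausing either large guest (reported later in this thread) also clears the load
virsh resume guest-a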

I am relatively new to KVM and I am absolutely lost here. I have not
experienced such problems before, but recently I upgraded from Ubuntu
Lucid (I think it was Linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
and started to use hugepages. These two virtual machines are not
normally run on the same host system (I have a corosync/pacemaker
cluster with DRBD storage), but when one of the hosts is not
available, they start running on the same host. That is the reason I
have not noticed this earlier.

Unfortunately, I don't have any spare hardware to experiment with, and this
is a production system, so my debugging options are rather limited.

Do you have any ideas, what could be wrong?

Thanks,
Dmitry